Skip to main content

50 Top Open Source Tools for Big Data

Hadoop, NoSQL databases, development tools and many more open source big data projects.
(Page 1 of 3)

Whenever analysts or journalists assemble lists of the top trends for this year, "big data" is almost certain to be on the list. While the catchphrase is fairly new, in one sense, big data isn't really a new concept. Computers have always worked with large and growing sets of data, and we've had databases and data warehouses for years.
What is new is how much bigger that data is, how quickly it is growing and how complicated it is. Enterprises understand that the data in their systems represents a gold mine of insights that could help them improve their processes and their performance. But they need tools that will allow them to collect and analyze that data.
Not surprisingly, the big data market is growing very quickly in response to the growing demand from enterprises. According to IDC, the market for big data products and services was worth $3.2 billion in 2010, and they predict the market will grow to hit $16.9 billion by 2015. That's a 39.4 percent annual growth rate, which is seven times higher than the growth rate IDC expects for the IT market as a whole.
Interestingly, many of the best and best known big data tools available are open source projects. The very best known of these is Hadoop, which is spawning an entire industry of related services and products. This month, we're profiling Hadoop, as well as 49 other big data projects. Here you'll find a lot of Apache projects related to Hadoop, as well as open source NoSQL databases, business intelligence tools, development tools and much more.
If we've overlooked any important open source big data tools, please feel free to note them in the comments section below.

Big Data Analysis Platforms and Tools

1. Hadoop
You simply can't talk about big data without mentioning Hadoop. The Apache distributed data processing software is so pervasive that often the terms "Hadoop" and "big data" are used synonymously. The Apache Foundation also sponsors a number of related projects that extend the capabilities of Hadoop, and many of them are mentioned below. In addition, numerous vendors offer supported versions of Hadoop and related technologies. Operating System: Windows, Linux, OS X.
2. MapReduce
Originally developed by Google, the MapReduce website describe it as "a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes." It's used by Hadoop, as well as many other data processing applications. Operating System: OS Independent.
3. GridGain
GridGrain offers an alternative to Hadoop's MapReduce that is compatible with the Hadoop Distributed File System. It offers in-memory processing for fast analysis of real-time data. You can download the open source version from GitHub or purchase a commercially supported version from the link above. Operating System: Windows, Linux, OS X.
4. HPCC
Developed by LexisNexis Risk Solutions, HPCC is short for "high performance computing cluster." It claims to offer superior performance to Hadoop. Both free community versions and paid enterprise versions are available. Operating System: Linux.
5. Storm
Now owned by Twitter, Storm offers distributed real-time computation capabilities and is often described as the "Hadoop of realtime." It's highly scalable, robust, fault-tolerant and works with nearly all programming languages. Operating System: Linux.

Databases/Data Warehouses

6. Cassandra
Originally developed by Facebook, this NoSQL database is now managed by the Apache Foundation. It's used by many organizations with large, active datasets, including Netflix, Twitter, Urban Airship, Constant Contact, Reddit, Cisco and Digg. Commercial support and services are available through third-party vendors. Operating System: OS Independent.
7. HBase
Another Apache project, HBase is the non-relational data store for Hadoop. Features include linear and modular scalability, strictly consistent reads and writes, automatic failover support and much more. Operating System: OS Independent.
8. MongoDB
MongoDB was designed to support humongous databases. It's a NoSQL database with document-oriented storage, full index support, replication and high availability, and more. Commercial support is available through 10gen. Operating system: Windows, Linux, OS X, Solaris.
9. Neo4j
The "world’s leading graph database," Neo4j boasts performance improvements up to 1000x or more versus relational databases. Interested organizations can purchase advanced or enterprise versions from Neo Technology. Operating System: Windows, Linux.
10. CouchDB
Designed for the Web, CouchDB stores data in JSON documents that you can access via the Web or or query using JavaScript. It offers distributed scaling with fault-tolerant storage. Operating system: Windows, Linux, OS X, Android.
11. OrientDB
This NoSQL database can store up to 150,000 documents per second and can load graphs in just milliseconds. It combines the flexibility of document databases with the power of graph databases, while supporting features such as ACID transactions, fast indexes, native and SQL queries, and JSON import and export. Operating system: OS Independent.
12. Terrastore
Based on Terracotta, Terrastore boasts "advanced scalability and elasticity features without sacrificing consistency." It supports custom data partitioning, event processing, push-down predicates, range queries, map/reduce querying and processing and server-side update functions. Operating System: OS Independent.
13. FlockDB
Best known as Twitter's database, FlockDB was designed to store social graphs (i.e., who is following whom and who is blocking whom). It offers horizontal scaling and very fast reads and writes. Operating System: OS Independent.
14. Hibari
Used by many telecom companies, Hibari is a key-value, big data store with strong consistency, high availability and fast performance. Support is available through Gemini Mobile. Operating System: OS Independent.
15. Riak
Riak humbly claims to be "the most powerful open-source, distributed database you'll ever put into production." Users include Comcast, Yammer, Voxer, Boeing, SEOMoz, Joyent, Kiip.me, DotCloud, Formspring, the Danish Government and many others. Operating System: Linux, OS X.
16. Hypertable
This NoSQL database offers efficiency and fast performance that result in cost savings versus similar databases. The code is 100 percent open source, but paid support is available. Operating System

Comments

Popular posts from this blog

The Best Web Hosting Services

Are you looking for the best web  hosting  services for your needs? Whether you need a place to host your small personal blog or a major corporate website, the following list will help you identify the best hosts to use. Finding the best web hosting service isn’t quite as straightforward as searching Google and choosing the one with the lowest price. There are a lot of issues to consider, including the reasons for  why  you need hosting and  how  you intend to use it. Once you have a handle on that, finding the right host becomes much easier. Choose one that’s undersized and you’ll end up with website outages and slow page loads, but choose one that’s oversized and you’ll be throwing money away. Defining Your Web Hosting Needs Before choosing your web host, you’ll need to think about your requirements. Consider the following concerns and decide the importance of each item on a scale of 0 to 10 (with 0 being not at all important and 10 being critically important): Speed  — H

Google Photos can now stabilize all your shaky phone camera video

G oogle Photos is where all my photos are. Long ago I was a man of SmugMug, and then Flickr, and then at some point spent days and days copying years of images to iCloud Photo Library before eventually disregarding that and switching to Google. What can I say? I’m a simple person who can be easily delighted and swayed by automatic GIF creation and reliable backups. And Google Photos keeps getting better. Here’s the latest example: now the mobile app can automatically stabilize videos in your camera roll with a tap. A lot of flagship smartphones offer optical image stabilization when shooting video, a hardware feature that helps keep footage smooth. Others, like Google’s Pixel, use software to try and stabilize jerky movements. Putting stabilization inside the Google Photos app could enhance results further if you’re already working with hardware OIS, or improve recordings significantly if your phone lacks any means of steadying things out of the box. The stabilized video is croppe