November - 2014 - issue > CXO View Point
Whither Hadoop?
M.C. Srivas
CTO & Founder-MapR Technologies Inc.
Tuesday, December 2, 2014
Hadoop came out of technology pioneered by Google, and this technology was used to index the World Wide Web. Since then, Hadoop has emerged as the platform of choice for enterprise-wide data processing. When I was at Google, I had the privilege of working at the center of such big data technology, and saw the potential it had to transform how businesses gain value from their data. Hadoop allows businesses to take advantage of their data to gain intelligence about their customers, optimize revenues, mitigate risks, track every little bit of information, and achieve all of this at a fraction of what it would cost with traditional database technology.

Some examples are worth mentioning. The largest cellphone manufacturer in the world uses Hadoop to track how well its phones are performing (e.g., what functionality is used, or how often the battery is recharged). The largest cancer center in the world uses Hadoop to understand which genes are implicated in which ailments. Farmers can buy insurance against bad weather ruining their crops from insurers that use Hadoop to forecast weather right down to a 4-square-kilometer area anywhere in the United States. The Indian government uses biometric information served by Hadoop to provide identities and proof-of-residence for the 1.3 billion residents in India.

Why does Hadoop win? As Google has famously shown in the paper, "The Unreasonable Effectiveness of Data," simple models and a lot of data trump more elaborate models based on less data. The key to Hadoop's success is the fact that it was designed to scale to tens of petabytes of storage.

There are many lessons that we can learn from Google about architectural advances, and here I present three.

First of all, you must start with a top-notch storage engine capable of holding all the data. You need a system that can hold an unlimited number of files, so you can connect as many data sources as possible to directly deposit and update data continuously without fear of running out of files. Without the ability to continuously update, you are limiting yourself to batch processing.
Secondly, you need a top-notch database engine capable of indexing all of the data you just collected: a database that can handle BOTH semi-structured and multimedia data. You need a database that can run both online and analytics workloads on the same copy. Why? In a world that is increasingly demanding real-time insights, there just isn't time to ETL the data to another system: there's just too much data and it's arriving too fast.
Thirdly, you must have a top-notch query system that's capable of searching it all. You need a system that fully supports ANSI SQL running directly on the Hadoop data. You need a query system that's capable of handling semi-structured, nested data structures, with imperfect and ambiguous schemas; one that can adjust to schema changes as the data flows in. You need a query system that can directly query the raw data, in place, without transformations or ETL, and without requiring a DBA's assistance.
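
To make the third lesson concrete, here is a minimal sketch in plain Python of what querying raw, nested data in place can look like. It is only an illustration of the idea, not how any Hadoop query engine is implemented: the sample records, the `get_path` and `select` helpers, and the field names are all hypothetical. The schema is read from each record as it arrives, so a field that appears partway through (here `battery.health`) needs no up-front definition, and a record that lacks it simply yields an empty value.

```python
import json

# Hypothetical raw records, one JSON document per line, left untransformed.
# The second record introduces a new nested field; the third omits "battery".
RAW_LINES = [
    '{"device": "a1", "battery": {"recharges": 3}}',
    '{"device": "b2", "battery": {"recharges": 5, "health": "good"}}',
    '{"device": "c3"}',
]


def get_path(record, path, default=None):
    """Follow a dotted path through nested dicts, tolerating missing keys."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def select(lines, columns, where=None):
    """Yield one tuple per matching raw line; no schema is declared up front."""
    for line in lines:
        record = json.loads(line)
        if where is None or where(record):
            yield tuple(get_path(record, column) for column in columns)


# Roughly: SELECT device, battery.recharges, battery.health FROM raw
rows = list(select(RAW_LINES, ["device", "battery.recharges", "battery.health"]))

# Roughly: SELECT device FROM raw WHERE battery.recharges > 4
heavy_users = list(
    select(
        RAW_LINES,
        ["device"],
        where=lambda r: (get_path(r, "battery.recharges") or 0) > 4,
    )
)
```

A real query system would add SQL parsing, indexing and distributed execution on top; the point of the sketch is only that the raw data is never reshaped or reloaded before it is queried.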

One reason Hadoop is thriving is its passionate community. But it was clear to me that for Hadoop to become the foundation of running a business, it had to meet and exceed standards of quality and reliability set by other enterprise software. From the very outset when MapR was founded, we decided to be part of the community, to innovate and move Hadoop forward by developing a solid framework that simplifies the deployment of Hadoop in enterprise environments, and that takes into account the three lessons noted above that we learned from Google. When there's that much data involved, ultra-strong security is paramount, with full audit, tracking and alerting built in. The Hadoop distribution that combines architectural advances with open source innovations is the one that will truly have an advantage over other distributions. The future of Hadoop really is all about easier integration and real-time access to new kinds of data, which will fundamentally change how businesses will operate.

About the author
Srivas ran one of the major search infrastructure teams at Google, where GFS, BigTable and MapReduce were used extensively. He wanted to provide that powerful capability to everyone, and started MapR on his vision to build the next-generation platform for semi-structured big data. His strategy was to evolve Hadoop and bring simplicity of use, extreme speed and complete reliability to Hadoop users everywhere, and make it seamlessly easy for enterprises to use this powerful new way to get deep insights. That vision is shared by all at MapR. Srivas brings to MapR his experience at Google, Spinnaker Networks and Transarc in building game-changing products that advance the state of the art.
