
Is Big Data delivering Big Value ...Yet?

Dharmesh Thakker

General Partner, Battery Ventures

Friday, July 10, 2015


There has been tremendous buzz around "Big Data", fueled on one hand by the exploding volume, variety and velocity of data from mobile devices and social networks, and on the other by the ability to crunch this data using high-density compute. Analyzing the data firehose could be very rewarding in many verticals. McKinsey estimates that the US health care system could extract over $300B in value using big data, more than double the total annual health care spending of Spain, and that wise use of unstructured location data could generate $600B in consumer surplus. But has big data delivered on the promise yet? Some early examples have emerged, like Splunk and Palantir in IT log analytics and financial services, but they are few and far between compared to the promise. It has now been half a decade since the talk around big data began. The technology building blocks finally seem to be maturing, but to fully realize data-driven business value we still need to address the talent gap via self-service analytics and converge the analytics and transaction tiers.

Technology Building Blocks Are Maturing

Oracle, Teradata and others have built industry-leading franchises over the last 20 years, generating over $50B in annual revenues by managing "structured" relational data such as customer purchases, inventory and user profiles. However, the vast troves of unstructured data, such as location data, log data from mobile app usage and conversations in social networks, are often 10-50X the volume of structured data and need an architecturally different layer built from the ground up. Google, Yahoo, Facebook and others contributed key elements of their data management layers, and open source projects such as Hadoop, Cassandra, MongoDB and Spark have established the building blocks for data processing. Intel's $740M investment in Cloudera last year was a major milestone in endorsing Hadoop as enterprise-ready.

Self-Service Analytics Key to Addressing the Massive Talent Gap

There is still a major shortage of data scientists and data-savvy business analysts who can uncover the golden nuggets in vast troves of data. For instance, LinkedIn has over 300M professionals connected in an intricate graph, yet we are only scratching the surface with recruiting and ad-based solutions. Indicators of US economic growth, or ways to optimize career paths for aspiring graduates, exist in the LinkedIn data goldmine, but the data science talent gap could be an inhibitor. Data scientists also seem to spend half to two-thirds of their time cleansing and preparing data for analysis, which is a major drag on their productivity. Solutions that automate data collection and cleansing and handle schema drift could save significant resources and free data scientists to focus on finding the golden nuggets. Also, as we move from data at rest in Hadoop to streaming data, a new class of tools that cleanse and prepare data streams could empower analysts to detect new insights in near real time.