Is Big Data delivering Big Value ...Yet?





Date: Friday, July 10, 2015

There has been tremendous buzz around "Big Data." It is fueled on one hand by the exploding volume, variety and velocity of data from mobile devices and social networks, and on the other by the ability to crunch this data using high-density compute. Analyzing this data firehose could be very rewarding in many verticals. McKinsey estimates that the US health care system could extract over $300B in value using big data, more than double the total annual health care spending of Spain. It also estimates that $600B in consumer surplus could potentially be generated by using location data wisely. But has big data delivered on its promise yet? Some early winners have emerged, such as Splunk in IT log analytics and Palantir in financial services, but they are few and far between compared to the promise. It has now been half a decade since the talk around big data began. The technology building blocks finally seem to be maturing, but we still need to close the talent gap through self-service analytics and converge the transactional and analytical tiers to fully realize data-driven business value.

Technology Building Blocks Are Maturing

Oracle, Teradata and others built industry-leading franchises over the last 20 years, generating over $50B in annual revenue managing "structured" relational data such as customer purchases, inventory and user profiles. However, the vast troves of unstructured data, often 10-50X the volume of structured data, need a data layer architected differently from the ground up. This includes location data, log data from mobile app usage and conversations in social networks. Web-scale companies such as Google, Yahoo and Facebook pioneered key elements of this data management layer. Open source projects such as Hadoop, Cassandra, MongoDB and Spark have since established the building blocks for data processing.
Intel's $740M investment in Cloudera last year was a major milestone in endorsing Hadoop as enterprise-ready.

Self-Service Analytics Is Key to Addressing the Massive Talent Gap

There is still a major shortage of data scientists and data-savvy business analysts who can uncover the golden nuggets in vast troves of data. For instance, LinkedIn has over 300M professionals connected in an intricate graph, yet we are only scratching the surface with recruiting- and ads-based solutions. The LinkedIn data goldmine may hold indicators of US economic growth, or ways to optimize career paths for aspiring graduates, but the data science talent gap could be an inhibitor. Data scientists also seem to spend one-half to two-thirds of their time cleansing and preparing data for analysis, which is a major drag on their productivity. Solutions that automate data collection and cleansing and handle schema drift could save significant resources. They would also free data scientists to focus on finding the golden nuggets in the data. As we move from data at rest in Hadoop to streaming data, a new class of tools that cleanse and prepare data streams could also empower analysts to detect new insights in near real time.

Convergence of Transactional and Analytical Stacks Unlocks Significant Value

A major shift in attitudes, and perhaps in org structures, may be necessary for data practitioners to collaborate with business decision makers and exploit the full value of big data analytics. Organizations have been trained to run business workflows and transactions separately. They periodically collect the data into a separate analytical tier for tasks such as customer segmentation or claims fraud detection. However, the promise of big data lies in the convergence of the transactional and analytical stacks. For instance, I sit on the board of Reflektion, a company founded by the ex-Google AdSense team.
Reflektion analyzes 125M users in real time while processing their purchase transactions. It uses this analysis to personalize their entire journey through a site, much as Google personalizes ads based on real-time click patterns. But that requires collaboration among Marketing/e-commerce teams, business analysts and data scientists. Only then can personalization be applied within the user's short attention span, rather than offline a week later. Similarly, data science is the new frontier in cyber-security. There, user activity streams are analyzed in real time to detect anomalous behavior, rather than matched against known "signatures" after the fact.

Digitization of the Indian Market Represents a Massive Untapped Opportunity for Big Data

In data science terms, the higher the volume and frequency of user data, the better the machine learning models and the richer the analytics. These models learn patterns from data rather than from hand-coded rules. By that measure, India represents a goldmine of fresh data: 650M users on mobile phones by 2020, driving online commerce penetration from 4% to 25% and $220B in online spend by 2030 (Goldman Sachs 2015 report). Analyzing this data with a maturing tech stack and emerging self-service tools will unleash a major opportunity for data scientists. Ultimately, the next billion online users in emerging economies like India will engage in e-commerce, travel, payments, eHealth and finance. As they do, big data analytics could power groundbreaking services that optimize and personalize the user journey throughout their online experience. At Battery Ventures, we strive to find and support these emerging vertical data-driven apps, as well as the horizontal self-service building blocks that power them.