July - 2013 - issue > View Point
Big Data and Deep Learning: The next big opportunities?
Nandu Nandakumar
Founder & CEO-Razorthink
Monday, July 1, 2013
Razorthink, founded in 2008 and headquartered in Atlanta, is an outsourced software product development shop with a high R&D and analytics content focus.

Each day we create over 2.5 million terabytes of data, and over 90 percent of all data in existence was created in the last two years alone. This includes climate information, social media data, pictures and videos, purchase records, and our browsing behavior. The hiring analytics site iCrunchData estimates that about 600,000 jobs are currently advertised in this space, with numerous projections pointing to over 4 million open positions by 2015. Is this a fad, a genuine exponential trend, or a natural evolution of the analytics space? Should you, as an individual or as a business, rush to take advantage of this trend?

It is useful to understand two underlying technology trends before making this determination. The first is the MapReduce/Hadoop set of technologies, built to analyze massive data sets in parallel. The key bottleneck these addressed was the transfer rate between data storage and the processors. This led to two critical innovations. The first was to store data in a distributed fashion on commodity hardware (the Google File System and the Hadoop Distributed File System). The second, related innovation was to split up the processing so that each piece of computation runs on the machine that holds its data (MapReduce/Hadoop). The significant applications of this technology have been in search, indexing, and pre-defined trend analysis in large data sets.
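The map and reduce steps described above can be illustrated with a toy word count. This is a minimal single-machine sketch, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample chunks are invented for illustration, and a plain loop stands in for the cluster that would run each map task next to its stored chunk.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    mapped = []
    # On a real cluster each chunk would be mapped on the node storing it;
    # here a simple loop plays that role.
    for chunk in chunks:
        mapped.extend(map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: three chunks of a larger text, as if stored on three nodes.
chunks = ["big data big", "deep learning", "big deep"]
counts = word_count(chunks)
print(counts)
```

Because each map call sees only its own chunk and each reduce call sees only one key, both phases can be spread across many machines without moving the raw data.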

The second stream, Deep Learning, has been in the works for some time but recently shot to prominence, as evidenced by investments from Google and its appearance among MIT Technology Review's top 10 breakthrough technologies. Deep Learning uses a multi-layered learning model (typically neural networks), training or adjusting one layer at a time. This technique holds promise for analyzing the massive amounts of image and video data collected on a daily basis. For example, the Google Brain team was able to programmatically recognize higher-level concepts, such as cats, from unlabeled images extracted from YouTube videos.

This trend in data analysis, therefore, is quite different from the existing paradigm of relational databases with well-defined semantic layers and periodic reporting. Some of the most successful organizations are highly measurement driven, and find high ROI in rapid experimentation and analysis of the resulting data. A successful technology professional has to be good at (a) programming and algorithms, including the ability to write data extraction and manipulation routines, (b) identifying hypotheses to be tested, and (c) analyzing the resulting data using mathematical and statistical methods. Welcome to the world of the Programmer Scientist.

As a professional who wants to specialize in data analytics on this scale, expect your daily work to involve answering questions such as the following: (a) At a retail chain, estimate the effectiveness of an in-store promotion. You'll have to extract relevant data, identify control store locations, and estimate changes in buyer behavior. (b) At a media outlet, analyze the behavior of people passing by an ad, captured on video. You'll use technology to measure attention span, identify the demographics of passersby, and predict ad effectiveness. (c) At a tech startup, analyze the effectiveness of a social media campaign in real time. You'll process streaming data and identify key characteristics of the responses to track a pre-defined metric as it changes.
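To make the retail example in (a) concrete, here is one common way to compare promotion stores against control stores, sketched as a simple difference-in-differences estimate. This is an assumption about how such an analysis might be set up, not a method the article prescribes; the function name `promotion_lift` and all sales figures are made up purely for illustration.

```python
from statistics import mean


def promotion_lift(promo_before, promo_after, control_before, control_after):
    """Estimate the promotion effect as the change in average sales at
    promotion stores minus the change at control stores over the same period."""
    promo_change = mean(promo_after) - mean(promo_before)
    control_change = mean(control_after) - mean(control_before)
    return promo_change - control_change


# Made-up weekly sales per store (illustrative only, not real data).
promo_before = [100, 110, 90]     # promotion stores, before the promotion
promo_after = [130, 140, 120]     # promotion stores, during the promotion
control_before = [95, 105, 100]   # control stores, same "before" weeks
control_after = [105, 115, 110]   # control stores, same "during" weeks

lift = promotion_lift(promo_before, promo_after, control_before, control_after)
print(lift)
```

Subtracting the control stores' change removes effects that hit every store, such as seasonality, so the remainder is a rough estimate of what the promotion itself contributed. The estimate is only as good as the choice of control stores.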

For a business, this is an area rife with opportunities for outsourced work. However, these will be strategic relationships in which it is important to understand your client's business thoroughly. As a final note, for an individual or a business, experience with the data is more valuable than learning the tools themselves. Given a data set or stream in a domain, can you identify meaningful trends or answer specific questions about the data?
