The ABC Of Big Data
Gaurav Makkar is the Technical Director, Office of CTO, NetApp. He is instrumental in driving Big Data technical strategy within NetApp at a global level and focuses specifically on Hadoop in that context.
Prior to joining NetApp, Makkar worked on developing scalable infrastructure for telecommunications at Hughes, firmware for different verticals at Texas Instruments and developing high performance computing infrastructure at BARC, Mumbai. He has numerous awards and patents to his credit and has been awarded the Bhabha Award and a Gold Medal from the Prime Minister of India.
The last few years have been an era of nebulous buzzwords in our industry that have caught on like wildfire. First it was “SaaS”, then “Cloud” and now “Big Data”, and the first two have not gone away. It is also interesting to see how the latest kid on the block (Big Data) plays in the context of the earlier two. But first, let’s set some context.
Big Data covers a number of different dimensions and means different things to different organizations. One simple way to define it is as “ABC”: the Big Data space can be segmented into Big Data Analytics, Big Bandwidth applications and Big Content.
Big Bandwidth applications such as media streaming, full-motion video and video editing are examples where the infrastructure needs to provide large bandwidth for data I/O. Because media can be streamed to a large number of end devices, each with its own rendering capability, the onus is on the streaming side of the infrastructure to deliver the right media format. As a result, media must be transcoded into, and streamed in, a large number of formats. Both transcoding and streaming are bandwidth intensive on the storage side. Simultaneous video editing in a digital production house has similar needs, since multiple video frames are being accessed and modified at the same time.
Big Content is typically characterized by large amounts of data that, once written, is never modified. Immutable data of this nature occurs in the form of media objects (like pictures and videos), patient records (in medical diagnostics), seismic data (for oil exploration), call detail records (in the telecom industry) or simply click-stream data (in Internet companies). Some of this data will be read multiple times in the first few days after it is generated and will then slowly become cold. Depending on the retention policies for that dataset, the data may be retained for a few years without being actively accessed. Storage solutions in this space need to provide reasonable access bandwidth, but the focus is on providing online data storage at the lowest $/GB.
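The life cycle described above (read often at first, then cold, then gone once retention lapses) can be sketched in a few lines of Python. This is only an illustration: the seven-day hot window, the tier names and the `classify` function are assumptions made for the example, not values or policies taken from any particular storage product.

```python
from datetime import date, timedelta

# Assumption for illustration: data counts as "hot" for its first 7 days.
HOT_WINDOW = timedelta(days=7)


def classify(written_on: date, today: date, retention: timedelta) -> str:
    """Return a hypothetical storage tier for an immutable object."""
    age = today - written_on
    if age > retention:
        return "expired"  # past the retention policy, eligible for deletion
    if age <= HOT_WINDOW:
        return "hot"      # still being read frequently
    return "cold"         # kept online, rarely accessed, cheapest $/GB
```

A record written three days ago would land in the hot tier, while the same record a year later would be cold, which is where the lowest $/GB matters most.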
Analytics has traditionally been part of an organization's Business Intelligence investments and has been realized using traditional Data Warehouses. The technologies that allow enterprises to extract insights from their data have now evolved to the point where the cost of producing a “unit of insight” per TB of raw data is less than the value that “unit of insight” provides to the enterprise. In other words, the ROI has tipped in favor of processing enterprise data rather than throwing it away. Parallel data processing technologies like Hadoop and NoSQL have allowed enterprises to march past this tipping point. This, in turn, has left enterprises clamoring for more, and they have started to retain almost all the data they earlier considered “junk-drawer” data. Most of this data is either not actively being processed, or is being processed with very relaxed response-time SLAs. Thus, Big Data Analytics fuels the growth in demand for Big Content solutions.
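The tipping point can be stated as simple arithmetic: processing is worthwhile once the value of insight per TB exceeds the cost of producing it. The Python sketch below is a rough illustration only; the function names and the dollar figures in the usage note are hypothetical and not drawn from any real deployment.

```python
def processing_pays_off(cost_per_tb: float, insight_value_per_tb: float) -> bool:
    """True once insight value per TB of raw data exceeds processing cost per TB."""
    return insight_value_per_tb > cost_per_tb


def net_value(raw_tb: float, cost_per_tb: float, insight_value_per_tb: float) -> float:
    """Net value of keeping and processing raw_tb terabytes instead of discarding them."""
    return raw_tb * (insight_value_per_tb - cost_per_tb)
```

With made-up numbers of $50 per TB to process and $80 per TB of insight value, `processing_pays_off(50.0, 80.0)` is true and 100 TB yields a net value of $3,000. Reverse the two figures and the same data is better thrown away, which is roughly where the economics stood before cheaper parallel processing arrived.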
© 2013 SiliconIndia all rights reserved