The Smart Techie was renamed Siliconindia India Edition starting Feb 2012 to continue the nearly two decade track record of excellence of our US edition.

The ABC of Big Data

Gaurav Makkar
Technical Director-NetApp
Sunday, March 4, 2012
Headquartered in Sunnyvale, CA, NetApp (NASDAQ: NTAP) creates innovative storage and data management solutions that accelerate business breakthroughs and deliver outstanding cost efficiency. With revenue of over $5 billion, the company has more than 150 offices across the globe and over 11,000 employees.

The last few years have been an era of nebulous buzzwords in our industry, each catching on like wildfire. First it was "SaaS", then "Cloud" and now "Big Data", and the first two have not gone away. It will be interesting to see how the newest kid on the block, Big Data, plays out in the context of the earlier two. But first, let's set some context.

Big Data covers a number of different dimensions and means different things to different organizations. One definition makes it as simple as "ABC": the Big Data space is segmented into Big Data Analytics (A), Big Bandwidth applications (B) and Big Content (C).

Big Bandwidth applications such as media streaming, full-motion video and video editing are cases where the infrastructure must provide large bandwidth for data I/O. Because media can be streamed to a large number of end devices, each with its own rendering capability, the onus is on the streaming side of the infrastructure to deliver the right media format. Media must therefore be transcoded into, and streamed in, a large number of formats. Both transcoding and streaming are bandwidth intensive on the storage side. Simultaneous video editing in a digital production house has similar needs, since many video frames are being accessed and modified at once.
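To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch in Python. The format names, bitrates and stream counts are purely illustrative assumptions, not figures from any real deployment; the point is only that the storage read load grows with the number of formats multiplied by the number of concurrent streams.

```python
# Hypothetical rendition ladder: format name -> bitrate in megabits per second.
# These names and numbers are assumptions chosen for easy arithmetic.
RENDITION_MBPS = {
    "mobile_360p": 1,
    "tablet_720p": 3,
    "tv_1080p": 6,
}


def aggregate_read_mbps(streams_per_rendition):
    """Sum (bitrate x concurrent streams) over every rendition being served."""
    return sum(
        RENDITION_MBPS[name] * count
        for name, count in streams_per_rendition.items()
    )


# Assumed concurrent viewers per format.
demand = aggregate_read_mbps(
    {"mobile_360p": 2000, "tablet_720p": 1000, "tv_1080p": 500}
)

# Divide by 8 to convert megabits to megabytes.
print(f"{demand} Mbit/s = {demand / 8} MB/s")  # 8000 Mbit/s = 1000.0 MB/s
```

Even this modest, made-up audience asks the storage side for roughly 1 GB/s of sustained reads, before any transcoding traffic is counted.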

Big Content is typically characterized by large amounts of data that, once written, are never modified. Immutable data of this nature occurs in the form of media objects (such as pictures and videos), patient records (in medical diagnostics), seismic data (for oil exploration), call detail records (in the telecom industry) or simply click-stream data (in Internet companies). Some of this data is read many times in the first few days after it is generated and then slowly becomes cold. Depending on the dataset's retention policy, it may be kept for a few years without being actively accessed. Storage solutions in this space need to provide reasonable access bandwidth, but the focus is on providing online data storage at the lowest $/GB.
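The lifecycle described above (read often at first, then cold, then eventually past retention) can be sketched as a simple age-based classification. This is a minimal illustration, not a description of any particular product: the seven-day "hot" window, the three-year retention period and the function name `storage_tier` are all assumptions made for the example.

```python
from datetime import date

# Assumed policy values; real thresholds depend on the dataset and its owner.
HOT_DAYS = 7               # data is read frequently during its first week
RETENTION_DAYS = 3 * 365   # data is kept for roughly three years in total


def storage_tier(written_on, today):
    """Classify an immutable object by its age in days."""
    age_days = (today - written_on).days
    if age_days < 0:
        raise ValueError("object cannot be written in the future")
    if age_days <= HOT_DAYS:
        return "hot"       # serve from storage with good access bandwidth
    if age_days <= RETENTION_DAYS:
        return "cold"      # keep online on the cheapest $/GB storage
    return "expired"       # retention period is over; eligible for deletion


today = date(2012, 3, 4)
print(storage_tier(date(2012, 3, 1), today))  # hot
print(storage_tier(date(2011, 3, 1), today))  # cold
print(storage_tier(date(2008, 1, 1), today))  # expired
```

Because the data never changes after it is written, age alone is enough to drive the decision in this sketch; a real policy might also consider access counts.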
