BIG DATA: Making Sense in Real Time
Date: Wednesday, September 01, 2010
What if we could find a ‘needle’ in a haystack ten times the size of the planet Jupiter in less than a second? Think about it. Time is money. Information from various sources is increasing at a mind-boggling rate, begging to be analyzed correctly and made sense of. The traditional way of running a business is passé. To stay ahead of the competition, you need tools that can handle these large volumes of data in real time and help you make informed business decisions. You need to know the status as of ‘now’, not as of July 31, 2010.
Big Data. Slice n’ Dice!
‘Big Data’ is the buzzword from the high performance computing niche of the IT market. Big Data is suddenly the focus of most presentations from suppliers of processing virtualization and storage virtualization software. But what is Big Data?
According to a recent report from McKinsey, data are flooding in at rates never seen before, doubling every 18 months. This is the result of greater access to customer data from public, proprietary, and purchased sources, as well as new information gathered from Web communities and newly deployed smart assets. These trends are broadly known as ‘Big Data’. Put more simply, any data is big when one has to sit down and decide how to organize it, manage it and, most importantly, analyze it to get the desired results. In other words, the phrase refers to the tools, processes and procedures that allow an organization to create, manipulate, and manage very large collections of data, ranging from many gigabytes to terabytes, petabytes or even more.
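To put the doubling rate in perspective, the arithmetic can be sketched as follows. This is a minimal illustration that assumes steady exponential growth with an 18-month doubling period; the function name `growth_factor` and the 5-terabyte starting volume are hypothetical and are not taken from the McKinsey report.

```python
def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """Return how many times larger a data set becomes after `years`,
    assuming it doubles every `doubling_months` months (an idealized model)."""
    return 2.0 ** (years * 12.0 / doubling_months)


if __name__ == "__main__":
    # Illustrative starting volume in terabytes; not a figure from any report.
    start_tb = 5.0
    for years in (1.5, 3, 6):
        projected = start_tb * growth_factor(years)
        print(f"after {years} years: {projected:.1f} TB")
```

Under this assumption, a data set grows fourfold in three years and sixteenfold in six, which is why storage and analysis plans made for today's volumes can become inadequate quickly.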
Data that is accumulated but not managed can pose a great problem. There are many reasons for data to grow, from government regulations that require data to be stored for future reference to the data needed for critical research and analysis in sectors such as health and pharma, energy, or weather and the environment. On the other hand, big data can provide a huge competitive edge to companies that can manage it, analyze it and use it to optimize operations or for any other beneficial task. Big companies like Google have cited examples that clearly demonstrate how a company can gain an edge over its competitors simply by analyzing data about the ground on which it operates.
Today, Big Data management stands out as the number one challenge for IT companies, and increasingly the solution is moving from providing hardware to more manageable software solutions. But before talking about the solutions, let’s take a step back and understand where all this extra data is coming from. The sources are many. The web itself is one source, producing a lot of valuable and critical information that needs to be stored and analyzed. Facebook, in just over two short years, has quintupled in size to a network that touches more than 500 million users. Twitter, since its creation in 2006, has grown to over 100 million users worldwide and is now attracting 190 million visitors per month and generating 65 million tweets a day. Computing and sensor networks are themselves becoming more dynamic and need more data. People are using more data than ever before to predict behavioral patterns and performance. Obviously, the more analytical we become in our approach, the more data we require to analyze.
Terabyte Trends in Technology
There has been an exponential increase in the raw computing power available to us today. The last decade especially saw a drastic increase in processing power, which hit a roadblock about seven years ago when clock speeds touched 3 gigahertz and could not go any faster. This brought out the need for multi-core processing, which is not only faster but highly reliable too. As a result, you can today find a 32-core machine with half a terabyte of main memory and 2 terabytes of solid state disk at a highly affordable price. This kind of computing power was unthinkable in the 1990s.
With this kind of computing power at our disposal, software product development companies are now looking at ways to harness it and rewrite software so that it can process data tens of thousands of times faster.
We at SAP Labs have been quick to take on this challenge, and our engineers are already rewriting code that will revolutionize the way big data is analyzed and processed, all in real time.
A disruption called ‘In-Memory’
As mentioned earlier, one of the highest priorities for organizations of any size and in any industry is managing and analyzing the soaring quantity of data, and harnessing that information to improve their business. The world’s major oil companies, governments, educational institutions, pharma companies, banks and internet portals deal with millions of transactions daily and are struggling to analyze and use this data, a process that normally takes weeks or months. SAP has always understood this challenge and is currently developing in-memory solutions that will allow our customers (for example, large enterprises in the FMCG space with more than 5 to 6 terabytes of data) to explore business data at the speed of thought.
In-memory computing is considered one of the largest disruptions of the 21st century. It will enable business users to instantaneously access, explore, model and analyze transactional, analytical and Web-based data in real time in a single environment, without impacting the data warehouse or other systems.
For example, a utility company employee could look at usage data to identify patterns over a particular region or time period, and then analyze that data comparatively, enabling real-time planning based on immediate access to usage data. Employees at a manufacturing company could analyze asset utilization in real time on transactional data, while a financial services firm could perform real-time risk management and measure market exposure by combining structured credit scoring data with unstructured data, including information from the Internet.
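The utility scenario can be illustrated with a toy sketch in which the usage records are held in main memory and queried directly, with no round trip to a separate reporting system. This is only an illustration of the general idea, not SAP's in-memory technology; the records, field layout and function names below are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical meter readings: (region, month, kWh). Illustrative data only.
READINGS = [
    ("north", "2010-07", 420.0),
    ("north", "2010-08", 460.0),
    ("south", "2010-07", 380.0),
    ("south", "2010-08", 350.0),
]


def average_usage_by_region(readings):
    """Group in-memory readings by region and average their kWh values."""
    by_region = defaultdict(list)
    for region, _month, kwh in readings:
        by_region[region].append(kwh)
    return {region: mean(values) for region, values in by_region.items()}


def month_over_month_change(readings, region, earlier, later):
    """Compare one region's total usage between two months."""
    totals = defaultdict(float)
    for rec_region, month, kwh in readings:
        if rec_region == region:
            totals[month] += kwh
    return totals[later] - totals[earlier]


if __name__ == "__main__":
    print(average_usage_by_region(READINGS))
    print(month_over_month_change(READINGS, "north", "2010-07", "2010-08"))
```

Because the data is already in memory, both questions (a pattern across regions and a comparison across time periods) are answered by a single pass over the records at the moment they are asked.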
According to Gartner, a major information technology research firm, in-memory analytics is an emerging technology that will drive mainstream business intelligence and change how people make informed decisions while interacting with data.
To power the next generation of business intelligence, business planning and business analytic applications, SAP is now working with leading hardware partners to deliver an in-memory software and hardware appliance optimized for real-time analytics using data from operational systems, data warehouses, real-time events and
Concluding the future
The dynamics of handling data are only getting more complex. The appetite for real-time information is increasing at an unimaginable rate, and catering to this need is going to be exciting and challenging. I believe this is a great opportunity for all of us to rethink our approach to making sense of the ever-increasing amount of information available, no matter where it comes from. The world is waiting for the next ‘iPod moment’.
It’s time we gave the real world a real deal. Real-time products and applications. The only thing that will make good business sense. And this requires a real-time mindset shift. It’s possible!
The author is Managing Director, SAP Labs India