Big Data Market Review and Call for Solutions Innovation

Date: Tuesday, October 21, 2014

Maestro, based in Edison, NJ, is a global full-service Big Data integration, application management, data analytics, and consulting firm.

"Big Data" is a revolution that has already begun transforming how we think. Through the strength of Data Science, trends and patterns help eliminate ambiguity, catch fraud, and steer businesses and consumers toward a more efficient marketplace. Big Data analytics presents information visually to support wise decision-making, transforming data into information, knowledge, and experience. With the Big Data ecosystem and the right strategy, processes, resources, and tools, a company can achieve a very high growth rate.

Big Data Bang

There are 4.6 billion mobile-phone subscriptions worldwide, and around 2 billion people access the internet. The world's effective capacity to exchange information through telecommunication networks was 281 petabytes (10^15 bytes) in 1986, 471 petabytes in 1993, 2.2 exabytes (10^18 bytes) in 2000, and 65 exabytes in 2007, and traffic flowing over the internet is predicted to reach 667 exabytes annually in 2014. An estimated one third of globally stored information is alphanumeric text and still-image data, the format most useful for most Big Data applications.

Moore's law has held firm: processing power doubles roughly every two years. Processing speed is no longer the problem; getting the data to the processors has become the bottleneck. Innovations are needed to achieve higher data transfer rates for local processing, so that larger data volumes can be handled across a distributed environment. Alternatively, an entirely new computer architecture may be needed, one that can handle machine learning at the data storage component level as an extension of the memory types.

Big Data Analytics Science and Technology

Business Intelligence uses descriptive statistics on data with high information density to measure performance and detect trends and patterns.
Big Data analytics uses inductive statistics and concepts from non-linear system identification to infer laws (regressions, nonlinear relationships, and causal effects) from large data sets, revealing relationships and dependencies and predicting outcomes and behaviors.
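The contrast can be sketched in a few lines of Python. The sales figures below are hypothetical and serve only to show a descriptive summary (the BI view) next to an inferred trend used for prediction (the Big Data view):

```python
import statistics

# Monthly sales figures (hypothetical data, for illustration only)
sales = [120, 135, 150, 160, 172, 181, 195, 210]
months = list(range(len(sales)))

# Business Intelligence: descriptive statistics summarize what happened
mean_sales = statistics.mean(sales)
stdev_sales = statistics.stdev(sales)

# Big Data analytics: inductive statistics infer a law from the data,
# here an ordinary least-squares trend line used to predict the next period
x_bar = statistics.mean(months)
slope = sum((x - x_bar) * (y - mean_sales) for x, y in zip(months, sales)) \
        / sum((x - x_bar) ** 2 for x in months)
intercept = mean_sales - slope * x_bar
forecast = intercept + slope * len(sales)

print(f"mean={mean_sales:.1f}, trend={slope:.2f}/month, forecast={forecast:.1f}")
```

The descriptive numbers only summarize the past; the fitted slope is an inferred relationship that extrapolates to a period not yet observed.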
Research on the effective usage of information and communication technologies for development suggests that Big Data technology can make important contributions but also present unique challenges. Advancements in Big Data analysis offer cost-effective opportunities to improve decision-making in critical development areas; however, longstanding challenges such as inadequate technological infrastructure, economic and human resource scarcity exacerbate existing concerns with Big Data such as privacy, imperfect methodology, and interoperability issues.
Big Data requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times. Suitable technologies include A/B testing, crowd-sourcing, data fusion and integration, genetic algorithms, machine learning, natural language processing, signal processing, simulation, time series analysis, and visualization (McKinsey, 2011). Multidimensional Big Data can be represented as tensors, which can be handled more efficiently by tensor-based computation such as multilinear subspace learning. Additional technologies applied to Big Data include massively parallel processing (MPP) databases, search-based applications, data-mining grids, distributed file systems, distributed databases, and cloud-based infrastructure (applications, storage, and computing resources).
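As one concrete example from that list, a basic A/B test reduces to a two-proportion significance check. The visitor and conversion counts below are hypothetical:

```python
import math

# Hypothetical A/B test: conversions out of visitors for two page variants
visitors_a, conversions_a = 5000, 400   # variant A: 8.0% conversion
visitors_b, conversions_b = 5000, 460   # variant B: 9.2% conversion

p_a = conversions_a / visitors_a
p_b = conversions_b / visitors_b

# Pooled two-proportion z-test: is the observed difference likely real?
p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
z = (p_b - p_a) / se

# |z| > 1.96 corresponds to significance at the conventional 5% level
significant = abs(z) > 1.96
print(f"z = {z:.2f}, significant at the 5% level: {significant}")
```

With these numbers z comes out near 2.1, so variant B's lift would pass the conventional threshold; real deployments would also consider test power and run length.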
Practitioners of Big Data analytic processes are generally hostile to slower shared storage, preferring Direct-Attached Storage (DAS) in its various forms, from Solid State Drives (SSD) to high-capacity SATA disks buried inside parallel processing nodes. The perception of shared storage architectures (SAN and NAS) is that they are relatively slow, complex, and expensive. These qualities are not consistent with Big Data analytics systems, which thrive on system performance, commodity infrastructure, and low cost.

Real-time or near real-time information delivery is one of the defining characteristics of Big Data analytics. Latency is therefore avoided whenever and wherever possible. Data in memory is good; data on a spinning disk at the other end of a SAN connection is not. The cost of a SAN at the scale needed for analytics applications is much higher than that of other storage techniques.
There are advantages as well as disadvantages to shared storage in Big Data analytics, but practitioners generally do not favor it.

Opportunities and Use-Cases

Although Big Data represents a formidable challenge to those who must manage it, it also offers an incredible opportunity for businesses to create value. Many of the things organizations want or need to do become possible once they successfully tap into the value their data holds.

Airlines can price flights based on zip-code searches and user patterns.
Retailers can create more targeted offers for customers: Walmart handles more than 1 million customer transactions every hour, imported into databases estimated to contain more than 2.5 petabytes of data, the equivalent of 167 times the information contained in all the books in the US Library of Congress.
Amazon.com handles millions of back-end transactions and queries every day.
Broadcasters such as Netflix can deliver customized content to viewers.
The first human genome took 10 years to process; with the Big Data technology ecosystem, the same work can be done in less than a week.
Big Data can even help solve crimes, as imagined in the movie Minority Report.

Solutions and Challenges

Many firms are Hadoop-specific, providing tools to manage Hadoop clusters and to integrate and model data for analytics with visualizations. But Hadoop is only part of a complete Big Data solution. The need drives the solution type, and demand for data management consulting is growing. Several articles rank providers by their competitive edge.
Hadoop solution providers have already crowded the market because businesses have started remodeling their strategies around data. The trouble is that these providers want to capitalize on the trend by making their services quite expensive. Personally, I know of several niche vendors who train resources in their labs (including fresh graduates) and alter candidates' resumes to command higher bill rates. I want to alert buyers to be cautious. Yes, the technology has several complex components and implementation can be challenging, but know that there are solutions out there and that there are genuine solution providers such as Amazon Web Services, Cloudera, and Maestro Technologies.
To decide which provider is best suited to your business, Forrester has made it easy by grouping the evaluation criteria into three high-level buckets:
Current offering: setup, management, and monitoring tools; compatibility; and data processing features, including workload optimization. Does the vendor provide a cross-domain solution, or is it product-specific?
Strategy: how the vendor meets customer demands and fills gaps in deployments. Does the vendor have the ability to execute on your strategy?
Market presence: how long the vendor has been contributing to the Big Data ecosystem, plus its global presence, management background, strategic partnerships, finances, and history. Is the vendor willing to provide customer references?
All of these factors are highly relevant to the decision.
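To make those buckets actionable, a buyer can weight each one and score candidate vendors. A minimal sketch in Python; the weights and ratings below are hypothetical, not real Forrester scores:

```python
# Hypothetical weighted scorecard over the three evaluation buckets.
# All weights and vendor ratings (0-5 scale) are illustrative only.
weights = {"current_offering": 0.5, "strategy": 0.3, "market_presence": 0.2}

vendors = {
    "Vendor A": {"current_offering": 4.0, "strategy": 3.5, "market_presence": 4.5},
    "Vendor B": {"current_offering": 4.5, "strategy": 4.0, "market_presence": 3.0},
}

def weighted_score(scores, weights):
    """Combine per-bucket scores into a single weighted total."""
    return sum(scores[bucket] * w for bucket, w in weights.items())

# Rank vendors by their weighted total, best first
ranked = sorted(vendors, key=lambda v: weighted_score(vendors[v], weights), reverse=True)
for name in ranked:
    print(f"{name}: {weighted_score(vendors[name], weights):.2f}")
```

Shifting the weights toward market presence, for instance, would reverse the ranking here, which is exactly the kind of sensitivity a buyer should examine before committing.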

Call for Innovation

Encrypted search, with security enhancements and cluster formation, is a capability that enterprise businesses are looking for in Big Data; innovation is needed in encoding techniques that keep data secure while still allowing expedited search.
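One way to sketch the idea: the client derives a deterministic keyword token with a secret key, and the server indexes and matches only the tokens, never the plaintext. This is a simplified illustration (real searchable-encryption schemes add randomization and leak far less), and all names and data below are hypothetical:

```python
import hashlib
import hmac

# Hypothetical client-side secret; never shared with the storage server
SECRET_KEY = b"client-side secret"

def keyword_token(word: str) -> str:
    """Derive a deterministic, keyed token for one keyword."""
    return hmac.new(SECRET_KEY, word.lower().encode(), hashlib.sha256).hexdigest()

# Index build (client side): tokenize documents, upload only the tokens
documents = {1: "quarterly fraud report", 2: "marketing trends review"}
index = {doc_id: {keyword_token(w) for w in text.split()}
         for doc_id, text in documents.items()}

# Search (server side): match the query token against stored tokens
def search(index, token):
    return [doc_id for doc_id, tokens in index.items() if token in tokens]

print(search(index, keyword_token("fraud")))
```

The server can answer the query without learning that the keyword was "fraud", though a deterministic scheme like this still reveals which documents share keywords, which is one of the leakage problems the called-for innovation would need to address.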
In March 2012, The White House announced a national "Big Data Initiative" consisting of six federal departments and agencies committing more than $200 million to Big Data research projects.
To make manufacturing more competitive in the United States (and globally), more American ingenuity and innovation must be integrated into manufacturing; the NSF has therefore offered grants focused on developing advanced Big Data predictive tools and techniques.