The Four C's for the "Right Data"
Prakash Nanduri
Thursday, September 5, 2013
Paxata, a stealth mode startup headquartered in Redwood City, CA, builds enterprise software by leveraging social and consumer technologies in the space of big data.

The right data is the data that helps answer your business questions. Business teams should start by making sure they know what sorts of questions they want to ask. For example, "Is there a segment of my customer base that is spending more with my competitors?" With that in mind, you can start to zoom in on the right data.

As you do, I would suggest that you ensure your data is COMPLETE, CONTEXTUAL, CONSUMABLE, and CLEAN. Here is a quick explanation of what I mean:

Complete: This means you have all the data required to answer your question. Getting back to our example, we would need to combine our customer-spend data (how much do they buy from us) with third-party data about what customers spend overall (what is their total wallet). With this comprehensive view, we can determine what share of each customer's spending, and therefore how much of the market, we are truly capturing.
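To make the idea of completeness concrete, here is a minimal sketch of combining the two sources to estimate share of wallet per customer. The customer IDs, the dollar figures, and the function name `share_of_wallet` are hypothetical and chosen only for illustration; a real data set would likely need more careful matching of customer records.

```python
# Hypothetical figures for illustration only; not real customer data.
our_sales = {"C001": 1200.0, "C002": 300.0, "C003": 950.0}   # what customers buy from us
total_spend = {"C001": 4000.0, "C002": 600.0}                # third-party estimate of total spend


def share_of_wallet(our_sales, total_spend):
    """Return (shares, missing): our share of each customer's total spend,
    plus the customers we cannot answer for because third-party data is absent."""
    shares = {}
    missing = []
    for customer_id, ours in our_sales.items():
        total = total_spend.get(customer_id)
        if total is None or total <= 0:
            # The data is incomplete for this customer, so no share is computed.
            missing.append(customer_id)
            continue
        shares[customer_id] = ours / total
    return shares, missing


shares, missing = share_of_wallet(our_sales, total_spend)
for customer_id, share in shares.items():
    print(f"{customer_id}: {share:.0%} of wallet")
print("No third-party data for:", missing)
```

The `missing` list is the point of the exercise: any customer that lands there is one for whom the question cannot yet be answered, which is what incomplete data looks like in practice.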

Contextual: This is the ability to drill into the data as you explore different ways to answer the question. In our example, we may start with a customer segmentation exercise to identify which audience is spending the most on our products. Once we discover that it is high-net-worth individuals, we move into a customer targeting exercise, which requires us to bring in demographic data to give us additional properties for targeting that audience.

Consumable: This means making the data available in whatever tool the business person wants to use. Data should be delivered in a flexible format that can be brought into everything from standard Excel spreadsheets to applications used for ad-hoc analysis and visualization.

Clean: There are two elements to address here: syntactically and semantically clean data. Syntactically clean data ensures that certain rules are applied consistently to the data (e.g., abbreviations for states, or use of middle initial versus middle name, to name a few), whereas semantically clean data ensures that information is accurately represented (e.g., a city in the data set is actually a city). In order to keep data clean, these elements require ongoing collaboration, not only among the business users but also between the business teams and IT.
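The difference between the two kinds of cleanliness can be sketched in a few lines. In this illustration, the lookup tables `STATE_ABBREVIATIONS` and `KNOWN_CITIES`, the helper names, and the sample records are all assumptions made for the example; in practice the reference lists would come from an agreed, maintained source.

```python
# Hypothetical reference data; a real project would use a maintained source.
STATE_ABBREVIATIONS = {"california": "CA", "new york": "NY", "texas": "TX"}
KNOWN_CITIES = {"Redwood City", "Austin", "Albany"}


def clean_state(value):
    """Syntactic rule: always store a state as its two-letter abbreviation."""
    text = value.strip()
    if text.upper() in STATE_ABBREVIATIONS.values():
        return text.upper()
    # Unknown values are returned unchanged so they can be reviewed by a person.
    return STATE_ABBREVIATIONS.get(text.lower(), text)


def is_valid_city(value):
    """Semantic check: the value in the city field is actually a known city."""
    return value.strip() in KNOWN_CITIES


records = [
    {"city": "Redwood City", "state": "california"},
    {"city": "Texas", "state": "tx"},  # a state name sitting in the city field
]

cleaned = [
    {
        "city": r["city"],
        "state": clean_state(r["state"]),
        "city_ok": is_valid_city(r["city"]),
    }
    for r in records
]
for row in cleaned:
    print(row)
```

The second record passes the syntactic rule once its state is normalized, yet fails the semantic check, because "Texas" is not a city. Deciding what to do with such a row is where the collaboration between business teams and IT comes in.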

As you think about tackling your big data challenges, start with the questions you want to ask and consider the Four C's as a good guide for what you should have in place.
