Actionable Intelligence
Ajay Sravanapudi
Tuesday, July 1, 2003
Every enterprise is a source and sink of data. This data exists in databases, paper forms, email, web sites, etc. Organizations have long realized that more information usually leads to better decisions. This led them to collect as much data as possible in electronic form, which could then, in theory, be analyzed by computer software. The Gartner Group estimates that over 80% of all the data generated inside an enterprise is unstructured. However, most decision-making is driven by structured data that can be “crunched”. How good are decisions when they are based on less than 20% of the data? We make a distinction between “data” and “information”. Data is raw numbers, text, etc. Information is any meaning derived from this data.

Enterprises have a variety of motivations driving their desire to extract as much information as possible from their data. Here are a few examples:

Efficiency: a retail chain like Wal-Mart needs to understand consumer behavior and sales patterns, reduce inventory, improve efficiency, and control costs.
Competitive Intelligence: a pharmaceutical giant like Eli Lilly helps its researchers discover new drugs faster, monitors the competition, and so on.
Security: a federal agency like the Department of Homeland Security discovers patterns in communications that enable it to preempt future terrorist attacks.

Whether the motivation is money or security, the approach can be boiled down to three simply stated requirements: collect, analyze, and inform. A few definitions first:

Structured Data: Any data that can be broken down into its component parts (fields) and put back together. This is typically stored in databases such as Oracle, SQL Server, DB2, Access, etc.
Unstructured Data: Everything else (documents, emails, web pages, images, audio, video), none of which has a clearly discernible structure. Software to store and manage this is often referred to as Content Management.

The unstructured data problem, already acute, has worsened since the Internet was woven into the fabric of our lives. As we look briefly at the three steps, note a few common characteristics:

• Analysis of structured data is computationally more definite, whereas analysis of unstructured content is fuzzier. For example, one can state the total number of widgets sold as a precise number, but one can only express the relevance of a document as a percentage that can be computed differently depending on one’s philosophy.
• Different companies have sprung up to solve these two very different problems.
• The companies that focused on structured data only are now targeting the unstructured data analysis market as they look for growth.
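
To make the first point concrete, here is a minimal sketch in Python. The sales figures, the sample sentence, and both scoring functions are invented for illustration. They are not taken from any particular product. The widget total has exactly one right answer, while the same document earns two different relevance scores depending on the philosophy behind the formula.

```python
# Structured data: the question has one exact answer.
sales = [
    {"region": "East", "widgets": 120},
    {"region": "West", "widgets": 80},
]
total_widgets = sum(row["widgets"] for row in sales)


# Unstructured data: relevance depends on how you choose to score it.
def tokens(text):
    """Split text into lowercase words (a deliberately naive tokenizer)."""
    return text.lower().split()


def term_frequency_score(query, document):
    """Philosophy 1: what share of the document's words are query words?"""
    query_words = set(tokens(query))
    words = tokens(document)
    return sum(1 for word in words if word in query_words) / len(words)


def jaccard_score(query, document):
    """Philosophy 2: how much do the two vocabularies overlap?"""
    query_words = set(tokens(query))
    document_words = set(tokens(document))
    return len(query_words & document_words) / len(query_words | document_words)


query = "widget sales"
document = "quarterly widget sales rose as widget demand grew"

print(f"Total widgets sold: {total_widgets}")
print(f"Relevance by term frequency: {term_frequency_score(query, document):.0%}")
print(f"Relevance by vocabulary overlap: {jaccard_score(query, document):.0%}")
```

Both relevance figures are defensible, which is exactly why the analysis of unstructured content is the fuzzier of the two.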

Collect & Store
This refers to the problem of storing and managing all the data. For a very long time, this was the traditional IT problem. Even the simple ability to access data, like looking up an order quickly, was a significant improvement over the status quo. Here are some of the key players.
Structured Data—Oracle, IBM, Microsoft, Sybase, Software AG
Unstructured Data (documents)—Documentum, Interwoven, Vignette, Microsoft
Unstructured Data (web sites)—The Apache Group, Microsoft, IBM
This problem has been addressed fairly well.

Analyze
How do you make sense of all the data collected? This effort is usually referred to as “data mining.” Enterprises turn to data mining when they realize that quick access to their data has a ceiling on its value, and they need to glean further insights from it. Here, the solutions are wildly different. Let us first put some structure on this problem. There are two main activities in the analysis effort:

Search: You know what you want. You just want to find it quickly.
Browse: You are not quite sure what’s there or what you want. So you would rather look around and see if you like something you find.

The solution to each of these tasks is different depending on the type of data involved.

Task    Structured                  Unstructured
Search  Databases                   Search engines
Browse  Multi-dimensional analysis  Categorization, entity extraction, link analysis, concept extraction, visualization
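
The structured column of this table can be illustrated with a short Python sketch using the standard sqlite3 module. The orders table and its rows are hypothetical. A search is a direct lookup of a known record, while a browse summarizes the data along two dimensions (region and product) to see what is there.

```python
import sqlite3

# A hypothetical orders table held in memory for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, region TEXT, product TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "East", "widget", 10),
        (2, "East", "gadget", 5),
        (3, "West", "widget", 7),
        (4, "West", "widget", 3),
    ],
)

# Search: you know what you want (order 3), so you look it up directly.
order = conn.execute(
    "SELECT region, product, quantity FROM orders WHERE id = ?", (3,)
).fetchone()

# Browse: you are not sure what is there, so you summarize the data
# along two dimensions and look around.
summary = conn.execute(
    "SELECT region, product, SUM(quantity) FROM orders "
    "GROUP BY region, product ORDER BY region, product"
).fetchall()

print("Order 3:", order)
for region, product, quantity in summary:
    print(region, product, quantity)
```

Real multi-dimensional analysis tools go much further than a GROUP BY, but the distinction between looking something up and looking around is the same.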

This is a particularly vexing problem that has no clear and definitive solution. One can think of structured data as a pile of bricks. Unstructured data is the mud, before any bricks were made. This problem has been researched for a long time, and has resulted in many point solutions. Consequently, there are many solutions offered to make sense of unstructured data:

Categorization: catalog a document under one or more specific categories or subject areas.
Entity Extraction: extract “entities” or things of interest from a document.
Link Analysis: find links or connections between entities in a document or set of documents (“corpus”). For example, software that analyzes a document and finds that person A called 555-1212 at 8:30 a.m. on Tuesday.
Concept Extraction: extract meaningful phrases from a document.
Visualization: show a graphical representation of the data. This is a great way to get a bird’s eye view of what your data is all about.

While this sort of compartmentalization is fine for problem solving, people don’t operate this way. Consequently, software companies do not confine their development efforts to such neat boundaries. Notice that giants such as Microsoft, Oracle, and IBM produce products that span all aspects of data storage and analysis.
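
As a rough illustration of entity extraction and link analysis working together, here is a toy Python sketch built around the phone call example. The three sentences and the regular expression are assumptions made for the demonstration. Commercial products rely on far more robust linguistic techniques than a single pattern.

```python
import re
from collections import defaultdict

# A tiny hypothetical corpus of three documents.
documents = [
    "Person A called 555-1212 at 8:30 a.m. on Tuesday.",
    "Person B called 555-1212 at 9:15 a.m. on Wednesday.",
    "Person A called 555-9876 at 4:00 p.m. on Friday.",
]

# Entity extraction: pull the things of interest out of free text.
CALL_PATTERN = re.compile(
    r"(?P<person>Person [A-Z]) called (?P<phone>\d{3}-\d{4}) "
    r"at (?P<time>\d{1,2}:\d{2} [ap]\.m\.) on (?P<day>\w+)"
)


def extract_entities(text):
    """Return the person, phone, time, and day found in text, or None."""
    match = CALL_PATTERN.search(text)
    return match.groupdict() if match else None


def link_by_phone(corpus):
    """Link analysis: group people by the phone numbers they called."""
    links = defaultdict(set)
    for text in corpus:
        entities = extract_entities(text)
        if entities:
            links[entities["phone"]].add(entities["person"])
    return links


links = link_by_phone(documents)

# A number called by more than one person is a connection worth a look.
shared = {
    phone: sorted(people) for phone, people in links.items() if len(people) > 1
}
print(shared)
```

No single document says that persons A and B are connected. The link only appears once the extracted entities are compared across the corpus.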

Notification
Having analyzed the data, how do you get it to the right person at the right time? Depending on the location, there are several options available.
Desktop—Email, Instant Messaging, Audio/Video push technologies
Mobile—Paging, Voice notification, Text Messaging
We live with most of the mechanisms right now. One of the most interesting channels is the plain old phone call. Most of the newer mechanisms require newer devices—cell phones, pagers, PDAs—and newer services: wireless Internet access plans. The problem—how to communicate and collect data from a human at the other end? Enter speech recognition and speech synthesis. Speech is the ultimate solution for quick response and notification because almost everyone has a cell phone. Mobilizing a large force is much cheaper since there is no need to arm them with new devices. Most importantly, it is built on the most reliable network of all—the telephone network. This is why most federal agencies are considering voice notification as the primary mechanism to mobilize first responders in the event of an emergency.

Putting it all together
As you can see, the technologies to provide actionable intelligence exist. So, why is it not built into every enterprise? First, each piece is highly complex in its own right. Technologists have a habit of engineering just enough complexity to make connecting one system to the next difficult. Second, people in each segment of this industry speak a completely different language. The database people can barely understand the text-mining people, and the notification technology people have their own vocabulary. Finally, it all comes down to sharing data. Altruism, as Ayn Rand says, is not human nature. Sharing data is a complex problem fraught with as many political obstacles as technical ones. There is hope. Enterprises have recognized that there is no other way but to connect the dots and extract every ounce of intelligence from their data. For the federal and state governments, it is now imperative to safeguard their citizens. So, they are making a new effort. Watch this space!
