New Tool to Pick Out Hidden Patterns in Vast Data Sets

Monday, 19 December 2011, 14:39 IST
Washington: Researchers have developed a statistical tool that can pick out unsuspected or hidden patterns in mountains of data in a way that no other software programme can. Part of a suite of statistical tools called MINE, it can tease out multiple patterns hidden in health information from around the globe, data on the changing bacterial landscape of the gut, and much more.

MINE, which stands for 'Maximal Information-based Nonparametric Exploration', is able to analyse a broad spectrum of patterns.

"There are massive data sets that we want to explore, and within them, there may be many relationships that we want to understand. The human eye is the best way to find these relationships, but these data sets are so vast that we can't do that. This toolkit gives us a way of mining the data to look for relationships," said Pardis Sabeti, senior study author and assistant professor at the Centre for Systems Biology, Harvard University.

The researchers tested their analytical toolkit on several large data sets, including one provided by Harvard researcher Peter Turnbaugh, who is interested in identifying the trillions of microorganisms that live in the gut. Working with Turnbaugh, the research team used MINE to make more than 22 million comparisons and homed in on a few hundred patterns of interest that had not been observed before.

"The goal of this statistic is to take data with a lot of different dimensions and many possible correlations and pick out the top ones," said Michael Mitzenmacher, senior study author and professor of computer science at Harvard University.

Other statistical tools work well for searching for a specific pattern in a large data set, but cannot score and compare different kinds of possible relationships.
Source: IANS