HP - IIT Bombay developing web search engine

By siliconindia   |   Thursday, 27 August 2009, 19:23 IST
Bangalore: Hewlett Packard (HP) Labs and IIT Bombay are working together on a search engine that can return relevant information for a query within a short period of time, reports The Business Standard. The Computer Science Department of IIT-B is among the few institutions around the world to receive grants that HP Labs initiated last year. The team, working with Professor Soumen Chakrabarti at IIT-B, used this grant to work on a new search engine that crawls the web to provide relevant answers to queries.

The team has already created billions of annotation links between a 500 million web page corpus and millions of entities known to Wikipedia. The data is being generated on 42 high-end HP servers with over 350 gigabytes of RAM and over 150 terabytes of disk, which were donated by Yahoo. HP Labs and Microsoft Research have provided additional research funding.

The initial results are encouraging. As Sayali Kulkarni, a student working on the project, says, "The search for quantity queries get answered in 2-5 seconds." The search engine will even handle queries such as "how old is Feng Shui?" and the number of people affected by AIDS in the world, adds Prof Chakrabarti. It is designed to understand more kinds of queries and to respond with information nuggets and tables, not just links to pages, which sets it apart from other search engines. Queries like "length of the Nile River" or "maximum speed of a Mercedes-Benz SLR McLaren" could be answered from encyclopedic sources like Wikipedia, but in many cases such sources do not cover the query, and answers must draw on unstructured web text such as news and blogs. The system can aggregate, for each query, tens of thousands of snippets into quantitative answers.

To be successful, a search engine needs a robust mechanism for indexing web pages, because the internet holds an enormous number of pages at any given time.
Google has over eight billion pages indexed and over 1.1 billion images.

Annotation is the backbone of the HP-IIT-B engine: it indexes annotations alongside ordinary text and supports a query language that can combine categories, annotations, quantities and regular text in creative ways, typically ending with evidence aggregation. "The key to moving up in the search value chain is to add semi-structured knowledge to the unstructured corpus, in the form of type, entity, category and relationship annotations, to index these annotations along with the text, and open up search application programming interfaces (APIs) and query languages to probe these indices and aggregate the resulting knowledge," says Prof. Chakrabarti.

He adds that most of the popular search engines offer little or no support for at least two important kinds of queries. "For example, you cannot ask for a table of actors and the number of academy awards they won. Typing in 'actor number academy awards' is a shot in the dark, as the existing players do not expose to you any catalogue of actors that they know about, and let you implicitly expand actor into each known instance of that category," says Chakrabarti. The existing engines are also not very good at letting people query and manipulate physical quantities, he says. "Sure, you can go to an e-commerce vertical and ask for digital SLRs (cameras) priced between $700 and $1,010, but you won't be that successful asking a generic search engine for a laptop with battery life between four and six hours, or the typical driving time between Stuttgart and Mainz," he said.

Analysts do not seem convinced. Asheesh Raina, Principal Research Analyst at Gartner, opines that the engine will not make a difference even if it is launched for the mass market. "First I would like to see the system. But this would just be an incremental enhancement to the already existing platforms. 
Even if you think that this might be useful for enterprises, there are very few who would want this and that also in selected departments," said Raina. The team plans to release the search engine to its key partners, which include several universities, by the end of the year. The initial target is to handle thousands of queries per day, far fewer than the hundreds of millions of queries processed by big search engines like Google, Yahoo and now Bing.
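
The article does not describe how the engine turns many snippets into a single quantitative answer, so the following is only a rough illustration of the general idea of evidence aggregation, not the HP-IIT-B team's method. It assumes a hypothetical regular expression for lengths in kilometres, a handful of made-up snippets, and the median as the consensus rule; the names `QUANTITY`, `extract_quantities` and `aggregate` are invented for this sketch.

```python
import re
from statistics import median

# Hypothetical pattern: a number (optionally with thousands separators or
# decimals) followed by a kilometre unit. A real system would handle many units.
QUANTITY = re.compile(
    r"(\d[\d,]*(?:\.\d+)?)\s*(km|kilometres|kilometers)\b", re.IGNORECASE
)


def extract_quantities(snippet):
    """Return every kilometre value mentioned in one text snippet."""
    values = []
    for number, _unit in QUANTITY.findall(snippet):
        values.append(float(number.replace(",", "")))
    return values


def aggregate(snippets):
    """Combine the values found across all snippets into one answer.

    The median is an assumption made for this sketch: it is a simple
    consensus rule that tolerates a few outlying snippets.
    """
    values = [v for s in snippets for v in extract_quantities(s)]
    if not values:
        return None
    return median(values)


# Made-up snippets standing in for web text retrieved for
# the query "length of the Nile River".
snippets = [
    "The Nile is about 6,650 km long.",
    "At 6650 kilometres, the Nile is among the longest rivers.",
    "Some sources give the Nile a length of 6,853 km.",
    "The river flows through eleven countries.",
]

print(aggregate(snippets))
```

A system working at the scale described, with tens of thousands of snippets per query, would also need to weigh source reliability and normalise units, which this toy example leaves out.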