DARPA searches for Search

DARPA wants to totally revamp internet search and “revolutionise the discovery, organisation and presentation of search results”.

Under a programme dubbed Memex, DARPA wants to develop software that will enable domain-specific indexing of public web content and domain-specific search capabilities. The work will cover content discovery, information extraction, information retrieval, user collaboration, and other areas needed to address distributed aggregation, analysis, and presentation of web content.

The other big idea is to produce search results that are more immediately useful to specific domains and tasks, and to improve the ability of military, government and commercial enterprises to find and organise mission-critical, publicly available information on the internet. No word on whether this technology will be used to sort through personal data looking for terrorists, but we are pretty sure that DARPA would not want the search engine to find muffin recipes.

DARPA said it intends to develop Memex to address human trafficking. Human trafficking is a factor in many types of military, law enforcement and intelligence investigations and has a significant web presence to attract customers. Investigating it requires searching through forums, chats, advertisements, job postings, and hidden services.

The Memex programme will address the need to move beyond a largely manual process of searching for exact text in a central index. If it succeeds, it will fix problems that Google and Microsoft have in finding data in the deep web, such as temporary pages and pages behind forms, and in working from an impoverished index, which may not include shared content across pages, normalised content, automatic annotations, content aggregation or analysis.

Web search today suffers from basic interfaces in which every session is independent, there is no collaboration or history beyond the search term, and nearly exact text input is required. Standard practice for interacting with the majority of web content remains one-at-a-time manual queries that return federated lists of results.