 accesine 2005-08-10
IBM Open-Sources New Search Technology
By John Pallatto


IBM plans to release as open-source a sophisticated new search and text analysis technology that is able to find relationships, trends and facts buried in a wide range of unstructured data, including e-mails, Web pages, text documents, images, audio and video.

Called UIMA (Unstructured Information Management Architecture), the technology is able to go beyond the keyword analysis typically used by most search engines to discern the semantic meanings within text and other unstructured data, said Nelson Mattos, vice president of information integration with IBM in San Jose, Calif.

IBM implemented UIMA in its WebSphere Information Integrator OmniFind Edition as part of its enterprise search platform, which Mattos said was the first commercially available application for this technology. IBM announced UIMA at the start of the LinuxWorld Conference & Expo in San Francisco this week.

UIMA was the result of four years of development by IBM Research, supported by the Defense Advanced Research Projects Agency (DARPA), which is the central research and development arm of the U.S. Defense Department.

Major universities and private research organizations, including Carnegie Mellon University, Columbia University and the University of Massachusetts, participated in the development of the technology and are now using UIMA in course work and research projects, according to IBM officials.

BBN Technologies Inc., Science Applications International Corp., the Mayo Clinic and MITRE Corp. also contributed to the research.

"We are announcing that we are going to be open-sourcing that architecture to allow for a broad adoption in the marketplace," Mattos said.

Releasing the UIMA technology as open-source code will make it easier for commercial, government, corporate and academic software developers to produce extensions and applications for the search technology, Mattos said. IBM will benefit from this when it gets opportunities to provide the computing and networking infrastructure to support these applications, he said.

UIMA will be presented to the Open Source Technology Group and be made available through the SourceForge online developer community by the end of 2005. Developers can also download the UIMA framework for free from IBM's alphaWorks division.

The search technology is particularly valuable for business intelligence applications that sift through e-mails or electronic documents to reveal trends that would otherwise be hidden from basic keyword searches, Mattos said.

For example, UIMA can be used to search through call center reports on problems with a particular product, such as a car, to reveal mechanical or maintenance problems, Mattos said.

Such searches may reveal a product quality problem earlier in the production cycle, so changes can be made before it damages the product's reputation or sales, he said.

It also allows companies to analyze "sales versus maintenance cost of a product and realize that while you are doing very well selling certain products, the maintenance cost of those is very high" because there are so many complaints and service calls about them, said Mattos.

Offering UIMA as an open-source technology is a good move because it increases the chances that it can be accepted as an industry standard for searching and analyzing all types of unstructured data, said Dana Gardner, principal analyst with industry researcher Interarbor Solutions.

"There has been a mish mash approach to text analytics, and I think there is a real value to having an interoperable methodology" in the market that brings together many of the best ideas about analyzing unstructured data, Gardner said.

The search engines available today are able to find huge numbers of documents with keyword searches, but they are poor at providing an overview of the information contained in those documents, Gardner said. "We've had a bunch of trees, but no way of viewing the forest when it comes to text analytics."

If UIMA is widely accepted as an industry standard, "it could allow for real-time analysis of an entire corporate intranet, which could be extremely powerful and allows for knowledge to be much more attainable, recoverable and actionable," said Gardner.

It's also true that the technology could be used as a powerful intelligence-gathering tool by the National Security Agency or the Central Intelligence Agency to sift through e-mail messages, phone conversations, or many other kinds of data, Gardner observed.

However, "I think that the spooks at the NSA and that ilk probably have these kinds of capabilities already," Gardner said.

"UIMA will be much more valuable by taking it out of the cloistered domain of intelligence and making it available to the much larger domains of business," he said.
