Concept Classification and Search on Internet Using Machine Learning and Parallel Computing Techniques

Chen, Hsinchun and Schatz, Bruce R. and Lin, Chienting (1995) Concept Classification and Search on Internet Using Machine Learning and Parallel Computing Techniques.

Abstract

The problems of information overload and vocabulary differences have become more pressing with the emergence of increasingly popular Internet services. The main information retrieval mechanisms provided by the prevailing Internet WWW software are based on either keyword search or hypertext browsing. Keyword search often results in low precision, poor recall, and slow response time due to the limitations of indexing and communication methods, controlled-language-based interfaces, and the inability of searchers themselves to articulate their needs fully. Hypertext browsing, on the other hand, allows users to explore only a very small portion of a large Internet information space. A large information space can also confuse and disorient users, who may spend a great deal of time while learning nothing specific. This research aims to provide concept-based categorization and search capabilities for Internet WWW servers based on selected machine learning and parallel computing techniques. Our proposed approach, which is grounded in automatic textual analysis of Internet documents, attempts to address the Internet search problem by first categorizing the content of Internet documents and subsequently providing semantic search capabilities based on a concept space approach. As a first step, we propose a multi-layered neural network clustering algorithm employing the Kohonen self-organizing feature map to categorize Internet homepages according to their content. The category hierarchies created could serve to partition the vast Internet services into subject-specific categories and databases. After individual subject categories have been created, we propose to generate domain-specific concept spaces for each subject category. The concept spaces can then be used to support concept-based information retrieval, a significant improvement over the existing keyword searching and hypertext browsing options for Internet resource discovery. As the Internet information space continues to grow at the present pace, we believe this research will shed light on potentially robust and scalable solutions to the increasingly complex and urgent information access and sharing problems that are certain to emerge in the future Internet society.
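
To make the categorization step more concrete, the following is a minimal sketch of a single-layer Kohonen self-organizing map in plain Python. It is not the multi-layered algorithm the authors propose, only an illustration of the underlying technique. The function names `train_som` and `best_matching_unit`, the learning-rate and radius schedules, and the toy four-term document vectors are all assumptions made for this example and do not come from the paper.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the grid coordinate whose weight vector is closest to `vector`."""
    return min(
        weights,
        key=lambda node: sum((w - x) ** 2 for w, x in zip(weights[node], vector)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small 2-D Kohonen map; returns {(row, col): weight list}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    radius0 = max(rows, cols) / 2.0
    # Start every node with random weights in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass over the inputs
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])  # pull node toward the input
    return weights


# Hypothetical term-weight vectors for four homepages (two rough topics).
docs = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.9, 1.0],
]
som = train_som(docs, rows=1, cols=2)
labels = [best_matching_unit(som, d) for d in docs]
```

Each entry of `labels` is the map node assigned to the corresponding document, and documents that share a node can be treated as one category. With well-separated inputs such as these, the two groups would typically settle on different nodes, although this depends on the initialization and schedules chosen here.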

EPrint Type: Conference Poster
Keywords: National Science Digital Library, NSDL, Artificial Intelligence Lab, AI Lab, Information Retrieval
Subjects: Internet; Information Seeking Behaviors; Classification
ID Code: 480
Deposited On: 20 September 2004
Alternative Locations: http://ai.bpa.arizona.edu/go/papers.html