
An intelligent personal spider (agent) for dynamic Internet/Intranet searching

Chen, Hsinchun and Chung, Yi-Ming and Ramsey, Marshall C. and Yang, Christopher C. (1998) An intelligent personal spider (agent) for dynamic Internet/Intranet searching. Decision Support Systems 23(1), pp. 41-58.

Full text available as:
HTML

Abstract

As Internet services based on the World-Wide Web become more popular, information overload has become a pressing research problem. Difficulties with searching the Internet will worsen as the amount of on-line information increases. A scalable approach to Internet search is critical to the success of Internet services and other current and future National Information Infrastructure (NII) applications. As part of the ongoing Illinois Digital Library Initiative project, this research proposes an intelligent personal spider (agent) approach to Internet searching. The approach, which is grounded in automatic textual analysis and general-purpose search algorithms, is expected to improve on the current static and inefficient Internet searches. In this experiment, we implemented Internet personal spiders based on best-first search and genetic algorithm techniques. These personal spiders can dynamically take a user's selected starting homepages and search the Web for the most closely related homepages, based on the links and keyword indexing. A plain, static CGI/HTML-based interface was developed first, followed more recently by a graphical, dynamic Java-based interface. Preliminary evaluation results and two working prototypes (available for Web access) are presented. Although the examples and evaluations presented are based mainly on Internet applications, the applicability of the proposed techniques to the potentially more rewarding Intranet applications should be clear. In particular, we believe the proposed agent design can be used to locate organization-wide information, to gather new, time-critical organizational information, and to support team-building and communication in Intranets.
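The abstract names best-first search as one of the two spider strategies. The sketch below illustrates the general idea under stated assumptions: the Web is a small hypothetical in-memory link graph (`TOY_WEB`), a page's relevance is the Jaccard overlap between its keywords and the combined keywords of the user's starting homepages, and the spider may expand only a fixed number of pages (`max_visits`). The function names and the scoring function are illustrative choices, not the authors' implementation.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory "web": URL -> (keywords on the page, outgoing links).
Web = Dict[str, Tuple[Set[str], List[str]]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Keyword overlap between two keyword sets (assumed relevance measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_first_spider(web: Web, seeds: List[str], max_visits: int,
                      top_k: int) -> List[Tuple[str, float]]:
    """Best-first search outward from the user's starting homepages.

    At each step the spider expands the unvisited page whose keywords best
    match the seeds' combined keywords, stopping after `max_visits` pages.
    """
    target: Set[str] = set()
    for s in seeds:
        target |= web[s][0]

    frontier: List[Tuple[float, str]] = []  # heapq is a min-heap: negate scores
    seen: Set[str] = set(seeds)

    def enqueue_links(url: str) -> None:
        for link in web[url][1]:
            if link in web and link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-jaccard(web[link][0], target), link))

    for s in seeds:
        enqueue_links(s)

    visited: List[Tuple[str, float]] = []
    while frontier and len(visited) < max_visits:
        neg_score, url = heapq.heappop(frontier)  # most promising page first
        visited.append((url, -neg_score))
        enqueue_links(url)

    visited.sort(key=lambda r: r[1], reverse=True)
    return visited[:top_k]


# Hypothetical toy graph: "d" is highly relevant but only reachable via "b".
TOY_WEB: Web = {
    "seed": ({"genetic", "algorithm", "search"}, ["a", "b"]),
    "a": ({"genetic", "algorithm"}, ["c"]),
    "b": ({"cooking", "recipes"}, ["d"]),
    "c": ({"genetic", "search", "agent"}, []),
    "d": ({"genetic", "algorithm", "search"}, []),
}

if __name__ == "__main__":
    print(best_first_spider(TOY_WEB, ["seed"], max_visits=3, top_k=2))
    print(best_first_spider(TOY_WEB, ["seed"], max_visits=4, top_k=2))
```

With a budget of three expansions the spider returns `a` and `c`; the fully relevant page `d`, which hangs off the unrelated page `b`, is only found once the budget is raised to four. This is the usual trade-off of greedy best-first expansion: it is cheap, but it can overlook good pages hidden behind poor-looking links.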

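The second strategy in the abstract, a genetic algorithm, searches over whole sets of candidate homepages instead of expanding one page at a time. The following is a generic GA sketch, assuming a chromosome is a fixed-size list of distinct candidate URLs drawn from a hypothetical pool of already-fetched pages (`TOY_POOL`), fitness is the mean Jaccard keyword overlap with the starting homepages, and parents are chosen by two-way tournament selection. These encodings and operators are assumptions for illustration; the paper's own GA design may differ.

```python
import random
from typing import Dict, List, Set

# Hypothetical candidate pool: URL -> keyword set, e.g. pages already reached
# by following links out from the user's starting homepages.
Pool = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Keyword overlap between two keyword sets (assumed relevance measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fitness(chrom: List[str], pool: Pool, target: Set[str]) -> float:
    """Mean keyword overlap between the pages in a chromosome and the seeds."""
    return sum(jaccard(pool[u], target) for u in chrom) / len(chrom)


def tournament(population: List[List[str]], pool: Pool, target: Set[str],
               rng: random.Random) -> List[str]:
    """Two-way tournament: the fitter of two random chromosomes wins."""
    a, b = rng.sample(population, 2)
    return a if fitness(a, pool, target) >= fitness(b, pool, target) else b


def crossover(p1: List[str], p2: List[str], rng: random.Random) -> List[str]:
    """Single-point crossover that keeps each URL at most once."""
    cut = rng.randrange(1, len(p1))
    child = p1[:cut]
    for u in p2 + p1:  # fill the rest from p2, falling back on p1
        if len(child) == len(p1):
            break
        if u not in child:
            child.append(u)
    return child


def mutate(chrom: List[str], pool: Pool, rate: float,
           rng: random.Random) -> List[str]:
    """With probability `rate`, swap each URL for a random unused one."""
    out = list(chrom)
    for i in range(len(out)):
        if rng.random() < rate:
            unused = [u for u in pool if u not in out]
            if unused:
                out[i] = rng.choice(unused)
    return out


def ga_spider(pool: Pool, target: Set[str], size: int = 3, pop_size: int = 20,
              generations: int = 30, mutation_rate: float = 0.2,
              seed: int = 0) -> List[str]:
    """Evolve a set of `size` pages that best matches the target keywords."""
    rng = random.Random(seed)

    def score(chrom: List[str]) -> float:
        return fitness(chrom, pool, target)

    population = [rng.sample(sorted(pool), size) for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        population = [
            mutate(crossover(tournament(population, pool, target, rng),
                             tournament(population, pool, target, rng), rng),
                   pool, mutation_rate, rng)
            for _ in range(pop_size)
        ]
        gen_best = max(population, key=score)
        if score(gen_best) > score(best):  # keep the best set seen so far
            best = gen_best
    # Report the winning pages from most to least relevant.
    return sorted(best, key=lambda u: jaccard(pool[u], target), reverse=True)


SEED_KEYWORDS = {"genetic", "algorithm", "search", "agent"}
TOY_POOL: Pool = {
    "p1": {"genetic", "algorithm", "search"},
    "p2": {"agent", "search"},
    "p3": {"genetic", "agent"},
    "p4": {"cooking"},
    "p5": {"travel", "hotels"},
    "p6": {"sports"},
}

if __name__ == "__main__":
    print(ga_spider(TOY_POOL, SEED_KEYWORDS))
```

Because the random generator is seeded, repeated runs return the same set, and tracking the best set seen so far means the answer is never worse than the best chromosome in the initial random population.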
EPrint Type: Journal Article (Paginated)
Keywords: National Science Digital Library; NSDL; Artificial Intelligence Lab; AI Lab; Agents; Machine learning; Spider; Evolutionary programming; Information retrieval; Semantic retrieval; Java; Intranet
Subjects: Internet; World Wide Web; Information Extraction
ID Code: 461
Deposited On: 04 September 2004
Alternative Locations: http://ai.bpa.arizona.edu/go/papers.html

EPrints dLIST, an open access archive for the Information Sciences, is supported by the School of Information Resources and Library Science and the Learning Technologies Center, University of Arizona. Established in 2002, dLIST has a global Advisory Board and is part of the Information Technology & Society Research Lab.