Comparison of Three Vertical Search Spiders

Chau, Michael and Chen, Hsinchun (2003). Comparison of Three Vertical Search Spiders. Computer 36(5), pp. 56-62.

Abstract

Spiders are the software agents that search engines use to collect content for their databases. We investigated algorithms to improve the performance of vertical search engine spiders. The investigation addressed three approaches: a breadth-first graph-traversal algorithm with no heuristics to refine the search process, a best-first traversal algorithm that used a hyperlink-analysis heuristic, and a spreading-activation algorithm based on modeling the Web as a neural network.
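
The three traversal strategies named in the abstract can be contrasted on a toy link graph. The following is a minimal sketch, not the authors' implementation: the graph `LINKS`, the in-link count used as a stand-in for the hyperlink-analysis heuristic, and the `tanh` transfer function with a fixed link weight of 0.8 are all assumptions made for illustration.

```python
from collections import deque
import heapq
import math

# Hypothetical toy link graph: page -> pages it links to.
LINKS = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["e"],
    "d": [],
    "e": [],
}


def breadth_first(links, seed):
    """Visit pages in FIFO order, with no heuristic guiding the search."""
    order, seen, queue = [], {seed}, deque([seed])
    while queue:
        page = queue.popleft()
        order.append(page)
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def in_link_counts(links):
    """Assumed stand-in heuristic: score a page by its number of in-links."""
    counts = {}
    for outs in links.values():
        for nxt in outs:
            counts[nxt] = counts.get(nxt, 0) + 1
    return counts


def best_first(links, seed, score):
    """Always visit the highest-scoring discovered page next."""
    order, seen, counter = [], {seed}, 0
    heap = [(-score.get(seed, 0), counter, seed)]  # max-heap via negated score
    while heap:
        _, _, page = heapq.heappop(heap)
        order.append(page)
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                counter += 1  # insertion counter breaks ties in FIFO order
                heapq.heappush(heap, (-score.get(nxt, 0), counter, nxt))
    return order


def spreading_activation(links, seed, weight=0.8, eps=1e-6, max_iters=50):
    """Propagate activation from the seed along links until it stabilizes."""
    nodes = set(links) | {n for outs in links.values() for n in outs}
    parents = {n: [] for n in nodes}
    for page, outs in links.items():
        for nxt in outs:
            parents[nxt].append(page)
    act = {n: 0.0 for n in nodes}
    act[seed] = 1.0
    for _ in range(max_iters):
        # Synchronous update: each page sums weighted activation of its parents.
        new = {
            n: 1.0 if n == seed
            else math.tanh(sum(weight * act[p] for p in parents[n]))
            for n in nodes
        }
        delta = max(abs(new[n] - act[n]) for n in nodes)
        act = new
        if delta < eps:
            break
    return act


bfs_order = breadth_first(LINKS, "seed")
best_order = best_first(LINKS, "seed", in_link_counts(LINKS))
activation = spreading_activation(LINKS, "seed")
```

On this graph the breadth-first order is seed, a, b, c, d, e, while the best-first order promotes the doubly linked page c ahead of b. The spreading-activation result is a score per page rather than a visit order; a spider built on it would fetch pages whose activation exceeds some chosen threshold.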

EPrint Type: Journal Article (Paginated)
Keywords: National Science Digital Library, NSDL, Artificial Intelligence Lab, AI Lab, Web spider
Subjects: Web Mining; Internet; Data Mining
ID Code: 413
Deposited On: 16 August 2004
Alternative Locations: http://ai.bpa.arizona.edu/go/papers.html

EPrints dLIST, an open access archive for the Information Sciences, is supported by the School of Information Resources and Library Science and Learning Technologies Center, University of Arizona. Established in 2002, dLIST has a global Advisory Board and is a part of the Information Technology & Society Research Lab.