Web Searching, Search Engines and Information Retrieval

Lewandowski, Dirk (2005) Web Searching, Search Engines and Information Retrieval. Information Services & Use 25(3).

Abstract

This article discusses Web search engines, focusing on the challenges of indexing the World Wide Web, user behaviour, and the ranking factors these engines use. Ranking factors are divided into query-dependent and query-independent factors; the latter have become increasingly important in recent years. The potential of these factors is limited, especially for those based on the widely used link popularity measures. The article concludes with an overview of factors that should be considered when determining the quality of Web search engines.

EPrint Type: Journal Article (On-line/Unpaginated)
Keywords: Search Engines
Subjects: World Wide Web; Web Mining; Information Retrieval
ID Code: 1125
Deposited On: 12 May 2006
Alternative Locations: http://www.durchdenken.de/lewandowski/doc/isu_preprint.pdf
