DLIST

The freshness of Web search engine databases

Lewandowski, Dirk and Wahlig, Henry and Meyer-Bautor, Gunnar (2005) The freshness of Web search engine databases.


Abstract

This is a preprint of an article published in the Journal of Information Science, Vol. 32, No. 2, pp. 131-148 (2006). This study measures how frequently search engines update their indices. To this end, 38 websites that are updated daily were analysed over a period of six weeks. The search engines analysed were Google, Yahoo and MSN. We find that Google performs best overall, with the most pages updated on a daily basis, but only MSN is able to update all pages in less than 20 days. The other two engines have outliers that are considerably older. In terms of indexing patterns, we find that the engines take different approaches: while MSN shows clear update patterns, Google shows some outliers, and the update process of the Yahoo index seems to be quite chaotic. The implications are that the quality of search engine indices varies and that more than one engine should be used when searching for current content.
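The freshness measures the abstract refers to (how many pages carry a current copy, and the age of the oldest outlier) reduce to simple date arithmetic: the age of an engine's cached copy is the observation date minus the date of the copy it holds. The sketch below is illustrative only and is not the authors' procedure. It assumes records of the hypothetical form `(engine, observed, cached)`, an assumed cut-off `fresh_within_days` for counting a copy as current, and invented sample data for two hypothetical engines; none of these values come from the study.

```python
from datetime import date
from statistics import mean


def copy_age_days(observed: date, cached: date) -> int:
    """Age in days of an engine's cached copy on the day it was checked."""
    if cached > observed:
        raise ValueError("cached copy cannot postdate the observation")
    return (observed - cached).days


def summarize(observations, fresh_within_days=1):
    """Group (engine, observed, cached) records and summarize copy ages per engine.

    fresh_within_days is an assumed cut-off, not one taken from the study.
    """
    ages_by_engine = {}
    for engine, observed, cached in observations:
        ages_by_engine.setdefault(engine, []).append(copy_age_days(observed, cached))
    return {
        engine: {
            # Fraction of checks where the copy was at most fresh_within_days old.
            "share_fresh": sum(age <= fresh_within_days for age in ages) / len(ages),
            "mean_age": mean(ages),
            # The oldest copy seen: the "outlier" figure.
            "max_age": max(ages),
        }
        for engine, ages in ages_by_engine.items()
    }


# Invented records for two hypothetical engines (NOT data from the study).
sample = [
    ("engine_a", date(2005, 2, 10), date(2005, 2, 9)),
    ("engine_a", date(2005, 2, 11), date(2005, 2, 11)),
    ("engine_a", date(2005, 2, 12), date(2005, 1, 20)),
    ("engine_b", date(2005, 2, 10), date(2005, 2, 5)),
    ("engine_b", date(2005, 2, 11), date(2005, 2, 4)),
]

for engine, stats in summarize(sample).items():
    print(engine, stats)
```

Under this framing, a statement such as "all pages updated in less than 20 days" corresponds to `max_age < 20` for an engine, while "most pages updated on a daily basis" corresponds to a high `share_fresh`.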

EPrint Type: Preprint
Keywords: search engines; online information retrieval; index freshness
Subjects: World Wide Web; Information Science; Information Retrieval; Internet; Information Systems
ID Code: 1134
Deposited On: 25 May 2006
Alternative Locations: http://www.durchdenken.de/lewandowski/doc/jis_preprint.pdf, http://jis.sagepub.com/cgi/content/abstract/32/2/131, DOI: 10.1177/0165551506062326, http://eprints.rclis.org/archive/00004619/


EPrints dLIST, an open access archive for the Information Sciences, is supported by the School of Information Resources and Library Science and the Learning Technologies Center, University of Arizona. Established in 2002, dLIST has a global Advisory Board and is part of the Information Technology & Society Research Lab.