DLIST

Exploring the Academic Invisible Web

Lewandowski, Dirk and Mayr, Philipp (2006) Exploring the Academic Invisible Web.


Abstract

Purpose: To provide a critical review of Bergman’s 2001 study on the Deep Web. In addition, we bring a new concept into the discussion, the Academic Invisible Web (AIW). We define the Academic Invisible Web as consisting of all databases and collections relevant to academia but not searchable by the general-purpose internet search engines. Indexing this part of the Invisible Web is central to scientific search engines. We provide an overview of approaches followed thus far.

Design/methodology/approach: Discussion of measures and calculations, estimation based on informetric laws. Literature review on approaches for uncovering information from the Invisible Web.

Findings: Bergman’s size estimation of the Invisible Web is highly questionable. We demonstrate some major errors in the conceptual design of the Bergman paper. A new (raw) size estimation is given.

Research limitations/implications: The precision of our estimation is limited due to small sample size and lack of reliable data.

Practical implications: We can show that no single library alone will be able to index the Academic Invisible Web. We suggest collaboration to accomplish this task.

Originality/value: Provides library managers and those interested in developing academic search engines with data on the size and attributes of the Academic Invisible Web.

EPrint Type: Preprint
Keywords: Search engines, Worldwide Web, Indexing, Scholarly content, Digital library
Subjects: World Wide Web; Information Science; Web Metrics; Internet
ID Code: 1127
Deposited On: 17 May 2006
Alternative Locations: http://www.durchdenken.de/lewandowski/doc/LHT_Preprint.pdf

Bergman, M.K. (2001), "The Deep Web: surfacing hidden value", Journal of Electronic Publishing, Vol. 7, No. 1, available at: www.press.umich.edu/jep/07-01/bergman.html (accessed 6 April 2006).

Brophy, J. and Bawden, D. (2005), "Is Google enough? Comparison of an internet search engine with academic library resources", Aslib Proceedings, Vol. 57, No. 6, pp. 498-512.

Jacsó, P. (2005), "Google Scholar: the pros and cons", Online Information Review, Vol. 29, No. 2, pp. 208-214.

Lawrence, S. and Giles, C.L. (1999), "Accessibility of information on the web", Nature, Vol. 400, No. 8, pp. 107-109.

Lewandowski, D. (2005a), "Google Scholar - Aufbau und strategische Ausrichtung des Angebots sowie Auswirkung auf andere Angebote im Bereich der wissenschaftlichen Suchmaschinen", available at: www.durchdenken.de/lewandowski/doc/Expertise_Google-Scholar.pdf (accessed 13 December 2005).

Lewandowski, D. (2005b), Web Information Retrieval: Technologien zur Informationssuche im Internet, DGI, Frankfurt am Main.

Lewandowski, D. (2005c), "Yahoo - Zweifel an den Angaben zur Indexgröße, Suche in mehreren Sprachen", Password, Vol. 20, No. 9, pp. 21-22.

Lewandowski, D. (2006), "Suchmaschinen als Konkurrenten der Bibliothekskataloge: Wie Bibliotheken ihre Angebote durch Suchmaschinentechnologie attraktiver und durch Öffnung für die allgemeinen Suchmaschinen populärer machen können", Zeitschrift für Bibliothekswesen und Bibliographie, Vol. 53, No. 2, pp. 71-78.

Lossau, N. (2004), "Search engine technology and digital libraries: libraries need to discover the academic internet", D-Lib Magazine, Vol. 10, No. 6, available at: www.dlib.org/dlib/june04/lossau/06lossau.html (accessed 6 April 2006).

Lyman, P., Varian, H.R., Swearingen, K., Charles, P., Good, N., Jordan, L.L., et al. (2003), "How much information 2003?", available at: www.sims.berkeley.edu/research/projects/how-much-info-2003/ (accessed 6 April 2006).

Mayr, P. and Walter, A.-K. (2005), "Google Scholar - Wie tief gräbt diese Suchmaschine?", Paper presented at the 11. IuK-Jahrestagung: In die Zukunft publizieren: Herausforderungen an das Publizieren und die Informationsversorgung in den Wissenschaften, Bonn, Germany, 9-11 May 2005, available at: www.ib.hu-berlin.de/~mayr/arbeiten/Mayr_Walter05-preprint.pdf (accessed 6 April 2006).

McKiernan, G. (2005), "E-profile: Scirus: for scientific information only", Library Hi Tech News, Vol. 22, No. 3, pp. 18-25.

Notess, G.R. (2005), "Scholarly Web searching: Google Scholar and Scirus", Online, Vol. 29, No. 4, pp. 39-41.

Ru, Y. and Horowitz, E. (2005), "Indexing the invisible web: a survey", Online Information Review, Vol. 29, No. 3, pp. 249-265.

"Scirus White Paper: how Scirus works" (2004), available at: www.scirus.com/press/pdf/WhitePaper_Scirus.pdf (accessed 6 April 2006).

Sherman, C. (2001), "Search for the Invisible Web", available at: www.guardian.co.uk/online/story/0,3605,547140,00.html (accessed 8 March 2006).

Sherman, C. and Price, G. (2001), The Invisible Web: Uncovering Information Sources Search Engines Can't See, Information Today, Medford, NJ.

Stock, W.G. (2003), "Weltregionen des Internet: Digitale Informationen im WWW und via WWW", Password, Vol. 18, No. 2, pp. 26-28.

Williams, M.E. (2005), "The state of databases today: 2005", in Gale Directory of Databases, Vol. 2, pp. XV-XXV, Gale Group, Detroit, MI.

dLIST, an open access archive for the Information Sciences, is supported by the School of Information Resources and Library Science and Learning Technologies Center, University of Arizona. Established in 2002, dLIST has a global Advisory Board and is a part of the Information Technology & Society Research Lab.