The Academic Web Link Database Project

Making databases of academic web links available to the world research community

This project was created in response to the need for research into web links, including web link mining and the creation of link metrics. It aims to provide the raw data and software that researchers need to analyse link structures without relying on commercial search engines or running their own web crawler. This site will contain all of the following.

Slow Internet connection? We will send researchers the databases for free upon receipt of a self-addressed (unstamped) parcel containing an empty CD case. We will pay the postage and supply the CD without charge, since the cost is too small to be worth recovering.

Databases - Tools for mining the data - Crawling methodology - Web link research - Research group

Database 9: UK university web sites, June-July 2002

Database 8: Taiwan university web sites, July 2002

Database 7: Mainland China university web sites, July 2002

Database 6: New Zealand university web sites, January-February 2002

Database 5: Australian university web sites, October 2001 to January 2002 (slow crawl)

Database 4: UK university web sites, July 2001

Database 3: UK university web sites, June-July 2000

Database 2: New Zealand university web sites, July-August 2000

Database 1: Australian university web sites, July-August 2000

Tools

These programs should run on most versions of Windows; please email us if you encounter any problems. Some of the programs may take a long time to run (days, if you have a slow computer and are processing the large database files). Expect a more comprehensive collection of tools soon. We apologise for the programs' awful interfaces, but we are happy to advise researchers on which programs suit the type of analysis they are interested in.

Description of Crawling Methodology

A link to an online journal article based on this preprint is expected shortly. Additional crawling issues and techniques are discussed in the following article.

Thelwall, M. (2001) A Web Crawler Design for Data Mining, Journal of Information Science 27(5), 319-326.

Web Link Research

For our publications, please see the Statistical Cybermetrics Research Group home page. A large list of related work is available on the web site of the e-journal Cybermetrics. A much bigger Unix-based archive that is similar in spirit is available at http://www.archive.org/.

About this project

This project is run by the Statistical Cybermetrics Research Group at the University of Wolverhampton. We do not charge for any of the data or tools placed here: since we collected the raw data for free from the Web sites covered, we feel obliged to make it available for free. Crawling is resource-intensive and time-consuming, so we are unfortunately unable to respond to requests such as "please crawl country X". If any bodies, such as national research agencies, would like to see their countries' universities included, this will involve a charge. We would expect, but not insist, that the data resulting from such an arrangement would subsequently be made available on this site, also without charge. We are currently bidding for funding for Web link mining research projects that involve crawling countries, and we expect this site to grow as a result.

For more information or to report errors, please email m.thelwall@wlv.ac.uk