DLIST

Bibliomining for Automated Collection Development in a Digital Library Setting: Using Data Mining to Discover Web-Based Scholarly Research Works

Nicholson, Scott (2003) Bibliomining for Automated Collection Development in a Digital Library Setting: Using Data Mining to Discover Web-Based Scholarly Research Works. Journal of the American Society for Information Science and Technology 54(12).


Abstract

This research creates an intelligent agent for automated collection development in a digital library setting. It uses a predictive model based on facets of each Web page to select scholarly works. The criteria came from the academic library selection literature, and a Delphi study was used to refine the list to 41 criteria. A Perl program was designed to analyze a Web page for each criterion and applied to a large collection of scholarly and non-scholarly Web pages. Bibliomining, or data mining for libraries, was then used to create different classification models. Four techniques were used: logistic regression, non-parametric discriminant analysis, classification trees, and neural networks. Accuracy and return were used to judge the effectiveness of each model on test datasets. In addition, a set of problematic pages that were difficult to classify because of their similarity to scholarly research was gathered and classified using the models. The resulting models could be used in the selection process to automatically create a digital library of Web-based scholarly research works. In addition, the technique can be extended to create a digital library of any type of structured electronic information.
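The pipeline described in the abstract (measure facets of a page, then fit a classification model on those measurements) can be sketched in a few lines. This is a minimal illustration only, not the study's Perl program or its actual models: the three facets below are hypothetical stand-ins for the 41 criteria, which are not listed here, and the function names `extract_facets`, `train_logistic`, and `predict` are invented for this sketch. Logistic regression is shown because it is the simplest of the four techniques named above; it is fitted here with plain gradient ascent on two toy pages.

```python
import math
import re


def extract_facets(html):
    """Turn one Web page into a numeric facet vector.

    The three facets are hypothetical examples, not the study's criteria.
    """
    text = re.sub(r"<[^>]+>", " ", html).lower()  # crude tag stripping
    words = re.findall(r"[a-z]+", text)
    return [
        1.0 if "abstract" in words else 0.0,  # mentions an abstract
        1.0 if ("references" in words or "bibliography" in words) else 0.0,
        min(len(words) / 1000.0, 1.0),  # scaled page length, capped at 1
    ]


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=2000, lr=0.5):
    """Fit logistic regression weights by stochastic gradient ascent.

    The last element of the returned list is the bias term.
    """
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            z = weights[-1] + sum(w * v for w, v in zip(weights, x))
            err = y - _sigmoid(z)  # gradient of the log-likelihood
            for i, v in enumerate(x):
                weights[i] += lr * err * v
            weights[-1] += lr * err
    return weights


def predict(weights, x):
    """Return the estimated probability that a facet vector is scholarly."""
    z = weights[-1] + sum(w * v for w, v in zip(weights, x))
    return _sigmoid(z)


# Toy training data (invented for this sketch): 1 = scholarly, 0 = not.
scholarly_page = "<html><h1>Abstract</h1><p>We study X.</p><h2>References</h2></html>"
other_page = "<html><p>Buy cheap widgets today!</p></html>"
rows = [extract_facets(scholarly_page), extract_facets(other_page)]
model = train_logistic(rows, [1, 0])
```

Calling `predict(model, extract_facets(page))` on a new page yields a score between 0 and 1; a real selection agent would need a far larger labelled collection and a held-out test set, as the abstract describes.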

EPrint Type: Journal (On-line/Unpaginated)
Keywords: Digital Libraries, Collection Development, World Wide Web, Search Engines, Bibliomining, Data Mining, Intelligent Agents
Subjects: Web Mining, Data Mining, Digital Libraries
ID Code: 625
Deposited On: 11 December 2004
Alternative Locations: http://bibliomining.com/nicholson/asisdiss.html, http://bibliomining.com/nicholson/nicholsonpdfs/asisdiss.pdf


dLIST, an open access archive for the Information Sciences, is supported by the School of Information Resources and Library Science and the Learning Technologies Center, University of Arizona. Established in 2002, dLIST has a global Advisory Board and is part of the Information Technology & Society Research Lab.