Folksonomies vs. Bag-of-Words: The Evaluation & Comparison of Different Types of Document Representations

Gruzd, Anatoliy A (2006) Folksonomies vs. Bag-of-Words: The Evaluation & Comparison of Different Types of Document Representations.

Abstract

This poster (2-page summary) was presented at the 17th Annual SIG/CR Classification Research Workshop, part of the 2006 Annual Meeting of the American Society for Information Science and Technology (ASIST), November 4, 2006, Austin, Texas.

Among the factors that influence the effectiveness of retrieval systems, the most influential is the quality of document representation (docrep) (Lancaster, 1998). Most Internet search engines rely on docreps extracted automatically from web pages (commonly called Bag-of-Words). Unfortunately, this automatic approach often introduces noise (items unrelated to the page’s core topic) into docreps. One way to reduce noise is to use user-created docreps, which are less susceptible to it. Until recently, it was impractical to rely on user-created docreps for Internet-size collections. This changed when online bookmarking web services such as citeulike.org and del.icio.us started to appear. These services made it easier for large Internet communities to collaborate and produce community-generated descriptors (known as folksonomies). Because they combine representations from many community members, folksonomies provide retrieval systems with docreps that tend to be more user-oriented. With this observation in mind, I am investigating whether folksonomy-based retrieval systems would yield more relevant results than conventional systems.
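The contrast between the two kinds of docrep described in the abstract can be illustrated with a minimal Python sketch. It is not the method used in the poster: the sample page text, the user tags, and the function names `bag_of_words` and `folksonomy_docrep` are all hypothetical, chosen only to show how navigation text can leak into an automatically extracted docrep while pooled user tags stay closer to the topic.

```python
import re
from collections import Counter


def bag_of_words(page_text):
    """Automatic docrep: count every word token found on the page."""
    tokens = re.findall(r"[a-z]+", page_text.lower())
    return Counter(tokens)


def folksonomy_docrep(user_tags):
    """User-created docrep: pool the tags assigned by each community member.

    Each user contributes a given tag at most once, so a count is the
    number of users who chose that tag.
    """
    pooled = Counter()
    for tags in user_tags:
        pooled.update({tag.lower() for tag in tags})
    return pooled


# Hypothetical page: one topical sentence surrounded by navigation text (noise).
page = "Home Search Help. Indexing quality affects retrieval. Contact Admin."

# Hypothetical tags from three users of a bookmarking service.
tags = [["indexing", "retrieval"], ["Indexing", "IR"], ["indexing"]]

bow = bag_of_words(page)
folk = folksonomy_docrep(tags)

print(sorted(bow))   # includes noise terms such as "home" and "admin"
print(sorted(folk))  # only terms that users chose to describe the page
```

In this toy case the Bag-of-Words docrep gives the navigation word "home" the same weight as "indexing", whereas the pooled tags rank "indexing" highest because all three users assigned it.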

EPrint Type: Conference Poster
Subjects: Indexing
ID Code: 1663
Deposited On: 19 November 2006

Lancaster, F. W. (1998). Indexing and Abstracting in Theory and Practice (2nd ed.). Champaign, IL: GSLIS, University of Illinois at Urbana-Champaign.

Paijmans, H. (1993). Comparing the document representations of two IR-systems: CLARIT and TOPIC. Journal of the American Society for Information Science, 44(7), 383-392.

White, H. D., & Griffith, B. C. (1987). Quality of indexing in online data bases. Information Processing & Management, 23(3), 211-224.
