An exploratory study of human clustering of Web pages

Khoo, Christopher S.G. and Ng, Karen and Ou, Shiyan (2002) An exploratory study of human clustering of Web pages. In Lopez-Huertas, Maria J., Ed. Proceedings International Society for Knowledge Organization Conference, pp. 351-357, Granada, Spain.

Abstract

This study seeks to find out how human beings naturally cluster Web pages. For each of 10 queries, the 20 Web pages retrieved by the Northern Light search engine were sorted by 3 subjects into categories that were natural or meaningful to them. Different subjects clustered the same set of Web pages quite differently and created different categories: the average inter-subject similarity of the clusters created was a low 0.27. Subjects created an average of 5.4 clusters for each sorting. The categories constructed can be divided into 10 types. About one-third of the categories created were topical, and another 20% related to the degree of relevance or usefulness. The remaining categories were subject-independent, such as format, purpose, authoritativeness and direction to other sources. The authors plan to develop automatic methods for categorizing Web pages using the common categories created by the subjects. It is hoped that the techniques developed can be used by Web search engines to automatically organize retrieved Web pages into categories that are natural to users.
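The abstract does not state how the inter-subject similarity of 0.27 was computed. Purely as an illustration, the sketch below assumes that each subject's sorting is a partition of the pages (every page in exactly one category) and measures agreement between two subjects as the Jaccard coefficient of the page pairs each subject placed in the same category, averaged over all pairs of subjects. This pair-counting measure, the function names and the toy data are hypothetical; they are not the authors' method or results.

```python
from itertools import combinations

def co_clustered_pairs(clustering):
    """Return the set of unordered page pairs one subject placed in the same category.

    Assumption: `clustering` maps a category label to a list of page IDs,
    and each page appears in exactly one category.
    """
    pairs = set()
    for pages in clustering.values():
        # Sorting makes each pair canonical, so (a, b) and (b, a) are not both counted.
        for a, b in combinations(sorted(pages), 2):
            pairs.add((a, b))
    return pairs

def jaccard_similarity(clustering_a, clustering_b):
    """Jaccard coefficient between two subjects' sets of co-clustered pairs."""
    pairs_a = co_clustered_pairs(clustering_a)
    pairs_b = co_clustered_pairs(clustering_b)
    union = pairs_a | pairs_b
    if not union:
        return 1.0  # neither subject grouped any two pages together
    return len(pairs_a & pairs_b) / len(union)

def average_inter_subject_similarity(clusterings):
    """Average pairwise similarity over all pairs of subjects for one query."""
    scores = [jaccard_similarity(a, b) for a, b in combinations(clusterings, 2)]
    return sum(scores) / len(scores)

# Hypothetical sortings of five pages by three subjects (not data from the study).
subject_1 = {"news": ["p1", "p2", "p3"], "tutorials": ["p4", "p5"]}
subject_2 = {"useful": ["p1", "p2"], "other": ["p3", "p4", "p5"]}
subject_3 = {"a": ["p1", "p4"], "b": ["p2", "p3", "p5"]}

print(round(average_inter_subject_similarity([subject_1, subject_2, subject_3]), 3))  # prints 0.206
```

Counting pairs rather than matching category labels sidesteps the fact that subjects name their categories differently ("news" versus "useful"); only which pages end up together matters.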

EPrint Type: Conference Paper
Keywords: human categorization, Web pages
Subjects: Cognitive Science; Knowledge Organization; User Studies
ID Code: 1324
Deposited On: 14 August 2006

