
Document clustering for electronic meetings: an experimental comparison of two techniques

Roussinov, Dmitri G. and Chen, Hsinchun (1999) Document clustering for electronic meetings: an experimental comparison of two techniques. Decision Support Systems 27(1-2), pp. 67-80.

Abstract

In this article, we report our implementation and comparison of two text clustering techniques. One is based on Ward’s clustering and the other on Kohonen’s Self-organizing Maps. We have evaluated how closely clusters produced by a computer resemble those created by human experts. We have also measured the time that it takes for an expert to “clean up” the automatically produced clusters. The technique based on Ward’s clustering was found to be more precise. Both techniques have worked equally well in detecting associations between text documents. We used text messages obtained from group brainstorming meetings.
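To make the comparison in the abstract concrete, the following is a minimal sketch of Ward-style agglomerative clustering over short text messages, followed by a pairwise check of how many document associations an automatic partition shares with an expert one. It is not the authors' implementation: the term-count vectors, the function names, the pair-based precision and recall, and the sample messages are all assumptions made for illustration.

```python
from itertools import combinations


def tokenize(text):
    """Lowercase bag of words as a dict of term counts."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def to_vectors(docs):
    """Turn documents into dense term-count vectors over a shared vocabulary."""
    bags = [tokenize(d) for d in docs]
    vocab = sorted({w for bag in bags for w in bag})
    return [[bag.get(w, 0) for w in vocab] for bag in bags]


def ward_cost(a, b):
    """Increase in within-cluster squared error if clusters a and b merge."""
    na, nb = a["size"], b["size"]
    dist2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return na * nb / (na + nb) * dist2


def ward_clusters(vectors, k):
    """Merge the cheapest pair under Ward's criterion until k clusters remain."""
    clusters = [{"members": [i], "size": 1, "centroid": list(v)}
                for i, v in enumerate(vectors)]
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]))
        a, b = clusters[i], clusters[j]
        size = a["size"] + b["size"]
        merged = {
            "members": a["members"] + b["members"],
            "size": size,
            # Size-weighted mean of the two centroids.
            "centroid": [(a["size"] * x + b["size"] * y) / size
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return [sorted(c["members"]) for c in clusters]


def associations(partition):
    """All unordered document pairs placed in the same cluster."""
    return {pair for group in partition
            for pair in combinations(sorted(group), 2)}


def pair_precision_recall(auto, expert):
    """Compare the pairs an automatic partition links with an expert's pairs."""
    auto_pairs, expert_pairs = associations(auto), associations(expert)
    shared = len(auto_pairs & expert_pairs)
    precision = shared / len(auto_pairs) if auto_pairs else 1.0
    recall = shared / len(expert_pairs) if expert_pairs else 1.0
    return precision, recall


# Hypothetical brainstorming messages and a hypothetical expert grouping.
messages = [
    "improve parking near campus",
    "parking permits cost too much",
    "extend library opening hours",
    "library hours on weekends",
]
auto = ward_clusters(to_vectors(messages), k=2)
expert = [[0, 1], [2, 3]]
print(auto, pair_precision_recall(auto, expert))
```

The exhaustive search for the cheapest merge is cubic in the number of documents, which is acceptable for a handful of meeting messages but would need a nearest-neighbour or Lance-Williams formulation for larger collections.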

EPrint Type: Journal Article (Paginated)
Keywords: National Science Digital Library; NSDL; Artificial Intelligence Lab; AI Lab; Group decision support systems; Text document clustering; Empirical study; Self-organizing maps; Neural networks; Cluster analysis
Subjects: World Wide Web; Classification
ID Code: 454
Deposited On: 04 September 2004
Alternative Locations: http://ai.bpa.arizona.edu/go/papers.html

dLIST, an open access archive for the Information Sciences, is supported by the School of Information Resources and Library Science and Learning Technologies Center, University of Arizona. Established in 2002, dLIST has a global Advisory Board and is a part of the Information Technology & Society Research Lab.