
Updateable PAT-Tree Approach to Chinese Key Phrase Extraction using Mutual Information: A Linguistic Foundation for Knowledge Management

Ong, Thian-Huat and Chen, Hsinchun (1999) Updateable PAT-Tree Approach to Chinese Key Phrase Extraction using Mutual Information: A Linguistic Foundation for Knowledge Management. In Proceedings of the Asian Digital Library Conference, pp. 63-84, Taipei, Taiwan.


Abstract

There has been renewed research interest in statistical approaches to extracting key phrases from Chinese documents, because existing approaches do not allow online frequency updates after phrases have been extracted, which results in inaccurate, partial extraction. In this paper, we present an updateable PAT-tree approach. In our experiment, we compared our approach with that of Lee-Feng Chien and found an improvement in recall from 0.19 to 0.43 and in precision from 0.52 to 0.70. This paper also reviews the requirements for a data structure that facilitates implementation of any statistical approach to key-phrase extraction, including the PAT-tree, the PAT-array, and the suffix array with semi-infinite strings.
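The abstract names the data structures but gives no formulas, so the following is only a minimal sketch of the general idea, not the paper's method. It builds a plain sorted suffix array over semi-infinite strings (every suffix of the text), counts substring frequencies from it, and scores a candidate phrase with an assumed association measure f(c) / (f(left) + f(right) - f(c)). Here left and right are the phrase without its last and first character. The function names and the scoring formula are assumptions for illustration. The sketch is static and does not model the online frequency updates that the updateable PAT-tree is designed to support.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    """Return start positions sorted by the semi-infinite string at each one."""
    # Naive construction, fine for a small illustration.
    return sorted(range(len(text)), key=lambda i: text[i:])


def count(text, suffix_array, pattern):
    """Count occurrences of pattern using the sorted suffix order."""
    m = len(pattern)
    # Truncating sorted suffixes to length m keeps them in sorted order,
    # so all matches of the pattern form one contiguous run.
    prefixes = [text[i:i + m] for i in suffix_array]
    return bisect_right(prefixes, pattern) - bisect_left(prefixes, pattern)


def association(text, suffix_array, phrase):
    """Assumed score: f(c) / (f(left) + f(right) - f(c)); not from the abstract."""
    f_c = count(text, suffix_array, phrase)
    if len(phrase) < 2 or f_c == 0:
        return 0.0
    f_left = count(text, suffix_array, phrase[:-1])   # phrase minus last char
    f_right = count(text, suffix_array, phrase[1:])   # phrase minus first char
    # f_left >= f_c and f_right >= f_c, so the denominator is at least f_c > 0.
    return f_c / (f_left + f_right - f_c)
```

Because Python strings are sequences of Unicode code points, the same functions work on Chinese text without prior word segmentation. For example, `association(text, build_suffix_array(text), "知識管理")` scores the four-character string as a candidate phrase in `text`. A score near 1 means the phrase's shorter substrings rarely occur outside it.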

EPrint Type: Conference Paper
Keywords: National Science Digital Library, NSDL, Artificial Intelligence Lab, AI Lab, PAT-tree
Subjects: Knowledge Management; Information Extraction
ID Code: 441
Deposited On: 17 August 2004
Alternative Locations: http://ai.bpa.arizona.edu/go/papers.html

Baeza-Yates, R., & Gonnet, G. (1996). Fast Text Searching for Regular Expressions or Automaton Searching on Tries, Journal of the ACM, 43 (6), pp. 915-936.

Bian, G-W & Chen, H-H (1998). A New Hybrid Approach for Chinese-English Query Translation. Proceedings of the First Asia Digital Library Workshop, pp. 156-167.

Blair, D. C., and Maron, M. E. (1985). An evaluation of retrieval effectiveness for a full-text document-retrieval system. Communications of the ACM, 28(3), 289-299.

Brill, E. (1995). Transformation-Based Error-Driven Learning and Natural Language Processing. Computational Linguistics, 21 (4), 543-565.

Caglayan, A., & Harrison, C. (1997). Agent Sourcebook: A Complete Guide to Desktop, Internet, and Intranet Agents.

Chen, A. et al. (1997). Chinese Text Retrieval without Using a Dictionary. In Proceedings of ACM SIGIR '97, pp. 42-49.

Chen, H. (1998). The Illinois Digital Library Initiative Project: Federating Repositories and Semantic Research. Proceedings of the First Asia Digital Library Workshop, pp. 13-23.

Chen, H., Chung, Y., Ramsey, M., & Yang, C. (1998). “A Smart Itsy Bitsy Spider for the Web,” Journal of the American Society for Information Science, 49 (7), pp. 604-618.

Chen, H., Houston, A. L., Sewell, R. R. & Schatz, B. R. (1998), “Internet Browsing and Searching: User Evaluation of Category Map and Concept Space Techniques,” Journal of the American Society for Information Science, 49 (7), pp. 582-603.

Chen, H., J. Martinez, D. T. Ng, and B. R. Schatz. (1997). A Concept Space Approach to Addressing the Vocabulary Problem in Scientific Information Retrieval: An Experiment on the Worm Community System. Journal of the American Society for Information Science 48 (1), pp. 17-31.

Chen, H., & Ng, D. T. (1995). An algorithmic approach to concept exploration in a large knowledge network (automatic thesaurus consultation): Symbolic branch-and-bound vs. connectionist Hopfield net activation. Journal of the American Society for Information Science, 46(5), pp. 348-369.

Chen, Y. M., Liao, C. C., & Prasad, B. (1998). A systematic approach to virtual enterprising through knowledge management techniques. Concurrent Engineering-Research and Applications, 6(3), 225-244.

Chien, L-F and Pu, H-T (1996). Important issues on Chinese information retrieval. Computational Linguistics and Chinese Language Processing, 1 (1), pp. 205-221.

Chien, L-F (1997). PAT-Tree-Based Keyword Extraction for Chinese Information Retrieval. Proceedings of the 1997 ACM SIGIR, Philadelphia, PA, USA, pp. 50-58.

Chien, L-F (1998). PAT-Tree-Based Adaptive Keyphrase Extraction for Intelligent Chinese Information Retrieval. In special issue on Information Retrieval with Asian Languages, Information Processing and Management, Elsevier Press.

Church, K. (1988). A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text. Proceedings of the Second Annual Conference on Applied Natural Language Parsing ACL, Austin, TX.

Church, K. (1997). Ngrams. Proceedings of the ACL-95, Cambridge, MA, USA.

Davenport, T. H. (1995). Business process reengineering: Where it’s been, where it’s going. In V. Grover & W. Kettinger (Eds.), Business Process Change: Reengineering Concepts, Methods and Technologies (pp. 1-13). Middletown, PA: Idea Publishing.

Davenport, T. H., & Prusak, L. (1998). Working Knowledge: How Organizations Manage What They Know. Boston, MA: Harvard Business School Press.

Gartner Group, Summer Knowledge Management Workshop Report, Summer, 1998.

Gonnet, G. H. & Baeza-Yates, R. (1991). Handbook of Algorithms and Data Structures in Pascal and C, 2nd Ed.

Inkpen, A. C., & Dinur, A. (1998). Knowledge management processes and international joint ventures. Organization Science, 9(4), 454-468.

Jones, P., & Jordan, J. (1998). Knowledge orientations and team effectiveness. International Journal of Technology Management, 16 (1-3), 152-161.

Kwok, K. L. (1997). Comparing Representations in Chinese Information Retrieval. In Proceedings of ACM SIGIR '97, pp. 34-41.

Knuth, D. E. (1973). The Art of Computer Programming: Sorting and searching, Vol. 3. Addison-Wesley, Mass.

Lesk, M. (1997). Practical Digital Libraries, Morgan Kaufmann, Los Altos, CA.

Li, Z. & Xing, L. (1998). Search the Chinese Web — Design and the Operation of Net-Compass. Proceedings of the First Asia Digital Library Workshop, pp. 42-46.

Lin, C. & Chen, H. (1996). An Automatic Indexing and Neural Network Approach to Concept Retrieval and Classification of Multilingual (Chinese-English) Documents. IEEE Transactions on Systems, Man, and Cybernetics, 26 (1), pp. 1-14.

Manber, U., & Myers, G. (1993). Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing, 22 (5), pp. 935-948.

Morrison, D. R. (1968). PATRICIA — Practical Algorithm to Retrieve Information Coded in Alphanumeric. Journal of the Association for Computing Machinery, 15(4), pp. 514-534.

Nonaka, I. (1994). A dynamic theory of organizational knowledge creation. Organization Science, 5 (1), 14-37.

O'Leary, D. E. (1998). Enterprise knowledge management. IEEE Computer, 31 (3), 54-62.

Orwig, R., Chen, H., and Nunamaker, J. F. (1997). A graphical, self-organizing approach to classifying electronic meeting output. Journal of the American Society for Information Science, 48 (2), pp. 157-170.

Paepcke, A., S. B. Cousins, H. Garcia-Molina, S. W. Hassan, S. P. Ketchpel, M. Roscheisen, and T. Winograd (1996). Using distributed objects for digital library interoperability, IEEE Computer, 29(5), pp. 61-69.

Rouse, W. B., Thomas, B. S., & Boff, K. R. (1998). Knowledge maps for knowledge mining: Application to R&D/technology management. IEEE Transactions on Systems, Man and Cybernetics: Part C - Applications and Reviews, 28(3), pp. 309-317.

Salton, G. (1989). Automatic Text Processing. Reading, MA: Addison-Wesley.

Schatz, B. & Chen, H. (1996). Building Large-Scale Digital Libraries, IEEE Computer, Special Issue on “Building Large-Scale Digital Libraries,” 29 (5), pp. 22-27, May 1996.

Schatz, B. R. & Chen, H. (1999). Digital libraries: technological advancements and social impacts. IEEE Computer, 31(2), 45-50.

Sedgewick, R. (1998). Algorithms, 3rd Ed. Addison-Wesley.

Su, K-Y, Chiang, T-H, & Chang, J-S (1996). An Overview of Corpus-Based Statistics-Oriented (CBSO) Techniques for Natural Language Processing. Computational Linguistics and Chinese Language Processing, 1 (1), pp. 101-157.

Teece, D. J. (1998). Research directions for knowledge management. California Management Review, 40(3), 289-292.

Voutilainen, A. (1997) A Short Introduction to NPTool, www.lingsoft.fi/doc/nptool/intro/

Wong, K-F & Li, W. (1998). Intelligent Chinese Information Retrieval — Why is it so Difficult? Proceedings of the First Asia Digital Library Workshop, pp. 47-56.

Wu, Z. & Tseng, G. (1993). “Chinese Text Segmentation for Text Retrieval: Achievements and Problems,” Journal of the American Society for Information Science, 44, pp. 532-542.

Wu, Z. & Tseng, G. (1995). ACTS: An Automatic Chinese Text Segmentation System for Full Text Retrieval. Journal of the American Society for Information Science, 46, pp. 83-96.

Yang, C. C., Yen, J., Yung, S. K., & Chung, K. L. (1998). Chinese Indexing using Mutual Information. Proceedings of the First Asia Digital Library Workshop, pp. 57-64.
