Comparing noun phrasing techniques for use with medical digital library tools

Tolle, Kristin M. and Chen, Hsinchun (2000) Comparing noun phrasing techniques for use with medical digital library tools. Journal of the American Society for Information Science 51(4):pp. 352-370.

Abstract

In an effort to assist medical researchers and professionals in accessing information necessary for their work, the AI Lab at the University of Arizona is investigating the use of a natural language processing (NLP) technique called noun phrasing. The goal of this research is to determine whether noun phrasing could be a viable technique to include in medical information retrieval applications. Four noun phrase generation tools were evaluated on their ability to isolate noun phrases from medical journal abstracts. Tests were conducted using the National Cancer Institute's CANCERLIT database. The NLP tools evaluated were Massachusetts Institute of Technology's (MIT's) Chopper, the University of Arizona's Automatic Indexer, Lingsoft's NPtool, and the University of Arizona's AZ Noun Phraser. In addition, the National Library of Medicine's SPECIALIST Lexicon was incorporated into two versions of the AZ Noun Phraser, which were evaluated against the other tools as well as against a nonaugmented version of the AZ Noun Phraser. Using the metrics relative subject recall and precision, our results show that, with the exception of Chopper, the phrasing tools were fairly comparable in recall and precision. It was also shown that augmenting the AZ Noun Phraser with the SPECIALIST Lexicon from the National Library of Medicine resulted in improved recall and precision.
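
The abstract names recall and precision as the evaluation metrics but does not define them. As a rough illustration only, the sketch below computes the standard set-based forms of both measures for a list of extracted phrases. It assumes a set of phrases judged relevant is available for comparison; the function name, the sample phrases, and the case-insensitive matching are hypothetical choices, and the paper's own "relative subject recall" may be defined differently.

```python
def precision_recall(extracted, relevant):
    """Set-based precision and recall for extracted phrases.

    Assumption: phrases match if they are equal after lowercasing and
    trimming whitespace. This is an illustrative choice, not the paper's.
    """
    extracted_set = {p.lower().strip() for p in extracted}
    relevant_set = {p.lower().strip() for p in relevant}
    hits = extracted_set & relevant_set

    # Precision: share of extracted phrases that were judged relevant.
    precision = len(hits) / len(extracted_set) if extracted_set else 0.0
    # Recall: share of relevant phrases that the tool managed to extract.
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical data, invented for illustration (not from CANCERLIT).
tool_output = ["breast cancer", "tumor growth", "the study", "radiation therapy"]
judged_relevant = ["breast cancer", "radiation therapy", "lymph node"]

precision, recall = precision_recall(tool_output, judged_relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this invented case two of the four extracted phrases are relevant, and two of the three relevant phrases were found, so a tool can score differently on the two measures.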

EPrint Type: Journal (Paginated)
Keywords: National Science Digital Library, NSDL, Artificial Intelligence Lab, AI Lab, natural language processing, CANCERLIT
Subjects: Evaluation; Medical Libraries; Digital Libraries
ID Code: 408
Deposited On: 13 August 2004
Alternative Locations: http://ai.bpa.arizona.edu/go/papers.html
