GANNET: A machine learning approach to document retrieval

Chen, Hsinchun and Kim, Jinwoo (1994) GANNET: A machine learning approach to document retrieval. Journal of Management Information Systems 11(3), pp. 9-43.

Abstract

Information science researchers have recently turned to new artificial intelligence-based inductive learning techniques, including neural networks, symbolic learning, and genetic algorithms. This paper provides an overview of these techniques and their use in information science research, and presents the algorithms adopted for GANNET, a hybrid system based on genetic algorithms and neural networks. During information retrieval, GANNET used genetic algorithms to optimize the concepts (keywords) describing user-selected documents. It then used the optimized concepts to explore a large network of related concepts through the Hopfield net parallel relaxation procedure. In an experiment based on a test collection of about 3,000 articles from DIALOG and an automatically created thesaurus, with Jaccard's score as the performance measure, GANNET improved Jaccard's scores by about 50% and helped identify the underlying concepts that best describe the user-selected documents.
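As an illustration of the performance measure named in the abstract, Jaccard's score between two keyword sets is the size of their intersection divided by the size of their union. The sketch below is not taken from the paper: the function names `jaccard` and `mean_jaccard`, the sample keyword sets, and the idea of averaging the score over the user-selected documents are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard's score of two keyword collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty keyword sets score 0
    return len(a & b) / len(union)


def mean_jaccard(candidate, documents):
    """Average Jaccard's score of one candidate keyword set against each document.

    Hypothetical helper: a score of this kind could serve as a fitness
    value when comparing candidate keyword sets.
    """
    if not documents:
        return 0.0
    return sum(jaccard(candidate, doc) for doc in documents) / len(documents)


# Hypothetical keyword sets standing in for two user-selected documents.
docs = [
    {"neural networks", "information retrieval", "thesaurus"},
    {"genetic algorithms", "information retrieval"},
]
candidate = {"information retrieval", "genetic algorithms"}

score = mean_jaccard(candidate, docs)
print(score)
```

A candidate keyword set that overlaps more with the selected documents earns a higher average score, which is the sense in which a 50% improvement in Jaccard's scores indicates better-matched concepts.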

EPrint Type: Journal Article (Paginated)
Keywords: National Science Digital Library, NSDL, Artificial Intelligence Lab, AI Lab, GANNET
Subjects: Database Searching Instructions; Information Extraction
ID Code: 462
Deposited On: 04 September 2004
Alternative Locations: http://ai.bpa.arizona.edu/go/papers.html