Final Report for the AMeGA (Automatic Metadata Generation Applications) Project

Greenberg, Jane and Spurgin, Kristina and Crystal, Abe (2005) Final Report for the AMeGA (Automatic Metadata Generation Applications) Project. Technical Report.

Abstract

Summary of findings (from the report's Executive Summary: Goal 1 (complete), Goal 2 (partial), Goal 3 (see actual document)): Research on automatic metadata generation falls primarily into two areas: experimental research, focusing on information retrieval techniques and digital resource content, and applications research, focusing on the development of content creation software and metadata generation tools used in operational settings. The main finding presented in this report is that there is a disconnect between experimental research and application development. It seems that metadata generation applications could be vastly improved by integrating experimental research findings. Metadata generation applications might also improve their metadata output if they took advantage of the metadata generation functionalities supported by content creation software. For example, Microsoft Word supports the generation of a number of metadata elements that conceptually map to the Dublin Core metadata standard. Some of these elements are generated automatically, while others must be entered by the document author or another person. Content creation software thus provides a means of generating metadata that can be harvested by metadata generation applications. More research is needed to understand how the metadata creation features in content creation software are used in practice. ... Two hundred seventeen (217) survey participants provided responses useful for data analysis (the initial goal was to recruit at least 100 participants). Three quarters of participants had three or more years of cataloging and/or indexing experience, verifying their status as metadata experts.
Organizations use a variety of metadata standards (selected examples include the MAchine Readable Cataloging (MARC) bibliographic format, Dublin Core, Encoded Archival Description, Gateway to Educational Materials, Metadata Object Description Schema, Text Encoding Initiative, and the Government Information Locator Service). Most participants (81%) reported using one or two systems for metadata creation in their organization, while one participant reported using seven different systems.
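
The harvesting idea described in the abstract can be illustrated with a minimal sketch. It is not the method used in the report. It assumes a modern .docx file (a format that postdates the 2005 report), in which Word stores document properties in a `docProps/core.xml` part that already uses Dublin Core namespaces. The names `harvest_dublin_core`, `PROPERTY_TO_DC`, and `build_sample_package` are hypothetical, and the property-to-element mapping is one illustrative choice rather than an official crosswalk.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used in the docProps/core.xml part of a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative (assumed) mapping from core properties to Dublin Core elements.
PROPERTY_TO_DC = {
    "dc:title": "title",
    "dc:creator": "creator",
    "dc:subject": "subject",
    "dc:description": "description",
    "cp:keywords": "subject",
    "dcterms:created": "date",
}


def harvest_dublin_core(docx_file):
    """Return {dc_element: [values]} read from a .docx path or file-like object."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    record = {}
    for prop, dc_element in PROPERTY_TO_DC.items():
        node = root.find(prop, NS)
        # Skip properties that are absent or empty in the document.
        if node is not None and node.text and node.text.strip():
            record.setdefault(dc_element, []).append(node.text.strip())
    return record


def build_sample_package():
    """Build an in-memory zip holding only a core.xml part (not a full .docx)."""
    core_xml = (
        '<cp:coreProperties xmlns:cp="%(cp)s" xmlns:dc="%(dc)s" '
        'xmlns:dcterms="%(dcterms)s">'
        "<dc:title>Sample report</dc:title>"
        "<dc:creator>A. Author</dc:creator>"
        "<cp:keywords>metadata; harvesting</cp:keywords>"
        "<dcterms:created>2005-06-23T00:00:00Z</dcterms:created>"
        "</cp:coreProperties>"
    ) % NS
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(harvest_dublin_core(build_sample_package()))
```

In this sketch, properties such as the creation date are the kind Word fills in automatically, while title and keywords typically depend on what the author enters, which mirrors the distinction the abstract draws between automatically generated and human-supplied elements.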

EPrint Type: Technical Report
Subjects: Metadata
ID Code: 878
Deposited On: 23 June 2005
Alternative Locations: http://www.loc.gov/catdir/bibcontrol/lc_amega_final_report.pdf

AMeGA Project Website: http://ils.unc.edu/mrc/amega.htm.

EPrints dLIST, an open access archive for the Information Sciences, is supported by the School of Information Resources and Library Science and Learning Technologies Center, University of Arizona. Established in 2002, dLIST has a global Advisory Board and is a part of the Information Technology & Society Research Lab.