
Adapting Web Archive Catalogues for Dynamic Change

Wu, Paul H-J and Ichsan, Tamsir P. and Nguyen, Ngoc Giang (2007) Adapting Web Archive Catalogues for Dynamic Change. In Masanes, Julien and Rauber, Andreas, Eds. Proceedings The Seventh International Workshop of Archiving Web, Vancouver, Canada.


Abstract

Web archives are an important source of information. However, before a Web archive can be properly used, it needs to be catalogued, so that the materials accessed yield the historical understanding the researcher intends. At the same time, the dynamic nature of the Web easily renders these catalogues outdated, so catalogue records must be monitored constantly to detect when a change in Web content has made them irrelevant. Maintaining the catalogue records of a Web archive therefore requires substantial human effort, which adds to the burden on any institution that maintains one. In this paper, we propose an automatic mechanism for monitoring changes in Web content so that this human workload can be reduced. The system combines two component technologies: (1) a contextualized annotation module and (2) an evidence change detection module. Contextualized annotation enables the cataloguing process to link content on the Web page (the evidence) to the value assigned to an element of a metadata schema. Thus, the metadata is “supported” by specific Web content that functions as evidence for a cataloguing decision. Regardless of changes elsewhere on the Web page, the metadata remains valid as long as all of its evidence remains the same. To achieve evidence-specific change detection, we extend the traditional Longest Common Subsequence (LCS) based Diff engine with a Page Coordinate translation algorithm, an approach that, we argue on the basis of a survey, is the first of its kind among Web content monitoring approaches.
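The idea of evidence-specific change detection can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `translate_span` and `metadata_still_valid` are hypothetical, evidence is assumed to be stored as character offsets into the old page text, and the standard library's `difflib.SequenceMatcher` (a matching-blocks differ, not a strict LCS engine) stands in for the LCS-based Diff engine. The sketch maps each evidence span from old-page coordinates to new-page coordinates and treats the metadata as valid only if every span survives unchanged.

```python
from difflib import SequenceMatcher


def translate_span(old, new, start, end):
    """Map the half-open span old[start:end] into coordinates of `new`.

    Returns (new_start, new_end) when the span lies wholly inside one
    unchanged block reported by the differ, otherwise None, which is
    read here as "the evidence text changed".
    """
    # autojunk=False disables difflib's popular-element heuristic so that
    # long pages are compared character by character.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for a, b, size in matcher.get_matching_blocks():
        if a <= start and end <= a + size:
            offset = b - a  # shift between old and new page coordinates
            return start + offset, end + offset
    return None


def metadata_still_valid(old, new, evidence_spans):
    """A metadata value stays valid only if all its evidence is unchanged."""
    return all(
        translate_span(old, new, s, e) is not None for s, e in evidence_spans
    )
```

For example, if the evidence for a publisher element is the span covering "Acme Press" in an old snapshot, edits to a greeting or a visitor counter elsewhere on the page leave `metadata_still_valid` true, while rewriting the publisher name makes it false and would flag the catalogue record for review.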

EPrint Type: Conference Paper
Keywords: web archives, evidence-based cataloguing, change detection, web curation
Subjects: World Wide Web; Information Science; Archives; Knowledge Organization
ID Code: 2308
Deposited On: 06 May 2008
Alternative Locations: http://iwaw.net/07/IWAW2007_wu.pdf

