ECDL 2002: Proceedings of the European Conference on Digital Libraries

Fullname:ECDL 2002: Research and Advanced Technology for Digital Libraries: 6th European Conference
Editors:Maristella Agosti; Costantino Thanos
Location:Roma, Italy
Dates:2002-Sep-16 to 2002-Sep-18
Publisher:Springer Berlin Heidelberg
Series:Lecture Notes in Computer Science 2458
Standard No:DOI: 10.1007/3-540-45747-X; ISBN: 978-3-540-44178-6 (print), 978-3-540-45747-3 (online); hcibib: ECDL02
  1. Web Archiving
  2. e-Book
  3. Collection Building
  4. Web Technologies
  5. OAI Applications
  6. Case Studies
  7. Navigation / Query Language
  8. Audio / Video Retrieval
  9. Architecture I
  10. IR
  11. Architecture II
  12. Evaluation
  13. Multimedia / Mixed Media
  14. Preservation / Classification / User Studies
  15. Architecture III
  16. Humanities
  17. Demos and Posters

Web Archiving

A First Experience in Archiving the French Web, pp. 1-15
  Serge Abiteboul; Gregory Cobena; Julien Masanès; Gerald Sedrati
The web is an increasingly valuable source of information, and organizations are involved in archiving (portions of) it for various purposes, e.g., the Internet Archive (www.archive.org). A new mission of the French National Library (BnF) is the "dépôt légal" (legal deposit) of the French web. We describe here some preliminary work on the topic conducted by BnF and INRIA. In particular, we consider the acquisition of the web archive. Issues are the definition of the perimeter of the French web and the choice of pages to read once or several times (to take changes into account). When several copies of the same page are kept, this leads to versioning issues that we briefly consider. Finally, we mention some first experiments.
Austrian Online Archive Processing: Analyzing Archives of the World Wide Web, pp. 16-31
  Andreas Rauber; Andreas Aschenbrenner; Oliver Witvoet
With the popularity of the World Wide Web and the recognition of its worthiness of being archived we find numerous projects aiming at creating large-scale repositories containing excerpts and snapshots of Web data. Interfaces are being created that allow users to surf through time, analyzing the evolution of Web pages, or retrieving information using search interfaces. Yet, with the timeline and metadata available in such a Web archive, additional analyses that go beyond mere information exploration become possible. In this paper we present the AOLAP project building a Data Warehouse of such a Web archive, allowing its analysis and exploration from different points of view using OLAP technologies. Specifically, technological aspects such as operating systems and Web servers used, geographic location, and Web technology such as the use of file types, forms or scripting languages, may be used to infer e.g. technology maturation or impact.
Keywords: Web Archiving; Data Warehouse (DWH); On-Line Analytical Processing (OLAP); Technology Evaluation; Digital Cultural Heritage

e-Book

Conversion of eBook Documents Based on Mapping Relations, pp. 32-46
  Seung-Kyu Ko; Myoung-Soo Kang; Won-Sung Sohn; Soon-Bum Lim; Yoon-Chul Choy
An electronic book is a digital form of a paper book. Currently, to promote an eBook market, many countries have established eBook content standards. However, the publication of different standards has caused exchange problems due to mismatched content forms. Therefore, to exchange eBooks conforming to each standard, the content has to be converted according to document structure and its semantic information. Existing conversion methods, however, are mostly based on syntax information, and even those that use semantic information do not reflect eBook characteristics. So, for precise and correct eBook conversion, we analyze each standard and define mapping relations considering semantic information and eBook characteristics. To generalize the mapping relations, we classify them into ten conversion classes and provide conversion script examples for each class. With the defined mapping relations and conversion classes, we write conversion scripts for EBKS to OEB PS/JepaX and experiment with them. We believe the defined conversion classes can be applied to normal document conversions.
Guidelines for Designing Electronic Books, pp. 47-60
  Ruth Wilson; Monica Landoni; Forbes Gibb
This paper presents the guidelines emerging from the EBONI (Electronic Books ON-screen Interface) Project's evaluations of electronic textbooks [1], which describe how e-learning content can be made usable for the UK Higher Education community. The project's on-screen design guidelines are described, including recommendations as to which features of the paper book metaphor should be retained, and how the electronic medium can best be exploited. Advice on hardware design is also provided. Finally, accessibility issues are examined and practical considerations for the creators of digital educational content are discussed.

Collection Building

Personalized Classification for Keyword-Based Category Profiles, pp. 61-74
  Aixin Sun; Ee-Peng Lim; Wee Keong Ng
Personalized classification refers to allowing users to define their own categories and automating the assignment of documents to these categories. In this paper, we examine the use of keywords to define personalized categories and propose the use of Support Vector Machine (SVM) to perform personalized classification. Two scenarios have been investigated. The first assumes that the personalized categories are defined in a flat category space. The second assumes that each personalized category is defined within a pre-defined general category that provides a more specific context for the personalized category. The training documents for personalized categories are obtained from a training document pool using a search engine and a set of keywords. Our experiments have delivered better classification results using the second scenario. We also conclude that the number of keywords used can be very small and increasing them does not always lead to better classification performance.
Statistical Analysis of Bibliographic Strings for Constructing an Integrated Document Space, pp. 75-90
  Atsuhiro Takasu
It is important to utilize retrospective documents when constructing a large digital library. This paper proposes a method for analyzing recognized bibliographic strings using an extended hidden Markov model. The proposed method enables analysis of erroneous bibliographic strings and integrates many documents accumulated as printed articles in a citation index. The proposed method has the advantage of providing a robust bibliographic matching function using the statistical description of the syntax of bibliographic strings, a language model and an Optical Character Recognition (OCR) error model. The method also has the advantage of reducing the cost of preparing training data for parameter estimation, using records in the bibliographic database.
Focused Crawls, Tunneling, and Digital Libraries, pp. 91-106
  Donna Bergmark; Carl Lagoze; Alex Sbityakov
Crawling the Web to build collections of documents related to pre-specified topics became an active area of research during the late 1990s, crawler technology having been developed for use by search engines. Now, Web crawling is being seriously considered as an important strategy for building large scale digital libraries. This paper covers some of the crawl technologies that might be exploited for collection building. For example, to make such collection-building crawls more effective, focused crawling was developed, in which the goal was to make a "best-first" crawl of the Web. We are using powerful crawler software to implement a focused crawl but use tunneling to overcome some of the limitations of a pure best-first approach. Tunneling has been described by others as not only prioritizing links from pages according to the page's relevance score, but also estimating the value of each link and prioritizing them as well. We add to this mix by devising a tunneling focused crawling strategy which evaluates the current crawl direction on the fly to determine when to terminate a tunneling activity. Results indicate that a combination of focused crawling and tunneling could be an effective tool for building digital libraries.

Web Technologies

Goal-Oriented Requirements Specification for Digital Libraries, pp. 107-117
  Davide Bolchini; Paolo Paolini
This paper presents a model for systematically organizing the activity of requirements analysis for web-based hypermedia digital libraries and for tying it up with design in a coherent fashion. In order to accomplish this goal, three conceptual tools are proposed: a goal-oriented requirements analysis model based on existing practices and concepts in requirements engineering; a lightweight notation; and a taxonomy for requirement specifications. The approach presented in this paper has been developed and validated within the EU-funded UWA project (Ubiquitous Web Applications, IST-2000-25131).
OntoLog: Temporal Annotation Using Ad Hoc Ontologies and Application Profiles, pp. 118-128
  Jon Heggland
This paper describes OntoLog, a prototype annotation system for temporal media. It is a Java application built to explore the issues and benefits of using ontologies, application profiles and RDF for temporal annotation. It uses an annotation scheme based on hierarchical ontologies, and an RDF-based data model that may be adapted and extended through the use of RDF Schema. Dublin Core is used as a default description scheme. The paper also describes an ontology-based logging interface and annotation visualisation, and a web-based searching and browsing system.
An XML Log Standard and Tool for Digital Library Logging Analysis, pp. 129-143
  Marcos André Gonçalves; Ming Luo; Rao Shen; Mir Farooq Ali; Edward A. Fox
Log analysis can be a primary source of knowledge about how digital library patrons actually use DL systems and services and how systems behave while trying to support user information seeking activities. Log recording and analysis allow evaluation assessment, and open opportunities for improvements and enhanced new services. In this paper, we propose an XML-based digital library log format standard that captures a rich, detailed set of system and user behaviors supported by current digital library services. The format is implemented in a generic log component tool, which can be plugged into any digital library system. The focus of the work is on interoperability, reusability, and completeness. Specifications, implementation details, and examples of use within the MARIAN digital library system are described.

OAI Applications

Notes from the Interoperability Front: A Progress Report on the Open Archives Initiative, pp. 144-157
  Herbert Van de Sompel; Carl Lagoze
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) was first released in January 2001. Since that time, the protocol has been adopted by a broad community and become the focus of a number of research and implementation projects. We describe the various activities building on the OAI-PMH since its first release. We then describe the activities and decisions leading up to the release of a stable Version 2 of the OAI-PMH. Finally, we describe the key features of OAI-PMH Version 2.
Dynamic Generation of Intelligent Multimedia Presentations through Semantic Inferencing, pp. 158-175
  Suzanne Little; Joost Geurts; Jane Hunter
This paper first proposes a high-level architecture for semiautomatically generating multimedia presentations by combining semantic inferencing with multimedia presentation generation tools. It then describes a system, based on this architecture, which was developed as a service to run over OAI archives -- but is applicable to any repositories containing mixed-media resources described using Dublin Core. By applying an iterative sequence of searches across the Dublin Core metadata published by the OAI data providers, semantic relationships can be inferred between the mixed-media objects which are retrieved. Using predefined mapping rules, these semantic relationships are then mapped to spatial and temporal relationships between the objects. The spatial and temporal relationships are expressed within SMIL files which can be replayed as multimedia presentations. Our underlying hypothesis is that by using automated computer processing of metadata to organize and combine semantically-related objects within multimedia presentations, the system may be able to generate new knowledge by exposing previously unrecognized connections. In addition, the use of multilayered information-rich multimedia to present the results enables faster and easier information browsing, analysis, interpretation and deduction by the end-user.
Technical Report Interchange through Synchronized OAI Caches, pp. 176-189
  Xiaoming Liu; Kurt Maly; Mohammad Zubair; Rong Tang; Mohammed Imran Padshah; George Roncaglia; JoAnne Rocker; Michael L. Nelson; William von Ofenheim; Richard Luce; Jacqueline Stack; Frances Knudson; Beth Goldsmith; Irma Holtkamp; Miriam Blake; Jack Carter; Mariella Di Giacomo; Major Jerome Nutter; Susan Brown; Ron Montbrand; Sally Landenberger; Kathy Pierson; Vince Duran; Beth Moser
The Technical Report Interchange project is a cooperative experimental effort between NASA Langley Research Center, Los Alamos National Laboratory, Air Force Research Laboratory, Sandia National Laboratory and Old Dominion University to allow for the integration of technical reports. This is accomplished using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and having each site cache the metadata from the other participating sites. Each site also implements additional software to ingest the OAI-PMH harvested metadata into their native digital library (DL). This allows the users at each site to see an increased technical report collection through the familiar DL interfaces and take advantage of whatever value-added services are provided by the native DL.

Case Studies

Functional Requirements for Online Tools to Support Community-Led Collections Building, pp. 190-203
  Michael Khoo; Holly Devaul; Tamara Sumner
The Digital Water Education Library collection (DWEL) is being generated by primary and secondary school teachers in the United States. This complex process involves both individual research and team design, and the use of a variety of online tools, such as an online cataloguing tool. Interactions amongst DWEL members are being ethnographically analysed in order to identify requirements for further development of these tools. The analysis suggests that many DWEL members envision their work as occurring in an integrated environment with stable documents, a situation which is not supported by the current configuration of DWEL tools. The design implications of these findings are reviewed.
A Study on the Evaluation Model for University Libraries in Digital Environments, pp. 204-217
  Byeong Heui Kwak; Woochun Jun; Le Gruenwald; Suk-ki Hong
Advanced information technology has changed our society in various aspects. University libraries are also changing with the adoption of advanced information technology. Specifically, digital technology including the Internet has changed traditional university libraries in their operations as well as infrastructures. Traditional university libraries have stored and distributed scholarly information in printed media. However, most current university libraries are hybrid libraries, which are dependent on digital media as well as printed media, and are based on both network facilities and physical facilities. The evaluation metrics developed for traditional university libraries are no longer adequate to evaluate current university libraries. This paper presents an evaluation of hybrid libraries. Based on the opinions of library experts and previous work on the evaluation of both traditional and digital libraries, an initial evaluation model was developed, which consists of 8 categories, 33 items, and 84 indicators. The Delphi method was then applied to develop a valid evaluation model for university libraries. A survey was conducted 3 times for 50 balanced subjects among library-related professors, researchers, and senior university librarians. Based on the surveys' results, the categories, items, and indicators were modified to derive the new evaluation model, which consists of 7 evaluation categories, 35 items, and 92 indicators. The content validity of this model was confirmed through the survey results of 184 university librarians.
Keywords: Hybrid Library; Digital Library; Delphi Method
Renardus: Following the Fox from Project to Service, pp. 218-229
  Lesly Huxley
The Renardus academic subject gateway service in Europe was launched in April 2002. The author first presented the challenges facing this pan-European collaborative project at ECDL 2000. This paper identifies the progress made in information, technical and organisational developments and deployment since Lisbon 2000, presents the results of evaluation activities and outlines the challenges, setbacks and successes for Renardus' transition -- in June 2002 -- from project to service.
From Digital Archive to Digital Library -- A Middleware for Earth-Observation Data Management, pp. 230-237
  Stephan Kiemle
The German Remote Sensing Data Center (DFD) has developed a digital library for the long-term management of earth observation data products. This Product Library is a central part of DFD's multi-mission ground segment Data Information and Management System (DIMS) and has been successfully in operation since 2000. Its data model is regularly extended to support products of upcoming earth observation missions. The Product Library implements a middleware filling the gap between application-level object data models and physical storage structures such as a digital robot archive with hierarchical storage management. This paper presents the principles of the Product Library middleware and its application in the specific earth observation context.

Navigation / Query Language

Navigating in Bibliographic Catalogues, pp. 238-250
  Trond Aalberg
The FRBR-model provided by the IFLA Study group on the Functional Requirements for Bibliographic Records addresses the need for a more thorough model of bibliographic information. This paper describes a solution for applying the FRBR model to existing bibliographic catalogues. This is accomplished by augmenting the catalogue with an externally stored map that contains relationships and entities according to the model. The core of the system is the Digital Library Link Service developed at the Norwegian University of Science and Technology -- a flexible and general purpose link service developed to support the structuring of information objects in digital libraries. A client application is developed to visualize the map and to enable users to interact with the map by navigating along the available paths.
Foundations of a Multidimensional Query Language for Digital Libraries, pp. 251-265
  Donatella Castelli; Carlo Meghini; Pasquale Pagano
A query language for Digital Libraries is presented, which offers access to documents by structure and sophisticated usage of metadata. The language is based on a mathematical model of digital library documents, centered around a multilevel representation of documents as versions, views and manifestations. The core of the model is the notion of document view, which is recursive, and captures the content and structure of a document. The metadata representation distinguishes between formats and specifications, and is thus able to accommodate different metadata formats, even for the same document. A query is a logical formula, and its result is the set of digital library documents satisfying the user query.

Audio / Video Retrieval

The TREC2001 Video Track: Information Retrieval on Digital Video Information, pp. 266-275
  Alan F. Smeaton; Paul Over; Cash Costello; Arjen P. de Vries; David S. Doermann; Alexander G. Hauptmann; Mark E. Rorvig; John R. Smith; Lide Wu
The development of techniques to support content-based access to archives of digital video information has recently started to receive much attention from the research community. During 2001, the annual TREC activity, which has been benchmarking the performance of information retrieval techniques on a range of media for 10 years, included a "track" or activity which allowed investigation into approaches to support searching through a video library. This paper is not intended to provide a comprehensive picture of the different approaches taken by the TREC2001 video track participants but instead we give an overview of the TREC video search task and a thumbnail sketch of the approaches taken by different groups. The reason for writing this paper is to highlight the message from the TREC video track that there are now a variety of approaches available for searching and browsing through digital video archives, that these approaches do work, are scalable to larger archives and can yield useful retrieval performance for users. This has important implications in making digital libraries of video information attainable.
Automated Alignment and Annotation of Audio-Visual Presentations, pp. 276-291
  Gareth J. F. Jones; Richard J. Edens
Recordings of audio-visual presentations are a potentially valuable component of digital libraries. These recordings can be archived to enable remote access to audio presentations including lectures and seminars. Recordings of presentations often contain multiple information streams involving visual and audio data. If the full benefit of these recordings is to be realised these multiple media streams must be properly integrated to enable rapid navigation. This paper describes the application of information retrieval techniques within a system to automatically synchronise an audio soundtrack with electronic slides from a presentation. A novel component of the system is the detection of sections of the presentation unsupported by prepared slides, such as discussion and question answering, and automatic development of keypoint slides for these elements of the presentation.

Architecture I

OpenDLib: A Digital Library Service System, pp. 292-308
  Donatella Castelli; Pasquale Pagano
OpenDLib is a software toolkit that can be used to create a digital library easily, according to the requirements of a given user community, by instantiating the software appropriately and then either loading or harvesting the content to be managed. OpenDLib consists of a federation of services that implement the digital library functionality making few assumptions about the nature of the documents to be stored and disseminated. If necessary, the system can be extended with other services to meet particular needs. The main focus of the paper is the openness and extendibility of the system. This feature has been obtained by applying a systematic approach to the design of the toolkit. A model of the system architecture has been defined in order to support this approach. The paper presents OpenDLib through this model.
Prototyping Digital Library Technologies in zetoc, pp. 309-323
  Ann Apps; Ross MacIntyre
zetoc is a current awareness and document delivery service providing World Wide Web and Z39.50 access to the British Library's Electronic Table of Contents database of journal articles and conference papers, along with an email alerting service. An experimental prototype version of zetoc is under development, based on open standards, including Dublin Core and XML, and using the open source, leading-edge Cheshire II information retrieval technology. Enhancements investigated in this prototype include request and delivery of discovered articles, location of electronic articles using OpenURL technology, and additional current awareness functionality including the exposure of journal issue metadata according to the Open Archives Initiative protocol. These experimental developments will enhance the zetoc service to improve the information environment for researchers and learners.
Keywords: Electronic table of contents; current awareness; document delivery; alerting; OpenURL; Open Archives Initiative; Z39.50
Employing Smart Browsers to Support Flexible Information Presentation in Petri Net-Based Digital Libraries, pp. 324-337
  Unmil Karadkar; Jin-Cheon Na; Richard Furuta
For effective real-life use, digital libraries must incorporate resource and system policies and adapt to user preferences and device characteristics. The caT (context-aware Trellis) hypertext model incorporates these policies and adaptation conditions within the Petri net specification of the digital library to support context-aware delivery of digital documents in a dynamically changing environment. This paper describes extensions to the caT architecture for supporting adaptation via smarter browsers and an external resource store to provide greater flexibility in information presentation. Browsers request resources that they can best display with their knowledge of intrinsic capabilities and constraints imposed on them by the devices that they run on. The data store returns the most appropriate version of a resource in response to browser requests, thus allowing maintainers of libraries to add, modify and remove resources without any changes to the structure, presentation or document pointers in the digital library.

IR

On the Use of Explanations as Mediating Device for Relevance Feedback, pp. 338-345
  Ian Ruthven
In this paper we examine the role of explanations as a means of facilitating the use of relevance feedback in information retrieval systems. We do this with particular reference to previous experimental work. This demonstrates that explanations can increase the user's willingness to interact more fully with the system. We outline the general conclusions from this experimental work and discuss the implications for interactive IR systems that incorporate relevance feedback.
Qualitative Evaluation of Thesaurus-Based Retrieval, pp. 346-361
  Dorothee Blocks; Ceri Binding; Daniel Cunliffe; Douglas Tudhope
This paper reports on a formative evaluation of a prototype thesaurus-based retrieval system, which involved qualitative investigation of user search behaviour. The work is part of the ongoing 'FACET' project in collaboration with the National Museum of Science and Industry and its collections database. The main thesaurus employed in the project is the Getty Art and Architecture Thesaurus. The aim of the evaluation is to analyse at a micro level the user's interaction with interface elements in order to illuminate problems and inform interface design decisions. Data gathered included transcripts of think-aloud sessions, screen capture movie files, user action logs and observer notes. Key incidents from the sessions are analysed and the qualitative methodology is discussed. The evaluation analysis informs design issues concerning the allocation of search functionality to sub-windows, the appropriate role of thesaurus browsing in the search process, the formation of faceted queries and query reformulation. The analysis suggests that, although the prototype interface supports basic level operations, it does not provide nonexpert searchers with sufficient guidance on query structure and when to use the thesaurus. Conclusions are drawn that future work should further support and suggest models of the search process to the user.
Meta-data Extraction and Query Translation. Treatment of Semantic Heterogeneity, pp. 362-373
  Robert Strötgen
The project CARMEN ("Content Analysis, Retrieval and Metadata: Effective Networking") aimed, among other goals, at improving the expansion of searches in bibliographic databases into Internet searches. We pursued a set of different approaches to the treatment of semantic heterogeneity (meta-data extraction, query translation using statistical relations and cross-concordances). This paper describes the concepts and implementation of these approaches and the evaluation of their impact on the retrieval results.

Architecture II

MetaDL: A Digital Library of Metadata for Sensitive or Complex Research Data, pp. 374-389
  Fillia Makedon; James Ford; Li Shen; Tilmann Steinberg; Andrew J. Saykin; Heather Wishart; Sarantos Kapidakis
Traditional digital library systems have certain limitations when dealing with complex or sensitive (e.g. proprietary) data. Collections of digital libraries have to be accessed individually and through non-uniform interfaces. By introducing a level of abstraction, a Meta-Digital Library or MetaDL, users gain a central access portal that allows for prioritized queries, evaluation and rating of the results, and secure negotiations to obtain primary data. This paper demonstrates the MetaDL architecture with an application in brain imaging research, BrassDL, the Brain Support Access System Digital Library. BrassDL is currently under development. This paper describes a theoretical framework behind it, addressing aspects from metadata extraction and system-supported negotiations to legal, ethical and sustainability issues.
Importing Documents and Metadata into Digital Libraries: Requirements Analysis and an Extensible Architecture, pp. 390-405
  Ian H. Witten; David Bainbridge; Gordon W. Paynter; Stefan J. Boddie
Flexible digital library systems need to be able to accept, or "import," documents and metadata in a variety of forms, and associate metadata with the appropriate documents. This paper analyzes the requirements of the import process for general digital libraries. The requirements include (a) format conversion for source documents, (b) the ability to incorporate existing conversion utilities, (c) provision for metadata to be specified in the document files themselves and/or in separate metadata files, (d) format conversion for metadata files, (e) provision for metadata to be computed from the document content, and (f) flexible ways of associating metadata with documents or sets of documents. We argue that these requirements are so open-ended that they are best met by an extensible architecture that facilitates the addition of new document formats and metadata facilities to existing digital library systems. An implementation of this architecture is briefly described.
The Mellon Fedora Project, pp. 406-421
  Sandra Payette; Thornton Staples
The University of Virginia received a grant of $1,000,000 from the Andrew W. Mellon Foundation to enable the Library, in collaboration with Cornell University, to build a digital object repository system based on the Flexible Extensible Digital Object and Repository Architecture (Fedora). The new system demonstrates how distributed digital library architecture can be deployed using web-based technologies, including XML and Web services. The new system is designed to be a foundation upon which interoperable web-based digital libraries can be built. Virginia and collaborating partners in the US and UK will evaluate the system using a diverse set of digital collections. The software will be made available to the public as an open-source release.

Evaluation

Hybrid Partition Inverted Files: Experimental Validation, pp. 422-431
  Wensi Xi; Ohm Sornil; Ming Luo; Edward A. Fox
The rapid increase in content available in digital forms gives rise to large digital libraries, targeted to support millions of users and terabytes of data. Efficiently retrieving information then is a challenging task due to the size of the collection and its index. In this paper, our high performance "hybrid" partition inverted index is validated through experiments with a 100 Gbyte collection from TREC-9 and -10. The hybrid scheme combines the term and the document approaches to partitioning inverted indices across nodes of a parallel system. Experiments on a parallel system show that this organization outperforms the document and the term partitioning schemes. Our hybrid approach should support highly efficient searching for information in a large-scale digital library, implemented atop a network of computers.
Digital Library Evaluation by Analysis of User Retrieval Patterns BIBAFull-Text 432-447
  Johan Bollen; Somasekhar Vemulapalli; Weining Xu
We propose a methodology to evaluate the impact of a Digital Library's (DL) collection and the characteristics of its user community by an analysis of user retrieval patterns. Patterns of journal and document co-retrievals are reconstructed from DL server logs and used to generate proximity data for journals and documents, resulting in a weighted relation defined over the DL document collection represented by a network of documents and journals. A measure of discrepancy between user-defined measures of document impact and the Journal Citation Reports (JCR) Impact Factor (IF) published by the Institute for Scientific Information (ISI) is used to analyze characteristics of the DL user community. A preliminary analysis of the Los Alamos National Laboratory (LANL) Research Library (RL) server logs registered in 2001 demonstrates the potential of this approach.
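   One way to reconstruct co-retrieval weights from server logs is sketched below. The abstract does not give the rule for deciding that two retrievals belong together, so the log layout (user, timestamp in seconds, document ID) and the one-hour window are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def co_retrieval_weights(log, window_seconds=3600):
    """Count how often two documents are retrieved by the same user
    within window_seconds of each other (assumed co-retrieval rule).

    log: iterable of (user_id, timestamp_seconds, document_id) tuples.
    Returns a Counter keyed by frozenset({doc_a, doc_b}).
    """
    by_user = {}
    for user, ts, doc in sorted(log, key=lambda e: (e[0], e[1])):
        by_user.setdefault(user, []).append((ts, doc))

    weights = Counter()
    for events in by_user.values():
        # Events are time-ordered per user, so t2 >= t1 for every pair.
        for (t1, d1), (t2, d2) in combinations(events, 2):
            if d1 != d2 and t2 - t1 <= window_seconds:
                weights[frozenset((d1, d2))] += 1
    return weights
```

   The resulting counts define a weighted, undirected network over documents; mapping each document to its journal before counting would give the journal-level network.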
Interactive Search Results BIBAFull-Text 448-462
  Ioannis Papadakis; Ioannis Andreou; Vassilios Chrissikopoulos
In this paper, we address the issue of interactive search results manipulation, as provided by typical Web-based information retrieval modules like search engines and directories. Many digital library systems could benefit considerably from the proposed approach, since it is heavily based on metadata, which constitute the building block of such systems. We also propose a way of ranking search results according to their overall importance, which is defined as a weighted combination of the relevancy and popularity of a resource that is being referenced in a search results list. In order to evaluate this model, we have developed an interactive search results manipulation application, which is executed at the client's workspace through a Web browser without any further interaction with the server that provided the initial search results list. The prototype implementation is based on the XML standard and has been evaluated, yielding useful conclusions.
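   The ranking idea, importance as a weighted combination of relevancy and popularity, can be sketched as follows. The abstract gives neither the weights nor the score scales, so the linear form, the default weight of 0.7, and the assumption that both scores lie in [0, 1] are illustrative only. The prototype described above runs in a Web browser; Python is used here purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    relevancy: float   # assumed normalised to [0, 1]
    popularity: float  # assumed normalised to [0, 1]


def importance(result, relevancy_weight=0.7):
    """Weighted combination of relevancy and popularity (assumed linear)."""
    return (relevancy_weight * result.relevancy
            + (1 - relevancy_weight) * result.popularity)


def rank(results, relevancy_weight=0.7):
    """Re-order an already retrieved result list by overall importance,
    without contacting the server again."""
    return sorted(results,
                  key=lambda r: importance(r, relevancy_weight),
                  reverse=True)
```

   Changing relevancy_weight re-orders the same list locally, which is the kind of client-side manipulation the abstract describes.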

Multimedia / Mixed Media

An Investigation of Mixed-Media Information Retrieval BIBAFull-Text 463-478
  Gareth J. F. Jones; Adenike M. Lam-Adesina
Digital document archives are increasingly derived from various different media sources. At present such archives are stored and searched independently. The Information Retrieval from Mixed-Media Collections (IRMMC) project is investigating retrieval from combined document collections composed of items originating from differing media forms. Experimental investigation of a "mixed-media" retrieval task based on the existing TREC Spoken Document Retrieval task combining Text, Spoken and Scanned Image is described. Results show that nontext media perform well within the mixed-media collection. Also while pseudo relevance feedback is extremely effective for spoken documents, its behaviour for document image retrieval is more complex.
Alignment of Performances with Scores Aimed at Content-Based Music Access and Retrieval BIBAFull-Text 479-492
  Nicola Orio
Music digital libraries pose interesting and challenging research problems, in particular for the development of methodologies and tools for the retrieval of music documents. One difficult aspect of content-based retrieval of musical works is that only scores can be represented by a symbolic notation, while performances, which are of interest for the majority of users, allow for access based on bibliographic values only. The research work reported in this paper proposes to index and retrieve music performances through an automatic alignment of acoustic recordings with the music scores. Alignment may allow for: automatic recognition of performances, aimed at cataloging large collections of recordings; automatic tagging of performances, aimed at an easy access to long recordings. The methodology is based on the use of hidden Markov models, a powerful tool that has been successfully used in many research areas, like speech recognition and molecular biology. The approach has been tested on a collection of acoustic and synthetic performances, showing good results in the recognition and in the tagging of performances. The proposed approach can be used to increase the functionalities of a music digital library, allowing for content-based access to scores and recordings.
Alternative Surrogates for Video Objects in a Digital Library: Users' Perspectives on Their Relative Usability BIBAFull-Text 493-507
  Barbara M. Wildemuth; Gary Marchionini; Todd Wilkens; Meng Yang; Gary Geisler; Beth Fowler; Anthony Hughes; Xiangming Mu
In a digital environment, it is feasible to integrate multimedia materials into a library collection with ease. However, it seems likely that nontextual surrogates for multimedia objects, e.g., videos, could effectively augment textual representations of those objects. In this study, five video surrogates were evaluated in relation to their usefulness and usability in accomplishing specific tasks. The surrogates (storyboards with text or audio keywords, slide shows with text or audio keywords, fast forward) were created for each of seven video segments. Ten participants, all of whom watch videos at least monthly and search for videos at least occasionally, viewed the surrogates for seven video segments and provided comments about the strengths and weaknesses of each. In addition, they performed a series of tasks (gist determination, object recognition, action recognition, and visual gist determination) with three surrogates selected from those available. No surrogate was universally judged "best," but the fast forward surrogate garnered the most support, particularly from experienced video users. The participants expressed their understanding of video gist as composed of three components: topicality, the story of the video, and the visual gist of the video. They identified several real-world tasks for which they regularly use video collections. The viewing compaction rates used in these surrogates supported adequate performance, but users expressed a desire for more control over surrogate speed and sequencing. Further development of these surrogates is warranted by these results, as well as the development of mechanisms for surrogate display.
Word Alignment in Digital Talking Books Using WFSTs BIBAFull-Text 508-515
  António Joaquim Serralheiro; Diamantino Caseiro; Hugo Meinedo; Isabel Trancoso
This paper describes the motivation and the method that we used for aligning digital spoken books, and the results obtained both at a word level and at a phone level. This alignment will allow specific access interfaces for persons with special needs, and also tools for easily detecting and indexing units (words, sentences, topics) in the spoken books. The tool was implemented in a Weighted Finite State Transducer framework, which provides an efficient way to combine different types of knowledge sources, such as alternative pronunciation rules. With this tool, a 2-hour long spoken book was aligned in a single step in much less than real time.

Preservation / Classification / User Studies

Migration on Request, a Practical Technique for Preservation BIBAFull-Text 516-526
  Phil Mellor; Paul Wheatley; Derek M. Sergeant
Maintaining a digital object in a usable state over time is a crucial aspect of digital preservation. Existing methods of preserving have many drawbacks. This paper describes advanced techniques of data migration which can be used to support preservation more accurately and cost effectively.
   To ensure that preserved works can be rendered on current computer systems over time, "traditional migration" has been used to convert data into current formats. As the new format becomes obsolete another conversion is performed, and so on. Traditional migration has many inherent problems, as errors during transformation propagate throughout future transformations.
   CAMiLEON's software longevity principles can be applied to a migration strategy, offering improvements over traditional migration. This new approach is named "Migration on Request." Migration on Request shifts the burden of preservation onto a single tool, which is maintained over time. Always returning to the original format enables potential errors to be significantly reduced.
Information Alert in Distributed Digital Libraries: The Models, Languages, and Architecture of DIAS BIBAFull-Text 527-542
  Manolis Koubarakis; T. Koutris; Christos Tryfonopoulos; Paraskevi Raftopoulou
This paper presents DIAS, a distributed alert service for digital libraries, currently under development in project DIET. We first discuss the models and languages for expressing user profiles and notifications. Then we present the data structures, algorithms and protocols that underlie the peer-to-peer agent architecture of DIAS.
DSpace: An Institutional Repository from the MIT Libraries and Hewlett Packard Laboratories BIBAFull-Text 543-549
  MacKenzie Smith
The DSpace™ project of the MIT Libraries and the Hewlett Packard Laboratories (dspace.org) has built an institutional repository system for digital research material. This paper will describe the rationale for institutional repositories, the DSpace system, and its implementation at MIT. Also described are the plans for making DSpace open source in an effort to provide a useful test bed and a platform for future research in the areas of open scholarly communication and the long-term preservation of fragile digital research material.
User Behavior Tendencies on Data Collections in a Digital Library BIBAFull-Text 550-559
  Michalis Sfakakis; Sarantos Kapidakis
We compare the usage of a Digital Library with many different categories of collections by examining its log files for a period of twenty months, and we conclude that the access points that users refer to most depend heavily on the type of content of the collection, the detail of the existing metadata and the target user group. We also found that most users tend to use simple query structures (e.g. only one search term) and very few and primitive operations to accomplish their request. Furthermore, as they get more experienced, they reduce the number of operations in their sessions.
Student Comprehension of Classification Applications in a Science Education Digital Library BIBAKFull-Text 560-567
  Jane Greenberg; Kristen A. Bullard; M. L. James; Evelyn Daniel; Peter White
Piaget's theory of cognitive development serves as a basis for a comparative analysis of middle school students' understanding of classification in the physical and digital library. Attention is also given to student comprehension of scientific classification. Results of this pilot study show that although participants had good comprehension of classification principles in the physical environment, with which they are more familiar, their understanding diminishes in the digital environment and when addressing scientific classification. Results are compared to an earlier study and implications for the design of educational science digital libraries are discussed.
Keywords: Digital Library; Classification; Science Education; Piaget; Cognitive Development

Architecture III

Designing Protocols in Support of Digital Library Componentization BIBAFull-Text 568-582
  Hussein Suleman; Edward A. Fox
Reusability always has been a controversial topic in Digital Library (DL) design. While componentization has gained momentum in software engineering in general, there has not been broad DL standardization in component interfaces. Recently, the Open Archives Initiative (OAI) has begun to address this by creating a standard protocol for accessing metadata archives. We propose that the philosophy and approach adopted by the OAI can be extended easily to support inter-component protocols. In particular, we propose building DLs by connecting small components that communicate through a family of lightweight protocols, using XML as the data interchange mechanism. In order to test the feasibility of this, a set of protocols was designed based on the work of the OAI. Components adhering to these protocols were implemented and integrated into production and research DLs. The performance of these components was analyzed from the perspective of execution speed, network traffic, and data consistency. On the whole, this work has shown promise in the approach of applying the fundamental concepts of the OAI protocol to the task of DL component design and implementation.
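   The request style of the OAI protocol that this work builds on can be illustrated with a short sketch: a request is an HTTP query carrying a verb plus arguments, and the response is XML. The snippet only constructs such a URL. The base URL is a placeholder, and the component protocols designed in the paper are not reproduced here.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-style request URL: one 'verb' plus keyword arguments.

    base_url is a hypothetical repository endpoint.
    """
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Example: ask a (placeholder) repository for records in unqualified Dublin Core.
url = oai_request_url("http://example.org/oai", "ListRecords",
                      metadataPrefix="oai_dc")
```

   Because every component answers the same small set of verbs over HTTP, components can in principle be chained by pointing one at another's base URL, which is the kind of composition the abstract argues for.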
Exploring Small Screen Digital Library Access with the Greenstone Digital Library BIBAFull-Text 583-596
  George Buchanan; Matt Jones; Gary Marsden
In recent years, the use of small screen devices has multiplied rapidly. This paper covers a number of different issues which arise when digital libraries are used in combination with such displays. Known limitations of small screens are presented to the Digital Library community. Two evaluations of pilot small-screen DL systems are presented, with some unexpected cultural and socio-technical concerns which arose. The pilot systems also demonstrate the delivery of small-screen access using an existing popular DL system.
Daffodil: An Integrated Desktop for Supporting High-Level Search Activities in Federated Digital Libraries BIBAFull-Text 597-612
  Norbert Fuhr; Claus-Peter Klas; André Schaefer; Peter Mutschke
Daffodil is a digital library system aimed at strategic support during the information search process. For the user, this strategic support is implemented mainly by high-level search functions, so-called stratagems, which provide functionality beyond that of today's digital libraries. Through the tight integration of stratagems and the federation of heterogeneous digital libraries, Daffodil achieves a high synergy effect for information and services. These effects provide high-quality metadata for the searcher through an intuitively controllable user interface. The visualisation of stratagems is based on a strictly object-oriented tool-based model. This paper presents the graphical user interface with a particular view on the integration of stratagems to enable strategic support.

Humanities

Using Human Language Technology for Automatic Annotation and Indexing of Digital Library Content BIBAFull-Text 613-625
  Kalina Bontcheva; Diana Maynard; Hamish Cunningham; Horacio Saggion
In this paper we show how we used robust human language technology, such as our domain-independent and customisable named entity recogniser, for automatic content annotation and indexing in two digital library applications. Each of these applications posed a unique challenge: one required adapting the language processing components to the non-standard written conventions of 18th century English, while the other presented the challenge of processing material in multiple modalities. This reusable technology could also form the basis for the creation of computational tools for the study of cultural heritage languages, such as Ancient Greek and Latin.
Cultural Heritage Digital Libraries: Needs and Components BIBAFull-Text 626-637
  Gregory Crane
This paper describes preliminary conclusions from a long-term study of cultural heritage digital collections. First, those features most important to cultural heritage digital libraries are described. Second, we list those components that have proven most useful in boot-strapping new collections.
Visualization of Variants in Textual Collations to Analyze the Evolution of Literary Works in the Cervantes Project BIBAFull-Text 638-653
  Carlos Monroy; Rajiv Kochumman; Richard Furuta; Eduardo Urbina; Eréndira Melgoza; Arpita Goenka
As part of the Cervantes Project digital library, we are developing an Electronic Variorum Edition (EVE) of Don Quixote de la Mancha. Multiple editors can create an EVE with our Multi Variant Editor for Documents (MVED), which allows collation of one base text against several comparison texts to identify, link and edit all existing variants among them. In this context we are investigating the use of visualizations to depict variants graphically in order to validate the accuracy of the textual transcriptions and to understand the similarities and differences among different printings and editions. Our broader goal is to enable users to analyze the collation's results and to discover facts about the evolution of the Quixote textual history, and to provide evidence to eliminate printing and compositor's errors and thus to produce a more correct edition closer to Cervantes' original manuscript. This paper describes the visualization tool, and presents the initial results of its use.
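   Collating a base text against a comparison text can be sketched at word level as below. This uses Python's standard difflib as a stand-in and is not the algorithm used by MVED. The function name and the sample strings are illustrative.

```python
from difflib import SequenceMatcher


def collate(base_text, comparison_text):
    """Return word-level variants between a base text and one comparison text.

    Each variant is (kind, base_words, comparison_words), where kind is
    'replace', 'delete' or 'insert' as reported by difflib.
    """
    base = base_text.split()
    other = comparison_text.split()
    matcher = SequenceMatcher(a=base, b=other, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            variants.append((tag,
                             " ".join(base[i1:i2]),
                             " ".join(other[j1:j2])))
    return variants


# Collating one base text against several comparison texts:
base = "En un lugar de la Mancha"
editions = {"edition_b": "En vn lugar de la Mancha"}
all_variants = {name: collate(base, text) for name, text in editions.items()}
```

   The per-edition variant lists are the kind of data a visualization could then plot, for example as the number of variants per chapter and edition.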

Demos and Posters

Alinari Online: Access Quality through a Cultural Photographic Heritage Site BIBAFull-Text 654-655
  Andrea de Polo; Sam H. Minelli
The unique heritage of the Alinari collections gives life to one of the biggest international centers of photographic and iconographic documentation with over 3.5 million vintage images from the 19th and 20th centuries from all over the world.
   Today Alinari is a modern reality operating in the wider field of image and communication: a brand name which guarantees an age-old fund of experience combined with state-of-the-art technological skills. A good example of this synergy is given in the REGNET project, which aims to set up a functional network of service centres in Europe that provides IT services dedicated to Cultural Heritage (CH) organisations and will be an enabler of eBusiness activities for CH organisations. Multimedia industries enabling the production of electronic publications will be integrated. REGNET will provide access to, and use of, digital data (scientific and cultural) as well as physical goods as provided by museum shops.
An Access Control System for Digital Libraries and the Web: The MaX Prototype Demonstration BIBAFull-Text 656-657
  Elisa Bertino; Elena Ferrari; Andrea Perego
The goal of this demonstration is to present the main features of MaX, a system enforcing access control to Web documents. This system has been developed at the Dipartimento di Scienze dell'Informazione of the University of Milano in the framework of the European project EUFORBIA, and implements the Milano Model, an access control mechanism conceived for Digital Library (DL) and Web environments.
Human Language Technology for Automatic Annotation and Indexing of Digital Library Content BIBAFull-Text 658
  Kalina Bontcheva; Hamish Cunningham
This demo will present a set of domain-independent and customisable Human Language Technology (HLT) tools and the way they were applied for annotating 18th century Old Bailey proceedings and indexing multimedia content. This demo accompanies the paper with the same title.
The IntraText Digital Library: XML-Driven Online Library Based on High Accessibility, Lexical Hypertextualization and Scholarly Accuracy in Philological / Textual Notations BIBAFull-Text 659
  Nicola Mastidoro
The IntraText Digital Library was born in 1999. As of June 2002 it offers over 3000 full-text books and collections in 36 languages; its readers (over 5000 have subscribed to the News) access over 1 million pages per month. Six interface languages are available.
   The core of the project is an XML-driven Digital Library Framework offering high accessibility and scholarly quality in text representation: multi-level footnotes, philological notations, distinction between the lexicon of the author and that of other sources in concordance, hyphenation, sorting, etc. It is based on a scalable, low-cost architecture intended to manage (with workflow control) publishing and archiving books and collections by local or remote users and serving thousands of readers.
   ETML, a very accurate text-to-XML translation metalanguage, has been defined. It simplifies the text-to-XML process, provides tools for philological notations and gives automatic tools to create hypertextualized collections of archived works, e.g. opera omnia. ETML allows non-technically-skilled people to produce XML simply using any text editor and e-mail, dramatically reducing manual processing time: it takes about 30 minutes to produce an XML Bible from a Word file using ETML.
   The main publishing method for the library is lexical hypertextualization on highly accessible HTML pages, both on-line and on CD. Words (all or selected from a custom list) are linked to the concordance; the concordance is itself linked to the text through full references. Lists (frequency, alphabetical) and statistics are also available.
   Other publishing formats (MS Reader, XML TEI, etc.) will be available within the Library. Dublin Core metadata are about to be activated.
The Metae Project -- Automated Digitisation of Books and Journals BIBAFull-Text 660
  Günter Mühlberger; Birgit Stehno
The digitisation of printed materials such as books and journals is still a complicated and expensive process that requires a patchwork of software programs for the individual conversion steps. Many libraries therefore shy away from undertaking digitisation activities. To improve and simplify digitisation by highly automating the conversion process is the primary aim of METAe -- a project co-funded by the European Commission within the 5th Framework, IST-Programme. The consortium of the METAe-project is made up of a number of leading libraries, university departments and digitisation companies from all over Europe, including the University of Innsbruck (Austria), i323 (Austria), CCS Compact Computer Systeme (Germany), Abbyy Europe/MitCom (Germany), the University of Florence (Italy), and the Scuola Normale Superiore of Pisa (Italy).
COVAX: A Contemporary Culture Virtual Archive in XML BIBAFull-Text 661-662
  Luciana Bordoni
The objectives of the EU-funded COVAX project are:
  • to build a web service for search and retrieval of contemporary European
       cultural documents from memory institutions.
  • to make existing library, archive and museum document descriptions accessible
       over the Internet.
  • to assist memory institutions to provide access to their collections,
       regardless of document type or collection size.
  • to implement standards and achieve interoperability between retrieval systems
       operating in the cultural heritage area.
   Partners in the project include technology developers and providers (public research organizations and private companies) and content owners (memory institutions). The content owners have collections of varying type and size, catalogued using a variety of library, museum and archiving systems.
   The project is assessing ways to improve access to these collections by converting samples of existing data into a limited set of common structured formats, each of which can be expressed using XML (eXtensible Markup Language). According to the philosophy adopted by the project, future catalogs for libraries, museums and archives will be stored in a variety of XML formats instead of proprietary formats, or formats such as MARC which have not gained wide acceptance outside of their development context. Since much material is already described in machine-readable form, the project worked on developing tools to convert such descriptions to XML and to integrate them with native XML data in order to build user-friendly websites and data archives.
   A comprehensive set of documents for the implementation of the prototype was selected. It contains a wide variety of documents, descriptions, formats and databases: standard and non-standard bibliographic records (including five different MARC formats), and four different structures for archive and museum finding aids and information in six different languages (Catalan, Italian, English, German, Spanish, Swedish).
   COVAX partners have implemented two different database models: ad hoc XML databases, or existing non-XML repositories. In the latter case, information is retrieved from the original database and transformed into XML format before presenting it to users. To summarize, COVAX is not only incorporating XML as a basic standard but also integrating other standards, and adapting them to XML.
   COVAX partners have implemented XML repositories using two software packages, Tamino from Software AG, a COVAX technical partner and TeXtML from IXIAsoft. Sites have been established in London, Rome, Salzburg, Graz and Madrid. COVAX will test the benefits of XML to encode and process cultural heritage information, explore the feasibility of converting existing cultural heritage descriptions into XML encoded information, adapt cultural information systems to user requirements and contribute to the extension of standards for presentation and dissemination of cultural heritage.
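   The second database model, in which a record is fetched from a non-XML repository and transformed into XML before being shown to the user, can be sketched as below. The flat record layout, the field names and the root element name are assumptions for illustration; they are not the COVAX schemas, and real MARC records would need a proper field and subfield mapping.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record, root_tag="record"):
    """Serialise a flat record (field name -> value) as an XML string.

    Field names must be valid XML element names; this sketch does not
    validate them.
    """
    root = ET.Element(root_tag)
    for field, value in record.items():
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical row retrieved from a legacy catalogue database:
row = {"title": "Don Quixote", "language": "es"}
xml_text = record_to_xml(row)
```

   Converting at request time keeps the legacy repository as the single source of data, at the cost of repeating the transformation for every query.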