ECDL'99: Proceedings of the European Conference on Digital Libraries

Fullname:ECDL'99: Research and Advanced Technology for Digital Libraries: Third European Conference
Editors:Serge Abiteboul; Anne-Marie Vercoustre
Location:Paris, France
Dates:1999-Sep-22 to 1999-Sep-24
Publisher:Springer Berlin Heidelberg
Series:Lecture Notes in Computer Science 1696
Standard No:DOI: 10.1007/3-540-48155-9; ISBN: 978-3-540-66558-8 (print), 978-3-540-48155-3 (online); hcibib: ECDL99
  1. Invited Talks
  2. Image Categorisation and Access
  3. Audio and Video in Digital Libraries
  4. Information Retrieval
  5. User Adaptation
  6. Knowledge Sharing
  7. Cross Language
  8. Case Studies
  9. Modelling, Accessibility and Connectedness

Invited Talks

Challenges for the Web: Universality and Scalability, p. 1
  Jean-François Abramatic
The Web is becoming the universal information space that was envisioned by its inventor, Tim Berners-Lee. To reach its full potential, the Web needs to face two major challenges: Universality and Scalability. Universality means that anybody should be able to access and publish information on the Web. Therefore, the Web should take into account the vast differences in culture, education, ability, material resources, and physical limitations of users on all continents. Scalability means that while millions of services are deployed on the Web, the infrastructure should be able to ensure that performance, trust and relevance keep developing.
   The talk will present achievements and work in progress at the World Wide Web Consortium (W3C) that address the challenges facing the Web.
The UC Berkeley Digital Library Project: Re-thinking Scholarly Information Dissemination and Use, p. 2
  Robert Wilensky
Information technology is not merely providing enhanced versions of services of the sort we have come to expect from libraries; it is inducing a fundamental change in the way information is created, disseminated, and used. The shift from the current centralized, discrete publishing model, toward a distributed, continuous, and self-publishing model, is already underway. However, left to its own devices, some of the better aspects of the current model, such as peer review, may be compromised, even as the opportunity for new services is afforded. Effort will also be required to provide first class support in the emerging infrastructure for data that are not textual in nature, such as images, videos, maps, and scientific data sets.
   Many tools and technologies will be useful in enhancing and exploiting this view of the emerging information infrastructure. One set of tools relates to document technologies. "Multivalent Documents" is a new model of documents that seems useful in this context. The multivalent document model is (i) highly open, meaning that it supports an open-ended variety of document formats and functions, (ii) highly extensible, meaning that it can be extended and customized in novel ways and to meet particular user needs, and (iii) highly distributed, meaning that components of a document may exist as separate networked resources, which are combined dynamically into a coherent document. A particularly attractive aspect of the model is the manner in which it supports "spontaneous collaboration", the ability of a user to annotate web pages, scanned images, and other networked resources for which that user has no privileged relation.
   Multivalent documents address some issues in manipulating on-line resources. Finding those resources is still problematic, especially for those in image form. "Automatic content analysis" is the set of techniques for analyzing the content of information objects so as to facilitate their subsequent access. We present some recent developments in this area for accessing document images, photographs, and text.

Image Categorisation and Access

Image and Metadata Distribution at Seven University Campuses: Reports from a Study of the Museum Educational Site Licensing Project, pp. 3-18
  Howard Besser; Rosalie Lack
This paper summarizes the major findings of a University of California study of the Museum Educational Site Licensing Project (MESL) -- the first large-scale multi-institutional image and metadata distribution experiment in the US. The study examined the costs and social impacts of distributing a large body of digital images and metadata from a set of different museums to universities. Among the findings are that the digital distribution environment, as a whole, appears to be good for individual image usage, but is problematic for group viewing situations such as classrooms. Impediments to widespread adoption include: lack of comprehensive content, absence of necessary tools to facilitate use, and inadequate recognition and support for faculty who adopt new technology in their teaching. Other key issues that still need to be addressed include: integration of consortia-provided images and metadata with images acquired elsewhere; allowing instructors to change descriptive information or annotate images; encouraging the creation of added-value tools; and providing particular user interfaces or new integrated tools. The study also compared the cost of digital distribution to the costs of running an analog slide library.
Text-Based Approaches for the Categorization of Images, pp. 19-38
  Carl L. Sable; Vasileios Hatzivassiloglou
The rapid expansion of multimedia digital collections brings to the fore the need for classifying not only text documents but their embedded non-textual parts as well. We propose a model for basing classification of multimedia on broad, non-topical features, and show how information on targeted nearby pieces of text can be used to effectively classify photographs on a first such feature, distinguishing between indoor and outdoor images. We examine several variations to a TF*IDF-based approach for this task, empirically analyze their effects, and evaluate our system on a large collection of images from current news newsgroups. In addition, we investigate alternative classification and evaluation methods, and the effect that a secondary feature can have on indoor/outdoor classification. We obtain a classification accuracy of 82%, a number that clearly outperforms baseline estimates and competing image-based approaches and nears the accuracy of humans who perform the same task with access to comparable information.
Metadata for Photographs: From Digital Library to Multimedia Application, pp. 39-57
  Anne-Marie Vercoustre; François Paradis
This paper describes the production of an educational multimedia CD-ROM about French rural houses and farms, and how to renovate them without losing their traditional features. The educational message is illustrated with many photographs of non-renovated or renovated houses, and made explicit through comments and descriptions associated with the photos. The paper focuses on the XML metadata describing the photos and the use of this metadata for the automatic generation of Web pages. We first report on the usability of the Dublin Core for interoperable photograph metadata, together with more detailed XML descriptions to support a specific multimedia application. We then show how to generate the Web pages by defining HTML document prescriptions which embed queries to the XML metadata, using Norfolk, a virtual document generator. The approach can be used in various applications ranging from personal virtual photo albums to complex virtual museums.

Audio and Video in Digital Libraries

Audiovisual Cultural Heritage: From TV and Radio Archiving to Hypermedia Publishing, pp. 58-75
  Gwendal Auffret; Bruno Bachimont
In this article, we present a model of a digital audiovisual (AV) library. We describe how AV library users need to be provided not only with accurate and efficient ways to retrieve images and sounds, but also with new environments allowing them to read and interpret these images and sounds as AV documents. We show how library users perform an active reading of documents by contextualizing them using corpora of structured meta-information. This documentation consists of documents elaborated from previous readings of this AV content, such as producers' files, critics, etc. It provides a good alternate representation as defined in [34]. We propose a model allowing library users to read AV documents not only along their documentation but from their documentation. This model is based on concepts from the electronic publishing world: it defines different levels of editorial control over the semantics, the structure and the layout of documentation and, in the end, allows the automatic generation of hypermedia applications, which can be used as a new and efficient AV reading environment by library users. We also describe a prototype implementing parts of this model.
An Indexing, Browsing, Search and Retrieval System for Audiovisual Libraries, pp. 76-91
  Jane Hunter; Jan Newmarch
This paper describes an application which enables the computer-assisted generation of Dublin Core-based metadata descriptions and online digital visual summaries for videos. It is a Java application which integrates a video replay window with VCR-type controls and metadata input forms generated from a hierarchical RDF schema. The schema definition is also used to validate the descriptions input by the user and control the format of the output. The generated metadata descriptions can be saved as RDF, HTML or to a database. They can be used to enable metadata interchange, searching across the Internet or dynamic generation of detailed visual summaries for video browsing. This prototype system has been developed for the State Library of Queensland's (SLQ) Audiovisual unit to enable quick, easy, cost-effective generation of standardized metadata which can be used to create online detailed visual summaries of the latest video acquisitions.
Music Structure Analysis and Its Application to Theme Phrase Extraction, pp. 92-105
  Atsuhiro Takasu; Takashi Yanase; Teruhito Kanazawa; Jun Adachi
Music is an important component of digital libraries. This paper discusses a digital music library from the information retrieval viewpoint and proposes a method for extracting theme phrases. These are then used to present a shorter version of retrieved music to users. The method consists of two steps, phrase extraction and syntactical classification of segmented fragments of melodies. Phrase extraction is carried out based on a few heuristic rules. We conducted an experiment on the accuracy of phrase extraction using 94 Japanese popular songs and obtained 0.766 recall and 0.786 precision. The syntactical classification is based on a probabilistic syntactical pattern analysis combining classification and syntactical analysis. The proposed method uses a decision tree and a finite state automaton and obtained 0.884 accuracy in theme phrase extraction.

Information Retrieval

Effectiveness of Keyword-Based Display and Selection of Retrieval Results for Interactive Searches, pp. 106-125
  Ezio Berenci; Claudio Carpineto; Vittorio Giannini; Stefano Mizzaro
We present an approach to increasing the effectiveness of ranked-output retrieval systems that relies on graphical display and user manipulation of "views" of retrieval results, where a view is the subset of retrieved documents that contain a specified subset of query terms. This approach has been implemented in a system named VIEWER (VIEwing WEb Results), acting as an interface to available search engines. An experimental evaluation of the performance of VIEWER in contrast to AltaVista is the major focus of the paper. We first report the results of an experiment on single, short query searches where VIEWER, used as an interactive ranking system, markedly outperformed AltaVista. We then concentrate on a more realistic searching scenario, involving free query formulation, unconstrained selection of retrieval results, and the possibility of query reformulation. We report the results of an experiment where the use of VIEWER, compared to AltaVista, seemed to shift the user effort from inspection to evaluation of results, increasing retrieval effectiveness and user satisfaction. In particular, we found that the VIEWER users retrieved half as many non-relevant documents as the AltaVista users while retrieving a comparable number of relevant documents.
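The notion of a "view" defined in this abstract can be sketched in a few lines. The following is a minimal illustration only, not the VIEWER implementation: the function names `view` and `all_views`, the whitespace tokenisation, and the reading that a document belongs to a view when it contains every term of the chosen subset (and possibly other query terms as well) are all assumptions made here.

```python
from itertools import combinations


def view(docs, term_subset):
    """Return the ids of retrieved documents that contain every term in term_subset.

    docs maps a document id to its text (hypothetical toy representation).
    """
    wanted = {term.lower() for term in term_subset}
    return [
        doc_id
        for doc_id, text in docs.items()
        if wanted <= set(text.lower().split())
    ]


def all_views(docs, query_terms):
    """Build one view per non-empty subset of the query terms."""
    views = {}
    for size in range(1, len(query_terms) + 1):
        for subset in combinations(query_terms, size):
            views[subset] = view(docs, subset)
    return views
```

A query with n terms yields 2^n - 1 non-empty subsets, so a sketch like this is only practical for the short queries the abstract mentions.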
Towards More Effective Techniques for Automatic Query Expansion, pp. 126-141
  Claudio Carpineto; Giovanni Romano
Techniques for automatic query expansion from top retrieved documents have recently shown promise for improving retrieval effectiveness on large collections but there is still a lack of systematic evaluation and comparative studies. In this paper we focus on term-scoring methods based on the differences between the distribution of terms in (pseudo-)relevant documents and the distribution of terms in all documents, seen as a complement or an alternative to more conventional techniques. We show that when such distributional methods are used to select expansion terms within Rocchio's classical reweighting scheme, the overall performance is not likely to improve. However, we also show that when the same distributional methods are used to both select and weight expansion terms the retrieval effectiveness may considerably improve. We then argue, based on their variation in performance on individual queries, that the set of ranked terms suggested by individual distributional methods can be combined to further improve mean performance, by analogy with ensembling classifiers, and present experimental evidence supporting this view. Taken together, our experiments show that with automatic query expansion it is possible to achieve performance gains as high as 21.34% over non-expanded query (for non-interpolated average precision). We also discuss the effect that the main parameters involved in automatic query expansion, such as query difficulty, number of selected documents, and number of selected terms, have on retrieval effectiveness.
Predicting Indexer Performance in a Distributed Digital Library, pp. 142-166
  Naomi Dushay; James C. French; Carl Lagoze
Resource discovery in a distributed digital library poses many challenges, one of which is how to choose search engines for query distribution, given a query and a set of search engines. This paper focuses on search engine performance as a criterion for search engine selection and defines two measurements of search engine performance: availability -- will the search engine respond within a time limit, and response time -- how quickly will the search engine respond, given that it responds at all. We predicted both of these performance characteristics with a variety of algorithms, all of which required little computation time and combined past performance data for each search engine into a succinct record. We used operational data from the NCSTRL distributed digital library to make and evaluate predictions, and we found that simple prediction methods performed as well as more complex methods and that prediction accuracy was closely related to data consistency.

User Adaptation

Design Guidelines and User-Centred Digital Libraries, pp. 167-183
  Yin Leng Theng; Elke Duncker; Norliza Mohd-Nasir; George Buchanan; Harold W. Thimbleby
As current digital libraries are becoming more complex, the facilities provided by them will increase and the difficulty of learning associated with the complexity of using these facilities will also increase. In order to produce usable and useful interactive systems, designers need to ensure that good design features are incorporated into the systems, taking into consideration end-users' needs and cultural backgrounds. We carried out a study to investigate useful design features digital libraries should have. The study provides insights on the usability impact of digital libraries for task completion and end-users' perceived impressions on the effectiveness of the digital libraries. The results also suggest that there is little provision on the interface to cater to end-users' browsing and inter-cultural needs. Hence, this paper also discusses design guidelines for the design of user-centred digital libraries.
User Profile Modeling and Applications to Digital Libraries, pp. 184-197
  Giuseppe Amato; Umberto Straccia
The ultimate goal of an information provider is to satisfy the user's information needs, that is, to provide the user with the right information, at the right time, through the right means. A prerequisite for developing personalised services is to rely on user profiles representing users' information needs. In this paper we will first address the issue of presenting a general user profile model. Then, the general user profile model will be customised for digital library users.
Using and Evaluating User Directed Summaries to Improve Information Access, pp. 198-214
  Manuel J. Maña López; Manuel de Buenaga Rodríguez; José María Gómez Hidalgo
The amount of available textual information has grown so much that it has become necessary to study new techniques that assist users in information access (IA). In this paper, we propose utilizing a user directed summarization system in an IA setting to help users decide about document relevance. The summaries are generated using a sentence extraction method that scores sentences with heuristics employed successfully in previous works (keywords, title and location). User modeling is carried out by exploiting the user's query to an IA system and expanding query terms using WordNet. We present an objective and systematic evaluation method oriented to measuring summary effectiveness in two significant IA tasks: ad hoc retrieval and relevance feedback. The results obtained support our initial hypothesis, i.e., user adapted summaries are a useful tool for assisting users in an IA context.

Knowledge Sharing

Pharos, a Collaborative Infrastructure for Web Knowledge Sharing, pp. 215-233
  Vincent Bouthors; Olivier Dedieu
Finding relevant information is one of the biggest problems that Web users experience. This article describes Pharos, a new service that has been developed to help groups of Web users share their knowledge about interesting documents. Pharos relies on a collaborative infrastructure which allows user groups to index and evaluate documents on specific topics. This information, possibly subjective, is synthesized to produce personalized recommendations. Scalability is handled by distributing servers and replicating their databases. Pharos has been implemented in Java and is currently being evaluated.
Integrating Ontologies and Thesauri to Build RDF Schemas, pp. 234-253
  Bernd Amann; Irini Fundulaki
In this paper we present a new approach for building RDF schemas by integrating existing ontologies and structured vocabularies (thesauri). We will present a simple mechanism based on the specification of inclusion relationships between thesaurus terms and ontology concepts and show how these relationships can be exploited to create application-specific RDF schemas incorporating the structural views of ontologies and deep classification schemes provided by thesauri.
Dynamic Use of Digital Library Material -- Supporting Users with Typed Links in Open Hypermedia, pp. 254-273
  Klaus Marius Hansen; Christian Yndigegn; Kaj Grønbæk
This paper introduces a novel approach to supporting digital library users in organising and annotating material. We have extended the concept of open hypermedia by introducing typed links, which support: addition of (user-defined) semantics to hypertexts, user navigation, and machine supported analysis and synthesis of hypermedia structures. The Webvise open hypermedia system is integrated with the World Wide Web, and has been augmented with a type system. We illustrate the potential use in the context of digital libraries with a scenario of teachers jointly preparing a course based on digital library material.

Cross Language

Disambiguation Strategies for Cross-Language Information Retrieval, pp. 274-293
  Djoerd Hiemstra; Franciska de Jong
This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) that are developed within the Twenty-One project. The tools and methods are evaluated with the TREC CLIR task document collection using Dutch queries on the English document base. The main issue addressed here is an evaluation of two approaches to disambiguation. The underlying question is whether a lot of effort should be put into finding the correct translation for each query term before searching, or whether searching with more than one possible translation leads to better results. The experimental study suggests that the quality of search methods is more important than the quality of disambiguation methods. Good retrieval methods are able to disambiguate translated queries implicitly during searching.
Keywords: Cross-Language Information Retrieval; Statistical Machine Translation
Crosslingual Interrogation of Multilingual Catalogs, pp. 294-310
  Christian Fluhr; Dominique Schmit; C. Andrieux; Ph. Ortet; Frédérique Bisson; V. Combet
In this paper, we describe a crosslingual Information Retrieval System (IRS), which makes the interrogation of multilingual databases possible. Indeed, the CEA needs to be able to process a large number of multilingual databases and documents, and so we had to adapt the IRS we use, SPIRIT (which relies on linguistic and statistical processing), to this situation. We have thus set up a crosslingual interrogation system based on the indexation of documents containing parts written in different languages and on the bilingual reformulation of the query. The latter tries all the possible translations for every significant word of the query and the documents are used as filters in case of uncertainty or ambiguity. The answers to a query are given in the form of a list of classes of documents ranked according to their relevance. This paper describes the application of these techniques to crosslingual access to catalogs and bibliographic databases.
Term Similarity-Based Query Expansion for Cross-Language Information Retrieval, pp. 311-322
  Mirna Adriani; C. J. van Rijsbergen
We propose a query expansion technique which is based on a statistical similarity measure among terms to improve the effectiveness of the dictionary-based cross-language information retrieval (CLIR) method. We employ a term similarity-based sense disambiguation technique proposed in our earlier work to enhance the accuracy of the dictionary-based query translation method. The query expansion technique is then applied to the translation of queries to further improve their retrieval performance. We demonstrate the effectiveness of the two techniques combined using queries in three languages, namely, German, Spanish, and Indonesian, to retrieve English documents from a standard TREC (Text Retrieval Conference) collection. The results of our experiments indicate that the term similarity-based techniques work better when there are more phrases in the queries. In addition, our results also re-emphasize other researchers' finding that phrase recognition and translation are critical to CLIR's effectiveness.

Case Studies

The SOMLib Digital Library System, pp. 323-342
  Andreas Rauber; Dieter Merkl
Digital Libraries have gained tremendous interest with several research projects addressing the wealth of challenges in this field. While computational intelligence systems are being used for specific tasks in this arena, the majority of projects rely on conventional techniques for the basic structure of the library itself. With the SOMLib project we created a digital library system that uses a neural network-based core for the representation of the library. The self-organizing map, a popular unsupervised neural network model, is used to topically structure a document collection similar to the organization of real-world libraries. Based on this core, additional modules provide information retrieval features, integrate distributed libraries, and automatically label the various topical sections in the document collection. A metaphor graphics based interface further assists the user in intuitively understanding the library, providing an instant overview.
Keywords: Self-Organizing Map (SOM); Document Clustering; Learning; Distributed Digital Libraries; Dublin Core Metadata; Metaphor Graphics; Visualization
Developing a European Technical Reference Digital Library, pp. 343-362
  Antonella Andreoni; Maria Bruna Baldacci; Stefania Biagioni; Carlo Carlesi; Donatella Castelli; Pasquale Pagano; Carol Peters
The development of a European digital library for grey literature is described. The aim has been to provide a digital library for scientists working in the areas of information science and applied mathematics and also to build a test-bed for research activities. The service has been implemented as part of NCSTRL (the US Networked Computer Science Technical Reference Library) and developed, extending the Dienst system used by NCSTRL, to meet the requirements of the European scientific community. The additional functionality is described and the difficulties encountered when trying to extend an existing architecture, protocol and system are discussed.
Issues in the Development and Operation of a Digital Library, pp. 363-382
  Sarantos Kapidakis
This paper briefly describes both organizational and technical issues and approaches involved in creating an operational digital library at the University of Crete, found at http://dlib.libh.uoc.gr. We investigate and describe our approaches and experiences, over the last few years, in putting into operation a Digital Library with many collections. We had to analyze the library goals and user needs, to select appropriate software, to make a flexible design for the additional functionality needed, to adapt and extend the selected software to make it applicable to the current demands, to install and configure the software, to improve it using feedback, and to interact with document authors and librarians to make the digital library friendly, usable and easily maintainable, and even to collect and digitize the library material. The final system is operated by current library personnel.
   The main technical issues are related to the design, implementation and application of features of digital libraries, such as multilingual storage and interface, generalization of the software to permit searching on heterogeneous collections, adding support for the Z39.50 protocol and tools that simplify the configuration, administration and data insertion to the digital library, as well as tools to input or modify the metadata and to upload data, when submitting new documents in the digital library.

Modelling, Accessibility and Connectedness

Declarative Specification of Z39.50 Wrappers Using Description Logics, pp. 383-402
  Yannis Velegrakis; Vassilis Christophides; Panos Constantopoulos
Z39.50 is a client/server protocol widely used in digital libraries and museums for searching and retrieving information spread over a number of heterogeneous sources. To overcome semantic and schematic discrepancies among the various data sources, the protocol relies on a world view of information as a flat list of fields, called Access Points (AP). One of the major issues for building Z39.50 wrappers is to map this unstructured list of APs to the underlying source data. Unfortunately, existing Z39.50 wrappers have been developed from scratch and they do not provide high-level mapping languages with verifiable properties. In this paper, we propose a Description Logic based toolkit for the declarative specification of Z39.50 wrappers. We claim that the conceptualization of AP mappings enables a formal validation of the query translation quality and therefore ensures the quality of the retrieved data. Finally, it allows us to tackle a number of Z39.50 pending issues (e.g., metadata retrieval, query failures due to unsupported APs, etc.) by enriching the generated Z39.50 wrappers with a number of added-value services such as conceptual structuring of flat Z39.50 vocabularies and intelligent Z39.50 query assists.
PIA -- A Generic Model and System for Interactive Product and Service Catalogs, pp. 403-422
  Florian Matthes; Ulrike Steffens
This text motivates and defines a generic model for interactive (online or offline) product catalogs. Based on a detailed requirements analysis, the data model is defined using an object-oriented design notation and the query language for expressing customer interests on the catalog is defined using techniques from fuzzy set theory. The model provides the basis for the implementation of a generic, highly-interactive catalog management system which is designed to be interfaced with relational databases, information-retrieval engines and special-purpose index structures.
Representing Scholarly Claims in Internet Digital Libraries: A Knowledge Modelling Approach, pp. 423-442
  Simon Buckingham Shum; Enrico Motta; John Domingue
This paper is concerned with tracking and interpreting scholarly documents in distributed research communities. We argue that current approaches to document description, and current technological infrastructures particularly over the World Wide Web, provide poor support for these tasks. We describe the design of a digital library server which will enable authors to submit a summary of the contributions they claim their documents make, and their relations to the literature. We describe a knowledge-based Web environment to support the emergence of such a community-constructed semantic hypertext, and the services it could provide to assist the interpretation of an idea or document in the context of its literature. The discussion considers in detail how the approach addresses usability issues associated with knowledge structuring environments.
The Small World Web, pp. 443-452
  Lada A. Adamic
I show that the World Wide Web is a small world, in the sense that sites are highly clustered yet the path length between them is small. I also demonstrate the advantages of a search engine which makes use of the fact that pages corresponding to a particular search query can form small world networks. In a further application, the search engine uses the small-worldness of its search results to measure the connectedness between communities on the Web.
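The two quantities behind the "small world" claim, high clustering and short path length, can be computed on a toy graph as follows. This is a minimal sketch, not the method used in the paper: it assumes an undirected, connected graph stored as a dictionary of neighbour sets, and the function names are hypothetical. Nodes with fewer than two neighbours are counted as having a clustering value of 0.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(graph):
    """Mean, over all nodes, of the fraction of neighbour pairs that are linked."""
    total = 0.0
    for node, neighbours in graph.items():
        if len(neighbours) < 2:
            continue  # contributes 0 by the convention assumed here
        pairs = list(combinations(neighbours, 2))
        linked = sum(1 for a, b in pairs if b in graph[a])
        total += linked / len(pairs)
    return total / len(graph) if graph else 0.0


def average_path_length(graph):
    """Mean shortest-path length over reachable node pairs, via breadth-first search."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

A graph is usually called a small world when its clustering is much higher than that of a random graph with the same number of nodes and edges while its average path length stays comparably short.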
SODA: Smart Objects, Dumb Archives, pp. 453-464
  Michael L. Nelson; Kurt Maly; Mohammad Zubair; Stewart N. T. Shen
We present the Smart Object, Dumb Archive (SODA) model for digital libraries (DLs). The SODA model transfers functionality traditionally associated with archives to the archived objects themselves. We are exploiting this shift of responsibility to facilitate other DL goals, such as interoperability, object intelligence and mobility, and heterogeneity. Objects in a SODA DL negotiate presentation of content and handle their own terms and conditions. In this paper we present implementations of our smart objects, buckets, and our dumb archive (DA). We discuss the status of buckets and DA and how they are used in a variety of DL projects.