
ECDL'97: Proceedings of the European Conference on Digital Libraries

Fullname:ECDL'97: Research and Advanced Technology for Digital Libraries: First European Conference
Editors:Carol Peters; Costantino Thanos
Location:Pisa, Italy
Dates:1997-Sep-01 to 1997-Sep-03
Publisher:Springer Berlin Heidelberg
Series:Lecture Notes in Computer Science 1324
Standard No:DOI: 10.1007/BFb0026717; ISBN: 978-3-540-63554-3 (print), 978-3-540-69597-4 (online); hcibib: ECDL97
  1. Invited Talk
  2. Supporting User Interaction
  3. Metadata
  4. Information Retrieval I
  5. Architectures
  6. Multilingual Information Retrieval
  7. Structured Documents
  8. Information Retrieval II
  9. Case Studies

Invited Talk

Libraries and Digital Property Rights (pp. 1-10)
  Mark Stefik; Giuliana Lavendel
The realization of the digital library -- a computer system to enable anyone with a workstation to have access to any of the published works of mankind -- has stayed out of reach because of a presumed technical problem. Once a written work is digitized, it becomes so easy to make and distribute copyright infringing copies that publishers would go out of business. A technical solution to this problem based on trusted systems and digital property rights is now becoming available. The big issues for libraries -- social and institutional policy challenges -- are still ahead.
Object Database Support for Digital Libraries (pp. 11-23)
  Serge Abiteboul
In this paper, we discuss some aspects of database support for digital libraries.
   From a DL perspective, database systems, and in particular, object database systems provide a nice basis for future DL systems. More generally, database research provides solutions to many DL issues even if these are partial or fragmented. When possible, work should not be duplicated and good software and ideas should be reused. From a DB perspective, we want to stress that digital libraries propose beautiful applications and challenges to DBMS technology. They suggest a number of improvements to DBMSs that could be beneficial beyond DL applications.

Supporting User Interaction

Enhancing Community and Collaboration in the Virtual Library (pp. 25-40)
  Rob Procter; Andy McKinlay; Ana Goldenberg; Elisabeth Davenport; Peter Burnhill; Sheila Cannell
The advent of the virtual library is usually presented as a welcome development for library users. Unfortunately, the emphasis which is often placed upon convenience of access tends to reinforce the perception of the use of information resources as a solitary activity. In fact, information retrieval (IR) in the conventional library is often a highly collaborative activity, involving users' peers and experts such as librarians. Failure in the design of virtual library services to take into account the ways in which physical spaces help engender a sense of community and facilitate collaboration will result in its users being denied timely and effective access to valuable sources of assistance.
   We report an investigation of collaboration issues in IR. We begin by defining a generic model of collaboration, and of collaborative spaces. Finally, we describe the design of a prototype multimedia-based system intended to facilitate a sense of community and collaboration between its users.
Keywords: information retrieval; collaboration; virtual library
Comprehension and Object Recognition Capabilities for Presentations of Simultaneous Video Key Frame Surrogates (pp. 41-54)
  Laura A. Slaughter; Ben Shneiderman; Gary Marchionini
The demand for more efficient browsing of video data is expected to increase as greater access to this type of data becomes available. This experiment looked at one technique for displaying video data using key frame surrogates that are presented as a "slide show". Subjects viewed key frames for between one and four video clips simultaneously. Following this presentation, the subjects performed object recognition and gist comprehension tasks in order to determine human thresholds for divided attention between these multiple displays. It was our belief that subject performance would degrade as the number of slide shows shown simultaneously increased. For object recognition and gist comprehension tasks, a decrease in performance between the one slide show display and the two, three or four slide show displays was found. In the case of two or three video presentations, performance is about the same, and object recognition abilities and comprehension of the video clips remain adequate. Performance drops off to unacceptable levels when four slide shows are displayed at once.

Metadata

Automating the Construction of Authority Files in Digital Libraries: A Case Study (pp. 55-71)
  James C. French; Allison L. Powell; Eric Schulman; John L. Pfaltz
The issue of quality control has become increasingly important as more online databases are integrated into digital libraries. This can have a dramatic effect on the search effectiveness of an online system. Authority work, the need to discover and reconcile variant forms of strings in bibliographic entries, will become more difficult. Spelling variants, misspellings, translation and transliteration differences all increase the difficulty of retrieving information. This paper is a case study of our efforts to automate the creation of an authority file for authors' institutional affiliations in the Astrophysics Data System. The techniques surveyed here for the detection and categorization of variant forms have broader applicability and may be used to help automate authority work for other bibliographic fields.
Using Semantic, Geographical, and Temporal Relationships to Enhance Search and Retrieval in Digital Catalogs (pp. 73-86)
  Klaus Tochtermann; Wolf-Fritz Riekert; Gerlinde Wiest; Jürgen Seggelke; Birgit Mohaupt-Jahr
The amount and quality of information available on the Internet increases steadily. To search for information, users are provided with search engines which often return unsatisfactory search results. Against this background, digital catalog systems are becoming more and more popular. Unlike earlier search engines, they contain information about information (meta-information) available on the Internet or in the holdings of digital libraries but not the information itself. Users can benefit from these systems in two ways depending on what information is modeled in them. Firstly, these systems allow for new types of queries; secondly, the quality of retrieval results is improved. This paper sets out how semantic, geographical, and temporal relationships can be integrated into digital catalog systems and how these relationships can be used to enhance search and retrieval processes in such systems. The presentation covers both concepts and a comprehensive description of a digital catalog system which is already used by environmental agencies.
Keywords: Digital Catalog Systems; Semantic; Geographical; Temporal Relationships; German Environmental Information Network
Metadata Repositories using PICS (pp. 87-98)
  Renato Iannella
Metadata is 'information about data'. That is, metadata describes some aspect of data on the Internet. There has been significant activity recently on defining the semantic and technical aspects of metadata for use on the Internet and the WWW. A number of metadata sets have been proposed together with the technological framework to support the interchange of metadata. These initiatives will have a dramatic effect on how the Web is indexed and will improve the discovery of resources on the Internet by a significant factor. This paper discusses the issue of the provision of a mechanism for a registry of metadata schemas. A proposal using an enhanced version of PICS is presented. This will enable global interoperability across various extensible metadata sets.

Information Retrieval I

Relevance Feedback and Query Expansion for Searching the Web: A Model for Searching a Digital Library (pp. 99-112)
  Alan F. Smeaton; Francis Crimmins
A fully operational large scale digital library is likely to be based on a distributed architecture, and because of this it is likely that a number of independent search engines will be used to index different overlapping portions of the entire contents of the library. In any case, different media (text, audio, image, etc.) will be indexed for retrieval by different search engines, so techniques which provide a coherent and unified search over a suite of underlying independent search engines are likely to be an important part of navigating in a digital library. In this paper we present an architecture and a system for searching the world's largest DL, the world wide web. What makes our system novel is that we use a suite of underlying web search engines to do the bulk of the work while our system orchestrates them in a parallel fashion to provide a higher level of information retrieval functionality. Thus it is our meta search engine, and not the underlying direct search engines, that provides the relevance feedback and query expansion options for the user. The paper presents the design and architecture of the system which has been implemented, describes an initial version which has been operational for almost a year, and outlines the operation of the advanced version.
Text Segmentation by Topic (pp. 113-125)
  Jay M. Ponte; W. Bruce Croft
We investigate the problem of text segmentation by topic. Applications for this task include topic tracking of broadcast speech data and topic identification in full-text databases. Researchers have tackled similar problems before but with different goals. This study focuses on data with relatively small segment sizes and for which within-segment sentences have relatively few words in common making the problem challenging. We present a method for segmentation which makes use of a query expansion technique to find common features for the topic segments. Experiments with the technique show that it can be effective.
Scalable Text Retrieval for Large Digital Libraries (pp. 127-145)
  David Hawking
It is argued that digital libraries of the future will contain terabyte-scale collections of digital text and that full-text searching techniques will be required to operate over collections of this magnitude. Algorithms expected to be capable of scaling to these data sizes using clusters of modern workstations are described. First, basic indexing and retrieval algorithms operating at performance levels comparable to other leading systems over gigabytes of text on a single workstation are presented. Next, simple mechanisms for extending query processing capacity to much greater collection sizes are presented, to tens of gigabytes for single workstations and to terabytes for clusters of such workstations. Query-processing efficiency on a single workstation is shown to deteriorate dramatically when data size is increased above a certain multiple of physical memory size. By contrast, the number of clustered workstations necessary to maintain a constant level of service increases linearly with increasing data size. Experiments using clusters of up to 16 workstations are reported. A non-replicated 20 gigabyte collection was indexed in just over 5 hours using a ten workstation cluster and scalability results are presented for query processing over replicated collections of up to 102 gigabytes.

Architectures

Awareness Services for Digital Libraries (pp. 147-171)
  Arturo Crespo; Hector Garcia-Molina
We propose an architecture for Digital Library repositories where one or more data stores persistently hold the digital objects (e.g., documents), and interact with clients that perform indexing, replication, intellectual property management, revenue management, and other functions. One of the most critical components in such stores is the awareness mechanism, used to notify clients of inserted, deleted or changed objects. In this paper we survey the various awareness schemes (including snapshot, timestamp and log based), describing them all as variations of a single unified scheme. This makes it possible to understand their relative differences and strengths. In particular we focus on a signature-based awareness scheme that we believe is especially well suited for digital libraries, and show enhancements to improve its performance.
Towards a Common Infrastructure for Large-scale Distributed Applications (pp. 173-193)
  Christos Nikolaou; Manolis Marazakis; Dimitris Papadakis; Yiorgos Yeorgiannakis; Jakka Sairamesh
This paper discusses the requirements of current and emerging large-scale distributed applications and emphasizes the need for a common infrastructure to support them. A design for an infrastructure that aims at satisfying these requirements is presented. Moreover, it is shown how key aspects of important large-scale applications can exploit the services included in the proposed infrastructure. The paper concludes by discussing the current status of a prototype implementation and our research plan.
Machine Learning + On-line Libraries = IDL (pp. 195-214)
  Giovanni Semeraro; Floriana Esposito; Donato Malerba; Nicola Fanizzi; Stefano Ferilli
One of the current issues faced by information professionals is that of building digital libraries. In this context, two key points are represented by information capture, which involves complex pattern recognition problems, and integration of different DBMS technologies, in order to connect many libraries to form a unique virtual library. This paper presents IDL, a prototypical intelligent digital library service. IDL addresses both the problems mentioned above and proposes a solution for them: The former, by integrating learning tools and techniques in order to make effective, efficient and economically feasible the task of capturing the information that should be stored and indexed by content in a digital library; the latter, by defining a metaquery language which answers for the interoperability of the various digital libraries to be connected.

Multilingual Information Retrieval

Building a Multi-lingual Electronic Text Collection of Folk Tales as a Set of Encapsulated Document Objects: An Approach for Casual Users to Browse Multi-lingual Documents on the Fly (pp. 215-231)
  Myriam Dartois; Akira Maeda; Takehisa Fujita; Tetsuo Sakaguchi; Shigeo Sugimoto; Koichi Tabata
Folk tales are an important heritage of every nation. Electronic text collections of folk tales are meaningful information resources for people who wish to learn about foreign cultures and their languages. This paper describes an electronic text collection of old folk tales which was developed using a multilingual document browsing system called the MHTML browser system, a gateway service to help clients access and display WWW documents written in foreign or multiple languages that the client browser cannot display by itself. The MHTML browser system converts a WWW document into a form which contains the source text and the minimum set of font glyphs required to display the text. The converted document object is sent to the client with a set of applets which display the document on the client browser. Since the glyphs are sent to the client from the MHTML gateway, the client does not need to have installed the fonts for the multilingual document, provided that the client is Java-enabled. The folk tale collection currently includes ten old Japanese folk tales. Each tale is written in English, French, and Japanese, and the user can show the three texts of a tale simultaneously on his/her WWW browser, e.g., Netscape Navigator and Internet Explorer. Thus, a consumer user utilizing an off-the-shelf WWW browser can get a multilingual document on the fly without any additional procedures to set up his/her environment. In this paper, we first discuss the technological background of MHTML and the multilingual browser service for the digital library, as well as the issues involved in building the folk tale collection.
Automated Indexing with Thesaurus Descriptors: A Co-occurence Based Approach to Multilingual Retrieval (pp. 233-252)
  Reginald Ferber
Indexing documents with descriptors from a multilingual thesaurus is an approach to multilingual Information Retrieval. However, manual indexing is expensive. Automated indexing methods in general use terms found in the document. Thesaurus descriptors are complex terms that are often not used in documents or have specific meanings within the thesaurus; therefore most weighting schemes of automated indexing methods are not suited to select thesaurus descriptors.
   In this paper a linear associative system is described that uses similarity values extracted from a large corpus of manually indexed documents to construct a rank ordering of the descriptors for a given document title. The system is adaptive and has to be tuned with a training sample of records for the specific task.
   The system was tested on a corpus of some 80,000 bibliographic records. The results show a high variability with changing parameter values. This indicates that it is very important to empirically adapt the model to the specific situation it is used in. The overall median of the manually assigned descriptors in the automatically generated ranked list of all 3,631 descriptors is 14 for the set used to adapt the system and 11 for a test set not used in the optimization process. This result shows that the optimization is not a fitting to a specific training set but a real adaptation of the model to the setting.
Cross-Language Information Retrieval in a Multilingual Legal Domain (pp. 253-268)
  Paraic Sheridan; Martin Braschler; Peter Schäuble
We describe here the application of a cross-language information retrieval technique based on similarity thesauri in the domain of Swiss law. We present the theory of similarity thesauri, which are information structures derived from corpora, and show how they can be used for cross-language retrieval. We also discuss the collections of Swiss legal documents and show how we have used them to construct an environment in which we can directly evaluate the performance of our cross-language retrieval system. Evaluation shows that cross-language retrieval works as well as monolingual retrieval in the best case. We conclude that providing cross-language access to digital libraries is already a viable possibility.

Structured Documents

The Digital Library and Computational Philology: The BAMBI Project (pp. 269-285)
  Andrea Bozzi; Sylvie Calabretto
The work presented in this paper has been developed within a European project called BAMBI. It enhances the accessibility of ancient manuscripts and presents new ways of working with them. More precisely, the BAMBI project aims to produce a software tool allowing historians, and more particularly codicologists and philologists, to read manuscripts, write annotations, and navigate between the words of the transcription and the matching piece of image in the digitized picture of the manuscript.
   In the first part of this paper, we present the functions and the design of a Hypermedia Workstation. In the second part, we describe how HyTime (Hypermedia/Time-based Structured Language) can be used as a modelling language to describe work on manuscripts (annotations, links, ...). We present relevant parts of the HyTime model and prove that the model thus obtained can also serve as a basis for implementation.
Keywords: Ancient Manuscripts; Digital Library; Hypermedia; HyTime; Philological Workstation
Multivalent Annotations (pp. 287-303)
  Thomas A. Phelps; Robert Wilensky
Paper is still preferred to digital document systems for tasks involving annotating, folding, juxtaposing, or otherwise treating the document as a tactile object. Based on the Multivalent Documents model, Multivalent annotations bring to digital documents of potentially any source format, from PostScript to SGML, an open ended variety of user-extensible, sharable manipulations. Several very different forms of distributed annotation based on this model have been implemented. The Multivalent framework composes together annotations of any type, which can result in novel, useful combinations.
A Semantic Network Approach to Semi-Structured Documents Repositories (pp. 305-324)
  Vassilis Christophides; Martin Doerr; Irini Fundulaki
Using database technology for the administration of digital libraries offers many advantages in a multi-user and distributed environment. However, conventional DBMS are not particularly suited to manage semi-structured data with heterogeneous, irregular, evolving structures as in the case of SGML documents found in digital libraries. To overcome the difficulties imposed by the rigid schema of conventional systems, several schema-less approaches have been proposed. Using instead unconstrained, extensible schemata offered by object-oriented semantic network systems, we are able both to map document specific structures as database classes, and to model the associated constraint information as integrated schema annotations. In this paper we present the benefits of this approach to create, access and process heterogeneous SGML documents, and in particular to exploit the shared semantics of evolving SGML structures. A corresponding application is currently being implemented in the context of the AQUARELLE project.

Information Retrieval II

Modelling the Retrieval of Structured Documents Containing Texts and Images (pp. 325-344)
  Carlo Meghini; Fabrizio Sebastiani; Umberto Straccia
We present a model for complex documents possibly consisting of a hierarchically structured set of images or texts. Documents are represented at the form level (as sets of physical features of the representing objects), at the content level (as sets of properties of the represented entities), and at the structure level. A uniform and powerful query language allows queries to be issued that transparently combine features pertaining to form, content and structure alike. Queries are expressions of a (fuzzy) logical language. While that part of the query that pertains to (medium-independent) content is "directly" processed by an inferential engine, that part that pertains to (medium-dependent) form is entrusted to specialised document processing procedures linked to the logical language by a procedural attachment mechanism. The model thus combines the power of state-of-the-art document processing techniques with the advantages of a clean, logically defined framework for understanding multimedia document retrieval.
Probabilistic Retrieval of OCR Degraded Text Using N-Grams (pp. 345-359)
  Stephen M. Harding; W. Bruce Croft; C. Weir
The retrieval of OCR degraded text using n-gram formulations within a probabilistic retrieval system is examined in this paper. Direct retrieval of documents using n-gram databases of 2 and 3-grams or 2, 3, 4 and 5-grams resulted in improved retrieval performance over standard (word based) queries on the same data when a level of 10 percent degradation or worse was achieved. A second method of using n-grams to identify appropriate matching and near matching terms for query expansion which also performed better than using standard queries is also described. This method was less effective than direct n-gram query formulations but can likely be improved with alternative query component weighting schemes and measures of term similarity. Finally, a web based retrieval application using n-gram retrieval of OCR text and display, with query term highlighting, of the source document image is described.

Case Studies

Deposit for Dutch Electronic Publications: Research and Practice in The Netherlands (pp. 361-373)
  Trudi C. Noordermeer
The objective of this article is to describe the state-of-affairs of the Deposit for Dutch Electronic Publications which is organized by the Koninklijke Bibliotheek, the National Library of The Netherlands. It describes in general the results and actual status of the research which is carried out in the period April 1996 - December 1997. Research topics are e.g. selection, acquisition, bibliographical and technical description, unique identification, migration, storage, authenticity and the experience with a limited number of test records to define the workflow. Further, tests with publishers like Elsevier Science and Kluwer Academic Publishers are described. The objective of the Deposit for Dutch Electronic Publications is to preserve electronic off-line and on-line documents, publications, for the remote future, as a last resort.
Charging for a Digital Library -- The Business Model and the Cost Models of the MeDoc Digital Library (pp. 375-385)
  Michael Breu; Ricarda Weber
MeDoc is a German digital library project bringing together 7 developing institutions and 24 pilot user institutions as well as 12 international publishing houses. MeDoc provides uniform access to a variety of information sources and an information broker service. MeDoc offers a range of billable digital books and journals contributed by the participating publishing houses.
   Operating a digital library has not only many technical but also important economic aspects. The contents of a digital library can be regarded as information merchandise just like paper books or journals bought in a book store. In order to encourage publishing houses to contribute their books and journals to digital libraries, suitable business models must be defined. Innovative cost models, like floating licenses or fine grained pay per view, become both necessary and feasible in network based digital libraries.
   This paper introduces the MeDoc business model and discusses various cost models and their applicability to the services of a digital library. It gives an overview of the MeDoc license and pricing policy and the applied cost models.
Bibliothèque Nationale de France's Audiovisual System: Digital Audio, Video, and Photo Consultation in a Library (pp. 387-403)
  Sylvie Mony
Digital audio, video and photo consultation has been an operational service, much appreciated by users, at the Bibliothèque nationale de France since 20 December 1996, the date of its opening to the general public.
   In December, in the Audiovisual Room of the General Public level, readers could consult digitized audiovisual materials on 45 audiovisual workstations, including 120 hours of video (documentaries), 250 hours of audio (interviews and music), and 50,000 photos. These digitized collections can grow up to the full capacity of the servers: 300 hours of video, 500 hours of audio, and 300,000 photos.
   In this communication, we describe the reasons why the Bibliothèque nationale de France chose a digital system to communicate a part of its audiovisual collections, the stages of setting up this audiovisual system, and the first lessons gleaned after 6 months in service.
The Electronic Colloquium on Computational Complexity (ECCC): A Digital Library in Use (pp. 405-421)
  Jochen Bern; Carsten Damm; Christoph Meinel
The Electronic Colloquium on Computational Complexity (ECCC) is a digital library that specifically addresses the current problem of scientific publishing, more precisely, the problem of presenting suitably filtered work to other researchers, for the field of computational complexity. Developing the detailed concepts in discussions with a scientific board of researchers in this field, ECCC now fills the gap between author controlled electronic publication (preprint servers, very fast but lacking content filtering) and conventional journal or conference proceedings publication (currently taking months, if not over a year, from submission to publication). Additionally, like a real colloquium, ECCC supports ongoing discussions through the publication of comments on already published material. Further, authors have the possibility to present improved versions of their publications while maintaining bibliographic consistency by version control.
   In this paper, we will first describe the situation ECCC is meant to remedy (Sections 1 and 2) and then detail the setup with respect to organization (3.1), basic functionality (3.2 through 3.4), cooperation with other services (3.5) and plans for the future (3.6).