ECDL 2008: Proceedings of the European Conference on Digital Libraries

Fullname: ECDL 2008: Research and Advanced Technology for Digital Libraries: 12th European Conference
Editors: Birte Christensen-Dalsgaard; Donatella Castelli; Bolette Ammitzbøll Jurik; Joan Lippincott
Location: Aarhus, Denmark
Dates: 2008-Sep-14 to 2008-Sep-19
Publisher: Springer Berlin Heidelberg
Series: Lecture Notes in Computer Science 5173
Standard No: DOI: 10.1007/978-3-540-87599-4; ISBN: 978-3-540-87598-7 (print), 978-3-540-87599-4 (online); hcibib: ECDL08
Best Paper

Improving Placeholders in Digital Documents 1-12
  George Buchanan; Jennifer Pearson
Placeholders in physical documents provide critical support for the human reader in relocating material and their place in the text. However, the equivalent tools in digital documents have long been identified as suffering from unintuitive interactions and low rates of use. This paper evaluates the current bookmarking technologies found in digital document readers, and identifies a number of specific and significant shortcomings in their support for user activity. We introduce some simple interactions that close the gap between user requirements and the placeholder support in a simple document reader program. Through this, we demonstrate that improved interactions can be created that reduce the barriers that inhibit placeholder use in digital documents.
Keywords: Digital Libraries; Interaction Design; Document Triage

Best Student Paper

Towards Ontology-Based Chinese E-Government Digital Archives Knowledge Management 13-24
  Ying Jiang; Hui Dong
This paper focuses on the problem of e-Government digital archives management in China. It first describes the background of e-Government progress in China, and then points out the knowledge utilization challenge of e-Government digital archives. It then introduces a project that aims to make the digital archives in a provincial archives bureau easy for civil servants to use. The main approach of this project is ontology-related technology, including the building of a knowledge base and the realization of a knowledge retrieval system. It is, in effect, a knowledge management solution for digital archives.
Keywords: Digital Archives; Chinese E-Government; Ontology; Ontology Molecule; Knowledge Management

Digital Preservation

Distributed Preservation Services: Integrating Planning and Actions 25-36
  Christoph Becker; Miguel Ferreira; Michael Kraxner; Andreas Rauber; Ana Alice Baptista; José Carlos Ramalho
Digital preservation has turned into an active field of research. The most prominent approaches today are migration and emulation; especially considering migration, a range of working tools is available, each with specific strengths and weaknesses. The decision process on which actions to take to preserve a given set of digital objects for future access, i.e., preservation planning, is usually an ad-hoc procedure with little tool support and even less support for automation.
   This paper presents the integration of tools and services for object migration and characterization through a service oriented architecture into a planning tool called Plato, thus creating a distributed and highly automated preservation planning environment.
Archive Design Based on Planets Inspired Logical Object Model 37-40
  Eld Zierau; Anders Sewerin Johansen
This paper describes a proposal for a logical data model based on preliminary work within the Planets project. In OAIS terms the main areas discussed are related to the introduction of a logical data model for representing the past, present and future versions of the digital object associated with the Archival Storage Package for the publications deposited by our client repositories.
Significant Characteristics to Abstract Content: Long Term Preservation of Information 41-49
  Manfred Thaller; Volker Heydegger; Jan Schnasse; Sebastian Beyl; Elona Chudobkaite
The (automatic) extraction of significant characteristics of files is an important feature of all long term preservation activities. We propose, however, that the necessary automatic evaluation of the outcomes of certain preservation actions -- notably migration -- requires an approach that follows other traditions in the abstraction of format descriptions. To implement a strategy for the automatic evaluation of various actions within a preservation environment, we define two formal, XML-based languages: one for defining the content of a specific file, the other for describing a file format in such a way that it can be handled by multi-purpose software.
Keywords: File characteristics; format definition languages; data abstraction; long term preservation

Social Tagging

Can Social Tags Help You Find What You Want? 50-61
  Khasfariyati Razikin; Dion Hoe-Lian Goh; Alton Yeow-Kuan Chua; Chei Sian Lee
One of the uses of social tagging is to associate freely selected terms (tags) with resources so that they can be shared among tag consumers. This enables tag consumers to locate new resources through the collective intelligence of other tag creators, and offers a new avenue for resource discovery. This paper investigates the effectiveness of tags as resource descriptors, determined through text categorisation using Support Vector Machines. Two text categorisation experiments were carried out for this research, using tags and web pages from del.icio.us. The first study used only terms as its features. The second study used both terms and tags as its feature set. The results indicate that the tags were not always reliable indicators of the resource contents. At the same time, the results from the terms-only experiment were better than those from the experiment with terms and tags. A deeper analysis of a sample of tags and documents was also conducted, and implications of this research are discussed.
Keywords: Social tagging; Resource Descriptors; Resource Discovery; Support Vector Machines
TagNSearch: Searching and Navigating Geo-referenced Collections of Photographs 62-73
  Quang Minh Nguyen; Thi Nhu Quynh Kim; Dion Hoe-Lian Goh; Yin Leng Theng; Ee-Peng Lim; Aixin Sun; Chew-Hung Chang; Kalyani Chatterjea
TagNSearch is a map-based tool for searching and browsing geo-tagged photographs based on their associated tags. Using Flickr as the dataset, TagNSearch returns, for a given query, photographs clustered by locations, and summarizes each cluster of photographs by cluster-specific tags. A map-based interface is also provided to help users better search, navigate and browse photographs and their clusters. A qualitative evaluation comparing TagNSearch and an existing tag search support in Flickr was also conducted. The task involved finding locations associated with a set of photographs. Participants were found to perform this task better using TagNSearch than Flickr.
Keywords: Social tagging; TagNSearch; clustering; Flickr; geo-tagged photographs
Evaluation of Semantic and Social Technologies for Digital Libraries 74-77
  Sebastian Ryszard Kruk; Ewelina Kruk; Katarzyna Stankiewicz
Libraries are the tools we use to learn and to answer our questions. The quality of our work depends, among other things, on the quality of the tools we use. Semantic web and social networking technologies have recently been introduced to the digital libraries domain. In this article we present the results of an evaluation of social and semantic end-user information discovery services for digital libraries.

Quotations and Annotations

Identifying Quotations in Reference Works and Primary Materials 78-87
  Andrea Ernst-Gerlach; Gregory Crane
Identifying quotations from reference works in primary materials is a very important feature for digital libraries. By adding corresponding citation links to the original text, we can help contextualize the source material. In this paper we introduce an algorithm for identifying citations automatically based on an analysis of the structure of quotations from three different reference works of Latin texts. An evaluation shows that this approach is capable of finding a large number of quotations with which no machine actionable citations are associated. Additionally this approach can be applied for quotations that have been altered in a range of ways from their source.
Keywords: citations; reference works
Superimposed Information Architecture for Digital Libraries 88-99
  David W. Archer; Lois M. L. Delcambre; Fabio Corubolo; Lillian N. Cassel; Susan Price; Uma Murthy; David Maier; Edward A. Fox; Sudarshan Murthy; John McCall; Kiran Kuchibhotla; Rahul Suryavanshi
A variety of software tools commonly used in research and industry allow a user to select (usually contiguous) segments of content to be annotated, referenced, or otherwise distinguished from a containing document. However, digital libraries (DLs) often curate only full documents, not these selected sub-documents. Thus, sub-documents in a DL may not have the full complement of metadata, and they may not be visible using DL browse and search facilities. We are interested in explicit representation of sub-documents in a DL environment. In this paper, we show how sub-documents may be represented and curated. We focus on the explicit representation of what we call a mark -- an encapsulated address of a sub-document along with associated context. Our contributions are: a software architecture for representing marks as first-class objects together with regular documents in a DL; and an implementation of our architecture using existing software packages with modest enhancements. This approach provides new capabilities for the DL with minimal modification to tools and interfaces familiar to the DL user.

User Studies and System Evaluation

Impact-ED -- A New Model of Digital Library Impact Evaluation 100-105
  Gemma Madle; Patty Kostkova; Abdul V. Roudsari
This paper presents Impact-ED, a new model for digital library impact evaluation. The model draws on assumptions from the Theory of Planned Behaviour and the Sense-Making Model. The paper discusses the current shortfalls of digital library impact evaluation and presents an alternative. Knowledge and attitude are put forward as potential measures of impact and different methods are triangulated and data linked to provide a comprehensive picture of the impact of the library at the time of use. The model shows how the digital library is being used to benefit users in their work, how it is changing their knowledge and attitudes and how the information found is used in real-time in the real world. It is being tested in the healthcare domain on the National Resource for Infection Control (www.nric.org.uk) but is expected to be transferable to other domains as further work will prove.
Keywords: Digital Library Evaluation; Sense-making; Knowledge and Attitudes; Impact Evaluation
Prioritisation, Resources and Search Terms: A Study of Decision-Making at the Virtual Reference Desk 106-116
  Simon Attfield; Stephann Makri; James Kalbach; Ann Blandford; Stephen De Gabrielle; Mark Edwards
The virtual reference desk is the reinterpretation of the traditional reference service in an online context. Placing reference services into an online setting, however, presents many challenges. We report a study and analytic framework which address support for decision-making during virtual enquiry work. Focusing on specialist law libraries, the study shows that enquirers do not volunteer important information to the service and that asynchronous communication media and some social obstacles present barriers to prompting. Also, previous enquiries are frequently used to inform current enquiry strategies but barriers exist in accessing this information. We conclude that email is an inadequate medium for supporting virtual reference services, and that systems should support automatic, speculative matching between new enquiry content and integrated enquiry knowledge bases. The contribution of the framework is to offer a structured approach to evaluation in multiple virtual reference contexts and enable rapid convergence on barriers to efficient and effective service.
Keywords: virtual reference service; evaluation; collaborative information access
Searchling: User-Centered Evaluation of a Visual Thesaurus-Enhanced Interface for Bilingual Digital Libraries 117-121
  Amy Stafford; Ali Shiri; Stan Ruecker; Matthew Bouchard; Paras Mehta; Karl Anvik; Ximena Rossello
In this paper, we describe a qualitative user study of Searchling -- an experimental visual interface that allows users to leverage a bilingual thesaurus for query formulation and enhancement. The design of Searchling is based on theories of thesaurus-based interface design from Shiri et al. [1], combined with the principles of rich-prospect browsing [2]. The Searchling interface provides the user with three working spaces on one screen: the Thesaurus space, Query space, and Document space. We interviewed 15 graduate and faculty researchers at the University of Alberta, who carried out three structured tasks in a think-aloud protocol, with simultaneous audio recording and screen capture. These participants identified a number of significant advantages to the researcher provided by Searchling, including the value of having an interface that could help with identifying search terms, suggesting preferred terms, and giving bilingual search support. They also suggested areas for future improvement, primarily related to our assumption that common knowledge of thesauri would be sufficient to make the various features clear if they were described using standard vocabulary from the thesaurus field.
Keywords: Visual Interfaces; Thesauri; Multilingual Digital Libraries; Information Retrieval; User Evaluation

From Content-Centric to Person-Centric Systems

An Extensible Virtual Digital Libraries Generator 122-134
  Massimiliano Assante; Leonardo Candela; Donatella Castelli; Luca Frosini; Lucio Lelii; Paolo Manghi; Andrea Manzi; Pasquale Pagano; Manuele Simi
In this paper we describe the design and implementation of the VDL Generator, a tool to simplify and automate the Digital Library development process. In particular, we discuss how our approach to the realisation of this tool simplifies the task of implementing, extending and modifying such a fundamental component. The tool models its task as a generic search problem that can easily be adapted to different application scenarios. To guarantee its extensibility we carefully identify, isolate and organise the VDL Generator constituents, i.e. (i) the set of logical components that can be used when designing a Digital Library, (ii) the set of physical components that, by implementing the logical components, contribute to implementing the Digital Library and (iii) the search strategy exploited to accomplish the generation task. Furthermore, we report on the experience gained in implementing and exploiting such an innovative service in the context of the EU-funded Diligent project and discuss future plans for its consolidation.
A Participative Digital Archiving Approach to University History and Memory 135-147
  Jyishane Liu
As digital archiving heads into its next level of development and influence, we must consider the need to connect digital archiving with more people and more resources to sustain the continuing effort. In this paper, we address the issue of engaging users in the digital archiving task and forming a community of collective content creation. We propose a conceptual architecture for participative digital archiving and report on a pilot project to redesign and reconstruct the archiving process of a university's history. The project also serves the purpose of showcasing archived content and providing reminiscence of university life for all university members.
Keywords: Digital Archiving; Web 2.0; User Participation; Collective Memory
Enhancing Library Services with Web 2.0 Functionalities 148-159
  Dimitris Gavrilis; Constantia Kakali; Christos Papatheodorou
In this paper, a prototype of an Online Public Access Catalog (OPAC) is presented. This OPAC features new functionalities and utilizes Web 2.0 technologies in order to deliver improved search and retrieval services. These new services include social tag annotations, user opinions and ranks, and tag-based similarity searches. The prototype is evaluated by a user group through questionnaires, interviews and the system's integrated logging mechanism. The results are encouraging and suggest that Library 2.0 technologies are acceptable to the majority of users.
Keywords: Web 2.0; social tagging; subject representation; OPAC; evaluation

Citation Analysis

A Service-Oriented Infrastructure for Early Citation Management 160-171
  José Hilario Canós; Manuel Llavador; Eduardo Mena; Marcos R. S. Borges
Citation analysis needs an in-depth transformation. Current systems have long been criticized for defects such as lack of coverage and low accuracy of the citation data. Surprisingly, incorrect or incomplete data are used to make important decisions about researchers' careers. We argue that a new approach based on collecting citation data when they are actually generated (that is, during the editing of papers) can overcome current limitations, and propose a new framework in which the research community as a whole is the owner as well as the beneficiary of a Global Citation Registry characterized by high-quality citation data. The registry will be accessible to all interested parties and will be the source to which the different impact models can be applied.
Keywords: Citation management; Service-Oriented Architecture
Releasing the Power of Digital Metadata: Examining Large Networks of Co-related Publications 172-184
  David Tarrant; Les Carr; Terry R. Payne
Bibliographic metadata plays a key role in scientific literature, not only to summarise and establish the facts of the publication record, but also to track citations between publications and hence to establish the impact of individual articles within the literature. Commercial secondary publishers have typically taken on the role of rekeying, mining and analysing this huge corpus of linked data, but as the primary literature has moved to the world of the digital repository, this task is now undertaken by new services such as Citeseer, Citebase or Google Scholar. As institutional and subject-based repositories proliferate and Open Access mandates increase, more of the literature will become openly available in well managed data islands containing a much greater amount of detailed bibliometric metadata in formats such as RDF. Through the use of efficient extraction and inference techniques, complex relations between data items can be established. In this paper we explain the importance of the co-relation in enabling new techniques to rate the impact of a paper or author within a large corpus of publications.
Author Name Disambiguation for Citations Using Topic and Web Correlation 185-196
  Kai-Hsiang Yang; Hsin-Tsung Peng; Jian-Yi Jiang; Hahn-Ming Lee; Jan-Ming Ho
Today, bibliographic digital libraries play an important role in helping members of the academic community search for novel research. Author disambiguation for citations is a major problem during the data integration and cleaning process, since author names are usually very ambiguous. To solve this problem, we propose two kinds of correlations between citations, namely Topic Correlation and Web Correlation, which exploit relationships between citations in order to identify whether two citations with the same author name refer to the same individual. The topic correlation measures the similarity between the research topics of two citations, while the Web correlation measures the number of co-occurrences in web pages. We employ a pair-wise grouping algorithm to group citations into clusters. The experimental results show that disambiguation accuracy improves greatly when topic correlation and Web correlation are used, and that Web correlation provides stronger evidence about the authors of citations.
Keywords: Citation clustering; Citation analysis; Author disambiguation

Collection Building

Development of a National Syllabus Repository for Higher Education in Ireland 197-208
  Arash Joorabchi; Abdulhussain E. Mahdi
With the significant growth in electronic education materials such as syllabus documents and lecture notes available on the Internet and intranets, there is a need to develop structured central repositories of such materials so that both educators and learners can easily share, search and access them. This paper reports on our on-going work to develop a national repository for course syllabi in Ireland. Specifically, it describes a prototype syllabus repository system for higher education in Ireland that has been developed using a number of information extraction and document classification techniques, including a new fully unsupervised document classification method that uses a web search engine to automatically collect the training set for the classification algorithm. Preliminary experimental results evaluating the system's performance are presented and discussed.
Matching Hierarchies Using Shared Objects 209-220
  Robert Ikeda; Kai Zhao; Hector Garcia-Molina
One of the main challenges in integrating two hierarchies (e.g., of books or web pages) is determining the correspondence between the edges of each hierarchy. Traditionally, this process, which we call hierarchy matching, is done by comparing the text associated with each edge. In this paper we instead use the placement of objects present in both hierarchies to infer how the hierarchies relate. We present two algorithms that, given a hierarchy with known facets (attribute-value pairs that define what objects are placed under an edge), determine feasible facets for a second hierarchy, based on shared objects. One algorithm is rule-based and the other is statistics-based. In the experimental section, we compare the results of the two algorithms, and see how their performances vary based on the amount of noise in the hierarchies.
Keywords: data integration; mapping
Virtual Unification of the Earliest Christian Bible: Digitisation, Transcription, Translation and Physical Description of the Codex Sinaiticus 221-226
  Zeki Mustafa Dogan; Alfred Scharsky
This paper describes the deployment of innovative digitisation methods and new web technologies to reunify the oldest Bible -- the Codex Sinaiticus -- and to make it available to a wider public. Development of the website began in late 2006, and the first stage of the development will allow free access to the website of this eminent part of the cultural heritage in 2008, which has only been possible through close collaboration between international partners.
Sustainable Digital Library Systems over the DRIVER Repository Infrastructure 227-231
  Michele Artini; Leonardo Candela; Donatella Castelli; Paolo Manghi; Marko Mikulicic; Pasquale Pagano
The DRIVER Infrastructure is an e-infrastructure providing an environment where organizations find the tools to aggregate heterogeneous content sources into uniform shared Information Spaces and then build and customize their Digital Library Systems to operate over them. In this paper, we shall show the benefits for organizations embracing the infrastructural approach by presenting the DRIVER infrastructure, its current status of maintenance, its participating organizations, and the first two systems built on top of its Information Space.

User Interfaces and Personalization

Interactive Paper as a Reading Medium in Digital Libraries 232-243
  Moira C. Norrie; Beat Signer; Nadir Weibel
In digital libraries, much of the reading activity is still done on printed copies of documents. We show how digital pen and paper technologies can be used to support readers by automatically creating interactive paper versions of digital documents during the printing process that enable users to activate embedded hyperlinks to other documents and services from printed versions. The approach uses a special printer driver that allows information about hyperlinks to be extracted and stored at print time. Users can then activate hyperlinks in the printed document with a digital pen.
Personalizing the Selection of Digital Library Resources to Support Intentional Learning 244-255
  Qianyi Gu; Sebastian de la Chica; Faisal Ahmad; Huda J. Khan; Tamara Sumner; James H. Martin; Kirsten R. Butcher
This paper describes a personalization approach for using online resources in digital libraries to support intentional learning. Personalized resource recommendations are made based on what learners currently know and what they should know within a targeted domain to support their learning process. We use natural language processing and graph based algorithms to automatically select online resources to address students' specific conceptual learning needs. An evaluation of the graph based algorithm indicates that the majority of recommended resources are highly relevant or relevant for addressing students' individual knowledge gaps and prior conceptions.
Keywords: Personalization; Information Retrieval; Intentional Learning; Knowledge Map
Enrichment of European Digital Resources by Federating Regional Digital Libraries in Poland 256-259
  Agnieszka Lewandowska; Cezary Mazurek; Marcin Werla
In this paper we present the PIONIER Network Digital Libraries Federation, which was founded in June 2007 in Poland. This federation is a single point of access to the majority of Polish digital resources gathered in regional and institutional digital libraries. Besides aggregating and promoting resources, the service also allows for automated coordination of digitization and PURL resolution of OAI identifiers for objects from Polish digital libraries. It is also part of a networked digital library user profile system recently enabled in the Polish network of distributed digital libraries. During the development of the PIONIER Network Digital Libraries Federation, extensions to the OAI-PMH protocol and the Shibboleth middleware were made and deployed in order to achieve the required federation functionality. The PIONIER DLF service is based on a set of distributed atomic services that together provide its functionality.
Keywords: digital libraries federation; coordination of digitization; metadata harvesting; atomic services; digital object identifiers resolution; networked user profile
Access Modalities to an Imagistic Library for Medical e-Learning 260-263
  Liana Stanescu; Dumitru Dan Burdescu; Gabriel Mihai; Cosmin Stoica Spahiu; Anca Ion
The paper presents the organization of, and the access facilities to, a multimedia digital library with medical information for electronic learning. The digital library contains course materials and medical images collected during the patient diagnosis process. The originality of the paper lies in the presentation of two access modalities to multimedia information from the digital library: content-based visual query and semantic query. The content-based visual query can be carried out at the image or region level using colour and texture characteristics automatically extracted from medical images when they are loaded into the database. Semantic queries against the multimedia database can also be launched automatically with the help of a topic map based on the part of the MeSH thesaurus that includes the medical diagnosis names. Students can navigate the topic map according to their subject of interest. These access paths can be combined to retrieve the information of interest. The multimedia digital library is a very useful tool for improving medical knowledge, aimed at students, resident doctors, young specialists and family doctors.
Keywords: imagistic library; content-based visual query; color feature; texture feature; topic map; semantic query
What a Difference a Default Setting Makes 264-267
  Te Taka Keegan; Sally Jo Cunningham
This paper examines the effect of the default interface language on the usage of a bilingual digital library. In 2005 the default interface language of a bilingual digital library was alternated on a monthly basis between Maori and English. A comprehensive transaction log analysis over this period reveals that not only did usage in a particular language increase when the default interface language was set to that language but that the way the interface was used, in both languages, was quite different depending on the default language.
Keywords: Log Analysis; Multi-Language Access


A Methodology for Sharing Archival Descriptive Metadata in a Distributed Environment 268-279
  Nicola Ferro; Gianmaria Silvello
This paper discusses how to exploit widely accepted solutions for interoperation, such as the pairing of OAI-PMH with the DC metadata format, in order to deal with the peculiar features of archival description metadata and allow them to be shared. We present a methodology for mapping EAD metadata into DC metadata records without losing information. The methodology exploits DLS technologies to enhance the possibilities for sharing archival metadata while taking archival needs into account; furthermore, it opens valuable information resources held by archives to the wider context of cross-domain interoperation among different cultural heritage institutions.
Semantic Interoperability in Archaeological Datasets: Data Mapping and Extraction Via the CIDOC CRM 280-290
  Ceri Binding; Keith May; Douglas Tudhope
Findings from a data mapping and extraction exercise undertaken as part of the STAR project are described and related to recent work in the area. The exercise was undertaken in conjunction with English Heritage and encompassed five differently structured relational databases containing various results of archaeological excavations. The aim of the exercise was to demonstrate the potential benefits in cross searching data expressed as RDF and conforming to a common overarching conceptual data structure schema -- the English Heritage Centre for Archaeology ontological model (CRM-EH), an extension of the CIDOC Conceptual Reference Model (CRM). A semi-automatic mapping/extraction tool proved an essential component. The viability of the approach is demonstrated by web services and a client application on an integrated data and concept network.
Keywords: knowledge organization systems; mapping; CIDOC CRM; core ontology; semantic interoperability; semi-automatic mapping tool; thesaurus; terminology services
Annotations: A Way to Interoperability in DL 291-295
  Maristella Agosti; Nicola Ferro
This paper discusses how annotations and interoperability relate to and affect each other in digital library settings. We analyse interoperability and annotations in the light of the evolution of the field of digital libraries and provide recommendations for successful interoperable annotations towards the European Digital Library.
Semantic Based Substitution of Unsupported Access Points in the Library Meta-search Environments 296-307
  Michalis Sfakakis; Sarantos Kapidakis
Meta-searching library communities involves access to sources where metadata are invisible behind query interfaces. Many of the query interfaces utilize predefined abstract Access Points for the implementation of the search services, without any further access to the underlying metadata and query methods. Unsupported Access Points and their consequences, which are either query failures or inconsistent query answers, create a major issue when meta-searching systems of this kind. An example of the abstract Access Point based search model is the Z39.50 information retrieval protocol, which is widely used by the library community. In this paper we present zSAPN (Z39.50 Semantic Access Point Network), a system which improves search consistency and eliminates query failures by exploiting the semantic information of the Access Points from an RDFS description. The current implementation of zSAPN is in the context of the Z39.50 protocol, uses the official specification of the Access Point semantics, and can benefit the huge number of available sources worldwide. zSAPN substitutes each unsupported Access Point with a set of other supported ones, whose appropriate combination either broadens or narrows the initial semantics, according to the user's choice. Finally, we estimate the impact that modifying the initial semantics during the substitution process has on the precision or the recall of the original query with the unsupported Access Point.

Information Retrieval

Proximity Scoring Using Sentence-Based Inverted Index for Practical Full-Text Search BIBAFull-Text 308-319
  Yukio Uematsu; Takafumi Inoue; Kengo Fujioka; Ryoji Kataoka; Hayato Ohwada
We propose a search method that uses sentence-based inverted indexes to achieve high accuracy at practical speeds. The proposed method supports the vast majority of queries entered on the web; these queries contain single words, multiple words for proximity searches, and semantically direct phrases. The existing approach, an inverted index that holds word-level position data, is not efficient because the index becomes extremely large. Our solution is to drop the word position data and index only the existence of each word in each sentence. We incorporate the sentence-based inverted index into a commercial search engine and evaluate it using both Japanese and English standard IR corpora. The experiment shows that our method offers high accuracy, while index size and search processing time are greatly reduced.
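A minimal sketch of the indexing idea described above: store only which sentences contain each word, and treat co-occurrence in a sentence as the proximity signal. The sentence splitter, the function names, and the scoring rule (count of sentences containing all query words) are simplifying assumptions, not the authors' scoring function.

```python
import re
from collections import defaultdict


def split_sentences(text):
    """Naive splitter on ., ! and ? (an assumption; real systems do better)."""
    return [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]


def build_index(docs):
    """Map each word to the set of (doc_id, sentence_no) pairs containing it.

    No word positions are stored, only presence within a sentence.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for sent_no, sentence in enumerate(split_sentences(text)):
            for word in re.findall(r"\w+", sentence.lower()):
                index[word].add((doc_id, sent_no))
    return index


def proximity_score(index, query):
    """Score each document by the number of sentences holding all query words."""
    words = query.lower().split()
    if not words:
        return {}
    postings = [index.get(word, set()) for word in words]
    shared = set.intersection(*postings)
    scores = defaultdict(int)
    for doc_id, _sent_no in shared:
        scores[doc_id] += 1
    return dict(scores)
```

Because each posting is a (document, sentence) pair rather than a word offset, the index grows with the number of distinct words per sentence instead of the number of word occurrences.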
Information Retrieval and Filtering over Self-organising Digital Libraries BIBAFull-Text 320-333
  Paraskevi Raftopoulou; Euripides G. M. Petrakis; Christos Tryfonopoulos; Gerhard Weikum
We present iClusterDL, a self-organising overlay network that supports information retrieval and filtering functionality in a digital library environment. iClusterDL is able to handle huge amounts of data provided by digital libraries in a distributed and self-organising way. The two-tier architecture and the use of semantic overlay networks provide an infrastructure for creating large networks of digital libraries that require minimum administration, yet offer a rich set of tools to the end-user. We present the main components of our architecture, the protocols that regulate peer interactions, and an experimental evaluation that shows the efficiency, and the retrieval and filtering effectiveness of our approach.
A Framework for Managing Multimodal Digitized Music Collections BIBAFull-Text 334-345
  Frank Kurth; David Damm; Christian Fremerey; Meinard Müller; Michael Clausen
In this paper, we present a framework for managing heterogeneous, multimodal digitized music collections containing visual music representations (scanned sheet music) as well as acoustic music material (audio recordings). As a first contribution, we propose a preprocessing workflow comprising feature extraction, audio indexing, and music synchronization (linking the visual with the acoustic data). Then, as a second contribution, we introduce novel user interfaces for multimodal music presentation, navigation, and content-based retrieval. In particular, our system offers high quality audio playback with time-synchronous display of the digitized sheet music. Furthermore, our system allows a user to select regions within the scanned pages of a musical score in order to search for musically similar sections within the audio documents. Our novel user interfaces and search functionalities will be integrated into the library service system of the Bavarian State Library as part of the Probado project.

Metadata Generation

A Quantitative Evaluation of Dissemination-Time Preservation Metadata BIBAFull-Text 346-357
  Joan A. Smith; Michael L. Nelson
One of many challenges facing web preservation efforts is the lack of metadata available for web resources. In prior work, we proposed a model that takes advantage of a site's own web server to prepare its resources for preservation. When responding to a request from an archiving repository, the server applies a series of metadata utilities, such as Jhove and Exif, to the requested resource. The output from each utility is included in the HTTP response along with the resource itself. This paper addresses the question of feasibility: Is it in fact practical to use the site's web server as a just-in-time metadata generator, or does the extra processing create an unacceptable deterioration in server responsiveness to quotidian events? Our tests indicate that (a) this approach can work effectively for both the crawler and the server; and that (b) utility selection is an important factor in overall performance.
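The model described above, in which the server runs metadata utilities on a resource at request time and returns their output with the resource, might be sketched as follows. The utilities here (`byte_size`, `sha256_digest`) are hypothetical stand-ins for tools such as Jhove, and the returned dictionary is an assumed structure rather than the response format used in the paper. The elapsed time is recorded because the abstract's question is precisely how much overhead this adds.

```python
import hashlib
import time


def byte_size(data):
    """Hypothetical utility: report the resource size in bytes."""
    return str(len(data))


def sha256_digest(data):
    """Hypothetical utility: report a SHA-256 checksum of the resource."""
    return hashlib.sha256(data).hexdigest()


def disseminate(resource, utilities):
    """Bundle a resource with the output of each metadata utility.

    utilities maps a name to a callable taking the resource bytes.
    The time spent in the utilities is returned as the added overhead.
    """
    metadata = {}
    started = time.perf_counter()
    for name, utility in utilities.items():
        metadata[name] = utility(resource)
    overhead = time.perf_counter() - started
    return {
        "resource": resource,
        "metadata": metadata,
        "overhead_seconds": overhead,
    }
```

Passing a different `utilities` mapping changes the cost per request, which is one way to picture why utility selection matters for overall performance.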
Improving Temporal Language Models for Determining Time of Non-timestamped Documents BIBAFull-Text 358-370
  Nattiya Kanhabua; Kjetil Nørvåg
Taking the temporal dimension into account in searching, i.e., using time of content creation as part of the search condition, is now gaining increasing interest. However, in the case of web search and web warehousing, the timestamps (time of creation or creation of contents) of web pages and documents found on the web are in general not known or cannot be trusted, and must be determined otherwise. In this paper, we describe approaches that enhance and increase the quality of existing techniques for determining timestamps based on a temporal language model. Through a number of experiments on temporal document collections we show how our new methods improve the accuracy of timestamping compared to the previous models.
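To make the notion of a temporal language model concrete, here is a minimal sketch that assigns a document to the time partition whose word statistics explain it best. It uses a plain unigram log-likelihood with add-one smoothing, which is a simplification chosen for illustration and not the scoring or the enhancements proposed in the paper. The partition labels and function names are assumptions.

```python
import math
from collections import Counter


def train(partitions):
    """Build word counts per time partition.

    partitions maps a label (e.g. a year) to a list of tokenised documents.
    """
    return {
        label: Counter(token for doc in docs for token in doc)
        for label, docs in partitions.items()
    }


def estimate_period(models, tokens):
    """Return the partition label with the highest smoothed log-likelihood."""
    vocabulary = set().union(*models.values()) | set(tokens)
    best_label, best_score = None, -math.inf
    for label, counts in models.items():
        total = sum(counts.values())
        score = sum(
            math.log((counts[token] + 1) / (total + len(vocabulary)))
            for token in tokens
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```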
Revisiting Lexical Signatures to (Re-)Discover Web Pages BIBAFull-Text 371-382
  Martin Klein; Michael L. Nelson
A lexical signature (LS) is a small set of terms derived from a document that capture the "aboutness" of that document. An LS generated from a web page can be used to discover that page at a different URL as well as to find relevant pages in the Internet. From a set of randomly selected URLs we took all their copies from the Internet Archive between 1996 and 2007 and generated their LSs. We conducted an overlap analysis of terms in all LSs and found only small overlaps in the early years (1996-2000) but increasing numbers in the more recent past (from 2003 on). We measured the performance of all LSs as a function of the number of terms they consist of. We found that LSs created more recently perform better than early LSs created between 1996 and 2000. All LSs created from year 2000 on show a similar pattern in their performance curve. Our results show that 5-, 6- and 7-term LSs perform best at returning the URLs of interest in the top ten of the result set. In about 50% of all cases these URLs are returned as the number one result, and in 30% of all cases we considered the URLs as not discovered.
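A minimal sketch of generating a k-term lexical signature, assuming the terms are ranked by a smoothed TF-IDF weight. The abstract does not state the weighting used, so TF-IDF, the smoothing, and the `lexical_signature` name are assumptions made for illustration.

```python
import math
from collections import Counter


def lexical_signature(doc_tokens, corpus, k=5):
    """Return the k terms of doc_tokens with the highest TF-IDF weight.

    corpus is a list of tokenised documents used to estimate how common
    each term is. Ties are broken alphabetically for a stable result.
    """
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    term_freq = Counter(doc_tokens)

    def weight(term):
        # Smoothed inverse document frequency; terms found in every
        # corpus document get a weight of zero.
        return term_freq[term] * math.log((n_docs + 1) / (doc_freq[term] + 1))

    ranked = sorted(term_freq, key=lambda term: (-weight(term), term))
    return ranked[:k]
```

The resulting terms would then be submitted to a search engine as a query; the abstract reports that signatures of 5 to 7 terms worked best for that purpose.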


The Web Versus Digital Libraries: Time to Revisit This Once Hot Topic BIBAFull-Text 383-384
  Vittore Casarosa; Jill Cousins; Anna Maria Tammaro; Yannis E. Ioannidis
At the end of last century (Internet time elapses much quicker than normal time, and it already looks like a long time ago), the "information explosion" on the Web on one side, and the flourishing of research activities on digital library technologies on the other, spurred heated discussions about the future of traditional libraries. The view of one camp was that since "all" the information was available on-line, the use of smart search engines and clever software tools would allow Digital Libraries to provide all the information (and the services) needed by an information seeker. The view of the other camp was that the value of information was not just in its sheer quantity, but was rather in the organization and the quality of the information made available, and that could never be done by "programs".

Posters and Demonstrations

The MultiMatch Prototype: Multilingual/Multimedia Search for Cultural Heritage Objects BIBAFull-Text 385-387
  Giuseppe Amato; Franca Debole; Carol Peters; Pasquale Savino
MultiMatch is a 30 month targeted research project under the Sixth Framework Programme, supported by the unit for Content, Learning and Cultural Heritage (Digicult) of the Information Society DG. MultiMatch is developing a multimedia/multilingual search engine designed specifically for the access, organization and personalized presentation of cultural heritage information. The demonstration will present the MultiMatch system prototype.
Digital Preservation of Scientific Data BIBAKFull-Text 388-391
  José Barateiro; Gonçalo Antunes; Manuel Cabral; José Luis Borbinha; Rodrigo Rodrigues
Digital preservation aims at maintaining digital objects and data accessible over long periods of time. We propose the use of dedicated or surplus storage resources of data grids to build frameworks of digital preservation. In this paper we focus on the problem of digital preservation in two scenarios: a national digital library and a repository of scientific information for dam safety. We detail the scenario of dam safety data and provide an analysis of an existing data grid solution that can be used for this purpose.
Keywords: Digital Libraries; Digital Preservation; Data Grids
Using Terminology Web Services for the Archaeological Domain BIBAFull-Text 392-393
  Ceri Binding; Douglas Tudhope
The AHRC funded STAR project (Semantic Technologies for Archaeological Resources) has developed web services for knowledge organisation systems (KOS) represented in SKOS RDF format, building on previous work by the University of Glamorgan Hypermedia Research Unit on terminology web services. The current service operates on a repository of multiple (English Heritage) thesauri converted to SKOS format, containing terms and concepts that would be familiar to those working within the archaeological domain. It provides facilities for search, concept browsing and semantic expansion across these specialist terminologies.
Building a Digital Research Community in the Humanities BIBAFull-Text 394-397
  Toby Burrows; Ela Majocha
The ARC Network for Early European Research (NEER), funded under the Australian Research Council's Research Networks programme, aims to enhance the scale of Australian research in medieval and early modern studies, and to build collaborative and innovative approaches to planning and managing research. An integral part of NEER's vision is the development of a digital environment which provides a setting for the work of this national research community. This environment has three major components: the Confluence collaborative Web workspace, the PioNEER digital repository for research outputs and data, and the Europa Inventa gateway to cultural heritage objects.
Agile DL: Building a DELOS-Conformed Digital Library Using Agile Software Development BIBAFull-Text 398-399
  Javier D. Fernández; Miguel A. Martínez-Prieto; Pablo de la Fuente; Jesús Vegas; Joaquín Adiego
This paper describes a concrete partial implementation of the DELOS Reference Model to the particular field of manuscripts and incunabula, and how an agile software methodology, SCRUM, suits the evolutive nature of Digital Libraries, solving misunderstandings and lightening the underlying model.
Design of a Digital Library System for Large-Scale Evaluation Campaigns BIBAFull-Text 400-401
  Marco Dussin; Nicola Ferro
This work describes the effort of designing and developing a Digital Library System (DLS) able to manage the different types of information resources produced during a large-scale evaluation campaign and to support the different stages of it. We discuss, in particular, the design of DIRECT, a DLS developed to assist the work of the actors of international evaluation campaigns.
An XML-Centric Storage for Better Preservation and Maintenance of Data: Union Catalog of NDAP, Taiwan BIBAKFull-Text 402-405
  Tzu-Yen Hsu; Ting-Hua Chen; Chung-Hsi Hung; Sea-Hom Chou
The Union Catalog (UC) of Taiwan was established to provide an integrated search service for millions of digital objects distributed in the databases of different institutions. The main challenge is how to continuously and consistently manage large quantities of data. XML technologies have already been recommended for greater data preservation rather than database systems. In addition, we assume that a database design in our case would be complex and that consistent maintenance would be difficult. For this reason, databases are not used as the primary storage mechanism of the UC. Although the UC adopts an XML-centric architecture, it has difficulty handling data queries, data modification, and category listing efficiently. In this paper, we discuss how we use XML technologies to implement the UC system, and how we solve the issues arising from XML's limitations.
Keywords: NDAP; architecture; digital library
Summa: This Is Not a Demo BIBAKFull-Text 406-409
  Gitte Behrens; Mikkel Kamstrup Erlandsen; Toke Eskildsen; Bolette Ammitzbøll Jurik; Dorete Bøving Larsen; Hans Lauridsen; Michael Poltorak Nielsen; Jørn Thøgersen; Mads Villadsen
The Summa search system is a fast, scalable, modular, open source search system, which can integrate all types of library metadata and full text. The Summa search system is based on user studies and on librarian expertise in formats and metadata. Summa is an open and modular design. Summa offers modules for faceted browsing, automated cluster extraction and a flexible user interface among others. The in-house Summa production system at The State and University Library in Denmark searches a corpus of 8 million records. The Summa search system version 1.0 to be released in the autumn 2008 is designed to scale to hundreds of millions.
Keywords: Search; open source; modularity; scalability; performance
New Tasks on Collections of Digitized Books BIBAFull-Text 410-412
  Gabriella Kazai; Antoine Doucet; Monica Landoni
Motivated by the plethora of book digitization projects around the world, the Initiative for the Evaluation of XML Retrieval (INEX) launched a Book Search track in 2007. In its first year, the track focused on Information Retrieval (IR) tasks, exploring the utility of traditional and structured document retrieval techniques for books. In this paper, we propose three new tasks to be investigated at the Book Search track in 2008. The tasks aim to promote research in a wider context, across IR, Human Computer Interaction, Digital Libraries, and eBooks. We identify three novel problem areas, define tasks around these and propose possible evaluation methods.
Plato: A Preservation Planning Tool Integrating Preservation Action Services BIBAFull-Text 413-414
  Hannes Kulovits; Christoph Becker; Michael Kraxner; Florian Motlik; Kevin Stadler; Andreas Rauber
The creation of a concrete plan for preserving a collection of digital objects of a specific institution necessitates the evaluation of available solutions against clearly defined and measurable criteria. This process is called preservation planning and aids in the decision making process to find the most suitable preservation strategy considering the institution's requirements, the planning context and available actions applicable to the objects contained in the repository. Performed manually, this evaluation promises to be hard and tedious work, inasmuch as there exist numerous potential preservation action tools of different quality. In this demonstration, we present Plato [4], an interactive software tool aimed at creating preservation plans.
Event Representation in Temporal and Geographic Context BIBAFull-Text 415-418
  Ryan Shaw; Ray R. Larson
Linking digital resources that refer to the same people or places is becoming common. Events are another kind of entity that might be used to link resources in this way. We examine a number of standards for encoding of archival, historical, genealogical, and news information to compare the tools they offer for representing events.
A Mechanism for Solving the Unencoded Chinese Character Problem on the Web BIBAKFull-Text 419-422
  Te-Jun Lin; Jyun-Wei Huang; Christine Lin; Hung-Yi Li; Hsiang-An Wang; Chih-Yi Chiu
The unencoded Chinese character problem that occurs when digitizing historical Chinese documents makes digital archiving difficult. Expanding the character coding space, such as by using the Unicode Standard, does not solve the problem completely due to the extensibility of Chinese characters. In this paper, we propose a mechanism based on a Chinese glyph structure database, which contains glyph expressions that represent the composition of Chinese characters. Users can search for Chinese characters through our web interface and browse the search results. Each Chinese character can be embedded in a web document using specific JavaScript code. When the web document is opened, the JavaScript code will load the image of the Chinese character in an appropriate font size for display. Even if the Chinese characters are not available in the database, their images can be generated through the dynamic character composition function. As the proposed mechanism is cross-platform, users can easily access unencoded Chinese characters without installing any additional font files on their personal computers. A demonstration system is available at http://char.ndap.org.tw.
Keywords: Chinese glyph structure database; digital archive; unencoded Chinese characters
Gaze Interaction and Access to Library Collection BIBAFull-Text 423-424
  Haakon Lund; John Paulin Hansen
A new module in the GazeTalk eye-typing communication software for people with severe disabilities has been developed. The web-service based module enables the user to gain access to a collection of digitized full text. This demonstration shows the functionalities in the library access module.
Covering Heterogeneous Educative Environments with Integrated Editions in the Electronic Work BIBAFull-Text 425-426
  Miguel A. Martínez-Prieto; Pablo de la Fuente; Jesús Vegas; Joaquín Adiego
Although e-books usage has a positive impact in educational environments, contents representation is a complex issue given their audience. In this paper, we show a flexible and functional appearance that allows a synchronized consultation of the literary editions integrated in an electronic work.
Exploring Query Formulation and Reformulation: A Preliminary Study to Map Users' Search Behaviour BIBAKFull-Text 427-430
  Anna Mastora; Maria Monopoli; Sarantos Kapidakis
This study aims to investigate the query formulation and reformulation patterns such as generalisations, specifications, parallel movements and replacements with synonyms within the search procedure. Results showed that users reformulated their queries by using terms contained in the retrieved results while in the query reformulation process they mainly used terms with parallel meanings. Participants used equally either more specific or more general terms for follow-up queries. Finally, the study revealed that a high proportion of same terms were used instead of unique ones; half of them were included in the Eurovoc thesaurus.
Keywords: Query formulation; Query reformulation; Search behaviour; Search patterns; Query length
Identification of Bibliographic Information Written in Both Japanese and English BIBAFull-Text 431-433
  Yuko Taniguchi; Hidetsugu Nanba
We have studied the automatic construction of a multilingual citation index by collecting Postscript and PDF files from the Internet [2], and in this paper, we propose a method that can identify duplicate bibliographic information written in both Japanese and English, which will be an indispensable module for the construction of a multilingual citation index.
DIGMAP: A Digital Library Reusing Metadata of Old Maps and Enriching It with Geographic Information BIBAKFull-Text 434-435
  Gilberto Pedrosa; João Luzio; Hugo Manguinhas; Bruno Martins; José Luis Borbinha
The DIGMAP service reuses metadata from European national libraries and other relevant third party metadata sources. The gathered metadata is enhanced locally with geographical indexing, leveraging geographic gazetteers and authority files. When available, the images of the maps are also processed to extract potentially relevant features. This made it possible to develop a rich integrated environment for searching and browsing services based mainly on enriched metadata.
Keywords: Geographic information; Old maps; Systems architectures; Interoperability
Visual Analysis of Classification Systems and Library Collections BIBAFull-Text 436-439
  Magnus Pfeffer; Kai Eckert; Heiner Stuckenschmidt
In this demonstration we present a visual analysis approach that addresses both developers and users of hierarchical classification systems. The approach supports an intuitive understanding of the structure and current use in relation to a specific collection. We will also demonstrate its application for the development and management of library collections.
A Framework for Music Content Description and Retrieval BIBAFull-Text 440-443
  Alberto Pinto; Goffredo Haus
The recently approved format for music content description IEEE PAR1599 (MX) defines a standard for retrieval models representation within music and audio/video formats that makes use of XML documents as content descriptors. We show how music/audio semantics can be represented within the Structural layer of MX through the introduction of novel Music Information Retrieval (MIR) objects in order to embed metadata relative to specific retrieval models.
XCL: The Extensible Characterisation Language -- One Step towards an Automatic Evaluation of Format Conversions BIBAFull-Text 444-446
  Jan Schnasse; Sebastian Beyl; Elona Chudobkaite; Volker Heydegger; Manfred Thaller
Today file format specifications are formulated in natural languages. A programmer who wants to decode, encode or render the information contained in a file has to read through the specification before translating it into the terms of a programming language. The maintainer of the format usually eases that process by the deployment of libraries for the format. While this is a well-proven process, the translation from one format into another is nevertheless often an error-prone undertaking. For content holders format conversion is one strategy to assure long-term access to their digital resources. However, there is currently still no standardised automatic procedure available for the evaluation of format conversions. This is a serious gap, mainly in the case where format conversion is used as a strategy for long-term preservation of digital content. With the Extensible Characterisation Language (XCL) we want to address the problem of automatic evaluation of format conversions.
A User Field Study: Communication in Academic Communities and Government Agencies BIBAFull-Text 447-449
  Filip Kruse; Annette Balle Sørensen; Bart Ballaux; Birte Christensen-Dalsgaard; Hans Hofman; Michael Poltorak Nielsen; John W. Pattenden-Fail; Seamus Ross; Kellie Snow; Jørn Thøgersen
The preliminary findings of a study focusing on communication in academic communities and government agencies are outlined. The study was conducted within the academic community at British and Danish universities and government agencies in The Netherlands, using the 'Contextual Design' approach and 'Cultural Probes'. Qualitative data on researchers' and government agents' communicative and interactive behaviour were collected and an affinity analysis carried out. The analysis produced two types of results; 1) a conceptual model of flow from idea to dissemination, and 2) a catalogue of central elements of the communicative and collaborative behaviour of researchers and government agents. These results will be further explored and validated by means of a questionnaire based survey of academic communities and government agencies.
Digital Preservation Needs of Scientific Communities: The Example of Göttingen University BIBAKFull-Text 450-452
  Heike Neuroth; Stefan Strathmann; Sven Vlaeminck
Digital information has become an integral part of our cultural and scientific heritage. We are increasingly confronted with scientific findings, historical events and cultural achievements presented in electronic form. The rapid pace of technical change is causing data carriers and data formats to age quickly. The result is an acute threat to the long-term usability of digital objects which serve as sources for science and research. The necessity for long-term preservation has to be anchored in the social context of the national information, research and cultural policy, and the global integrations of science and research. To examine the preservation needs in the context of large-scale research facilities, the awareness and practices at the University of Göttingen and at the ETH Zürich were explored. As a first step, an online questionnaire was developed and conducted in summer 2007. The poster explains first findings of the online survey.
Keywords: digital preservation; university; metadata; survey
Dynamic Catalogue Enrichment with SeeAlso Link Servers BIBAFull-Text 453-454
  Jakob Voß
The poster presents architecture and usage of SeeAlso, a simple protocol for link servers that is used to dynamically enrich catalogues of libraries in the German Common library network GBV.
Access to Archival Finding Aids: Context Matters BIBAFull-Text 455-457
  Junte Zhang; Khairun Nisa Fachry; Jaap Kamps
We detail the design of a search engine for archival finding aids based on an XML database system. The resulting system shows results -- which can vary in granularity from individual archival items to the whole fonds -- within the context of the archive. The presentation preserves the archival structure by providing important contextual information, and all individual results can be "clicked", warping the user to the full finding aid with the selected part in focus.