ECDL 2010: Proceedings of the European Conference on Digital Libraries

Fullname:ECDL 2010: Research and Advanced Technology for Digital Libraries: 14th European Conference
Editors:Mounia Lalmas; Joemon Jose; Andreas Rauber; Fabrizio Sebastiani; Ingo Frommholz
Location:Glasgow, UK
Dates:2010-Sep-06 to 2010-Sep-10
Publisher:Springer Berlin Heidelberg
Series:Lecture Notes in Computer Science 6273
Standard No:DOI: 10.1007/978-3-642-15464-5; ISBN: 978-3-642-15463-8 (print), 978-3-642-15464-5 (online); hcibib: ECDL10
  1. System Architectures
  2. Metadata
  3. Multimedia IR
  4. Interaction and Interoperability
  5. Digital Preservation
  6. Social Web/Web 2.0
  7. Search in Digital Libraries
  8. (Meta) Analysis of Digital Libraries
  9. Query Log Analysis
  10. Cooperative Work in DLs
  11. Ontologies
  12. Domain-Specific DLs
  13. Posters
  14. Demos
Keynote: The Web Changes Everything: Understanding and Supporting People in Dynamic Information Environments (p. 1)
  Susan T. Dumais
Most digital library resources, and the Web more generally, are dynamic and ever-changing collections of information. However, most of the tools that have been developed for interacting with Web and DL content, such as browsers and search engines, focus on a single static snapshot of the information. In this talk, I will present analyses of how web content changes over time, how people revisit web pages over time, and how revisitation patterns are influenced by user intent and changes in content. These results have implications for many aspects of search, including crawling, ranking algorithms, result presentation and evaluation. I will describe a prototype that helps people understand how the information they interact with changes over time by highlighting what content has changed since their last visit. Finally, I will describe a new retrieval model that represents features of the temporal evolution of content to inform crawl policy and improve ranking.

System Architectures

Modelling Digital Libraries Based on Logic (pp. 2-13)
  Carlo Meghini; Nicolas Spyratos; Tsuyoshi Sugibuchi
We present a data model for digital libraries supporting identification, description and discovery of digital objects. The model is formalized as a first-order theory, certain models of which correspond to the intuitive notion of digital library. Our main objective is to lay the foundations for the design of an API offering the above functionality. Additionally, we use our formal framework to discuss the adequacy of the Resource Description Framework with respect to the requirements of digital libraries.
General-Purpose Digital Library Content Laboratory Systems (pp. 14-21)
  Paolo Manghi; Marko Mikulicic; Leonardo Candela; Michele Artini; Alessia Bardi
In this work, we use the name Digital Library Content Laboratories (DLCLs) for software systems specially devised for aggregating and processing information objects -- e.g., publications, experimental data, multimedia and compound objects -- collected from possibly heterogeneous and autonomous data sources. We present a general-purpose and cost-efficient system for the construction of customized DLCLs, based on the D-NET Software Toolkit. D-NET offers a service-oriented framework in which developers can choose the set of services they need, customize them to match domain requirements, and combine them in a "LEGO fashion" to obtain a personalized DLCL. D-NET is currently the enabling software of several DLCLs operated by European Commission projects and national initiatives.
Component-Based Authoring of Complex, Petri net-based Digital Library Infrastructure (pp. 22-29)
  Yungah Park; Unmil Karadkar; Richard Furuta
caT, a Petri net-based hypertext system, serves as a platform for unified modeling of digital library infrastructure and its governing policies, user characteristics, and their contextual information. Traditionally, users have created caT networks from scratch, which has limited their use to small collections. In this paper we introduce TcAT, a component-based authoring tool that enables the creation of large caT nets that can represent interaction-rich, real-life spaces such as libraries and museums. TcAT implements composition operations from Petri net theory, allowing authors to select and modify existing net fragments as templated building blocks for larger networks. Authors may switch between visual and textual modes at will, thus combining the strengths of expressing large nets textually and selecting net fragments via point-and-click interaction. A user evaluation of the new authoring mechanisms suggests that TcAT is a promising tool for improving the efficiency of both experienced users and novice users who are unfamiliar with the Petri net formalism.
Keywords: caT; Petri net-based hypertext; digital library infrastructure
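As a rough illustration of the Petri net machinery that caT builds on, the sketch below models a place/transition net with token firing and one classic composition operation, place fusion, which glues a fragment onto a larger net by merging named places. The class and function names are hypothetical and do not reflect TcAT's actual interface; the abstract does not say which composition operations TcAT implements, and transition names of the two fragments are assumed to be distinct.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    # transition name -> (set of input places, set of output places)
    transitions: dict = field(default_factory=dict)
    # place name -> number of tokens
    marking: dict = field(default_factory=dict)

    def enabled(self, t):
        inputs, _ = self.transitions[t]
        return all(self.marking.get(p, 0) > 0 for p in inputs)

    def fire(self, t):
        if not self.enabled(t):
            raise ValueError(f"transition {t!r} is not enabled")
        inputs, outputs = self.transitions[t]
        for p in inputs:  # consume one token from every input place
            self.marking[p] -= 1
        for p in outputs:  # produce one token in every output place
            self.marking[p] = self.marking.get(p, 0) + 1


def compose(a, b, fuse):
    """Place fusion: places of b listed in `fuse` are merged with places of a."""
    def rename(p):
        return fuse.get(p, p)

    net = PetriNet(dict(a.transitions), dict(a.marking))
    for t, (ins, outs) in b.transitions.items():
        net.transitions[t] = ({rename(p) for p in ins}, {rename(p) for p in outs})
    for p, tokens in b.marking.items():
        net.marking[rename(p)] = net.marking.get(rename(p), 0) + tokens
    return net


# A visitor enters the library lobby, then moves on to the reading room.
entrance = PetriNet({"enter": ({"outside"}, {"lobby"})}, {"outside": 1})
reading = PetriNet({"sit_down": ({"start"}, {"reading_room"})})
library = compose(entrance, reading, fuse={"start": "lobby"})
library.fire("enter")
library.fire("sit_down")
print(library.marking)  # {'outside': 0, 'lobby': 0, 'reading_room': 1}
```

Fusing the reading-room fragment's entry place with the lobby is what lets a prebuilt fragment act as a reusable building block without editing its internals.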

Metadata

Uncovering Hidden Qualities -- Benefits of Quality Measures for Automatically Generated Metadata (pp. 30-37)
  Sascha Tönnies; Wolf-Tilo Balke
Today, digital libraries increasingly rely on semantic techniques in the workflows of metadata generation, search and navigational access. However, due to the statistical and/or collaborative nature of such techniques, the quality of the automatically generated metadata is questionable. Since data quality is essential in digital libraries, we present a user study that evaluates both metrics for quality assessment and their benefit for the individual user during interaction. To observe the interaction of domain experts in the sample field of chemistry, we mapped the output of the abstract metrics for a sample semantic technique onto three different kinds of visualizations and asked the experts to evaluate these visualizations, first without and later augmented with the quality information. We show that the generated quality information is not only essential for data quality assurance in the curation step of digital libraries, but also helpful for designing intuitive interaction interfaces for end-users.
Keywords: Digital Libraries; Information Quality; Semantic Technologies
Query Transformation in a CIDOC CRM Based Cultural Metadata Integration Environment (pp. 38-45)
  Manolis Gergatsoulis; Lina Bountouri; Panorea Gaitanou; Christos Papatheodorou
The wide use of a number of cultural heritage metadata schemas calls for the development of new interoperability techniques that facilitate unified access to cultural resources. In this paper, we focus on ontology-based semantic integration and propose an expressive mapping language for specifying mappings between XML-based metadata schemas and the CIDOC CRM ontology. We also present an algorithm for transforming XPath queries posed on XML-based metadata into equivalent queries on the CIDOC CRM ontology.
Keywords: Metadata interoperability; semantic integration; query transformation; mapping languages; metadata schemas
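To make the query-transformation idea concrete, the following toy sketch rewrites a very restricted XPath fragment (an absolute path with an optional equality test on the element's value) by looking it up in a table of mapping rules. The EAD-like source paths, the chosen CRM paths and the function names are illustrative assumptions; the paper's mapping language and algorithm are considerably more expressive. The CRM identifiers used (E22 Man-Made Object, P102 has title, P108i was produced by, E12 Production, P14 carried out by, P4 has time-span) are standard CIDOC CRM classes and properties.

```python
import re
from typing import Optional, Tuple

# Hypothetical mapping rules: source XML paths (EAD-like) -> CIDOC CRM paths.
MAPPINGS = {
    "/archdesc/did/unittitle":
        "E22_Man-Made_Object/P102_has_title/E35_Title",
    "/archdesc/did/origination":
        "E22_Man-Made_Object/P108i_was_produced_by/E12_Production/"
        "P14_carried_out_by/E39_Actor",
    "/archdesc/did/unitdate":
        "E22_Man-Made_Object/P108i_was_produced_by/E12_Production/"
        "P4_has_time-span/E52_Time-Span",
}

# Supported fragment: an absolute path, optionally followed by [. = 'value'].
XPATH = re.compile(r"^(?P<path>/[^\[\]]+?)(?:\[\s*\.\s*=\s*'(?P<value>[^']*)'\s*\])?$")


def transform(xpath: str) -> Tuple[str, Optional[str]]:
    """Rewrite a restricted XPath query into (CRM path, value constraint or None)."""
    match = XPATH.match(xpath.strip())
    if match is None:
        raise ValueError(f"unsupported XPath fragment: {xpath!r}")
    path = match.group("path")
    if path not in MAPPINGS:
        raise KeyError(f"no mapping rule covers {path!r}")
    return MAPPINGS[path], match.group("value")


print(transform("/archdesc/did/unittitle[. = 'Letters']"))
# ('E22_Man-Made_Object/P102_has_title/E35_Title', 'Letters')
```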
User-Contributed Descriptive Metadata for Libraries and Cultural Institutions (pp. 46-54)
  Michael A. Zarro; Robert B. Allen
The Library of Congress and other cultural institutions are collecting highly informative user-contributed metadata in the form of comments and notes that express historical and factual information not previously associated with a resource. In this observational study, we identify a number of valuable annotations added to sets of images posted by the Library of Congress on the Flickr Commons. We propose a classification scheme to manage contributions and mitigate information overload. Implications for information retrieval and search are discussed. Additionally, the boundaries of a "collection" are becoming blurred as hyperlinks connect it to related resources outside the library collection, such as Wikipedia and locally relevant websites. Ideas for future projects are suggested, including interface design and institutional use of user-contributed information.
Keywords: Annotation; Descriptors; Metadata; Social Media

Multimedia IR

An Approach to Content-Based Image Retrieval Based on the Lucene Search Engine Library (pp. 55-66)
  Claudio Gennaro; Giuseppe Amato; Paolo Bolettieri; Pasquale Savino
Content-based image retrieval is becoming a popular way of searching digital libraries as the amount of available multimedia data increases. However, the cost of developing a robust and reliable system with content-based image retrieval facilities for large databases from scratch is quite prohibitive.
   In this paper, we propose to exploit an approach to approximate similarity search in metric spaces developed in [3,6]. The idea behind these techniques is that when two objects are very close to each other, they 'see' the world around them in the same way. Accordingly, we can use a measure of dissimilarity between the views of the world at different objects in place of the distance function of the underlying metric space. To employ this idea, the low-level image features (such as colors and textures) are converted into a textual form and indexed in an inverted index by means of the Lucene search engine library. Converting the features into textual form allows us to employ Lucene's off-the-shelf indexing and searching abilities with little implementation effort. In this way, we are able to set up a robust information retrieval system that combines full-text search with content-based image retrieval capabilities.
Keywords: Approximate Similarity Search; Access Methods; Lucene
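The surrogate-text idea can be sketched as follows, under several assumptions not taken from the paper: feature vectors are 2-D points compared with Euclidean distance, the reference objects (pivots) are chosen at random, and a plain term-frequency dot product stands in for Lucene's scoring (no Lucene API is used). Encoding the i-th closest pivot as a term repeated k - i times is one common way to turn a truncated permutation into text; the exact encoding in [3,6] may differ.

```python
import math
import random
from collections import Counter


def surrogate_text(features, pivots, k):
    """Encode a feature vector as text: the i-th closest pivot (i = 0, 1, ...)
    becomes a term repeated k - i times, so closer pivots weigh more."""
    order = sorted(range(len(pivots)), key=lambda i: math.dist(features, pivots[i]))
    terms = []
    for rank, pivot_id in enumerate(order[:k]):
        terms.extend([f"p{pivot_id}"] * (k - rank))
    return " ".join(terms)


def score(query_text, doc_text):
    """Dot product of term-frequency vectors (a stand-in for Lucene's scoring)."""
    q, d = Counter(query_text.split()), Counter(doc_text.split())
    return sum(tf * d[t] for t, tf in q.items())


random.seed(0)
pivots = [(random.random(), random.random()) for _ in range(20)]
images = {f"img{i}": (random.random(), random.random()) for i in range(100)}
index = {name: surrogate_text(vec, pivots, k=5) for name, vec in images.items()}

query = surrogate_text(images["img42"], pivots, k=5)
top = sorted(index, key=lambda name: score(query, index[name]), reverse=True)[:3]
print(top)
# The query image always attains the maximum score, 55 (= 5*5 + 4*4 + 3*3 + 2*2 + 1*1).
```

Objects whose nearest pivots appear in a similar order share many weighted terms, which is why a text engine's term matching can approximate the original metric-space neighbourhood.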
Evaluation Constructs for Visual Video Summaries (pp. 67-79)
  Stina Westman
This paper reports on a user-centered evaluation of visual video summaries. We evaluated four types of summaries (fast-forward, user-controlled fast-forward, scene clips and storyboard) with a set of existing performance and satisfaction measures. We further conducted a repertory grid elicitation with our participants to gather evaluation constructs related to both video summary content and controls. Results showed a lack of correlation between performance and satisfaction measures. User-supplied evaluation constructs were shown to span both the performance and satisfaction dimensions of the video summary evaluation space. Most constructs achieved moderate to good inter-rater agreement in a subsequent survey.
Keywords: video summarization; evaluation measures; repertory grid
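The abstract does not name the agreement statistic used in the survey. Assuming Fleiss' kappa, a common choice when several raters assign items to categories, the computation looks like this; the example ratings are invented for illustration.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters who put item i in category j.
    Every item must be rated by the same number of raters (at least two)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("need the same number (>= 2) of raters for every item")
    n_categories = len(counts[0])
    # Mean observed agreement across items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Agreement expected by chance from the overall category proportions.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical ratings: 3 raters judge whether each of 4 constructs applies (yes/no).
print(round(fleiss_kappa([[3, 0], [0, 3], [3, 0], [2, 1]]), 3))  # 0.625
```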
Visual Expression for Organizing and Accessing Music Collections in MusicWiz (pp. 80-91)
  Konstantinos A. Meintanis; Frank M. Shipman III
Music services, media players and managers provide support for content classification and access based on filtering metadata values, statistics of access, and user ratings. This approach fails to capture characteristics of mood and personal history that are often the deciding factor when creating personal playlists and collections in music. This paper presents MusicWiz, a music management environment that combines traditional metadata with spatial hypertext-based expression and automatically extracted characteristics of music to generate personalized associations between songs. MusicWiz's similarity inference engine combines the personal expression in the workspace with assessments of similarity based on the artists, other metadata, lyrics, and the audio signal to make suggestions and to generate playlists. An evaluation of MusicWiz with and without the workspace and suggestion capabilities showed significant differences for organizing and playlist creation tasks. The workspace features were more valuable for organizing tasks while the suggestion features had more value for playlist creation activities.
Keywords: Spatial hypertext; media managers; music recommendation
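As a rough sketch of how a similarity engine might blend several evidence sources, the code below combines per-facet similarities (artist match, lyric overlap, and proximity in the spatial hypertext workspace) into a weighted mean and uses it to chain a playlist greedily. The facets, the weights, the song dictionary layout and the greedy strategy are all hypothetical, not MusicWiz's actual inference engine; the audio-signal facet is omitted for brevity.

```python
import math


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def combined_similarity(x, y, weights):
    """Weighted mean of per-facet similarities, each in [0, 1]."""
    facets = {
        "artist": 1.0 if x["artist"] == y["artist"] else 0.0,
        "lyrics": jaccard(x["lyric_words"], y["lyric_words"]),
        # Spatial hypertext: songs placed close together in the workspace are related.
        "workspace": 1.0 / (1.0 + math.dist(x["pos"], y["pos"])),
    }
    return sum(weights[f] * s for f, s in facets.items()) / sum(weights.values())


def make_playlist(seed, songs, weights, length):
    """Greedy chaining: repeatedly append the unused song most similar to the last one."""
    playlist = [seed]
    remaining = [s for s in songs if s is not seed]
    while remaining and len(playlist) < length:
        nxt = max(remaining, key=lambda s: combined_similarity(playlist[-1], s, weights))
        playlist.append(nxt)
        remaining.remove(nxt)
    return playlist


songs = [
    {"title": "A", "artist": "X", "lyric_words": {"love", "rain"}, "pos": (0, 0)},
    {"title": "B", "artist": "Y", "lyric_words": {"sun"}, "pos": (100, 100)},
    {"title": "C", "artist": "X", "lyric_words": {"love", "night"}, "pos": (1, 0)},
]
weights = {"artist": 1.0, "lyrics": 1.0, "workspace": 1.0}
print([s["title"] for s in make_playlist(songs[0], songs, weights, 3)])  # ['A', 'C', 'B']
```

Raising the `workspace` weight would let a user's own spatial arrangement dominate the suggestions, which mirrors the abstract's point that personal expression and automatic similarity play complementary roles.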

Interaction and Interoperability

An Architecture for Supporting RFID-Enhanced Interactions in Digital Libraries (pp. 92-103)
  George Buchanan; Jennifer Pearson
In this paper, we report the design of an RFID sensing infrastructure for digital libraries. In addition to the architecture of the system, we report its deployment in three different applications to illustrate its use and integration with not only the core DL software, but also web browsers and software for reading documents (e.g. in PDF format). Through this, we demonstrate the utility of RFID support across the entire information seeking cycle.
New Evidence on the Interoperability of Information Systems within UK Universities (pp. 104-115)
  Kathleen Menzies; Duncan Birrell; Gordon Dunsire
This paper reports on the key findings and implications of the JISC-funded Online Catalogue and Repository Interoperability Study (OCRIS), a three-month project that investigated the interoperability of Online Public Access Catalogues (OPACs) and Institutional Repositories (IRs) within UK Higher Education Institutions (HEIs). The aims and objectives of the project included: surveying the extent to which repository content is in scope for OPACs and the extent to which it is already recorded there; listing the various services offered by these systems to managers, researchers, teachers and learners; and identifying the potential for improvements in the links from repositories and/or OPACs to other institutional services such as finance or research administration.
   The project combined quantitative and qualitative methods: primarily an online questionnaire distributed to staff within 85 UK HEIs, purposive sampling, and two in-depth case studies conducted at the Universities of Cambridge and Glasgow.
Keywords: Interoperability; digital libraries; repositories; catalogues; standards; resource discovery platforms
Enhancing Digital Libraries with Social Navigation: The Case of Ensemble (pp. 116-123)
  Peter Brusilovsky; Lillian N. Cassel; Lois M. L. Delcambre; Edward A. Fox; Richard Furuta; Daniel D. Garcia; Frank M. Shipman III; Paul Logasa Bogen II; Michael Yudelson
A traditional library is a social place; however, the social nature of the library is typically lost when the library goes digital. This paper argues that social navigation, an important group of social information access techniques, could be used to replicate some social features of traditional libraries and to enhance the user experience. Using the case of Ensemble, a major educational digital library, the paper describes how social navigation could be used to extend digital library portals, how social wisdom can be collected, and how it can be used to guide portal users to valuable resources.
Keywords: social navigation; digital library; portal; navigation support

Digital Preservation

Automating Logical Preservation for Small Institutions with Hoppla (pp. 124-135)
  Stephan Strodl; Petar Petrov; Michael Greifeneder; Andreas Rauber
Preserving digital information over the long term is becoming increasingly important for a large number of institutions. The required expertise and limited tool support discourage small institutions in particular from operating archives with digital preservation capabilities. Hoppla is an archiving solution that combines back-up and fully automated migration services for data collections in environments with limited expertise and resources for digital preservation. The system allows user-friendly handling of services and outsources digital preservation expertise. This paper presents the automated logical preservation process of the Hoppla archiving system in detail. It describes the recommendation process for appropriate preservation strategies via a web update service. Two real-world case studies were conducted based on an initial rule set focused on common office documents. The promising results support the novel approach of automating logical preservation by outsourcing expertise.
Estimating Digitization Costs in Digital Libraries Using DiCoMo (pp. 136-147)
  Alejandro Bia; Rafael Muñoz; Jaime Gómez
Estimating digitization costs is a very difficult task. It is hard to make exact predictions because of the great number of unknown factors. However, digitization projects need a precise idea of the economic costs and the time involved in producing their content. Common practice when starting to digitize a new collection is to set a schedule and make a firm commitment to meet it (both in terms of cost and deadlines), even before the actual digitization work starts. As with software development projects, incorrect estimates cause delays and cost overruns.
   Based on software engineering methods for predicting software development costs, such as COCOMO and Function Points, and using historical data gathered over five years at the Miguel de Cervantes Digital Library during the digitization of more than 12,000 books, we have developed a time and cost estimation method named DiCoMo (Digitization Costs Model) for digital content production in general. This method can be adapted to different production processes, such as the production of digital XML or HTML texts using scanning and OCR followed by human proofreading and error correction, or the production of digital facsimiles (scanning without OCR). The accuracy of the estimates improves over time, since the algorithms can be tuned using historical data gathered from previous tasks.
Keywords: Cost and time estimates; Digitization; Contents Production; DL Project management
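The abstract does not give DiCoMo's equations. In the spirit of basic COCOMO, where effort grows as a power of project size, the following sketch fits effort = a * pages^b to past projects by least squares in log space and scales the nominal estimate by cost-driver multipliers. Using pages as the size measure, the function names, the multipliers and the synthetic history are assumptions for illustration only.

```python
import math


def calibrate(history):
    """Fit hours = a * pages**b to (pages, hours) pairs by least squares on the logs."""
    xs = [math.log(pages) for pages, _ in history]
    ys = [math.log(hours) for _, hours in history]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b


def estimate(pages, a, b, multipliers=()):
    """Nominal effort a * pages**b, scaled by cost-driver multipliers
    (e.g. > 1 for fragile originals or heavy proofreading)."""
    effort = a * pages ** b
    for m in multipliers:
        effort *= m
    return effort


# Synthetic history generated from hours = 0.5 * pages**1.1, for illustration only.
history = [(p, 0.5 * p ** 1.1) for p in (100, 200, 400, 800)]
a, b = calibrate(history)
print(round(a, 3), round(b, 3))  # 0.5 1.1
print(estimate(300, a, b, multipliers=(1.2,)))
```

Re-running `calibrate` as new projects complete is one simple way to obtain the effect described above, where estimates improve as historical data accumulates.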
In Pursuit of an Expressive Vocabulary for Preserved New Media Art (pp. 148-155)
  Andrew McHugh; Leonidas Konstantelos
The nature of new media, interactive and performance art appears to complicate our ability to follow conventional preservation approaches. Documentation of digital art materials has been determined to be an appropriate means of resolving associated difficulties, but this demands high levels of expressiveness to support the encapsulation of the myriad elements and qualities of content and context that may influence value and reproducibility. We discuss a proposed Vocabulary for Preserved New Media Works, a means of encapsulating the various information and material dimensions implicit within a work and required to ensure its ongoing availability.

Social Web/Web 2.0

Privacy-Aware Folksonomies (pp. 156-167)
  Clemens Heidinger; Erik Buchmann; Matthias Huber; Klemens Böhm; Jörn Müller-Quade
Many popular web sites use folksonomies to let people label objects like images (Flickr), music (Last.fm), or URLs (Delicious) with schema-free tags. Folksonomies may reveal personal information. For example, tags can contain sensitive information, the set of tagged objects might disclose interests, etc. While many users call for sophisticated privacy mechanisms, current folksonomy systems provide coarse mechanisms at most, and the system provider has access to all information. This paper proposes a privacy-aware folksonomy system. Our approach consists of a partitioning scheme that distributes the folksonomy data among four providers and makes use of encryption. A key sharing mechanism allows a user to control which party is able to access which data item she has generated. We prove that our approach generates folksonomy databases that are indistinguishable from databases consisting of random tuples.
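The abstract does not spell out the partitioning or key-sharing scheme. As a generic illustration of how data split across several providers can look like random bytes to each of them, the sketch below uses textbook n-of-n XOR secret sharing with four shares. This is a swapped-in technique, not the authors' construction, and the example tag is hypothetical.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split(secret: bytes, n: int = 4) -> list:
    """n-of-n XOR sharing: n - 1 uniformly random shares plus one share chosen so
    that XOR-ing all n shares gives back the secret. Any n - 1 shares reveal nothing."""
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    shares.append(reduce(xor_bytes, shares, secret))
    return shares


def combine(shares: list) -> bytes:
    return reduce(xor_bytes, shares)


tag = "health:diabetes".encode()
shares = split(tag)            # one share per (hypothetical) provider
assert combine(shares) == tag  # only all four shares together recover the tag
```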
Seamless Web Editing for Curated Content (pp. 168-175)
  David Bainbridge; Brook J. Novak
In this paper we present a new framework for editing, called Seaweed (short for seamless web editing), which enables authors to directly edit content on web pages within any common web browser -- much like a word processor -- without the need to switch between modes. There are numerous ways to utilise the technique. This article reports on work integrating it with blogging software to support the direct creation and editing of curated content, and on its subsequent evaluation through two field trials.
Automatic Classification of Social Tags (pp. 176-183)
  Christian Wartena
Collaborative tagging has become popular in recent years. As several studies have noted, completely different types of tags are found: tags can either refer to the personal usage context of a tagger or describe the tagged object. We investigate different types of tags found in LibraryThing, an online service in which books are tagged, and define a number of features that are typical for some of these classes. Finally, we show how these features can be used to classify tags automatically.

Search in Digital Libraries

Exploring the Impact of Search Interface Features on Search Tasks (pp. 184-195)
  Abdigani Diriye; Ann Blandford; Anastasios Tombros
There is growing recognition that exploratory search is less well supported by existing search interfaces than known-item search. In this paper, we report on a study in which three interfaces providing different levels of search support were developed and tested for both known-item and exploratory search tasks. A rich qualitative analysis of participants' search behaviours and perceptions was conducted. As expected, the simplest interface provided better support for known-item than for exploratory search tasks. Conversely, richer search interface features were found to provide better support for exploratory search, but distracted people from the objective of more clearly defined search tasks. This study provides preliminary evidence that searching is most effective when supported by an interface tailored to the search activities of the task.
Relevance in Technicolor (pp. 196-207)
  Ulises Cerviño Beresi; Yunhyong Kim; Dawei Song; Ian Ruthven; Mark Baillie
In this article we propose the concept of relevance criteria profiles, which provide a global view of user behaviour in judging the relevance of retrieved information. We further propose a plotting technique that provides a session-based overview of the relevance judgement process interlaced with interactions, allowing the researcher to visualise and quickly detect emerging patterns in both interactions and relevance criteria usage. Using data from a user study conducted between January and August 2008, we discuss by example how these tools support a better understanding of task-based user valuation of documents, which is likely to lead to recommendations for improving end-user services in digital libraries.
Application of Session Analysis to Search Interface Design (pp. 208-215)
  Cathal Hoare; Humphrey Sorensen
Evaluations of search features used in digital library environments are generally results-centric, focussing on the outcome of an evaluation -- for example, the number of relevant documents retrieved -- rather than on understanding why that result was achieved. This paper explores how search feature development benefits from user-centered evaluation. By examining the application of an established web analytics technique, session analysis, to the development of search features and interfaces, we show that designers can better understand how users conduct evaluation tasks. The feedback provided by this technique allows for clearer evaluation of an interface and supports iteratively evolving designs based on empirical data.
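Session analysis first has to cut an interaction log into sessions. A minimal sketch, assuming each log record is a (user, timestamp, action) tuple and using the 30-minute inactivity timeout that is a common web analytics convention; the paper's own session definition and the action names are not given and may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(events, timeout=timedelta(minutes=30)):
    """Split (user, timestamp, action) events into per-user sessions.
    A new session starts after more than `timeout` of inactivity."""
    by_user = defaultdict(list)
    for user, ts, action in sorted(events, key=lambda e: (e[0], e[1])):
        by_user[user].append((ts, action))
    sessions = []
    for user, items in by_user.items():
        current = [items[0][1]]
        for (prev_ts, _), (ts, action) in zip(items, items[1:]):
            if ts - prev_ts > timeout:  # long gap: close the running session
                sessions.append((user, current))
                current = []
            current.append(action)
        sessions.append((user, current))
    return sessions


t0 = datetime(2010, 9, 6, 9, 0)
log = [
    ("u1", t0, "query"),
    ("u1", t0 + timedelta(minutes=2), "facet_filter"),
    ("u1", t0 + timedelta(hours=2), "query"),
    ("u2", t0, "query"),
]
for user, actions in sessionize(log):
    print(user, actions)
```

The resulting per-session action sequences can then be compared across interface variants, for example how often a feature is used before the first result is opened.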

(Meta) Analysis of Digital Libraries

An Analysis of the Evolving Coverage of Computer Science Sub-fields in the DBLP Digital Library (pp. 216-227)
  Florian Reitz; Oliver Hoffmann
Many scientists and research groups use the DBLP bibliography collection in various ways. Most of them are unaware of its internal structure, although it can significantly influence their results. Prior work has shown that the collection does not cover all sub-fields of computer science with the same quality, but has not explained these differences. We introduce an extension of the DBLP data set that gives us a detailed picture of how DBLP has evolved since 1995. We show that the project started with a narrow focus on two sub-fields and discuss how additional themes have been added in recent years. We analyze the relations between sub-fields at different times and provide a model that explains the differences in coverage.
Analysis of Computer Science Communities Based on DBLP (pp. 228-235)
  Maria Biryukov; Cailing Dong
It is popular nowadays to bring techniques from bibliometrics and scientometrics into the world of digital libraries to explore the mechanisms underlying community development. In this paper we use the DBLP data to investigate authors' scientific careers and analyze some of the computer science communities. We compare them in terms of productivity and population stability, and use these features to compare the sets of top-ranked conferences with their lower-ranked counterparts.
Keywords: bibliographic databases; author profiling; scientific communities; bibliometrics
Citation Graph Based Ranking in Invenio (pp. 236-247)
  Ludmila Marian; Jean-Yves LeMeur; Martin Rajman; Martin Vesely
Invenio is the web-based integrated digital library system developed at CERN. Within this framework, we present four types of ranking models based on the citation graph that complement the simple approach based on citation counts: time-dependent citation counts, a relevancy ranking that extends the PageRank model, a time-dependent ranking that combines the freshness of citations with PageRank, and a ranking that takes external citations into consideration. We present our analysis and results obtained on two main data sets: Inspire and CERN Document Server. Our main contributions are: (i) a study of the currently available ranking methods based on the citation graph; (ii) the development of new ranking methods that correct some of the identified limitations of the current methods, such as treating all citations as equally important, not taking time into account, or considering the citation graph complete; (iii) a detailed study of the key parameters for these ranking methods.
Keywords: CDS; Invenio; Inspire; citation graph; PageRank; external citations; time decay
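The sketch below illustrates two of the ranking ideas listed above under stated assumptions: a time-dependent citation count in which each citation is discounted by the age of the citing paper (an exponential decay with a hypothetical 5-year half-life), and a weighted PageRank over the citation graph whose edge weights can encode citation freshness. External citations are not modelled, and the function names are illustrative rather than Invenio's API.

```python
def decay(age_years, half_life=5.0):
    """Exponential time decay; the 5-year half-life is an illustrative choice."""
    return 0.5 ** (age_years / half_life)


def decayed_citation_count(paper, citations, year, now):
    """Time-dependent citation count: recent citing papers count more.
    citations is a list of (citing, cited) pairs; year maps papers to years."""
    return sum(decay(now - year[src]) for src, dst in citations if dst == paper)


def pagerank(nodes, citations, weight=lambda src, dst: 1.0, d=0.85, iters=100):
    """Weighted PageRank on the citation graph. Papers without outgoing
    citations spread their score uniformly. Weights must be positive."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: 0.0 for v in nodes}
    for src, dst in citations:
        out_weight[src] += weight(src, dst)
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for src, dst in citations:
            new[dst] += d * rank[src] * weight(src, dst) / out_weight[src]
        rank = new
    return rank


year = {"A": 2000, "B": 2005, "C": 2010}
citations = [("B", "A"), ("C", "A"), ("C", "B")]
print(decayed_citation_count("A", citations, year, now=2010))  # 1.5
fresh = pagerank(list(year), citations, weight=lambda src, dst: decay(2010 - year[src]))
```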

Query Log Analysis

A Search Log-Based Approach to Evaluation (pp. 248-260)
  Junte Zhang; Jaap Kamps
Anyone offering content in a digital library is naturally interested in assessing its performance: how well does my system meet the users' information needs? Standard evaluation benchmarks have been developed in information retrieval that can be used to test retrieval effectiveness. However, these generic benchmarks focus on a single document genre, language, media type, and searcher stereotype that is radically different from the unique content and user community of a particular digital library. This paper proposes to derive a domain-specific test collection from readily available interaction data in search log files that captures the domain specificity of digital libraries. As a case study, we use an archival institution's complete search log spanning multiple years, and derive a large-scale test collection. We manually derive a set of topics judged by human experts -- based on a set of e-mail reference questions and responses from archivists -- and use this for validation. Our main finding is that we can derive a reliable and domain-specific test collection from search log files.
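A minimal sketch of turning a click log into topics with relevance judgements (qrels), assuming records of the form (session, query, clicked document) and a hypothetical threshold on the number of distinct sessions that clicked a document for the same normalised query. The paper's actual derivation procedure is not described in the abstract and may differ; the example records are invented.

```python
from collections import defaultdict


def derive_test_collection(log, min_sessions=2):
    """Build topics and qrels from (session, query, clicked_doc) records: a document
    is judged relevant to a normalised query if at least `min_sessions` distinct
    sessions clicked it for that query."""
    clicks = defaultdict(lambda: defaultdict(set))
    for session, query, doc in log:
        topic = " ".join(query.lower().split())  # case- and whitespace-normalised
        clicks[topic][doc].add(session)
    qrels = {}
    for topic, docs in clicks.items():
        relevant = {doc for doc, sessions in docs.items() if len(sessions) >= min_sessions}
        if relevant:
            qrels[topic] = relevant
    return qrels


log = [
    ("s1", "Dutch  East India Company", "inv-1"),
    ("s2", "dutch east india company", "inv-1"),
    ("s3", "dutch east india company", "inv-2"),
    ("s1", "maps", "map-9"),
]
print(derive_test_collection(log))  # {'dutch east india company': {'inv-1'}}
```

Requiring agreement across several sessions trades recall of judgements for fewer accidental clicks; the expert-judged topics mentioned above would then serve to validate such a choice.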
Determining Time of Queries for Re-ranking Search Results (pp. 261-272)
  Nattiya Kanhabua; Kjetil Nørvåg
Recent work on analyzing query logs shows that a significant fraction of queries are temporal, i.e., their relevance depends on time, and temporal queries play an important role in many domains, e.g., digital libraries and document archives. Temporal queries can be divided into two types: 1) those with temporal criteria explicitly provided by users, and 2) those with no temporal criteria provided. In this paper, we deal with the latter type, i.e., queries that comprise only keywords and whose relevant documents are associated with particular time periods not given by the queries. We propose a number of methods to determine the time of queries using temporal language models. We then show how to increase retrieval effectiveness by using the determined time of queries to re-rank the search results. Through extensive experiments, we show that our proposed approaches improve retrieval effectiveness.
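The following is a simplified stand-in for determining the time of a keyword-only query, not the authors' exact temporal language models: it builds one unigram model per year from time-stamped documents, picks the year under which the query is most likely (Jelinek-Mercer smoothing against an add-one-smoothed background model), and re-ranks results by adding a fixed bonus to documents from that year. The parameters `lam` and `boost`, all function names and the example documents are assumptions.

```python
import math
from collections import Counter


def build_models(docs):
    """docs: (year, text) pairs. Returns per-year term counts and background counts."""
    per_year, background = {}, Counter()
    for year, text in docs:
        terms = text.lower().split()
        per_year.setdefault(year, Counter()).update(terms)
        background.update(terms)
    return per_year, background


def query_time(query, per_year, background, lam=0.1):
    """Return the year whose smoothed unigram model gives the query the highest likelihood."""
    bg_total, vocab = sum(background.values()), len(background)

    def log_likelihood(counts):
        total = sum(counts.values())
        score = 0.0
        for term in query.lower().split():
            p_year = counts[term] / total
            p_bg = (background[term] + 1) / (bg_total + vocab)  # add-one smoothed
            score += math.log((1 - lam) * p_year + lam * p_bg)
        return score

    return max(per_year, key=lambda y: log_likelihood(per_year[y]))


def rerank(results, query_year, boost=0.5):
    """results: (doc_id, score, year) triples; documents from the query's year get a bonus."""
    return sorted(results, key=lambda r: r[1] + (boost if r[2] == query_year else 0.0), reverse=True)


docs = [
    (2004, "tsunami indian ocean earthquake"),
    (2004, "tsunami relief aid"),
    (2008, "beijing olympics games"),
    (2008, "financial crisis lehman"),
]
per_year, background = build_models(docs)
qyear = query_time("tsunami aid", per_year, background)
print(qyear)  # 2004
print(rerank([("a", 1.0, 2008), ("b", 0.8, 2004)], qyear))  # [('b', 0.8, 2004), ('a', 1.0, 2008)]
```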
Ranking Entities Using Web Search Query Logs (pp. 273-281)
  Bodo Billerbeck; Gianluca Demartini; Claudiu S. Firan; Tereza Iofciu; Ralf Krestel
Searching for entities is an emerging task in Information Retrieval, in which the goal is to find well-defined entities instead of documents matching the query terms. In this paper we propose a novel approach to Entity Retrieval that uses Web search engine query logs. We use Markov random walks on (1) Click Graphs -- built from clickthrough data -- and (2) Session Graphs -- built from user session information. We thus provide semantic bridges between different query terms, and therefore indicate meaningful connections between Entity Retrieval queries and related entities.
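A minimal sketch of a Markov random walk on a click graph, assuming a bipartite graph of queries and clicked URLs (entity pages) whose edge weights are click counts, plus a fixed self-transition probability. The walk length, the `stay` value and the example data are hypothetical. Ranking URL nodes by the resulting probability suggests entities related to the starting query; a session graph could be handled the same way, with edges between queries issued in the same session.

```python
from collections import defaultdict


def random_walk(clicks, start_query, steps=3, stay=0.2):
    """Forward random walk on a bipartite click graph.
    clicks maps (query, url) -> click count; nodes are ("q", query) and ("u", url).
    At each step the walker stays put with probability `stay`, otherwise it moves
    to a neighbour with probability proportional to the click count."""
    neighbours = defaultdict(dict)
    for (query, url), count in clicks.items():
        neighbours[("q", query)][("u", url)] = count
        neighbours[("u", url)][("q", query)] = count
    dist = {("q", start_query): 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in dist.items():
            total = sum(neighbours[node].values())
            if total == 0:  # isolated node: keep its probability mass
                nxt[node] += p
                continue
            nxt[node] += stay * p
            for other, count in neighbours[node].items():
                nxt[other] += (1 - stay) * p * count / total
        dist = nxt
    return dict(dist)


clicks = {
    ("jaguar speed", "wiki/Jaguar"): 3,
    ("big cats", "wiki/Jaguar"): 2,
    ("big cats", "wiki/Lion"): 5,
}
dist = random_walk(clicks, "jaguar speed")
entities = sorted((n for n in dist if n[0] == "u"), key=dist.get, reverse=True)
print([url for _, url in entities])  # ['wiki/Jaguar', 'wiki/Lion']
```

The Lion page is never clicked for the starting query, yet it receives probability through the shared query "big cats", which is the kind of semantic bridge the abstract describes.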

Cooperative Work in DLs

Examining Group Work: Implications for the Digital Library as Sharium (pp. 282-293)
  Sandra Toze; Elaine G. Toms
Digital libraries have the potential to be rich interactive environments or "shariums" that support students who work in groups to complete course work. To understand how DLs might realize this potential, the processes of a single group working on a complex project over a semester were analyzed. Findings suggest that groups perform a range of tasks, including administrative, communication, and information seeking and retrieval tasks, and use multiple tools and artifacts to accomplish their work. Over the course of the work, activities shift from individual to group work, illustrating the need for a complex system that intertwines public and private workspaces. Currently DLs provide only one tool -- search -- that a group might use, but do not fully support group work.
Keywords: collaboration; group work; design; digital library; methodology
Architecture for a Collaborative Research Environment Based on Reading List Sharing (pp. 294-306)
  Gabriella Kazai; Paolo Manghi; Katerina Iatropoulou; Tim Haughton; Marko Mikulicic; Antonis Lempesis; Natasa Milic-Frayling; Natalia Manola
Scholarly research involves a systematic study of information sources in order to establish facts and reach new conclusions. It encompasses survey, analysis, evaluation, and creation as distinct phases that are performed iteratively and often in parallel by accessing a range of local and remote resources. Throughout these activities scholars create collections of relevant work, ranging from publication references to new information acquired through experiments or correspondence with other scholars. We use the term reading list to refer to such collections. Existing software packages and web services for managing publication lists, like CiteULike, lack integration with researchers' workflows, which may require access to both desktop and online resources. In this paper we describe the architecture and system design of ScholarLynk, a desktop tagging tool that enables researchers to build and maintain reading lists across distributed data stores in collaboration with other researchers.
Keywords: Desktop tagging tool; scholarly research; reading lists
CritSpace: A Workspace for Critical Engagement within Cultural Heritage Digital Libraries (pp. 307-314)
  Neal Audenaert; George Lucchese; Richard Furuta
Cultural heritage digital libraries hold promise both as a new tool for representing the complex information structures frequently found in the humanities and social sciences and as interactive environments that enable scholars to work with this information in new ways throughout the research project. Much attention has been paid to digitization, textual encoding, metadata and dissemination of digital cultural heritage data. Scholars now routinely turn toward electronic sources as a first step in their information finding process. Considerably less attention, however, has been devoted to understanding how to support the formative stages of scholarly research.
   In this paper, we highlight our findings from a formative user study of scholarly analysis of source documents in several different fields. We discuss the implications of these results for our current research into designing a web-based creativity support environment for cultural heritage digital libraries.

Ontologies

German Encyclopedia Alignment Based on Information Retrieval Techniques BIBAFull-Text 315-326
  Roman Kern; Michael Granitzer
Collaboratively created online encyclopedias have become increasingly popular. Especially in terms of completeness, they have begun to surpass their printed counterparts. Two German publishers of traditional encyclopedias have reacted to this challenge and decided to merge their corpora to create a single, more complete encyclopedia. The crucial step in this merge process is the alignment of articles. We have developed a system to identify corresponding entries in different encyclopedic corpora. At the core of our system is an alignment algorithm that incorporates various techniques developed in the field of information retrieval. We have evaluated the system on four real-world encyclopedias with a ground truth provided by domain experts. A combination of weighting and ranking techniques has been found to deliver satisfactory performance.
Lightweight Parsing of Classifications into Lightweight Ontologies BIBAFull-Text 327-339
  Aliaksandr Autayeu; Fausto Giunchiglia; Pierre Andrews
Understanding metadata written in natural language is a prerequisite for the successful automated integration of large-scale, language-rich classifications such as the ones used in digital libraries. We analyze the natural language labels within classifications by exploring their syntactic structure, and then show how this structure can be used to detect language patterns that a lightweight parser can process with an average accuracy of 96.82%. This allows for a deeper understanding of natural language metadata semantics, which, as we show, can improve by almost 18% the accuracy of the automatic translation of classifications into the lightweight ontologies required by semantic matching, search and classification algorithms.
Measuring Effectiveness of Geographic IR Systems in Digital Libraries -- Evaluation Framework and Case Study BIBAFull-Text 340-351
  Damien Palacio; Guillaume Cabanac; Christian Sallaberry; Gilles Hubert
Common search engines process users' queries (i.e., information needs) by retrieving documents from pre-built term-based indexes. For digital libraries, such approaches are limited in particular contexts, such as specialized collections (e.g., cultural heritage collections) or specific retrieval criteria (e.g., multidimensional criteria). In this paper, we consider Information Retrieval systems exploiting geographic dimensions: spatial, temporal, and topical dimensions. Our contribution is twofold: we propose a Geographic Information Retrieval system evaluation framework, and we test the following hypothesis: combining spatial and temporal dimensions along with the topical dimension improves the effectiveness of Information Retrieval systems.
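The hypothesis above, combining spatial and temporal evidence with topical evidence, can be illustrated with a small score-fusion sketch. The abstract does not say how the dimensions are combined; CombSUM over min-max normalized scores is used here only as a common, assumed stand-in, and the function names and toy scores are hypothetical.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one run's scores to [0, 1] so runs on different scales are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(*runs: dict[str, float]) -> list[tuple[str, float]]:
    """CombSUM: add the normalized scores each document receives in every run."""
    fused: dict[str, float] = {}
    for run in runs:
        for doc, s in min_max(run).items():
            fused[doc] = fused.get(doc, 0.0) + s
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-dimension scores for three documents.
topical = {"d1": 2.0, "d2": 1.0, "d3": 0.5}
spatial = {"d2": 0.9, "d3": 0.1}
temporal = {"d2": 5.0, "d1": 1.0}
print(comb_sum(topical, spatial, temporal))
```

A document missing from a dimension simply receives nothing for it, so documents matched on several dimensions (here `d2`) tend to rise to the top.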

Domain-Specific DLs

A Visual Digital Library Approach for Time-Oriented Scientific Primary Data BIBAKFull-Text 352-363
  Jürgen Bernard; Jan Brase; Dieter W. Fellner; Oliver Koepler; Jörn Kohlhammer; Tobias Ruppert; Tobias Schreck; Irina Sens
Digital Library support for textual and certain types of non-textual documents has significantly advanced in recent years. While Digital Library support implies many aspects along the whole library workflow model, interactive and visual retrieval allowing effective query formulation and result presentation are important functions. Recently, new kinds of non-textual documents that merit Digital Library support but cannot yet be accommodated by existing Digital Library technology have come into focus. Scientific primary data, as produced, for example, by scientific experimentation, earth observation, or simulation, is such a data type. We report on a concept and first implementation of Digital Library functionality supporting visual retrieval and exploration in an important specific class of scientific primary data, namely, time-oriented data. The approach is developed in an interdisciplinary effort by experts from the library, natural sciences, and visual analytics communities. In addition to presenting the concept and discussing relevant challenges, we present results from a first implementation of our approach as applied to a real-world scientific primary data set.
Keywords: Visual Analysis; Visual Search; Content-Based Search; Scientific Primary Data; Visual Cluster Analysis
DINAH, A Philological Platform for the Construction of Multi-structured Documents BIBAFull-Text 364-375
  Pierre-Edouard Portier; Sylvie Calabretto
We consider how the construction of multi-structured documents implies the definition of structuration vocabularies. In a multi-user context, the growth of these vocabularies has to be controlled. Therefore, we propose using the trace of users' activity to limit this growth and document the vocabularies. A user will, for example, be able to follow and annotate the track of a vocabulary concept, from its creation to the last time it was used. From a broader point of view, this work is grounded in our Web-based philological platform, DINAH, and is mainly motivated by our collaboration with a group of philosophers studying the handwritten manuscripts of Jean-Toussaint Desanti.
The PROBADO Project -- Approach and Lessons Learned in Building a Digital Library System for Heterogeneous Non-textual Documents BIBAFull-Text 376-383
  René Berndt; Ina Blümel; Michael Clausen; David Damm; Jürgen Diet; Dieter W. Fellner; Christian Fremerey; Reinhard Klein; Frank Krahl; Maximilian Scherer
The PROBADO project is a research effort to develop and operate advanced Digital Library support for non-textual documents. The main goal is to contribute to all parts of the Digital Library workflow, from content acquisition through indexing to search and presentation. While not limited in terms of supported document types, reference support is being developed for digital classical music and 3D architectural models. In this paper, we review the overall goals, approaches taken, and lessons learned so far in a highly integrated effort of university researchers and library experts. We address the problem of technology transfer, aspects of repository compilation, and the problem of inter-domain retrieval. These experiences are relevant for other project efforts in the non-textual Digital Library domain.

Posters

Capacity-Constrained Query Formulation BIBAFull-Text 384-388
  Matthias Hagen; Benno Stein
Given a set of keyphrases, we analyze how Web queries containing these phrases can be formed that, taken together, return a specified number of hits. The motivating use case is a plagiarism detection system that searches the Web for potentially plagiarized passages in a given suspicious document. For the query formulation problem we develop a heuristic search strategy based on co-occurrence probabilities. Compared to the maximal termset strategy [3], which can be considered the most sensible non-heuristic baseline, our expected savings are 50% on average when queries for 9 or 10 phrases are to be constructed.
AAT-Taiwan: Toward a Multilingual Access to Cultural Objects BIBAKFull-Text 389-392
  Shu-Jiun Chen; Diane Wu; Pei-Wen Peng; Yung-Ting Chang
This paper reports on current collaborative work between the Taiwan e-Learning and Digital Archives Program (TELDAP) and the Getty Research Institute (GRI) in developing the Chinese-language Art & Architecture Thesaurus (AAT-Taiwan), which supports the unification of terminology used by various archiving institutions for describing identical concepts. This work aims to establish a conceptual framework for the digital library by providing controlled vocabularies to index and catalogue the collection. With its multilingual nature, AAT-Taiwan is able to bridge Western and Eastern cultures in an integrated framework and make our resources accessible worldwide. With its hierarchical structure, it also enhances the effectiveness and comprehensiveness of information retrieval in digital libraries.
Keywords: digital library; multilingual thesaurus; knowledge organization system
Using Pattern Language as a Framework for Future Metadata Structure BIBAFull-Text 393-396
  Esben Agerbæk Black
In the 1970s Christopher Alexander envisioned the "pattern language". It embodies an underlying philosophy [1] of what to accomplish by using pattern language; it is this philosophy that we tap into and apply to metadata planning.
   Different collections need different metadata to be of future use, and this metadata has a structure. We aim to reuse knowledge of these structures and to standardize their creation. We further believe pattern language will ease the transition of existing digital collections.
i-TEL-u: A Query Suggestion Tool for Integrating Heterogeneous Contexts in a Digital Library BIBAFull-Text 397-400
  Maristella Agosti; Davide Cisco; Giorgio Maria Di Nunzio; Ivano Masiero; Massimo Melucci
This paper presents the design, implementation and evaluation of a query suggestion tool (named i-TEL-u) that allows for the management and the exploitation of different contexts in an integrated way within the same search interface for accessing the contents of The European Library portal. i-TEL-u allows users to seamlessly move from one context to another according to their information needs and to the way these needs evolve during the search session. The aim of this tool is to improve the search functionalities of the portal, attract many users and give them easy and effective access.
The Planets Testbed -- A Collaborative Research Environment for Digital Preservation BIBAFull-Text 401-404
  Brian Aitken; Seamus Ross; Andrew Lindley; Edith Michaeler; Andrew N. Jackson; Maurice van den Dobbelsteen
The digital objects that are so fundamental to 21st century life may have a precarious future due to the rapid pace of technological change. Digital preservation specialists have proposed an almost overwhelming variety of preservation actions and tools that may help to mitigate this risk, but there is a lack of empirical evidence to help librarians, archivists and non-specialists to make an informed decision about the most applicable and effective preservation tools. The Planets project has developed a digital preservation Testbed that aims to provide such an evidence-base.
A Functionality Perspective on Digital Library Interoperability BIBAFull-Text 405-408
  George Athanasopoulos; Edward A. Fox; Yannis E. Ioannidis; George Kakaletris; Natalia Manola; Carlo Meghini; Andreas Rauber; Dagobert Soergel
Digital Library (DL) interoperability requires addressing a variety of issues associated with functionality. We report on the analysis and solutions identified by the Functionality Working Group of the DL.org project during its deliberations on DL interoperability. Ultimately, we hope that work based on our perspective will lead to improved architectures and software, as well as to greater interoperability, for next-generation DL systems.
Overview and Results of the INEX 2009 Interactive Track BIBAFull-Text 409-412
  Thomas Beckers; Norbert Fuhr; Nils Pharo; Ragnar Nordlie; Khairun Nisa Fachry
We present results of the INEX 2009 Interactive Track, which focussed on how users behave in interactive search systems. Three types of work tasks based on a collection of book metadata were studied. The results show differences with respect to the task types and point out improvements and new research questions for the next track in 2010.
SciPlore Xtract: Extracting Titles from Scientific PDF Documents by Analyzing Style Information (Font Size) BIBAKFull-Text 413-416
  Jöran Beel; Bela Gipp; Ammar Shaker; Nick Friedrich
Extracting titles from a PDF's full text is an important information retrieval task for identifying PDFs. Existing approaches apply complicated and expensive (in terms of computing power) machine learning algorithms such as Support Vector Machines and Conditional Random Fields. In this paper we present a simple rule-based heuristic, which considers style information (font size) to identify a PDF's title. In a first experiment we show that this heuristic delivers better results (77.9% accuracy) than a support vector machine by CiteSeer (69.4% accuracy) in an 'academic search engine' scenario, as well as shorter run times (8:19 minutes vs. 57:26 minutes).
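The font-size idea can be sketched in a few lines. This is a minimal illustration, not SciPlore Xtract's actual rule set: it assumes the PDF's first page has already been parsed into lines with their font sizes, and the `TextLine` type and `tolerance` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    """One line of text from a PDF's first page (hypothetical input format)."""
    text: str
    font_size: float


def extract_title(lines: list[TextLine], tolerance: float = 0.1) -> str:
    """Return the text set in the largest font on the page.

    Consecutive lines whose size is within `tolerance` points of the maximum
    are joined, because long titles often wrap over several lines.
    """
    candidates = [ln for ln in lines if ln.text.strip()]
    if not candidates:
        return ""
    max_size = max(ln.font_size for ln in candidates)
    title_parts: list[str] = []
    for ln in candidates:
        if abs(ln.font_size - max_size) <= tolerance:
            title_parts.append(ln.text.strip())
        elif title_parts:
            # First smaller line after the title block: the title has ended.
            break
    return " ".join(title_parts)


page = [
    TextLine("Journal of Examples, Vol. 3", 8.0),
    TextLine("SciPlore Xtract: Extracting Titles", 18.0),
    TextLine("from Scientific PDF Documents", 18.0),
    TextLine("Jöran Beel", 11.0),
]
print(extract_title(page))
```

Because it needs only a single pass over the lines, a rule of this kind avoids any training or model inference, which is consistent with the run-time advantage the abstract reports.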
Keywords: header extraction; title extraction; style information; document analysis
Academic Publication Management with PUMA -- Collect, Organize and Share Publications BIBAKFull-Text 417-420
  Dominik Benz; Andreas Hotho; Robert Jäschke; Gerd Stumme; Axel Halle; Angela Gerlach Sanches Lima; Helge Steenweg; Sven Stefani
The PUMA project fosters the Open Access movement and aims at better support of researchers' publication work. PUMA stands for an integrated solution, where the upload of a publication automatically results in an update of both the personal and institutional homepage, the creation of an entry in a social bookmarking system like BibSonomy, an entry in the academic reporting system of the university, and its publication in the institutional repository. In this poster, we present the main features of our solution.
Keywords: Publication Management; Puma; BibSonomy; Open Access; Institutional Repository; Tagging; Bookmarking; Metadata Sharing
Using Mind Maps to Model Semistructured Documents BIBAKFull-Text 421-424
  Alejandro Bia; Rafael Muñoz; Jaime Gómez
We often use UML diagrams for our software development projects, and also for modeling XML DTDs and Schemas [1]. We have found that, although UML diagrams can effectively represent DTDs and Schemas (using either Class or Component diagrams), in practice complex DTDs and Schemas produce unreadable, unmanageable UML diagrams. Recently we started exploring other types of diagrams and unconventional methods that can be useful both for designing and modeling semistructured data and as teaching aids or thinking tools. This experience also served to open our minds to tools and methods other than the recognized mainstream practices.
   In this paper, we describe how we managed to use Mind Maps and a modified Freemind tool to successfully model, design, modify, import and export XML DTDs, XML Schemas (XSD and RNG) and also XML document instances, obtaining very manageable, easily comprehensible, foldable diagrams. In this way, we converted a general-purpose mind-mapping tool into a very powerful tool for XML vocabulary design and simplification (and also for teaching XML markup, or for presentation purposes).
Keywords: Visual Modeling; Mind Maps; XML; DTD; Schema
Towards a Public Library Digital Service Taxonomy BIBAKFull-Text 425-428
  Steven Buchanan; David McMenemy
Recent research has identified the inconsistency of public library digital services, and the associated problems of disparity and duplication, as a key usability issue. The hypothesis of this research is that the root cause is inconsistent definition and specification of digital services, and that a service taxonomy, providing a classification scheme and controlled vocabulary, would facilitate resolution of this issue. Initial research to validate this hypothesis examined the options available from 8 of 32 Scottish public library homepages. It found evidence of inconsistent terminology and organisation schemes, with navigation not always straightforward because the majority of sites sampled offered a high number of loosely structured options. Initial findings are discussed, together with planned second-stage research.
Keywords: digital services; usability; service taxonomy; public libraries
Multimodal Image Collection Visualization Using Non-negative Matrix Factorization BIBAFull-Text 429-432
  Jorge E. Camargo; Juan C. Caicedo; Fabio A. González
In this paper we address the problem of generating an image collection visualization in which images and text can be projected together. Given a collection of images with attached text annotations, we aim to find a common representation for both information sources to model latent correlations within the collection. Using the proposed latent representation, an image collection visualization is built in which images and text can be projected simultaneously. The resulting visualization allows users to identify the relationships between images and text terms, and thus to understand the semantic structure of the collection.
A New Perspective on Collection Selection BIBAKFull-Text 433-436
  Helen Dodd; George Buchanan; Matt Jones
Collection selection is traditionally a sub-problem of meta-search that identifies the collections most likely to contain relevant documents. However, we propose to treat collection selection as an independent search task with the goal of identifying collections that are relevant as a whole, so that the user may return to them to serve future (related) information needs. Using a new methodology and framework, we evaluate the suitability of existing collection selection algorithms for this search task, compared with a new algorithm designed specifically for the task.
Keywords: Collection selection; database selection; collection ranking
Creating a Flexible Preservation Infrastructure for Electronic Records BIBAKFull-Text 437-440
  Karen Estlund; Heather Briston
As universities begin to address their first significant collections of electronic records, the needs of the collections often outstrip the resources and support available. This poster will illustrate the steps taken to transition and preserve a presidential electronic records collection in a university archives with limited systems support and limited preparation for future preservation needs. The infrastructure created was designed to quickly ingest at-risk records and to allow for file migration and system evolution as future technologies are implemented.
Keywords: Digital Preservation; Digital Libraries; Preservation Planning; Institutional Archives; Migration
Matching Intellectual Works for Rights Management in the European Library BIBAKFull-Text 441-444
  Nuno Freire
This poster presents the work matching system implemented in The European Library for identifying different publications with the same underlying intellectual work. This work is contextualized in the rights management framework of project ARROW, where The European Library is the main source of bibliographic metadata as an aggregator of Europe's national library catalogues.
Keywords: copyright; entity matching; intellectual work; bibliographic metadata
Mopseus -- A Digital Library Management System Focused on Preservation BIBAKFull-Text 445-448
  Dimitris Gavrilis; Christos Papatheodorou; Panos Constantopoulos; Stavros Angelis
This paper presents Mopseus, a Fedora Commons-based digital repository that focuses on preservation. An overview of the general architecture of the system is presented, along with more in-depth details of its semantic structures. Mopseus features dynamic RDF-based relations, a service for defining metadata schemas, a built-in RDBMS synchronization and indexing mechanism, a mechanism for migration from existing repositories and a built-in workflow engine.
Keywords: Digital libraries; repository; digital preservation
Link Proximity Analysis -- Clustering Websites by Examining Link Proximity BIBAKFull-Text 449-452
  Bela Gipp; Adriana Taylor; Jöran Beel
This research-in-progress paper presents a new approach called Link Proximity Analysis (LPA) for identifying related web pages based on link analysis. In contrast to current techniques, which ignore intra-page link analysis, the approach put forth here examines the relative positioning of links to each other within websites. It exploits the fact that a clear correlation between the proximity of links to each other and the subject-relatedness of the linked websites can be observed on nearly every web page. By statistically analyzing this relationship and measuring the number of sentences, paragraphs, etc. between two links, related websites can be identified automatically, as a first study has shown.
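The core of the proximity idea can be sketched as a weighted co-occurrence count over link pairs. This is a hypothetical illustration: the abstract does not give LPA's weighting scheme, so the three weights below (same sentence, same paragraph, same page) are assumed values chosen only to show that closer links count more, and the `Link` structure is invented for the example.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Link(NamedTuple):
    target: str     # URL the link points to
    paragraph: int  # index of the paragraph containing the link
    sentence: int   # index of the sentence within the whole page


# Assumed weights: links placed closer together are treated as more related.
SAME_SENTENCE = 1.0
SAME_PARAGRAPH = 0.5
SAME_PAGE = 0.25


def pair_weight(a: Link, b: Link) -> float:
    """Weight of one co-occurrence, based on how close the two links are."""
    if a.sentence == b.sentence:
        return SAME_SENTENCE
    if a.paragraph == b.paragraph:
        return SAME_PARAGRAPH
    return SAME_PAGE


def proximity_scores(pages: list[list[Link]]) -> dict[frozenset[str], float]:
    """Sum proximity weights for every pair of distinct link targets over all pages."""
    scores: dict[frozenset[str], float] = defaultdict(float)
    for links in pages:
        for a, b in combinations(links, 2):
            if a.target != b.target:
                scores[frozenset((a.target, b.target))] += pair_weight(a, b)
    return dict(scores)


page1 = [Link("a.org", 0, 0), Link("b.org", 0, 0), Link("c.org", 1, 3)]
page2 = [Link("a.org", 0, 0), Link("b.org", 0, 1)]
scores = proximity_scores([page1, page2])
print(scores[frozenset({"a.org", "b.org"})])
```

The resulting pair scores could then serve as a similarity matrix for clustering websites; a plain co-link count would give `a.org`/`b.org` and `a.org`/`c.org` more similar scores than this proximity-weighted version does.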
Keywords: Web page; Website; clustering; Network Analysis; Link Analysis; Citation Proximity Analysis
SliDL: A Slide Digital Library Supporting Content Reuse in Presentations BIBAKFull-Text 453-456
  José Hilario Canós; María Isabel Marante; Manuel Llavador
Presentation-building applications lack good support for slide reuse. In this paper, we introduce SliDL, a digital library that facilitates slide reuse by flattening the presentation-based structure of current systems and providing slide retrieval facilities. The service-oriented architecture of SliDL enables slide sharing between different applications. We have developed clients for Microsoft PowerPoint 2007 and OpenOffice.org Impress.
Keywords: Slide reuse; presentation management; Service-Oriented Architecture
Metadata Impact on Research Paper Similarity BIBAFull-Text 457-460
  Germán Hurtado Martín; Steven Schockaert; Chris Cornelis; Helga Naessens
While collaborative filtering and citation analysis have been well studied for research paper recommender systems, content-based approaches typically restrict themselves to straightforward application of the vector space model. However, various types of metadata containing potentially useful information are usually available as well. Our work explores several methods to exploit this information in combination with different similarity measures.
Exploring the Influence of Tagging Motivation on Tagging Behavior BIBAFull-Text 461-465
  Roman Kern; Christian Körner; Markus Strohmaier
The reasons why users tag have remained mostly elusive to quantitative investigations. In this paper, we distinguish between two types of motivation for tagging: while categorizers use tags mainly for categorizing resources for later browsing, describers use tags mainly for describing resources for later retrieval. To characterize users with regard to these different motivations, we introduce statistical measures and apply them to 7 different real-world tagging datasets. We show that while most taggers use tags for both categorizing and describing resources, different tagging systems lend themselves to different motivations for tagging. Additionally, we show that the distinction between describers and categorizers can improve the performance of tag recommendation.
A Teaching Tool for Parasitology: Enhancing Learning with Annotation and Image Retrieval BIBAFull-Text 466-469
  Nádia P. Kozievitch; Ricardo da Silva Torres; Felipe S. P. Andrade; Uma Murthy; Edward A. Fox; Eric Hallerman
Parasitology is a basic course in life sciences curricula, but so far few computer-assisted teaching tools exist for it. We present SuperIDR, a tool that supports annotation and search (based on a textual and a visual description) in the biodiversity domain. In addition, it provides a feature to aid comparison of morphological characteristics among different species. Preliminary results from two experiments show that students found the tool to be very useful, contributing to an alternative learning approach.
Framework for Logging and Exploiting the Information Retrieval Dialog BIBAFull-Text 470-473
  Paul Landwich; Claus-Peter Klas; Matthias Hemmje
In this paper we present first results of a new approach to user interfaces for digital library and information retrieval systems. The underlying idea is that only the dialog between user and system can establish the information context necessary to satisfy an information need. We introduce a framework for information retrieval systems to handle the activities and sets elaborated during a search process, and a prototype tool integrated into an existing interface framework. Finally, we describe a user study and expert interviews conducted with the tool, and report their evaluation results.
Defining the Dynamicity and Diversity of Text Collections BIBAFull-Text 474-477
  Ilya Markov; Fabio Crestani
In Information Retrieval, collections are often considered to be relatively dynamic or diverse, but no general definition has been given for these notions and no actual measure has been proposed to quantify them. We give intuitive definitions of the dynamicity and diversity properties of text collections and present measures for calculating them based on the notion of novelty. Experimental results show that the proposed measures are consistent with the definitions and can distinguish collections effectively according to their dynamicity and diversity properties.
Manuzio: A Model for Digital Annotated Text and Its Query/Programming Language BIBAFull-Text 478-481
  Marek Maurizio; Renzo Orsini
An increasing number of large text repositories that must be processed automatically represent their content through descriptive markup languages. This approach has spread thanks to widely adopted standards like SGML and, later, XML, which made possible the definition of specific formats for many kinds of text, from literary texts (TEI) to web pages (XHTML). The markup approach, however, has several noteworthy shortcomings. First, only texts with a strict hierarchical structure can be encoded easily, whereas text often has concurrent hierarchies. Second, extra-textual information, like metadata or annotations, can only be tied to the structure of the text itself and must be expressed as strings of the markup language. Third, queries and programs for the retrieval and processing of text must be expressed in languages like XQuery [4], in which every document is represented as a tree of nodes; for this reason, in documents where parallel, overlapping structures exist, the complexity of XQuery programs becomes significantly higher.
Effective Term Weighting for Sentence Retrieval BIBAFull-Text 482-485
  Saeedeh Momtazi; Matthew Lease; Dietrich Klakow
A well-known challenge of information retrieval is how to infer a user's underlying information need when the input query consists of only a few keywords. Question Answering (QA) systems face an equally important but opposite challenge: given a verbose question, how can the system infer the relative importance of terms in order to differentiate the core information need from supporting context? We investigate three simple term-weighting schemes for such estimation within the language modeling retrieval paradigm [6]. While the three schemes described are ad hoc, they address a principled estimation problem underlying the standard word unigram model. We also show that these schemes enable better estimation of a state-of-the-art class model based on term clustering [5]. Using a TREC QA dataset, we evaluate the three weighting schemes for both word and class models on the QA subtask of sentence retrieval. Our inverse sentence frequency weighting scheme achieves over 5% absolute improvement in mean average precision for the standard word model and nearly 2% absolute improvement for the class model.
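A rough sketch of inverse-sentence-frequency weighting inside a query-likelihood ranker follows. The exact estimator used in the paper is not given in the abstract, so this assumes ISF(t) = log(N / n_t), with N sentences and n_t sentences containing t, and a Jelinek-Mercer-smoothed unigram model with an assumed mixing weight `lam`; the function names are hypothetical.

```python
import math
from collections import Counter


def isf_weights(sentences: list[list[str]]) -> dict[str, float]:
    """Inverse sentence frequency log(N / n_t); terms in every sentence get weight 0."""
    n = len(sentences)
    sf = Counter(term for s in sentences for term in set(s))
    return {t: math.log(n / c) for t, c in sf.items()}


def rank(query: list[str], sentences: list[list[str]], lam: float = 0.5) -> list[int]:
    """Return sentence indices ordered by ISF-weighted query likelihood."""
    weights = isf_weights(sentences)
    coll = Counter(t for s in sentences for t in s)
    coll_len = sum(coll.values())

    def score(sentence: list[str]) -> float:
        tf = Counter(sentence)
        length = max(len(sentence), 1)  # guard against empty sentences
        total = 0.0
        for t in query:
            if t not in coll:
                continue  # term unseen in the collection: no evidence either way
            # Jelinek-Mercer smoothing: mix sentence and collection models.
            p = (1 - lam) * tf[t] / length + lam * coll[t] / coll_len
            total += weights[t] * math.log(p)
        return total

    return sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)


sentences = [
    ["who", "invented", "the", "telephone"],
    ["the", "telephone", "was", "invented", "by", "bell"],
    ["the", "weather", "is", "nice"],
]
print(rank(["who", "invented", "the", "telephone"], sentences))
```

With this weighting a ubiquitous term such as "the" contributes nothing, so a verbose question is effectively reduced to its more discriminative terms.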
User-Oriented Evaluation of Color Descriptors for Web Image Retrieval BIBAKFull-Text 486-489
  Otávio Augusto Bizetto Penatti; Ricardo da Silva Torres
This paper proposes a methodology for effectiveness evaluation in content-based image retrieval systems. The methodology is based on the opinion of real users. This paper also presents the results of using this methodology to evaluate color descriptors for Web image retrieval. The experiments were performed using a database of more than 230 thousand heterogeneous images representative of the content existing on the Web.
Keywords: user evaluation; color descriptors; content-based image retrieval; web
A Topic-Specific Web Search System Focusing on Quality Pages BIBAKFull-Text 490-493
  Ari Pirkola; Tuomas Talvensaari
We describe a topic-specific Web search system focused on quality pages and argue that there is a need for such quality-based topic-specific search tools. The first implementation of the search system is available on the Web and deals with climate change. The key idea is to use a focused crawling technique to crawl known trusted sites and the sites connected to them. We also discuss the further development of the system and our future research. Our project plan involves building a larger quality-based Web search system dealing with many globally significant topics (in addition to climate change).
Keywords: Digital libraries; Focused crawling; Vertical search engines; Web information retrieval
Reliable Preservation of Interactive Environments and Workflows BIBAFull-Text 494-497
  Klaus Rechert; Dirk von Suchodoletz; Randolph Welte; Felix Ruzzoli; Isgandar Valizada
Most digital objects are created solely in interactive graphical user interfaces that were available during a particular time period. Archiving and preservation organizations are faced with large amounts of such objects of various types. At some point they will need to process these automatically to make them available to their users or to convert them to a commonly used format. We present methods and a system architecture for emulation services that enable the preservation of interactive environments and their workflows in a reliable manner. This system includes a framework for describing interactions with an interactive environment in an abstract manner, for supporting reliable playback in an automated way, and finally for ensuring the preservation of specific operation knowledge by documenting and storing all components in a dedicated software archive.
Automated Country Name Disambiguation for Code Set Alignment BIBAFull-Text 498-501
  Gramm Richardson
Multiple standards and encodings for names of countries, as well as multiple renderings of the country names themselves cause problems for interoperability. This impacts both human and automated processing. This paper describes an automated method for aligning pairs of country code sets by examining the string similarity between the names of the countries in each set.
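A minimal sketch of aligning two code sets by country-name similarity is shown below. The abstract does not name the similarity measure, so Python's standard `difflib.SequenceMatcher` ratio, the accent and punctuation normalization, and the 0.8 acceptance threshold are all assumptions; the codes in the example are illustrative only.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and strip accents and punctuation, e.g. "Côte d'Ivoire" -> "cote d ivoire"."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in ascii_only.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two normalized country names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def align(set_a: dict[str, str], set_b: dict[str, str], threshold: float = 0.8) -> dict[str, str]:
    """Map each code in set_a to the code in set_b with the most similar name.

    Both arguments map codes to country names. Codes whose best match scores
    below `threshold` are left unaligned, e.g. for manual review.
    """
    mapping: dict[str, str] = {}
    for code_a, name_a in set_a.items():
        best_code, best_score = None, 0.0
        for code_b, name_b in set_b.items():
            s = similarity(name_a, name_b)
            if s > best_score:
                best_code, best_score = code_b, s
        if best_code is not None and best_score >= threshold:
            mapping[code_a] = best_code
    return mapping


set_a = {"CI": "Côte d'Ivoire", "DE": "Germany", "XK": "Kosovo"}
set_b = {"IV": "Cote d Ivoire", "GM": "Germany", "GA": "Gambia"}
print(align(set_a, set_b))
```

Normalizing before comparing absorbs differences in accents and punctuation between renderings; names that differ more deeply (official versus short forms, for instance) would fall below the threshold and need another strategy.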
LIFE-SHARE Project: Developing a Digitisation Strategy Toolkit BIBAKFull-Text 502-505
  Beccy Shipman; Matthew Herring; Ned Potter; Bo Middleton
This poster will outline the Digitisation Strategy Toolkit created as part of the LIFE-SHARE project. The toolkit is based on the lifecycle model created by the LIFE project and explores the creation, acquisition, ingest, preservation (bit-stream and content) and access requirements for a digitisation strategy. This covers the policies and infrastructure required in libraries to establish successful practices. The toolkit also provides both internal and external resources to support the service. This poster will illustrate how the toolkit works effectively to support digitisation with examples from three case studies at the Universities of Leeds, Sheffield and York.
Keywords: digitisation; digital lifecycle; toolkit; strategies; libraries
Ensemble: A Distributed Portal for the Distributed Community of Computing Education BIBAKFull-Text 506-509
  Frank M. Shipman III; Lillian N. Cassel; Edward A. Fox; Richard Furuta; Lois M. L. Delcambre; Peter Brusilovsky; B. Stephen Carpenter II; Gregory W. Hislop; Stephen H. Edwards; Daniel D. Garcia
NSF's NSDL is composed of domain-oriented pathways. Ensemble is the pathway for computing and supports the full range of computing education communities, providing a base for the development of programs that blend computing with other STEM areas (e.g., X-informatics and Computing + X), and producing digital library innovations that can be propagated to other NSDL pathways. Computing is a distributed community, including computer science, computer engineering, software engineering, information science, information systems, and information technology. Ensemble aims to provide much-needed support for the many distinct yet overlapping educational programs in computing and their associated communities. To do this, Ensemble takes the form of a distributed portal providing access to the broad range of existing educational resources while preserving the collections and their associated curatorial processes. Ensemble encourages contribution, use, reuse, review, and evaluation of educational materials at multiple levels of granularity.
Keywords: Ensemble; distributed portal; distributed community
A New Focus on End Users: Eye-Tracking Analysis for Digital Libraries BIBAKFull-Text 510-513
  Jonathan Sykes; Milena Dobreva; Duncan Birrell; Emma McCulloch; Ian Ruthven; Yurdagül Ünal; Pierluigi Feliciati
Eye-tracking data was gathered as part of a user and functional evaluation of the Europeana v1.0 prototype, to determine which areas of the interface screen are most heavily used and which areas attract users' attention but are not effectively used in search. Outputs from eye-tracking data can offer insight into how advanced search functions can be made more intuitive for end users with differing interests and abilities, and can be used to inform continued interface development as digital libraries look to the future. Results led to recommendations for the future development of the Europeana digital library.
Keywords: digital libraries; eye-tracking; gaze plots; heat maps; user studies
Digital Library Educational Module Development Strategies and Sustainable Enhancement by the Community BIBAKFull-Text 514-517
  Seungwon Yang; Tarek Kanan; Edward A. Fox
The Digital Library Curriculum Development Project (http://curric.dlib.vt.edu) team has been developing educational modules and conducting field tests internationally since January 2006. Three approaches to module development have been used so far. In the first, project team members created draft modules (nine in total), which were then reviewed by experts in the field as well as by other members of the team. In the second, graduate student teams developed modules under the supervision of an instructor and the project team, with the four members of each team collaborating on a single module; four modules were produced this way. In the third, five graduate students developed five modules in total, and each module was reviewed by two students. The completed modules were posted on Wikiversity.org for wider distribution and collaborative improvement by the community, where the full list of modules in the Digital Library Educational Framework can also be found.
Keywords: digital libraries; curriculum; education; module development; development strategy; wiki
Approach to Cross-Language Retrieval for Japanese Traditional Fine Art: Ukiyo-e Database BIBAKFull-Text 518-521
  Biligsaikhan Batjargal; Fuminori Kimura; Akira Maeda
In this paper we introduce a system for searching Ukiyo-e databases with English queries, built by customizing freely available open source software. In our system, the Ukiyo-e metadata elements were mapped to Dublin Core. We adopted a dictionary-based query translation approach and used the Greenstone Digital Library Software to make our Ukiyo-e digital collections available online. The preliminary result is an easy-to-use and useful system that allows users who do not understand Japanese to search and view Japanese Ukiyo-e databases in English.
Keywords: Ukiyo-e; Image database; Digital library; Cross-Language information retrieval
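The dictionary-based query translation step can be illustrated with a small sketch. This is not the authors' implementation, which is built on the Greenstone Digital Library Software; it is a minimal Python sketch assuming a hypothetical bilingual dictionary and hypothetical records whose fields already use Dublin Core element names. English phrases are translated by greedy longest-match lookup, each phrase becomes a group of alternative Japanese terms, and a record matches when every group has at least one term in its title or creator.

```python
# Hypothetical English -> Japanese bilingual dictionary (illustrative entries only).
BILINGUAL_DICT = {
    "hokusai": ["北斎"],
    "wave": ["波", "浪"],
    "mount fuji": ["富士山", "富士"],
    "actor": ["役者"],
}

# Hypothetical Ukiyo-e records, already mapped to Dublin Core element names.
RECORDS = [
    {"dc:identifier": "u001", "dc:title": "神奈川沖浪裏", "dc:creator": "葛飾北斎"},
    {"dc:identifier": "u002", "dc:title": "役者絵", "dc:creator": "東洲斎写楽"},
]


def translate_query(query, dictionary, max_phrase_len=3):
    """Translate an English query by greedy longest-match dictionary lookup.

    Each matched phrase becomes a group of alternative Japanese terms;
    words missing from the dictionary (e.g. romanized names) are kept as-is.
    """
    words = query.lower().split()
    groups = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at position i first.
        for n in range(min(max_phrase_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in dictionary:
                groups.append(dictionary[phrase])
                i += n
                break
        else:
            # No dictionary entry at all: pass the word through untranslated.
            groups.append([words[i]])
            i += 1
    return groups


def search(groups, records, fields=("dc:title", "dc:creator")):
    """Return identifiers of records matching every term group (AND of ORs)."""
    hits = []
    for rec in records:
        text = " ".join(rec.get(f, "") for f in fields)
        if all(any(term in text for term in group) for group in groups):
            hits.append(rec["dc:identifier"])
    return hits
```

For example, `search(translate_query("Hokusai wave", BILINGUAL_DICT), RECORDS)` returns `['u001']`. Passing unknown words through untranslated is one simple fallback for dictionary gaps.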
Open Source Historical OCR: The OCRopodium Project BIBAFull-Text 522-525
  Michael Bryant; Tobias Blanke; Mark Hedges; Richard Palmer
In this paper we present some initial results of the OCRopodium project, which aims to build a scalable workflow for OCR of historical collections. Large-scale digitisation projects dealing with text-based historical material face challenges that commercial software does not cater for well. Open source tools allow better customisation to meet these requirements, particularly with regard to character model training and per-project language modelling.
A Voice-Oriented Image Cataloguing Environment BIBAKFull-Text 526-529
  José Hilario Canós; Carlos J. Castillo; Pablo Muñoz; Héctor Valero; Manuel Llavador
VOICE is a tool for cataloguing digital images through a voice-based user interface. Its goal is to ease the entry of descriptive metadata associated with single images or collections of pictures, so that the data entered can later be used for keyword-based image retrieval. We have developed two versions of the tool: standalone VOICE and VOICE4Picasa. The latter is an add-in to Picasa that calls the former without the need to switch from one application to the other. In our demonstration, we will show the features of both systems, adding metadata to pictures and using Picasa's retrieval features to find images in our collections.
Keywords: Image Cataloguing; Voice-based Interfaces; Speech Recognition
DMP Online: A Demonstration of the Digital Curation Centre's Web-Based Tool for Creating, Maintaining and Exporting Data Management Plans BIBAFull-Text 530-533
  Martin Donnelly; Sarah Jones; John W. Pattenden-Fail
Funding bodies increasingly require researchers to produce Data Management Plans (DMPs). The Digital Curation Centre (DCC) has created DMP Online, a web-based tool which draws upon an analysis of funders' requirements to enable researchers to create and export customisable DMPs, both at the grant application stage and during the project's lifetime.
DiLiA -- The Digital Library Assistant BIBAFull-Text 534-537
  Kathrin Eichler; Holmer Hemsen; Günter Neumann; Norbert Reithinger; Sven Schmeier; Kinga Schumacher; Inessa Seifert
In this paper we present the digital library assistant (DiLiA). The system aims to augment search in digital libraries along several dimensions. In the project, advanced information visualisation methods are developed for user-controlled interactive search. The interaction model has been designed to be transparent to the user and easy to use. In addition, information extraction (IE) methods have been developed in DiLiA to make the content more easily accessible; these include the identification and extraction of technical terms (TTs), both single- and multi-word terms, as well as the extraction of binary relations based on the extracted terms. DiLiA follows a hybrid information extraction approach that combines metadata and document processing.
Xeproc©: A Model-Based Approach towards Document Process Preservation BIBAFull-Text 538-541
  Thierry Jacquin; Hervé Déjean; Jean-Pierre Chanod
Developed in the context of the EU Integrated Project SHAMAN, Xeproc© technology lets one define and design document processes while producing an abstract representation that is independent of the implementation. These representations capture the intent behind the workflow and can be preserved for reuse in future unknown infrastructures. Xeproc© is available under Eclipse Public Licence.
A Prototype Personalization System for the European Library Portal BIBAKFull-Text 542-545
  Marialena Kyriakidi; Lefteris Stamatogiannakis; Mei Li Triantafyllidi; Maria Vayanou; Yannis E. Ioannidis
In this demonstration, we present a flexible system that enables the provision of personalized functionalities to digital libraries. The system has been developed based on the needs of The European Library portal and will be demonstrated in that particular context, but could be applied more generally. It implements a broad set of data processing, analysis, and mining techniques over the portal's log files, using an environment called madIS. It enables on-line result contextualization and adaptation through the development of REST web services, which are responsible for retrieving and appropriately integrating the extracted information. The demonstration also features a web-based visualization tool for showing the output of the log analysis performed.
Keywords: Log mining; pattern extraction; profiling; personalization
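As a rough illustration of the kind of log mining such a system might perform, the sketch below counts how often pairs of queries occur in the same session and uses those counts to suggest related queries. It does not use madIS; the tab-separated log format, the sample log, the function names and the co-occurrence technique itself are assumptions chosen for illustration, not details taken from the demonstration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical log: one "session_id<TAB>query" entry per line.
LOG = [
    "s1\tmozart",
    "s1\tbeethoven",
    "s2\tmozart",
    "s2\tbeethoven",
    "s2\tsalzburg",
    "s3\tmozart",
    "s3\tsalzburg",
    "s4\tvermeer",
    "malformed line without a tab",
]


def sessions_from_log(lines):
    """Group normalised queries by session ID, skipping malformed lines."""
    sessions = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue
        session_id, query = parts
        sessions[session_id].add(query.strip().lower())
    return sessions


def cooccurrence(sessions):
    """Count how often each unordered pair of queries appears in one session."""
    counts = Counter()
    for queries in sessions.values():
        for a, b in combinations(sorted(queries), 2):
            counts[(a, b)] += 1
    return counts


def related_queries(query, counts, k=3):
    """Rank other queries by co-occurrence with `query` (ties broken alphabetically)."""
    query = query.lower()
    scores = Counter()
    for (a, b), n in counts.items():
        if a == query:
            scores[b] += n
        elif b == query:
            scores[a] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [q for q, _ in ranked[:k]]
```

With the sample log, `related_queries("Mozart", cooccurrence(sessions_from_log(LOG)))` returns `['beethoven', 'salzburg']`, since each co-occurs with "mozart" in two sessions.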
Meta-Composer: Synthesizing Online FRBR Works from Library Resources BIBAFull-Text 546-549
  Michalis Sfakakis; Panagiotis Staikos; Sarantos Kapidakis
Next-generation display and indexing of cataloging records are mainly influenced by the development of the FRBR conceptual model. While the process of collecting all relevant bibliographic records in a catalogue into an FRBR work entity has been extensively developed and tested in non-interactive (offline) environments, the corresponding process has not been explored for meta-searching. This work presents the implementation and use of alternative clustering algorithms and similarity metrics for composing FRBR work entities in the configurable meta-search engine meta-Composer. Moreover, it introduces a tool for evaluating the composition methods, which can be used either as a complement to the configuration process, to select the best clustering methods for the searched catalogues, or as a general testbed for evaluating the FRBR work entity composition process.
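The abstract does not specify which clustering algorithms or similarity metrics meta-Composer uses. As one illustrative combination, the Python sketch below groups records returned from several catalogues into candidate works with single-link clustering: two records are linked when their normalised authors match and the Jaccard similarity of their title tokens reaches a threshold. The record fields, the sample records and the 0.5 threshold are all hypothetical.

```python
import re
import unicodedata

# Hypothetical records returned by a meta-search over several catalogues.
SAMPLE_RECORDS = [
    {"id": "A1", "author": "Kazantzakis, Nikos", "title": "Zorba the Greek"},
    {"id": "B7", "author": "Kazantzakis, Nikos", "title": "Zorba the Greek."},
    {"id": "C3", "author": "Kazantzákis, Níkos", "title": "Zorba the Greek : a novel"},
    {"id": "A2", "author": "Kazantzakis, Nikos", "title": "The Last Temptation"},
]


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def similarity(r1, r2):
    """Different normalised authors never share a work; else compare titles."""
    if normalize(r1["author"]) != normalize(r2["author"]):
        return 0.0
    return jaccard(normalize(r1["title"]), normalize(r2["title"]))


def cluster_works(records, threshold=0.5):
    """Single-link clustering via union-find: records linked by any pair
    whose similarity reaches `threshold` end up in the same work."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(records[i]["id"])
    return sorted(clusters.values())
```

Here the three "Zorba the Greek" records, including the one with an accented author name and a subtitle, fall into one work, while "The Last Temptation" stays separate: `cluster_works(SAMPLE_RECORDS)` returns `[['A1', 'B7', 'C3'], ['A2']]`. Single-link clustering is simple but can chain loosely related records together, which is one reason a tool for comparing composition methods is useful.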
Digital Library in a 3D Virtual World: The Digital Bleek and Lloyd Collection in Second Life BIBAKFull-Text 550-553
  Rizmari Versfeld; Spencer J. Lee; Edward A. Fox; Hussein Suleman; Kyle Williams
This research explores and demonstrates the process of setting up a 3D representation of a typical web-based digital library, 'The Digital Bleek and Lloyd Collection' (lloydbleekcollection.cs.uct.ac.za), in the popular 3D virtual world Second Life (SL). The processes of building, scripting, and evaluating the 3D exhibit are discussed. The report concludes that SL is a good platform for this kind of cultural representation and that, at university level, it could be used to showcase and share researchers' work.
Keywords: Second Life; virtual worlds; 3D; Digital Libraries; Bleek and Lloyd; Bushman heritage