[1]
Test-driven evaluation of linked data quality
Semantic web 2
/
Kontokostas, Dimitris
/
Westphal, Patrick
/
Auer, Sören
/
Hellmann, Sebastian
/
Lehmann, Jens
/
Cornelissen, Roland
/
Zaveri, Amrapali
Proceedings of the 2014 International Conference on the World Wide Web
2014-04-07
v.1
p.747-758
© Copyright 2014 ACM
Summary: Linked Open Data (LOD) comprises an unprecedented volume of structured data
on the Web. However, these datasets are of varying quality ranging from
extensively curated datasets to crowdsourced or extracted data of often
relatively low quality. We present a methodology for test-driven quality
assessment of Linked Data, which is inspired by test-driven software
development. We argue that vocabularies, ontologies and knowledge bases should
be accompanied by a number of test cases, which help to ensure a basic level of
quality. We present a methodology for assessing the quality of linked data
resources, based on a formalization of bad smells and data quality problems.
Our formalization employs SPARQL query templates, which are instantiated into
concrete quality test case queries. Based on an extensive survey, we compile a
comprehensive library of data quality test case patterns. We perform automatic
test case instantiation based on schema constraints or semi-automatically
enriched schemata and allow the user to generate specific test case
instantiations that are applicable to a schema or dataset. We provide an
extensive evaluation of five LOD datasets, manual test case instantiation for
five schemas and automatic test case instantiations for all available schemata
registered with Linked Open Vocabularies (LOV). One of the main advantages of
our approach is that domain specific semantics can be encoded in the data
quality test cases, thus being able to discover data quality problems beyond
conventional quality heuristics.
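Note: the template instantiation step described in the summary can be pictured roughly as follows. This is a minimal sketch, not the authors' implementation: the `%%P1%%` placeholder syntax, the `PATTERN` query, the `instantiate` helper, and the choice of `foaf:birthday` as the bound property are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: flag resources holding two different values for a
# property that is expected to be single-valued. The %%NAME%% placeholder
# syntax is an assumption chosen for this sketch.
PATTERN = """SELECT DISTINCT ?s WHERE {
  ?s %%P1%% ?v1 .
  ?s %%P1%% ?v2 .
  FILTER (?v1 != ?v2)
}"""


def instantiate(pattern: str, bindings: dict[str, str]) -> str:
    """Replace every %%NAME%% placeholder with its bound value."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in bindings:
            raise KeyError(f"no binding for placeholder {name}")
        return bindings[name]

    return re.sub(r"%%(\w+)%%", substitute, pattern)


# Binding the placeholder to a concrete property yields one test case query.
query = instantiate(PATTERN, {"P1": "<http://xmlns.com/foaf/0.1/birthday>"})
print(query)
```

Keeping the pattern separate from its bindings means one pattern can yield many concrete test queries, one per property or class taken from a schema.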
[2]
Databugger: a test-driven framework for debugging the web of data
WWW 2014 demonstrations
/
Kontokostas, Dimitris
/
Westphal, Patrick
/
Auer, Sören
/
Hellmann, Sebastian
/
Lehmann, Jens
/
Cornelissen, Roland
Companion Proceedings of the 2014 International Conference on the World Wide
Web
2014-04-07
v.2
p.115-118
© Copyright 2014 ACM
Summary: Linked Open Data (LOD) comprises an unprecedented volume of structured
data on the Web. However, these datasets are of varying quality ranging from
extensively curated datasets to crowd-sourced or extracted data of often
relatively low quality. We present Databugger, a framework for test-driven
quality assessment of Linked Data, which is inspired by test-driven software
development. Databugger ensures a basic level of quality by accompanying
vocabularies, ontologies and knowledge bases with a number of test cases. The
formalization behind the tool employs SPARQL query templates, which are
instantiated into concrete quality test queries. The test queries can be
instantiated automatically based on a vocabulary or manually based on the data
semantics. One of the main advantages of our approach is that domain specific
semantics can be encoded in the data quality test cases, thus being able to
discover data quality problems beyond conventional quality heuristics.
[3]
EDITED BOOK
Search Computing: Broadening Web Search
Lecture Notes in Computer Science 7538
/
Ceri, Stefano
/
Brambilla, Marco
2012
n.16
p.254
Springer Berlin Heidelberg
DOI: 10.1007/978-3-642-34213-4
== Extraction and Integration ==
Web Data Reconciliation: Models and Experiences (1-15)
+ Blanco, Lorenzo
+ Crescenzi, Valter
+ Merialdo, Paolo
+ Papotti, Paolo
A Domain Independent Framework for Extracting Linked Semantic Data from Tables (16-33)
+ Mulwad, Varish
+ Finin, Tim
+ Joshi, Anupam
Knowledge Extraction from Structured Sources (34-52)
+ Unbehauen, Jörg
+ Hellmann, Sebastian
+ Auer, Sören
+ Stadler, Claus
Extracting Information from Google Fusion Tables (53-67)
+ Brambilla, Marco
+ Ceri, Stefano
+ Cinefra, Nicola
+ Das Sarma, Anish
+ Forghieri, Fabio
+ et al
Materialization of Web Data Sources (68-81)
+ Bozzon, Alessandro
+ Ceri, Stefano
+ Zagorac, Srdan
== Query and Visualization Paradigms ==
Natural Language Interfaces to Data Services (82-97)
+ Guerrisi, Vincenzo
+ La Torre, Pietro
+ Quarteroni, Silvia
Mobile Multi-domain Search over Structured Web Data (98-110)
+ Aral, Atakan
+ Akin, Ilker Zafer
+ Brambilla, Marco
Clustering and Labeling of Multi-dimensional Mixed Structured Data (111-126)
+ Brambilla, Marco
+ Zanoni, Massimiliano
Visualizing Search Results: Engineering Visual Patterns Development for the Web (127-142)
+ Morales-Chaparro, Rober
+ Preciado, Juan Carlos
+ Sánchez-Figueroa, Fernando
== Exploring Linked Data ==
Extending SPARQL Algebra to Support Efficient Evaluation of Top-K SPARQL Queries (143-156)
+ Bozzon, Alessandro
+ Della Valle, Emanuele
+ Magliacane, Sara
Thematic Clustering and Exploration of Linked Data (157-175)
+ Castano, Silvana
+ Ferrara, Alfio
+ Montanelli, Stefano
Support for Reusable Explorations of Linked Data in the Semantic Web (176-190)
+ Cohen, Marcelo
+ Schwabe, Daniel
== Games, Social Search and Economics ==
A Survey on Proximity Measures for Social Networks (191-206)
+ Cohen, Sara
+ Kimelfeld, Benny
+ Koutrika, Georgia
Extending Search to Crowds: A Model-Driven Approach (207-222)
+ Bozzon, Alessandro
+ Brambilla, Marco
+ Ceri, Stefano
+ Mauri, Andrea
BetterRelations: Collecting Association Strengths for Linked Data Triples with a Game (223-239)
+ Hees, Jörn
+ Roth-Berghofer, Thomas
+ Biedert, Ralf
+ Adrian, Benjamin
+ Dengel, Andreas
An Incentive-Compatible Revenue-Sharing Mechanism for the Economic Sustainability of Multi-domain Search Based on Advertising (240-254)
+ Brambilla, Marco
+ Ceppi, Sofia
+ Gatti, Nicola
+ Gerding, Enrico H.
[4]
Triplify: light-weight linked data publication from relational databases
Semantic/data web/session: linked data
/
Auer, Sören
/
Dietzold, Sebastian
/
Lehmann, Jens
/
Hellmann, Sebastian
/
Aumueller, David
Proceedings of the 2009 International Conference on the World Wide Web
2009-04-20
p.621-630
Keywords: data web, databases, geo data, linked data, rdf, semantic web, sql, web
application
© Copyright 2009 International World Wide Web Conference Committee (IW3C2)
Summary: In this paper we present Triplify -- a simplistic but effective approach to
publish Linked Data from relational databases. Triplify is based on mapping
HTTP-URI requests onto relational database queries. Triplify transforms the
resulting relations into RDF statements and publishes the data on the Web in
various RDF serializations, in particular as Linked Data. The rationale for
developing Triplify is that the largest part of information on the Web is
already stored in structured form, often as data contained in relational
databases, but usually published by Web applications only as HTML mixing
structure, layout and content. In order to reveal the pure structured
information behind the current Web, we have implemented Triplify as a
light-weight software component, which can be easily integrated into and
deployed by the numerous, widely installed Web applications. Our approach
includes a method for publishing update logs to enable incremental crawling of
linked data sources. Triplify is complemented by a library of configurations
for common relational schemata and a REST-enabled data source registry.
Triplify configurations containing mappings are provided for many popular Web
applications, including osCommerce, WordPress, Drupal, Gallery, and phpBB. We
will show that despite its light-weight architecture Triplify is usable to
publish very large datasets, such as 160GB of geo data from the OpenStreetMap
project.
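Note: the mapping from a requested URI path to a relational query, and from the resulting rows to RDF statements, might look roughly like the sketch below. It is an illustration under stated assumptions, not Triplify's actual code or configuration format: the `BASE` and `VOCAB` namespaces, the `QUERIES` table, the `posts` schema, and the convention that the first column identifies the resource are all hypothetical.

```python
import sqlite3

BASE = "http://example.org/triplify/"  # hypothetical base URI
VOCAB = "http://example.org/vocab/"    # hypothetical vocabulary namespace

# Hypothetical configuration: URL path segment -> SQL query.
# Assumed convention: the first column identifies the resource.
QUERIES = {"post": "SELECT id, title, author FROM posts"}


def escape(value) -> str:
    """Escape a value for use inside an N-Triples string literal."""
    text = str(value)
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def triplify(conn, path):
    """Run the query mapped to `path` and return its rows as N-Triples lines."""
    cursor = conn.execute(QUERIES[path])
    columns = [description[0] for description in cursor.description]
    lines = []
    for row in cursor:
        subject = f"<{BASE}{path}/{row[0]}>"
        for column, value in zip(columns[1:], row[1:]):
            if value is None:  # NULL cells produce no triple
                continue
            lines.append(f'{subject} <{VOCAB}{column}> "{escape(value)}" .')
    return lines


# Tiny in-memory database standing in for a Web application's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.execute("INSERT INTO posts VALUES (1, 'Hello \"LOD\"', NULL)")
triples = triplify(conn, "post")
print("\n".join(triples))
```

Each non-NULL cell becomes one triple whose subject is derived from the request path and the row identifier, which is why a handful of SQL queries can expose an existing application database without changing its schema.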