GIR Tables of Contents: 10

Proceedings of the 2010 Workshop on Geographic Information Retrieval

Fullname:Proceedings of the 6th Workshop on Geographic Information Retrieval
Editors:Ross Purves; Paul Clough; Chris Jones
Location:Zurich, Switzerland
Dates:2010-Feb-18 to 2010-Feb-19
Standard No:ISBN: 1-60558-826-1, 978-1-60558-826-1
  1. Ontologies and natural language
  2. Georeferencing
  3. Geographic queries
  4. Toponym detection and vernacular names
  5. Scope and time
  6. Annotation, relevance ranking and evaluation

Ontologies and natural language

Linkable geographic ontologies
  Francisco J. Lopez-Pellicer; Mário J. Silva; Marcirio Chaves
The performance of some tasks in Information Retrieval is strongly related to the extent and quality of the geographic knowledge about named places. This paper presents a conceptualization of the geographic knowledge, the Geo-Net vocabulary, and a tool for building large knowledge bases of named places, the GKB management system, developed in the GREASE-II project. The Geo-Net vocabulary is a conceptual model for describing geographic places, including their names, types, relationships and footprints. It uses URIs and the RDF data model to expose, share and connect pieces of geographic knowledge to each other and to related data on the Web. The GKB system is a multi-paradigm knowledge management system that enables the development of geographic ontologies with the Geo-Net vocabulary. This paper also presents a geographic ontology of Portugal, Geo-Net-PT 02, created with the Geo-Net vocabulary and the GKB system.
Keywords: geo-ontologies, geographic information retrieval, geographic knowledge base, linked data
Towards mapping of alpine route descriptions
  Michael Piotrowski; Samuel Läubli; Martin Volk
We describe a corpus of historic mountaineering accounts and on-going work on geocoding toponyms and route descriptions in these accounts. Mountaineering accounts contain a wealth of geographic information but its extraction for purposes of geographic information retrieval poses specific challenges, in particular the distinction between toponyms pertinent to route descriptions and those mentioned in descriptions of panoramas. We describe some preliminary considerations for natural language cues to distinguish between these two types of occurrences.
Keywords: cultural heritage data, geographic information retrieval, mountaineering accounts, route extraction, toponym resolution
Unnamed locations, underspecified regions, and other linguistic phenomena in geographic annotation of water-based locations
  Johannes Leveling
This short paper investigates how locations in or close to water masses in topics and documents (e.g. rivers, seas, oceans) are referred to. For this study, 13 topics from the GeoCLEF topics 2005-2008 aiming at documents on rivers, oceans, or sea names were selected and the corresponding relevant documents retrieved and manually annotated.
   Results of the geographic annotation indicate that i) topics aiming at locations close to water contain a wide variety of spatial relations (indicated by different prepositions), ii) unnamed locations can be generated on-the-fly by referring to movable objects (e.g. ships, planes) travelling along a path, iii) underspecified regions are referenced by proximity or distance or directional relations. In addition, several generic expressions (e.g. "in international waters") are frequently used, but refer to different underspecified regions.
Keywords: GIR, annotation, toponyms
An ontology of place and service types to facilitate place-affordance geographic information retrieval
  Ahmed N. Alazzawi; Alia I. Abdelmoty; Christopher B. Jones
In order to facilitate place-affordance queries on the Web, this work proposes the employment of an ontology of place and service types. While other works defined place-affordance by associating a place with its physical objects, the conceptual view of a place-affordance in this work is based on associating a place type with its typical service types, which is reflected in the ontology construction methodology. Preliminary results, as well as an overview of the current work, are briefly introduced.
Keywords: place ontology, place-affordance, semantic web

Georeferencing

Towards automated georeferencing of Flickr photos
  Olivier Van Laere; Steven Schockaert; Bart Dhoedt
We explore the task of automatically assigning geographic coordinates to photos on Flickr. Using an approach based on k-medoids clustering and Naive Bayes classification, we demonstrate that the task is feasible, although high accuracy can only be expected for a portion of all photos. Based on this observation, we stress the importance of adaptive approaches that estimate locations at different granularities for different photos.
Keywords: georeferencing, naive Bayes classification, web 2.0
Geotagging: using proximity, sibling, and prominence clues to understand comma groups
  Michael D. Lieberman; Hanan Samet; Jagan Sankaranarayanan
Geotagging is the process of recognizing textual references to geographic locations, known as toponyms, and resolving these references by assigning each lat/long values. Typical geotagging algorithms use a variety of heuristic evidence to select the correct interpretation for each toponym. A study is presented of one such heuristic which aids in recognizing and resolving lists of toponyms, referred to as comma groups. Comma groups of toponyms are recognized and resolved by inferring the common threads that bind them together, based on the toponyms' shared geographic attributes. Three such common threads are proposed and studied -- population-based prominence, distance-based proximity, and sibling relationships in a geographic hierarchy -- and examples of each are noted. In addition, measurements are made of these comma groups' usage and variety in a large dataset of news articles, indicating that the proposed heuristics, and in particular the proximity and sibling heuristics, are useful for resolving comma group toponyms.
Keywords: comma groups, geotagging, toponyms
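As a rough illustration of the distance-based proximity clue mentioned in the abstract above, the following minimal sketch picks, for each toponym in a comma group, the candidate interpretation that minimizes the total pairwise distance across the group. It is not the authors' algorithm: the function names, the exhaustive search, and the toy gazetteer `TOY_GAZETTEER` (with approximate coordinates) are all assumptions made for this example.

```python
import itertools
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def resolve_by_proximity(comma_group, gazetteer):
    """Choose one candidate per toponym so the summed pairwise distance is minimal.

    Exhaustive search over all combinations; only practical for short groups.
    """
    candidate_lists = [gazetteer[name] for name in comma_group]
    best = min(
        itertools.product(*candidate_lists),
        key=lambda combo: sum(
            haversine_km(p, q) for p, q in itertools.combinations(combo, 2)
        ),
    )
    return dict(zip(comma_group, best))


# Hypothetical toy gazetteer: toponym -> list of candidate (lat, lon) readings.
TOY_GAZETTEER = {
    "Paris": [(48.8566, 2.3522), (33.6609, -95.5555)],  # France; Texas
    "Dallas": [(32.7767, -96.7970)],
    "Austin": [(30.2672, -97.7431)],
}

if __name__ == "__main__":
    print(resolve_by_proximity(["Paris", "Dallas", "Austin"], TOY_GAZETTEER))
```

In this toy case the group "Paris, Dallas, Austin" resolves "Paris" to the Texas candidate, because it lies far closer to the other two members than the French one does.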
Evaluation of georeferencing
  Richard Tobin; Claire Grover; Kate Byrne; James Reid; Jo Walsh
In this paper we describe a georeferencing system which first uses Information Extraction techniques to identify place names in textual documents and which then resolves the place names against a choice of gazetteers. We have used the system to georeference three digitised historical collections and have evaluated its performance against human annotated gold standard samples from the three collections. We have also evaluated its performance on the SpatialML corpus which is a geo-annotated corpus of newspaper text. The main focus of this paper is the evaluation of georesolution and we discuss evaluation methods and issues arising from the evaluation.
Keywords: evaluation, georeferencing, named entity recognition, toponym resolution

Geographic queries

A GIR architecture with semantic-flavored query reformulation
  Nuno Cardoso; Mário J. Silva
Most geographic queries include references to entities (geographic and non-geographic). Grounding such entities is essential to properly understand the user's information need. As statistical-based query reformulation strategies work at term level, not entity level, they don't use the semantic information given by such entities, which is considerably relevant for the types of queries that should be handled by GIR systems. We motivate the need for a semantic-flavored query reformulation approach for geographic information retrieval systems and describe a GIR architecture where query reformulation focuses on i) grounding entities in the query, ii) selecting a reasoning strategy according to the user's information need, and iii) generating a reformulated query containing answers and related entities for a more focused retrieval step. Reformulated queries obtain the answers by accessing a knowledge base.
Keywords: evaluation, geographic ontology, geographical information retrieval, information management, query reformulation
OGC catalog service for heterogeneous earth observation metadata using extensible search indices
  Isao Kojima; Masahiro Kimoto; Akiyoshi Matono
In this paper, we propose an extensible information retrieval system based on data typed indices. The indices are constructed for various data types and are customized and extensible. Based on this system, we have implemented a catalog service of earth observation metadata. Using this system, it is possible to search through a large amount of metadata with heterogeneous schema. Fast response time is also achieved regardless of the number of individual query results.
Keywords: earth observation, heterogeneous data integration, information retrieval, open geospatial consortium, search engines
TWinner: understanding news queries with geo-content using Twitter
  Satyen Abrol; Latifur Khan
In the present world scenario, where the search engine wars are becoming fiercer than ever, it becomes necessary for each search engine to understand the intent of the user query in order to provide the user with more relevant search results. Among the various categories of search queries, a major portion is constituted by those having news intent. Given the tremendous growth in social media users, the spatial-temporal nature of the media can prove to be a very useful tool for improving search quality. In our work we examine the development of such a tool, which uses social media to improve the quality of web search and to predict whether the user is looking for news or not. We go one step beyond the previous research by mining Twitter messages, assigning weights to them and determining keywords that can be added to the search query to act as pointers to the existing search engine algorithms, suggesting that the user is looking for news. We conduct a series of experiments and show the impact that TWinner has on the results.
Keywords: geographic information retrieval, news queries, search engines
Getting context on the go: mobile urban exploration with ambient tag clouds
  Matthias Baldauf; Rainer Simon
Tag clouds are a well-established concept for organizing and visualizing large amounts of user-generated content annotated with keywords. Applied on mobile devices, so-called 'ambient tag clouds', which are based on surrounding georeferenced and tagged resources, may act as compact location descriptors. This paper presents our on-going work towards more expressive ambient tag clouds. By analyzing locative textual Web content, such representations summarizing available background information can be generated without explicitly assigned tags. Thus, these ambient tag clouds enable the mobile exploration of a place's semantics beyond visible objects and common points-of-interest.
Keywords: location-based service, tag cloud, user-generated content

Toponym detection and vernacular names

Geographical classification of documents using evidence from Wikipedia
  Rafael Odon de Alencar; Clodoveu Augusto Davis, Jr.; Marcos André Gonçalves
Obtaining or approximating a geographic location for search results often motivates users to include place names and other geography-related terms in their queries. Previous work shows that queries that include geography-related terms correspond to a significant share of the users' demand. Therefore, it is important to recognize the association of documents to places in order to adequately respond to such queries. This paper describes strategies for text classification into geography-related categories, using evidence extracted from Wikipedia. We use terms that correspond to entry titles and the connections between entries in Wikipedia's graph to establish a semantic network from which classification features are generated. Results of experiments using a news data-set, classified over Brazilian states, show that such terms constitute valid evidence for the geographical classification of documents, and demonstrate the potential of this technique for text classification.
Keywords: geographic information retrieval, geospatial evidence, text classification
Images and perceptions of neighbourhood extents
  Paul Clough; Robert Pasley
In this paper, we describe an experiment in which we use an online questionnaire to elicit people's perception of the extents of smaller vague regions, such as neighbourhoods. Our approach uses images of street scenes rather than landmarks or placenames.
Keywords: questionnaire, user study, vague regions
A web platform for the evaluation of vernacular place names in automatically constructed gazetteers
  Florian A. Twaroch; Christopher B. Jones
Vernacular place names pose a research challenge in geographic information retrieval. There is a long-standing demand from investigators for a reference collection to train their methods and evaluate their models and data. However, no large collection of informal place names associated with type and footprint data is currently available to the GIR community. The present contribution discusses the implementation of a web platform to collect such an evaluation data set. Design considerations of the user interface are addressed and we present first results of a nationwide attempt to collect the vernacular place names of Great Britain. Our results will aid further research in automatic gazetteer construction, considering vernacular place names.
Keywords: evaluation, gazetteer services, vernacular place names
Grounding toponyms in an Italian local news corpus
  Davide Buscaldi; Bernardo Magnini
In this paper we present a study carried out over toponyms contained in an Italian news collection, in order to determine the degree of ambiguity of toponyms and how difficult it could be to resolve such ambiguities. The results show that frequent toponyms are usually less ambiguous than rare toponyms. The resolution of ambiguities on a sample of 1,042 toponyms with different features confirms that ambiguous toponyms are spatially autocorrelated.
Keywords: geographic information retrieval, toponym resolution

Scope and time

Extraction and exploration of spatio-temporal information in documents
  Jannik Strötgen; Michael Gertz; Pavel Popov
In the past couple of years, there have been significant advances in the areas of temporal information retrieval (TIR) and geographic information retrieval (GIR), each focusing on extracting and utilizing temporal and geographic information, respectively, from documents for search and exploration tasks. Interestingly, there is little work that combines models, techniques and applications from these two areas to support scenarios and applications where temporal and geographic information in combination provide interesting, meaningful nuggets in document exploration tasks, such as visualizing a chronological sequence of events with their locations.
   In this paper, we present an approach that combines the two areas of TIR and GIR. Using temporal and geographic information extracted from documents and recorded in temporal and geographic document profiles, we show how co-occurrences of such information are determined and spatio-temporal document profiles are computed. Such profiles then provide the basis for a variety of document search and exploration tasks, such as visualizing the sequences of events on a map. We present a prototypical implementation of our system and demonstrate the effectiveness of combining GIR and TIR in the context of document exploration tasks.
Keywords: UIMA, information retrieval, spatial data, temporal data, text mining
Leveraging back-of-the-book indices to enable spatial browsing of a historical document collection
  Michael Piotrowski
We describe ongoing work on detecting toponyms in back-of-the-book indices to geocode historical documents not available in full text; the goal is specifically to provide spatial browsing for the Collection of Swiss Law Sources. We discuss some of the peculiarities of handcrafted indices and approaches for coping with them.
Keywords: cultural heritage data, law sources, spatial browsing, toponym resolution
Using the geographic scopes of web documents for contextual advertising
  Ivo Anastácio; Bruno Martins; Pável Calado
Geotargeting is a specialization of contextual advertising where the objective is to target ads to Website visitors concentrated in well-defined areas. Current approaches involve targeting ads based on the physical location of the visitors, estimated through their IP addresses. However, there are many situations where it would be more interesting to target ads based on the geographic scope of the target pages, i.e., on the general area implied by the locations mentioned in the textual contents of the pages. Our proposal applies techniques from the area of geographic information retrieval to the problem of geotargeting. We address the task through a pipeline of processing stages, which involves (i) determining the geographic scope of target pages, (ii) classifying target pages according to locational relevance, and (iii) retrieving ads relevant to the target page, using both textual contents and geographic scopes. Experimental results attest to the adequacy of the proposed methods in each of the individual processing stages.
Keywords: contextual advertisement, geographic information retrieval, geographic text mining, geotargeting
Geographic signatures for semantic retrieval
  David S. Batista; Mário J. Silva; Francisco M. Couto; Bibek Behera

Annotation, relevance ranking and evaluation

Annotating data to support decision-making: a case study
  Carla Geovana N. Macário; Jefersson A. dos Santos; Claudia Bauzer Medeiros; Ricardo da S. Torres
Georeferenced data are a key factor in many decision-making systems. However, their interpretation is user- and context-dependent, so that, for each situation, data analysts have to interpret them, a time-consuming task. One approach to alleviate this task is the use of semantic annotations to store the produced information. Annotating data is, however, hard to perform and prone to errors, especially when executed manually. This difficulty increases with the amount of data to annotate. Moreover, annotation requires multi-disciplinary collaboration of researchers, with access to heterogeneous and distributed data sources and scientific computations. This paper illustrates our solution to this problem by means of a case study in agriculture. It shows how our implementation of a framework to automate the annotation of geospatial data can be used to process real data from remote sensing images and other official Brazilian data sources.
Keywords: geospatial data, geospatial standards, remote sensing image classification, semantic annotation
Learning to rank for geographic information retrieval
  Bruno Martins; Pável Calado
The task of Learning to Rank is currently getting increasing attention, providing a sound methodology for combining different sources of evidence. The goal is to design and apply machine learning methods to automatically learn a function from training data that can sort documents according to their relevance. Geographic information retrieval has also emerged as an active and growing research area, addressing the retrieval of textual documents according to geographic criteria of relevance. In this paper, we explore the usage of a learning to rank approach for geographic information retrieval, leveraging the datasets made available in the context of the previous GeoCLEF evaluation campaigns. The idea is to combine different metrics of textual and geographic similarity into a single ranking function, through the use of the SVMmap framework. Experimental results show that the proposed approach can outperform baselines based on heuristic combinations of features.
Keywords: geographic information retrieval, learning to rank
Spatial diversity, do users appreciate it?
  Jiayu Tang; Mark Sanderson
Spatial diversity is a relatively new branch of research in the context of spatial information retrieval. It tries to answer a user's query with results that are not only relevant but also spatially diversified, so that they are from many different locations. Although the assumption that spatially diversified results may meet users' needs better seems reasonable, there has been little hard evidence in the literature indicating so. In this paper, we will show our follow-up work on the novel approach to investigating user preference on spatial diversity by using Amazon Mechanical Turk.
Keywords: Amazon Mechanical Turk, spatial diversity, user study
A probabilistic model of geographic relevance
  Stefano De Sabbata; Tumasch Reichenbacher
In this paper, we present a new model for the assessment of Geographic Relevance. This model is drawn from Okapi BM25, thus it takes into account not only a score for each dimension of relevance but also the distribution of these scores within the collection. Preliminary results suggest that the relevance estimation of top-ranked objects is more sensitive to small changes in the user context.
Keywords: GRBM25, Okapi BM25, geographic relevance
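For orientation, the standard Okapi BM25 text-ranking function on which the abstract above says the model draws can be written as follows. This is the usual textbook form for a query $Q = q_1, \dots, q_n$ and document $D$, not the paper's GRBM25 model, whose exact formulation is not given in this listing. The symbols are the conventional ones: $f(q_i, D)$ is the frequency of term $q_i$ in $D$, $|D|$ the document length, $\mathrm{avgdl}$ the average document length in the collection, and $k_1$ and $b$ free parameters.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b \,\frac{|D|}{\mathrm{avgdl}}\right)}
```

The dependence on $\mathrm{IDF}(q_i)$ and $\mathrm{avgdl}$ is what makes the score sensitive to how values are distributed within the collection, which is the property the abstract highlights.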
How geographic was GikiCLEF?: a GIR-critical review
  Diana Santos; Nuno Cardoso; Luís Miguel Cabral
In this paper we take stock of GikiCLEF as far as its appropriateness for the evaluation of GIR systems is concerned. We measure the degree to which it deals with geographic matter, and offer GIRA, the final resource, for GIR evaluation purposes.
Keywords: Wikipedia, crosslinguality, evaluation, geographical IR, multilinguality, question answering