
Proceedings of the 2014 International Workshop on Exploiting Semantic Annotations in Information Retrieval

Fullname:Proceedings of the 7th International Workshop on Exploiting Semantic Annotations in Information Retrieval
Editors:Omar Alonso; Jaap Kamps; Jussi Karlgren
Location:Shanghai, China
Standard No:ISBN: 978-1-4503-1365-0; hcibib: ESAIR14
  1. Keynote Address
  2. Boaster Session

Keynote Address

Semantic Search at Yahoo! (p. 1)
  Peter Mika
Semantic search refers to a broad array of methods that aim to improve retrieval by interpreting queries beyond the traditional weighted bag of words model of document retrieval. In this talk, we will focus on the subset of these methods that rely on explicit semantic annotations, i.e. linking queries and text to items in a Knowledge Base. We will discuss techniques of entity linking on queries and documents, and the potential impact of these methods on improving performance on the classical ad-hoc document retrieval task. We will also discuss some novel tasks, including entity retrieval and related entity recommendations, and their implementation in Yahoo Search. We will close by considering some of the challenges that are specific to developing search services in a mobile context.
Linking to Web Knowledge Bases and Applications to Web Search (p. 3)
  Silviu Petru Cucerzan
The development and availability of Web knowledge repositories, in particular Wikipedia, as the largest, inter-linked, and up-to-the-minute encyclopedic collection, have changed remarkably not only the way in which people fulfill their informational needs on the Web but also the way in which information can be organized and provided by Web search engines. The talk will focus on the task of entity extraction and linking to Wikipedia and other Web knowledge repositories. It will also cover work that employs such entity repositories in conjunction with query logs of commercial Web search engines to address Web information retrieval tasks such as context-aware search, query suggestion, question answering, retrieval of support for factual statements, and automatic aggregation of topic pages as an alternative to the ten blue links.

Boaster Session

Documents Search Using Semantics Criteria (pp. 5-7)
  Santiago Cotelo; Alejandro Makowski; Luis Chiruzzo; Dina Wonsever
Current Information Retrieval systems generally search documents using a keyword model, which is often not expressive enough for the user. In this paper we describe some directions for improving an Information Retrieval system by letting the user specify different semantic constraints in her query, using a language based on a simplified version of first-order logic. The user can write queries that express the association between objects and attributes, temporal constraints, and negation of attributes, and can also perform synonym expansion of queries. To evaluate the relevance of a candidate document with respect to the query, the dependency parse tree of the document is used, along with other linguistic resources. The system was evaluated using a set of queries and a corpus extracted from the British newspaper The Times. The results are compared against the newspaper's own search engine and look promising, showing an important improvement in precision in the first documents returned by the query.
Towards Named-Entity-based Similarity Measures: Challenges and Opportunities (pp. 9-11)
  Tom De Nies; Christian Beecks; Wesley De Neve; Thomas Seidl; Erik Mannens; Rik Van de Walle
In this paper, we investigate challenges related to the adaptation of similarity measures used in the field of Information Retrieval to work with semantic features, i.e. Named Entities. The challenges to consider are numerous, including the accuracy of the annotation process, the adapted similarity measures, the quality of the Linked Data referred to, and the efficient access to the Semantic Web. We discuss each challenge in detail, as well as possible ways to tackle them.
Can Corpus Similarity-Based Self-Annotation Assist Information Retrieval? (pp. 13-15)
  Vinay Deolalikar
The use of external means to annotate corpora to assist in retrieval is gaining research attention. These external means include user-provided annotations, linkages between documents (e.g., the web), and knowledge bases such as Wikipedia. However, for corpora that are internally generated within an enterprise, most of these methods are not readily applicable.
   In this work, we ask whether we can extract important terms from the corpora (without recourse to any external assistance) and then use these terms as a form of "self-annotation" to assist retrieval.
   Our approach is as follows: we extract "important terms" from the corpora using clustering and information-theoretic means. We then use these terms as "self-annotation" of the corpus. Specifically, we augment BM25 with two scores that measure the information-gain of important terms with respect to their corresponding cluster. The first score uses all the important terms in the cluster, regardless of whether they appear in the query or not. The second score uses only those terms that appear in the query.
   We experiment extensively with convex combinations of these two scores with BM25 on TREC corpora. To reflect other literature in clustering in IR, we benchmark both static clustering and query-specific clustering. Our results establish a clear pattern on the effect of including information gain upon precision of retrieved lists. Our conclusions are that while neither information-gain based score improves the retrieval precision over baseline BM25, the precision is better when all terms (not just query terms) are considered.
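   The convex combination described in this abstract can be illustrated with a minimal sketch. The abstract does not give the weights, the score normalization, or the information-gain formulas, so the function names (`convex_combine`, `rank`), the example scores, and the weight values below are all hypothetical; the sketch only shows how two auxiliary scores might be mixed with a BM25 score under weights that are non-negative and sum to 1.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, float, float]  # (bm25, ig_all_terms, ig_query_terms)


def convex_combine(bm25: float, ig_all: float, ig_query: float,
                   weights: Tuple[float, float, float]) -> float:
    """Mix three scores with weights that are non-negative and sum to 1.

    Assumption: the three input scores are already on comparable scales;
    the abstract does not say how (or whether) they are normalized.
    """
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    w_bm25, w_all, w_query = weights
    return w_bm25 * bm25 + w_all * ig_all + w_query * ig_query


def rank(doc_scores: Dict[str, Scores],
         weights: Tuple[float, float, float]) -> List[str]:
    """Return document ids ordered by combined score, best first."""
    combined = {doc: convex_combine(*s, weights) for doc, s in doc_scores.items()}
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical per-document scores, for illustration only.
example = {"d1": (2.0, 0.1, 0.0), "d2": (1.5, 0.9, 0.8)}
baseline_order = rank(example, (1.0, 0.0, 0.0))    # pure BM25
mixed_order = rank(example, (0.5, 0.25, 0.25))     # BM25 mixed with both scores
```

   Setting the weights to `(1.0, 0.0, 0.0)` recovers the BM25 baseline, which makes it easy to compare a mixed ranking against the baseline on the same scores.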
AIDA-Social: Entity Linking on the Social Stream (pp. 17-19)
  Yusra Ibrahim; Mohamed Amir Yosef; Gerhard Weikum
Named Entity Linking (NEL) in microblogs is a challenging task due to the use of cryptic abbreviations, insufficient contextual information, and the time-varying importance of entities. We propose three techniques to target these challenges: Mention Normalization, Contextual Enrichment, and Temporal Entity Importance. By combining these novel techniques, we achieve a 13% improvement in precision over a state-of-the-art NEL tool.
A Probabilistic Concept Annotation for IT Service Desk Tickets (pp. 21-23)
  Ea-Ee Jan; Kuan-Yu Chen; Tsuyoshi Ide
Ticket annotation and search have become an important research subject in IT service desk delivery. Millions of tickets are created yearly to address business users' IT-related problems. In IT service desk management, it is critical first to capture the pain points for a group of tickets in order to determine root causes, and second to obtain their respective distributions in order to prioritize addressing these pain points. An advanced ticket analytics system uses a combination of topic modeling and clustering to address these issues, and integrating these features into the information architecture will allow wider distribution of this technology and could have a substantial financial impact on the IT industry. Topic modeling has been used to extract topics from given documents; each topic is represented by a unigram distribution. However, it is not clear how to interpret the results. Because topic models alone are inadequate for rendering top concepts, in this paper we propose a probabilistic framework that integrates topic models, POS tags, query expansion, and other techniques to address this practical challenge. Rigorous empirical experiments demonstrate the consistent performance and utility of the proposed method on real datasets.
Semantic Annotation with RescoredESA: Rescoring Concept Features Generated From Explicit Semantic Analysis (pp. 25-27)
  Zhuoren Jiang; Miao Chen; Xiaozhong Liu
Concepts have been used extensively in semantic annotation. Explicit Semantic Analysis (ESA) is a concept feature generator that represents text by a concept-level vector, such as a vector of Wikipedia concepts. It is also considered a human-friendly way to annotate text: it generates a concept vector that can be easily interpreted by humans. We propose an approach based on ESA, RescoredESA, that addresses two aspects in which ESA can be improved: 1) sometimes the output vectors do not assign high scores to concepts relevant to the text; 2) ESA considers the words in the text when representing the text as a concept-level vector, but not the concepts explicitly occurring in the text, which can be an important source for assigning scores to ESA vector dimensions. We evaluate RescoredESA on the 20 Newsgroups classification task, and the results show a slight improvement when combining vectors from RescoredESA and bag-of-words.
Using Semantic Role Labeling to Predict Answer Types (pp. 29-31)
  Zuyao Li; Peter Exner; Pierre Nugues
Most question answering systems feature a step to predict an expected answer type given a question. Li and Roth (2002) proposed an oft-cited taxonomy to categorize answer types, as well as an annotated data set. While offering a framework compatible with supervised learning, this method builds on a fixed and rigid model that has to be updated when the question-answering domain changes. More recently, Pinchak and Lin (2006) designed a dynamic method using a syntactic model of the answers that proved more versatile. They used syntactic dependencies to model the question context and evaluated the performance on an English corpus. However, syntactic properties may vary across languages, and techniques applicable to English may fail with other languages. In this paper, we present a method for constructing a probability-based answer type model for each different question. We adapted and reproduced the original experiment of Pinchak and Lin (2006) on a Chinese corpus, and we extended their model to semantic dependencies. Our model evaluates the probability that a candidate answer fits into the semantic context of a given question. We carried out an evaluation on a set of questions that were either drawn from the NTCIR corpus (2005) or created manually.
Leverage the Associations between Documents, Subject Headings and Terms to Enhance Retrieval (pp. 33-35)
  Jin Mao; Kun Lu
Literature in the medical domain is often annotated with subject headings by professionals. These headings help information seeking by making the subjects of documents explicit, and they serve as a pivot language between documents and users. Current information retrieval methods that use subject headings have not yet fully exploited their potential, and both positive and negative results have been reported. In this paper, we explore the three-layer structure of documents annotated with subject headings, comprising a document layer, a concept layer (i.e., subject headings), and a term layer, and we propose a concept-enhanced relevance model. The document-concept associations are mined to generate conceptual representations for documents, and the concept-term associations are quantified and used to represent concepts as language models. By embedding these associations, subject headings are used to enrich the document models in the estimation process of the relevance model. Experiments carried out on two medical collections show that our model improves on three state-of-the-art baselines. Therefore, if exploited appropriately, manually curated annotations such as subject headings can become an effective tool to enhance information retrieval.
Bringing Head Closer to the Tail with Entity Linking (pp. 37-39)
  Manisha Verma; Diego Ceccarelli
With the creation and rapid development of knowledge bases, it has become easier to understand the underlying semantics of unstructured text (short or long) on the web. In this work we look in particular at the impact of entity linking on search logs. Search queries follow a Zipfian distribution in which, apart from a few popular queries (head queries), a significant percentage of queries (tail queries) occur rarely. Given a search log, there is sufficient data to analyze head queries but insufficient data (low frequency, limited clicks) to draw any conclusions about tail queries. In this work we focus on quantifying the extent of overlap between long-tail and head queries by means of entity linking. We specifically analyze the frequency distribution of entities in head and tail queries. Our analysis shows that by means of entity linking, we can indeed bridge the gap between the head and tail.
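The head/tail overlap analysis described in this abstract can be sketched in a few lines. The abstract does not define the head/tail boundary, the overlap measure, or the entity linker, so everything below is an assumption: `head_min_count` is a hypothetical frequency threshold, `entities_of` stands in for the output of some entity linker, and the measure shown (the share of tail queries that mention at least one entity also seen in a head query) is just one plausible way to quantify overlap.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def split_head_tail(query_counts: Counter,
                    head_min_count: int) -> Tuple[Set[str], Set[str]]:
    """Split queries into head (frequent) and tail (rare) by a count threshold."""
    head = {q for q, c in query_counts.items() if c >= head_min_count}
    tail = set(query_counts) - head
    return head, tail


def tail_entity_coverage(head: Set[str], tail: Set[str],
                         entities_of: Dict[str, Set[str]]) -> float:
    """Fraction of tail queries sharing at least one linked entity with the head."""
    head_entities: Set[str] = set()
    for q in head:
        head_entities |= entities_of.get(q, set())
    if not tail:
        return 0.0
    covered = sum(1 for q in tail if entities_of.get(q, set()) & head_entities)
    return covered / len(tail)


# Hypothetical search log and entity annotations, for illustration only.
log = Counter({
    "paris hotels": 50,
    "cheap hotel near eiffel tower paris": 1,
    "obscure band b-sides": 1,
})
linked = {
    "paris hotels": {"Paris"},
    "cheap hotel near eiffel tower paris": {"Paris", "Eiffel_Tower"},
    "obscure band b-sides": {"Some_Band"},
}
head, tail = split_head_tail(log, head_min_count=10)
coverage = tail_entity_coverage(head, tail, linked)
```

In this toy log, one of the two tail queries mentions an entity that also appears in a head query, so data gathered for the head entity could in principle be reused for that tail query.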
A Fragment-Based Similarity Measure for Concept Hierarchies and Ontologies (pp. 41-42)
  Hui Yang
Despite the popularity of concept hierarchies and ontologies, such as Yahoo! Directory, a similarity measure that considers both hierarchy content and topology and is also highly efficient has not yet been achieved. A commonly used metric, Tree Edit Distance, is extremely inefficient when measuring similarities between unordered hierarchies. In this paper, we propose a novel and feasible solution, Fragment-based Similarity (FBS), to serve as an efficient and effective measure for hierarchy similarity evaluation. Experimental results and a user study show that FBS not only approximates Tree Edit Distance well but is also more efficient.
Keywords: Hierarchy Evaluation; Fragment-based Similarity
Exploiting Inference from Semantic Annotations for Information Retrieval: Reflections From Medical IR (pp. 43-45)
  Guido Zuccon; Bevan Koopman; Peter Bruza
The increasing amount of information that is annotated against standardised semantic resources offers opportunities to incorporate sophisticated levels of reasoning, or inference, into the retrieval process. In this position paper, we reflect on the need to incorporate semantic inference into retrieval (in particular for medical information retrieval), as well as on previous attempts, which have met with mixed success. Medical information retrieval is a fertile ground for testing inference mechanisms to augment retrieval. The medical domain offers a plethora of carefully curated, structured, semantic resources, along with well-established entity extraction and linking tools, and search topics that intuitively require a number of different inferential processes (e.g., conceptual similarity, conceptual implication, etc.). We argue that integrating semantic inference in information retrieval has the potential to uncover a large amount of information that otherwise would be inaccessible; but inference is also risky and, if not used cautiously, can harm retrieval.