Experimental Study on Semi-structured Peer-to-Peer Information Retrieval Network | | BIBAK | Full-Text | 3-14 | |
Rami S. Alkhawaldeh; Joemon M. Jose | |||
In recent decades, retrieval systems deployed over peer-to-peer (P2P)
overlay networks have been investigated as an alternative to centralised search
engines. Although modern search engines provide efficient document retrieval,
they possess several drawbacks. To alleviate these problems, P2P Information
Retrieval (P2PIR) systems provide an alternative architecture to the
traditional centralised search engine. Users and creators of web content in
such networks have full control over what information they wish to share as
well as how they share it. The semi-structured P2P architecture has been
proposed, in which the underlying approach organises similar documents in a
peer, often using clustering techniques, and promotes willing peers to super
peers (or hubs) that route queries to peers with relevant content.
However, no systematic evaluation study has been performed on such
architectures. In this paper, we study the performance of three cluster-based
semi-structured P2PIR models and explain the effect of several important
design considerations and parameters on retrieval performance, as well as the
robustness of these types of networks. Keywords: Semi-structured Peer-to-Peer; Clustering peers; Query routing; Resource
selection; Evaluation |
Evaluating Stacked Marginalised Denoising Autoencoders Within Domain Adaptation Methods | | BIBA | Full-Text | 15-27 | |
Boris Chidlovskii; Gabriela Csurka; Stephane Clinchant | |||
In this paper we address the problem of domain adaptation using multiple source domains. We extend the XRCE contribution to the CLEF'14 Domain Adaptation challenge [6] with new methods and new datasets. We describe a new class of domain adaptation techniques based on stacked marginalized denoising autoencoders (sMDA), which aims at extracting and denoising features common to both source and target domains in an unsupervised mode. Noise marginalization makes it possible to obtain a closed-form solution and to considerably reduce the training time. We build a classification system that compares sMDA combined with SVM or with Domain Specific Class Mean classifiers to the state of the art in both unsupervised and semi-supervised settings. We report evaluation results for a number of image and text datasets. |
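The noise-marginalization step mentioned above has a well-known closed form (following Chen et al.'s marginalized denoising autoencoder). Below is a minimal single-layer NumPy sketch, under the simplifying assumptions of a linear reconstruction and no bias feature; it is an illustration of the technique, not the authors' exact pipeline.

```python
import numpy as np

def mda_layer(X, p):
    """One marginalized denoising autoencoder layer in closed form.

    X : (d, n) data matrix, one feature per row.
    p : feature corruption (dropout) probability.
    Returns the (d, d) mapping W minimizing the expected squared
    reconstruction loss E||X - W X_corrupted||^2 over the corruption.
    """
    d = X.shape[0]
    q = np.full(d, 1.0 - p)            # probability that a feature survives
    S = X @ X.T                        # scatter matrix
    Q = S * np.outer(q, q)             # E[X_corrupted X_corrupted^T]
    np.fill_diagonal(Q, np.diag(S) * q)
    P = S * q[np.newaxis, :]           # E[X X_corrupted^T]
    # W = P Q^{-1}; Q is symmetric, small ridge term for stability
    return np.linalg.solve(Q + 1e-8 * np.eye(d), P.T).T
```

To stack layers, one feeds `np.tanh(mda_layer(X, p) @ X)` into the next layer and concatenates the denoised representations with the original features before classification. With `p = 0` the corruption vanishes and `W` reduces to the identity, a useful sanity check.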
Language Variety Identification Using Distributed Representations of Words and Documents | | BIBAK | Full-Text | 28-40 | |
Marc Franco-Salvador; Francisco Rangel; Paolo Rosso; Mariona Taulé; M. Antònia Martí | |||
Language variety identification is an author profiling subtask which aims to
detect lexical and semantic variations in order to classify different varieties
of the same language. In this work we focus on the use of distributed
representations of words and documents using the continuous Skip-gram model. We
compare this model with three recent approaches: Information Gain Word-Patterns,
TF-IDF graphs and Emotion-labeled Graphs, in addition to several baselines. We
evaluate the models introducing the Hispablogs dataset, a new collection of
Spanish blogs from five different countries: Argentina, Chile, Mexico, Peru and
Spain. Experimental results show state-of-the-art performance in language
variety identification. In addition, our empirical analysis provides
interesting insights on the use of the evaluated approaches. Keywords: Author profiling; Language variety identification; Distributed
representations; Information Gain Word-Patterns; TF-IDF graphs; Emotion-labeled
Graphs |
Evaluating User Image Tagging Credibility | | BIBA | Full-Text | 41-52 | |
Alexandru Lucian Ginsca; Adrian Popescu; Mihai Lupu; Adrian Iftene; Ioannis Kanellos | |||
When looking for information on the Web, the credibility of the source plays an important role in the information seeking experience. While data source credibility has been thoroughly studied for Web pages and blogs, the investigation of source credibility in image retrieval tasks is an emerging topic. In this paper, we first propose a novel dataset for evaluating the tagging credibility of Flickr users, built with the aim of covering a large variety of topics. We present the motivation behind the need for such a dataset and the methodology used for its creation, and detail important statistics on the number of users, images and rater agreement scores. Next, we define both a supervised learning task, in which we group the users into five credibility classes, and a credible user retrieval problem. In addition to credibility features described in previous work, we propose a novel set of credibility estimators, with an emphasis on text-based descriptors. Finally, we demonstrate the usefulness of our evaluation dataset and of the proposed credibility descriptors by showing promising results for both tasks. |
Tweet Expansion Method for Filtering Task in Twitter | | BIBAK | Full-Text | 55-64 | |
Payam Karisani; Farhad Oroumchian; Maseud Rahgozar | |||
In this article we propose a supervised method for expanding tweet contents
to improve the recall of the tweet filtering task in online reputation
management systems. Our method does not use any external resources. It
consists of creating a K-NN classifier in three steps: the tweets labelled
related and unrelated in the training set are expanded by extracting and
adding the most discriminative terms, calculating and adding the most
frequent terms, and re-weighting the original tweet terms from the training
set. Our experiments on the RepLab 2013 dataset show that our method improves
the performance of the filtering task, in terms of the F criterion, by up to
13% over state-of-the-art classifiers such as SVM. This dataset consists of 61
entities from the automotive, banking, university, and music domains. Keywords: Twitter; Classification; Filtering; Content expansion |
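The flavor of this expand-then-classify pipeline can be sketched as follows. The term-selection score and the cosine k-NN below are hypothetical stand-ins, not the paper's exact feature selection or weighting scheme.

```python
import math
from collections import Counter

def discriminative_terms(tweets, labels, target, k=10):
    """Terms whose frequency in `target`-labelled tweets most exceeds their
    frequency elsewhere (a simple stand-in for discriminative-term selection)."""
    pos, neg = Counter(), Counter()
    for text, lab in zip(tweets, labels):
        (pos if lab == target else neg).update(text.split())
    return [t for t, _ in sorted(
        pos.items(),
        key=lambda kv: kv[1] / (neg[kv[0]] + 1),
        reverse=True)[:k]]

def knn_classify(query, tweets, labels, k=3):
    """Cosine-similarity k-NN over bag-of-words vectors."""
    q = Counter(query.split())
    def cos(a, b):
        num = sum(a[t] * b[t] for t in a)
        return num / (math.sqrt(sum(v * v for v in a.values())) *
                      math.sqrt(sum(v * v for v in b.values())) or 1.0)
    sims = sorted(((cos(q, Counter(t.split())), lab)
                   for t, lab in zip(tweets, labels)), reverse=True)[:k]
    return Counter(lab for _, lab in sims).most_common(1)[0][0]
```

In use, the training tweets of each class would be expanded with the selected terms before being handed to the k-NN step, which is what lets short related tweets match despite sparse vocabulary overlap.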
Real-Time Entity-Based Event Detection for Twitter | | BIBAK | Full-Text | 65-77 | |
Andrew J. McMinn; Joemon M. Jose | |||
In recent years there has been a surge of interest in using Twitter to
detect real-world events. However, many state-of-the-art event detection
approaches are either too slow for real-time application, or can detect only
specific types of events effectively. We examine the role of named entities and
use them to enhance event detection. Specifically, we use a clustering
technique which partitions documents based upon the entities they contain, and
burst detection and cluster selection techniques to extract clusters related to
on-going real-world events. We evaluate our approach on a large-scale corpus of
120 million tweets covering more than 500 events, and show that it is able to
detect significantly more events than current state-of-the-art approaches
whilst also improving precision and retaining low computational complexity. We
find that nouns and verbs play different roles in event detection and that the
use of hashtags and retweets leads to a decrease in effectiveness when using
our entity-based approach. Keywords: Event detection; Social media; Reproducibility; Twitter |
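The entity-partitioning and burst-detection steps described above can be caricatured in a few lines. The mean-plus-k-standard-deviations burst test used here is a common simple choice and an assumption for this sketch, not necessarily the paper's detector.

```python
import statistics
from collections import defaultdict, deque

def detect_bursts(stream, window=60, threshold=3.0, history=10):
    """stream: iterable of (timestamp_seconds, entity) pairs.

    Partitions mentions by entity, buckets them into fixed time windows,
    and reports (window_start, entity) whenever an entity's count in a
    window exceeds mean + threshold * stdev of its previous windows."""
    counts = defaultdict(lambda: defaultdict(int))
    for ts, entity in stream:
        counts[entity][ts // window] += 1
    bursts = []
    for entity, wins in counts.items():
        past = deque(maxlen=history)     # recent per-window counts
        for w in sorted(wins):
            c = wins[w]
            if len(past) >= 2:
                mu = statistics.mean(past)
                sd = statistics.pstdev(past) or 1.0
                if c > mu + threshold * sd:
                    bursts.append((w * window, entity))
            past.append(c)
    return bursts
```

Partitioning by entity first keeps each burst test cheap and independent, which is what makes this style of detector attractive for real-time use.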
A Comparative Study of Click Models for Web Search | | BIBA | Full-Text | 78-90 | |
Artem Grotov; Aleksandr Chuklin; Ilya Markov; Luka Stout; Finde Xumara; Maarten de Rijke | |||
Click models have become an essential tool for understanding user behavior on a search engine result page, running simulated experiments and predicting relevance. Dozens of click models have been proposed, all aiming to tackle problems stemming from the complexity of user behavior or of contemporary result pages. Many models have been evaluated using proprietary data, hence the results are hard to reproduce. The choice of baseline models is not always motivated and the fairness of such comparisons may be questioned. In this study, we perform a detailed analysis of all major click models for web search ranging from very simplistic to very complex. We employ a publicly available dataset, open-source software and a range of evaluation techniques, which makes our results both representative and reproducible. We also analyze the query space to show what type of queries each model can handle best. |
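As a concrete instance of the "very simplistic" end of the click-model spectrum surveyed above, the cascade model admits a one-pass maximum-likelihood estimate. The sketch below treats no-click sessions as full examinations, which is one of several possible conventions, not the only one.

```python
from collections import defaultdict

def train_cascade(sessions):
    """Cascade-model MLE of per-document attractiveness.

    Each session is (ranked_docs, click_rank_or_None). Under the cascade
    model the user scans top-down and stops at the first click, so a
    document counts as examined iff its rank <= the click rank (all
    documents, in the no-click convention used here)."""
    clicks, shown = defaultdict(int), defaultdict(int)
    for docs, click_rank in sessions:
        last = click_rank if click_rank is not None else len(docs) - 1
        for d in docs[:last + 1]:
            shown[d] += 1
        if click_rank is not None:
            clicks[docs[click_rank]] += 1
    return {d: clicks[d] / shown[d] for d in shown}

def p_click_at(attr, docs, rank):
    """Probability that the first click lands on docs[rank]."""
    p = attr.get(docs[rank], 0.0)
    for d in docs[:rank]:
        p *= 1.0 - attr.get(d, 0.0)
    return p
```

More complex models in the survey (UBM, DBN, and relatives) replace the deterministic stop-at-first-click assumption with latent examination and satisfaction variables, at the cost of EM-style training.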
Evaluation of Pseudo Relevance Feedback Techniques for Cross Vertical Aggregated Search | | BIBA | Full-Text | 91-102 | |
Hermann Ziak; Roman Kern | |||
Cross-vertical aggregated search is a special form of meta-search, where multiple search engines from different domains and with varying behaviour are combined to produce a single search result for each query. Such a setting poses a number of challenges, among them the question of how to best evaluate the quality of the aggregated search results. We devised an evaluation strategy together with an evaluation platform in order to conduct a series of experiments. In particular, we are interested in whether pseudo relevance feedback helps in such a scenario. We therefore implemented a number of pseudo relevance feedback techniques based on knowledge bases, where the knowledge base is either Wikipedia or a combination of the underlying search engines themselves. While conducting the evaluations we gathered a number of qualitative and quantitative results and gained insights into how different users compare the quality of search result lists. Regarding pseudo relevance feedback, we found that using Wikipedia as knowledge base generally provides a benefit, except for entity-centric queries, which target single persons or organisations. Our results will help steer the development of cross-vertical aggregated search engines and also help guide large-scale evaluation strategies, for example using crowdsourcing techniques. |
Analysing the Role of Representation Choices in Portuguese Relation Extraction | | BIBAK | Full-Text | 105-116 | |
Sandra Collovini; Marcelo de Bairros P. Filho; Renata Vieira | |||
Relation Extraction is the task of identifying and classifying the semantic
relations between entities in text. This task is one of the main challenges in
Natural Language Processing. In this work, the relation extraction task is
treated as a sequence labelling problem. We analysed the impact of different
representation schemes for the relation descriptors. In particular, we analysed
the BIO and IO schemes performance considering a Conditional Random Fields
classifier for the extraction of any relation descriptor occurring between
named entities in the Organisation domain (Person, Organisation, Place).
Overall, the classifier proposed here presents the best results using the IO
notation. Keywords: Natural Language Processing; Information Extraction; Relation extraction;
Organisation domain; Portuguese language |
An Investigation of Cross-Language Information Retrieval for User-Generated Internet Video | | BIBAK | Full-Text | 117-129 | |
Ahmad Khwileh; Debasis Ganguly; Gareth J. F. Jones | |||
Increasing amounts of user-generated video content are being uploaded to
online repositories. This content is often very uneven in quality and topical
coverage in different languages. The lack of material in individual languages
means that cross-language information retrieval (CLIR) within these collections
is required to satisfy the user's information need. Search over this content is
dependent on available metadata, which includes user-generated annotations and
often noisy transcripts of spoken audio. The effectiveness of CLIR depends on
translation quality between query and content languages. We investigate CLIR
effectiveness for the blip10000 archive of user-generated Internet video
content. We examine the retrieval effectiveness using the title and free-text
metadata provided by the uploader and automatic speech recognition (ASR)
generated transcripts. Retrieval is carried out using the Divergence From
Randomness models, and automatic translation using Google Translate. Our
experimental investigation indicates that different sources of evidence have
different retrieval effectiveness and in particular differing levels of
performance in CLIR. Specifically, we find that the retrieval effectiveness of
the ASR source is significantly degraded in CLIR. Our investigation also
indicates that for this task the Title source provides the most robust source
of evidence for CLIR, and performs best when used in combination with other
sources of evidence. We suggest areas for investigation to give most effective
and robust CLIR performance for user-generated content. Keywords: Cross-Language Video Retrieval; User generated content; User generated
internet video search |
Benchmark of Rule-Based Classifiers in the News Recommendation Task | | BIBAK | Full-Text | 130-141 | |
Tomáš Kliegr; Jaroslav Kuchar | |||
In this paper, we present experiments evaluating Association Rule
Classification algorithms on on-line and off-line recommender tasks of the CLEF
NewsReel 2014 Challenge. The second focus of the experimental evaluation is to
investigate possible performance optimizations of the Classification Based on
Associations algorithm. Our findings indicate that pruning steps in CBA reduce
the number of association rules substantially while not affecting accuracy.
Using only part of the data employed for the rule learning phase in the pruning
phase may also reduce training time while not affecting accuracy significantly. Keywords: Recommender; Association rules; Rule learning; Decision trees |
Enhancing Medical Information Retrieval by Exploiting a Content-Based Recommender Method | | BIBAK | Full-Text | 142-153 | |
Wei Li; Gareth J. F. Jones | |||
Information Retrieval (IR) systems seek to find information which is
relevant to a searcher's information needs. Improving IR effectiveness using
personalization has been a significant focus of research attention in recent
years. However, in some situations there may be no opportunity to learn about
the interests of a specific user on a certain topic. This is a particular
problem for medical IR where individuals find themselves needing information on
topics for which they have never previously searched. However, in all
likelihood other users will have searched with the same information need
previously. This presents an opportunity to IR researchers attempting to
improve search effectiveness by exploiting previous user search behaviour. We
describe a method to enhance IR in the medical domain based on recommender
systems (RSs) by using a content-based recommender model in combination with a
standard IR model. We use search behaviour data from previous users with
similar interests to aid the current user to discover better search results. We
demonstrate the effectiveness of this method using a test dataset collected as
part of the EU FP7 Khresmoi project. Keywords: Information retrieval; Content-based filtering; Medical search |
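One common way to combine a content-based recommender signal with a standard IR score is linear interpolation over documents surfaced by either component. The sketch below is a hypothetical illustration of that fusion idea; the paper's exact combination scheme and similarity definition may differ.

```python
def recommender_scores(query_terms, past_sessions):
    """Content-based recommender signal: past sessions whose query shares
    terms with the current query vote for the documents they clicked."""
    votes, total = {}, 0
    for terms, clicked_docs in past_sessions:
        if set(terms) & set(query_terms):      # past user with similar interest
            total += 1
            for d in clicked_docs:
                votes[d] = votes.get(d, 0) + 1
    return {d: v / total for d, v in votes.items()} if total else {}

def combined_score(ir_scores, rec_scores, lam=0.7):
    """Linear interpolation of an IR score with the recommender score."""
    docs = set(ir_scores) | set(rec_scores)
    return {d: lam * ir_scores.get(d, 0.0) + (1 - lam) * rec_scores.get(d, 0.0)
            for d in docs}
```

The interpolation weight `lam` controls how much previous users' behaviour is trusted over the current query's own retrieval evidence, and would normally be tuned on held-out data.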
Summarizing Citation Contexts of Scientific Publications | | BIBAK | Full-Text | 154-165 | |
Sandra Mitrovic; Henning Müller | |||
As the number of publications is increasing rapidly, it becomes increasingly
difficult for researchers to find existing scientific papers most relevant for
their work, even when the domain is limited. To overcome this, it is common to
use paper summarization techniques in specific domains. In contrast to
approaches that exploit the paper content itself, in this paper we perform
summarization of the citation context of a paper. For this, we adjust and apply
existing summarization techniques and we come up with a hybrid method, based on
clustering and latent semantic analysis. We apply this on medical informatics
publications and compare the performance of methods that outperform other techniques
on a standard database. Summarization of the citation context can be
complementary to full text summarization, particularly to find candidate
papers. The performance reached seems good for routine use even though it was
only tested on a small database. Keywords: Text summarization; Sentence similarity; Citation context |
A Multiple-Stage Approach to Re-ranking Medical Documents | | BIBAK | Full-Text | 166-177 | |
Heung-Seon Oh; Yuchul Jung; Kwang-Young Kim | |||
The widespread use of the Web has radically changed the way people acquire
medical information. Every day, patients, their caregivers, and doctors
themselves search for medical information to resolve their medical information
needs. However, search results provided by existing medical search engines
often contain irrelevant or uninformative documents that are not appropriate
for the purposes of the users. As a solution, this paper presents a method for
re-ranking medical documents. The key concept of our method is to compute
accurate similarity scores through multiple stages of re-ranking documents from
the initial documents retrieved by a search engine. Specifically, our method
combines query expansion with abbreviations, query expansion with discharge
summary, clustering-based document scoring, centrality-based document scoring,
and pseudo relevance feedback with a relevance model. The experimental results
from participating in Task 3a of the CLEF 2014 eHealth show the performance of
our method. Keywords: Medical information retrieval; Document re-ranking; Medical abbreviations |
Exploring Behavioral Dimensions in Session Effectiveness | | BIBAK | Full-Text | 178-189 | |
Teemu Paakkonen; Kalervo Jarvelin; Jaana Kekalainen; Heikki Keskustalo; Feza Baskaya; David Maxwell; Leif Azzopardi | |||
Studies in interactive information retrieval (IIR) indicate that expert
searchers differ from novices in many ways. In the present paper, we identify a
number of behavioral dimensions along which searchers differ (e.g. cost, gain
and the accuracy of relevance assessment). We quantify these differences using
simulated, multi-query search sessions. We then explore each dimension in turn
to determine what differences are most effective in yielding superior retrieval
performance. We find that more precise action probabilities in assessing
snippets and documents contribute less to the overall cumulative gain during a
session than gain and cost structures do. Keywords: Session-based evaluation; IR interaction; Behavioral dimensions; Simulation;
Multi-query scanning models |
Meta Text Aligner: Text Alignment Based on Predicted Plagiarism Relation | | BIBAK | Full-Text | 193-199 | |
Samira Abnar; Mostafa Dehghani; Azadeh Shakery | |||
Text alignment is one of the main steps of plagiarism detection in textual
environments. Considering the pattern in distribution of the common semantic
elements of the two given documents, different strategies may be suitable for
this task. In this paper we assume that the obfuscation level, i.e. the
plagiarism type, is a function of the distribution of the common elements in
the two documents. Based on this assumption, we propose Meta Text Aligner which
predicts plagiarism relation of two given documents and employs the prediction
results to select the best text alignment strategy. It can thus potentially
perform better than existing methods, which use the same strategy for all
cases. As indicated by the experiments, we have been able to classify document
pairs based on plagiarism type with a precision of 89%. Furthermore, by
exploiting the predictions of the classifier to choose the proper method or
the optimal configuration for each type, we have been able to improve the
Plagdet score of the existing methods. Keywords: Meta Text Aligner; Plagiarism type; Text alignment; Plagiarism detection;
Patterns of distribution of common elements |
Automatic Indexing of Journal Abstracts with Latent Semantic Analysis | | BIBA | Full-Text | 200-208 | |
Joel Robert Adams; Steven Bedrick | |||
The BioASQ "Task on Large-Scale Online Biomedical Semantic Indexing" charges participants with assigning semantic tags to biomedical journal abstracts. We present a system that takes as input a biomedical abstract and uses latent semantic analysis to identify similar documents in the MEDLINE database. The system then uses a novel ranking scheme to select a list of MeSH tags from candidates drawn from the most similar documents. Our approach achieved better than baseline performance in both precision and recall. We suggest several possible strategies to improve the system's performance. |
Shadow Answers as an Intermediary in Email Answer Retrieval | | BIBAK | Full-Text | 209-214 | |
Alyaa Alfalahi; Gunnar Eriksson; Eriks Sneiders | |||
A set of standard answers facilitates answering emails at customer care
centers. Matching the text of user emails to the standard answers may not be
productive because they do not necessarily have the same wording. Therefore we
examine archived email-answer pairs and establish query-answer term
co-occurrences. When a new user email arrives, we replace query words with the
most strongly co-occurring answer words and obtain a "shadow answer", a new query used to
retrieve standard answers. As a measure of term co-occurrence strength we test
raw term co-occurrences and Pointwise Mutual Information. Keywords: Email answering; Statistical word associations; Shadow answer |
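The PMI variant of this query-to-answer translation can be sketched as follows; the tie-breaking and per-term expansion count are assumptions of the sketch, not the authors' configuration.

```python
import math
from collections import Counter

def pmi_table(pairs):
    """Pointwise mutual information between query terms and answer terms,
    estimated from archived (query, answer) email pairs."""
    q_cnt, a_cnt, joint = Counter(), Counter(), Counter()
    n = len(pairs)
    for query, answer in pairs:
        qs, as_ = set(query.split()), set(answer.split())
        q_cnt.update(qs)
        a_cnt.update(as_)
        joint.update((q, a) for q in qs for a in as_)
    return {(q, a): math.log((c / n) / ((q_cnt[q] / n) * (a_cnt[a] / n)))
            for (q, a), c in joint.items()}

def shadow_answer(query, pmi, per_term=2):
    """Replace each query word by its most strongly associated answer words,
    yielding the 'shadow answer' used as the new retrieval query."""
    out = []
    for q in query.split():
        assoc = sorted(((s, a) for (qq, a), s in pmi.items() if qq == q),
                       reverse=True)[:per_term]
        out.extend(a for _, a in assoc)
    return " ".join(out)
```

Because the shadow answer is built from answer-side vocabulary, matching it against the standard answers avoids the wording mismatch between user emails and canned replies.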
Are Topically Diverse Documents Also Interesting? | | BIBAK | Full-Text | 215-221 | |
Hosein Azarbonyad; Ferron Saan; Mostafa Dehghani; Maarten Marx; Jaap Kamps | |||
Text interestingness is a measure of document quality from the users'
perspective, reflecting their willingness to read a document. Different
approaches have been proposed for measuring the interestingness of texts. Most of
these approaches suppose that interesting texts are also topically diverse and
estimate interestingness using topical diversity. In this paper, we investigate
the relation between interestingness and topical diversity. We do this on the
Dutch and Canadian parliamentary proceedings. We apply an existing measure of
interestingness, which is based on structural properties of the proceedings
(e.g., how much interaction there is between speakers in a debate). We then
compute the correlation between this measure of interestingness and topical
diversity.
Our main findings are that in general there is a relatively low correlation between interestingness and topical diversity, and that there are two extreme categories of documents: highly interesting but hardly diverse documents (focused interesting documents), and highly diverse but uninteresting documents. When we remove these two extreme types of documents, there is a positive correlation between interestingness and diversity. Keywords: Text interestingness; Text topical diversity; Parliamentary proceedings |
Modeling of the Question Answering Task in the YodaQA System | | BIBAK | Full-Text | 222-228 | |
Petr Baudiš; Jan Šedivý | |||
We briefly survey the current state of the art in the field of Question
Answering and present the YodaQA system, an open source framework for this task
and a baseline pipeline with reasonable performance. We take a holistic
approach, reviewing and aiming to integrate many different question answering
task definitions and approaches concerning classes of knowledge bases, question
representation and answer generation. To ease performance comparisons of
general-purpose QA systems, we also propose an effort in building a new
reference QA testing corpus which is a curated and extended version of the TREC
corpus. Keywords: Question answering; Information retrieval; Information extraction; Linked
data; Natural language processing |
Unfair Means: Use Cases Beyond Plagiarism | | BIBA | Full-Text | 229-234 | |
Paul Clough; Peter Willett; Jessie Lim | |||
The study of plagiarism and its detection is a highly popular field of research that has witnessed increased attention over recent years. In this paper we describe the range of problems that exist within academe in the area of 'unfair means', which encompasses a wider range of issues of attribution, ownership and originality. Unfair means offers a variety of problems that may benefit from the development of computational methods, thereby requiring appropriate evaluation resources. This may provide further areas of focus for large-scale evaluation activities, such as PAN, and researchers in the field more generally. |
Instance-Based Learning for Tweet Monitoring and Categorization | | BIBA | Full-Text | 235-240 | |
Julien Gobeill; Arnaud Gaudinat; Patrick Ruch | |||
The CLEF RepLab 2014 Track was the occasion to investigate the robustness of instance-based learning in a complete system for tweet monitoring and categorization. The algorithm we implemented was k-Nearest Neighbors. Across the domains (automotive or banking) and the languages (English or Spanish), the experiments showed that the categorizer was not affected by the choice of representation: even with all learning tweets merged into one single Knowledge Base (KB), the observed performances were close to those with dedicated KBs. Interestingly, English training data in addition to the sparse Spanish data were useful for Spanish categorization (+14% accuracy for automotive, +26% for banking). Yet, performance suffered from over-prediction of the most prevalent category. The algorithm showed the defects of its virtues: it was very robust, but not easy to improve. BiTeM/SIBtex tools for tweet monitoring are available within the DrugsListener Project page of the BiTeM website (http://bitem.hesge.ch/). |
Are Test Collections "Real"? Mirroring Real-World Complexity in IR Test Collections | | BIBAK | Full-Text | 241-247 | |
Melanie Imhof; Martin Braschler | |||
Objective evaluation of effectiveness is a major topic in the field of
information retrieval (IR), as emphasized by the numerous evaluation campaigns
in this area. The increasing pervasiveness of information has led to a large
variety of IR application scenarios that involve different information types
(modalities), heterogeneous documents and context-enriched queries. In this
paper, we argue that even though the complexity of academic test collections
has increased over the years, they are still too structurally simple in
comparison to operational collections in real-world applications. Furthermore,
research has brought up retrieval methods for very specific modalities, such as
ratings, geographical coordinates and timestamps. However, it is still unclear
how to systematically incorporate new modalities in IR systems. We therefore
propose a categorization of modalities that not only allows analyzing the
complexity of a collection but also helps to generalize methods to entire
modality categories instead of being specific for a single modality. Moreover,
we discuss how such a complex collection can methodically be built for use in
an evaluation campaign. Keywords: Collection complexity; Modality categorization; Evaluation campaigns
Evaluation of Manual Query Expansion Rules on a Domain Specific FAQ Collection | | BIBA | Full-Text | 248-253 | |
Mladen Karan; Jan Šnajder | |||
Frequently asked question (FAQ) knowledge bases are a convenient way to organize domain specific information. However, FAQ retrieval is challenging because the documents are short and the vocabulary is domain specific, giving rise to the lexical gap problem. To address this problem, in this paper we consider rule-based query expansion (QE) for domain specific FAQ retrieval. We build a small test collection and evaluate the potential of QE rules. While we observe some improvement for difficult queries, our results suggest that the potential of manual rule compilation is limited. |
Evaluating Learning Language Representations | | BIBAK | Full-Text | 254-260 | |
Jussi Karlgren; Jimmy Callin; Kevyn Collins-Thompson; Amaru Cuba Gyllensten; Ariel Ekgren; David Jurgens; Anna Korhonen; Fredrik Olsson; Magnus Sahlgren; Hinrich Schütze | |||
Machine learning offers significant benefits for systems that process and
understand natural language: (a) lower maintenance and upkeep costs than when
using manually-constructed resources, (b) easier portability to new domains,
tasks, or languages, and (c) robust and timely adaptation to situation-specific
settings. However, the behaviour of an adaptive system is less predictable than
when using an edited, stable resource, which makes quality control a continuous
issue. This paper proposes an evaluation benchmark for measuring the quality,
coverage, and stability of a natural language system as it learns word meaning.
Inspired by existing tests for human vocabulary learning, we outline measures
for the quality of semantic word representations, such as when learning word
embeddings or other distributed representations. These measures highlight
differences between the types of underlying learning processes as systems
ingest progressively more data. Keywords: Language representations; Semantic spaces; Word embeddings; Machine
learning; Evaluation |
Automatic Segmentation and Deep Learning of Bird Sounds | | BIBAK | Full-Text | 261-267 | |
Hendrik Vincent Koops; Jan van Balen; Frans Wiering | |||
We present a study on automatic birdsong recognition with deep neural
networks using the BirdCLEF 2014 dataset. Through deep learning, feature
hierarchies are learned that represent the data on several levels of
abstraction. Deep learning has been applied with success to problems in fields
such as music information retrieval and image recognition, but its use in
bioacoustics is rare. Therefore, we investigate the application of a common
deep learning technique (deep neural networks) in a classification task using
songs from Amazonian birds. We show that various deep neural networks are
capable of outperforming other classification methods. Furthermore, we present
an automatic segmentation algorithm that is capable of separating bird sounds
from non-bird sounds. Keywords: Deep learning; Feature learning; Bioacoustics; Segmentation |
The Impact of Noise in Web Genre Identification | | BIBA | Full-Text | 268-273 | |
Dimitrios Pritsos; Efstathios Stamatatos | |||
Genre detection of web documents is an open-set classification task. Web documents that do not belong to any predefined genre, or in which multiple genres co-exist, are considered noise. In this work we study the impact of noise on automated genre identification within an open-set classification framework. We examine alternative classification models and document representation schemes based on two corpora, one without noise and one with noise, showing that the recently proposed RFSE model can remain robust under noise. Moreover, we show that the identification of certain genres is not practically affected by the presence of noise. |
On the Multilingual and Genre Robustness of EmoGraphs for Author Profiling in Social Media | | BIBAK | Full-Text | 274-280 | |
Francisco Rangel; Paolo Rosso | |||
Author profiling aims at identifying different traits such as age and gender
of an author on the basis of her writings. We propose the novel EmoGraph
graph-based approach where morphosyntactic categories are enriched with
semantic and affective information. In this work we focus on testing the
robustness of EmoGraphs when applied to age and gender identification. Results
with the PAN-AP-14 corpus show the competitiveness of the representation over
genres and languages. Finally, some interesting insights are shown, for example
with topic and emotion bounded genres such as hotel reviews. Keywords: Author profiling; Age identification; Gender identification; Emotion-labeled
graphs; EmoGraph |
Is Concept Mapping Useful for Biomedical Information Retrieval? | | BIBAK | Full-Text | 281-286 | |
Wei Shen; Jian-Yun Nie | |||
Concepts have been extensively used in biomedical information retrieval
(BIR), but the experimental results have often shown limited or no improvement
compared to a traditional bag-of-words method. In this paper, we analyze the
problems in concept mapping, and show how they can affect the results of BIR.
This suggests a flexible utilization of the identified concepts. Keywords: Biomedical information retrieval; Concept; MetaMap; UMLS |
Using Health Statistics to Improve Medical and Health Search | | BIBA | Full-Text | 287-292 | |
Tawan Sierek; Allan Hanbury | |||
We present a probabilistic information retrieval (IR) model that incorporates epidemiological data and simple patient profiles composed of a patient's sex and age. This approach is intended to improve retrieval effectiveness in the health and medical domain. We evaluated our approach on the TREC Clinical Decision Support Track 2014. The new approach performed better than a baseline run; however, we cannot yet report any statistically significant improvements. |
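The kind of combination this abstract describes, weighting a text-relevance score by an epidemiological prior conditioned on the patient's sex and age, can be sketched as follows. The incidence table, condition names, and all numbers are illustrative assumptions, not data from the paper:

```python
# Hypothetical sketch: re-rank condition-related documents by combining a
# text-relevance score with a demographic prior P(condition | sex, age band).
# The PRIOR table below is invented for illustration only.
PRIOR = {
    ("f", "0-18"):  {"asthma": 0.10, "gout": 0.001},
    ("m", "40-65"): {"asthma": 0.02, "gout": 0.04},
}

def rerank(scored_docs, sex, age_band):
    """scored_docs: list of (condition, text_score) pairs.
    Returns the list re-ranked by text_score * P(condition | sex, age_band)."""
    priors = PRIOR.get((sex, age_band), {})
    def combined(item):
        condition, score = item
        # small floor so conditions absent from the table are not zeroed out
        return score * priors.get(condition, 1e-6)
    return sorted(scored_docs, key=combined, reverse=True)

docs = [("gout", 0.8), ("asthma", 0.7)]
print(rerank(docs, "f", "0-18"))  # → [('asthma', 0.7), ('gout', 0.8)]
```

Even with a lower text score, a condition that is far more plausible for the given profile moves to the top, which is the intuition behind using epidemiological data in health search.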
Determining Window Size from Plagiarism Corpus for Stylometric Features | | BIBA | Full-Text | 293-299 | |
Šimon Suchomel; Michal Brandejs | |||
The sliding window concept is a common method for computing a profile of a document with unknown structure. This paper outlines an experiment with a stylometric word-based feature in order to determine an optimal size for the sliding window. It was conducted for a vocabulary richness method called 'average word frequency class' using the PAN 2015 source retrieval training corpus for plagiarism detection. The paper shows the pros and cons of stop-word removal for sliding-window document profiling and discusses the utilization of the selected feature for intrinsic plagiarism detection. The experiment resulted in the recommendation of a sliding window of around 100 words for computing the text profile using the average word frequency class stylometric feature. |
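The profiling procedure this abstract describes can be sketched roughly as follows. This is an illustrative implementation assuming the common definition of a word's frequency class, floor(log2(f_max / f(w))), and is not the authors' code:

```python
import math
from collections import Counter

def word_frequency_classes(words):
    """Frequency class of each word w: floor(log2(f_max / f(w))),
    where f_max is the frequency of the most frequent word."""
    freq = Counter(words)
    f_max = max(freq.values())
    return {w: math.floor(math.log2(f_max / f)) for w, f in freq.items()}

def windowed_awfc(words, window=100, step=50):
    """Average word frequency class for each sliding window over the text,
    yielding a stylometric profile of the document."""
    classes = word_frequency_classes(words)
    profiles = []
    for start in range(0, max(1, len(words) - window + 1), step):
        chunk = words[start:start + window]
        profiles.append(sum(classes[w] for w in chunk) / len(chunk))
    return profiles
```

Windows whose average frequency class deviates sharply from the document mean are candidate stylistic breaks, which is how such a profile feeds into intrinsic plagiarism detection.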
Effect of Log-Based Query Term Expansion on Retrieval Effectiveness in Patent Searching | | BIBAK | Full-Text | 300-305 | |
Wolfgang Tannebaum; Parvaz Mahdabi; Andreas Rauber | |||
In this paper we study the impact of query term expansion (QTE) using
synonyms on patent document retrieval. We use an automatically generated
lexical database from USPTO query logs, called PatNet, which provides synonyms
and equivalents for a query term. Our experiments on the CLEF-IP 2010 benchmark
dataset show that automatic query expansion using PatNet tends to decrease or
only slightly improve the retrieval effectiveness, with no significant
improvement. An analysis of the retrieval results shows that PatNet does not
generally have a negative effect on retrieval effectiveness. Recall is
drastically improved for query topics where the baseline queries achieve only
low recall values on average, but we have not detected any commonality that
allows us to characterize these queries. We therefore recommend using PatNet for
semi-automatic QTE in Boolean retrieval, where expanding query terms with
synonyms and equivalents with the aim of expanding the query scope is a common
practice. Keywords: Patent searching; Query term expansion; Query log analysis |
Integrating Mixed-Methods for Evaluating Information Access Systems | | BIBA | Full-Text | 306-311 | |
Simon Wakeling; Paul Clough | |||
The evaluation of information access systems increasingly makes use of multiple evaluation methods. While such studies represent forms of mixed-methods research, they are rarely acknowledged as such. This means that researchers are potentially failing to recognise the challenges and opportunities offered by multi-phase research, particularly in terms of data integration. This paper provides a brief case study of how one framework -- Bazeley and Kemp's metaphors for integrated analysis -- was employed to formalise data integration for a large exploratory evaluation study. |
Teaching the IR Process Using Real Experiments Supported by Game Mechanics | | BIBAK | Full-Text | 312-317 | |
Thomas Wilhelm-Stein; Maximilian Eibl | |||
We present a web-based tool for teaching and learning the information
retrieval process. An interactive approach helps students gain practical
knowledge. Our focus is the arrangement and configuration of IR components and
their evaluation. The incorporation of game mechanics counteracts
information overload and motivates progression. Keywords: Information retrieval; Teaching; Learning; Web application; Components; Game
mechanics |
Tweet Contextualization Using Association Rules Mining and DBpedia | | BIBAK | Full-Text | 318-323 | |
Meriem Amina Zingla; Chiraz Latiri; Yahya Slimani | |||
Tweets are short messages, limited to 140 characters, that do not always
conform to proper spelling rules. This spelling variation makes them hard to
understand without some kind of context. For this reason, the tweet
contextualization task was introduced, aiming to provide automatic contexts
that explain tweets. In this paper, we present two tweet contextualization
approaches: the first is based on mining inter-term association rules, while
the second makes use of the DBpedia ontology. These approaches allow us to
augment the vocabulary of a given tweet with a set of thematically related
words. We conducted an experimental study on the INEX 2014 collection to
demonstrate the effectiveness of our approaches; the results obtained are very promising. Keywords: Information retrieval; Tweet contextualization track; Query expansion;
DBpedia; Association rules |
Search-Based Image Annotation: Extracting Semantics from Similar Images | | BIBA | Full-Text | 327-339 | |
Petra Budikova; Michal Batko; Jan Botorek; Pavel Zezula | |||
The importance of automatic image annotation as a tool for handling large amounts of image data has been recognized for several decades. However, working tools have long been limited to narrow-domain problems with a few target classes for which precise models could be trained. With the advance of similarity searching, it now becomes possible to employ a different approach: extracting information from large amounts of noisy web data. However, several issues need to be resolved, including the acquisition of a suitable knowledge base, the choice of a suitable visual content descriptor, the implementation of an effective and efficient similarity search engine, and the extraction of semantics from similar images. In this paper, we address these challenges and present a working annotation system based on the search-based paradigm, which achieved good results in the 2014 ImageCLEF Scalable Concept Image Annotation challenge. |
NLP-Based Classifiers to Generalize Expert Assessments in E-Reputation | | BIBA | Full-Text | 340-351 | |
Jean-Valère Cossu; Emmanuel Ferreira; Killian Janod; Julien Gaillard; Marc El-Bèze | |||
Online Reputation Management (ORM) is currently dominated by expert abilities. One of the great challenges is to effectively collect annotated training samples, especially to be able to generalize a small pool of expert feedback from a local scale to a more global scale. One possible solution is to use advanced Machine Learning (ML) techniques to select annotations from training samples and propagate them effectively and concisely. We focus on the critical issue of understanding the different levels of annotation. Using the framework proposed by the RepLab contest, we present a considerable number of experiments in reputation monitoring and author profiling. The proposed methods rely on a large variety of Natural Language Processing (NLP) techniques exploiting tweet contents and some background contextual information. We show that simple algorithms considering only tweet content are competitive with state-of-the-art techniques. |
A Method for Short Message Contextualization: Experiments at CLEF/INEX | | BIBAK | Full-Text | 352-363 | |
Liana Ermakova | |||
This paper presents the approach we developed for automatic multi-document
summarization applied to short message contextualization, in particular to
tweet contextualization. The proposed method is based on named entity
recognition, part-of-speech weighting and sentence quality measuring. In
contrast to previous research, we introduce an algorithm for smoothing from
the local context. Our approach exploits the topic-comment structure of a text.
Moreover, we developed a graph-based algorithm for sentence reordering. The
method has been evaluated at INEX/CLEF tweet contextualization track. We
provide the evaluation results over the 4 years of the track. The method was
also adapted to snippet retrieval and query expansion. The evaluation results
indicate good performance of the approach. Keywords: Information retrieval; Tweet Contextualization; Summarization; Snippet;
Sentence extraction; Readability; Topic-comment structure |
Towards Automatic Large-Scale Identification of Birds in Audio Recordings | | BIBAK | Full-Text | 364-375 | |
Mario Lasseck | |||
This paper presents a computer-based technique for bird species
identification at large scale. It automatically identifies multiple species
simultaneously in a large number of audio recordings and provides the basis for
the best scoring submission to the LifeCLEF 2014 Bird Identification Task. The
method achieves a Mean Average Precision of 51.1% on the test set and 53.9% on
the training set with an Area Under the Curve of 91.5% during cross-validation.
Besides a general description of the underlying classification approach, a
number of additional research questions are addressed regarding the choice of
features, the selection of classifier hyperparameters and the method of classification. Keywords: Bird Identification; Information retrieval; Biodiversity; Spectrogram
segmentation; Median Clipping; Template matching; Decision trees |
Optimizing and Evaluating Stream-Based News Recommendation Algorithms | | BIBA | Full-Text | 376-388 | |
Andreas Lommatzsch; Sebastian Werner | |||
Recommender algorithms are powerful tools that help users find interesting
items in the overwhelming amount of available data. Classic recommender
algorithms are trained on a huge set of user-item interactions collected in the
past. Since the learning of models is computationally expensive, it is
difficult to integrate new knowledge into the recommender models. With the
growing importance of social networks, the huge amount of data generated by the
real-time web (e.g. news portals, micro-blogging services), and the ubiquity of
personalized web portals, stream-based recommender systems have come into the
focus of research.
In this paper we develop algorithms tailored to the requirements of a web-based news recommendation scenario. The algorithms address the specific challenges of news recommendation, such as the context-dependent relevance of news items and the short item lifecycle, which forces the recommender algorithms to adapt continuously to the set of news articles. In addition, the scenario is characterized by a huge number of messages that must be processed per second and by tight time constraints, resulting from the fact that news recommendations should be embedded into webpages without delay. For evaluating and optimizing the recommender algorithms we implemented an evaluation framework that allows us to analyze and compare different recommender algorithms in different contexts. We discuss their strengths and weaknesses with respect to both recommendation precision and technical complexity, and show how the evaluation framework enables us to find the optimal recommender algorithm for specific scenarios and contexts. |
Information Extraction from Clinical Documents: Towards Disease/Disorder Template Filling | | BIBAK | Full-Text | 389-401 | |
Veera Raghavendra Chikka; Nestor Mariyasagayam; Yoshiki Niwa; Kamalakar Karlapalem | |||
In recent years there has been an increase in the generation of electronic
health records (EHRs), which has led to an increased scope for research on
biomedical literature. Many studies have used various NLP,
information retrieval and machine learning techniques to extract information
from these records. In this paper, we provide a methodology to extract
information for understanding the status of the disease/disorder. The status of
disease/disorder is based on different attributes like temporal information,
severity and progression of the disease. Here, we consider ten attributes that
allow us to understand most of the details regarding the status of the
disease/disorder: Negation Indicator, Subject Class, Uncertainty
Indicator, Course Class, Severity Class, Conditional Class, Generic Class, Body
Location, DocTime Class, and Temporal Expression. We present
rule-based and machine learning approaches to identify each of these attributes
and evaluate our system at attribute-level and system-level accuracy. This
project was done as part of the ShARe/CLEF eHealth Evaluation Lab 2014. We
achieved state-of-the-art accuracy (0.868) in identifying normalized
values of the attributes. Keywords: NLP; Information extraction; Unified Medical Language System (UMLS); Apache
cTAKES; Relation extraction; Machine learning |
Adaptive Algorithm for Plagiarism Detection: The Best-Performing Approach at PAN 2014 Text Alignment Competition | | BIBA | Full-Text | 402-413 | |
Miguel A. Sanchez-Perez; Alexander Gelbukh; Grigori Sidorov | |||
The task of (monolingual) text alignment consists in finding similar text fragments between two given documents. It has applications in plagiarism detection, detection of text reuse, author identification, authoring aid, and information retrieval, to mention only a few. We describe our approach to the text alignment subtask of the plagiarism detection competition at PAN 2014, which resulted in the best-performing system at the PAN 2014 competition and outperforms the best-performing system of the PAN 2013 competition by the cumulative evaluation measure Plagdet. Our method relies on a sentence similarity measure based on a tf-idf-like weighting scheme that permits us to consider stopwords without increasing the rate of false positives. We introduce a recursive algorithm to extend the ranges of matching sentences to maximal length passages. We also introduce a novel filtering method to resolve overlapping plagiarism cases. Our system is available as open source. |
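The tf-idf-like sentence similarity this abstract mentions can be sketched in miniature. The exact weighting scheme in the paper differs; this version only illustrates how an idf component lets stopwords stay in the vectors while contributing little to the similarity:

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """Build a sparse tf-idf vector for each sentence. Stopwords are kept;
    the idf term down-weights words that occur in many sentences."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(1 + n / df[w]) for w in df}
    return [{w: tf * idf[w] for w, tf in Counter(toks).items()}
            for toks in tokenized]

def cosine(a, b):
    """Cosine similarity between two sparse vectors (word -> weight dicts)."""
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

In a text alignment setting, sentence pairs whose similarity exceeds a threshold become seeds that are then extended to maximal matching passages.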
Question Answering via Phrasal Semantic Parsing | | BIBA | Full-Text | 414-426 | |
Kun Xu; Yansong Feng; Songfang Huang; Dongyan Zhao | |||
Understanding natural language questions and converting them into structured queries has been considered a crucial way to help users access large-scale structured knowledge bases. However, the task usually involves two main challenges: recognizing users' query intentions and mapping the involved semantic items against a given knowledge base (KB). In this paper, we propose an efficient pipeline framework to model a user's query intention as a phrase-level dependency DAG, which is then instantiated against a specific KB to construct the final structured query. Our model benefits from the efficiency of linear structured prediction models and the separation of KB-independent and KB-related modeling. We evaluate our model on two datasets; the experimental results show that our method outperforms the state-of-the-art methods on the Free917 dataset and, with limited training data from Free917, our model can smoothly adapt to the new and challenging WebQuestions dataset without extra training effort while maintaining promising performance. |
Overview of the CLEF eHealth Evaluation Lab 2015 | | BIBAK | Full-Text | 429-443 | |
Lorraine Goeuriot; Liadh Kelly; Hanna Suominen; Leif Hanlen; Aurélie Névéol; Cyril Grouin; João Palotti; Guido Zuccon | |||
This paper reports on the 3rd CLEFeHealth evaluation lab, which continues
our evaluation resource building activities for the medical domain. In this
edition of the lab, we focus on supporting patients and nurses in authoring,
understanding, and accessing eHealth information. The 2015 CLEFeHealth
evaluation lab was structured into two tasks, focusing on evaluating methods
for information extraction (IE) and information retrieval (IR). The IE task
introduced two new challenges. Task 1a focused on clinical speech recognition
of nursing handover notes; Task 1b focused on clinical named entity recognition
in languages other than English, specifically French. Task 2 focused on the
retrieval of health information to answer queries issued by general consumers
seeking information to understand their health symptoms or conditions.
The number of teams registering their interest was 47 in Task 1 (2 teams participated in Task 1a and 7 in Task 1b) and 53 in Task 2 (12 teams participated), for a total of 20 unique participating teams. The best system recognized 4,984 out of 6,818 test words correctly and generated 2,626 incorrect words (i.e., a 38.5% error rate) in Task 1a; achieved an F-measure of 0.756 for plain entity recognition, 0.711 for normalized entity recognition, and 0.872 for entity normalization in Task 1b; and achieved P@10 of 0.5394 and nDCG@10 of 0.5086 in Task 2. These results demonstrate the substantial community interest in, and the capabilities of, these systems in addressing challenges faced by patients and nurses. As in previous years, the organizers have made data and tools available for future research and development. Keywords: Evaluation; Information retrieval; Information extraction; Medical
informatics; Nursing records; Patient handoff/handover; Speech recognition;
Test-set generation; Text classification; Text segmentation; Self-diagnosis |
General Overview of ImageCLEF at the CLEF 2015 Labs | | BIBA | Full-Text | 444-461 | |
Mauricio Villegas; Henning Müller; Andrew Gilbert; Luca Piras; Josiah Wang; Krystian Mikolajczyk; Alba G. Seco de Herrera; Stefano Bromuri; M. Ashraful Amin; Mahmood Kazi Mohammed; Burak Acar; Suzan Uskudarli; Neda B. Marvasti; José F. Aldana; María del Mar Roldán García | |||
This paper presents an overview of the ImageCLEF 2015 evaluation campaign, an event organized as part of the CLEF 2015 labs. ImageCLEF is an ongoing initiative that promotes the evaluation of technologies for annotation, indexing and retrieval, providing information access to databases of images in various usage scenarios and domains. In 2015, the 13th edition of ImageCLEF, four main tasks were proposed: 1) automatic concept annotation, localization and sentence description generation for general images; 2) identification, multi-label classification and separation of compound figures from biomedical literature; 3) clustering of x-rays from all over the body; and 4) prediction of missing radiological annotations in reports of liver CT images. The x-ray task was the only fully novel task this year, although the other three tasks introduced modifications to maintain the relevance of the proposed challenges. Participation was strong in this edition of the lab, with almost twice the number of submitted working-notes papers compared to previous years. |
LifeCLEF 2015: Multimedia Life Species Identification Challenges | | BIBA | Full-Text | 462-483 | |
Alexis Joly; Hervé Goëau; Hervé Glotin; Concetto Spampinato; Pierre Bonnet; Willem-Pier Vellinga; Robert Planqué; Andreas Rauber; Simone Palazzo; Bob Fisher; Henning Müller | |||
Multimedia identification tools are considered one of the most promising solutions to help bridge the taxonomic gap and build accurate knowledge of the identity, geographic distribution and evolution of living species. Large and structured communities of nature observers (e.g. eBird, Xeno-canto, Tela Botanica, etc.) as well as large monitoring equipment have started to produce outstanding collections of multimedia records. Unfortunately, the performance of state-of-the-art analysis techniques on such data is still not well understood and is far from meeting real-world requirements. The LifeCLEF lab proposes to evaluate these challenges through three tasks related to multimedia information retrieval and fine-grained classification problems in three living worlds. Each task is based on large, real-world data, and the measured challenges are defined in collaboration with biologists and environmental stakeholders in order to reflect realistic usage scenarios. This paper presents the 2015 edition of LifeCLEF. For each of the three tasks, we report the methodology and the datasets as well as the raw results and the main outcomes. |
Overview of the Living Labs for Information Retrieval Evaluation (LL4IR) CLEF Lab 2015 | | BIBAK | Full-Text | 484-496 | |
Anne Schuth; Krisztian Balog; Liadh Kelly | |||
In this paper we report on the first Living Labs for Information Retrieval
Evaluation (LL4IR) CLEF Lab. Our main goal with the lab is to provide a
benchmarking platform for researchers to evaluate their ranking systems in a
live setting with real users in their natural task environments. For this first
edition of the challenge we focused on two specific use-cases: product search
and web search. Ranking systems submitted by participants were experimentally
compared using interleaved comparisons to the production system from the
corresponding use-case. In this paper we describe how these experiments were
performed, what the resulting outcomes are, and conclude with some lessons
learned. Keywords: Information retrieval evaluation; Living labs; Product search; Web search |
Stream-Based Recommendations: Online and Offline Evaluation as a Service | | BIBAK | Full-Text | 497-517 | |
Benjamin Kille; Andreas Lommatzsch; Roberto Turrin; András Serény; Martha Larson; Torben Brodt; Jonas Seiler; Frank Hopfgartner | |||
Providing high-quality news recommendations is a challenging task because
the set of potentially relevant news items changes continuously, the relevance
of news highly depends on the context, and there are tight time constraints for
computing recommendations. The CLEF NewsREEL challenge is a campaign-style
evaluation lab allowing participants to evaluate and optimize news recommender
algorithms online and offline. In this paper, we discuss the objectives and
challenges of the NewsREEL lab. We motivate the metrics used for benchmarking
the recommender algorithms and explain the challenge dataset. In addition, we
introduce the evaluation framework that we have developed. The framework
enables the reproducible evaluation of recommender algorithms on stream data,
taking into account recommender precision as well as the technical complexity
of the recommender algorithms. Keywords: Recommender systems; News; Evaluation; Living lab; Stream-based recommender |
Overview of the PAN/CLEF 2015 Evaluation Lab | | BIBA | Full-Text | 518-538 | |
Efstathios Stamatatos; Martin Potthast; Francisco Rangel; Paolo Rosso; Benno Stein | |||
This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum for text mining research focusing on the identification of personal traits that authors leave behind in texts unintentionally. PAN 2015 comprises three tasks, plagiarism detection, author identification and author profiling, studying important variations of these problems. In plagiarism detection, community-driven corpus construction is introduced as a new way of developing diverse evaluation resources. In author identification, cross-topic and cross-genre author verification (where the texts of known and unknown authorship do not match in topic and/or genre) is introduced. A new corpus was built for this challenging, yet realistic, task covering four languages. In author profiling, in addition to the usual author demographics, such as gender and age, five personality traits are introduced (openness, conscientiousness, extraversion, agreeableness, and neuroticism) and a new corpus of Twitter messages covering four languages was developed. In total, 53 teams participated in the three tasks of PAN 2015 and, following the practice of previous editions, software submissions were required and evaluated within the TIRA experimentation framework. |
Overview of the CLEF Question Answering Track 2015 | | BIBA | Full-Text | 539-544 | |
Anselmo Penas; Christina Unger; Georgios Paliouras; Ioannis Kakadiaris | |||
This paper describes the CLEF QA Track 2015. Following the scenario stated last year for the CLEF QA Track, the starting point for accessing information is always a natural language question. However, answering some questions may require querying Linked Data (especially if aggregations or logical inferences are required), some questions may require textual inference and querying free text, and answering some queries may require both sources of information. In this edition, the track was divided into four tasks: (i) QALD, focused on translating natural language questions into SPARQL; (ii) Entrance Exams, focused on answering questions to assess machine reading capabilities; (iii) BioASQ1, focused on large-scale semantic indexing; and (iv) BioASQ2, for question answering in the biomedical domain. |
Overview of the CLEF 2015 Social Book Search Lab | | BIBA | Full-Text | 545-564 | |
Marijn Koolen; Toine Bogers; Maria Gäde; Mark Hall; Hugo Huurdeman; Jaap Kamps; Mette Skov; Elaine Toms; David Walsh | |||
The Social Book Search (SBS) Lab investigates book search in scenarios where users search with more than just a query, and look for more than objective metadata. Real-world information needs are generally complex, yet almost all research focuses instead on either relatively simple search based on queries or recommendation based on profiles. The goal is to research and develop techniques to support users in complex book search tasks. The SBS Lab has two tracks. The aim of the Suggestion Track is to develop test collections for evaluating ranking effectiveness of book retrieval and recommender systems. The aim of the Interactive Track is to develop user interfaces that support users through each stage during complex search tasks and to investigate how users exploit professional metadata and user-generated content. |