Proceedings of the 35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Fullname:Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Editors:William Hersh; Jamie Callan; Yoelle Maarek; Mark Sanderson
Location:Portland, Oregon
Dates:2012-Aug-12 to 2012-Aug-16
Publisher:ACM
Standard No:ISBN: 978-1-4503-1472-5; hcibib: IR12
Papers:224
Pages:1201
Summary:We are delighted to welcome you to the 35th edition of SIGIR, the ACM International Conference on Research and Development in Information Retrieval. The conference continues its tradition of being the premier forum for research and development in information retrieval, the computer science discipline behind what many call "search". The high number of submitted papers, this year again, demonstrates both the breadth and depth of the research being done in this vibrant field, both in academia and industry. We have done our best to ensure that these papers meet high standards of quality in terms of technical contribution, innovation, presentation, reference to previous work, and methodology. At the same time, we have tried to be flexible in the application of these criteria in order to consider papers describing novel and innovative work that may be somewhat unconventional.
    The conference received 483 full paper submissions this year. Examining the country code of each paper's contact author, we found that 185 (38%) came from the Americas; 158 (33%) from the Asia and Pacific region; and 140 (29%) from Europe, the Middle East and Africa. Of these, 98 (20%) were accepted, essentially the same as last year's acceptance rate and up from the 16.7% rate of the year before. There was almost no difference in the acceptance rates across the three broad regions. The top five countries in terms of accepted papers were the U.S.A. (36), China (14), the U.K. & Spain (both 7), and the Netherlands (6). In addition, 208 short papers were submitted to the poster track, of which 76 (36.5%) were accepted. In the other categories, there were 17 (47.2%) demonstrations, 4 workshops, and 16 tutorials accepted. The top five technical areas (as inferred from the primary keyword assigned by the authors) covered by the accepted papers were queries and query analysis (18%), retrieval models and ranking (14%), web IR & social media search (13%), document representation and content analysis (11%), and users and interactive IR (9%). This was a small re-ordering of the topics from last year.
    SIGIR this year again used a two-tier double-blind reviewing approach. In a first stage, at least three reviewers read every paper and provided ratings and comments. Then, in a second stage, the primary and secondary Area Chairs ensured the quality of the reviewing process by studying, validating, and summarizing these reviews, and adding their own feedback and ratings. When required, Area Chairs initiated a discussion among the reviewers to resolve any controversial issues or significant differences of opinion. Once the discussion stage was completed, the two Area Chairs made the final decisions for nearly all submitted papers. At the program committee meeting held in Haifa, Israel, the Program Chairs and the attending Area Chairs went over the reviews, verified the process, gathered additional input, and made decisions in the few cases for which assistance had been requested.
  1. Keynote address
  2. Query suggestion
  3. Multimedia 1
  4. Diversity 1
  5. Evaluation 1
  6. Structured data
  7. Recommender systems 1
  8. Users 1: personalization and user modeling
  9. Architectures 1
  10. Search log analysis
  11. User intent
  12. Efficiency
  13. Spam and abuse
  14. Users 2: exploratory search
  15. Multimedia 2
  16. Recommender systems 2
  17. Query expansion and reformulation
  18. Social media 1
  19. Query completion and correction
  20. Architectures 2
  21. Recommender systems 3
  22. Multimedia 3
  23. Entities
  24. Learning to rank
  25. Community QA
  26. Federated search
  27. Diversity 2
  28. Evaluation 2
  29. Representation
  30. Classification
  31. Doctoral submissions
  32. Demonstrations
  33. Industry talk abstracts
  34. Poster abstracts
  35. Tutorial presentations

Keynote address

Salton award lecture: information retrieval as engineering science (pp. 1-2)
  Norbert Fuhr
Retrieving information from the book of humanity: the personalized medicine data tsunami crashes on the beach of jeopardy (pp. 3-4)
  Daniel R. Masys
From a mute but eloquent alphabet of 4 characters emerges a complex biological 'literature' whose highest expression is human existence. The rapidly advancing technologies of 'nextgen sequencing' will soon make it possible to inexpensively acquire and store the characters of our complete personal genetic instruction set and make it available for health assessment and disease management. This uniquely personal form of 'big data' brings with it challenges that will be discussed in this keynote presentation. Topics will include a brief introduction to the linguistic challenges of 'biology as literature', the impact of personal molecular variation on traditional approaches to disease prevention, diagnosis and treatment, and the challenges of information retrieval when a large volume of primary observations is made that is associated with an evanescent and rapidly changing corpus of scientific interpretation of those primary observations. Experience with extracting high quality phenotypes from electronic medical records has shown that Natural Language Processing capability is an essential information extraction function for correlation of clinical events with personal genetic variation. Any powerful set of information can be used or misused, and put those who depend upon it in jeopardy. These issues, and a lesson from the long running Jeopardy TV series, will be discussed.

Query suggestion

Adaptation of the concept hierarchy model with search logs for query recommendation on intranets (pp. 5-14)
  Ibrahim Adepoju Adeyanju; Dawei Song; M-Dyaa Albakour; Udo Kruschwitz; Anne De Roeck; Maria Fasli
A concept hierarchy created from a document collection can be used for query recommendation on Intranets by ranking terms according to the strength of their links to the query within the hierarchy. A major limitation is that this model produces the same recommendations for identical queries and rebuilding it from scratch periodically can be extremely inefficient due to the high computational costs. We propose to adapt the model by incorporating query refinements from search logs. Our intuition is that the concept hierarchy built from the collection and the search logs provide complementary conceptual views on the same search domain, and their integration should continually improve the effectiveness of recommended terms. Two adaptation approaches using query logs with and without click information are compared. We evaluate the concept hierarchy models (static and adapted versions) built from the Intranet collections of two academic institutions and compare them with a state-of-the-art log-based query recommender, the Query Flow Graph, built from the same logs. Our adaptive model significantly outperforms its static version and the query flow graph when tested over a period of time on data (documents and search logs) from two institutions' Intranets.
Adaptive query suggestion for difficult queries (pp. 15-24)
  Yang Liu; Ruihua Song; Yu Chen; Jian-Yun Nie; Ji-Rong Wen
Query suggestion is a useful tool to help users formulate better queries. Although this has been found highly useful globally, its effect on different queries may vary. In this paper, we examine the impact of query suggestion on queries of different degrees of difficulty. It turns out that query suggestion is much more useful for difficult queries than easy queries. In addition, the suggestions for difficult queries should rely less on their similarity to the original query. In this paper, we use a learning-to-rank approach to select query suggestions, based on several types of features including a query performance prediction. As query suggestion has different impacts on different queries, we propose an adaptive suggestion approach that makes suggestions only for difficult queries. We carry out experiments on real data from a search engine. Our results clearly indicate that an approach targeting difficult queries can bring higher gain than a uniform suggestion approach.
Learning to suggest: a machine learning framework for ranking query suggestions (pp. 25-34)
  Umut Ozertem; Olivier Chapelle; Pinar Donmez; Emre Velipasaoglu
We consider the task of suggesting related queries to users after they issue their initial query to a web search engine. We propose a machine learning approach to learn the probability that a user may find a follow-up query both useful and relevant, given his initial query. Our approach is based on a machine learning model which enables us to generalize to queries that have never occurred in the logs as well. The model is trained on co-occurrences mined from the search logs, with novel utility and relevance models, and the machine learning step is done without any labeled data by human judges. The learning step allows us to generalize from the past observations and generate query suggestions that are beyond the past co-occurred queries. This brings significant gains in coverage while yielding modest gains in relevance. Both offline (based on human judges) and online (based on millions of user interactions) evaluations demonstrate that our approach significantly outperforms strong baselines.

Multimedia 1

Privacy-aware image classification and search (pp. 35-44)
  Sergej Zerr; Stefan Siersdorfer; Jonathon Hare; Elena Demidova
Modern content sharing environments such as Flickr or YouTube contain a large amount of private resources such as photos showing weddings, family holidays, and private parties. These resources can be of a highly sensitive nature, disclosing many details of the users' private sphere. In order to support users in making privacy decisions in the context of image sharing and to provide them with a better overview on privacy related visual content available on the Web, we propose techniques to automatically detect private images, and to enable privacy-oriented image search. To this end, we learn privacy classifiers trained on a large set of manually assessed Flickr photos, combining textual metadata of images with a variety of visual features. We employ the resulting classification models for specifically searching for private photos, and for diversifying query results to provide users with a better coverage of private and public content. Large-scale classification experiments reveal insights into the predictive performance of different visual and textual features, and a user evaluation of query result rankings demonstrates the viability of our approach.
Manhattan hashing for large-scale image retrieval (pp. 45-54)
  Weihao Kong; Wu-Jun Li; Minyi Guo
Hashing is used to learn a binary-code representation for data with the expectation of preserving the neighborhood structure in the original feature space. Due to its fast query speed and reduced storage cost, hashing has been widely used for efficient nearest neighbor search in a large variety of applications like text and image retrieval. Most existing hashing methods adopt Hamming distance to measure the similarity (neighborhood) between points in the hashcode space. However, one problem with Hamming distance is that it may destroy the neighborhood structure in the original feature space, which violates the essential goal of hashing. In this paper, Manhattan hashing (MH), which is based on Manhattan distance, is proposed to solve the problem of Hamming distance based hashing. The basic idea of MH is to encode each projected dimension with multiple bits of natural binary code (NBC), based on which the Manhattan distance between points in the hashcode space is calculated for nearest neighbor search. MH can effectively preserve the neighborhood structure in the data to achieve the goal of hashing. To the best of our knowledge, this is the first work to adopt Manhattan distance with NBC for hashing. Experiments on several large-scale image data sets containing up to one million points show that our MH method can significantly outperform other state-of-the-art methods.
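The contrast between Hamming and Manhattan distance described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the threshold values, the helper names (`quantize`, `encode`, `manhattan_distance`, `hamming_distance`) and the choice of 2 bits per dimension are assumptions made only for illustration. Each projected value is mapped to a region index, which is what a natural binary code of that region represents as an integer.

```python
from typing import List, Sequence


def quantize(value: float, thresholds: Sequence[float]) -> int:
    """Return the index of the region that `value` falls into.

    With 3 sorted thresholds there are 4 regions (indices 0..3), i.e. the
    integers that 2 bits of natural binary code can represent.
    """
    return sum(value > t for t in thresholds)


def encode(projected: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Encode every projected dimension as its region index."""
    return [quantize(v, thresholds) for v in projected]


def manhattan_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences between region indices."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of differing bits when the indices are read as binary codes."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Hypothetical thresholds for a single projected dimension (assumption).
thresholds = [-1.0, 0.0, 1.0]
near_a, near_b = encode([-0.5], thresholds), encode([0.5], thresholds)  # regions 1 and 2
far_a, far_b = encode([-1.5], thresholds), encode([1.5], thresholds)    # regions 0 and 3

# Hamming distance cannot tell the adjacent pair from the distant pair,
# while Manhattan distance on the region indices can.
print(hamming_distance(near_a, near_b), hamming_distance(far_a, far_b))
print(manhattan_distance(near_a, near_b), manhattan_distance(far_a, far_b))
```

In this toy setting the codes 01 and 10 are neighbors in the original space yet differ in both bits, which is the kind of neighborhood distortion the abstract attributes to Hamming distance.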
Boosting multi-kernel locality-sensitive hashing for scalable image retrieval (pp. 55-64)
  Hao Xia; Pengcheng Wu; Steven C. H. Hoi; Rong Jin
Similarity search is a key challenge for multimedia retrieval applications where data are usually represented in high-dimensional space. Among various algorithms proposed for similarity search in high-dimensional space, Locality-Sensitive Hashing (LSH) is the most popular one, which recently has been extended to Kernelized Locality-Sensitive Hashing (KLSH) by exploiting kernel similarity for better retrieval efficacy. Typically, KLSH works only with a single kernel, which is often limited in real-world multimedia applications, where data may originate from multiple resources or can be represented in several different forms. For example, in content-based multimedia retrieval, a variety of features can be extracted to represent contents of an image. To overcome the limitation of regular KLSH, we propose a novel Boosting Multi-Kernel Locality-Sensitive Hashing (BMKLSH) scheme that significantly boosts the retrieval performance of KLSH by making use of multiple kernels. We conduct extensive experiments for large-scale content-based image retrieval, in which encouraging results show that the proposed method outperforms the state-of-the-art techniques.

Diversity 1

Diversity by proportionality: an election-based approach to search result diversification (pp. 65-74)
  Van Dang; W. Bruce Croft
This paper presents a different perspective on diversity in search results: diversity by proportionality. We consider a result list most diverse, with respect to some set of topics related to the query, when the number of documents it provides on each topic is proportional to the topic's popularity. Consequently, we propose a framework for optimizing proportionality for search result diversification, which is motivated by the problem of assigning seats to members of competing political parties. Our technique iteratively determines, for each position in the result ranked list, the topic that best maintains the overall proportionality. It then selects the best document on this topic for this position. We demonstrate empirically that our method significantly outperforms the top performing approach in the literature not only on our proposed metric for proportionality, but also on several standard diversity measures. This result indicates that promoting proportionality naturally leads to minimal redundancy, which is a goal of the current diversity approaches.
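The iterative procedure outlined in this abstract (pick the topic that best maintains proportionality, then the best document on that topic) can be sketched as follows. The abstract does not name the seat-allocation rule, so this sketch assumes the Sainte-Laguë divisor method, a standard way of assigning parliamentary seats; the function name `diversify`, the data layout, and the rule that a document counts only toward the selected topic are illustrative assumptions rather than the paper's algorithm.

```python
from typing import Dict, List


def diversify(topic_votes: Dict[str, float],
              doc_scores: Dict[str, Dict[str, float]],
              k: int) -> List[str]:
    """Greedy, proportionality-driven re-ranking (illustrative sketch).

    topic_votes: popularity of each topic (the "votes" of each party).
    doc_scores:  per-document relevance to each topic; missing topics count as 0.
    k:           number of result positions (the "seats") to fill.
    """
    seats = {topic: 0 for topic in topic_votes}
    remaining = sorted(doc_scores)  # sorted for deterministic tie-breaking
    ranking: List[str] = []
    while remaining and len(ranking) < k:
        # Sainte-Laguë quotient: votes / (2 * seats_already_won + 1).
        topic = max(topic_votes,
                    key=lambda t: topic_votes[t] / (2 * seats[t] + 1))
        # Best remaining document for the chosen topic.
        doc = max(remaining, key=lambda d: doc_scores[d].get(topic, 0.0))
        ranking.append(doc)
        remaining.remove(doc)
        seats[topic] += 1
    return ranking


# Hypothetical query with two topics whose popularity ratio is 2:1.
votes = {"a": 2.0, "b": 1.0}
docs = {
    "d1": {"a": 0.9},
    "d2": {"a": 0.8},
    "d3": {"b": 0.7},
    "d4": {"a": 0.6},
}
print(diversify(votes, docs, k=3))  # ['d1', 'd3', 'd2']
```

With three positions, the ranking contains two documents on topic "a" and one on topic "b", matching the 2:1 popularity ratio.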
Explicit relevance models in intent-oriented information retrieval diversification (pp. 75-84)
  Saúl Vargas; Pablo Castells; David Vallet
The intent-oriented search diversification methods developed in the field so far tend to build on generative views of the retrieval system to be diversified. Core algorithm components, in particular redundancy assessment, are expressed in terms of the probability of observing documents, rather than the probability that the documents are relevant. This has been sometimes described as a view considering the selection of a single document in the underlying task model. In this paper we propose an alternative formulation of aspect-based diversification algorithms which explicitly includes a formal relevance model. We develop means for the effective computation of the new formulation, and we test the resulting algorithm empirically. We report experiments on search and recommendation tasks showing competitive or better performance than the original diversification algorithms. The relevance-based formulation has further interesting properties, such as unifying two well-known state of the art algorithms into a single version. The relevance-based approach opens alternative possibilities for further formal connections and developments as natural extensions of the framework. We illustrate this by modeling tolerance to redundancy as an explicit configurable parameter, which can be set to better suit the characteristics of the IR task, or the evaluation metrics, as we illustrate empirically.
AspecTiles: tile-based visualization of diversified web search results (pp. 85-94)
  Mayu Iwata; Tetsuya Sakai; Takehiro Yamamoto; Yu Chen; Yi Liu; Ji-Rong Wen; Shojiro Nishio
A diversified search result for an underspecified query generally contains web pages in which there are answers that are relevant to different aspects of the query. In order to help the user locate such relevant answers, we propose a simple extension to the standard Search Engine Result Page (SERP) interface, called AspecTiles. In addition to presenting a ranked list of URLs with their titles and snippets, AspecTiles visualizes the relevance degree of a document to each aspect by means of colored squares ("tiles"). To compare AspecTiles with the standard SERP interface in terms of usefulness, we conducted a user study involving 30 search tasks designed based on the TREC web diversity task topics as well as 32 participants. Our results show that AspecTiles has some advantages in terms of search performance, user behavior, and user satisfaction. First, AspecTiles enables the user to gather relevant information significantly more efficiently than the standard SERP interface for tasks where the user considers several different aspects of the query to be important at the same time (multi-aspect tasks). Second, AspecTiles affects the user's information seeking behavior: with this interface, we observed significantly fewer query reformulations, shorter queries and deeper examinations of ranked lists in multi-aspect tasks. Third, participants of our user study found AspecTiles significantly more useful for finding relevant information and easy to use than the standard SERP interface. These results suggest that simple interfaces like AspecTiles can enhance the search performance and search experience of the user when their queries are underspecified.

Evaluation 1

Time-based calibration of effectiveness measures (pp. 95-104)
  Mark D. Smucker; Charles L. A. Clarke
Many current effectiveness measures incorporate simplifying assumptions about user behavior. These assumptions prevent the measures from reflecting aspects of the search process that directly impact the quality of retrieval results as experienced by the user. In particular, these measures implicitly model users as working down a list of retrieval results, spending equal time assessing each document. In reality, even a careful user, intending to identify as much relevant material as possible, must spend longer on some documents than on others. Aspects such as document length, duplicates and summaries all influence the time required. In this paper, we introduce a time-biased gain measure, which explicitly accommodates such aspects of the search process. By conducting an appropriate user study, we calibrate and validate the measure against the TREC 2005 Robust Track test collection. We examine properties of the measure, contrasting it to traditional effectiveness measures, and exploring its extension to other aspects and environments. As its primary benefit, the measure allows us to evaluate system performance in human terms, while maintaining the simplicity and repeatability of system-oriented tests. Overall, we aim to achieve a clearer connection between user-oriented studies and system-oriented tests, allowing us to better transfer insights and outcomes from one to the other.
Time drives interaction: simulating sessions in diverse searching environments (pp. 105-114)
  Feza Baskaya; Heikki Keskustalo; Kalervo Järvelin
Real life information retrieval takes place in sessions, where users search by iterating between various cognitive, perceptual and motor subtasks through an interactive interface. The sessions may follow diverse strategies, which, together with the interface characteristics, affect user effort (cost), experience and session effectiveness. In this paper we propose a pragmatic evaluation approach based on scenarios with explicit subtask costs. We study the limits of effectiveness of diverse interactive searching strategies in two searching environments (the scenarios) under overall cost constraints. This is based on a comprehensive simulation of 20 million sessions in each scenario. We analyze the effectiveness of the session strategies over time, and the properties of the most and the least effective sessions in each case. Furthermore, we contrast the proposed evaluation approach with the traditional one, rank-based evaluation, and show how the latter may hide essential factors that affect users' performance and satisfaction, and may even give counter-intuitive results.
Evaluating aggregated search pages (pp. 115-124)
  Ke Zhou; Ronan Cummins; Mounia Lalmas; Joemon M. Jose
Aggregating search results from a variety of heterogeneous sources or verticals such as news, image and video into a single interface is a popular paradigm in web search. Although various approaches exist for selecting relevant verticals or optimising the aggregated search result page, evaluating the quality of an aggregated page is an open question.
   This paper proposes a general framework for evaluating the quality of aggregated search pages. We evaluate our approach by collecting annotated user preferences over a set of aggregated search pages for 56 topics and 12 verticals. We empirically demonstrate the fidelity of metrics instantiated from our proposed framework by showing that they strongly agree with the annotated user preferences of pairs of simulated aggregated pages.
   Furthermore, we show that our metrics agree with the majority preference more often than current diversity-based information retrieval metrics. Finally, we demonstrate the flexibility of our framework by showing that personalised historical preference data can be used to improve the performance of our proposed metrics.

Structured data

Combining inverted indices and structured search for ad-hoc object retrieval (pp. 125-134)
  Alberto Tonon; Gianluca Demartini; Philippe Cudré-Mauroux
Retrieving semi-structured entities to answer keyword queries is an increasingly important feature of many modern Web applications. The fast-growing Linked Open Data (LOD) movement makes it possible to crawl and index very large amounts of structured data describing hundreds of millions of entities. However, entity retrieval approaches have yet to find efficient and effective ways of ranking and navigating through those large data sets. In this paper, we address the problem of Ad-hoc Object Retrieval over large-scale LOD data by proposing a hybrid approach that combines IR and structured search techniques. Specifically, we propose an architecture that exploits an inverted index to answer keyword queries as well as a semi-structured database to improve the search effectiveness by automatically generating queries over the LOD graph. Experimental results show that our ranking algorithms exploiting both IR and graph indices outperform state-of-the-art entity retrieval techniques by up to 25% over the BM25 baseline.
Retrieving similar discussion forum threads: a structure based approach (pp. 135-144)
  Amit Singh; Deepak P; Dinesh Raghu
Online forums are becoming a popular way of finding useful information on the web. Search over forums for existing discussion threads so far is limited to keyword-based search due to the minimal effort required on the part of the users. However, it is often not possible to capture all the relevant context in a complex query using a small number of keywords. Example-based search that retrieves similar discussion threads given one exemplary thread is an alternate approach that can help the user provide richer context and vastly improve forum search results. In this paper, we address the problem of finding similar threads to a given thread. Towards this, we propose a novel methodology to estimate similarity between discussion threads. Our method exploits the thread structure to decompose threads into a set of weighted overlapping components. It then estimates pairwise thread similarities by quantifying how well the information in the threads is mutually contained within each other using lexical similarities between their underlying components. We compare our proposed methods on real datasets against state-of-the-art thread retrieval mechanisms wherein we illustrate that our techniques outperform others by large margins on popular retrieval evaluation measures such as NDCG, MAP, Precision@k and MRR. In particular, consistent improvements of up to 10% are observed on all evaluation measures.
Summarizing highly structured documents for effective search interaction (pp. 145-154)
  Lanbo Zhang; Yi Zhang; Yunfei Chen
As highly structured documents with rich metadata (such as products, movies, etc.) become increasingly prevalent, searching those documents has become an important IR problem. Unfortunately existing work on document summarization, especially in the context of search, has been mainly focused on unstructured documents, and little attention has been paid to highly structured documents. Due to the different characteristics of structured and unstructured documents, the ideal approaches for document summarization might be different. In this paper, we study the problem of summarizing highly structured documents in a search context. We propose a new summarization approach based on query-specific facet selection. Our approach aims to discover the important facets hidden behind a query using a machine learning approach, and summarizes retrieved documents based on those important facets. In addition, we propose to evaluate summarization approaches based on a utility function that measures how well the summaries assist users in interacting with the search results. Furthermore, we develop a game on Mechanical Turk to evaluate different summarization approaches. The experimental results show that the new summarization approach significantly outperforms two existing ones.

Recommender systems 1

TFMAP: optimizing MAP for top-n context-aware recommendation (pp. 155-164)
  Yue Shi; Alexandros Karatzoglou; Linas Baltrunas; Martha Larson; Alan Hanjalic; Nuria Oliver
In this paper, we tackle the problem of top-N context-aware recommendation for implicit feedback scenarios. We frame this challenge as a ranking problem in collaborative filtering (CF). Much of the past work on CF has not focused on evaluation metrics that lead to good top-N recommendation lists in designing recommendation models. In addition, previous work on context-aware recommendation has mainly focused on explicit feedback data, i.e., ratings. We propose TFMAP, a model that directly maximizes Mean Average Precision with the aim of creating an optimally ranked list of items for individual users under a given context. TFMAP uses tensor factorization to model implicit feedback data (e.g., purchases, clicks) with contextual information.
   The optimization of MAP in a large data collection is computationally too complex to be tractable in practice. To address this computational bottleneck, we present a fast learning algorithm that exploits several intrinsic properties of average precision to improve the learning efficiency of TFMAP, and to ensure its scalability. We experimentally verify the effectiveness of the proposed fast learning algorithm, and demonstrate that TFMAP significantly outperforms state-of-the-art recommendation approaches.
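For readers unfamiliar with the metric that TFMAP targets, the following sketch computes average precision for one ranked list and the mean over several lists. It shows only the standard evaluation measure, not the tensor factorization model or the fast learning algorithm from the paper; the function names and the toy data are assumptions for illustration.

```python
from typing import Iterable, List, Sequence, Set


def average_precision(ranked_items: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@rank taken at each rank holding a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(rankings: Iterable[Sequence[str]],
                           relevants: Iterable[Set[str]]) -> float:
    """Mean of average precision over (ranking, relevant set) pairs."""
    scores: List[float] = [average_precision(r, rel)
                           for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical recommendation list for one user in one context.
ranking = ["a", "b", "c", "d"]
relevant = {"a", "c"}
# Relevant items sit at ranks 1 and 3: (1/1 + 2/3) / 2
print(average_precision(ranking, relevant))
```

Because each term depends on the rank positions of all relevant items, the measure changes in discrete jumps as scores change, which is one reason optimizing it directly over a large collection is costly.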
Increasing temporal diversity with purchase intervals (pp. 165-174)
  Gang Zhao; Mong Li Lee; Wynne Hsu; Wei Chen
The development of Web 2.0 technology has led to huge economic benefits and challenges for both e-commerce websites and online shoppers. One core technology to increase sales and consumers' satisfaction is the use of recommender systems. Existing product recommender systems consider the order of items purchased by users to obtain a list of recommended items. However, they do not consider the time interval between the products purchased. For example, there is often an interval of 2-3 months between the purchase of printer ink cartridges or refills. Thus, recommending appropriate ink cartridges one week before the user needs to replace the depleted ink cartridges would increase the likelihood of a purchase decision. In this paper, we propose to utilize the purchase interval information to improve the performance of the recommender systems for e-commerce. We design an efficient algorithm to compute the purchase intervals between product pairs from users' purchase history and integrate this information into the marginal utility model. We evaluate our approach on a real world ecommerce dataset. Experimental results demonstrate that our approach significantly improves the conversion rate and temporal diversity compared to state-of-the-art algorithms.
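As a rough illustration of what purchase intervals between product pairs could look like, the sketch below averages the number of days between an earlier and a later purchase within each user's history. It is not the efficient algorithm from the paper, and it does not cover the marginal utility model. The function name, the data layout, and the decision to count every ordered pair of purchases (rather than only consecutive ones) are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Purchase = Tuple[str, int]  # (item, day of purchase); hypothetical layout


def mean_purchase_intervals(
        histories: List[List[Purchase]]) -> Dict[Tuple[str, str], float]:
    """Mean gap in days for each (earlier item, later item) pair."""
    gaps: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for history in histories:
        ordered = sorted(history, key=lambda purchase: purchase[1])
        for i, (earlier_item, earlier_day) in enumerate(ordered):
            for later_item, later_day in ordered[i + 1:]:
                gaps[(earlier_item, later_item)].append(later_day - earlier_day)
    return {pair: sum(days) / len(days) for pair, days in gaps.items()}


# Two hypothetical users; days are counted from an arbitrary origin.
histories = [
    [("printer", 0), ("ink", 60), ("ink", 150)],
    [("printer", 10), ("ink", 80)],
]
intervals = mean_purchase_intervals(histories)
print(intervals[("ink", "ink")])      # 90.0
print(intervals[("printer", "ink")])  # mean of 60, 150 and 70 days
```

A recommender could then use such an interval to time a suggestion, for example surfacing ink shortly before the typical gap since the last ink purchase has elapsed.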
Adaptive diversification of recommendation results via latent factor portfolio (pp. 175-184)
  Yue Shi; Xiaoxue Zhao; Jun Wang; Martha Larson; Alan Hanjalic
This paper studies result diversification in collaborative filtering. We argue that the diversification level in a recommendation list should be adapted to the target users' individual situations and needs. Different users may have different ranges of interests -- the preference of a highly focused user might include only a few topics, whereas that of a user with broad interests may encompass a wide range of topics. Thus, the recommended items should be diversified according to the interest range of the target user. Such an adaptation is also required due to the fact that the uncertainty of the estimated user preference model may vary significantly between users. To reduce the risk of the recommendation, we should take the difference of the uncertainty into account as well.
   In this paper, we study the adaptive diversification problem theoretically. We start with commonly used latent factor models and reformulate them using the mean-variance analysis from the portfolio theory in text retrieval. The resulting Latent Factor Portfolio (LFP) model captures the user's interest range and the uncertainty of the user preference by employing the variance of the learned user latent factors. It is shown that the correlations between items (and thus the item diversity) can be obtained by using the correlations between latent factors (topical diversity), which in turn significantly reduces the computational load. Our mathematical derivation also reveals that diversification is necessary, not only for risk-averse system behavior (non-adaptive), but also for the target users' individual situations (adaptive), which are represented by the distribution and the variance of the latent user factors. Our experiments confirm the theoretical insights and show that LFP succeeds in improving latent factor models by adaptively introducing recommendation diversity to fit the individual user's needs.

Users 1: personalization and user modeling

Modeling the impact of short- and long-term behavior on search personalization (pp. 185-194)
  Paul N. Bennett; Ryen W. White; Wei Chu; Susan T. Dumais; Peter Bailey; Fedor Borisyuk; Xiaoyuan Cui
User behavior provides many cues to improve the relevance of search results through personalization. One aspect of user behavior that provides especially strong signals for delivering better relevance is an individual's history of queries and clicked documents. Previous studies have explored how short-term behavior or long-term behavior can be predictive of relevance. Ours is the first study to assess how short-term (session) behavior and long-term (historic) behavior interact, and how each may be used in isolation or in combination to optimally contribute to gains in relevance through search personalization. Our key findings include: historic behavior provides substantial benefits at the start of a search session; short-term session behavior contributes the majority of gains in an extended search session; and the combination of session and historic behavior out-performs using either alone. We also characterize how the relative contribution of each model changes throughout the duration of a session. Our findings have implications for the design of search systems that leverage user behavior to personalize the search experience.
Improving searcher models using mouse cursor activity 195-204
  Jeff Huang; Ryen W. White; Georg Buscher; Kuansan Wang
Web search components such as ranking and query suggestions analyze the user data provided in query and click logs. While this data is easy to collect and provides information about user behavior, it omits user interactions with the search engine that do not hit the server; these logs omit search data such as users' cursor movements. Just as clicks provide signals for relevance in search results, cursor hovering and scrolling can be additional implicit signals. In this work, we demonstrate a technique to extend models of the user's search result examination state to infer document relevance. We start by exploring recorded user interactions with the search results, both qualitatively and quantitatively. We find that cursor hovering and scrolling are signals telling us which search results were examined, and we use these interactions to reveal latent variables in searcher models to more accurately compute document attractiveness and satisfaction. Accuracy is evaluated by computing how well our model using these parameters can predict future clicks for a particular query. We are able to improve the click predictions compared to a basic searcher model for higher ranked search results using the additional log data.
Personalization of search results using interaction behaviors in search sessions 205-214
  Chang Liu; Nicholas J. Belkin; Michael J. Cole
Personalization of search results offers the potential for significant improvement in information retrieval performance. User interactions with the system and documents during information-seeking sessions provide a wealth of information about user preferences and their task goals. In this paper, we propose methods for analyzing and modeling user search behavior in search sessions to predict document usefulness and then using this information to personalize search results. We generate prediction models of document usefulness from behavior data collected in a controlled lab experiment with 32 participants, each completing uncontrolled searching for 4 tasks on the Web. The generated models are then tested with another data set of user search sessions in radically different search tasks and constraints. The documents predicted useful and not useful by the models are used to modify the queries in each search session using a standard relevance feedback technique. The results show that application of the models led to consistently improved performance over a baseline that did not take account of user interaction information. These findings have implications for designing systems for personalized search and improving user search experience.
User evaluation of query quality 215-224
  Wan-Ching Wu; Diane Kelly; Kun Huang
Although a great deal of research has been conducted about automatic techniques for determining query quality, there have been relatively few studies about how people judge query quality. This study investigated this topic through a laboratory experiment with 40 subjects. Subjects were shown eight information problems (five fact-finding and three exploratory) and asked to evaluate queries for these problems according to several quality attributes. Subjects then evaluated search engine results pages (SERPs) for each query, which were manipulated to exhibit different levels of performance. Following this, subjects reevaluated the queries, were interviewed about their evaluation approaches, and repeated the rating procedure for two information problems. Results showed that for fact-finding information problems, longer queries received higher ratings (both initial and post-SERP), and that post-SERP query ratings were more affected by the proportion of relevant documents viewed to all documents viewed than by the ranks of the relevant documents. For exploratory information problems, subjects' ratings were highly correlated with the number of relevant documents in the SERP as well as the proportion of relevant documents viewed. Subjects adopted several approaches when evaluating query quality, which led to different quality ratings. Finally, during the reliability check subjects' initial evaluations were fairly stable, but their post-SERP evaluations significantly increased.

Architectures 1

Efficient in-memory top-k document retrieval 225-234
  J. Shane Culpepper; Matthias Petri; Falk Scholer
For over forty years the dominant data structure for ranked document retrieval has been the inverted index. Inverted indexes are effective for a variety of document retrieval tasks, and particularly efficient for large data collection scenarios that require disk access and storage. However, many efficiency-bound search tasks can now easily be supported entirely in memory as a result of recent hardware advances. In this paper we present a hybrid algorithmic framework for in-memory bag-of-words ranked document retrieval using a self-index derived from the FM-Index, wavelet tree, and the compressed suffix tree data structures, and evaluate the various algorithmic trade-offs for performing efficient queries entirely in-memory. We compare our approach with two classic approaches to bag-of-words queries using inverted indexes, term-at-a-time (TAAT) and document-at-a-time (DAAT) query processing. We show that our framework is competitive with state-of-the-art indexing structures, and describe new capabilities provided by our algorithms that can be leveraged by future systems to improve effectiveness and efficiency for a variety of fundamental search operations.
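As background for the two baselines named in this abstract, here is a minimal sketch of term-at-a-time and document-at-a-time scoring over a toy inverted index. It is not the paper's self-index framework; the index contents, the additive scoring, and the function names `taat` and `daat` are assumptions made for illustration, and `k` is assumed to be at least 1.

```python
import heapq
from collections import defaultdict

# Hypothetical toy index: term -> postings of (doc_id, term_score), sorted by doc_id.
INDEX = {
    "memory": [(1, 0.5), (2, 1.25), (4, 0.25)],
    "index": [(2, 0.75), (3, 1.0), (4, 1.5)],
}


def taat(index, query, k):
    """Term-at-a-time: fully process one postings list before the next,
    keeping a score accumulator per document."""
    acc = defaultdict(float)
    for term in query:
        for doc_id, score in index.get(term, []):
            acc[doc_id] += score
    # Highest score first; ties broken by smaller doc_id.
    return heapq.nlargest(k, acc.items(), key=lambda kv: (kv[1], -kv[0]))


def daat(index, query, k):
    """Document-at-a-time: walk all postings lists in doc_id order, finishing
    each document's score before moving on, with a size-k min-heap."""
    lists = [index[t] for t in query if t in index]
    merged = heapq.merge(*lists, key=lambda p: p[0])
    top = []  # min-heap of (score, -doc_id)

    def push(doc_id, total):
        item = (total, -doc_id)
        if len(top) < k:
            heapq.heappush(top, item)
        elif item > top[0]:
            heapq.heapreplace(top, item)

    current, total = None, 0.0
    for doc_id, score in merged:
        if doc_id != current:
            if current is not None:
                push(current, total)
            current, total = doc_id, 0.0
        total += score
    if current is not None:
        push(current, total)
    return [(-neg_doc, s) for s, neg_doc in sorted(top, reverse=True)]


print(taat(INDEX, ["memory", "index"], 2))
print(daat(INDEX, ["memory", "index"], 2))
```

Both functions return the same ranking; the difference is that TAAT needs an accumulator for every candidate document, while DAAT only ever holds the current document and the k best seen so far.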
Index maintenance for time-travel text search 235-244
  Avishek Anand; Srikanta Bedathur; Klaus Berberich; Ralf Schenkel
Time-travel text search enriches standard text search by temporal predicates, so that users of web archives can easily retrieve document versions that are considered relevant to a given keyword query and existed during a given time interval. Different index structures have been proposed to efficiently support time-travel text search. None of them, however, can easily be updated as the Web evolves and new document versions are added to the web archive.
   In this work, we describe a novel index structure that efficiently supports time-travel text search and can be maintained incrementally as new document versions are added to the web archive. Our solution uses a sharded index organization, bounds the number of spuriously read index entries per shard, and can be maintained using small in-memory buffers and append-only operations. We present experiments on two large-scale real-world datasets demonstrating that maintaining our novel index structure is an order of magnitude more efficient than periodically rebuilding one of the existing index structures, while query-processing performance is not adversely affected.
Optimizing positional index structures for versioned document collections 245-254
  Jinru He; Torsten Suel
Versioned document collections are collections that contain multiple versions of each document. Important examples are Web archives, Wikipedia and other wikis, or source code and documents maintained in revision control systems. Versioned document collections can become very large, due to the need to retain past versions, but there is also a lot of redundancy between versions that can be exploited. Thus, versioned document collections are usually stored using special differential (delta) compression techniques, and a number of researchers have recently studied how to exploit this redundancy to obtain more succinct full-text index structures.
   In this paper, we study index organization and compression techniques for such versioned full-text index structures. In particular, we focus on the case of positional index structures, while most previous work has focused on the non-positional case. Building on earlier work in [zs:redun], we propose a framework for indexing and querying in versioned document collections that integrates non-positional and positional indexes to enable fast top-k query processing. Within this framework, we define and study the problem of minimizing positional index size through optimal substring partitioning. Experiments on Wikipedia and web archive data show that our techniques achieve significant reductions in index size over previous work while supporting very fast query processing.
To index or not to index: time-space trade-offs in search engines with positional ranking functions 255-264
  Diego Arroyuelo; Senén González; Mauricio Marin; Mauricio Oyarzún; Torsten Suel
Positional ranking functions, widely used in Web search engines, improve result quality by exploiting the positions of the query terms within documents. However, it is well known that positional indexes demand large amounts of extra space, typically about three times the space of a basic nonpositional index. Textual data, on the other hand, is needed to produce text snippets. In this paper, we study time-space trade-offs for search engines with positional ranking functions and text snippet generation. We consider both index-based and non-index based alternatives for positional data. We aim to answer the question of whether one should index positional data or not. We show that there is a wide range of practical time-space trade-offs. Moreover, we show that both position and textual data can be stored using about 71% of the space used by traditional positional indexes, with a minor increase in query time. This yields considerable space savings and outperforms, both in space and time, recent alternatives from the literature. We also propose several efficient compressed text representations for snippet generation, which are able to use about half of the space of current state-of-the-art alternatives with little impact on query processing time.

Search log analysis

Studies of the onset and persistence of medical concerns in search logs 265-274
  Ryen W. White; Eric Horvitz
The Web provides a wealth of information about medical symptoms and disorders. Although this content is often valuable to consumers, studies have found that interaction with Web content may heighten anxiety and stimulate healthcare utilization. We present a longitudinal log-based study of medical search and browsing behavior on the Web. We characterize how users focus on particular medical concerns and how concerns persist and influence future behavior, including changes in focus of attention in searching and browsing for health information. We build and evaluate models that predict transitions from searches on symptoms to searches on health conditions, and escalations from symptoms to serious illnesses. We study the influence that the prior onset of concerns may have on future behavior, including sudden shifts back to searching on the concern amidst other searches. Our findings have implications for refining Web search and retrieval to support people pursuing diagnostic information.
A semi-supervised approach to modeling web search satisfaction 275-284
  Ahmed Hassan
Web search is an interactive process that involves actions from Web search users and responses from the search engine. Many research efforts have been made to address the problem of understanding search behavior in general. Some of this work focused on predicting whether a particular user has succeeded in achieving her search goal or not. Most of these studies have faced the problem of the lack of reliable labeled data to learn from. Unlike labeled data, unlabeled data recording behavioral signals in Web search is widely available in search logs. In this work, we study the plausibility of using labeled and unlabeled data to learn better models of user behavior that can be used to predict search success more effectively. We present a semi-supervised approach to modeling Web search satisfaction. The proposed approach can use either labeled data only or both labeled and unlabeled data. We show that the proposed model outperforms previous methods for modeling search success using labeled data. We also show that adding unlabeled data improves the effectiveness of the proposed models and that the proposed method outperforms other strong semi-supervised baselines.
Social annotations: utility and prediction modeling 285-294
  Patrick Pantel; Michael Gamon; Omar Alonso; Kevin Haas
Social features are increasingly integrated within the search results page of the main commercial search engines. There is, however, little understanding of the utility of social features in traditional search. In this paper, we study utility in the context of social annotations, which are markings indicating that a person in the social network of the user has liked or shared a result document. We introduce a taxonomy of social relevance aspects that influence the utility of social annotations in search, spanning query classes, the social network, and content relevance. We present the results of a user study quantifying the utility of social annotations and the interplay between social relevance aspects. Through the user study we gain insights on conditions under which social annotations are most useful to a user. Finally, we present machine learned models for predicting the utility of a social annotation using the user study judgments as an optimization criterion. We model the learning task with features drawn from web usage logs, and show empirical evidence over real-world head and tail queries that the problem is learnable and that in many cases we can predict the utility of a social annotation.
An exploration of ranking heuristics in mobile local search 295-304
  Yuanhua Lv; Dimitrios Lymberopoulos; Qiang Wu
Users increasingly rely on their mobile devices to search local entities, typically businesses, while on the go. Even though recent work has recognized that the ranking signals in mobile local search (e.g., distance and customer rating score of a business) are quite different from general Web search, they have mostly treated these signals as a black-box to extract very basic features (e.g., raw distance values and rating scores) without going inside the signals to understand how exactly they affect the relevance of a business. However, as it has been demonstrated in the development of general information retrieval models, it is critical to explore the underlying behaviors/heuristics of a ranking signal to design more effective ranking features.
   In this paper, we follow a data-driven methodology to study the behavior of these ranking signals in mobile local search using a large-scale query log. Our analysis reveals interesting heuristics that can be used to guide the exploitation of different signals. For example, users often take the mean value of a signal (e.g., rating) from the business result list as a "pivot" score, and tend to demonstrate different click behaviors on businesses with lower and higher signal values than the pivot; the clickrate of a business generally is sublinearly decreasing with its distance to the user, etc. Inspired by the understanding of these heuristics, we further propose different transformation methods to generate more effective ranking features. We quantify the improvement of the proposed new features using real mobile local search logs over a period of 14 months and show that the mean average precision can be improved by over 7%.
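The two heuristics mentioned in this abstract can be illustrated with a small sketch. The paper's actual transformation methods are not given here, so the functions below are assumed stand-ins: `pivot_feature` expresses a signal relative to the mean of the result list, and `distance_feature` uses a logarithm as one possible sublinear transform of distance.

```python
import math


def pivot_feature(values):
    """Express each signal value (e.g., rating) relative to the mean of the
    result list, which plays the role of the "pivot" score."""
    pivot = sum(values) / len(values)
    return [v - pivot for v in values]


def distance_feature(distance_km):
    """Sublinear transform of distance. The logarithm is an assumed choice,
    used only to illustrate diminishing sensitivity at larger distances."""
    return math.log1p(distance_km)


ratings = [3.0, 4.0, 5.0]
print(pivot_feature(ratings))                    # values below/above the list mean
print(distance_feature(5.0), distance_feature(10.0))
```

With features of this shape, a ranker sees whether a business is rated above or below its competitors in the same list, rather than only its raw rating.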

User intent

Mining query subtopics from search log data 305-314
  Yunhua Hu; Yanan Qian; Hang Li; Daxin Jiang; Jian Pei; Qinghua Zheng
Most queries in web search are ambiguous and multifaceted. Identifying the major senses and facets of queries from search log data, referred to as query subtopic mining in this paper, is a very important issue in web search. Through search log analysis, we show that there are two interesting phenomena of user behavior that can be leveraged to identify query subtopics, referred to as 'one subtopic per search' and 'subtopic clarification by keyword'. One subtopic per search means that if a user clicks multiple URLs in one query, then the clicked URLs tend to represent the same sense or facet. Subtopic clarification by keyword means that users often add an additional keyword or keywords to expand the query in order to clarify their search intent. Thus, the keywords tend to be indicative of the sense or facet. We propose a clustering algorithm that can effectively leverage the two phenomena to automatically mine the major subtopics of queries, where each subtopic is represented by a cluster containing a number of URLs and keywords. The mined subtopics of queries can be used in multiple tasks in web search and we evaluate them in aspects of the search result presentation such as clustering and re-ranking. We demonstrate that our clustering algorithm can effectively mine query subtopics with an F1 measure in the range of 0.896-0.956. Our experimental results show that the use of the subtopics mined by our approach can significantly improve the state-of-the-art methods used for search result clustering. Experimental results based on click data also show that the re-ranking of search results based on our method can significantly improve the efficiency of users' ability to find information.
Search, interrupted: understanding and predicting search task continuation 315-324
  Eugene Agichtein; Ryen W. White; Susan T. Dumais; Paul N. Bennett
Many important search tasks require multiple search sessions to complete. Tasks such as travel planning, large purchases, or job searches can span hours, days, or even weeks. Inevitably, life interferes, requiring the searcher either to recover the "state" of the search manually (most common), or plan for interruption in advance (unlikely). The goal of this work is to better understand, characterize, and automatically detect search tasks that will be continued in the near future. To this end, we analyze a query log from the Bing Web search engine to identify the types of intents, topics, and search behavior patterns associated with long-running tasks that are likely to be continued. Using our insights, we develop an effective prediction algorithm that significantly outperforms both the previous state-of-the-art method, and even the ability of human judges, to predict future task continuation. Potential applications of our techniques would allow a search engine to preemptively "save state" for a searcher (e.g., by caching search results), perform more targeted personalization, and otherwise better support the searcher experience for interrupted search tasks.
Multi-aspect query summarization by composite query 325-334
  Wei Song; Qing Yu; Zhiheng Xu; Ting Liu; Sheng Li; Ji-Rong Wen
Conventional search engines usually return a ranked list of web pages in response to a query. Users have to visit several pages to locate the relevant parts. A promising future search scenario should involve: (1) understanding user intents; (2) providing relevant information directly to satisfy searchers' needs, as opposed to relevant pages. In this paper, we present a search paradigm to summarize a query's information from different aspects. Query aspects could be aligned to user intents. The generated summaries for query aspects are expected to be both specific and informative, so that users can easily and quickly find relevant information. Specifically, we use a "Composite Query for Summarization" method, where a set of component queries are used for providing additional information for the original query. The system leverages the search engine to proactively gather information by submitting multiple component queries according to the original query and its aspects. In this way, we could get more relevant information for each query aspect and roughly classify information. By comparatively mining the search results of different component queries, the system is able to identify query (dependent) aspect words, which help to generate more specific and informative summaries. The experimental results on two data sets, Wikipedia and TREC ClueWeb2009, are encouraging. Our method outperforms two baseline methods on generating informative summaries.
Language intent models for inferring user browsing behavior 335-344
  Manos Tsagkias; Roi Blanco
Modeling user browsing behavior is an active research area with tangible real-world applications, e.g., organizations can adapt their online presence to their visitors' browsing behavior with positive effects in user engagement and revenue. We concentrate on online news agents, and present a semi-supervised method for predicting news articles that a user will visit after reading an initial article. Our method tackles the problem using language intent models trained on historical data which can cope with unseen articles. We evaluate our method on a large set of articles and in several experimental settings. Our results demonstrate the utility of language intent models for predicting user browsing behavior within online news sites.

Efficiency

Efficient query recommendations in the long tail via center-piece subgraphs 345-354
  Francesco Bonchi; Raffaele Perego; Fabrizio Silvestri; Hossein Vahabi; Rossano Venturini
We present a recommendation method based on the well-known concept of center-piece subgraph, that allows for the time/space efficient generation of suggestions also for rare, i.e., long-tail queries. Our method is scalable with respect to both the size of datasets from which the model is computed and the heavy workloads that current web search engines have to deal with. Basically, we relate terms contained in queries with highly correlated queries in a query-flow graph. This enables a novel recommendation generation method able to produce recommendations for approximately 99% of the workload of a real-world search engine. The method is based on a graph having term nodes, query nodes, and two kinds of connections: term-query and query-query. The first connects a term to the queries in which it is contained, the second connects two query nodes if the likelihood that a user submits the second query after having issued the first one is sufficiently high. On such a large graph we need to compute the center-piece subgraph induced by terms contained in queries. In order to reduce the cost of the above computation, we introduce a novel and efficient method based on an inverted index representation of the model. We evaluate our solution on two real-world query logs and we show that its effectiveness is comparable to (and in some cases better than) state-of-the-art methods for head-queries. More importantly, the quality of the recommendations generated remains very high also for long-tail queries, where other methods fail even to produce any suggestion. Finally, we extensively investigate scalability and efficiency issues and we show the viability of our method in real world search engines.
Supporting efficient top-k queries in type-ahead search 355-364
  Guoliang Li; Jiannan Wang; Chen Li; Jianhua Feng
Type-ahead search finds answers on the fly as a user types in a keyword query. A main challenge in this search paradigm is the high-efficiency requirement that queries must be answered within milliseconds. In this paper we study how to answer top-k queries in this paradigm, i.e., as a user types in a query letter by letter, we want to efficiently find the k best answers. Instead of inventing completely new algorithms from scratch, we study challenges when adopting existing top-k algorithms in the literature that heavily rely on two basic list-access methods: random access and sorted access. We present two algorithms to support random access efficiently. We develop novel techniques to support efficient sorted access using list pruning and materialization. We extend our techniques to support fuzzy type-ahead search which allows minor errors between query keywords and answers. We report our experimental results on several real large data sets to show that the proposed techniques can answer top-k queries efficiently in type-ahead search.
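To make the problem setting concrete, the following is a minimal sketch of exact-prefix type-ahead with top-k selection. It is a naive baseline, not the paper's random-access or sorted-access algorithms: it scans every completion of the prefix, which is exactly the cost that pruning and materialization aim to avoid. The class name `TypeAhead` and the keyword scores are hypothetical.

```python
import bisect
import heapq


class TypeAhead:
    def __init__(self, scored_keywords):
        # Sort by keyword so all completions of a prefix form one contiguous range.
        self.entries = sorted(scored_keywords.items())
        self.keys = [keyword for keyword, _ in self.entries]

    def top_k(self, prefix, k):
        """Return the k highest-scored keywords that start with `prefix`."""
        lo = bisect.bisect_left(self.keys, prefix)
        # "\uffff" sorts after any character used in the keywords here, so this
        # marks the end of the prefix range (an assumption about the data).
        hi = bisect.bisect_left(self.keys, prefix + "\uffff")
        return heapq.nlargest(k, self.entries[lo:hi], key=lambda e: e[1])


ta = TypeAhead({"search": 5, "sea": 2, "seattle": 9, "sigir": 7, "index": 1})
for typed in ["s", "se", "sig"]:
    print(typed, ta.top_k(typed, 2))
```

Each keystroke narrows the contiguous range of matching keywords, so a real system can reuse work from the previous prefix instead of searching from scratch.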
SimFusion+: extending SimFusion towards efficient estimation on large and dynamic networks 365-374
  Weiren Yu; Xuemin Lin; Wenjie Zhang; Ying Zhang; Jiajin Le
SimFusion has become a captivating measure of similarity between objects in a web graph. It is iteratively distilled from the notion that "the similarity between two objects is reinforced by the similarity of their related objects". The existing SimFusion model usually exploits the Unified Relationship Matrix (URM) to represent latent relationships among heterogeneous data, and adopts an iterative paradigm for SimFusion computation. However, due to the row normalization of URM, the traditional SimFusion model may produce the trivial solution; worse still, the iterative computation of SimFusion may not ensure the global convergence of the solution. This paper studies the revision of this model, providing a full treatment from complexity to algorithms. (1) We propose SimFusion+ based on a notion of the Unified Adjacency Matrix (UAM), a modification of the URM, to prevent the trivial solution and the divergence issue of SimFusion. (2) We show that for any vertex-pair, SimFusion+ can be performed in O(1) time and O(n) space with an O(km)-time precomputation done only once, as opposed to the O(kn³) time and O(n²) space of its traditional counterpart, where n, m, and k denote the number of vertices, edges, and iterations respectively. (3) We also devise an incremental algorithm for further improving the computation of SimFusion+ when networks are dynamically updated, with performance guarantees for similarity estimation. We experimentally verify that these algorithms scale well, and the revised notion of SimFusion is able to converge to a non-trivial solution, and allows us to identify more sensible structure information in large real-world networks.
Group matrix factorization for scalable topic modeling 375-384
  Quan Wang; Zheng Cao; Jun Xu; Hang Li
Topic modeling can reveal the latent structure of text data and is useful for knowledge discovery, search relevance ranking, document classification, and so on. One of the major challenges in topic modeling is to deal with large datasets and large numbers of topics in real-world applications. In this paper, we investigate techniques for scaling up the non-probabilistic topic modeling approaches such as RLSI and NMF. We propose a general topic modeling method, referred to as Group Matrix Factorization (GMF), to enhance the scalability and efficiency of the non-probabilistic approaches. GMF assumes that the text documents have already been categorized into multiple semantic classes, and there exist class-specific topics for each of the classes as well as shared topics across all classes. Topic modeling is then formalized as a problem of minimizing a general objective function with regularizations and/or constraints on the class-specific topics and shared topics. In this way, the learning of class-specific topics can be conducted in parallel, and thus the scalability and efficiency can be greatly improved. We apply GMF to RLSI and NMF, obtaining Group RLSI (GRLSI) and Group NMF (GNMF) respectively. Experiments on a Wikipedia dataset and a real-world web dataset, each containing about 3 million documents, show that GRLSI and GNMF can greatly improve RLSI and NMF in terms of scalability and efficiency. The topics discovered by GRLSI and GNMF are coherent and have good readability. Further experiments on a search relevance dataset, containing 30,000 labeled queries, show that the use of topics learned by GRLSI and GNMF can significantly improve search relevance.

Spam and abuse

Detecting quilted web pages at scale 385-394
  Marc Najork
Web-based advertising and electronic commerce, combined with the key role of search engines in driving visitors to ad-monetized and e-commerce web sites, have given rise to the phenomenon of web spam: web pages that are of little value to visitors, but that are created mainly to mislead search engines into driving traffic to target web sites. A large fraction of spam web pages is automatically generated, and some portion of these pages is generated by stitching together parts (sentences or paragraphs) of other web pages. This paper presents a scalable algorithm for detecting such "quilted" web pages. Previous work by the author and his collaborators introduced a sampling-based algorithm that was capable of detecting some, but by no means all, quilted web pages in a collection. By contrast, the algorithm presented in this work identifies all quilted web pages, and it is scalable to very large corpora. We tested the algorithm on the half-billion page English-language subset of the ClueWeb09 collection, and evaluated its effectiveness in detecting web spam by manually inspecting small samples of the detected quilted pages. This manual inspection guided us in iteratively refining the algorithm to be more efficient in detecting real-world spam.
Fighting against web spam: a novel propagation method based on click-through data 395-404
  Chao Wei; Yiqun Liu; Min Zhang; Shaoping Ma; Liyun Ru; Kuo Zhang
Combating Web spam is one of the greatest challenges for Web search engines. State-of-the-art anti-spam techniques focus mainly on detecting varieties of spam strategies, such as content spamming and link-based spamming. Although these anti-spam approaches have had much success, they encounter problems when fighting against a continuous barrage of new types of spamming techniques. We attempt to solve the problem from a new perspective, by noticing that queries that are more likely to lead to spam pages/sites have the following characteristics: 1) they are popular or reflect heavy demands for search engine users and 2) there are usually few key resources or authoritative results for them. From these observations, we propose a novel method that is based on click-through data analysis by propagating the spamicity score iteratively between queries and URLs from a few seed pages/sites. Once we obtain the seed pages/sites, we use the link structure of the click-through bipartite graph to discover other pages/sites that are likely to be spam. Experiments show that our algorithm is both efficient and effective in detecting Web spam. Moreover, combining our method with some popular anti-spam techniques such as TrustRank achieves improvement compared with each technique taken individually.
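The idea of propagating a score between queries and URLs over a click-through bipartite graph can be sketched as follows. This is a generic label-propagation scheme, not the authors' update rule: the click-weighted averaging, the `damping` factor, the pinning of seed URLs at 1.0, and the example data are all assumptions made for illustration.

```python
def propagate_spamicity(clicks, seeds, iterations=10, damping=0.85):
    """clicks: {(query, url): click_count}; seeds: set of known spam URLs.
    Returns a spamicity score in [0, 1] for every URL in the click graph."""
    urls = {u for _, u in clicks}
    queries = {q for q, _ in clicks}
    url_score = {u: (1.0 if u in seeds else 0.0) for u in urls}

    for _ in range(iterations):
        # Query score: click-weighted average of the scores of its clicked URLs.
        q_num = {q: 0.0 for q in queries}
        q_den = {q: 0.0 for q in queries}
        for (q, u), n in clicks.items():
            q_num[q] += n * url_score[u]
            q_den[q] += n
        q_score = {q: q_num[q] / q_den[q] for q in queries}

        # URL score: damped click-weighted average of the scores of its queries.
        u_num = {u: 0.0 for u in urls}
        u_den = {u: 0.0 for u in urls}
        for (q, u), n in clicks.items():
            u_num[u] += n * q_score[q]
            u_den[u] += n
        for u in urls:
            # Seeds stay pinned at 1.0 so they keep acting as sources.
            url_score[u] = 1.0 if u in seeds else damping * u_num[u] / u_den[u]
    return url_score


# Hypothetical click log: two URLs share a query, one URL is unrelated.
clicks = {
    ("cheap pills", "spam.example"): 8,
    ("cheap pills", "mirror.example"): 2,
    ("weather", "news.example"): 5,
}
scores = propagate_spamicity(clicks, seeds={"spam.example"})
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

A URL that shares queries with a seed picks up a nonzero score, while a URL reached only through unrelated queries stays at zero.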
Learning hash codes for efficient content reuse detection 405-414
  Qi Zhang; Yan Wu; Zhuoye Ding; Xuanjing Huang
Content reuse is extremely common in user-generated media. Reuse detection serves as the basis for many applications. However, along with the explosion of the Internet and the continuously growing use of user-generated media, the task becomes more critical and difficult. In this paper, we present a novel, efficient, and scalable approach to detect content reuse. We propose a new signature generation algorithm, which is based on learned hash functions for words. In order to deal with tens of billions of documents, we implement the detection approach on graphical processing units (GPUs). The experimental comparison in this paper involves studies of efficiency and effectiveness of the proposed approach in different types of document collections, including ClueWeb09, Tweets2011, and so on. Experimental results show that the proposed approach can achieve the same detection rates as state-of-the-art systems while using significantly less execution time (from 400X to 1500X speedup).

Users 2: exploratory search

Explanatory semantic relatedness and explicit spatialization for exploratory search BIBAFull-Text 415-424
  Brent Hecht; Samuel H. Carton; Mahmood Quaderi; Johannes Schöning; Martin Raubal; Darren Gergle; Doug Downey
Exploratory search, in which a user investigates complex concepts, is cumbersome with today's search engines. We present a new exploratory search approach that generates interactive visualizations of query concepts using thematic cartography (e.g. choropleth maps, heat maps). We show how the approach can be applied broadly across both geographic and non-geographic contexts through explicit spatialization, a novel method that leverages any figure or diagram -- from a periodic table, to a parliamentary seating chart, to a world map -- as a spatial search environment. We enable this capability by introducing explanatory semantic relatedness measures. These measures extend frequently-used semantic relatedness measures to not only estimate the degree of relatedness between two concepts, but also generate human-readable explanations for their estimates by mining Wikipedia's text, hyperlinks, and category structure. We implement our approach in a system called Atlasify, evaluate its key components, and present several use cases.
A subjunctive exploratory search interface to support media studies researchers BIBAFull-Text 425-434
  Marc Bron; Jasmijn van Gorp; Frank Nack; Maarten de Rijke; Andrei Vishneuski; Sonja de Leeuw
Media studies concerns the study of production, content, and/or reception of various types of media. Today's continuous production and storage of media is changing the way media studies researchers work and requires the development of new search models and tools.
   We investigate the research cycle of media studies researchers and find that it is an iterative process consisting of several search processes in which data is gathered and the research question is refined. Changes in the research question, however, trigger new data gathering processes.
   Based on these outcomes we propose a subjunctive exploratory search interface to support media studies researchers in refining their research question in an earlier stage of their research. To assess the subjunctive interface we conduct a user study and compare to a traditional exploratory search interface.
   We find that with the subjunctive interface users explore more diverse topics than with the standard interface and that users formulate more specific research questions. Although the subjunctive interface is more complex, this does not decrease its usability.
   These findings suggest that the subjunctive interface supports media studies researchers. The advantage of a subjunctive interface for exploration suggests a new direction for the development of exploratory search systems.
Task complexity, vertical display and user interaction in aggregated search BIBAFull-Text 435-444
  Jaime Arguello; Wan-Ching Wu; Diane Kelly; Ashlee Edwards
Aggregated search is the task of blending results from specialized search services or verticals into the Web search results. While many studies have focused on aggregated search techniques, few studies have tried to better understand how users interact with aggregated search results. This study investigates how task complexity and vertical display (the blending of vertical results into the web results) affect the use of vertical content. Twenty-nine subjects completed six search tasks of varying levels of task complexity using two aggregated search interfaces: one that blended vertical results into the web results and one that only provided indirect vertical access. Our results show that more complex tasks required significantly more interaction and that subjects completing these tasks examined more vertical results. While the amount of interaction was the same between interfaces, subjects clicked on more vertical results when these were blended into the web results. Our results also show an interaction between task complexity and vertical display; subjects clicked on more verticals when completing the more complex tasks with the interface that blended vertical results. Subjects' evaluations of the two interfaces were nearly identical, but when analyzed with respect to their interface preferences, we found a positive relationship between system evaluations and individual preferences. Subjects justified their preference using similar rationales and their comments illustrate how the display itself can influence judgments of information quality, especially in cases when the vertical results might not be relevant to the search task.

Multimedia 2

Image ranking based on user browsing behavior BIBAFull-Text 445-454
  Michele Trevisiol; Luca Chiarandini; Luca Maria Aiello; Alejandro Jaimes
Ranking of images is difficult because many factors determine their importance (e.g., popularity, quality, entertainment value, context, etc.). In social media platforms, ranking also depends on social interactions and on the visibility of the images both inside and outside those platforms. In this context, the application of standard ranking methods is not clearly understood, and neither are the subtleties associated with taking into account social interaction, internal, and external factors. In this paper, we use a large Flickr dataset and investigate these factors by performing an in-depth analysis of several ranking algorithms using both internal (i.e., within Flickr) and external (i.e., links from outside of Flickr) factors. We analyze rankings given by common metrics used in image retrieval (e.g., number of favorites), and compare them with metrics based on page views (e.g., time spent, number of views). In addition, we represent users' navigation by a graph and combine session models with some of these metrics, comparing with PageRank and BrowseRank. Our experiments show significant differences between the rankings, providing insights on the impact of social interactions, internal, and external factors in image ranking.
Modeling concept dynamics for large scale music search BIBAFull-Text 455-464
  Jialie Shen; HweeHwa Pang; Meng Wang; Shuicheng Yan
Continuing advances in data storage and communication technologies have led to an explosive growth in digital music collections. To cope with their increasing scale, we need effective Music Information Retrieval (MIR) capabilities like tagging, concept search and clustering. Integral to MIR is a framework for modelling music documents and generating discriminative signatures for them. In this paper, we introduce a multimodal, layered learning framework called DMCM. Distinguished from the existing approaches that encode music as an ensemble of order-less feature vectors, our framework extracts from each music document a variety of acoustic features, and translates them into low-level encodings over the temporal dimension. From them, DMCM elucidates the concept dynamics in the music document, representing them with a novel music signature scheme called Stochastic Music Concept Histogram (SMCH) that captures the probability distribution over all the concepts. Experiment results with two large music collections confirm the advantages of the proposed framework over existing methods on various MIR tasks.
Finding translations in scanned book collections BIBAFull-Text 465-474
  Ismet Zeki Yalniz; R. Manmatha
This paper describes an approach for identifying translations of books in large scanned book collections with OCR errors. The method is based on the idea that although individual sentences do not necessarily preserve the word order when translated, a book must preserve the linear progression of ideas for it to be a valid translation. Consider two books in two different languages, say English and German. The English book in the collection is represented by the sequence of words (in the order they appear in the text) which appear only once in the book. Similarly, the book in German is represented by its sequence of words which appear only once. An English-German dictionary is used to transform the word sequence of the English book into German by translating individual words in place. It is not necessary to translate all the words and this method works even with small dictionaries. Both sequences are now in German and can, therefore, be aligned using a Longest Common Subsequence (LCS) algorithm. We describe two scoring functions TRANS-cs and TRANS-its which account for both the LCS length and the lengths of the original word sequences. Experiments demonstrate that TRANS-its is particularly successful in finding translations of books and outperforms several baselines including metadata search based on matching titles and authors. Experiments performed on a Europarl parallel corpus for four language pairs, English-Finnish, English-French, English-German, English-Spanish, and a scanned book collection of 50K English-German books show that the proposed method retrieves translations of books with an average MAP score of 1.0 and a speed of 10K book pair comparisons per second on a single core.
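The pipeline in this abstract (keep only words that occur once, translate them in place with a dictionary, then align with LCS) can be sketched as follows. This is a hedged illustration rather than the paper's implementation: the abstract does not define TRANS-cs or TRANS-its, so the normalization in `translation_score` is a stand-in, dropping untranslatable words is an assumption, and all function names and the toy dictionary are hypothetical.

```python
from collections import Counter


def unique_word_sequence(text):
    """Words that occur exactly once in the text, in order of appearance."""
    words = text.lower().split()
    counts = Counter(words)
    return [w for w in words if counts[w] == 1]


def lcs_length(a, b):
    """Length of the longest common subsequence, using O(len(b)) memory."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def translation_score(source_text, target_text, dictionary):
    """Stand-in score: LCS length divided by the shorter sequence length."""
    source = unique_word_sequence(source_text)
    target = unique_word_sequence(target_text)
    # Assumption: words missing from the dictionary are simply skipped.
    translated = [dictionary[w] for w in source if w in dictionary]
    if not translated or not target:
        return 0.0
    return lcs_length(translated, target) / min(len(translated), len(target))


toy_dictionary = {"dog": "hund", "cat": "katze", "through": "durch", "garden": "garten"}
english = "the dog chased a cat through the garden"
german = "der hund jagte eine katze durch den garten"
shuffled = "garten katze hund"

score_aligned = translation_score(english, german, toy_dictionary)
score_shuffled = translation_score(english, shuffled, toy_dictionary)
```

The aligned pair scores higher than the shuffled one because LCS rewards words that appear in the same order, which matches the abstract's premise that a translation preserves the linear progression of ideas.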

Recommender systems 2

Predicting the ratings of multimedia items for making personalized recommendations BIBAFull-Text 475-484
  Rani Qumsiyeh; Yiu-Kai Ng
Existing multimedia recommenders suggest a specific type of multimedia items rather than items of different types personalized for a user based on his/her preference. Suppose a user is interested in a particular family movie; it would be appealing if a multimedia recommendation system could suggest other movies, music, books, and paintings closely related to the movie. We propose a comprehensive, personalized multimedia recommendation system, denoted MudRecS, which makes recommendations on movies, music, books, and paintings similar in content to other movies, music, books, and/or paintings that a MudRecS user is interested in. MudRecS does not rely on users' access patterns/histories, connection information extracted from social networking sites, collaborative filtering methods, or user personal attributes (such as gender and age) to perform the recommendation task. It simply considers the users' ratings, genres, role players (authors or artists), and reviews of different multimedia items, which are abundant and easy to find on the Web. MudRecS predicts the ratings of multimedia items that match the interests of a user to make recommendations. The performance of MudRecS has been compared with current state-of-the-art multimedia recommenders using various multimedia datasets, and the experimental results show that MudRecS significantly outperforms other systems in accurately predicting the ratings of multimedia items to be recommended.
Personalized click shaping through Lagrangian duality for online recommendation BIBAFull-Text 485-494
  Deepak Agarwal; Bee-Chung Chen; Pradheep Elango; Xuanhui Wang
Online content recommendation aims to identify trendy articles in a continuously changing dynamic content pool. Most existing work relies on online user feedback, notably clicks, as the objective and maximizes it by showing articles with the highest click-through rates. Recently, click shaping was introduced to incorporate multiple objectives in a constrained optimization framework. The work showed that significant tradeoffs among the competing objectives can be observed and thus it is important to consider multiple objectives. However, the proposed click shaping approach is segment-based and can only work with a few non-overlapping user segments. It remains a challenge to enable deep personalization in click shaping. In this paper, we tackle the challenge by proposing personalized click shaping. The main idea is to work with the Lagrangian duality formulation and explore strong convexity to connect dual and primal solutions. We show that our formulation not only allows efficient conversion from dual to primal for online personalized serving, but also enables us to solve the optimization faster by approximation. We conduct extensive experiments on a large real data set and our experimental results show that the personalized click shaping can significantly outperform the segmented one, while achieving the same ability to balance competing objectives.
What reviews are satisfactory: novel features for automatic helpfulness voting BIBAFull-Text 495-504
  Yu Hong; Jun Lu; Jianmin Yao; Qiaoming Zhu; Guodong Zhou
This paper explores the features of product reviews that satisfy users, with the aim of improving automatic helpfulness voting for reviews on commercial websites. In contrast to previous work, which relies solely on textual features to assess review helpfulness, we propose that user preferences are more explicit clues for inferring users' opinions on review helpfulness. Using user-preference based features, we first implement a binary helpfulness-based review classification system to separate helpful reviews from useless ones, and on this basis we then build a Ranking SVM based automatic helpfulness voting system (AHV) which ranks reviews based on their helpfulness. Experiments used a large scale dataset containing over 34,266 reviews on 1289 products to test the systems, which achieve promising performance with accuracy of up to 0.72 and NDCG@10 of 0.25, and at least 9% accuracy improvement compared to the textual-feature based helpfulness assessment.

Query expansion and reformulation

Automatic refinement of patent queries using concept importance predictors BIBAFull-Text 505-514
  Parvaz Mahdabi; Linda Andersson; Mostafa Keikha; Fabio Crestani
Patent prior art queries are full patent applications which are much longer than standard web search topics. Such queries are composed of hundreds of terms and do not represent a focused information need. One way to make the queries more focused is to select a group of key terms as representatives. Existing works show that such a selection to reduce patent queries is a challenging task mainly because of the presence of ambiguous terms. Given this setup, we present a query modeling approach where we utilize patent-specific characteristics to generate more precise queries. We propose to automatically disambiguate query terms by employing noun phrases that are extracted using the global analysis of the patent collection. We further introduce a method for predicting whether expansion using noun phrases would improve the retrieval effectiveness.
   Our experiments show that we can obtain almost 20% improvement by performing query expansion using the true importance of the noun phrase queries. Based on this observation, we introduce various features that can be used to estimate the importance of the noun phrase query. We evaluated the effectiveness of the proposed method on the patent prior art search collection CLEF-IP 2010. Our experimental results indicate that the proposed features make good predictors of the noun phrase importance, and selective application of noun phrase queries using the importance predictors outperforms existing query generation methods.
Automatic term mismatch diagnosis for selective query expansion BIBAFull-Text 515-524
  Le Zhao; Jamie Callan
People are seldom aware that their search queries frequently mismatch a majority of the relevant documents. This may not be a big problem for topics with a large and diverse set of relevant documents, but would largely increase the chance of search failure for less popular search needs. We aim to address the mismatch problem by developing accurate and simple queries that require minimal effort to construct. This is achieved by targeting retrieval interventions at the query terms that are likely to mismatch relevant documents. For a given topic, the proportion of relevant documents that do not contain a term measures the probability for the term to mismatch relevant documents, or the term mismatch probability. Recent research demonstrates that this probability can be estimated reliably prior to retrieval. Typically, it is used in probabilistic retrieval models to provide query dependent term weights. This paper develops a new use: Automatic diagnosis of term mismatch. A search engine can use the diagnosis to suggest manual query reformulation, guide interactive query expansion, guide automatic query expansion, or motivate other responses. The research described here uses the diagnosis to guide interactive query expansion, and create Boolean conjunctive normal form (CNF) structured queries that selectively expand 'problem' query terms while leaving the rest of the query untouched. Experiments with TREC Ad-hoc and Legal Track datasets demonstrate that with high quality manual expansion, this diagnostic approach can reduce user effort by 33%, and produce simple and effective structured queries that surpass their bag of word counterparts.
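The definition of term mismatch probability given in this abstract, and the idea of expanding only the "problem" terms into a CNF query, can be sketched as below. Note the assumptions: the paper estimates the probability before retrieval, whereas this sketch computes it directly from a known relevant set; documents are modeled as sets of terms; and the function names, the toy data, and the `AND`/`OR` query syntax are hypothetical.

```python
def term_mismatch_probability(term, relevant_docs):
    """Fraction of relevant documents (sets of terms) that lack the term."""
    if not relevant_docs:
        raise ValueError("need at least one relevant document")
    missing = sum(1 for doc in relevant_docs if term not in doc)
    return missing / len(relevant_docs)


def diagnose(query_terms, relevant_docs):
    """Order query terms from most to least likely to mismatch."""
    return sorted(
        query_terms,
        key=lambda t: term_mismatch_probability(t, relevant_docs),
        reverse=True,
    )


def build_cnf(query_terms, expansions):
    """Conjunction of clauses; only terms with expansions get OR alternatives."""
    clauses = []
    for term in query_terms:
        alternatives = [term] + list(expansions.get(term, []))
        clauses.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(clauses)


relevant_docs = [
    {"car", "rental", "price"},
    {"auto", "rental", "price"},
    {"vehicle", "rental"},
    {"car", "hire", "price"},
]
ranked_terms = diagnose(["rental", "car"], relevant_docs)
cnf_query = build_cnf(["car", "rental"], {"car": ["auto", "vehicle"]})
```

Here "car" is missing from half of the relevant documents, so it is diagnosed first and is the only term expanded, while "rental" is left untouched.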
Generating reformulation trees for complex queries BIBAFull-Text 525-534
  Xiaobing Xue; W. Bruce Croft
Search queries have evolved beyond keyword queries. Many complex queries such as verbose queries, natural language question queries and document-based queries are widely used in a variety of applications. Processing these complex queries usually requires a series of query operations, which results in multiple sequences of reformulated queries. However, previous query representations, either the "bag of words" method or the recently proposed "query distribution" method, cannot effectively model these query sequences, since they ignore the relationships between two queries. In this paper, a reformulation tree framework is proposed to organize multiple sequences of reformulated queries as a tree structure, where each path of the tree corresponds to a sequence of reformulated queries. Specifically, a two-level reformulation tree is implemented for verbose queries. This tree effectively combines two query operations, i.e., subset selection and query substitution, within the same framework. Furthermore, a weight estimation approach is proposed to assign weights to each node of the reformulation tree by considering the relationships with other nodes and directly optimizing retrieval performance. Experiments on TREC collections show that this reformulation tree based representation significantly outperforms the state-of-the-art techniques.
Proximity-based Rocchio's model for pseudo relevance BIBAFull-Text 535-544
  Jun Miao; Jimmy Xiangji Huang; Zheng Ye
Rocchio's relevance feedback model is a classic query expansion method and it has been shown to be effective in boosting information retrieval performance. The selection of expansion terms in this method, however, does not take into account the relationship between the candidate terms and the query terms (e.g., term proximity). Intuitively, the proximity between candidate expansion terms and query terms can be exploited in the process of query expansion, since terms closer to query terms are more likely to be related to the query topic.
   In this paper, we study how to incorporate proximity information into Rocchio's model, and propose a proximity-based Rocchio's model, called PRoc, with three variants. In our PRoc models, a new concept (proximity-based term frequency, ptf) is introduced to model the proximity information in the pseudo relevant documents, which is then used in three kinds of proximity measures. Experimental results on TREC collections show that our proposed PRoc models are effective and generally superior to the state-of-the-art relevance feedback models with optimal parameters. A direct comparison with positional relevance model (PRM) on the GOV2 collection also indicates our proposed model is at least competitive with the most recent progress.
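For context, the classic Rocchio update that this abstract builds on can be sketched in its pseudo-relevance form, where the top-ranked documents are treated as relevant and no negative feedback is used. This is the baseline only, not PRoc: the abstract does not define ptf, so no proximity weighting appears here. Raw term frequencies, the default `alpha` and `beta` values, and the function name are assumptions.

```python
from collections import Counter


def rocchio_prf(query_terms, feedback_docs, alpha=1.0, beta=0.75, num_expansion_terms=5):
    """Classic Rocchio with pseudo-relevance feedback (no negative term).

    query_terms:   list of query tokens.
    feedback_docs: list of token lists, assumed to be the top-ranked documents.
    Returns a dict term -> weight holding the original query terms plus the
    highest-weighted expansion terms.
    """
    query_vec = Counter(query_terms)

    # Centroid of the feedback documents, using raw term frequency as weight.
    centroid = Counter()
    for doc in feedback_docs:
        for term, tf in Counter(doc).items():
            centroid[term] += tf / len(feedback_docs)

    # new_query = alpha * query + beta * centroid
    vocabulary = set(query_vec) | set(centroid)
    weights = {t: alpha * query_vec.get(t, 0) + beta * centroid.get(t, 0) for t in vocabulary}

    # Keep every original term, then add the best-scoring new terms.
    candidates = sorted((t for t in weights if t not in query_vec), key=lambda t: (-weights[t], t))
    kept = list(query_vec) + candidates[:num_expansion_terms]
    return {t: weights[t] for t in kept}


expanded = rocchio_prf(
    ["jaguar", "speed"],
    [["jaguar", "cat", "speed", "cat"], ["jaguar", "cat", "habitat"]],
    num_expansion_terms=1,
)
```

In this baseline, "cat" is selected purely because it is frequent in the feedback documents, regardless of where it occurs relative to the query terms. That is the limitation the abstract identifies.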

Social media 1

Modeling user posting behavior on social media BIBAFull-Text 545-554
  Zhiheng Xu; Yang Zhang; Yao Wu; Qing Yang
User generated content is the basic element of social media websites. Relatively few studies have systematically analyzed the motivation to create and share content, especially from the perspective of a common user. In this paper, we perform a comprehensive analysis of user posting behavior on a popular social media website, Twitter. Specifically, we assume that user behavior is mainly influenced by three factors: breaking news, posts from social friends and user's intrinsic interest, and propose a mixture latent topic model to combine all these factors. We evaluated our model on a large-scale Twitter dataset from three different perspectives: the perplexity of held-out content, the performance of predicting retweets and the quality of generated latent topics. The results were encouraging: our model clearly outperformed its competitors.
Friend or frenemy?: predicting signed ties in social networks BIBAFull-Text 555-564
  Shuang-Hong Yang; Alexander J. Smola; Bo Long; Hongyuan Zha; Yi Chang
We study the problem of labeling the edges of a social network graph (e.g., acquaintance connections in Facebook) as either positive (i.e., trust, true friendship) or negative (i.e., distrust, possible frenemy) relations. Such signed relations provide much stronger signal in tying the behavior of online users than the unipolar Homophily effect, yet are largely unavailable as most social graphs only contain unsigned edges.
   We show the surprising fact that it is possible to infer signed social ties with good accuracy solely based on users' behavior of decision making (or using only a small fraction of supervision information) via unsupervised and semi-supervised algorithms. This work hereby makes it possible to turn an unsigned acquaintance network (e.g. Facebook, Myspace) into a signed trust-distrust network (e.g. Epinion, Slashdot). Our results are based on a mixed effects framework that simultaneously captures users' behavior, social interactions as well as the interplay between the two. The framework includes a series of latent factor models and it also encodes the principles of balance and status from Social psychology. Experiments on Epinion and Yahoo! Pulse networks illustrate that (1) signed social ties can be predicted with high-accuracy even in fully unsupervised settings, and (2) the predicted signed ties are significantly more useful for social behavior prediction than simple Homophily.
Social-network analysis using topic models BIBAFull-Text 565-574
  Youngchul Cha; Junghoo Cho
In this paper, we discuss how we can extend probabilistic topic models to analyze the relationship graph of popular social-network data, so that we can group or label the edges and nodes in the graph based on their topic similarity. In particular, we first apply the well-known Latent Dirichlet Allocation (LDA) model and its existing variants to the graph-labeling task and argue that the existing models do not handle popular nodes (nodes with many incoming edges) in the graph very well. We then propose possible extensions to this model to deal with popular nodes. Our experiments show that the proposed extensions are very effective in labeling popular nodes, showing significant improvements over the existing methods. Our proposed methods can be used for providing, for instance, more relevant friend recommendations within a social network.
Cognos: crowdsourcing search for topic experts in microblogs BIBAFull-Text 575-590
  Saptarshi Ghosh; Naveen Sharma; Fabricio Benevenuto; Niloy Ganguly; Krishna Gummadi
Finding topic experts on microblogging sites with millions of users, such as Twitter, is a hard and challenging problem. In this paper, we propose and investigate a new methodology for discovering topic experts in the popular Twitter social network. Our methodology relies on the wisdom of the Twitter crowds -- it leverages Twitter Lists, which are often carefully curated by individual users to include experts on topics that interest them and whose meta-data (List names and descriptions) provides valuable semantic cues to the experts' domain of expertise. We mined List information to build Cognos, a system for finding topic experts in Twitter. Detailed experimental evaluation based on a real-world deployment shows that: (a) Cognos infers a user's expertise more accurately and comprehensively than state-of-the-art systems that rely on the user's bio or tweet content, (b) Cognos scales well due to built-in mechanisms to efficiently update its experts' database with new users, and (c) Despite relying only on a single feature, namely crowdsourced Lists, Cognos yields results comparable to, if not better than, those given by the official Twitter experts search engine for a wide range of queries in user tests. Our study highlights Lists as a potentially valuable source of information for future content or expert search systems in Twitter.

Query completion and correction

Automatic suggestion of query-rewrite rules for enterprise search BIBAFull-Text 591-600
  Zhuowei Bao; Benny Kimelfeld; Yunyao Li
Enterprise search is challenging for several reasons, notably the dynamic terminology and jargon that are specific to the enterprise domain. This challenge is partly addressed by having domain experts maintain the enterprise search engine and adapt it to the domain specifics. Those administrators commonly address user complaints about relevant documents missing from the top matches. For that, it has been proposed to allow administrators to influence search results by crafting query-rewrite rules, each specifying how queries of a certain pattern should be modified or augmented with additional queries. Upon a complaint, the administrator seeks a semantically coherent rule that is capable of pushing the desired documents up to the top matches. However, the creation and maintenance of rewrite rules is highly tedious and time consuming. Our goal in this work is to ease the burden on search administrators by automatically suggesting rewrite rules. This automation entails several challenges. One major challenge is to select, among many options, rules that are "natural" from a semantic perspective (e.g., corresponding to closely related and syntactically complete concepts). Towards that, we study a machine-learning classification approach. The second challenge is to accommodate the cross-query effect of rules -- a rule introduced in the context of one query can eliminate the desired results for other queries and the desired effects of other rules. We present a formalization of this challenge as a generic computational problem. As we show that this problem is highly intractable in terms of complexity theory, we present heuristic approaches and optimization thereof. In an experimental study within IBM intranet search, those heuristics achieve near-optimal quality and scale well to large data sets.
Time-sensitive query auto-completion BIBAFull-Text 601-610
  Milad Shokouhi; Kira Radinsky
Query auto-completion (QAC) is a common feature in modern search engines. High quality QAC candidates enhance search experience by saving users time that otherwise would be spent on typing each character or word sequentially.
   Current QAC methods rank suggestions according to their past popularity. However, query popularity changes over time, and the ranking of candidates must be adjusted accordingly. For instance, while HAlloween might be the right suggestion after typing ha in October, HArry potter might be better any other time. Surprisingly, despite the importance of QAC as a key feature in most online search engines, its temporal dynamics have been under-studied.
   In this paper, we propose a time-sensitive approach for query auto-completion. Instead of ranking candidates according to their past popularity, we apply time-series modeling and rank candidates according to their forecasted frequencies. Our experiments on 846K queries and their daily frequencies sampled over a period of 4.5 years show that predicting the popularity of queries solely based on their past frequency can be misleading, and the forecasts obtained by time-series modeling are substantially more reliable. Our results also suggest that modeling the temporal trends of queries can significantly improve the ranking of QAC candidates.
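The contrast between ranking by past popularity and ranking by forecasted frequency can be shown with a toy sketch. The abstract does not name its time-series models, so a seasonal-naive forecast (predict the value observed one season earlier) is swapped in here purely for illustration; the function names and the monthly counts are invented for the example.

```python
def seasonal_naive_forecast(history, season_length):
    """Predict the next value as the observation one full season earlier."""
    if not history:
        return 0.0
    if len(history) < season_length:
        # Too little data for a seasonal lookup: fall back to the mean.
        return sum(history) / len(history)
    return float(history[-season_length])


def rank_completions(prefix, histories, season_length, use_forecast=True):
    """Rank queries starting with the prefix by forecast or by total past count."""
    candidates = [q for q in histories if q.startswith(prefix)]
    if use_forecast:
        def score(q):
            return seasonal_naive_forecast(histories[q], season_length)
    else:
        def score(q):
            return float(sum(histories[q]))
    return sorted(candidates, key=lambda q: (-score(q), q))


# Hypothetical monthly counts for the 12 months from last October to this
# September, so the period being forecast is October.
histories = {
    "halloween": [90, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 10],
    "harry potter": [20] * 12,
    "weather": [50] * 12,
}
by_popularity = rank_completions("ha", histories, season_length=12, use_forecast=False)
by_forecast = rank_completions("ha", histories, season_length=12)
```

With these invented counts, total past popularity puts "harry potter" first for the prefix "ha", while the seasonal forecast for October puts "halloween" first, mirroring the example in the abstract.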
A generalized hidden Markov model with discriminative training for query spelling correction BIBAFull-Text 611-620
  Yanen Li; Huizhong Duan; ChengXiang Zhai
Query spelling correction is a crucial component of modern search engines. Existing methods in the literature for search query spelling correction have two major drawbacks. First, they are unable to handle certain important types of spelling errors, such as concatenation and splitting. Second, they cannot efficiently evaluate all the candidate corrections due to the complex form of their scoring functions, and a heuristic filtering step must be applied to select a working set of top-K most promising candidates for final scoring, leading to non-optimal predictions. In this paper we address both limitations and propose a novel generalized Hidden Markov Model with discriminative training that can not only handle all the major types of spelling errors, including splitting and concatenation errors, in a single unified framework, but also efficiently evaluate all the candidate corrections to ensure the finding of a globally optimal correction. Experiments on two query spelling correction datasets demonstrate that the proposed generalized HMM is effective for correcting multiple types of spelling errors. The results also show that it significantly outperforms the current approach for generating top-K candidate corrections, making it a better first-stage filter to enable any other complex spelling correction algorithm to have access to a better working set of candidate corrections as well as to cover splitting and concatenation errors, which no existing method in academic literature can correct.

Architectures 2

Learning to predict response times for online query scheduling BIBAFull-Text 621-630
  Craig Macdonald; Nicola Tonellotto; Iadh Ounis
Dynamic pruning strategies permit efficient retrieval by not fully scoring all postings of the documents matching a query -- without degrading the retrieval effectiveness of the top-ranked results. However, the amount of pruning achievable for a query can vary, resulting in queries taking different amounts of time to execute. Knowing in advance the execution time of queries would permit the exploitation of online algorithms to schedule queries across replicated servers in order to minimise the average query waiting and completion times. In this work, we investigate the impact of dynamic pruning strategies on query response times, and propose a framework for predicting the efficiency of a query. Within this framework, we analyse the accuracy of several query efficiency predictors across 10,000 queries submitted to in-memory inverted indices of a 50-million-document Web crawl. Our results show that combining multiple efficiency predictors with regression can accurately predict the response time of a query before it is executed. Moreover, using the efficiency predictors to facilitate online scheduling algorithms can result in a 22% reduction in the mean waiting time experienced by queries before execution, and a 7% reduction in the mean completion time experienced by users.
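How predicted execution times could feed an online scheduler across replicated servers can be sketched as follows. This is an illustration under strong assumptions, not the paper's framework: all queries are taken to arrive at time zero, the predictions are treated as exact, and the policy (send each query to the replica that becomes free earliest) is a simple stand-in. The name `schedule_queries` is hypothetical.

```python
import heapq


def schedule_queries(predicted_times, num_servers):
    """Assign each query to the replica with the smallest predicted backlog.

    predicted_times: predicted execution time per query, in arrival order.
    Returns (assignments, mean_wait), where assignments is a list of
    (query_index, server_id, start_time) tuples.
    """
    if not predicted_times:
        return [], 0.0

    # Min-heap of (time at which the server becomes free, server id).
    servers = [(0.0, s) for s in range(num_servers)]
    heapq.heapify(servers)

    assignments = []
    total_wait = 0.0
    for index, duration in enumerate(predicted_times):
        free_at, server = heapq.heappop(servers)
        assignments.append((index, server, free_at))
        total_wait += free_at  # all queries are assumed to arrive at time 0
        heapq.heappush(servers, (free_at + duration, server))
    return assignments, total_wait / len(predicted_times)


assignments, mean_wait = schedule_queries([4.0, 1.0, 2.0, 3.0], num_servers=2)
```

Without a prediction of each query's running time the scheduler could not estimate when a replica becomes free, which is the motivation the abstract gives for query efficiency predictors.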
Prefetching query results and its impact on search engines BIBAFull-Text 631-640
  Simon Jonassen; B. Barla Cambazoglu; Fabrizio Silvestri
We investigate the impact of query result prefetching on the efficiency and effectiveness of web search engines. We propose offline and online strategies for selecting and ordering queries whose results are to be prefetched. The offline strategies rely on query log analysis and the queries are selected from the queries issued on the previous day. The online strategies select the queries from the result cache, relying on a machine learning model that estimates the arrival times of queries. We carefully evaluate the proposed prefetching techniques via simulation on a query log obtained from Yahoo! web search. We demonstrate that our strategies are able to improve various performance metrics, including the hit rate, query response time, result freshness, and query degradation rate, relative to a state-of-the-art baseline.
Online result cache invalidation for real-time web search BIBAFull-Text 641-650
  Xiao Bai; Flavio P. Junqueira
Caches of results are critical components of modern Web search engines, since they enable lower response time to frequent queries and reduce the load on the search engine backend. Results in long-lived cache entries may become stale, however, as search engines continuously update their index to incorporate changes to the Web. Consequently, it is important to provide mechanisms that control the degree of staleness of cached results, ideally enabling the search engine to always return fresh results. In this paper, we present a new mechanism that identifies and invalidates query results that have become stale in the cache online. The basic idea is to evaluate, at query time and against recent changes, whether the results of cache hits have changed. To enhance invalidation efficiency, the generation time of cached queries and their chronological order with respect to the latest index update are used to prune unaffected queries early. We evaluate the proposed approach using documents that change over time and query logs of the Yahoo! search engine. We show that the proposed approach ensures good query results (50% fewer stale results) and high invalidation accuracy (90% fewer unnecessary invalidations) compared to a baseline approach that makes invalidation decisions off-line. More importantly, the proposed approach induces less processing overhead, ensuring an average throughput 73% higher than that of the baseline approach.

Recommender systems 3

Learning to rank social update streams BIBAFull-Text 651-660
  Liangjie Hong; Ron Bekkerman; Joseph Adler; Brian D. Davison
As online social media becomes more deeply integrated into our lives, we spend more time consuming social update streams that come from our online connections. Although social update streams provide a tremendous opportunity for us to access information on-the-fly, we often complain about their relevance. Some of us are flooded with a steady stream of information and simply cannot process it in full. Ranking the incoming content becomes the only solution for the overwhelmed users. For others, in contrast, the incoming information stream is rather weak, and they have to actively search for relevant information, which is quite tedious. For these users, augmenting their incoming content flow with relevant information from outside their first-degree network would be a viable solution. In that case, the problem of relevance becomes even more prominent. In this paper, we start an open discussion on how to build effective systems for ranking social updates from a unique perspective of LinkedIn -- the largest professional network in the world. More specifically, we address this problem as an intersection of learning to rank, collaborative filtering, and clickthrough modeling, while leveraging ideas from information retrieval and recommender systems. We propose a novel probabilistic latent factor model with regressions on explicit features and compare it with a number of non-trivial baselines. In addition to demonstrating superior performance of our model, we shed some light on the nature of social updates on LinkedIn and how users interact with them, which might be applicable to social update streams in general.
Collaborative personalized tweet recommendation BIBAFull-Text 661-670
  Kailong Chen; Tianqi Chen; Guoqing Zheng; Ou Jin; Enpeng Yao; Yong Yu
Twitter has rapidly grown to a popular social network in recent years and provides a large number of real-time messages for users. Tweets are presented in chronological order and users scan the followees' timelines to find what they are interested in. However, an information overload problem has troubled many users, especially those with many followees and thousands of tweets arriving every day. In this paper, we focus on recommending useful tweets that users are really interested in personally to reduce the users' effort to find useful information. Many kinds of information on Twitter are available for helping recommendation, including the user's own tweet history, retweet history and social relations between users. We propose a method of making tweet recommendations based on collaborative ranking to capture personal interests. It can also conveniently integrate the other useful contextual information. Our final method considers three major elements on Twitter: tweet topic level factors, user social relation factors and explicit features such as authority of the publisher and quality of the tweet. The experiments show that all the proposed elements are important and our method greatly outperforms several baseline methods.
Exploring social influence for recommendation: a generative model approach BIBAFull-Text 671-680
  Mao Ye; Xingjie Liu; Wang-Chien Lee
Social friendship has been shown to be beneficial for item recommendation for years. However, existing approaches mostly incorporate social friendship into recommender systems by heuristics. In this paper, we argue that social influence between friends can be captured quantitatively and propose a probabilistic generative model, called social influenced selection (SIS), to model the decision making of item selection (e.g., what book to buy or where to dine). Based on SIS, we mine the social influence between linked friends and the personal preferences of users through statistical inference. To address the challenges arising from multiple layers of hidden factors in SIS, we develop a new parameter learning algorithm based on expectation maximization (EM). Moreover, we show that the mined social influence and user preferences are valuable for group recommendation and viral marketing. Finally, we conduct a comprehensive performance evaluation using real datasets crawled from last.fm and whrrl.com to validate our proposal. Experimental results show that social influence captured based on our SIS model is effective for enhancing both item recommendation and group recommendation, essential for viral marketing, and useful for various kinds of user analysis.

Multimedia 3

See-to-retrieve: efficient processing of spatio-visual keyword queries BIBAFull-Text 681-690
  Chao Zhang; Lidan Shou; Ke Chen; Gang Chen
The wide proliferation of powerful smart phones equipped with multiple sensors, 3D graphical engine, and 3G connection has nurtured the creation of a new spectrum of visual mobile applications. These applications require novel data retrieval techniques which we call What-You-Retrieve-Is-What-You-See (WYRIWYS). However, state-of-the-art spatial retrieval methods are mostly distance-based and thus inapplicable for supporting WYRIWYS. Motivated by this problem, we propose a novel query called spatio-visual keyword (SVK) query, to support retrieving spatial Web objects that are both visually conspicuous and semantically relevant to the user. To capture the visual features of spatial Web objects with extents, we introduce a novel visibility metric which computes object visibility in a cumulative manner. We propose an incremental method called Complete Occlusion-map based Retrieval (COR) to answer SVK queries. This method exploits effective heuristics to prune the search space and construct a data structure called Occlusion-Map. Then the method adopts the best-first strategy to return relevant objects incrementally. Extensive experiments on real and synthetic data sets suggest that our method is effective and efficient when processing SVK queries.
Placing images on the world map: a microblog-based enrichment approach BIBAFull-Text 691-700
  Claudia Hauff; Geert-Jan Houben
Estimating the geographic location of images is a task which has received increasing attention recently. Large numbers of images uploaded to platforms such as Flickr do not contain GPS-based latitude/longitude coordinates. Obtaining such geographic information is beneficial for a variety of applications including travelogues, visual place descriptions and personalized travel recommendations. While most works in this area only exploit an image's textual meta-data (tags, title, etc.) to estimate at what geographic location the image was taken, we consider an additional textual dimension: the image owner's traces on the social Web. Specifically, we hypothesize that information extracted from a person's microblog stream(s) can be utilized to improve the accuracy with which the geographic location of the images is estimated. In this paper, we investigate this hypothesis on the example of Twitter streams and find it to be confirmed. The median error distance in kilometres decreases by up to 67% in comparison to existing state-of-the-art. The best results are achieved when tweets that were posted up to two days before and after an image was taken are considered. Moreover, we also find another type of additional information useful: population density data.
Where is who: large-scale photo retrieval by facial attributes and canvas layout BIBAFull-Text 701-710
  Yu-Heng Lei; Yan-Ying Chen; Bor-Chun Chen; Lime Iida; Winston H. Hsu
The ubiquitous availability of digital cameras has made it easier than ever to capture moments of life, especially the ones accompanied with friends and family. It is generally believed that most family photos are with faces that are sparsely tagged. Therefore, a better solution to manage and search in the tremendously growing personal or group photos is highly anticipated. In this paper, we propose a novel way to search for face photos by simultaneously considering attributes (e.g., gender, age, and race), positions, and sizes of the target faces. To better match the content and layout of the multiple faces in mind, our system allows the user to graphically specify the face positions and sizes on a query "canvas," where each attribute combination is defined as an icon for easier representation. As a secondary feature, the user can even place specific faces from the previous search results for appearance-based retrieval. The scenario has been realized on a tablet device with an intuitive touch interface. Experimenting with a large-scale Flickr dataset of more than 200k faces, the proposed formulation and joint ranking have made us achieve a hit rate of 0.420 at rank 100, significantly improving from 0.036 of the prior search scheme using attributes alone. We have also achieved an average running time of 0.0558 second by the proposed block-based indexing approach.

Entities

Mining the web for points of interest BIBAFull-Text 711-720
  Adam Rae; Vanessa Murdock; Adrian Popescu; Hugues Bouchard
A point of interest (POI) is a focused geographic entity such as a landmark, a school, an historical building, or a business. Points of interest are the basis for most of the data supporting location-based applications. In this paper we propose to curate POIs from online sources by bootstrapping training data from Web snippets, seeded by POIs gathered from social media. This large corpus is used to train a sequential tagger to recognize mentions of POIs in text. Using Wikipedia data as the training data, we can identify POIs in free text with an accuracy that is 116% better than the state of the art POI identifier in terms of precision, and 50% better in terms of recall. We show that using Foursquare and Gowalla checkins as seeds to bootstrap training data from Web snippets, we can improve precision between 16% and 52%, and recall between 48% and 187% over the state-of-the-art. The name of a POI is not sufficient, as the POI must also be associated with a set of geographic coordinates. Our method increases the number of POIs that can be localized nearly three-fold, from 134 to 395 in a sample of 400, with a median localization accuracy of less than one kilometer.
TwiNER: named entity recognition in targeted Twitter stream BIBAFull-Text 721-730
  Chenliang Li; Jianshu Weng; Qi He; Yuxia Yao; Anwitaman Datta; Aixin Sun; Bu-Sung Lee
Many private and/or public organizations have been reported to create and monitor targeted Twitter streams to collect and understand users' opinions about the organizations. A targeted Twitter stream is usually constructed by filtering tweets with user-defined selection criteria, e.g., tweets published by users from a selected region, or tweets that match one or more predefined keywords. The targeted Twitter stream is then monitored to collect and understand users' opinions about the organizations. There is an emerging need for early crisis detection and response with such targeted streams. Such applications require a good named entity recognition (NER) system for Twitter, which is able to automatically discover emerging named entities that are potentially linked to the crisis. In this paper, we present a novel 2-step unsupervised NER system for targeted Twitter streams, called TwiNER. In the first step, it leverages the global context obtained from Wikipedia and the Web N-Gram corpus to partition tweets into valid segments (phrases) using a dynamic programming algorithm. Each such tweet segment is a candidate named entity. It is observed that the named entities in the targeted stream usually exhibit a gregarious property, due to the way the targeted stream is constructed. In the second step, TwiNER constructs a random walk model to exploit the gregarious property in the local context derived from the Twitter stream. The highly-ranked segments have a higher chance of being true named entities. We evaluated TwiNER on two sets of real-life tweets simulating two targeted streams. Evaluated using labeled ground truth, TwiNER achieves performance comparable to conventional approaches in both streams. Various settings of TwiNER have also been examined to verify our idea of combining global and local context.
Adaptive context features for toponym resolution in streaming news BIBAFull-Text 731-740
  Michael D. Lieberman; Hanan Samet
News sources around the world generate constant streams of information, but effective streaming news retrieval requires an intimate understanding of the geographic content of news. This process of understanding, known as geotagging, consists of first finding words in article text that correspond to location names (toponyms), and second, assigning each toponym its correct lat/long values. The latter step, called toponym resolution, can also be considered a classification problem, where each of the possible interpretations for each toponym is classified as correct or incorrect. Hence, techniques from supervised machine learning can be applied to improve accuracy. New classification features to improve toponym resolution, termed adaptive context features, are introduced that consider a window of context around each toponym, and use geographic attributes of toponyms in the window to aid in their correct resolution. Adaptive parameters controlling the window's breadth and depth afford flexibility in managing a tradeoff between feature computation speed and resolution accuracy, allowing the features to potentially apply to a variety of textual domains. Extensive experiments with three large datasets of streaming news demonstrate the new features' effectiveness over two widely-used competing methods.

Learning to rank

Structural relationships for large-scale learning of answer re-ranking BIBAFull-Text 741-750
  Aliaksei Severyn; Alessandro Moschitti
Supervised learning applied to answer re-ranking can highly improve on the overall accuracy of question answering (QA) systems. The key aspect is that the relationships and properties of the question/answer pair composed of a question and the supporting passage of an answer candidate, can be efficiently compared with those captured by the learnt model.
   In this paper, we define novel supervised approaches that exploit structural relationships between a question and its candidate answer passages to learn a re-ranking model. We model structural representations of both questions and answers and their mutual relationships by just using an off-the-shelf shallow syntactic parser. We encode structures in Support Vector Machines (SVMs) by means of sequence and tree kernels, which can implicitly represent question and answer pairs in huge feature spaces. Such models, together with the latest approach to fast kernel-based learning, enabled the training of our rerankers on hundreds of thousands of instances, which was previously intractable for kernelized SVMs. The results on two different QA datasets, e.g., Answerbag and Jeopardy! data, show that our models deliver large improvement on passage re-ranking tasks, reducing the error in Recall of BM25 baseline by about 18%. One of the key findings of this work is that, despite their simplicity, shallow syntactic trees allow for learning complex relational structures, which exhibits a steep learning curve with the increase in the training size.
Top-k learning to rank: labeling, ranking and evaluation BIBAFull-Text 751-760
  Shuzi Niu; Jiafeng Guo; Yanyan Lan; Xueqi Cheng
In this paper, we propose a novel top-k learning to rank framework, which involves labeling strategy, ranking model and evaluation measure. The motivation comes from the difficulty in obtaining reliable relevance judgments from human assessors when applying learning to rank in real search systems. The traditional absolute relevance judgment method is difficult in both gradation specification and human assessing, resulting in a high level of disagreement on judgments. The pairwise preference judgment, as a good alternative, is often criticized for increasing the complexity of judgment from O(n) to O(n log n). Considering the fact that users mainly care about top ranked search results, we propose a novel top-k labeling strategy which adopts the pairwise preference judgment to generate the top k ordering items from n documents (i.e. top-k ground-truth) in a manner similar to that of HeapSort. As a result, the complexity of judgment is reduced to O(n log k). With the top-k ground-truth, traditional ranking models (e.g. pairwise or listwise models) and evaluation measures (e.g. NDCG) no longer fit the data set. Therefore, we introduce a new ranking model, namely FocusedRank, which fully captures the characteristics of the top-k ground-truth. We also extend the widely used evaluation measures NDCG and ERR to be applicable to the top-k ground-truth, referred to as K-NDCG and K-ERR, respectively. Finally, we conduct extensive experiments on benchmark data collections to demonstrate the efficiency and effectiveness of our top-k labeling strategy and ranking models.
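The O(n log k) judgment cost mentioned in the abstract above can be illustrated with a bounded min-heap, where every heap comparison stands for one pairwise preference judgment. This is only a sketch of a HeapSort-like selection, not the paper's exact labeling procedure. The names `Judged` and `top_k_by_preference`, the `prefer` callback and the hidden scores used to simulate an assessor are all assumptions made for illustration.

```python
import heapq


class Judged:
    """Wraps a document so that each heap comparison is one pairwise judgment."""

    def __init__(self, doc, prefer):
        self.doc = doc
        self.prefer = prefer

    def __lt__(self, other):
        # "Less than" means "less preferred", so the heap root is always
        # the weakest document among the current top k.
        return self.prefer(other.doc, self.doc)


def top_k_by_preference(docs, k, prefer):
    """Return the k most preferred docs, best first, using only pairwise judgments."""
    heap = [Judged(d, prefer) for d in docs[:k]]
    heapq.heapify(heap)
    for d in docs[k:]:
        candidate = Judged(d, prefer)
        if heap[0] < candidate:               # candidate beats the weakest kept doc
            heapq.heapreplace(heap, candidate)
    ordered = [heapq.heappop(heap).doc for _ in range(len(heap))]
    ordered.reverse()                         # popped weakest first, so reverse
    return ordered


# Simulated assessor: hidden scores stand in for human preference (hypothetical data).
hidden = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.7, "e": 0.1, "f": 0.8}
judgment_count = 0


def prefer(x, y):
    """True if the assessor prefers document x over document y."""
    global judgment_count
    judgment_count += 1
    return hidden[x] > hidden[y]


top3 = top_k_by_preference(list(hidden), 3, prefer)
print(top3, judgment_count)
```

Because the heap never holds more than k documents, each of the n documents triggers at most O(log k) judgments, which is where the O(n log k) bound comes from.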
Robust ranking models via risk-sensitive optimization BIBAFull-Text 761-770
  Lidan Wang; Paul N. Bennett; Kevyn Collins-Thompson
Many techniques for improving search result quality have been proposed. Typically, these techniques increase average effectiveness by devising advanced ranking features and/or by developing sophisticated learning to rank algorithms. However, while these approaches typically improve average performance of search results relative to simple baselines, they often ignore the important issue of robustness. That is, although achieving an average gain overall, the new models often hurt performance on many queries. This limits their application in real-world retrieval scenarios. Given that robustness is an important measure that can negatively impact user satisfaction, we present a unified framework for jointly optimizing effectiveness and robustness. We propose an objective that captures the tradeoff between these two competing measures and demonstrate how we can jointly optimize for these two measures in a principled learning framework. Experiments indicate that ranking models learned this way significantly decreased the worst ranking failures while maintaining strong average effectiveness on par with current state-of-the-art models.

Community QA

Dual role model for question recommendation in community question answering BIBAFull-Text 771-780
  Fei Xu; Zongcheng Ji; Bin Wang
Question recommendation that automatically recommends a new question to suitable users to answer is an appealing and challenging problem in the research area of Community Question Answering (CQA). Unlike in general recommender systems where a user has only a single role, each user in CQA can play two different roles (dual roles) simultaneously: as an asker and as an answerer. To the best of our knowledge, this paper is the first to systematically investigate the distinctions between the two roles and their different influences on the performance of question recommendation in CQA. Moreover, we propose a Dual Role Model (DRM) to model the dual roles of users effectively. With different independence assumptions, two variants of DRM are achieved. Finally, we present the DRM based approach to question recommendation which provides a mechanism for naturally integrating the user relation between the answerer and the asker with the content relevance between the answerer and the question into a unified probabilistic framework. Experiments using real-world data crawled from Yahoo! Answers show that: (1) there are evident distinctions between the two roles of users in CQA. Additionally, the answerer role is more effective than the asker role for modeling candidate users in question recommendation; (2) compared with baselines utilizing a single role or blended roles based methods, our DRM based approach consistently and significantly improves the performance of question recommendation, demonstrating that our approach can model the user in CQA more reasonably and precisely.
Vote calibration in community question-answering systems BIBAFull-Text 781-790
  Bee-Chung Chen; Anirban Dasgupta; Xuanhui Wang; Jie Yang
User votes are important signals in community question-answering (CQA) systems. Many features of typical CQA systems, e.g. the best answer to a question, status of a user, are dependent on ratings or votes cast by the community. In a popular CQA site, Yahoo! Answers, users vote for the best answers to their questions and can also thumb up or down each individual answer. Prior work has shown that these votes provide useful predictors for content quality and user expertise, where each vote is usually assumed to carry the same weight as others. In this paper, we analyze a set of possible factors that indicate bias in user voting behavior -- these factors encompass different gaming behavior, as well as other eccentricities, e.g., votes to show appreciation of answerers. These observations suggest that votes need to be calibrated before being used to identify good answers or experts. To address this problem, we propose a general machine learning framework to calibrate such votes. Through extensive experiments based on an editorially judged CQA dataset, we show that our supervised learning method of content-agnostic vote calibration can significantly improve the performance of answer ranking and expert ranking.
Category hierarchy maintenance: a data-driven approach BIBAFull-Text 791-800
  Quan Yuan; Gao Cong; Aixin Sun; Chin-Yew Lin; Nadia Magnenat Thalmann
Category hierarchies often evolve at a much slower pace than the documents that reside in them. With newly available documents continually being added to a hierarchy, new topics emerge and documents within the same category become less topically cohesive. In this paper, we propose a novel automatic approach to modifying a given category hierarchy by redistributing its documents into more topically cohesive categories. The modification is achieved with three operations (namely, sprout, merge, and assign) with reference to an auxiliary hierarchy for additional semantic information; the auxiliary hierarchy covers a similar set of topics as the hierarchy to be modified. Our user study shows that the modified category hierarchy is semantically meaningful. As an extrinsic evaluation, we conduct experiments on document classification using real data from Yahoo! Answers and AnswerBag hierarchies, and compare the classification accuracies obtained on the original and the modified hierarchies. Our experiments show that the proposed method achieves much larger classification accuracy improvement compared with several baseline methods for hierarchy modification.
When web search fails, searchers become askers: understanding the transition BIBAFull-Text 801-810
  Qiaoling Liu; Eugene Agichtein; Gideon Dror; Yoelle Maarek; Idan Szpektor
While Web search has become increasingly effective over the last decade, for many users' needs the required answers may be spread across many documents, or may not exist on the Web at all. Yet, many of these needs could be addressed by asking people via popular Community Question Answering (CQA) services, such as Baidu Knows, Quora, or Yahoo! Answers. In this paper, we perform the first large-scale analysis of how searchers become askers. For this, we study the logs of a major web search engine to trace the transformation of a large number of failed searches into questions posted on a popular CQA site. Specifically, we analyze the characteristics of the queries, and of the patterns of search behavior that precede posting a question; the relationship between the content of the attempted queries and of the posted questions; and the subsequent actions the user performs on the CQA site. Our work develops novel insights into searcher intent and behavior that lead to asking questions to the community, providing a foundation for more effective integration of automated web search and social information seeking.

Federated search

Content-based retrieval for heterogeneous domains: domain adaptation by relative aggregation points BIBAFull-Text 811-820
  Makoto P. Kato; Hiroaki Ohshima; Katsumi Tanaka
We introduce the problem of domain adaptation for content-based retrieval and propose a domain adaptation method based on relative aggregation points (RAPs). Content-based retrieval including image retrieval and spoken document retrieval enables a user to input examples as a query, and retrieves relevant data based on the similarity to the examples. However, input examples and relevant data can be dissimilar, especially when domains from which the user selects examples and from which the system retrieves data are different. In content-based geographic object retrieval, for example, suppose that a user who lives in Beijing visits Kyoto, Japan, and wants to search for relatively inexpensive restaurants serving popular local dishes by means of a content-based retrieval system. Since such restaurants in Beijing and Kyoto are dissimilar due to the difference in the average cost and areas' popular dishes, it is difficult to find relevant restaurants in Kyoto based on examples selected in Beijing. We propose a solution for this problem by assuming that RAPs in different domains correspond, which may be dissimilar but play the same role. A RAP is defined as the expectation of instances in a domain that are classified into a certain class, e.g. the most expensive restaurant, average restaurant, and restaurant serving the most popular dishes. Our proposed method constructs a new feature space based on RAPs estimated in each domain and bridges the domain difference for improving content-based retrieval in heterogeneous domains. To verify the effectiveness of our proposed method, we evaluated various methods with a test collection developed for content-based geographic object retrieval. Experimental results show that our proposed method achieved significant improvements over baseline methods. Moreover, we observed that the search performance of content-based retrieval in heterogeneous domains was significantly lower than that in homogeneous domains. This finding suggests that relevant data for the same search intent depend on the search context, that is, the location where the user searches and the domain from which the system retrieves data.
Mixture model with multiple centralized retrieval algorithms for result merging in federated search BIBAFull-Text 821-830
  Dzung Hong; Luo Si
Result merging is an important research problem in federated search for merging documents retrieved from multiple ranked lists of selected information sources into a single list. The state-of-the-art result merging algorithms such as Semi-Supervised Learning (SSL) and Sample-Agglomerate Fitting Estimate (SAFE) try to map document scores retrieved from different sources to comparable scores according to a single centralized retrieval algorithm for ranking those documents. Both SSL and SAFE arbitrarily select a single centralized retrieval algorithm for generating comparable document scores, which is problematic in a heterogeneous federated search environment, since a single centralized algorithm is often suboptimal for different information sources. Based on this observation, this paper proposes a novel approach for result merging by utilizing multiple centralized retrieval algorithms. One simple approach is to learn a set of combination weights for multiple centralized retrieval algorithms (e.g., logistic regression) to compute comparable document scores. The paper shows that this simple approach generates suboptimal results as it is not flexible enough to deal with heterogeneous information sources. A mixture probabilistic model is thus proposed to learn more appropriate combination weights with respect to different types of information sources with some training data. An extensive set of experiments on three datasets have proven the effectiveness of the proposed new approach.
Reactive index replication for distributed search engines BIBAFull-Text 831-840
  Flavio P. Junqueira; Vincent Leroy; Matthieu Morel
Distributed search engines comprise multiple sites deployed across geographically distant regions, each site being specialized to serve the queries of local users. When a search site cannot accurately compute the results of a query, it must forward the query to other sites. This paper considers the problem of selecting the documents indexed by each site focusing on replication to increase the fraction of queries processed locally. We propose RIP, an algorithm for replicating documents and posting lists that is practical and has two important features. RIP evaluates user interests in an online fashion and uses only local data of a site. Being an online approach simplifies the operational complexity, while locality enables higher performance when processing queries and documents. The decision procedure, on top of being online and local, incorporates document popularity and user queries, which is critical when assuming a replication budget for each site. Having a replication budget reflects the hardware constraints of any given site. We evaluate RIP against the approach of replicating popular documents statically, and show that we achieve significant gains, while having the additional benefit of supporting incremental indexes.

Diversity 2

Personalized diversification of search results BIBAFull-Text 841-850
  David Vallet; Pablo Castells
Search personalization and diversification are often seen as opposing alternatives to cope with query uncertainty, where, given an ambiguous query, it is either preferable to adapt the search result to a specific aspect that may interest the user (personalization) or to regard multiple aspects in order to maximize the probability that some query aspect is relevant to the user (diversification). In this work, we question this antagonistic view, and hypothesize that these two directions may in fact be effectively combined and enhance each other. We research the introduction of the user as an explicit random variable in state-of-the-art diversification methods, thus developing a generalized framework for personalized diversification. In order to evaluate our hypothesis, we conduct an evaluation with real users using crowdsourcing services. The obtained results suggest that the combination of personalization and diversification achieves competitive performance, improving on the baseline, plain personalization, and plain diversification approaches in terms of both diversity and accuracy measures.
Combining implicit and explicit topic representations for result diversification 851-860
  Jiyin He; Vera Hollink; Arjen de Vries
Result diversification deals with ambiguous or multi-faceted queries by providing documents that cover as many subtopics of a query as possible. Various approaches to subtopic modeling have been proposed. Subtopics have been extracted internally, e.g., from retrieved documents, and externally, e.g., from Web resources such as query logs. Internally modeled subtopics are often implicitly represented, e.g., as latent topics, while externally modeled subtopics are often explicitly represented, e.g., as reformulated queries.
   We propose a framework that: i) combines both implicitly and explicitly represented subtopics; and ii) allows flexible combination of multiple external resources in a transparent and unified manner. Specifically, we use a random walk based approach to estimate the similarities of the explicit subtopics mined from a number of heterogeneous resources: click logs, anchor text, and web n-grams. We then use these similarities to regularize the latent topics extracted from the top-ranked documents, i.e., the internal (implicit) subtopics. Empirical results show that regularization with explicit subtopics extracted from the right resource leads to improved diversification results, indicating that the proposed regularization with (explicit) external resources forms better (implicit) topic models. Click logs and anchor text are shown to be more effective resources than web n-grams under current experimental settings. Combining resources does not always lead to better results, but achieves a robust performance. This robustness is important for two reasons: it cannot be predicted which resources will be most effective for a given query, and it is not yet known how to reliably determine the optimal model parameters for building implicit topic models.
Using preference judgments for novel document retrieval 861-870
  Praveen Chandar; Ben Carterette
There has been considerable interest in incorporating diversity in search results to account for redundancy and the space of possible user needs. Most work on this problem is based on subtopics: diversity rankers score documents against a set of hypothesized subtopics, and diversity rankings are evaluated by assigning a value to each ranked document based on the number of novel (and redundant) subtopics it is relevant to. This can be seen as modeling a user who is always interested in seeing more novel subtopics, with progressively decreasing interest in seeing the same subtopic multiple times. We put this model to the test: if it is correct, then users, when given a choice, should prefer to see a document that has more value to the evaluation. We formulate some specific hypotheses from this model and test them with actual users in a novel preference-based design in which users express a preference for document A or document B given document C. We argue that while the user study shows the subtopic model is good, there are many other factors apart from novelty and redundancy that may be influencing user preferences. From this, we introduce a new framework to construct an ideal diversity ranking using only preference judgments, with no explicit subtopic judgments whatsoever.
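The subtopic-based user model described in this abstract (full value for a novel subtopic, progressively less for repeats) can be illustrated with a minimal sketch. It assumes a geometric discount of the kind used in alpha-nDCG-style measures; the function name `novelty_gains`, the parameter `alpha`, and the toy ranking are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter

def novelty_gains(ranking, alpha=0.5):
    """Return the gain of each ranked document under a subtopic model.

    ranking: list of sets, each holding the subtopics a document covers.
    alpha:   assumed discount; every repeat of a subtopic is worth
             (1 - alpha) times its previous occurrence.
    """
    seen = Counter()  # how often each subtopic has appeared so far
    gains = []
    for subtopics in ranking:
        gain = sum((1 - alpha) ** seen[s] for s in subtopics)
        gains.append(gain)
        seen.update(subtopics)
    return gains

# Toy ranking: four documents covering subtopics "a", "b", "c".
ranking = [{"a", "b"}, {"a"}, {"c"}, {"a", "c"}]
gains = novelty_gains(ranking)
```

Under this assumed model, a user offered the second or third document after the first should prefer the third, since it covers a subtopic not yet seen; this is the kind of prediction the preference-based study tests.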

Evaluation 2

Quality through flow and immersion: gamifying crowdsourced relevance assessments 871-880
  Carsten Eickhoff; Christopher G. Harris; Arjen P. de Vries; Padmini Srinivasan
Crowdsourcing is a market of steadily-growing importance upon which both academia and industry increasingly rely. However, this market appears to be inherently infested with a significant share of malicious workers who try to maximise their profits through cheating or sloppiness. This serves to undermine the very merits crowdsourcing has come to represent. Based on previous experience as well as psychological insights, we propose the use of a game in order to attract and retain a larger share of reliable workers to frequently-requested crowdsourcing tasks such as relevance assessments and clustering. In a large-scale comparative study conducted using recent TREC data, we investigate the performance of traditional HIT designs and a game-based alternative that is able to achieve high quality at significantly lower pay rates, facing fewer malicious submissions.
An IR-based evaluation framework for web search query segmentation 881-890
  Rishiraj Saha Roy; Niloy Ganguly; Monojit Choudhury; Srivatsan Laxman
This paper presents the first evaluation framework for Web search query segmentation based directly on IR performance. In the past, segmentation strategies were mainly validated against manual annotations. Our work shows that the goodness of a segmentation algorithm as judged through evaluation against a handful of human annotated segmentations hardly reflects its effectiveness in an IR-based setup. In fact, state-of-the-art algorithms are shown to perform as well as, and sometimes even better than, human annotations, a fact masked by previous validations. The proposed framework also gives us an objective understanding of the gap between the present best and the best possible segmentation algorithm. We draw these conclusions based on an extensive evaluation of six segmentation strategies, including the three most recent algorithms, vis-a-vis segmentations from three human annotators. The evaluation framework also gives insights about which segments must necessarily be detected by an algorithm to achieve the best retrieval results. The meticulously constructed dataset used in our experiments has been made public for use by the research community.
On per-topic variance in IR evaluation 891-900
  Stephen E. Robertson; Evangelos Kanoulas
We explore the notion, put forward by Cormack & Lynam and Robertson, that we should consider a document collection used for Cranfield-style experiments as a sample from some larger population of documents. In this view, any per-topic metric (such as average precision) should be regarded as an estimate of that metric's true value for that topic in the full population, and therefore as carrying its own per-topic variance or estimate precision or noise. As in the two mentioned papers, we explore this notion by simulating other samples from the same large population. We investigate different ways of performing this simulation. One use of this analysis is to refine the notion of statistical significance of a difference between two systems (in most such analyses, each per-topic measurement is treated as equally precise). We propose a mixed-effects model method to measure significance, and compare it experimentally with the traditional t-test.
An uncertainty-aware query selection model for evaluation of IR systems 901-910
  Mehdi Hosseini; Ingemar J. Cox; Natasa Milic-Frayling; Milad Shokouhi; Emine Yilmaz
We propose a mathematical framework for query selection as a mechanism for reducing the cost of constructing information retrieval test collections. In particular, our mathematical formulation explicitly models the uncertainty in the retrieval effectiveness metrics that is introduced by the absence of relevance judgments. Since the optimization problem is computationally intractable, we devise an adaptive query selection algorithm, referred to as Adaptive, that provides an approximate solution. Adaptive selects queries iteratively and assumes that no relevance judgments are available for the query under consideration. Once a query is selected, the associated relevance assessments are acquired and then used to aid the selection of subsequent queries. We demonstrate the effectiveness of the algorithm on two TREC test collections as well as a test collection of an online search engine with 1000 queries. Our experimental results show that the queries chosen by Adaptive produce a reliable performance ranking of systems. The ranking is better correlated with the actual systems ranking than the rankings produced by queries that were selected using the considered baseline methods.

Representation

Improving retrieval of short texts through document expansion 911-920
  Miles Efron; Peter Organisciak; Katrina Fenlon
Collections containing a large number of short documents are becoming increasingly common. As these collections grow in number and size, providing effective retrieval of brief texts presents a significant research problem. We propose a novel approach to improving information retrieval (IR) for short texts based on aggressive document expansion. Starting from the hypothesis that short documents tend to be about a single topic, we submit documents as pseudo-queries and analyze the results to learn about the documents themselves. Document expansion helps in this context because short documents yield little in the way of term frequency information. However, as we show, the proposed technique helps us model not only lexical properties, but also temporal properties of documents. We present experimental results using a corpus of microblog (Twitter) data and a corpus of metadata records from a federated digital library. With respect to established baselines, results of these experiments show that applying our proposed document expansion method yields significant improvements in effectiveness. Specifically, our method improves the lexical representation of documents and the ability to let time influence retrieval.
Extending BM25 with multiple query operators 921-930
  Roi Blanco; Paolo Boldi
Traditional probabilistic relevance frameworks for information retrieval refrain from taking positional information into account, due to the hurdles of developing a sound model while avoiding an explosion in the number of parameters. Nonetheless, the well-known BM25F extension of the successful Okapi ranking function can be seen as an embryonic attempt in that direction. In this paper, we proceed along the same line, defining the notion of virtual region: a virtual region is a part of the document that, like a BM25F field, can provide (larger or smaller, depending on a tunable weighting parameter) evidence of relevance of the document; differently from BM25F fields, though, virtual regions are generated implicitly by applying suitable (usually, but not necessarily, position-aware) operators to the query. This technique fits nicely in the eliteness model behind BM25 and provides a principled explanation of BM25F; it specializes to BM25(F) for some trivial operators, but has a much more general appeal. Our experiments (both on standard collections, such as TREC, and on Web-like repertoires) show that the use of virtual regions is beneficial for retrieval effectiveness.
Rhetorical relations for information retrieval 931-940
  Christina Lioma; Birger Larsen; Wei Lu
Typically, every part of a coherent text has some plausible reason for its presence, some function that it performs in the overall semantics of the text. Rhetorical relations, e.g. contrast, cause, explanation, describe how the parts of a text are linked to each other. Knowledge about this so-called discourse structure has been applied successfully to several natural language processing tasks. This work studies the use of rhetorical relations for Information Retrieval (IR): Is there a correlation between certain rhetorical relations and retrieval performance? Can knowledge about a document's rhetorical relations be useful to IR? We present a language model modification that considers rhetorical relations when estimating the relevance of a document to a query. Empirical evaluation of different versions of our model on TREC settings shows that certain rhetorical relations can benefit retrieval effectiveness notably (>10% in mean average precision over a state-of-the-art baseline).
Modeling higher-order term dependencies in information retrieval using query hypergraphs 941-950
  Michael Bendersky; W. Bruce Croft
Many of the recent, and more effective, retrieval models have incorporated dependencies between the terms in the query. In this paper, we advance this query representation one step further, and propose a retrieval framework that models higher-order term dependencies, i.e., dependencies between arbitrary query concepts rather than just query terms. In order to model higher-order term dependencies, we represent a query using a hypergraph structure -- a generalization of a graph, where a (hyper)edge connects an arbitrary subset of vertices. A vertex in a query hypergraph corresponds to an individual query concept, and a dependency between a subset of these vertices is modeled through a hyperedge. An extensive empirical evaluation using both newswire and web corpora demonstrates that query representation using hypergraphs is highly beneficial for verbose natural language queries. For these queries, query hypergraphs significantly improve the retrieval effectiveness of several state-of-the-art models that do not employ higher-order term dependencies.

Classification

Confidence-aware graph regularization with heterogeneous pairwise features 951-960
  Yuan Fang; Bo-June (Paul) Hsu; Kevin Chen-Chuan Chang
Conventional classification methods tend to focus on features of individual objects, while missing out on potentially valuable pairwise features that capture the relationships between objects. Although recent developments on graph regularization exploit this aspect, existing works generally assume only a single kind of pairwise feature, which is often insufficient. We observe that multiple, heterogeneous pairwise features can often complement each other and are generally more robust in modeling the relationships between objects. Furthermore, as some objects are easier to classify than others, objects with higher initial classification confidence should carry more weight when classifying related but more ambiguous objects, an observation missing from previous graph regularization techniques. In this paper, we propose a Dirichlet-based regularization framework that supports the combination of heterogeneous pairwise features with confidence-aware prediction using limited labeled training data. Next, we showcase a few applications of our framework in information retrieval, focusing on the problem of query intent classification. Finally, we demonstrate through a series of experiments the advantages of our framework on a large-scale real-world dataset.
A utility-theoretic ranking method for semi-automated text classification 961-970
  Giacomo Berardi; Andrea Esuli; Fabrizio Sebastiani
In Semi-Automated Text Classification (SATC) an automatic classifier F labels a set of unlabelled documents D, following which a human annotator inspects (and corrects when appropriate) the labels attributed by F to a subset D' of D, with the aim of improving the overall quality of the labelling. An automated system can support this process by ranking the automatically labelled documents in a way that maximizes the expected increase in effectiveness that derives from inspecting D. An obvious strategy is to rank D so that the documents that F has classified with the lowest confidence are top-ranked. In this work we show that this strategy is suboptimal. We develop a new utility-theoretic ranking method based on the notion of inspection gain, defined as the improvement in classification effectiveness that would derive by inspecting and correcting a given automatically labelled document. We also propose a new effectiveness measure for SATC-oriented ranking methods, based on the expected reduction in classification error brought about by partially inspecting a list generated by a given ranking method. We report the results of experiments showing that, with respect to the baseline method above, and according to the proposed measure, our ranking method can achieve substantially higher expected reductions in classification error.
Improving tweet stream classification by detecting changes in word probability 971-980
  Kyosuke Nishida; Takahide Hoshide; Ko Fujimura
We propose a classification model of tweet streams in Twitter, which are representative of document streams whose statistical properties will change over time. Our model solves several problems that hinder the classification of tweets; in particular, the problem that the probabilities of word occurrence change at different rates for different words. Our model switches between two probability estimates based on full and recent data for each word when detecting changes in word probability. This switching enables our model to achieve both accurate learning of stationary words and quick response to bursty words. We then explain how to implement our model by using a word suffix array, which is a full-text search index. Using the word suffix array allows our model to handle the temporal attributes of word n-grams effectively. Experiments on three tweet data sets demonstrate that our model offers statistically significantly higher topic-classification accuracy than conventional temporally-aware classification models.
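The idea of switching between a full-history estimate and a recent-data estimate for each word can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `SwitchingWordEstimator`, the sliding window, and the fixed-threshold change test are hypothetical stand-ins, and the paper's actual change-detection procedure and suffix-array implementation are not reproduced here.

```python
from collections import deque

class SwitchingWordEstimator:
    """Tracks one word's occurrence probability over a document stream."""

    def __init__(self, window=100, threshold=0.2):
        self.full_hits = 0                   # documents containing the word, all time
        self.full_total = 0                  # documents seen, all time
        self.recent = deque(maxlen=window)   # 1/0 flags for the latest documents
        self.threshold = threshold           # assumed change-detection cutoff

    def observe(self, contains_word):
        flag = 1 if contains_word else 0
        self.full_hits += flag
        self.full_total += 1
        self.recent.append(flag)

    def probability(self):
        if self.full_total == 0:
            return 0.0
        p_full = self.full_hits / self.full_total
        p_recent = sum(self.recent) / len(self.recent)
        # Assumed rule: a large gap signals a change, so trust recent data.
        if abs(p_recent - p_full) > self.threshold:
            return p_recent
        return p_full

est = SwitchingWordEstimator(window=10, threshold=0.2)
for _ in range(90):
    est.observe(False)       # the word is absent for a long stretch
stationary = est.probability()
for _ in range(10):
    est.observe(True)        # then it suddenly bursts
bursty = est.probability()
```

In this toy run the estimate stays at the full-data value while the word is stationary and jumps to the recent-data value once the burst begins, which mirrors the trade-off between accurate learning and quick response described in the abstract.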
Predicting quality flaws in user-generated content: the case of Wikipedia 981-990
  Maik Anderka; Benno Stein; Nedim Lipka
The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedia Wikipedia. Existing research on quality assessment of user-generated content deals with the classification as to whether the content is high-quality or low-quality. This paper goes one step further: it targets the prediction of quality flaws, this way providing specific indications in which respects low-quality content needs improvement. The prediction is based on user-defined cleanup tags, which are commonly used in many Web applications to tag content that has some shortcomings. We apply this approach to the English Wikipedia, which is the largest and most popular user-generated knowledge source on the Web. We present an automatic mining approach to identify the existing cleanup tags, which provides us with a training corpus of labeled Wikipedia articles. We argue that common binary or multiclass classification approaches are ineffective for the prediction of quality flaws and hence cast quality flaw prediction as a one-class classification problem. We develop a quality flaw model and employ a dedicated machine learning approach to predict Wikipedia's most important quality flaws. Since in the Wikipedia setting the acquisition of significant test data is intricate, we analyze the effects of a biased sample selection. In this regard we illustrate the classifier effectiveness as a function of the flaw distribution in order to cope with the unknown (real-world) flaw-specific class imbalances. The flaw prediction performance is evaluated with 10,000 Wikipedia articles that have been tagged with the ten most frequent quality flaws: provided test data with little noise, four flaws can be detected with a precision close to 1.

Doctoral submissions

A knowledge-based approach for summarising opinions 991
  Marco Bonzanini
Automatic Document Summarisation plays a central role in the process of providing the user with quick access to information. Applications range from the generation of news headlines to the aggregation of opinions extracted from reviews. Traditional topic-based summarisation systems are not always able to capture the sentiments expressed in a review. Major efforts in sentiment analysis have gone into the tasks of mining and classifying reviews according to their polarity. In this research, we investigate the use of summarisation techniques applied to reviews, and we propose a knowledge-based approach to summarisation, in the context of sentiment analysis. The proposed research is focused on three different aspects. Firstly, we investigate the application of summarisation techniques to sentiment classification. Capturing the key passage of a review can be beneficial both for a sentiment classifier and for a user who could potentially understand the polarity of a review without reading the full text. Secondly, we investigate how to combine knowledge extracted from the reviews or integrated from external sources, with the purpose of producing opinion-oriented summaries. Thirdly, we analyse the possibility of generating personalised (user-oriented or query-biased) opinion-based summaries.
Adaptive IR for exploratory search support 992
  Daniel T. J. Backhausen
Most Information Retrieval (IR) software is designed for a generic user who submits queries and receives a ranked list of results from the retrieval system. Regardless of the user, the query always returns the same list of results. Individual aspects like age, gender, profession or experience are often not taken into account, for example the difference in searching between children and adults. Although long challenged by works such as Bates' berrypicking model [1], common systems still assume that the user has a static information need which remains unchanged during the seeking process. Moreover, many systems are strongly optimized for lookup searches, expecting that the user is only interested in facts and not in complex problem solving.
   But in many everyday situations people search for information to gain knowledge which allows them to fulfill a specific work task (e.g., [3]), like answering research questions, investigating for a publication or thesis, comparing different products or learning a language. Such complex tasks can be divided into sub-tasks and generally include multiple exploratory search sessions, in which the user strongly interacts with the system. This is a longitudinal process where the searcher necessarily gathers, collects, aggregates, interprets, processes, and evaluates information objects from one or more sources. In such complex search scenarios all three activities (lookup, learn, and investigate) are used in conjunction with one another to bridge the user's knowledge gap [2]. In each step of this process, the user faces a new situation in which knowledge and information need change. This influences the relevance of information objects and may direct the user to different topics, domains, or even tasks.
   The goal of this research is to effectively assist in fulfilling complex (work) tasks consisting of multi-session exploratory search activities. To achieve this, information retrieval needs personalization and has to close the gaps between the different search sessions. This can be done by enabling the user to collect information objects into a personal reference library and visualizing past search activities in a kind of breadcrumb or timeline.
   Thinking one step further, a personalized IR system (PIR) has to adapt to relevant factors and commit itself to the specific user and the personal search behavior. This means the system needs to guide the user through the searching process, suggesting useful search actions like effective search strategies or query formulations, and has to recommend information objects relevant to the work task and the user's current situation. To do so, the system has to be aware of the user and specific contextual circumstances. General information about the user like gender or age can be fetched explicitly, allowing the system to adapt in a more coarse-grained way (e.g. deciding how to present results based on the user group). Moreover, integrating the applications in use or providing other ways to let the user explicitly manage tasks will help to understand the goal of the user's search activities and will provide much better ways of user assistance.
   To close the gap between user and system, both behavioral and contextual information are necessary. Information about the search behavior and, indirectly, the user's knowledge and expertise can be obtained by logging (e.g. query logs) and examining system interactions. The fetched data should be made transparent to the user, showing what kind of information has been gathered so far. The implicit information has to be refined with other contextual information collected implicitly from different interfaces or sensors (e.g. time, location) and explicitly by direct user input from e.g. relevance feedback interactions. This will allow a more fine-grained way of system adaptation and offers new options for assisting the user during long-term search activities, showing personalized search strategies and possible next steps appropriate to the information need and level of experience.
Adversarial content manipulation effects 993
  Fiana Raiber
We address a question that has been somewhat overlooked throughout the transition from classical ad hoc retrieval to Web search: how is the performance of classical retrieval approaches affected by the presence of content manipulation? Our initial experiments have shown that the relative performance patterns of some classical retrieval strategies might change in the transition from non-manipulated to manipulated corpora. A natural future avenue to explore is how to mix these strategies and make (some of) them more robust under presumed content manipulation conditions.
Building reputation and trust using federated search and opinion mining 994
  Somayeh Khatiban
The term online reputation addresses trust relationships amongst agents in dynamic open systems. These can appear as ratings, recommendations, referrals and feedback. Several reputation models and rating aggregation algorithms have been proposed. However, finding a trusted entity on the web is still an issue as all reputation systems work individually. The aim of this project is to introduce a global reputation system for electronic product reviews that aggregates people's opinions from different resources (e.g. e-commerce websites and review sites) with the help of federated search techniques and generates a high-quality and trusted result. The first step is to choose a range of product review collections from e-commerce review systems (e.g. Amazon), online review sites (e.g. Epinions), social networks (e.g. Facebook), question answering sites (e.g. Yahoo! Answers), and blogs (e.g. My Nokia Blog). Using a federated search approach, the query (product name) will be broadcast to the selected resources, and the result will be a list of reputation data in various formats including star ratings, text reviews, votes, video, and so on. The focus of this work is on review text data and star ratings.
   A number of challenges will be addressed, including comparison issues (e.g. scale of star ratings: five-star vs. ten-star), hierarchical reviews (e.g. comments about reviews), choice of resources (e.g. choosing relevant sources depends upon the query), display issues (e.g. ease of use for the user), generalization issues (e.g. applying the approach to other domains), synchronization problems (e.g. generating up-to-date results), and obtaining high-quality and trusted reviews.
   A sentiment analysis approach is subsequently used to extract high quality opinions and inform how to increase trust in the search result. The extracted opinions will be used to generate facets for the global reputation system.
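The comparison issue mentioned above (five-star vs. ten-star scales) can be made concrete with a small sketch. It assumes simple min-max rescaling onto [0, 1] and a lowest rating of one star; the function name `normalize_rating` and these choices are illustrative assumptions, not the method proposed in the abstract.

```python
def normalize_rating(stars, max_stars, min_stars=1):
    """Map a star rating onto [0, 1] with min-max scaling.

    Assumes the scale runs from min_stars to max_stars inclusive.
    """
    if not min_stars <= stars <= max_stars:
        raise ValueError("rating outside its scale")
    return (stars - min_stars) / (max_stars - min_stars)

five = normalize_rating(4, max_stars=5)    # 4 of 5 stars
ten = normalize_rating(7, max_stars=10)    # 7 of 10 stars
```

Once both ratings live on the same scale they can be compared or aggregated directly; a real system would also have to account for sources whose raters use the scale differently.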
Enhancing knowledge base with knowledge transfer 995
  Si-Chi Chin
A Knowledge Base (KB) stores, organizes, and shares information pertinent to entities (i.e. KB nodes) such as people, organizations, and events. A large KB system, such as Wikipedia, relies on human curators to create and maintain the content in the system. However, it has become challenging for human curators to sift through the rapidly growing amount of information and filter out the information irrelevant to a KB node. The area of Knowledge Base Enhancement (KBE) aims to explore and identify automatic methods to assist human curators and accelerate the process. KBE can be viewed as a special case of Information Filtering (IF). However, the lack of high-quality labelled data makes it a major challenge to train a satisfactory model for the task. Transfer learning provides solutions to the problem and has been applied in the area of text mining, whereas direct application to KBE or IF remains absent.
   Transfer learning is a research area in machine learning, emphasizing the reuse of previously acquired knowledge in another applicable task. The method is particularly useful in situations where labeled instances are absent or difficult to obtain. To accelerate the growth of a KB, a transfer learning approach enables leveraging the heuristics and models learned from one KB node for another. For example, the filtering models learned for Willie Nelson, a famous country singer, can be reused for Eddie Rabbitt, another country singer.
   Transfer learning requires three components: the target task (i.e. the problem to be solved), the source task(s) (e.g. auxiliary data, previously studied problem), and criteria to select appropriate source tasks. The objectives of my dissertation are twofold. First, it explores methods to identify informative source nodes from which to transfer. Second, it constructs a knowledge transfer network to represent the transfer learning relationship between KB nodes.
   This proposed research applies a transfer learning method -- Segmented Transfer (ST) -- and a knowledge representation -- Knowledge Transfer Network (KTN) -- to approach the area of KBE. The primary research questions include: What are the transferable objects in information filtering algorithms? What are the KB nodes of high transferability? What are the factors that determine the transfer learning relationship? Does it manifest in a knowledge transfer network representation?
   This interdisciplinary research spans the study areas of information filtering, machine learning, knowledge representation, and network analysis. This proposal motivates the problem of KBE, discusses the research methodology and proposed experiments, and reviews related works in information filtering and transfer learning. This line of research hopes to extend the application of transfer learning to KBE and to explore a new dimension of IF. The proposed ST and KTN are intended to bring interdisciplinary approaches to the emerging field of KBE.
Improving e-discovery using information retrieval 996
  Kripabandhu Ghosh
E-discovery is the requirement that the documents and information in electronic form stored in corporate systems be produced as evidence in litigation. It has posed great challenges for legal experts. Legal searchers have always looked to find "any and all" evidence for a given case. Thus, a legal search system would essentially be a recall-oriented system. It has been a common practice among expert searchers to formulate Boolean queries to represent their information need. We want to work on three basic problems: Boolean query formulation -- Our primary goal is to study Boolean query formulation in the light of the E-discovery task. This will include automatic Boolean query generation, expansion and learning the effect of proximity operators in Boolean searches. Data fusion -- We would also like to explore the effectiveness of data fusion techniques in improving recall. Error modeling -- Finally, we will work on error modeling methods for noisy legal documents.
Opinion influence and diffusion in social network BIBAFull-Text 997
  Dehong Gao
Nowadays, more and more people tend to make decisions based on opinion information from the Internet, in addition to recommendations from offline friends or parents. For example, we may browse the resumes of and comments on election candidates to determine whether a candidate is qualified, or consult consumer reports or reviews on specialized e-commerce websites to decide which brand of computer suits our needs. Though opinion information is rich on the Internet, [2] points out that 58% of American Internet users deem online information to be irretrievable, confusing, or conflicting. Early works on opinion mining help to classify opinion polarity, to extract specific opinions and to summarize opinion texts. However, all these works are usually based on plain texts (reviews, comments or news articles). With the explosion of Web 2.0 applications, especially social network applications like blogs, discussion forums and micro-blogs, massive numbers of individual users go to the major media websites, which leads to much more opinion material being posted on the Internet in the form of user-shared experiences or views [3]. These opinion-rich and social network-based applications bring new perspectives for opinion mining as well. First, in addition to plain texts (reviews, newswire) in traditional opinion mining, we see new types of cyber-based text, like personal diary blogs and cyber-SMS tweets. Second, if we regard the opinions in plain text as static, the dynamic change of opinions in the social network is a promising new area that is attracting increasing attention from researchers worldwide. In the social network, the opinion held by one individual is not static, but changes, and can be influenced by others. A series of changes among different users forms the opinion propagation or diffusion in the network.
   This paper and my doctoral work focus on opinion influence and diffusion in the social network, exploring the detailed process of one-to-one influence and the opinion diffusion process in the social network. The significance of this work is that it can benefit related research areas, such as information maximization and viral marketing. Some pioneering works have been conducted to investigate the role of social networks in information diffusion and of influencers in the social network. These works are usually based on information diffusion models, like the cascade model (CM) or epidemic model (EM). However, we argue that it is not enough to simply apply these models to opinion influence and diffusion. 1) For both CM and EM, status shift is along specific directions, from inactive to active (CM) or from susceptible to infectious, and then, to recovered (EM). But opinion influence is more complex.
Relevance as a subjective and situational multidimensional concept BIBAFull-Text 998
  Carsten Eickhoff
Relevance is the central concept of information retrieval. Although its important role is unanimously accepted among researchers, numerous different definitions of the term have emerged over the years. Considerable effort has been put into creating consistent and universally applicable descriptions of relevance in the form of relevance frameworks. Across these various formal systems of relevance, a wide range of relevance criteria has been identified. Probably the most frequently used single criterion, which in some applications even becomes a synonym for relevance, is topicality. It expresses a document's topical overlap with the user's information need. For textual resources, it is often estimated based on term co-occurrences between query and document. There is, however, a significant number of further noteworthy relevance criteria. Prominent examples are:
   (Currency) determines how recent and up to date the document is. Outdated information may have become invalid over time.
   (Availability) expresses how easy it is to obtain the document. Users might not want to invest more than a threshold amount of resources (e.g., disk space, downloading time or money) to get the document.
   (Readability) describes the document's readability and understandability. A document with a high topical relevance towards a given information need can become irrelevant if the user is not able to extract the desired information from it.
   (Credibility) contains criteria such as the document author's expertise, the publication's reputation and the document's general trustworthiness.
   (Novelty) describes the document's contribution to satisfying an information need with respect to the user's context, e.g., previous search results or general knowledge about the domain.
   It is evident that these criteria can have very different scopes. Some of them are static characteristics of the document or the author; others depend on the concrete information need at hand or even the user's search context. Currently, state-of-the-art retrieval models often treat relevance (regardless of which interpretation of the term was chosen) as an atomic concept that can be expressed through topical overlap between document and query or a plain linear combination of multiple scores. Considering the broad audiences a web search engine has to serve, such a method does not seem optimal, as the concrete composition of relevance will vary from person to person depending on social and educational context. Furthermore, each individual can be expected to have a situational preference for certain combinations of relevance facets depending on the information need at hand. We investigate combination schemes which respect the dimension-specific relevance distributions. In particular, we developed a risk-aware method of combining relevance criteria inspired by economic portfolio theory. As a first stage, we applied this method to result set diversification across dimensions.
Exploiting temporal topic models in social media retrieval BIBAFull-Text 999
  Tuan A. Tran
Much user-generated content in Web 2.0 centers on real-world incidents such as the Japanese tsunami, or general concerns such as the recent economic downturn. This type of information is always of interest to users. For instance, when a user reads a news article about a tsunami in Japan, she wants to see related Flickr photos or more tweets about it. Conventional keyword-based search is inappropriate, since it is not always trivial to formulate ad-hoc interests about the event and material. In some cases, the user might want to explore emerging topics that dominate different sources. Present systems fail to connect documents topically across media, and the user has to examine individual sources to infer the topics herself.
   In this work, we address a special type of user information need, the temporal topic, which refers to any abstract matter active within some points or periods of time. A temporal topic can be a real-world event, e.g. the Arab Spring revolution, but can also be a less conceivable subject, e.g. the study of vacuum tube computers in the 1950s. Topics can also be recurrent, such as the US presidential campaigns. There are extensive studies on how to detect topics from a collection of documents, but little work uses temporal topics as part of user interest to retrieve documents. We believe that temporal topic-based retrieval is one solution to improve the user experience of present IR systems, as well as to benefit other applications (e.g. topic-sensitive online advertisement).
   Our research goal can be defined in three research questions. The first question involves finding latent temporal topics in a social media stream, where documents are well equipped with meta-data (timestamps, geo-spatial data, etc.). Following mixture models such as LDA, we treat each document as a mix of different temporal topic models, each of which incorporates time. A temporal topic consists of at least two types of attributes -- time and representing words, similar to [4]. The dynamics of temporal topics can be characterized in a timeline fashion [4], or using hierarchical structures [1]. The challenge lies in devising a model flexible enough to handle diverse and rapidly changing data without many parameter assumptions. For this, we see Bayesian nonparametrics [3] as one promising solution, and will extend it to the temporal dimension.
   The second research question is how to retrieve and rank documents from different social media sites, based on their relevance to one or several given temporal topics. We identify the following challenges. The first one is representing temporal topics as queries: although there have been attempts using keywords and time window separately [2], we aim to unify time and (topical) words in a single query model. The second challenge is integrating temporal topic models into ranking models. Inspired by our previous work [4], we will use language models to capture the relevance scores between documents and topics, and investigate advanced methods to index the scores effectively.
   Our last question involves connecting a given document to documents in other sources (data streams or corpora) that share one of its latent temporal topics. This task not only provides unified insight into different social media sites, but also helps improve the quality of models by using data from diverse sources. However, formalizing the semantics of "similarity" for documents in different settings based on temporal topics is tricky. One baseline method is to apply Kullback-Leibler divergence on comparable features (TF-IDF, n-grams, photo tags, timestamps, etc.). We can also use language models [5] to construct a language model for each candidate document, then estimate how likely it is to generate the document of interest within a given temporal topic.
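   As a rough illustration of the Kullback-Leibler baseline mentioned above, the following minimal sketch compares smoothed unigram distributions built from two sources. It is not the author's implementation: the token lists, the shared vocabulary, the additive smoothing constant `alpha`, and the function names are all assumptions made for the example.

```python
import math
from collections import Counter


def term_distribution(tokens, vocabulary, alpha=1.0):
    """Additively smoothed unigram distribution over a shared vocabulary.

    Smoothing (alpha > 0) keeps every probability positive, so the
    KL divergence below is always finite.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {term: (counts[term] + alpha) / total for term in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q must share keys and q must be positive."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)


# Hypothetical documents from different sources (tweet text, photo tags, news).
tweet = "tsunami japan coast warning tsunami".split()
photo_tags = "japan tsunami coast".split()
unrelated = "election campaign debate".split()

vocab = sorted(set(tweet) | set(photo_tags) | set(unrelated))
p = term_distribution(tweet, vocab)

kl_related = kl_divergence(p, term_distribution(photo_tags, vocab))
kl_unrelated = kl_divergence(p, term_distribution(unrelated, vocab))
```

   In this toy setting the tweet diverges less from the photo tags than from the unrelated text. Note that KL divergence is asymmetric, so a real system would have to decide which document plays the role of `p`.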
The essence of time: considering temporal relevance as an intent-aware ranking problem BIBAFull-Text 1000
  Stewart Whiting
Real-time news and social media quickly reflect large-scale phenomena and events. As users become exposed to this information, time plays a central role in prompting both information authorship and seeking activities.
   The objective of this research is to develop a retrieval system which can anticipate a user's likely temporal intent(s), considering recent or ongoing real-world events. Such a system should not only provide recent news when relevant, but also rank more highly those non-timestamped or even older documents which are temporally pertinent because they cover aspects related to recent event topics.
   Key challenges to be addressed in this work include: a suitable source and method for event detection and tracking, an intent-aware ranking approach and an evaluation methodology.

Demonstrations

A framework for manipulating and searching multiple retrieval types BIBAFull-Text 1001
  Marc-Allen Cartright; Ethem F. Can; William Dabney; Jeff Dalton; Logan Giorda; Kriste Krstovski; Xiaoye Wu; Ismet Zeki Yalniz; James Allan; R. Manmatha; David A. Smith
Conventional retrieval systems view documents as a unit and look at different retrieval types within a document. We introduce Proteus, a framework for seamlessly navigating books as dynamic collections which are defined on the fly. Proteus allows us to search various retrieval types. Navigable types include pages, books, named persons, locations, and pictures in a collection of books taken from the Internet Archive. The demonstration shows the value of multi-type browsing in dynamic collections to peruse new data.
A visual tool for Bayesian data analysis: the impact of smoothing on naive Bayes text classifiers BIBAFull-Text 1002
  Giorgio Maria Di Nunzio; Alessandro Sordoni
Naive Bayes (NB) classifiers are simple probabilistic classifiers still widely used in supervised learning due to their tradeoff between efficient model training and good empirical results. One drawback of these classifiers is that under data sparsity (i.e. when the training set is small), the maximum likelihood estimate of the probability of an unseen feature is zero, causing arithmetic anomalies. To prevent this undesirable behavior, a number of smoothing techniques have been proposed. Among these, the Bayesian approach incorporates smoothing in terms of prior knowledge about the parameters of the model, usually called hyper-parameters. Our research question is: can a visualization tool help researchers to quickly assess the performance of NB classifiers by setting optimal smoothing parameters?
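   The zero-probability problem described above can be seen in a few lines. The sketch below is not the authors' tool; it assumes a multinomial model with additive smoothing (which corresponds to a symmetric Dirichlet prior), and the toy training tokens, the vocabulary size, and the function name are hypothetical.

```python
from collections import Counter


def feature_likelihood(word, class_tokens, vocabulary_size, alpha=0.0):
    """Estimate P(word | class) for a multinomial naive Bayes model.

    alpha = 0 gives the maximum likelihood estimate;
    alpha = 1 gives Laplace (add-one) smoothing.
    """
    counts = Counter(class_tokens)
    return (counts[word] + alpha) / (len(class_tokens) + alpha * vocabulary_size)


# Hypothetical training data for one class; "meeting" never occurs in it.
spam_tokens = "cheap pills cheap offer".split()
vocabulary_size = 6  # assumed size of the full vocabulary

mle = feature_likelihood("meeting", spam_tokens, vocabulary_size, alpha=0.0)
laplace = feature_likelihood("meeting", spam_tokens, vocabulary_size, alpha=1.0)
```

   With `alpha=0.0` the unseen word gets probability zero, which wipes out the whole product of likelihoods for any document containing it; with `alpha=1.0` the estimate becomes 1 / (4 + 6) = 0.1.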
ALF: a client side logger and server for capturing user interactions in web applications BIBAFull-Text 1003
  Leif Azzopardi; Myles Doolan; Richard Glassey
This demonstration paper introduces ALF, which provides a light-weight client side logging application and a server for collecting user interaction data. ALF has been designed as a loosely coupled independent service that runs in parallel with the IR web application that requires logging.
ChatNoir: a search engine for the ClueWeb09 corpus BIBAFull-Text 1004
  Martin Potthast; Matthias Hagen; Benno Stein; Jan Graßegger; Maximilian Michel; Martin Tippmann; Clement Welsch
We present the ChatNoir search engine which indexes the entire English part of the ClueWeb09 corpus. Besides Carnegie Mellon's Indri system, ChatNoir is the second publicly available search engine for this corpus. It implements the classic BM25F information retrieval model including PageRank and spam likelihood. The search engine is scalable and returns the first results within three seconds, which is significantly faster than Indri. A convenient API allows for implementing reproducible experiments based on retrieving documents from the ClueWeb09 corpus. The search engine has successfully accomplished a load test involving 100,000 queries.
CrowdTerrier: automatic crowdsourced relevance assessments with terrier BIBAFull-Text 1005
  Richard McCreadie; Craig Macdonald; Iadh Ounis
In this demo, we present CrowdTerrier, an infrastructure extension to the open source Terrier IR platform that enables the semi-automatic generation of relevance assessments for a variety of document ranking tasks using crowdsourcing. The aim of CrowdTerrier is to reduce the time and expertise required to effectively crowdsource relevance assessments by abstracting away from the complexities of the crowdsourcing process. It achieves this by automating the assessment process as much as possible, via a close integration of the IR system that ranks the documents (Terrier) and the crowdsourcing marketplace that is used to assess those documents (Amazon's Mechanical Turk).
Distilling and exploring nuggets from a corpus BIBAFull-Text 1006
  Vittorio Castelli; Hema Raghavan; Radu Florian; Ding-Jung Han; Xiaoqiang Luo; Salim Roukos
This paper describes a live and scalable system that automatically extracts information nuggets for entities/topics from a continuously updated corpus for effective exploration and analysis. A nugget is a piece of semantic information that (1) must be mapped semantically to the transitive closure of a pre-defined ontology, (2) is explicitly supported by text, and (3) has a natural language description that completely conveys its semantics to a user. Fig. 1 shows a type of nugget "involvement in events" for a person entity (Leon Panetta): each nugget has a short description ("meeting", "news conference") with a list of supporting passages.
   Our key contributions are: (1) We extract nuggets and remove redundancy to produce a summary of salient information with supporting clusters of passages. (2) We present an entity/topic centric exploration interface that also allows users to navigate to other entities involved in a nugget. (3) We use the statistical NLP technologies developed over the years in the ACE, GALE and TAC-KBP programs, including parsing, mention detection, within and cross document coreference resolution, relation detection and slot filler extraction. (4) Our system is flexible and easily adaptable across domains, as demonstrated on two corpora: generic news and scientific papers. Search engines such as Google News and Scholar do not retrieve nuggets, and only remove redundancy at document level. News aggregation applications such as Evri categorize news articles based on entities or topics but do not extract nuggets. Other systems extract richer information, but not all of it has clear semantics; e.g., Silobreaker presents results as "the relationship between X and Y in the context of [keyphrase]", leaving users with the task of interpreting the semantics as it is not tied to a clear ontology. In contrast, we remove redundancy, summarize results and present nuggets that have clear semantics.
Integrative online research-data management BIBAFull-Text 1007
  Michael Huggett; Edie Rasmussen
In support of our research projects in information retrieval, we have developed an integrated multi-process software system that shepherds research data from induction through aggregation, analysis, and presentation. We combine public-domain code libraries with our own software to provide a flexible, easily-configured modular system that exposes data online for easier collaboration. The goal is to create a single online infrastructure that allows colleagues to submit, process, analyze and visualize data, and discuss and prioritize issues through a single integrated interface. We demonstrate our system within the context of the large data set provided by the Indexer's Legacy project [1].
MaSe: create your own mash-up search interface BIBAFull-Text 1008
  Leif Azzopardi; Douglas Dowie; Kelly Ann Marshall; Richard Glassey
MaSe provides a sandbox environment for high school students to create their own personalised search interface. It has been designed with two major goals in mind: (1) as a hands-on tutorial for school children, to excite them about programming and computing science through the development of a practical application, and (2) to enable children to design and create their own search interface without extensive programming knowledge or prior experience. Consequently, MaSe provides a way to ascertain what children would like from a search engine interface in an exploratory and creative way as they can create a working prototype. This approach contrasts with previous work on exploring children's requirements of IR systems which attempts to directly elicit user needs through more traditional methods (i.e. surveys, interviews, focus groups, etc). However, we have attempted to incorporate the design guidelines for children as identified by Large (2006) into MaSe, where: we make use of bright colours, large text fonts, spell checking and the use of icons to represent search services, as well as including a thematic experience as suggested by Large (2006), with the use of a puppy avatar and puppy dog footprints.
myDJ: recommending karaoke songs from one's own voice BIBAFull-Text 1009
  Kuang Mao; Xinyuan Luo; Ke Chen; Gang Chen; Lidan Shou
In this demo, we present myDJ, a karaoke recommendation system which recommends songs that people are capable of singing. Unlike existing song recommendation systems, which recommend songs people like to listen to, myDJ recommends suitable songs according to a subject's physical phonation area. It includes a singer profiler that analyzes the subject's phonation characteristics. In addition, a song profile is extracted for each song in the database. To learn a ranking function, the learning to rank algorithm ListNet is applied to a list of predefined features extracted from each singer-song profile pair. As a result, songs which are suitable but challenging for the subject are recommended.
PageFetch: a retrieval game for children (and adults) BIBAFull-Text 1010
  Leif Azzopardi; Jim Purvis; Richard Glassey
Children often struggle with information retrieval tasks as searching for information often requires a developed vocabulary and strong categorisation skills; neither of which are particularly developed in children under the age of 12. In a study conducted by Druin et al, it was found that in an experimental setting many children are often uninterested in searching for information online or are only interested in searching for information that is relevant to their personal interests. Consequently, children who were unmotivated were the least successful in completing information retrieval tasks in their study. It was suggested that a more effective means of engaging child participants in search studies must be developed in order to gain further insights into the searching behaviours of children. To this end we have developed a game called PageFetch which aims to engage children (aged 8 to 80) in completing search tasks through a fun and interactive search-like interface.
Pictune: situational music recommendation from geotagged pictures BIBFull-Text 1011
  Ke Chen; Gang Chen; Lidan Shou; Fei Xia
Political search trends BIBAFull-Text 1012
  Ingmar Weber; Venkata Rama Kiran Garimella; Erik Borra
We present Political Search Trends, a browser based web search analysis tool that (i) assigns a political leaning to web search queries, (ii) detects trending political queries in a given week, and (iii) links search queries to fact-checked statements. In terms of methodology, it showcases the power of analyzing queries leading to clicks on selected, annotated web sites of interest.
RDF Xpress: a flexible expressive RDF search engine BIBAFull-Text 1013
  Shady Elbassuoni; Maya Ramanath; Gerhard Weikum
We demonstrate RDF Xpress, a search engine that enables users to effectively retrieve information from large RDF knowledge bases or Linked Data Sources. RDF Xpress provides a search interface where users can combine triple patterns with keywords to form queries. Moreover, RDF Xpress supports automatic query relaxation and returns a ranked list of diverse query results.
Sketch-based image similarity search with a pen and paper interface BIBAFull-Text 1014
  Ihab Al Kabary; Heiko Schuldt
We present a novel and innovative user interface for query-by-sketching based image retrieval that exploits emergent interactive paper and digital pen technology. Users can draw sketches with a digital pen on interactive paper in a user-friendly way. The pen is able to capture the stroke vectors and to interactively stream them to the underlying content-based image retrieval (CBIR) system via the pen's Bluetooth interface. We present the integration of interactive paper/digital pen technology with QbS, our CBIR system tailored to Query-by-Sketching, and we demonstrate the use of the paper and pen interface together with QbS for three different collections: MIRFLICKR-25K, a cartoon collection, and a collection of medieval paper watermarks.
Task-aware search assistant BIBFull-Text 1015
  Henry Allen Feild; James Allan
TweetSpector: entity-based retrieval of tweets BIBAFull-Text 1016
  Surender Reddy Yerva; Zoltan Miklos; Flavia Grosan; Alexandru Tandrau; Karl Aberer
TweetSpector is a tool for demonstrating entity-based retrieval of tweets. The features of this tool include: entity profile creation, real-time tweet classification, active improvement of the created profiles through user feedback, and a dashboard displaying different metrics.
YooSee: a video browsing application for young children BIBAFull-Text 1017
  Leif Azzopardi; Douglas Dowie; Kelly Ann Marshall
Nowadays children as young as two years old can easily interact with mobile touch screen devices and personal computers to watch online videos through services such as YouTube. However, such services present a number of challenges for young children (e.g. fine grain gestures/interactions and good typing/literacy skills). In addition, when children use such services there is a risk that they may stumble upon content that is inappropriate. YooSee is a web-based application developed using the PuppyIR framework and designed for children aged between two and six years old. YooSee enables children to: (1) search and browse through video content using an engaging, novel interaction paradigm, and (2) be able to safely enjoy moderated video content.
Multi-platform image search using tag enrichment BIBAFull-Text 1018
  Jinming Min; Cristover Lopes; Johannes Leveling; Dag Schmidtke; Gareth J. F. Jones
The number of images available online is growing steadily and current web search engines have indexed more than 10 billion images. Approaches to image retrieval are still often text-based and operate on image annotations and captions. Image annotations (i.e. image tags) are typically short, user-generated, and of varying quality, which increases the mismatch problem between query terms and image tags. For example, a user might enter the query "wedding dress" while all images are annotated with "bridal gown" or "wedding gown". This demonstration presents an image search system using reduction and expansion of image annotations to overcome vocabulary mismatch problems by enriching the sparse set of image tags.
   Our image search application accepts a written query as input and produces a ranked list of result images and annotations (i.e. image tags) as output. The system integrates methods to reduce and expand the image tag set, thus decreasing the effect of sparse image tags. It builds on different image collections such as the Wikipedia image collection (http://www.imageclef.org/wikidata) and the Microsoft Office.com ClipArt collection (http://office.microsoft.com/), but can be applied to social collections such as Flickr as well. Our demonstration system runs on PCs, tablets, and smartphones, making use of advanced user interface capabilities on mobile devices.

Industry talk abstracts

IR paradigms in computational advertising BIBAFull-Text 1019
  Andrei Z. Broder
The central problem in the emerging discipline of computational advertising is to find the "best match" between a given user in a given context and a suitable advertisement. The context could be a user entering a query in a search engine ("sponsored search"), a user reading a web page ("content match" and "display ads"), a user streaming a movie, and so on. In some situations, it is desirable to solve the "dual" optimization problem: rather than find the best ad given a user in a context, the goal is to identify the "best audience", i.e. the most receptive set of users and/or the most suitable contexts for a given advertising campaign. The information about the user can vary from scarily detailed to practically nil. The number of potential advertisements might be in the billions. Thus, depending on the definition of "best match" and "best audience" these problems lead to a variety of massive optimization problems, with complicated constraints, and challenging data representation and access issues.
   In general, the direct problem is solved in two stages: first a rough filtering is used to identify a relatively small set of ads to be considered as potential matches, followed by a more sophisticated secondary ranking where economics considerations take center stage. Historically, the filtering has been conceived as a database selection problem, and was done using simple Boolean formulae, for instance, in sponsored search the filter could be "all ads that provide a specific bid for the present query string or a subset of it". Similarly for the dual problem (audience definition) for, say, a sports car ad, the filter could be "all males in California, aged 40 or less".
   This "database approach" for the direct problem has been recently supplanted by an "IR approach" based on a similarity search between a carefully constructed query that captures the advertising opportunity and an annotated document corpus that represents the potential ads. Similarly, in the dual problem, the newer approach is to devise an efficient and effective representation of the users, then form a query that represents a prototypical ideal user, and finally find the users most similar to the prototype. The aim of this talk is to discuss the penetration of the IR paradigms in computational advertising and present some research challenges and opportunities in this area of enormous economic importance.
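   The contrast between the two filtering styles described above can be sketched in a few lines. This is only an illustration, not any production system: the ad inventory, its field names, and the plain bag-of-words cosine similarity are assumptions chosen for brevity.

```python
import math
from collections import Counter

# Hypothetical ad inventory: a bid phrase plus descriptive text per ad.
ADS = {
    "ad_roadster": {"bid": "sports car", "text": "fast convertible sports car for sale"},
    "ad_sedan": {"bid": "family sedan", "text": "reliable family car with low mileage"},
    "ad_bike": {"bid": "mountain bike", "text": "lightweight mountain bike"},
}


def boolean_filter(query, ads):
    """Database-style selection: keep ads whose bid phrase is the query or a subset of it."""
    query_terms = set(query.split())
    return sorted(name for name, ad in ads.items()
                  if set(ad["bid"].split()) <= query_terms)


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def similarity_ranking(query, ads):
    """IR-style selection: rank ads by similarity between query and ad text."""
    q = Counter(query.split())
    scored = [(cosine(q, Counter(ad["text"].split())), name)
              for name, ad in ads.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


exact = boolean_filter("red sports car", ADS)   # bid phrase contained in the query
no_bid = boolean_filter("used car", ADS)        # no advertiser bid on this query
fallback = similarity_ranking("used car", ADS)  # similarity still finds car ads
```

   For the query "used car" the Boolean filter returns nothing because no bid phrase is contained in it, while the similarity search still surfaces the two car ads as candidates for the secondary ranking stage.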
Watson: the Jeopardy! challenge and beyond BIBAFull-Text 1020
  Eric W. Brown
Watson, named after IBM founder Thomas J. Watson, was built by a team of IBM researchers who set out to accomplish a grand challenge: build a computing system that rivals a human's ability to answer questions posed in natural language with speed, accuracy and confidence. The quiz show Jeopardy! provided the ultimate test of this technology because the game's clues involve analyzing subtle meaning, irony, riddles and other complexities of natural language in which humans excel and computers traditionally fail. Watson passed its first test on Jeopardy!, beating the show's two greatest champions in a televised exhibition match, but the real test will be in applying the underlying natural language processing and analytics technology in business and across industries. In this talk I will introduce the Jeopardy! grand challenge, present an overview of Watson and the DeepQA technology upon which Watson is built, and explore future applications of this technology.
Putting context into search and search into context BIBAFull-Text 1021
  Susan T. Dumais
It is a very challenging task to understand a short query, especially if that query is considered in isolation. Luckily, queries do not magically appear in a search box -- rather, they are issued by real people, trying to accomplish a task, at a given point in time and space, and this "context" can be used to aid query understanding. Traditionally, search engines have returned the same results to everyone who asks the same question. However, using a single ranking for everyone, in every context, limits how well a search engine can do. In this talk I outline a framework to quantify the "potential for personalization", which can be used to characterize the extent to which different people have the same (or different) intents for a query. I then describe several examples of how we represent and use different kinds of context to improve search quality. Finally I conclude by highlighting some important challenges in developing such systems at Web scale, including system optimization, evaluation, transparency and serendipity.
CloudSearch and the democratization of information retrieval BIBAFull-Text 1022-1023
  Daniel E. Rose
Amazon CloudSearch is a new hosted search service, built on top of many cloud-based AWS services, and based on the same technology that powers search on Amazon's retail sites. Because of its ease of configuration and scalability, CloudSearch represents the next step in the democratization of information retrieval. This democratization process, increasing access to search for both end users and potential search providers, has continued over several decades, through technologies like early online metered search services, enterprise search software, web search, and open source search tools. CloudSearch further reduces barriers to entry, allowing a person or organization to basically say "make my content searchable" and have it happen automatically. CloudSearch may also offer an opportunity to overcome the stagnation that has occurred in search user experiences over the past 15 years. When you no longer need to be a search expert to make your content available, you're not stuck with ten blue links. Instead, you can focus on providing the kind of interaction that makes sense for your application and your users. CloudSearch enables a flowering of search applications that need not be tied to the web, and an opportunity to explore new ways of interacting with information retrieval technology.
Entity sentiment extraction using text ranking BIBAFull-Text 1024
  John O'Neil
Entity extraction and sentiment classification are among the most common types of information derived from documents, but the problem of directly associating entities and sentiment has received less attention. We use TextRank on a graph linking entities and sentiment-laden words and phrases. We extract from the resulting eigenvector the final sentiment weights of the entities. We then explore the algorithm's performance and accuracy, compared to a baseline.
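   The abstract does not give the graph construction or the handling of polarity, so the following is only a generic TextRank-style sketch: weighted PageRank computed by power iteration on an undirected graph whose nodes are entities and sentiment-laden words. The node names, edge weights (imagined co-occurrence counts), damping factor and iteration count are all assumptions.

```python
def textrank(edges, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration on an undirected graph.

    edges maps (node_a, node_b) pairs to positive weights. Every node
    appears in at least one edge, so there are no dangling nodes and
    the scores keep summing to 1.
    """
    neighbors = {}
    for (u, v), weight in edges.items():
        neighbors.setdefault(u, {})[v] = weight
        neighbors.setdefault(v, {})[u] = weight

    nodes = sorted(neighbors)
    score = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbor passes on its score in proportion to edge weight.
            incoming = sum(score[m] * w / sum(neighbors[m].values())
                           for m, w in neighbors[node].items())
            updated[node] = (1 - damping) / len(nodes) + damping * incoming
        score = updated
    return score


# Hypothetical graph linking two entities to sentiment-laden words.
edges = {
    ("AcmePhone", "excellent"): 3.0,
    ("AcmePhone", "reliable"): 2.0,
    ("BetaTab", "excellent"): 1.0,
}
scores = textrank(edges)
entity_scores = {name: scores[name] for name in ("AcmePhone", "BetaTab")}
```

   The resulting vector approximates the principal eigenvector of the damped transition matrix. Here it only measures how strongly each entity is tied to sentiment-bearing terms; assigning positive or negative weights would need an extra step that this sketch does not attempt.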

Poster abstracts

A hybrid model for ad-hoc information retrieval BIBAFull-Text 1025-1026
  Zheng Ye; Jimmy Xiangji Huang; Jun Miao
Many information retrieval (IR) techniques have been proposed to improve retrieval performance, and some combinations of these techniques have been demonstrated to be effective. However, how to combine them effectively is largely unexplored. It is possible for one method to reduce the positive influence of another even if both are effective separately. In this paper, we propose a new hybrid model which can simply and flexibly combine components of three different IR techniques under a uniform framework. Extensive experiments on the TREC standard collections indicate that our proposed model can consistently outperform the best TREC systems in ad-hoc retrieval. This shows that the combination strategy in our proposed model is very effective. Moreover, the method is reusable by other researchers to test whether their new methods are additive to current technologies.
Exploiting paths for entity search in RDF graphs BIBAFull-Text 1027-1028
  Minsuk Kahng; Sang-goo Lee
The field of entity search using Semantic Web (RDF) data has gained more interest recently. In this paper, we propose a probabilistic entity retrieval model for RDF graphs using paths in the graph. Unlike previous work which assumes that all descriptions of an entity are directly linked to the entity node, we assume that an entity can be described with any node that can be reached from the entity node by following paths in the RDF graph. Our retrieval model simulates the generation process of query terms from an entity node by traversing the graph. We evaluate our approach using a standard evaluation framework for entity search.
A study of term weighting schemes using class information for text classification BIBFull-Text 1029-1030
  Youngjoong Ko
A topic model of clinical reports BIBAFull-Text 1031-1032
  Corey Arnold; William Speier
Clinical narrative in the medical record provides perhaps the most detailed account of a patient's history. However, this information is documented in free-text, which makes it challenging to analyze. Efforts to index unstructured clinical narrative often focus on identifying predefined concepts from clinical terminologies. Less studied is the problem of analyzing the text as a whole to create temporal indices that capture relationships between learned clinical events. Topic models provide a method for analyzing large corpora of text to discover semantically related clusters of words. This work presents a topic model tailored to the clinical reporting environment that allows for individual patient timelines. Results show the model is able to identify patterns of clinical events in a cohort of brain cancer patients.
Active query selection for learning rankers BIBAFull-Text 1033-1034
  Mustafa Bilgic; Paul N. Bennett
Methods that reduce the amount of labeled data needed for training have focused more on selecting which documents to label than on which queries should be labeled. One exception to this (Long et al. 2010) uses expected loss optimization (ELO) to estimate which queries should be selected but is limited to rankers that predict absolute graded relevance. In this work, we demonstrate how to easily adapt ELO to work with any ranker and show that estimating expected loss in DCG is more robust than NDCG even when the final performance measure is NDCG.
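For readers unfamiliar with the two measures contrasted in this abstract, the following is a minimal sketch of DCG and NDCG over a ranked list of graded relevance labels. It assumes the common exponential gain `2^g - 1` with a `log2` rank discount, and it normalizes by the ideal ordering of the same list; it is not the ELO query-selection method itself, and the function names are illustrative.

```python
import math


def dcg(gains, k=None):
    """Discounted cumulative gain of a ranked list of graded labels."""
    ranked = gains if k is None else gains[:k]
    # Rank positions are 1-based, so position 1 is discounted by log2(2) = 1.
    return sum((2 ** g - 1) / math.log2(pos + 2) for pos, g in enumerate(ranked))


def ndcg(gains, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0
```

Because NDCG divides by a per-query ideal value, two queries with very different absolute DCG can end up with the same NDCG, which is one reason the two measures can behave differently as estimation targets.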
Anticipatory search: using context to initiate search BIBAFull-Text 1035-1036
  Daniel J. Liebling; Paul N. Bennett; Ryen W. White
Identifying content for which a user may search has a variety of applications, including ranking and recommendation. In this poster, we examine how pre-search context can be used to predict content that the user will seek before they have even specified a search query. We call this anticipatory search. Using a log-based approach, we compare different methods for predicting the content to be searched using different attributes of the pre-query context and behavioral signals from previous visitors to the most recent browse URL. Each method covers different cases and shows promise for query-free anticipatory search on the Web.
BReK12: a book recommender for K-12 users BIBAFull-Text 1037-1038
  Maria Soledad Pera; Yiu-Kai Ng
Ideally, students in K-12 grade levels can turn to book recommenders to locate books that match their interests. Existing book recommenders, however, fail to take into account the readability levels of their users, and hence their recommendations may be unsuitable for the users. To address this issue, we introduce BReK12, a recommender that targets K-12 users and prioritizes the reading level of its users in suggesting books of interest. Empirical studies conducted using the Bookcrossing dataset show that BReK12 outperforms a number of existing recommenders (developed for general users) in identifying books appealing to K-12 users.
Clarity re-visited BIBAFull-Text 1039-1040
  Shay Hummel; Anna Shtok; Fiana Raiber; Oren Kurland; David Carmel
We present a novel interpretation of Clarity [5], a widely used query performance predictor. While Clarity is commonly described as a measure of the "distance" between the language model of the top-retrieved documents and that of the collection, we show that it actually quantifies an additional property of the result list, namely, its diversity. This analysis, along with empirical evaluation, helps to explain the low prediction quality of Clarity for large-scale Web collections.
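The "distance" reading of Clarity mentioned above can be illustrated with a simplified sketch: the KL divergence between a unigram language model of the top-retrieved documents and that of the whole collection. This version assumes unsmoothed maximum-likelihood models and that the top documents are drawn from the collection (so every term has non-zero collection probability); the original predictor uses a query-weighted relevance model, which is omitted here.

```python
import math
from collections import Counter


def unigram_lm(docs):
    """Maximum-likelihood unigram model over a list of tokenized documents."""
    counts = Counter(term for doc in docs for term in doc)
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()}


def clarity(top_docs, collection):
    """KL divergence (in bits) from the collection model to the top-docs model."""
    p_top = unigram_lm(top_docs)
    p_coll = unigram_lm(collection)
    return sum(p * math.log2(p / p_coll[t]) for t, p in p_top.items())
```

A result list whose vocabulary mirrors the collection scores near zero, while a list concentrated on a few terms scores higher.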
Cluster-based one-class ensemble for classification problems in information retrieval BIBAFull-Text 1041-1042
  Nedim Lipka; Benno Stein; Maik Anderka
A number of relevant information retrieval classification problems are one-class classification problems at heart. That is, labeled data is only available for one class, the so-called target class, and common discrimination-based classification approaches, be they binary or multiclass, are not applicable. Achieving high effectiveness when solving one-class problems is difficult in any case, and it becomes even more challenging when the target class data is multimodal, which is often the case. To address these concerns we propose a cluster-based one-class ensemble that consists of four steps: (1) applying a clustering algorithm to the target class data, (2) training an individual one-class classifier for each of the identified clusters, (3) aggregating the decisions of the individual classifiers, and (4) selecting the best fitting clustering model. We evaluate our approach with four datasets: an artificially generated dataset, a dataset compiled from a known multiclass text corpus, and two datasets related to one-class problems that received much attention recently, namely authorship verification and quality flaw prediction. Our approach outperforms a one-class SVM on all four datasets.
Collaborative filtering with short term preferences mining BIBAFull-Text 1043-1044
  Diyi Yang; Tianqi Chen; Weinan Zhang; Yong Yu
Recently, recommender systems have fascinated researchers and benefited a variety of people's online activities, helping users cope with the explosion of web information. Traditional collaborative filtering techniques handle general recommendation well. However, most such approaches focus on long term preferences. To discover more short term factors influencing people's decisions, we propose a short term preferences model, implemented with implicit user feedback. We conduct experiments comparing the performance of different short term models, which show that our model significantly outperforms the long term models.
Creating temporally dynamic web search snippets BIBAFull-Text 1045-1046
  Krysta M. Svore; Jaime Teevan; Susan T. Dumais; Anagha Kulkarni
Content on the Internet is always changing. We explore the value of biasing search result snippets towards new webpage content. We present results from a user study comparing traditional query-focused snippets with snippets that emphasize new page content for two query types: general and trending. Our results indicate that searchers prefer the inclusion of temporal information for trending queries but not for general queries, and that this is particularly valuable for pages that have not been recently crawled.
Dependency trigram model for social relation extraction from news articles BIBAFull-Text 1047-1048
  Maengsik Choi; Harksoo Kim; Bruce W. Croft
We propose a kernel-based model to automatically extract social relations such as economic relations and political relations between two people from news articles. To determine whether two people are structurally associated with each other, the proposed model uses an SVM (support vector machine) tree kernel based on trigrams of head-dependent relations between them. In the experiments with the automatic content extraction (ACE) corpus and a Korean news corpus, the proposed model outperformed the previous systems based on SVM tree kernels even though it used more shallow linguistic knowledge.
Detecting candidate named entities in search queries BIBAFull-Text 1049-1050
  Areej Alasiry; Mark Levene; Alexandra Poulovassilis
The information extraction task of Named Entities Recognition (NER) has been recently applied to search engine queries, in order to better understand their semantics. Here we concentrate on the task prior to the classification of the named entities (NEs) into a set of categories, which is the problem of detecting candidate NEs via the subtask of query segmentation. We present a novel method for detecting candidate NEs using grammar annotation and query segmentation with the aid of top-n snippets from search engine results and a web n-gram model, to accurately identify NE boundaries. The proposed method addresses the problem of accurately setting boundaries of NEs and the detection of multiple NEs in queries.
Effect of dynamic pruning safety on learning to rank effectiveness BIBAFull-Text 1051-1052
  Craig Macdonald; Nicola Tonellotto; Iadh Ounis
A dynamic pruning strategy, such as WAND, enhances retrieval efficiency without degrading effectiveness to a given rank K, known as safe-to-rank-K. However, it is also possible for WAND to obtain more efficient but unsafe retrieval without actually significantly degrading effectiveness. On the other hand, in a modern search engine setting, dynamic pruning strategies can be used to efficiently obtain the set of documents to be re-ranked by the application of a learned model in a learning to rank setting. No work has examined the impact of safeness on the effectiveness of the learned model. In this work, we investigate the impact of WAND safeness through experiments using 150 TREC Web track topics. We find that unsafe WAND is biased towards documents with lower docids, thereby impacting effectiveness.
Effect of written instructions on assessor agreement BIBAFull-Text 1053-1054
  William Webber; Bryan Toth; Marjorie Desamito
Assessors frequently disagree on the topical relevance of documents. How much of this disagreement is due to ambiguity in assessment instructions? We have two assessors assess TREC Legal Track documents for relevance, some to a general topic description, others to detailed assessment guidelines. We find that detailed guidelines lead to no significant increase in agreement amongst assessors or between assessors and the official qrels.
Effects of expertise differences in synchronous social Q&A BIBAFull-Text 1055-1056
  Ryen W. White; Matthew Richardson
Synchronous social question-and-answer (Q&A) systems match askers to answerers and support real-time dialog between them to resolve questions. These systems typically find answerers based on the degree of expertise match with the asker's initial question. However, since synchronous social Q&A involves a dialog between asker and answerer, differences in expertise may also matter (e.g., extreme novices and experts may have difficulty establishing common ground). In this poster we use data from a live social Q&A system to explore the impact of expertise differences on answer quality and aspects of the dialog itself. The findings of our study suggest that synchronous social Q&A systems should consider the relative expertise of candidate answerers with respect to the asker, and offer interactive dialog support to help establish common ground between askers and answerers.
Efficient estimation of aspect weights BIBAFull-Text 1057-1058
  Jon Parker; Andrew Yates; Nazli Goharian; Wai Gen Yee
Many websites encourage people to submit reviews of various products and services. We present and evaluate a novel approach to efficiently model and analyze the text within user reviews to estimate how much reviewers care about different aspects of a product (e.g., the amenities, food, location, and rooms of a hotel). Our approach performs statistically similarly to the best existing method. However, our method for computing aspect weights runs in linear time, while the current state-of-the-art solution requires cubic time at best.
Emotion tagging for comments of online news by meta classification with heterogeneous information sources BIBAFull-Text 1059-1060
  Ying Zhang; Yi Fang; Xiaojun Quan; Lin Dai; Luo Si; Xiaojie Yuan
With the rapid growth of online news services, users can actively respond to online news by making comments. Users often express subjective emotions in comments, such as sadness, surprise and anger. Such emotions can help in understanding the preferences and perspectives of individual users, and therefore may help online publishers provide users with more relevant services. This paper tackles the task of predicting emotions for the comments of online news. To the best of our knowledge, this is the first research work to address the task. In particular, this paper proposes a novel meta classification approach that exploits heterogeneous information sources such as the content of the comments and the emotion tags of news articles generated by users. The experiments on two datasets from online news services demonstrate the effectiveness of the proposed approach.
Estimating the magic barrier of recommender systems: a user study BIBAFull-Text 1061-1062
  Alan Said; Brijnesh J. Jain; Sascha Narr; Till Plumbaum; Sahin Albayrak; Christian Scheel
Recommender systems are commonly evaluated by trying to predict known, withheld ratings for a set of users. Measures such as the Root-Mean-Square Error are used to estimate the quality of the recommender algorithms. This process, however, does not acknowledge the inherent rating inconsistencies of users. In this paper we present the first results from a noise measurement user study for estimating the magic barrier of recommender systems, conducted on a commercial movie recommendation community. The magic barrier is the expected squared error of the optimal recommendation algorithm, or the lowest error we can expect from any recommendation algorithm. Our results show that the barrier can be estimated by collecting the opinions of users on already rated items.
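As a rough illustration of the quantities involved, the sketch below computes the Root-Mean-Square Error of a set of predictions and a simple noise floor derived from repeated ratings of the same items by the same users. The noise-floor estimator (root of the mean per-pair rating variance) is an assumption made for illustration and is not necessarily the estimator used in the study; `noise_floor` and its input layout are hypothetical.

```python
import math
from statistics import mean, pvariance


def rmse(predicted, actual):
    """Root-mean-square error between predicted and observed ratings."""
    return math.sqrt(mean([(p - a) ** 2 for p, a in zip(predicted, actual)]))


def noise_floor(repeated_ratings):
    """Hypothetical noise estimate from re-ratings.

    repeated_ratings maps (user, item) to the list of ratings that user
    gave that item on different occasions.
    """
    variances = [pvariance(ratings) for ratings in repeated_ratings.values()]
    return math.sqrt(mean(variances))
```

Under this reading, an algorithm whose RMSE approaches the noise floor is fitting rating inconsistency rather than preference, so further reductions in error would not be meaningful.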
Explaining neighborhood-based recommendations BIBAFull-Text 1063-1064
  Sergio Cleger-Tamayo; Juan M. Fernandez-Luna; Juan F. Huete
Recommender Systems (RS) attempt to discover users' preferences, and to learn about them in order to anticipate their needs. The main task normally associated with an RS is to offer suggestions for items. However, for most users, RSs are black boxes, computerized oracles that give advice but cannot be questioned. In order to improve the quality of predictions and the satisfaction of the users, explanation facilities are needed. We present a novel methodology to explain recommendations: showing predictions over a set of observed items. Our proposal has been validated by means of user studies and lab experiments using the MovieLens dataset.
Exploiting term dependence while handling negation in medical search BIBAFull-Text 1065-1066
  Nut Limsopatham; Craig Macdonald; Richard McCreadie; Iadh Ounis
In medical records, negative qualifiers, e.g. no or without, are commonly used by health practitioners to identify the absence of a medical condition. Without considering whether the term occurs in a negative or positive context, the sole presence of a query term in a medical record is insufficient to imply that the record is relevant to the query. In this paper, we show how to effectively handle such negation within a medical records information retrieval system. In particular, we propose a term representation that tackles negated language in medical records, which is further extended by considering the dependence of negated query terms. We evaluate our negation handling technique within the search task provided by the TREC Medical Records 2011 track. Our results, which show a significant improvement upon a system that does not consider negated context within records, attest to the importance of handling negation.
Exploring example-based person search in email BIBAFull-Text 1067-1068
  Tan Xu; Douglas W. Oard
This paper describes an entity ranking model for example-based person search in email. Evaluation by comparison to manually resolved named references in Enron email yields results that correspond to typically placing the correct entity in the first or second rank.
Exploring tag relevance for image tag re-ranking BIBAFull-Text 1069-1070
  Jie Xiao; Wengang Zhou; Qi Tian
In this paper, we propose to explore the relevance between tags for image tag re-ranking. The key component is to define a global tag-tag similarity matrix, which is achieved by analysis in both semantic and visual aspects. The text semantic relevance is explored by the Latent Semantic Indexing (LSI) model [1]. For the visual information, the tag relevance can be propagated by reconstructing exemplar images with visually and semantically consistent images. Based on our tag relevance matrix, a random-walk approach is leveraged to discover the significance of each tag. Finally, all tags in an image are re-ranked by their significance values. Extensive experiments show its effectiveness on an image dataset with a large tag vocabulary.
Fast on-line learning for multilingual categorization BIBAFull-Text 1071-1072
  Michelle Kovesi; Cyril Goutte; Massih-Reza Amini
Multiview learning has been shown to be a natural and efficient framework for supervised or semi-supervised learning of multilingual document categorizers. The state-of-the-art co-regularization approach relies on alternate minimizations of a combination of language-specific categorization errors and a disagreement between the outputs of the monolingual text categorizers. This is typically solved by repeatedly training categorizers on each language with the appropriate regularizer. We extend and improve this approach by introducing an on-line learning scheme, where language-specific updates are interleaved in order to iteratively optimize the global cost in one pass. Our experimental results show that this produces performance similar to that of the batch approach, at a fraction of the computational cost.
Finding interesting posts in Twitter based on retweet graph analysis BIBAFull-Text 1073-1074
  Min-Chul Yang; Jung-Tae Lee; Seung-Wook Lee; Hae-Chang Rim
Millions of posts are being generated in real-time by users in social networking services, such as Twitter. However, a considerable number of those posts are mundane posts that are of interest to the authors and possibly their friends only. This paper investigates the problem of automatically discovering valuable posts that may be of potential interest to a wider audience. Specifically, we model the structure of Twitter as a graph consisting of users and posts as nodes and retweet relations between the nodes as edges. We propose a variant of the HITS algorithm for producing a static ranking of posts. Experimental results on real world data demonstrate that our method can achieve better performance than several baseline methods.
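The graph formulation in this abstract can be sketched with the standard HITS iteration on a user-to-post retweet graph. This is plain HITS, not the variant the paper proposes, and the edge layout (a list of `(user, post)` pairs meaning "user retweeted post") is an assumption made for illustration. The sketch expects a non-empty edge list.

```python
import math


def hits(edges, iterations=50):
    """Plain HITS on a bipartite graph: users act as hubs, posts as authorities."""
    users = {u for u, _ in edges}
    posts = {p for _, p in edges}
    hub = {u: 1.0 for u in users}
    auth = {p: 1.0 for p in posts}
    for _ in range(iterations):
        # Authority of a post: sum of hub scores of users who retweeted it.
        auth = {p: 0.0 for p in posts}
        for u, p in edges:
            auth[p] += hub[u]
        norm = math.sqrt(sum(v * v for v in auth.values()))
        auth = {p: v / norm for p, v in auth.items()}
        # Hub score of a user: sum of authority scores of posts they retweeted.
        hub = {u: 0.0 for u in users}
        for u, p in edges:
            hub[u] += auth[p]
        norm = math.sqrt(sum(v * v for v in hub.values()))
        hub = {u: v / norm for u, v in hub.items()}
    return hub, auth
```

Sorting posts by descending authority score then gives a static, query-independent ranking of the kind the abstract describes.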
Finding readings for scientists from social websites BIBAFull-Text 1075-1076
  Jiepu Jiang; Zhen Yue; Shuguang Han; Daqing He
Current search systems are designed to find relevant articles, especially topically relevant ones, but the notion of relevance largely depends on the search task. We study the specific task in which scientists search for worth-reading articles beneficial to their research. Our study finds that users' perception of relevance and their reading preferences are only moderately correlated; current systems can effectively find readings that are highly relevant to the topic, but 36% of the worth-reading articles are only marginally relevant or even non-relevant. Our system can effectively find those worth-reading but marginally relevant or non-relevant articles by taking advantage of scientists' recommendations on social websites.
Finding web appearances of social network users via latent factor model BIBAFull-Text 1077-1078
  Kailong Chen; Zhengdong Lu; Xiaoshi Yin; Yong Yu; Zaiqing Nie
With the rapid growth of Web 2.0, people spend more time on social networks such as Facebook and Twitter. Finding the web appearances of the people they interact with would greatly help social network users get to know them. We propose a novel and effective latent factor model to find web appearances of target social network users. Our method solves the name ambiguity problem by simultaneously exploring the link structure of social networks and the web. Experiments on real-world data show the superiority of our method over several baselines.
Fixed versus dynamic co-occurrence windows in TextRank term weights for information retrieval BIBAFull-Text 1079-1080
  Wei Lu; Qikai Cheng; Christina Lioma
TextRank is a variant of PageRank typically used in graphs that represent documents, and where vertices denote terms and edges denote relations between terms. Quite often the relation between terms is simple term co-occurrence within a fixed window of k terms. The output of TextRank when applied iteratively is a score for each vertex, i.e. a term weight, that can be used for information retrieval (IR) just like conventional term frequency based term weights.
   So far, when computing TextRank term weights over co-occurrence graphs, the window of term co-occurrence is always fixed. This work departs from this, and considers dynamically adjusted windows of term co-occurrence that follow the document structure on a sentence- and paragraph-level. The resulting TextRank term weights are used in a ranking function that re-ranks 1000 initially returned search results in order to improve the precision of the ranking. Experiments with two IR collections show that adjusting the vicinity of term co-occurrence when computing TextRank term weights can lead to gains in early precision.
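The fixed-window baseline described above can be sketched as follows: build an undirected co-occurrence graph over the terms of a document, then iterate the TextRank update to obtain one weight per term. The damping factor of 0.85, the iteration count, and the function names are assumptions; the dynamically adjusted sentence- and paragraph-level windows studied in the paper are not implemented here.

```python
def cooccurrence_graph(tokens, window=2):
    """Link each term to the distinct terms that follow it within `window` terms."""
    neighbors = {t: set() for t in tokens}
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:
                neighbors[term].add(other)
                neighbors[other].add(term)
    return neighbors


def textrank(neighbors, damping=0.85, iterations=50):
    """Iterate the unnormalized TextRank update on an undirected graph."""
    scores = {t: 1.0 for t in neighbors}
    for _ in range(iterations):
        scores = {
            t: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[t])
            for t in neighbors
        }
    return scores
```

For example, `textrank(cooccurrence_graph("a b a c a d".split()))` gives the hub term `a` the largest weight. A dynamic variant could, under the same assumptions, build the graph per sentence or paragraph instead of using a single `window` value.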
Gender-aware re-ranking BIBAFull-Text 1081-1082
  Eugene Kharitonov; Pavel Serdyukov
In this paper we study the usefulness of users' gender information for improving the ranking of ambiguous queries in personalized and non-contextual settings. This study is performed as a sequence of offline re-ranking experiments, and it demonstrates that the proposed gender-aware ranking features provide improvements in ranking quality. It is also shown that the proposed personalized features exhibit performance superior to non-contextual ones.
Genre classification for million song dataset using confidence-based classifiers combination BIBAFull-Text 1083-1084
  Yajie Hu; Mitsunori Ogihara
We propose a method to classify songs in the Million Song Dataset according to song genre. Since songs have several data types, we train sub-classifiers on different types of data. These sub-classifiers are combined using both classifier authority and classification confidence for a particular instance. In the experiments, the combined classifier surpasses all of the sub-classifiers as well as the SVM classifier using concatenated vectors from all data types. Finally, the genre labels for the Million Song Dataset are provided.
GLASE 0.1: eyes tell more than mice BIBAFull-Text 1085-1086
  Viktors Garkavijs; Mayumi Toshima; Noriko Kando
This paper proposes a prototype system called Gaze-Learning-Access-and-Search-Engine 0.1 (GLASE), which can perform image relevance ranking based on gaze data and within-session learning. We developed a search user interface that uses an eye-tracker as an input device and employed a relevance re-ranking algorithm based on the gaze length. The preliminary experimental results showed that using our gaze-driven system reduced the task completion time by an average of 13.7% in a search session.
How query extensions reflect search result abandonments BIBAFull-Text 1087-1088
  Aleksandr Chuklin; Pavel Serdyukov
A high abandonment rate is often considered to correspond to poor IR system performance. However, several studies have suggested that there are so-called good abandonments, i.e., situations in which the search engine result page contains enough detail to satisfy the user's information need without any need to click on search results. In this work we propose to look at query extensions. We see that an extension by itself might indicate, to some degree, the abandonment type (good or bad) for the underlying query. We also propose a way to find potentially good abandonment extensions in an automated manner.
Identifying entity aspects in microblog posts BIBAFull-Text 1089-1090
  Damiano Spina; Edgar Meij; Maarten de Rijke; Andrei Oghina; Minh Thuong Bui; Mathias Breuss
Online reputation management is about monitoring and handling the public image of entities (such as companies) on the Web. An important task in this area is identifying "aspects" of the entity of interest (such as products, services, competitors, key people, etc.) given a stream of microblog posts referring to the entity. In this paper we compare different IR techniques and opinion target identification methods for automatically identifying aspects and find that (i) simple statistical methods such as TF.IDF are a strong baseline for the task, significantly outperforming opinion-oriented methods, and (ii) only considering terms tagged as nouns improves the results for all the methods analyzed.
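A TF.IDF baseline of the kind mentioned in finding (i) might look like the sketch below, which scores each term in the posts about an entity by its frequency there, weighted by how rare it is across all posts. The function name, the use of a separate background sample, and the exact weighting formula are assumptions made for illustration; the paper's own setup may differ.

```python
import math
from collections import Counter


def tfidf_aspects(entity_posts, background_posts, top_n=5):
    """Rank candidate aspect terms for an entity by a simple TF.IDF score.

    Both arguments are lists of tokenized posts (lists of terms).
    """
    tf = Counter(t for post in entity_posts for t in post)
    all_posts = entity_posts + background_posts
    # Document frequency: number of posts containing the term at least once.
    df = Counter(t for post in all_posts for t in set(post))
    n = len(all_posts)
    scores = {t: tf[t] * math.log(n / df[t]) for t in tf}
    # Ties are broken alphabetically to keep the output deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
```

Finding (ii) would correspond to filtering each post down to its noun tokens with a part-of-speech tagger before calling the function.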
Impact of assessor disagreement on ranking performance BIBAFull-Text 1091-1092
  Pavel Metrikov; Virgil Pavlu; Javed A. Aslam
We consider the impact of inter-assessor disagreement on the maximum performance that a ranker can hope to achieve. We demonstrate that even if a ranker were to achieve perfect performance with respect to a given assessor, when evaluated with respect to a different assessor, the measured performance of the ranker decreases significantly. This decrease in performance may largely account for observed limits on the performance of learning-to-rank algorithms.
Incorporating statistical topic information in relevance feedback BIBAFull-Text 1093-1094
  Karla L. Caballero; Ram Akella
Most of the relevance feedback algorithms only use document terms as feedback (local features) in order to update the query and re-rank the documents to show to the user. This approach is limited by the terms of those documents without any global context. We propose to use statistical topic modeling techniques in relevance feedback to incorporate a better estimate of context by including global information about the document. This is particularly helpful for difficult queries where learning the context from the interactions with the user is crucial. We propose to use the topic mixture information obtained to characterize the documents and learn their topics. Then, we rank documents incorporating positive and negative feedback by fitting a latent distribution for each class of documents online and combining all the features using Bayesian Logistic Regression. We show results using the OHSUMED dataset for 3 different variants and obtain higher performance, up to 12.5% in Mean Average Precision (MAP).
Inferring missing relevance judgments from crowd workers via probabilistic matrix factorization BIBAFull-Text 1095-1096
  Hyun Joon Jung; Matthew Lease
In crowdsourced relevance judging, each crowd worker typically judges only a small number of examples, yielding a sparse and imbalanced set of judgments in which relatively few workers influence output consensus labels, particularly with simple consensus methods like majority voting. We show how probabilistic matrix factorization (PMF), a standard approach in collaborative filtering, can be used to infer missing worker judgments such that all workers influence output labels. Given complete worker judgments inferred by PMF, we evaluate impact in unsupervised and supervised scenarios. In the supervised case, we consider both weighted voting and worker selection strategies based on worker accuracy. Experiments on crowd judgments from the 2010 TREC Relevance Feedback Track show the promise of the PMF approach, which merits further investigation and analysis.
Investigating performance predictors using Monte Carlo simulation and score distribution models BIBAFull-Text 1097-1098
  Ronan Cummins
The standard deviation of scores in the top k documents of a ranked list has been shown to be significantly correlated with average precision and has been the basis of a number of query performance predictors. In this paper, we outline two hypotheses that aid in understanding this correlation. Using score distribution (SD) models with known parameters, we create a large number of document rankings using Monte Carlo simulation to test the validity of these hypotheses.
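The simulation setup described here can be sketched as follows: draw scores for relevant and non-relevant documents from two known distributions, sort them into a ranking, and record the standard deviation of the top-k scores alongside the average precision. The choice of two unit-variance Gaussians and every default parameter value are assumptions for illustration, not the score distribution models or settings used in the paper.

```python
import random
from statistics import pstdev


def average_precision(labels):
    """Average precision of a ranked list of binary relevance labels."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def simulate(n_rankings=1000, n_rel=10, n_nonrel=990, k=100, seed=0):
    """Return (std dev of top-k scores, average precision) per simulated ranking."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_rankings):
        # Assumed score model: relevant ~ N(1, 1), non-relevant ~ N(0, 1).
        scored = [(rng.gauss(1.0, 1.0), 1) for _ in range(n_rel)]
        scored += [(rng.gauss(0.0, 1.0), 0) for _ in range(n_nonrel)]
        scored.sort(reverse=True)
        top_scores = [score for score, _ in scored[:k]]
        labels = [rel for _, rel in scored]
        samples.append((pstdev(top_scores), average_precision(labels)))
    return samples
```

Correlating the two columns of the returned samples is one way to probe, under these assumed distributions, how the spread of top-k scores relates to average precision.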
Learning to select a time-aware retrieval model BIBAFull-Text 1099-1100
  Nattiya Kanhabua; Klaus Berberich; Kjetil Nørvåg
Time-aware retrieval models exploit one of two time dimensions, namely, (a) publication time or (b) content time (temporal expressions mentioned in documents). We show that the effectiveness for a temporal query (e.g., illinois earthquake 1968) depends significantly on which time dimension is factored into ranking results. Motivated by this, we propose a machine learning approach to select the most suitable time-aware retrieval model for a given temporal query. Our method uses three classes of features obtained from analyzing distributions over two time dimensions, a distribution over terms, and retrieval scores within top-k result documents. Experiments on real-world data with crowdsourced relevance assessments show the potential of our approach.
Learning-based time-sensitive re-ranking for web search BIBAFull-Text 1101-1102
  Po-Tzu Chang; Yen-Chieh Huang; Cheng-Lun Yang; Shou-De Lin; Pu-Jen Cheng
To model time-dependent user intent for Web search, this paper proposes a novel method using machine learning techniques to exploit temporal features for effective time-sensitive search result re-ranking. We propose models to incorporate users' click-through information for queries that are seen in the training data, and then further extend the model to deal with unseen queries by considering the relationship between queries. Experiments show significant improvement in search result ranking over the original search outputs.
Lightweight contrastive summarization for news comment mining BIBAFull-Text 1103-1104
  Gobaan Raveendran; Charles L. A. Clarke
We develop and discuss a news comment miner that presents distinct viewpoints on a given theme or event. Given a query, the system uses metasearch techniques to find relevant news articles. Relevant articles are then scraped for both article content and comments. Snippets from the comments are sampled and presented to the user, based on theme popularity and contrastiveness to previously selected snippets. The system design focuses on being quicker and more lightweight than recent topic modelling approaches, while still focusing on selecting orthogonal snippets.
Looking inside the box: context-sensitive translation for cross-language information retrieval BIBAFull-Text 1105-1106
  Ferhan Ture; Jimmy Lin; Douglas W. Oard
Cross-language information retrieval (CLIR) today is dominated by techniques that use token-to-token mappings from bilingual dictionaries. Yet, state-of-the-art statistical translation models (e.g., using Synchronous Context-Free Grammars) are far richer, capturing multi-term phrases, term dependencies, and contextual constraints on translation choice. We present a novel CLIR framework that is able to reach inside the translation "black box" and exploit these sources of evidence. Experiments on the TREC-5/6 English-Chinese test collection show this approach to be promising.
Making results fit into 40 characters: a study in document rewriting BIBAFull-Text 1107-1108
  Johannes Leveling; Gareth J. F. Jones
With the increasing popularity of mobile and hand-held devices, automatic approaches for adapting results to the limited screen size of mobile devices are becoming more important. Traditional approaches for reducing the length of textual results include summarization and snippet extraction. In this study, we investigate document rewriting techniques which retain the meaning and readability of the original text. Evaluations on different document sets show that i) rewriting documents considerably reduces document length and thus, scrolling effort on devices with limited screen size, and ii) the rewritten documents have a higher readability.
New assessment criteria for query suggestion BIBAFull-Text 1109-1110
  Zhongrui Ma; Yu Chen; Ruihua Song; Tetsuya Sakai; Jiaheng Lu; Ji-Rong Wen
Query suggestion is a useful tool to help users express their information needs by supplying alternative queries. When evaluating the effectiveness of query suggestion algorithms, many previous studies focus on measuring whether a suggestion query is relevant or not to the input query. This assessment criterion is too simple to describe users' requirements. In this paper, we introduce two scenarios of query suggestion. The first scenario represents cases where the search result of the input query is unsatisfactory. The second scenario represents cases where the search result is satisfactory but the user may be looking for alternative solutions. Based on the two scenarios, we propose two assessment criteria. Our labeling results indicate that the new assessment criteria provide finer distinctions among query suggestions than the traditional relevance-based criterion.
On automatically tagging web documents from examples BIBAFull-Text 1111-1112
  Nicholas Joel Woodward; Weijia Xu; Kent Norsworthy
An emerging need in information retrieval is to identify a set of documents conforming to an abstract description. This task presents two major challenges to existing methods of document retrieval and classification. First, similarity based on overall content is less effective because there may be great variance in both content and subject of documents produced for similar functions, e.g. a presidential speech or a government ministry white paper. Second, the function of the document can be defined based on user interests or the specific data set through a set of existing examples, which cannot be described with standard categories. Additionally, the increasing volume and complexity of document collections demand new scalable computational solutions. We conducted a case study using web-archived data from the Latin American Government Documents Archive (LAGDA) to illustrate these problems and challenges. We propose a new hybrid approach based on Naïve Bayes inference that uses mixed n-gram models obtained from a training set to classify documents in the corpus. The approach has been developed to exploit parallel processing for large-scale data sets. The preliminary work shows promising results with improved accuracy for this type of retrieval problem.
On building a reusable Twitter corpus BIBAFull-Text 1113-1114
  Richard McCreadie; Ian Soboroff; Jimmy Lin; Craig Macdonald; Iadh Ounis; Dean McCullough
The Twitter real-time information network is the subject of research for information retrieval tasks such as real-time search. However, so far, reproducible experimentation on Twitter data has been impeded by restrictions imposed by the Twitter terms of service. In this paper, we detail a new methodology for legally building and distributing Twitter corpora, developed through collaboration between the Text REtrieval Conference (TREC) and Twitter. In particular, we detail how the first publicly available Twitter corpus -- referred to as Tweets2011 -- was distributed via lists of tweet identifiers and specialist tweet crawling software. Furthermore, we analyse whether this distribution approach remains robust over time, as tweets in the corpus are removed either by users or Twitter itself. Tweets2011 was successfully used by 58 participating groups for the TREC 2011 Microblog track, while our results attest to the robustness of the crawling methodology over time.
On judgments obtained from a commercial search engine BIBAFull-Text 1115-1116
  Emine Yilmaz; Gabriella Kazai; Nick Craswell; Saied Mehrizi Tahaghoghi
In information retrieval, relevance judgments play an important role as they are required both for evaluating the quality of retrieval systems and for training learning to rank algorithms. In recent years, numerous papers have been published using judgments obtained from a commercial search engine by researchers in industry. As typically no information is provided about the quality of these judgments, their reliability for evaluating/training retrieval systems remains questionable. In this paper, we analyze the reliability of such judgments for evaluating the quality of retrieval systems by comparing them to judgments by NIST judges at TREC.
On the mathematical relationship between expected n-call@k and the relevance vs. diversity trade-off BIBAFull-Text 1117-1118
  Kar Wai Lim; Scott Sanner; Shengbo Guo
It has been previously noted that optimization of the n-call@k relevance objective (i.e., a set-based objective that is 1 if at least n documents in a set of k are relevant, otherwise 0) encourages more result set diversification for smaller n, but this statement has never been formally quantified. In this work, we explicitly derive the mathematical relationship between expected n-call@k and the relevance vs. diversity trade-off -- through fortuitous cancellations in the resulting combinatorial optimization, we show the trade-off is a simple and intuitive function of n (notably independent of the result set size k ≥ n), where diversification increases as n approaches 1.
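The set-based objective defined in this abstract can be written down directly. The following is a minimal sketch of n-call@k for a single ranked list, assuming binary relevance labels; the function name and the example ranking are illustrative and do not come from the paper.

```python
def n_call_at_k(relevance, n, k):
    """Return 1 if at least n of the top-k results are relevant, otherwise 0.

    `relevance` is a ranked list of binary labels (truthy = relevant).
    """
    if not 1 <= n <= k:
        raise ValueError("require 1 <= n <= k")
    relevant_in_top_k = sum(1 for label in relevance[:k] if label)
    return 1 if relevant_in_top_k >= n else 0


# Hypothetical ranking with relevant documents at ranks 2 and 4.
ranking = [0, 1, 0, 1, 0]
one_call = n_call_at_k(ranking, n=1, k=5)    # one relevant document suffices
three_call = n_call_at_k(ranking, n=3, k=5)  # needs three, only two are present
```

With n = 1 a single relevant document anywhere in the top k satisfies the objective, which is the setting the abstract associates with the most diversification.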
On real-time ad-hoc retrieval evaluation BIBAFull-Text 1119-1120
  Stephen E. Robertson; Evangelos Kanoulas
Lab-based evaluations typically assess the quality of a retrieval system with respect to its ability to retrieve documents that are relevant to the information need of an end user. In a real-time search task, however, users wish to retrieve not only the most relevant items but also the most recent. The current evaluation framework is not adequate to assess the ability of a system to retrieve both recent and relevant items, and the one proposed in the recent TREC Microblog Track has certain flaws that quickly became apparent to the organizers. In this poster, we redefine the experiment for a real-time ad-hoc search task, by setting new submission requirements for the submitted systems/runs, proposing metrics to be used in evaluating the submissions, and suggesting a pooling strategy to be used to gather relevance judgments towards the computation of the described metrics. The proposed task can indeed assess the quality of a retrieval system with regard to retrieving both relevant and timely information.
Opinion summarisation through sentence extraction: an investigation with movie reviews BIBAFull-Text 1121-1122
  Marco Bonzanini; Miguel Martinez-Alvarez; Thomas Roelleke
In on-line reviews, authors often use a short passage to describe the overall feeling about a product or a service. A review as a whole can mention many details not in line with the overall feeling, so capturing this key passage is important to understand the overall sentiment of the review. This paper investigates the use of extractive summarisation in the context of sentiment classification. The aim is to find the summary sentence, or the short passage, which gives the overall sentiment of the review, filtering out potential noisy information. Experiments on a movie review data-set show that subjectivity detection plays a central role in building summaries for sentiment classification. Subjective extracts carry the same polarity of the full text reviews, while statistical and positional approaches are not able to capture this aspect.
Optimizing parameters of the expected reciprocal rank BIBAFull-Text 1123-1124
  Yury Logachev; Pavel Serdyukov
Most popular IR metrics are parameterized. Usually the parameters of these metrics are chosen on the basis of general considerations and not adjusted by experiments with real users. In particular, the parameters of the Expected Reciprocal Rank measure are the normalized parameters of the DCG metric, and the latter are chosen in an ad-hoc manner. We suggest an approach for adjusting the parameters of the ERR metric that maximizes agreement with real user behavior. More precisely, we optimized the parameters by maximizing the weighted Pearson correlation between ERR and several online click metrics. For each click metric we managed to find parameters of ERR that result in a higher correlation with the given online click metric.
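For orientation, here is a minimal sketch of how ERR is usually computed, with the per-grade stopping probabilities exposed as the parameters that an approach like this one would tune. The default mapping (2^g - 1) / 2^g_max is the conventional DCG-derived choice, not the tuned values from the paper; the function names are illustrative.

```python
def err(stop_probs):
    """Expected Reciprocal Rank for a ranked list.

    `stop_probs[i]` is the probability that the user is satisfied by the
    result at rank i + 1 and stops there.
    """
    score = 0.0
    p_reach = 1.0  # probability that the user reaches the current rank
    for rank, p_stop in enumerate(stop_probs, start=1):
        score += p_reach * p_stop / rank
        p_reach *= 1.0 - p_stop
    return score


def default_stop_probs(grades, max_grade):
    """Conventional grade-to-probability mapping: (2^g - 1) / 2^max_grade."""
    return [(2 ** g - 1) / 2 ** max_grade for g in grades]


# Hypothetical graded judgments for a three-result ranking, grades 0..2.
example_err = err(default_stop_probs([2, 0, 1], max_grade=2))
```

Replacing `default_stop_probs` with a table of per-grade probabilities fitted against click metrics would be one way to realise the tuning described in the abstract.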
Ousting ivory tower research: towards a web framework for providing experiments as a service BIBAFull-Text 1125-1126
  Tim Gollub; Benno Stein; Steven Burrows
With its close ties to the Web, the IR community is destined to leverage the dissemination and collaboration capabilities that the Web provides today. Especially with the advent of the software as a service principle, an IR community is conceivable that publishes experiments executable by anyone over the Web. A review of recent SIGIR papers shows that we are far away from this vision of collaboration. The benefits of publishing IR experiments as a service are striking for the community as a whole, and include potential to boost research profiles and reputation. However, the additional work must be kept to a minimum and sensitive data must be kept private for this paradigm to become an accepted practice. To foster experiments as a service in IR, we present a Web framework for experiments that addresses the outlined challenges and possesses a unique set of compelling features in comparison to existing solutions. We also describe how our reference implementation is already used officially as an evaluation platform for an established international plagiarism detection competition.
Parallelizing ListNet training using spark BIBAFull-Text 1127-1128
  Shilpa Shukla; Matthew Lease; Ambuj Tewari
As ever-larger training sets for learning to rank are created, scalability of learning has become increasingly important to achieving continuing improvements in ranking accuracy. Exploiting independence of "summation form" computations, we show how each iteration in ListNet gradient descent can benefit from parallel execution. We seek to draw the attention of the IR community to use Spark, a newly introduced distributed cluster computing system, for reducing training time of iterative learning to rank algorithms. Unlike MapReduce, Spark is especially suited for iterative and interactive algorithms. Our results show near linear reduction in ListNet training time using Spark on Amazon EC2 clusters.
Predicting lifespans of popular tweets in microblog BIBAFull-Text 1129-1130
  Shoubin Kong; Ling Feng; Guozheng Sun; Kan Luo
In microblogs such as Twitter, popular tweets are usually retweeted by many users. For different tweets, their lifespans (i.e., how long they will stay popular) vary. This paper presents a simple yet effective approach to predict the lifespans of popular tweets based on their static characteristics and dynamic retweeting patterns. For a potentially popular tweet, we generate a time series based on its first-hour retweeting information, and compare it with those of historic tweets of the same author and post time (at the granularity of an hour). The top-k historic tweets are identified, and their mean lifespan is used as the estimated lifespan of the new tweet. Our experiments on a three-month real data set from Tencent Microblog demonstrate the effectiveness of the approach.
Preliminary study of technical terminology for the retrieval of scientific book metadata records BIBAFull-Text 1131-1132
  Birger Larsen; Christina Lioma; Ingo Frommholz; Hinrich Schütze
Books only represented by brief metadata (book records) are particularly hard to retrieve. One way of improving their retrieval is by extracting retrieval enhancing features from them. This work focusses on scientific (physics) book records. We ask if their technical terminology can be used as a retrieval enhancing feature. A study of 18,443 book records shows a strong correlation between their technical terminology and their likelihood of relevance. Using this finding for retrieval yields >+5% precision and recall gains.
Queries without clicks: evaluating retrieval effectiveness based on user feedback BIBAFull-Text 1133-1134
  Athanasia Koumpouri; Vasiliki Simaki
Until recently, the lack of user activity on search results was perceived as a sign of user dissatisfaction with retrieval performance. However, recent studies have reported that some queries might not be followed by clicks on the retrieved results, because the search task can be satisfied by the list of retrieved results the user views, without the need to click through them. In this paper, we propose a method for evaluating user satisfaction with the results of searches that are not followed by clickthrough activity on the retrieved results. We found that there is a strong association between some implicit measures of user activity and users' explicit satisfaction judgments. Moreover, we developed a predictive model of user satisfaction based on implicit measures, achieving accuracy up to 86%.
Retrieval evaluation on focused tasks BIBAFull-Text 1135-1136
  Besnik Fetahu; Ralf Schenkel
Ranking of retrieval systems for focused tasks requires a large number of relevance judgments. We propose an approach that minimizes the number of relevance judgments, where the performance measures are approximated using a Monte-Carlo sampling technique. Partial measures are taken using relevance judgments, whereas the remaining passages are annotated using a generated relevance probability distribution based on result rank. We define two conditions for stopping the assessment procedure when the ranking between systems is stable.
Rewarding term location information to enhance probabilistic information retrieval BIBAFull-Text 1137-1138
  Jiashu Zhao; Jimmy Xiangji Huang; Shicheng Wu
We investigate the effect of rewarding terms according to their locations in documents for probabilistic information retrieval. The intuition behind our approach is that many authors summarize their ideas in particular parts of documents. In this paper, we focus on the beginning part of documents. Several shape functions are defined to simulate the influence of term location information. We propose a Reward Term Retrieval model that combines the reward terms' information with BM25 to enhance probabilistic information retrieval performance.
Scheduling queries across replicas BIBAFull-Text 1139-1140
  Ana Freire; Craig Macdonald; Nicola Tonellotto; Iadh Ounis; Fidel Cacheda
For increased efficiency, an information retrieval system can split its index into multiple shards, and then replicate these shards across many query servers. For each new query, an appropriate replica for each shard must be selected, such that the query is answered as quickly as possible. Typically, the replica with the lowest number of queued queries is selected. However, not every query takes the same time to execute, particularly if a dynamic pruning strategy is applied by each query server. Hence, the replica's queue length is an inaccurate indicator of the workload of a replica, and can result in inefficient usage of the replicas. In this work, we propose that improved replica selection can be obtained by using query efficiency prediction to measure the expected workload of a replica. Experiments are conducted using 2.2k queries, over various numbers of shards and replicas for the large GOV2 collection. Our results show that query waiting and completion times can be markedly reduced, showing that accurate response time predictions can improve scheduling accuracy and attesting the benefit of the proposed scheduling algorithm.
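The contrast between the two selection policies in this abstract can be illustrated with a small sketch. It assumes each replica's queue is represented as a list of predicted execution times in milliseconds; that representation, the function names, and the numbers are hypothetical and are not taken from the paper.

```python
def pick_by_queue_length(queues):
    """Baseline policy: choose the replica with the fewest queued queries."""
    return min(range(len(queues)), key=lambda i: len(queues[i]))


def pick_by_predicted_work(queues):
    """Prediction-based policy: choose the replica with the least predicted work."""
    return min(range(len(queues)), key=lambda i: sum(queues[i]))


# Hypothetical state: replica 0 holds one slow query, replica 1 two fast ones.
queues = [[900.0], [50.0, 60.0]]
by_length = pick_by_queue_length(queues)   # picks replica 0 (shorter queue)
by_work = pick_by_predicted_work(queues)   # picks replica 1 (less total work)
```

The example shows why queue length alone can mislead: the shorter queue here carries far more predicted work than the longer one.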
Re-examining search result snippet examination time for relevance estimation BIBAFull-Text 1141-1142
  Dmitry Lagun; Eugene Agichtein
Previous studies of web search result examination have provided valuable insights in understanding and modelling searcher behavior. Yet, recent work (e.g., [3]) has been developed based on the assumption that the time a searcher spends examining a particular result abstract or snippet, correlates with result relevance. While this idea is intuitively attractive, to the best of our knowledge it has not been empirically tested. This poster investigates this hypothesis empirically, in a controlled setting, using eye tracking equipment to compare search result examination time with result relevance. Interestingly, while we replicate previous findings showing examination time to be indicative of whole-page relevance, we find that viewing time of individual results alone is a poor indicator of either absolute result relevance or even of pairwise preferences. Our results should not be taken as negating the usefulness of modeling searcher examination behavior, but rather to emphasize that snippet examination time is not in itself a good indicator of relevance.
Sentiment identification by incorporating syntax, semantics and context information BIBAFull-Text 1143-1144
  Kunpeng Zhang; Yusheng Xie; Yu Cheng; Daniel Honbo; Doug Downey; Ankit Agrawal; Wei-keng Liao; Alok Choudhary
This paper proposes a method based on conditional random fields to incorporate sentence structure (syntax and semantics) and context information to identify sentiments of sentences within a document. It also proposes and evaluates two different active learning strategies for labeling sentiment data. The experiments with the proposed approach demonstrate a 5-15% improvement in accuracy on Amazon customer reviews compared to existing supervised learning and rule-based methods.
Short text classification using very few words BIBAFull-Text 1145-1146
  Aixin Sun
We propose a simple, scalable, and non-parametric approach for short text classification. Leveraging the well-studied and scalable Information Retrieval (IR) framework, our approach mimics the human labeling process for a piece of short text. It first selects the most representative and topically indicative words from a given short text as query words, and then searches for a small set of labeled short texts best matching the query words. The predicted category label is the majority vote of the search results. Evaluated on a collection of more than 12K Web snippets, the proposed approach achieves classification accuracy comparable to the baseline Maximum Entropy classifier using as few as 3 query words and the top-5 best matching search hits. Among the four query word selection schemes proposed and evaluated in our experiments, term frequency together with clarity gives the best classification accuracy.
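The select, search, and vote pipeline described here can be sketched in a few lines. This is a simplified illustration under stated assumptions: query words are chosen by raw term frequency only (the paper's best scheme also uses clarity), and retrieval is a plain word-overlap count rather than a real IR engine. All names and the toy data are hypothetical.

```python
from collections import Counter


def select_query_words(text, num_words=3):
    """Pick the most frequent words of the short text as query words."""
    counts = Counter(text.lower().split())
    return [word for word, _ in counts.most_common(num_words)]


def classify(text, labeled_texts, num_words=3, top_k=5):
    """Label `text` by majority vote over the top-k best matching labeled texts."""
    query = set(select_query_words(text, num_words))

    def overlap(item):
        return len(query & set(item[0].lower().split()))

    ranked = sorted(labeled_texts, key=overlap, reverse=True)
    # Keep only hits that share at least one query word.
    votes = [label for doc, label in ranked[:top_k] if query & set(doc.lower().split())]
    if not votes:
        return None
    return Counter(votes).most_common(1)[0][0]


# Toy labeled collection standing in for the indexed short texts.
labeled = [
    ("cheap flights to paris", "travel"),
    ("paris hotel deals", "travel"),
    ("python list sort", "computing"),
    ("sort algorithm python code", "computing"),
]
predicted = classify("python sort python", labeled, top_k=3)
```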
Summarizing the differences from microblogs BIBAFull-Text 1147-1148
  Dingding Wang; Mitsunori Ogihara; Tao Li
With the rapid growth of social media websites, microblogging has become a popular way to spread instant news and events. Due to the dynamic and social nature of microblogs, extracting useful information from microblogs is more challenging than from the traditional news articles. In this paper we study the problem of summarizing the differences from microblogs. Given a collection of microblogs discussing an event/topic, we propose to generate a short summary delivering the differences among these microblogs, such as the different points of view for a news topic and the changes and evolution of an ongoing event.
Survival analysis of click logs BIBAFull-Text 1149-1150
  Si-Chi Chin; W. Nick Street
Click logs from search engines provide a rich opportunity to acquire implicit feedback from users. Patterns derived from the time between a posted query and a click provide information on the ranking quality, reflecting the perceived relevance of a retrieved URL. This paper applies the Kaplan-Meier estimator to study click patterns. The visualization of click curves demonstrates the interaction between the relevance and the rank position of URLs. The observed results demonstrate the potential of using click curves to predict the quality of the top-ranked results.
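As background for the estimator named in this abstract, the following is a minimal from-scratch sketch of Kaplan-Meier applied to time-to-click data. It assumes each query impression has a duration (seconds until the first click, or until observation ended) and a flag saying whether a click was observed; unclicked impressions are treated as censored. The variable names and data are illustrative, not from the paper.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival)] at each time where at least one event occurred.

    survival(t) is the estimated probability of no click up to and including t.
    """
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    leaving = Counter(durations)  # events plus censored cases at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical log: seconds to first click; False marks an impression with no click.
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
click_curve = kaplan_meier(durations, observed)
```

Plotting one such curve per rank position or relevance grade would give click curves of the kind the abstract compares.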
Text selections as implicit relevance feedback BIBAFull-Text 1151-1152
  Ryen W. White; Georg Buscher
Users' search activity has been used as implicit feedback to model search interests and improve the performance of search systems. In search engines, this behavior usually takes the form of queries and result clicks. However, richer data on how people engage with search results can now be captured at scale, creating new opportunities to enhance search. In this poster we focus on one type of newly-observable behavior: text selection events on search-result captions. We show that we can use text selections as implicit feedback to significantly improve search result relevance.
Time to judge relevance as an indicator of assessor error BIBAFull-Text 1153-1154
  Mark D. Smucker; Chandra Prakash Jethani
When human assessors judge documents for their relevance to a search topic, it is possible for errors in judging to occur. As part of the analysis of the data collected from a 48 participant user study, we have discovered that when the participants made relevance judgments, the average participant spent more time to make errorful judgments than to make correct judgments. Thus, in relevance assessing scenarios similar to our user study, it may be possible to use the time taken to judge a document as an indicator of assessor error. Such an indicator could be used to identify documents that are candidates for adjudication or reassessment.
Towards alias detection without string similarity: an active learning based approach BIBAFull-Text 1155-1156
  Lili Jiang; Jianyong Wang; Ping Luo; Ning An; Min Wang
Entity aliases commonly exist and accurately detecting these aliases plays a vital role in various applications. In this paper, we use an active-learning-based method to detect aliases without string similarity. To minimize the cost of pairwise comparison, a subset-based method restricts the alias selection to a small-scale entity set. Within each generated entity set, an active-learning-based logistic regression classifier is employed to predict whether a candidate is the alias of a given entity. The experimental results on three datasets clearly demonstrate that our proposed approach can effectively detect this kind of entity alias.
Towards zero-click mobile IR evaluation: knowing what and knowing when BIBAFull-Text 1157-1158
  Tetsuya Sakai
In this poster, we propose two evaluation tasks for mobile information access. The first task evaluates the system's ability to guess what the user's query should be given a context ("Knowing What"). The second task evaluates the system's ability to decide when to proactively deploy a given query ("Knowing When"). We conduct a preliminary manual analysis of a mobile query log to limit the space of possible queries so as to design feasible and practical evaluation tasks.
Twanchor text: a preliminary study of the value of tweets as anchor text BIBAFull-Text 1159-1160
  Gilad Mishne; Jimmy Lin
It is well known that anchor text plays an important role in search, providing signals that are often not present in the source document itself. The paper reports results of a preliminary investigation on the value of tweets and tweet conversations as anchor text. We show that using tweets as anchors improves significantly over using HTML anchors, and significantly increases recall of news item retrieval.
Unsupervised linear score normalization revisited BIBAFull-Text 1161-1162
  Ilya Markov; Avi Arampatzis; Fabio Crestani
We take a fresh look at score normalization for merging result-lists, isolating the problem from other components. We focus on three of the simplest, practical, and widely-used linear methods which do not require any training data, i.e. MinMax, Sum, and Z-Score. We provide theoretical arguments on why and when the methods work, and evaluate them experimentally. We find that MinMax is the most robust under many circumstances, and that Sum is -- in contrast to previous literature -- the worst. Based on the insights gained, we propose another three simple methods which work as well as or better than the baselines.
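The three baseline methods named in this abstract are commonly defined as below. This is a sketch of those textbook definitions; the poster may use slightly different variants (for example, some formulations of Sum divide by the raw sum without shifting the minimum to zero first), so the details should be treated as assumptions.

```python
from statistics import mean, pstdev


def min_max(scores):
    """Shift the minimum to 0 and scale the maximum to 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def sum_norm(scores):
    """Shift the minimum to 0 and scale so that the scores sum to 1."""
    lo = min(scores)
    shifted = [s - lo for s in scores]
    total = sum(shifted)
    if total == 0:
        return [0.0 for _ in scores]
    return [s / total for s in shifted]


def z_score(scores):
    """Shift the mean to 0 and scale to unit (population) standard deviation."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


# Hypothetical retrieval scores from one result list.
raw = [2.0, 4.0, 6.0]
normalized = {"minmax": min_max(raw), "sum": sum_norm(raw), "zscore": z_score(raw)}
```

All three are linear transformations, so they preserve the ranking within a list and only change how scores from different lists compare after merging.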
User-aware caching and prefetching query results in web search engines BIBAFull-Text 1163-1164
  Hongyuan Ma; Bin Wang
Query results caching is an efficient technique for Web search engines. In this paper we present User-Aware Cache, a novel approach tailored for query results caching, that is based on user characteristics. We then use a trace of around 30 million queries to evaluate User-Aware Cache, as well as traditional methods and theoretical upper bounds. Experimental results show that this approach can achieve hit ratios better than state-of-the-art methods.
Using eye-tracking with dynamic areas of interest for analyzing interactive information retrieval BIBAFull-Text 1165-1166
  Vu Tuan Tran; Norbert Fuhr
Based on a new framework for capturing dynamic areas of interest in eye-tracking, we model the user search process as a Markov-chain. The analysis indicates possible system improvements and yields parameter estimates for the Interactive Probability Ranking Principle (IPRP).
Using PageRank to infer user preferences BIBAFull-Text 1167-1168
  Praveen Chandar; Ben Carterette
Recently, researchers have shown interest in the use of preference judgments for evaluation in the IR literature. Although preference judgments have several advantages over absolute judgments, one of the major disadvantages is that the number of judgments needed increases polynomially as the number of documents in the pool increases. We propose a novel method using PageRank to minimize the number of judgments required to evaluate systems using preference judgments. We test the proposed hypotheses using the TREC 2004 to 2006 Terabyte dataset to show that it is possible to reduce the evaluation cost considerably. Further, we study the susceptibility of the methods to assessor errors.
Utilizing inter-document similarities in federated search BIBAFull-Text 1169-1170
  Savva Khalaman; Oren Kurland
We demonstrate the merits of using inter-document similarities for federated search. Specifically, we study a results merging method that utilizes information induced from clusters of similar documents created across the lists retrieved from the collections. The method significantly outperforms state-of-the-art results merging approaches.
Want a coffee?: predicting users' trails BIBAFull-Text 1171-1172
  Wen Li; Carsten Eickhoff; Arjen P. de Vries
Twitter and Foursquare are two well-connected platforms for sharing information where growing numbers of users post location-related messages. In contrast to the longitude-latitude geotags commonly used online, e.g., on photos and tweets, new place-tags containing category information show more human-readable high-level information rather than a pair of coordinates. This grants an opportunity for better understanding users' physical locations which can be used as context to facilitate other applications, e.g., location context-aware advertisement. In this paper, we verify the assumption that users' current trails contain cues of their future routes. The results from the preliminary experiments show promising performance of a basic Markov Chain-based model.
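A basic first-order Markov chain over place categories, of the general kind mentioned in this abstract, can be sketched as follows. The category names, the trails, and the choice to predict the single most frequent successor are illustrative assumptions rather than details from the paper.

```python
from collections import Counter, defaultdict


def fit_transitions(trails):
    """Count category-to-category transitions over all observed trails."""
    counts = defaultdict(Counter)
    for trail in trails:
        for current, following in zip(trail, trail[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, current):
    """Return the most frequently observed successor of `current`, or None."""
    if current not in counts:
        return None
    return counts[current].most_common(1)[0][0]


# Hypothetical check-in trails expressed as sequences of place categories.
trails = [
    ["home", "cafe", "office"],
    ["home", "cafe", "gym"],
    ["gym", "cafe", "office"],
]
model = fit_transitions(trails)
next_after_cafe = predict_next(model, "cafe")
```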
Will this #hashtag be popular tomorrow? BIBAFull-Text 1173-1174
  Zongyang Ma; Aixin Sun; Gao Cong
Hashtags are widely used in Twitter to define a shared context for events or topics. In this paper, we aim to predict hashtag popularity in the near future (i.e., the next day). Given a hashtag that has the potential to be popular in the next day, we construct a hashtag profile using the tweets containing the hashtag, and extract both content and context features for hashtag popularity prediction. We model this prediction problem as a classification problem and evaluate the effectiveness of the extracted features and classification models.
$100,000 prize jackpot. call now!: identifying the pertinent features of SMS spam BIBAFull-Text 1175-1176
  Henry Tan; Nazli Goharian; Micah Sherr
Mobile SMS spam is on the rise and is a prevalent problem. While recent work has shown that simple machine learning techniques can distinguish between ham and spam with high accuracy, this paper explores the individual contributions of various textual features in the classification process. Our results reveal the surprising finding that simple is better: using the largest spam corpus of which we are aware, we find that using simple textual features is sufficient to provide accuracy that is nearly identical to that achieved by the best known techniques, while achieving a twofold speedup.

Tutorial presentations

Beyond bag-of-words: machine learning for query-document matching in web search BIBFull-Text 1177
  Hang Li; Jun Xu
Methods for mining and summarizing text conversations BIBAFull-Text 1178-1179
  Giuseppe Carenini; Gabriel Murray
More and more today, people are engaging in conversations via email, blogs, discussion forums, text messaging and other social media. A person may want to archive these conversations and later retrieve information about what was discussed, or analyze a conversation in real-time. What topics are covered in these conversations? What opinions are people expressing? Have any decisions been made? Have action items been assigned? This tutorial will present various natural language processing (NLP) techniques that can help answer these questions, thus creating numerous new and valuable applications that can support people in participating more effectively in these conversations. The tutorial is based on a book that we have recently published, Methods for Mining and Summarizing Text Conversations.
Crowdsourcing for search evaluation and social-algorithmic search BIBAFull-Text 1180
  Matthew Lease; Omar Alonso
The first computers were people. Today, Internet-based access to 24/7 online human crowds has led to a renaissance of research in human computation and the advent of crowdsourcing. These new opportunities have brought a disruptive shift to research and practice for how we build intelligent systems today. Not only can labeled data for training and evaluation be collected faster, cheaper, and easier than ever before, but we now see human computation being integrated into the systems themselves, operating in concert with automation. This tutorial introduces opportunities and challenges of human computation and crowdsourcing, particularly for search evaluation and developing hybrid search solutions that integrate human computation with traditional forms of automated search. We review methodology and findings of recent research and survey current generation crowdsourcing platforms now available, analyzing methods, potential, and limitations across platforms.
(Big) usage data in web search BIBFull-Text 1181-1182
  Ricardo Baeza-Yates; Yoelle Maarek
A new look at old tricks: the fertile roots of current research BIBFull-Text 1183
  Paul Kantor
Aspect-based opinion mining from product reviews BIBAFull-Text 1184
  Samaneh Moghaddam; Martin Ester
"What other people think" has always been an important piece of information for most of us during the decision-making process. Today people tend to make their opinions available to other people via the Internet. As a result, the Web has become an excellent source of consumer opinions. There are now numerous Web resources containing such opinions, e.g., product review forums, discussion groups, and blogs. However, it is difficult for a customer to read all of the reviews and make an informed decision on whether to purchase the product. It is also difficult for the manufacturer of the product to keep track of and manage customer opinions. Moreover, user ratings (stars) alone are not a sufficient source of information for a user or the manufacturer to make decisions. Therefore, mining online reviews (opinion mining) has emerged as an interesting new research direction. Extracting aspects and the corresponding ratings is an important challenge in opinion mining. An aspect is an attribute or component of a product, e.g. 'zoom' for a digital camera. A rating is an intended interpretation of the user satisfaction in terms of numerical values. Reviewers usually express the rating of an aspect by a set of sentiments, e.g. 'great zoom'. In this tutorial we cover opinion mining in online product reviews with the focus on aspect-based opinion mining. This problem is a key task in the area of opinion mining and has recently attracted many researchers in the information retrieval community. Several opinion-related information retrieval tasks can benefit from the results of aspect-based opinion mining, and it is therefore considered a fundamental problem. This tutorial covers not only general opinion mining and retrieval tasks, but also state-of-the-art methods, challenges, applications, and future research directions of aspect-based opinion mining.
Experimental methods for information retrieval BIBFull-Text 1185-1186
  Donald Metzler; Oren Kurland
IR models: foundations and relationships BIBAFull-Text 1187-1188
  Thomas Roelleke
In IR research it is essential to know IR models. Research over the past years has consolidated the foundations of IR models. Moreover, relationships have been reported that help to use and position IR models. Knowing about the foundations and relationships of IR models can significantly improve building information management systems.
   The first part of this tutorial presents an in-depth consolidation of the foundations of the main IR models (TF-IDF, BM25, LM). Particular attention will be given to notation and probabilistic roots. The second part crystallises the relationships between models. Does LM embody IDF? How "heuristic" is TF-IDF? What are the probabilistic roots? How are LM and the probability of relevance related? What are the components shared by the main IR models?
   After the tutorial, attendees will be familiar with a consolidated view of IR models. The tutorial will be illustrative and interactive, providing opportunities to discuss controversial issues and research challenges.
Patent information retrieval: an instance of domain-specific search BIBAFull-Text 1189-1190
  Mihai Lupu
The tutorial aims to provide IR researchers with an understanding of how the patent system works and of the challenges that patent searchers face in using the existing tools and in adopting new methods developed in academia.
   At the same time, the tutorial will inform the IR researcher about the unique opportunities that the patent domain provides: a large amount of multi-lingual and multi-modal documents, the widest possible span of covered domains, a highly annotated corpus and, very importantly, relevance judgements created by experts in the fields and recorded electronically in the documents.
   The combination of these two objectives leads to the main purpose of the tutorial: to create awareness and to encourage more emphasis on the patent domain in the IR community. Table 1 provides details on how the tutorial covers the topics of the SIGIR conference.
Medical information retrieval: an instance of domain-specific search BIBAFull-Text 1191-1192
  Allan Hanbury
Due to an explosion in the amount of medical information available, search techniques are gaining importance in the medical domain. This tutorial discusses recent results on search in the medical domain, including the outcome of surveys on end user requirements, research relevant to the field, and current medical and health search applications available. Finally, the extent to which available techniques meet user requirements is discussed, and open challenges in the field are identified.
Visual information retrieval using Java and LIRE BIBAFull-Text 1193
  Oge Marques; Mathias Lux
Visual information retrieval (VIR) is an active and vibrant research area that attempts to provide means for organizing, indexing, annotating, and retrieving visual information (images and videos) from large, unstructured repositories. The goal of VIR is to retrieve the highest number of relevant matches to a given query (often expressed as an example image and/or a series of keywords). In its early years (1995-2000) the research efforts were dominated by content-based approaches contributed primarily by the image and video processing community. During the past decade, it was widely recognized that the challenges imposed by the semantic gap (the lack of coincidence between an image's visual contents and its semantic interpretation) required a clever use of textual metadata (in addition to information extracted from the image's pixel contents) to make image and video retrieval solutions efficient and effective. The need to bridge (or at least narrow) the semantic gap has been one of the driving forces behind current VIR research. Additionally, other related research problems and market opportunities have started to emerge, offering a broad range of exciting problems for computer scientists and engineers to work on. In this tutorial, we present an overview of visual information retrieval (VIR) concepts, techniques, algorithms, and applications. Several topics are supported by examples written in Java, using Lucene (an open-source Java-based indexing and search implementation) and LIRE (Lucene Image REtrieval), an open-source Java-based library for content-based image retrieval (CBIR) written by Mathias Lux.
   After motivating the topic, we briefly review the fundamentals of information retrieval, present the most relevant and effective visual descriptors currently used in VIR, the most common indexing approaches for visual descriptors, the most prominent machine learning techniques used in connection with contemporary VIR solutions, as well as the challenges associated with building real-world, large-scale VIR solutions, including a brief overview of publicly available datasets used in worldwide challenges, contests, and benchmarks. Throughout the tutorial, we integrate examples using LIRE, whose main features and design principles are also discussed. Finally, we conclude the tutorial with suggestions for deepening knowledge of the topic, including a brief discussion of the most relevant advances, open challenges, and promising opportunities in VIR and related areas.
   The tutorial is primarily targeted at experienced Information Retrieval researchers and practitioners interested in extending their knowledge of document-based IR to equivalent concepts, techniques, and challenges in VIR. The acquired knowledge should allow participants to derive insightful conclusions and promising avenues for further investigation.
Large-scale graph mining and learning for information retrieval BIBAFull-Text 1194-1195
  Bin Gao; Taifeng Wang; Tie-Yan Liu
For many information retrieval applications, we need to deal with the ranking problem on very large-scale graphs. However, it is non-trivial to perform efficient and effective ranking on them. On the one hand, we need to design scalable algorithms. On the other hand, we need to develop powerful computational infrastructure to support these algorithms. This tutorial aims to give a timely introduction to the promising recent advances in both aspects and to provide the audience with a comprehensive view of the related literature.
Query performance prediction for IR BIBAFull-Text 1196-1197
  David Carmel; Oren Kurland
The goal of this tutorial is to expose participants to current research on query performance prediction. Participants will become familiar with state-of-the-art performance prediction methods, with common evaluation methodologies of prediction quality, and with potential applications that can utilize performance predictors. In addition, some open issues and challenges in the field will be discussed.
   This tutorial is an updated version of the SIGIR 2010 tutorial presented by David Carmel and Elad Yom-Tov on the same subject. This year we intend to expand on new results in the field, in particular focusing on recently developed frameworks that provide a unified model for performance prediction.
Collaborative information seeking: art and science of achieving 1+1>2 in IR BIBAFull-Text 1198-1199
  Chirag Shah
The assumption that information seekers are independent and that the IR problem is an individual one has often been challenged in the recent past, with the argument that the next big leap in search and retrieval will come through incorporating social and collaborative aspects of information seeking. This half-day tutorial will introduce the student to theories, methodologies, and tools that focus on information retrieval/seeking in collaboration. The student will have an opportunity to learn about the social aspect of IR with a focus on collaborative information seeking (CIS) situations, systems, and evaluation techniques. The course is intended for those interested in social and collaborative aspects of IR (from both academia and industry), and requires only a general understanding of IR systems and evaluation.
Advances on the development of evaluation measures BIBAFull-Text 1200-1201
  Ben Carterette; Evangelos Kanoulas; Emine Yilmaz
The goal of the tutorial is to provide attendees with a comprehensive overview of the latest advances in the development of information retrieval evaluation measures and to discuss the current challenges in the area. A number of topics are covered, including background on the traditional evaluation paradigm and traditional evaluation measures, evaluation measures based on user models, advanced models of user interaction with search engines, measures based on these models, measures for novelty and diversity, and session-based measures.