
Proceedings of the 2014 Australasian Document Computing Symposium

Fullname: Proceedings of the 19th Australasian Document Computing Symposium
Editors: J. Shane Culpepper; Laurence Park; Guido Zuccon
Location: Melbourne, Australia
Dates: 2014-Nov-27 to 2014-Nov-29
Standard No: ISBN 978-1-4503-3000-8; hcibib: ADCS14
  1. Keynote
  2. Papers
  3. Posters


Diversity, Intent, and Aggregated Search (p. 1)
  Maarten de Rijke
Diversity, intent and aggregated search are three core retrieval concepts that receive significant attention. In search result diversification one typically considers the relevance of a document in light of other retrieved documents. The goal is to identify the probable "aspects" of an ambiguous query, retrieve documents for each of these aspects and make the search results more diverse. By doing so, in the absence of any knowledge of users' context or preferences, the chance that the user will find at least one of these results to be relevant to their underlying information need is increased. Those probable "aspects" of a query may refer to lexical ambiguity (e.g., flash -- Adobe Flash, flash light, flash gordon, flash airlines, flash mob,...) or to intentional ambiguity (e.g., pizza -- how to make one, where to buy one, images, nutritional value, background, restaurant,...). The automatic discovery of query intent has become an active research area, with a range of observational and algorithmic studies as outcomes. Understanding the likely intents behind a query can help search engines to automatically route the query to the corresponding vertical search engines so as to obtain particularly relevant results, thus greatly improving user satisfaction. In aggregated search the task is to search and assemble information from a variety of sources and to organize the resulting material within a single interface. The result page of a modern search engine often goes beyond a simple ranked list. Many specific intents are addressed by aggregated search solutions: specially presented documents, often retrieved from specific sources, that stand out from the regular organic search results.


Improving test collection pools with machine learning (pp. 2-9)
  Gaya K. Jayasinghe; William Webber; Mark Sanderson; J. Shane Culpepper
IR experiments typically use test collections for evaluation. Such test collections are formed by judging a pool of documents retrieved by a combination of automatic and manual runs for each topic. The proportion of relevant documents found for each topic depends on the diversity across each of the runs submitted and the depth to which runs are assessed (pool depth). Manual runs are commonly believed to reduce bias in test collections when evaluating new IR systems.
   In this work, we explore alternative approaches to improving test collection reliability. Using fully automated approaches, we are able to recognise a large portion of relevant documents that would normally only be found through manual runs. Our approach combines simple fusion methods with machine learning. The approach demonstrates the potential to find many more relevant documents than are found using traditional pooling approaches. Our initial results are promising and can be extended in future studies to help test collection curators ensure proper judgment coverage is maintained across the entire document collection.
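As a rough illustration of the traditional pooling baseline described above, the following minimal sketch builds a depth-k pool as the union of the top-k documents from each run. The function name `depth_k_pool` and the toy runs are hypothetical; this is not the authors' fusion or machine learning method, only the standard pooling step it is compared against.

```python
def depth_k_pool(runs, depth):
    """Return the set of documents to judge: the union of the
    top-`depth` documents from every ranked run for one topic."""
    pool = set()
    for ranking in runs:
        pool.update(ranking[:depth])
    return pool


# Two toy runs for a single topic (hypothetical document ids).
runs = [
    ["d1", "d2", "d3", "d4"],
    ["d3", "d5", "d1", "d6"],
]
pool = depth_k_pool(runs, 2)
# pool == {"d1", "d2", "d3", "d5"}
```

Documents ranked below the pool depth in every run (here `d4` and `d6`) are never judged, which is the coverage gap the abstract is concerned with.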
A Study of Querying Behaviour of Expert and Non-expert Users of Biomedical Search Systems (pp. 10-17)
  Sadegh Kharazmi; Sarvnaz Karimi; Falk Scholer; Adam Clark
The amount of biomedical literature, and the popularity of health-related searches, are both growing rapidly. While most biomedical search systems offer a range of advanced features, there is limited understanding of user preferences, and how searcher expertise relates to the use and perception of different search features in this domain. Through a controlled user study where both medical experts and non-medical participants were asked to carry out informational searches in a task-based environment, we seek to understand how querying behaviour differs, both in the formulation of query strings, and in the use of advanced querying features. Our results suggest that preferences vary substantially between these groups of users, and that biomedical search systems need to offer a range of tools in order to effectively support both types of searchers.
Graph Representations and Applications of Citation Networks (pp. 18-25)
  Matthias Petri; Alistair Moffat; Anthony Wirth
A citation network is a structure of linked documents that share a pool of authors and a pool of subjects, and via citations, provide references to related documents that have preceded them in the chronology of research. In this paper we review citation networks, and survey and categorize the operations that extract data from them. Our goal is to create a framework against which proposed implementations can be assessed, and to provide a basis for research into algorithms and techniques that might be applied to citation networks. In particular, we seek to extend the concept of "search" over a citation network, to allow for ranked retrieval models in which a wide range of factors influence the list of answers that is presented to the user in response to a query.
Examining New Event Detection (pp. 26-33)
  Johannes Schanda; Mark Sanderson; Paul Clough
We examine the accuracy of first story detection on traditional news collections and on a re-purposed source of academic material. The impact on accuracy of detecting an early rather than the first story is examined, showing that accuracy increases under a broader time window; however, the increases on some collections are small. Even on collections where the increase is large, many new events are still missed and there remains an underlying challenge to detecting new events. An analysis of temporal and vocabulary profiles of topics within their source collections is conducted. Analysis of the results establishes the underlying causes of the patterns seen in the experimental results with respect to the different source types and performance. The usefulness of new criteria for new event detection and success across source types is discussed.
Tensor Reduction for User Profiling in Personalized Recommender Systems (pp. 34-41)
  Xiaoyu Tang; Yue Xu; Shlomo Geva
User profiling is the process of constructing user models which represent personal characteristics and preferences of customers. User profiles play a central role in many recommender systems. Recommender systems recommend items to users based on user profiles, in which the items can be any objects which the users are interested in, such as documents, web pages, books, movies, etc. In recent years, multidimensional data are getting more and more attention for creating better recommender systems from both academia and industry. Additional metadata provides algorithms with more details for better understanding the interactions between users and items. However, most of the existing user/item profiling techniques for multidimensional data analyze data through splitting the multidimensional relations, which causes information loss of the multidimensionality. In this paper, we propose a user profiling approach using a tensor reduction algorithm, which we will show is based on a Tucker2 model. The proposed profiling approach incorporates latent interactions between all dimensions into user profiles, which significantly benefits the quality of neighborhood formation. We further propose to integrate the profiling approach into neighborhood-based collaborative filtering recommender algorithms. Experimental results show significant improvements in terms of recommendation accuracy.
Evaluating Diversity and Redundancy-Based Search Metrics Independently (pp. 42-49)
  Ake Tangsomboon; Teerapong Leelanupab
This paper proposes a new evaluation metric, normalized Coverage Frequency (nCF), which aims to explicitly evaluate the diversity of search results, going beyond the drawbacks of previously proposed measures. In fact, two of the most widely adopted metrics for the diversity retrieval task, namely α-nDCG and Intent-Aware Expected Reciprocal Rank (ERR-IA), explicitly evaluate redundancy, but not diversity. While there exists a genuine diversity-based metric called Intent Recall (I-rec), it has some drawbacks. These drawbacks may be inherited by other derived metrics such as D#-nDCG, which combines I-rec with a modified version of nDCG.
   The proposed nCF metric assesses how often query-intents are successfully covered throughout a ranked list up to a given rank position. A comprehensive study is conducted using both real and synthetic data to compare nCF with α-nDCG, ERR-IA, I-recall and D#-nDCG. Results show that the proposed metric correlates well with the existing ones while it is capable of capturing other factors, e.g., a series of coverage. In addition, we categorize the existing metrics into two distinct groups, i.e., diversity and novelty, based upon their intuitive measurements and suggest that they be used independently according to what they quantify for the ease of performance interpretation.
Compression, SIMD, and Postings Lists (pp. 50-57)
  Andrew Trotman
The three generations of postings list compression strategies (Variable Byte Encoding, Word Aligned Codes, and SIMD Codecs) are examined in order to test whether or not each truly represented a generational change -- they do. Some weaknesses of the current SIMD-based schemes are identified and a new scheme, QMX, is introduced to address both space and decoding inefficiencies. Improvements are examined on multiple architectures and it is shown that different SSE implementations (Intel and AMD) perform differently.
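To make the first of the three generations concrete, here is a minimal sketch of Variable Byte Encoding applied to document-id gaps. It is an illustration only, not the QMX scheme introduced in the paper. The byte convention (7 payload bits per byte, least significant group first, high bit set on the final byte of each integer) is one common choice and is an assumption here; the function names are hypothetical.

```python
from itertools import accumulate


def to_gaps(doc_ids):
    """Convert a strictly increasing list of doc ids to first-order gaps."""
    return [cur - prev for prev, cur in zip([0] + doc_ids[:-1], doc_ids)]


def vbyte_encode(numbers):
    """Encode non-negative integers, 7 bits per byte, low bits first.
    The high bit marks the last byte of each integer (assumed convention)."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)  # continuation byte: high bit clear
            n >>= 7
        out.append(n | 0x80)      # terminating byte: high bit set
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


doc_ids = [3, 130, 131, 1000]
encoded = vbyte_encode(to_gaps(doc_ids))          # 5 bytes for 4 postings
decoded = list(accumulate(vbyte_decode(encoded)))  # back to doc ids
```

Small gaps fit in a single byte, which is why gap encoding and byte-aligned codes are usually combined.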
Improvements to BM25 and Language Models Examined (pp. 58-65)
  Andrew Trotman; Antti Puurula; Blake Burgess
Recent work on search engine ranking functions reports improvements on BM25 and Language Models with Dirichlet Smoothing. In this investigation 9 recent ranking functions (BM25, BM25+, BM25T, BM25-adpt, BM25L, TFl∘δ∘p×IDF, LM-DS, LM-PYP, and LM-PYP-TFIDF) are compared by training on the INEX 2009 Wikipedia collection and testing on INEX 2010 and 9 TREC collections. We find that once trained (using particle swarm optimization) there is very little difference in performance between these functions, that relevance feedback is effective, that stemming is effective, and that it remains unclear which function is best overall.
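For readers unfamiliar with the baseline, the sketch below scores documents with one textbook form of BM25. The IDF variant (with the `1 +` inside the logarithm to keep weights non-negative) and the default values `k1=1.2`, `b=0.75` are assumptions chosen for illustration; the paper trains its parameters and may use a different BM25 formulation. The function name and toy documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenised document in `docs` against `query`
    using a common BM25 variant (assumed, not the paper's exact form)."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    ["postings", "list", "compression"],
    ["ranking", "function", "bm25", "ranking"],
    ["language", "model", "smoothing"],
]
scores = bm25_scores(["ranking", "bm25"], docs)
```

Only the second document contains a query term, so it is the only one with a non-zero score.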
Pinpointing Locational Focus in Microblogs (pp. 66-72)
  Jie Yin; Sarvnaz Karimi; John Lingad
Extracting the geographical location that a tweet is about is crucial for many important applications ranging from disaster management to recommendation systems. We address the problem of finding the locational focus of tweets that is geographically identifiable on a map. Because of the short, noisy nature of tweets and inherent ambiguity of locations, tweet text alone cannot provide sufficient information for disambiguating the location mentions and inferring the actual location focus being referred to in a tweet. Therefore, we present a novel algorithm that identifies all location mentions from three information sources -- tweet text, hashtags, and user profile -- and then uses a gazetteer database to infer the most probable locational focus of a tweet. Our novel algorithm has the ability to infer a locational focus that may not be explicitly mentioned in the tweet and determine its most appropriate granularity, e.g., city or country.


Tweet Author Location Impacts on Tweet Credibility (pp. 73-76)
  Suliman Aladhadh; Xiuzhen Zhang; Mark Sanderson
We investigate how certain features affect user perceptions of the credibility of tweets. Using a crowdsourcing experiment, we found that users' perceptions of tweet credibility are impacted more by some features than by others. Most notably, we discover that displaying the location of certain types of tweets causes users viewing these tweets to perceive the tweets as more credible.
Analysing User Access To An Online Newspaper (pp. 77-80)
  Husna Sarirah Husin; James A. Thom; Xiuzhen Zhang
There have been several studies of online newspapers that use web server logs to analyze traffic and user behavior, but most of these studies required a demographic profile of the users. Our study adds to the literature by empirically examining user behavior using web server logs in the absence of demographic information, explaining in detail the preprocessing stage and discussing user access to the contents of an online newspaper. Our findings show that the online news website is highly accessed in the morning, and is accessed mostly by desktop users who are local and like to read the National and Sports news. The majority of users are only occasional visitors with only one session in the month.
Retrieving Passages and Finding Answers (pp. 81-84)
  Mostafa Keikha; Jae Hyun Park; W. Bruce Croft; Mark Sanderson
Retrieving topically-relevant text passages in documents has been studied many times, but finding non-factoid, multiple sentence answers to web queries is a different task that is becoming increasingly important for applications such as mobile search. As the first stage of developing retrieval models for "answer passages", we describe the process of creating a test collection of questions and multiple-sentence answers based on the TREC GOV2 queries and documents. This annotation shows that most of the description-length TREC queries do in fact have passage-level answers. We then examine the effectiveness of current passage retrieval models in terms of finding passages that contain answers. We show that the existing methods are not effective for this task, and also observe that the relative performance of these methods in retrieving answers does not correspond to their performance in retrieving relevant documents.
Document Timespan Normalisation and Understanding Temporality for Clinical Records Search (pp. 85-88)
  Bevan Koopman; Guido Zuccon
Previous qualitative research has highlighted that temporality plays an important role in relevance for clinical records search. In this study, an investigation is undertaken to determine the effect that the timespan of events within a patient record has on relevance in a retrieval scenario. In addition, based on the standard practice of document length normalisation, a document timespan normalisation model that specifically accounts for timespans is proposed. Initial analysis revealed that in general relevant patient records tended to cover a longer timespan of events than non-relevant patient records. However, an empirical evaluation using the TREC Medical Records track supports the opposite view that shorter documents (in terms of timespan) are better for retrieval. These findings highlight that the role of temporality in relevance is complex and how to effectively deal with temporality within a retrieval scenario remains an open question.
How Effective are Proximity Scores in Term Dependency Models? (pp. 89-92)
  Xiaolu Lu; Alistair Moffat; J. Shane Culpepper
The dominant retrieval models in information retrieval systems today are variants of TF×IDF, and typically use bag-of-words processing in order to balance recall and precision. However, the size of collections continues to increase, and the number of results produced by these models exceeds the number of documents that can be reasonably assessed. To address this need, researchers and commercial providers are now looking at more expensive computational models to improve the quality of the results returned. One such method is to incorporate term proximity into the ranking model. We explore the effectiveness gains achievable when term proximity is a factor used in ranking algorithms, and explore the relative effectiveness of several variants of the term dependency model. Our goal is to understand how these proximity-based models improve effectiveness.
Medical Free-Text to Concept Mapping as an Information Retrieval Problem (pp. 93-96)
  Shahin Mirhosseini; Guido Zuccon; Bevan Koopman; Anthony Nguyen; Michael Lawley
Concept mapping involves determining relevant concepts from a free-text input, where concepts are defined in an external reference ontology. This is an important process that underpins many applications for clinical information reporting, derivation of phenotypic descriptions, and a number of state-of-the-art medical information retrieval methods. Concept mapping can be cast into an information retrieval (IR) problem: free-text mentions are treated as queries and concepts from a reference ontology as the documents to be indexed and retrieved. This paper presents an empirical investigation applying general-purpose IR techniques for concept mapping in the medical domain. A dataset used for evaluating medical information extraction is adapted to measure the effectiveness of the considered IR approaches. Standard IR approaches used here are contrasted with the effectiveness of two established benchmark methods specifically developed for medical concept mapping. The empirical findings show that the IR approaches are comparable with one benchmark method but well below the best benchmark.
Assessing the Cognitive Complexity of Information Needs (pp. 97-100)
  Alistair Moffat; Peter Bailey; Falk Scholer; Paul Thomas
Information retrieval systems can be evaluated in laboratory settings through the use of user studies, and through the use of test collections and effectiveness metrics. In a larger investigation we are exploring the extent to which individual user differences and behaviours can affect the scores generated by a retrieval system.
   Our objective in the first phase of that project is to define information need statements corresponding to a range of TREC search tasks, and to categorise those statements in terms of task complexity. The goal is to reach a position from which we can determine whether user actions while searching are influenced by the way the information need is expressed, and by the fundamental nature of the information need. We describe the process used to create information need statements, and then report inter- and intra-assessor agreements across four annotators. We conclude that assessing the relative cognitive complexity of tasks is a complex activity, even for experienced annotators.
By the power of Grayskull: Small sample statistical power in Information Retrieval evaluation (pp. 101-104)
  Laurence A. F. Park; Glenn Stone
Information Retrieval evaluation is typically performed using a sample of queries, and a statistical hypothesis test is used to make inferences about the systems' accuracy on the population of queries. Research has shown that the t test is one of a set of tests that provides the greatest statistical power while maintaining acceptable type I error rates, when evaluating with a large sample of queries. In this article, we investigate the effect of using a small query sample, for which the hypothesis tests may not satisfy Central Limit Theorem conditions, on the control of the type I error rate and the change in type II error rate of a given set of hypothesis tests. We found that all tests performed similarly for unpaired tests. We also found that the bootstrap test provided greater power for the paired test, but violated the desired type I error rate for the smallest sample size (5 queries).
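As a small worked example of the paired setting discussed above, the sketch below computes the paired t statistic for two systems evaluated on the same five queries. The per-query scores are made up for illustration and the function name is hypothetical; the paper's own experimental procedure, including its bootstrap test, is not reproduced here.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """t = mean(d) / (sd(d) / sqrt(n)), where d holds per-query differences
    and sd is the sample standard deviation (n - 1 in the denominator)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error


# Hypothetical per-query effectiveness scores for two systems (5 queries).
system_a = [0.50, 0.60, 0.70, 0.40, 0.55]
system_b = [0.40, 0.50, 0.65, 0.35, 0.50]
t = paired_t_statistic(system_a, system_b)  # about 5.72, with n - 1 = 4 degrees of freedom
```

With so few queries the reference t distribution has only four degrees of freedom, which is the small-sample regime the abstract examines.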
Identifying Re-finding Difficulty from User Query Logs (pp. 105-108)
  Sargol Sadeghi; Roi Blanco; Peter Mika; Mark Sanderson; Falk Scholer; David Vallet
This paper presents a first study of how consistently human assessors are able to identify, from query logs, when searchers are facing difficulties re-finding documents. Using 12 assessors, we investigate the effect of two variables on assessor agreement: the assessment guideline detail, and assessor experience. The results indicate statistically significant better agreement when using detailed guidelines. An upper agreement of 78.9% was achieved, which is comparable to the levels of agreement in other information retrieval contexts. The effects of two contextual factors, representative of system performance and user effort, were studied. Significant differences between agreement levels were found for both factors, suggesting that contextual factors may play an important role in obtaining higher agreement levels. The findings contribute to a better understanding of how to generate ground truth data both in the re-finding and other labeling contexts, and have further implications for building automatic re-finding difficulty prediction models.
Towards Universal Search Design (pp. 109-112)
  Laurianne Sitbon; Lauren Fell; David Poxon; Jinglan Zhang; Shlomo Geva
For people with cognitive disabilities, technology is more often thought of as a support mechanism, rather than a source of division that may require intervention to equalize access across the cognitive spectrum. This paper presents a first attempt at formalizing the digital gap created by the generalization of search engines. This was achieved through the development of a mapping of cognitive abilities required by users to execute low-level tasks during a standard Web search task. The mapping demonstrates how critical these abilities are to successfully use search engines with an adequate level of independence. It will lead to a set of design guidelines for search engine interfaces that will allow for the engagement of users of all abilities, and also, more importantly, for search algorithms such as query suggestion and measure of relevance (i.e. ranking).
Blended Dictionaries for Reduced-Memory Lempel-Ziv Corpus Compression (pp. 113-116)
  Jiancong Tong; Anthony Wirth; Justin Zobel
Relative Lempel-Ziv (RLZ) compression has been shown to be effective for compression of large text repositories. It provides high compression ratios with extremely fast atomic decompression of individual documents. However, it depends on a large in-memory dictionary, which is implemented as a contiguous string that must be accessed randomly during the decompression process. In this paper we explore how compressed suffix arrays might reduce the size of the dictionary. These suffix arrays drastically increase the cost of accessing individual characters, however, so we propose splitting of the dictionary: an uncompressed structure for frequently accessed dictionary elements, with compression for the remainder. Our results show that splitting provides a smoothly tuneable trade-off between access time and memory requirements, but does not overcome the inherent limitations of compressed suffix arrays for this application, with decompression time growing by a factor of 10 for even the best combination of parameters. Suffix arrays comprise an attractive option where memory is limited, high compression is paramount, and decompression speed is unimportant.
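The role of the in-memory dictionary can be seen in a toy sketch of relative Lempel-Ziv factorisation: a document is stored as (position, length) references into the dictionary string, and decoding copies those substrings back out by random access. The naive quadratic longest-match search, the literal representation (a character paired with length 0), and the function names are assumptions for illustration; real implementations use suffix structures for matching, and the paper's blended dictionary is not modelled here.

```python
def rlz_encode(text, dictionary):
    """Greedily factorise `text` into (position, length) references into
    `dictionary`; characters with no match become (char, 0) literals."""
    factors, i = [], 0
    while i < len(text):
        best_pos, best_len = -1, 0
        for pos in range(len(dictionary)):
            length = 0
            while (i + length < len(text)
                   and pos + length < len(dictionary)
                   and text[i + length] == dictionary[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len == 0:
            factors.append((text[i], 0))  # literal character
            i += 1
        else:
            factors.append((best_pos, best_len))
            i += best_len
    return factors


def rlz_decode(factors, dictionary):
    """Rebuild the text; each reference is a random access into the dictionary."""
    parts = []
    for ref, length in factors:
        parts.append(ref if length == 0 else dictionary[ref:ref + length])
    return "".join(parts)


dictionary = "the quick brown fox"
document = "the brown fox quickly"
factors = rlz_encode(document, dictionary)
restored = rlz_decode(factors, dictionary)
```

Every non-literal factor triggers a lookup at an arbitrary dictionary position, which is why slowing down character access (as compressed suffix arrays do) directly slows decompression.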
Automated Categorisation of Patent Claims that Reference Human Genome Sequences (pp. 117-120)
  Donglu Wang; Gabriela Ferraro; Hanna Suominen; Osmat A. Jefferson
Debates on gene patents have necessitated the analysis of patents that disclose and reference human sequences. In this study, we built an automated classifier that assigns sequences to one of nine predefined categories according to their functional roles in patent claims by applying natural language processing and supervised learning techniques. To improve its correctness, we experimented with various feature mappings, resulting in the maximal accuracy of 79%.