
Proceedings of the 2014 ACM Conference on Hypertext and Social Media

Fullname: HT'14: 25th ACM Conference on Hypertext & Social Media
Editors: Leo Ferres; Gustavo Rossi; Virgilio Almeida; Eelco Herder
Location: Santiago, Chile
Dates: 2014-Sep-01 to 2014-Sep-04
Standard No: ISBN: 978-1-4503-2954-5; hcibib: HYPER14
  1. Keynotes
  2. Full papers
  3. Short papers
  4. Posters and demos


Keynotes

The wisdom of ad-hoc crowds 1-2
  Ricardo Baeza-Yates
In this keynote we give an introduction to the wisdom of crowds on the Web, the long tail of web content, and the bias involved in the generation of user-generated content (UGC). This bias creates the wisdom of ad-hoc crowds or the wisdom of a few. Although it is well known that user activity in most settings follows a power law, that is, few people do a lot, while most do nothing, there are few studies that characterize this activity well. In a recent analysis of social network data we corroborated that a small percentage of the active users (passive users are the majority) account for at least 50% of the UGC. This implies that most of the wisdom comes from a few users, which is not that surprising, as the Web is a reflection of our own society, where economic or political power is also in the hands of minorities.
Big data visualization engines for understanding the development of countries, social networks, culture and cities 3
  Cesar Hidalgo
Big data can be used for more than improving the targeting of marketing campaigns. In this talk I will present five big data visualization engines we have created at the MIT Media Lab's Macro Connections group and will show how we can use big data and visualizations to improve our understanding of the development of economies, cultures and cities. The data visualization engines I will demo include (i) the Observatory of Economic Complexity (atlas.media.mit.edu), which is the most comprehensive tool for exploring international trade data created to date; (ii) DataViva (dataviva.info), which is a tool we created to open up data for the entire formal sector economy of Brazil, including data on all of the working force, municipalities, industries, and occupations of Brazil; (iii) Pantheon (pantheon.media.mit.edu), a dataset and visualization engine we created to explore global patterns of cultural production; (iv) Immersion (immersion.media.mit.edu), which is a tool that inverts the email interface, by focusing it on people rather than messages; and (v) Place Pulse and StreetScore (pulse.media.mit.edu & streetscore.media.mit.edu), which are crowd-sourcing and machine learning tools we have developed to help understand the aesthetic aspects of cities and their evolution.

Full papers

Scalable learning of users' preferences using networked data 4-12
  Mohammad Ali Abbasi; Jiliang Tang; Huan Liu
Users' personal information such as their political views is important for many applications such as targeted advertisements or real-time monitoring of political opinions. Huge amounts of data generated by social media users present opportunities and challenges to study these preferences at a large scale. In this paper, we aim to infer social media users' political views when only network information is available. In particular, given personal preferences about some of the social media users, how can we infer the preferences of unobserved individuals in the same network? There are many existing solutions that address the problem of classification with networked data. However, networks in social media normally involve millions and even hundreds of millions of nodes, which makes scalability an important problem in inferring personal preferences in social media. To address the scalability issue, we use social influence theory to construct new features based on a combination of local and global structures of the network. Then we use these features to train classifiers and predict users' preferences. Due to the size of real-world social networks, using the entire network information is inefficient and not practical in many cases. By extracting local social dimensions, we present an efficient and scalable solution. Further, by capturing the network's global pattern, the proposed solution balances the performance requirement between accuracy and efficiency.
Recognizing skill networks and their specific communication and connection practices 13-23
  Sergiu Chelaru; Eelco Herder; Kaweh Djafari Naini; Patrick Siehndel
Social networks are a popular medium for building and maintaining a professional network. Many studies exist on general communication and connection practices within these networks. However, studies on expertise search suggest the existence of subgroups centered around a particular profession. In this paper, we analyze commonalities and differences between these groups, based on a set of 94,155 public user profiles. The results confirm that such subgroups can be recognized. Further, the average number of connections differs between groups, as a result of differences in intention for using social media. Similarly, within the groups, specific topics and resources are discussed and shared, and there are interesting differences in the tone and wording the group members use. These insights are relevant for interpreting results from social media analyses and can be used for identifying group-specific resources and communication practices that new members may want to know about.
Online popularity and topical interests through the lens of Instagram 24-34
  Emilio Ferrara; Roberto Interdonato; Andrea Tagarelli
Online socio-technical systems can be studied as a proxy of the real world to investigate human behavior and social interactions at scale. Here we focus on Instagram, a media-sharing online platform whose popularity has risen to the point of gathering hundreds of millions of users. Instagram exhibits a mixture of features including social structure, social tagging and media sharing. The network of social interactions among users models various dynamics including follower/followee relations and users' communication by means of posts/comments. Users can upload and tag media such as photos and pictures, and they can "like" and comment on each piece of information on the platform. In this work we investigate three major aspects of our Instagram dataset: (i) the structural characteristics of its network of heterogeneous interactions, to unveil the emergence of self-organization and topically-induced community structure; (ii) the dynamics of content production and consumption, to understand how global trends and popular users emerge; (iii) the behavior of users labeling media with tags, to determine how they devote their attention and to explore the variety of their topical interests.
   Our analysis provides clues to understand human behavior dynamics on socio-technical systems, specifically users and content popularity, the mechanisms of users' interactions in online environments and how collective trends emerge from individuals' topical interests.
Scalable, generic, and adaptive systems for focused crawling 35-45
  Georges Gouriten; Silviu Maniu; Pierre Senellart
Focused crawling is the process of exploring a graph iteratively, focusing on parts of the graph relevant to a given topic. It occurs in many situations such as a company collecting data on competition, a journalist surfing the Web to investigate a political scandal, or an archivist recording the activity of influential Twitter users during a presidential election. In all these applications, users explore a graph (e.g., the Web or a social network), nodes are discovered one by one, the total number of exploration steps is constrained, some nodes are more valuable than others, and the objective is to maximize the total value of the crawled subgraph. In this article, we introduce scalable, generic, and adaptive systems for focused crawling. Our first effort is to define an abstraction of focused crawling applicable to a large domain of real-world scenarios. We then propose a generic algorithm, which allows us to identify and optimize the relevant subsystems. We prove the intractability of finding an optimal exploration, even when all the information is available.
   Taking this intractability into account, we investigate how the crawler can be steered in several experimental graphs. We show the good performance of a greedy strategy and the importance of being able to run at each step a new estimation of the crawling frontier. We then discuss this estimation through heuristics, self-trained regression, and multi-armed bandits. Finally, we investigate their scalability and efficiency in different real-world scenarios and by comparing with state-of-the-art systems.
An author-reader influence model for detecting topic-based influencers in social media 46-55
  Jonathan Herzig; Yosi Mass; Haggai Roitman
This work addresses the problem of detecting topic-based influencers in social media. To that end, we devise a novel behavioral model of authors and readers, where authors try to influence readers by generating "attractive" content, which is both relevant and unique, and readers can become authors themselves by further citing or referencing content made by other authors. The model is realized by means of a content-based citation graph, where nodes represent authors with their generated content and edges represent reader-to-author citations. To find the top influencers for a given topic, we first profile the content of authors (nodes) and citations (edges) and derive topic-based similarity scores to the topic, which further model the unique and relevant topic interests of users. We then present three different extensions of the Topic-Sensitive PageRank algorithm that exploit the similarity scores to find topic-based influencers. We evaluate our solution on a large real-world dataset that was gathered from Twitter by measuring information diffusion in social networks. We show that, overall, our methods outperform several state-of-the-art methods. This work further serves as evidence that the topic uniqueness aspect in user interests within social media should be considered for the influencer detection task; this is in comparison to previous works that have solely focused on detecting topic-based influencers using the combination of link structure and topic-relevance.
Exploiting the wisdom of the crowds for characterizing and connecting heterogeneous resources 56-65
  Ricardo Kawase; Patrick Siehndel; Bernardo Pereira Nunes; Eelco Herder; Wolfgang Nejdl
Heterogeneous content is an inherent problem for cross-system search, recommendation and personalization. In this paper we investigate differences in topic coverage and the impact of topics in different kinds of Web services. We use entity extraction and categorization to create fingerprints that allow for meaningful comparison. As a base taxonomy, we use the 23 main categories of the Wikipedia Category Graph, which has been assembled over the years by the wisdom of the crowds. Following a proof of concept of our approach, we analyze differences in topic coverage and topic impact. The results show many differences between Web services like Twitter, Flickr and Delicious, which reflect users' behavior and the usage of each system. The paper concludes with a user study that demonstrates the benefits of fingerprints over traditional textual methods for recommendations of heterogeneous resources.
Cross-site personalization: assisting users in addressing information needs that span independently hosted websites 66-76
  Kevin Koidl; Owen Conlan; Vincent Wade
This paper discusses Cross-Site Personalization (CSP), an approach to provide personalized assistance for the user in addressing information needs that span independently hosted websites. This is done by seamlessly personalizing the support offered to each individual user, as they browse across multiple websites, by modeling the user's interactions and then augmenting information access points, such as links, on each independent website. Cross-Site Personalization is realized as a third-party API offering Personalisation As a Service to ensure cross-site and cross-device usage. The personalized augmentations are provided through module extensions for the Web-based Content Management System (WCMS) Drupal. The approach is non-intrusive and does not limit or alter the user's information access paradigm. This is done by visually augmenting the existing hyperlinks on webpages. The design of the API ensures users' privacy by not disclosing personal browsing information to the websites. Rather, this approach recommends how each website may adapt its information and navigation structures to meet users' information needs. Finally, the approach ensures user control and scrutiny. The user can enable/disable CSP at any time and view any information collected. The evaluation of the approach was conducted with a real-world use case. This paper introduces the architecture, a prototype implementation and encouraging evaluation results.
A linked data approach to care coordination 77-87
  Spyros Kotoulas; Vanessa Lopez; Marco Luca Sbodio; Martin Stephenson; Pierpaolo Tommasi; Pol Mac Aonghusa
The success of a society is often judged by its ability to support the most vulnerable. Supporting the most vulnerable individuals is extremely challenging from an information needs perspective, since it requires data from numerous domains and systems, including Social Care, Healthcare, Public Safety and Juridical systems. Information sharing on this scale gives rise to scientific and technical challenges with regard to data representation, access, integration and retrieval granularity. This is a practice-oriented paper presenting a Linked Data-based approach that is uniquely positioned to access and surface information across domains and data sources using a combination of vulnerability indexes and contextual exploration. We apply this approach on a set of enterprise systems from IBM to develop an information sharing architecture and prototype for Care Coordination with a focus on Social Care and Healthcare. We report on expert feedback and user studies that indicate that our approach indeed reduces the time required to gain some business insight while maintaining the flexibility of a Linked Data-based integration approach.
Reader preferences and behavior on Wikipedia 88-97
  Janette Lehmann; Claudia Müller-Birn; David Laniado; Mounia Lalmas; Andreas Kaltenbrunner
Wikipedia is a collaboratively-edited online encyclopaedia that relies on thousands of editors to both contribute articles and maintain their quality. Over recent years, research has extensively investigated this group of users, while another group of Wikipedia users, the readers, and their preferences and behavior have not been much studied. This paper makes this group and its activities visible and valuable to Wikipedia's editor community. We carried out a study on two datasets covering a 13-month period to obtain insights on users' preferences and reading behavior in Wikipedia. We show that the most read articles do not necessarily correspond to those frequently edited, suggesting some degree of non-alignment between user reading preferences and author editing preferences. We also identified that popular and often edited articles are read according to four main patterns, and that how an article is read may change over time. We illustrate how this information can provide valuable insights to Wikipedia's editor community.
A study of age gaps between online friends 98-106
  Lizi Liao; Jing Jiang; Ee-Peng Lim; Heyan Huang
User attribute extraction on social media has gained considerable attention, but existing methods are mostly supervised and suffer from insufficient gold-standard data. In this paper, we validate a strong hypothesis based on homophily and adapt it to ensure the certainty of the user attributes we extract via weakly supervised propagation. Homophily, the theory which states that people who are similar tend to become friends, has been well studied in the setting of online social networks. When we focus on the age attribute, based on this theory, online friends tend to have similar ages. In this work, we take a step further and study the hypothesis that the age gap between online friends becomes even smaller in a larger friendship clique. We empirically validate our hypothesis using two real social network data sets. We further design a propagation-based algorithm to predict online users' age, leveraging the clique-based hypothesis. We find that our algorithm can outperform several baselines. We believe that this method could work as a way to enrich sparse data and that the hypothesis we validated could shed light on exploring the proximity of other user attributes such as education as well.
Understanding and controlling the filter bubble through interactive visualization: a user study 107-115
  Sayooran Nagulendra; Julita Vassileva
The "filter bubble" is a term which refers to people getting encapsulated in streams of data such as news or social network updates that are personalized to their interests. While people need protection from information overload and may prefer to see content they feel familiar with or agree with, there is the danger that important issues that should be of concern for everyone will get filtered away and people will lack exposure to different views, living in "echo-chambers", blissfully unaware of the reality. We have proposed a design of an interactive visualization, which provides the user of a social networking site with awareness of the personalization mechanism (the semantics and the source of the content that is filtered away), and with means to control the filtering mechanism. The visualization has been implemented in a peer-to-peer social network, called MADMICA, and we present here the results of a large scale lab study with 163 crowd-sourced participants. The results demonstrate that the visualization increases users' awareness of the filter bubble and the understandability of the filtering mechanism, and gives users a feeling of control over their data stream.
The shortest path to happiness: recommending beautiful, quiet, and happy routes in the city 116-125
  Daniele Quercia; Rossano Schifanella; Luca Maria Aiello
When providing directions to a place, web and mobile mapping services are all able to suggest the shortest route. The goal of this work is to automatically suggest routes that are not only short but also emotionally pleasant. To quantify the extent to which urban locations are pleasant, we use data from a crowd-sourcing platform that shows two street scenes in London (out of hundreds), and a user votes on which one looks more beautiful, quiet, and happy. We consider votes from more than 3.3K individuals and translate them into quantitative measures of location perceptions. We arrange those locations into a graph upon which we learn pleasant routes. Based on a quantitative validation, we find that, compared to the shortest routes, the recommended ones add just a few extra walking minutes and are indeed perceived to be more beautiful, quiet, and happy. To test the generality of our approach, we consider Flickr metadata of more than 3.7M pictures in London and 1.3M in Boston, compute proxies for the crowdsourced beauty dimension (the one for which we have collected the most votes), and evaluate those proxies with 30 participants in London and 54 in Boston. These participants have not only rated our recommendations but have also carefully motivated their choices, providing insights for future work.
Comparing the pulses of categorical hot events in Twitter and Weibo 126-135
  Xin Shuai; Xiaozhong Liu; Tian Xia; Yuqing Wu; Chun Guo
The fragility and interconnectivity of the planet argue compellingly for a greater understanding of how different communities make sense of their world. One such critical demand is comparing Chinese communities with the rest of the world (e.g., Americans), where communities' ideological and cultural backgrounds can be significantly different. While traditional studies aim to learn the similarities and differences between these communities via high-cost user studies, in this paper we propose a much more efficient method to compare different communities by utilizing social media. Specifically, Weibo and Twitter, the two largest microblogging systems, are employed to represent the target communities, i.e. China and the Western world (mainly United States), respectively. Meanwhile, through the analysis of the Wikipedia page-click log, we identify a set of categorical 'hot events' for one month in 2012 and search those hot events in Weibo and Twitter corpora along with timestamps via information retrieval methods. We further quantitatively and qualitatively compare users' responses to those events in Twitter and Weibo in terms of three aspects: popularity, temporal dynamic, and information diffusion. The comparative results show that although the popularity rankings of those events are very similar, the patterns of temporal dynamics and information diffusion can be quite different.
Analyzing images' privacy for the modern web 136-147
  Anna C. Squicciarini; Cornelia Caragea; Rahul Balakavi
Images are now one of the most common forms of content shared in online user-contributed sites and social Web 2.0 applications. In this paper, we present an extensive study exploring privacy and sharing needs of users' uploaded images. We develop learning models to estimate adequate privacy settings for newly uploaded images, based on carefully selected image-specific features. We focus on a set of visual-content features and on tags. We identify the smallest set of features that, by themselves or combined with others, can perform well in properly predicting the degree of sensitivity of users' images. We consider both the case of binary privacy settings (i.e. public, private), as well as the case of more complex privacy options, characterized by multiple sharing options. Our results show that with few carefully selected features, one may achieve extremely high accuracy, especially when high-quality tags are available.
Is distrust the negation of trust?: the value of distrust in social media 148-157
  Jiliang Tang; Xia Hu; Huan Liu
Trust plays an important role in helping online users collect reliable information, and has attracted increasing attention in recent years. We learn from social sciences that, as the conceptual counterpart of trust, distrust could be as important as trust. However, little work exists in studying distrust in social media. What is the relationship between trust and distrust? Can we directly apply methodologies from social sciences to study distrust in social media? In this paper, we design two computational tasks by leveraging data mining and machine learning techniques to enable the computational understanding of distrust with social media data. The first task is to predict distrust from only trust, and the second task is to predict trust with distrust. We conduct experiments on real-world social media data. The empirical results of the first task provide concrete evidence to answer the question, "is distrust the negation of trust?" while the results of the second task help us figure out how valuable the use of distrust is in trust prediction.
Automatic discovery of global and local equivalence relationships in labeled geo-spatial data 158-168
  Bart Thomee; Gianmarco De Francisci Morales
We propose a novel algorithmic framework to automatically detect which labels refer to the same concept in labeled spatial data. People often use different words and synonyms when referring to the same concept or location. Furthermore these words and their usage vary across culture, language, and place. Our method analyzes the patterns in the spatial distribution of labels to discover equivalence relationships. We evaluate our proposed technique on a large collection of geo-referenced Flickr photos using a semi-automatically constructed ground truth from an existing ontology. Our approach is able to classify equivalent tags with a high accuracy (AUC of 0.85), as well as providing the geographic extent where the relationship holds.
Evaluating the helpfulness of linked entities to readers 169-178
  Ikuya Yamada; Tomotaka Ito; Shinnosuke Usami; Shinsuke Takagi; Hideaki Takeda; Yoshiyasu Takefuji
When we encounter an interesting entity (e.g., a person's name or a geographic location) while reading text, we typically search and retrieve relevant information about it. Entity linking (EL) is the task of linking entities in a text to the corresponding entries in a knowledge base, such as Wikipedia. Recently, EL has received considerable attention. EL can be used to enhance a user's text reading experience by streamlining the process of retrieving information on entities. Several EL methods have been proposed, though they tend to extract all of the entities in a document including unnecessary ones for users. Excessive linking of entities can be distracting and degrade the user experience. In this paper, we propose a new method for evaluating the helpfulness of linking entities to users. We address this task using supervised machine-learning with a broad set of features. Experimental results show that our method significantly outperforms baseline methods by approximately 5.7%-12% F1. In addition, we propose an application, Linkify, which enables developers to integrate EL easily into their web sites.
Asking the right question in collaborative Q&A systems 179-189
  Jie Yang; Claudia Hauff; Alessandro Bozzon; Geert-Jan Houben
Collaborative Question Answering (cQA) platforms are a very popular repository of crowd-generated knowledge. By formulating questions, users express needs that other members of the cQA community try to collaboratively satisfy. Poorly formulated questions are less likely to receive useful responses, thus hindering the overall knowledge generation process. Users are often asked to reformulate their needs, adding specific details, providing examples, or simply clarifying the context of their requests. Formulating a good question is a task that might require several interactions between the asker and other community members, thus delaying the actual answering and, possibly, decreasing the interest of the community in the issue. This paper contributes new insights to the study of cQA platforms by investigating the editing behaviour of users. We identify a number of editing actions, and provide a two-step approach for the automatic suggestion of the most likely editing actions to be performed for a newly created question. We evaluated our approach in the context of the Stack Overflow cQA, demonstrating how, for given types of editing actions, it is possible to provide accurate reformulation suggestions.
Empirical analysis of implicit brand networks on social media 190-199
  Kunpeng Zhang; Siddhartha Bhattacharyya; Sudha Ram
This paper investigates characteristics of implicit brand networks extracted from a large dataset of user historical activities on a social media platform. To our knowledge, this is one of the first studies to comprehensively examine brands by incorporating user-generated social content and information about user interactions. This paper makes several important contributions. We build and normalize a weighted, undirected network representing interactions among users and brands. We then explore the structure of this network using modified network measures to understand its characteristics and implications. As a part of this exploration, we address three important research questions: (1) What is the structure of a brand-brand network? (2) Does an influential brand have a large number of fans? (3) Does an influential brand receive more positive or more negative comments from social users? Experiments conducted with Facebook data show that the influence of a brand has (a) high positive correlation with the size of a brand, meaning that an influential brand can attract more fans, and, (b) low negative correlation with the sentiment of comments made by users on that brand, which means that negative comments have a more powerful ability to generate awareness of a brand than positive comments. To process the large-scale datasets and networks, we implement MapReduce-based algorithms.

Short papers

Am I more similar to my followers or followees?: analyzing homophily effect in directed social networks 200-205
  Mohammad Ali Abbasi; Reza Zafarani; Jiliang Tang; Huan Liu
Homophily is the formation of social ties between two individuals due to similar characteristics or interests. Based on homophily, in a social network it is expected to observe a higher degree of homogeneity among connected than disconnected people. Many researchers use this simple yet effective principle to infer users' missing information and interests based on the information provided by their neighbors. In a directed social network, the neighbors can be further divided into followers and followees. In this work, we investigate the homophily effect in a directed network. To explore the homophily effect in a directed network, we study if a user's personal preferences can be inferred from those of users connected to her (followers or followees). We investigate whether followers or followees are more effective in helping to infer users' personal preferences. Our findings can help to raise the awareness of users over their privacy and can help them better manage their privacy.
Understanding mass cooperation through visualization 206-211
  Remy Cazabet; Hideaki Takeda
We present a new type of visualization designed to help the understanding of inner mechanisms of mass cooperation. This type of cooperation is ubiquitous nowadays, not only in Online Social Networks, but also in many other situations, such as scientific research on a worldwide scale. Mass cooperation is also at the source of most complex systems. One problem that researchers confront when they study such cooperation is to build an intuitive representation of what is happening. Many tools and metrics exist to study the results of the cooperation, but sometimes these metrics can be misleading if one doesn't observe what the cooperation process really looks like. The main proposition of this paper is a visualization of the cooperation flow. The novelty of our approach is to represent the internal structure of the cooperation in a longitudinal perspective. Through examples, we present how one can form a rich understanding of what form the cooperation takes in a given context, and how this understanding can help to formulate hypotheses which can consequently be studied with appropriate tools such as statistical analysis.
How you post is who you are: characterizing Google+ status updates across social groups 212-217
  Evandro Cunha; Gabriel Magno; Marcos André Gonçalves; César Cambraia; Virgilio Almeida
The analysis of user-generated content on the Web provides tools to better understand users' behavior and to develop improved Web services. Here, we consider a large dataset of Google+ status updates to evaluate linguistic features among members of distinct social groups. Our study reveals that groups hold linguistic particularities, such as a tendency to use professional vocabulary, suggesting that Google+ might be employed, by certain users, for professional activities, or that members do not dissociate from their jobs when interacting in this environment. To illustrate a possible application of our outcomes, we present a classification experiment aiming to infer users' social information through the analysis of their posts, with satisfactory preliminary results. Our findings help to understand not only collective peculiarities of online social media users, but also important characteristics of the textual genre 'post', being one of the first and most comprehensive studies on this topic.
A taxonomy of microtasks on the web BIBAFull-Text 218-223
  Ujwal Gadiraju; Ricardo Kawase; Stefan Dietze
Nowadays, a substantial number of people are turning to crowdsourcing in order to solve tasks that require human intervention. Despite a considerable amount of research done in the field of crowdsourcing, existing works fall short when it comes to classifying typically crowdsourced tasks. Understanding the dynamics of the tasks that are crowdsourced and the behaviour of workers plays a vital role in efficient task design. In this paper, we propose a two-level categorization scheme for tasks, based on an extensive study of 1000 workers on CrowdFlower. In addition, we present insights into certain aspects of crowd behaviour: the task affinity of workers, the effort exerted by workers to complete tasks of various types, and their satisfaction with the monetary incentives.
The AMAS authoring tool 2.0: a UX evaluation BIBAFull-Text 224-230
  Conor Gaffney; Owen Conlan; Vincent Wade
Adaptive hypermedia has been well documented as being very beneficial in the domain of online education. Authoring adaptive educational hypermedia is, however, a complex and difficult task. A number of tools have been developed to address the issue of authoring so as to ease the cognitive load involved in composition. This paper examines two key areas related to authoring tool design: hypertext representation and User Experience (UX) design. Both of these are important factors that should be considered when designing hypertext authoring tools. The paper also presents the AMAS Authoring Tool, a new and unique authoring tool that allows non-technical Subject Matter Experts to compose adaptive activity-based courses without needing to write any code or technical languages.
Balancing diversity to counter-measure geographical centralization in microblogging platforms BIBAFull-Text 231-236
  Eduardo Graells-Garrido; Mounia Lalmas
We study whether geographical centralization is reflected in the virtual population of microblogging platforms. A consequence of centralization is the decreased visibility and findability of content from less central locations. We propose to counteract geographical centralization in microblogging timelines by promoting geographical diversity through: 1) a characterization of imbalance in location interaction centralization over a graph of geographical interactions from user generated content; 2) geolocation of microposts using imbalance-aware content features in text classifiers, and evaluation of those classifiers according to their diversity and accuracy; 3) definition of a two-step information filtering algorithm to ensure diversity in summary timelines of events. We study our proposal through an analysis of a dataset of Twitter in Chile, in the context of the 2012 municipal political elections.
Inferring nationalities of Twitter users and studying inter-national linking BIBAFull-Text 237-242
  Wenyi Huang; Ingmar Weber; Sarah Vieweg
Twitter user profiles contain rich information that allows researchers to infer particular attributes of users' identities. Knowing identity attributes such as gender, age, and/or nationality is a first step in many studies which seek to describe various phenomena related to computational social science. Often, it is through such attributes that studies of social media that focus on, for example, the isolation of foreigners, become possible. However, such characteristics are not often clearly stated by Twitter users, so researchers must turn to other means to ascertain various categories of identity. In this paper, we discuss the challenge of detecting the nationality of Twitter users using rich features from their profiles. In addition, we look at the effectiveness of different features as we go about this task. For the case of a highly diverse country -- Qatar -- we provide a detailed network analysis with insights into user behaviors and linking preference (or the lack thereof) to other nationalities.
Sociolinguistic analysis of Twitter in multilingual societies BIBAFull-Text 243-248
  Suin Kim; Ingmar Weber; Li Wei; Alice Oh
In a multilingual society, language not only reflects culture and heritage, but also has implications for social status and the degree of integration in society. Different languages can be a barrier between monolingual communities, and the dynamics of language choice could explain the prosperity or demise of local languages in an international setting. We study this interplay of language and network structure in diverse, multi-lingual societies, using Twitter. In our analysis, we are particularly interested in the role of bilinguals. Concretely, we attempt to quantify the degree to which users are the "bridge-builders" between monolingual language groups, while monolingual users cluster together. Also, with the revalidation of English as a lingua franca on Twitter, we reveal users of the native non-English language have higher influence than English users, and the language convergence pattern is consistent across the regions. Furthermore, we explore for which topics these users prefer their native language rather than English. To the best of our knowledge, this is the largest sociolinguistic study in a network setting.
Co-following on Twitter BIBAFull-Text 249-254
  Venkata Rama Kiran Garimella; Ingmar Weber
We present an in-depth study of co-following on Twitter based on the observation that two Twitter users whose followers have similar friends are also similar, even though they might not share any direct links or a single mutual follower. We show how this observation contributes to (i) a better understanding of language-agnostic user classification on Twitter, (ii) eliciting opportunities for Computational Social Science, and (iii) improving online marketing by identifying cross-selling opportunities. We start with a machine learning problem of predicting a user's preference among two alternative choices of Twitter friends. We show that co-following information provides strong signals for diverse classification tasks and that these signals persist even when the most discriminative features are removed. Going beyond mere classification performance optimization, we present applications of our methodology to Computational Social Science. Here we confirm stereotypes such as that the country singer Kenny Chesney (@kennychesney) is more popular among @GOP followers, whereas Lady Gaga (@ladygaga) enjoys more support from @TheDemocrats followers.
   In the domain of marketing we give evidence that celebrity endorsement is reflected in co-following and we demonstrate how our methodology can be used to reveal the audience similarities between not so obvious entities such as Apple and Puma.
A behavior analytics approach to identifying tweets from crisis regions BIBAFull-Text 255-260
  Shamanth Kumar; Xia Hu; Huan Liu
The growing popularity of Twitter as an information medium has allowed unprecedented access to first-hand information during crises and mass emergency situations. Due to the sheer volume of information generated during a disaster, a key challenge is to filter tweets from the crisis region so their analysis can be prioritized. In this paper, we introduce the task of identifying whether a tweet is generated from crisis regions and formulate it as a decision problem. This problem is challenging due to the fact that only 1% of all tweets have location information. Existing approaches tackle this problem by predicting the location of the user using historical tweets from users or their social network. As collecting historical information is not practical during emergency situations, we investigate whether it is possible to determine that a tweet originates from the crisis region through the information in the tweet and the publishing user's profile.
Finding mr and mrs entity in the city of knowledge BIBFull-Text 261-266
  Vanessa Lopez Garcia; Martin Stephenson; Spyros Kotoulas; Pierpaolo Tommasi
Self-adaptive filtering using pid feedback controller in electronic commerce BIBAFull-Text 267-272
  Zeinab Noorian; Mohsen Mohkami; Julita Vassileva
The performance of e-marketplaces plays a crucial role in attracting and retaining buyers. For example, a variation in the delivered Quality of Services (QoS) can frustrate buyers and cause them to leave the e-marketplace, resulting in revenue loss. The inherent uncertainties of open marketplaces motivate the design of reputation systems to help buyers find honest feedback from other buyers (advisers). Defining the threshold for an acceptable level of honesty of advisers is very important, since inappropriately set thresholds would filter away possibly good advice, or the opposite -- allow malicious buyers to badmouth good services. However, there is currently no systematic approach for setting the honesty threshold. We propose a self-adaptive honesty threshold management mechanism based on a PID feedback controller. Experimental results show that adaptively tuning the honesty threshold to the market performance enables honest buyers to obtain higher quality of services in comparison with static threshold values defined by intuition and used in previous work.
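As a rough illustration of the feedback idea described in the abstract above, the following sketch uses a textbook discrete PID controller to nudge an honesty threshold toward a target level of market performance. It is not the authors' mechanism: the gains, the target, the clamp to [0, 1], the assumption that raising the threshold raises measured performance, and the names `PIDController` and `adapt_threshold` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PIDController:
    """Textbook discrete PID controller (gains are illustrative assumptions)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error: float, dt: float = 1.0) -> float:
        # Accumulate the integral term and estimate the derivative term.
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def adapt_threshold(threshold, target, measured, pid):
    """Shift the threshold by the PID output and clamp it to [0, 1].

    Assumption: a positive error (performance below target) should raise
    the threshold; the real relationship depends on the marketplace.
    """
    error = target - measured
    adjusted = threshold + pid.update(error)
    return min(1.0, max(0.0, adjusted))
```

In use, one controller instance would be kept per marketplace and `adapt_threshold` called once per observation period with the latest performance measurement.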
On the choice of data sources to improve content discoverability via textual feature optimization BIBAFull-Text 273-278
  Elizeu Santos-Neto; Tatiana Pontes; Jussara Almeida; Matei Ripeanu
A large portion of the audience of video content items on the web currently comes from keyword-based search and/or tag-based navigation. Thus, the textual features of this content (e.g., the title, description, and tags) can directly impact the view count of a particular content item, and ultimately the advertisement generated revenue. More importantly, the textual features can generally be optimized to attract more search traffic. This study makes progress on the problem of automating tag selection for online video content with the goal of increasing viewership. It brings two key insights: first, based on evidence that existing tags for YouTube videos can be improved by an automated tag recommender, even for a sample of well curated movies, it explores the impact of using information mined from repositories created by different production modes (e.g., peer- and expert-produced); second, this study performs a preliminary characterization of the factors that impact the quality of the tag recommendation pipeline for different input data sources.
On the predictability of talk attendance at academic conferences BIBAFull-Text 279-284
  Christoph Scholz; Jens Illig; Martin Atzmueller; Gerd Stumme
This paper focuses on the prediction of real-world talk attendances at academic conferences with respect to different influence factors. We study and discuss the predictability of talk attendances using real-world face-to-face contact data and user interests extracted from the users' previous publications. For our experiments, we apply RFID-tracked talk attendance information captured at the ACM Conference on Hypertext and Hypermedia 2011. We find that contact and similarity networks achieve comparable results, and that combining these networks helps to a limited extent to improve the prediction quality.
Twitter in academic conferences: usage, networking and participation over time BIBAFull-Text 285-290
  Xidao Wen; Yu-Ru Lin; Christoph Trattner; Denis Parra
Twitter is often referred to as a backchannel for conferences. While the main conference takes place in a physical setting, attendees and virtual attendees socialize, introduce new ideas or broadcast information by microblogging on Twitter. In this paper we analyze scholars' Twitter use in 16 Computer Science conferences over a timespan of five years. Our primary finding is that over the years there are increasing differences with respect to conversation use and information use in Twitter. We studied the interaction network between users to understand whether assumptions about the structure of the conversations hold over time and between different types of interactions, such as retweets, replies, and mentions. While 'people come and people go,' we want to understand what keeps people engaged with the conference on Twitter. By casting the problem as a classification task, we find different factors that contribute to the continuing participation of users in the online Twitter conference activity. These results have implications for research communities seeking to implement strategies for continuous and active participation among members.

Posters and demos

A rating aggregation method for generating product reputations BIBAFull-Text 291-293
  Ahmad Abdel-Hafez; Yue Xu; Audun Jøsang
Many websites offer customers the opportunity to rate items and then use these ratings to generate item reputations, which other users can later use for decision making. The aggregated value of the ratings per item represents the reputation of this item. The accuracy of the reputation scores is important as they are used to rank items.
   Most aggregation methods do not consider the frequency of distinct ratings, and they have not been tested for how accurate their reputation scores are over different datasets with different sparsity. In this work we propose a new aggregation method which can be described as a weighted average, where weights are generated using the normal distribution. The evaluation result shows that the proposed method outperforms state-of-the-art methods over datasets of different sparsity.
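The abstract above describes the method only as a weighted average with weights generated from the normal distribution, so the sketch below is one possible reading rather than the authors' scheme. It assumes that ratings are sorted and that the i-th sorted rating is weighted by a normal density centred on the middle position; the function names and the choice of spread are hypothetical.

```python
import math


def normal_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def normal_weighted_reputation(ratings):
    """Weighted average of ratings with normal-density weights.

    Assumption for illustration: ratings are sorted and the i-th sorted
    rating is weighted by a normal density centred on the middle position,
    so ratings near the median count more than extreme ones.
    """
    if not ratings:
        raise ValueError("ratings must be non-empty")
    ordered = sorted(ratings)
    n = len(ordered)
    mean = (n - 1) / 2.0
    std = n / 4.0  # spread is an arbitrary illustrative choice
    weights = [normal_pdf(i, mean, std) for i in range(n)]
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, ordered)) / total
```

Under this weighting a single outlying rating pulls the score less than it would pull a plain arithmetic mean, which is the kind of behaviour a frequency-aware aggregation is meant to provide.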
A focused crawler for mining hate and extremism promoting videos on YouTube BIBAFull-Text 294-296
  Swati Agarwal; Ashish Sureka
Online video sharing platforms such as YouTube contain several videos and users promoting hate and extremism. Due to the low barrier to publication and anonymity, YouTube is misused as a platform by some users and communities to post negative videos disseminating hatred against a particular religion, country or person. We formulate the problem of identification of such malicious videos as a search problem and present a focused-crawler based approach consisting of various components performing several tasks: search strategy or algorithm, node similarity computation metric, learning from exemplary profiles serving as training data, stopping criterion, node classifier and queue manager. We implement a best-first search algorithm and conduct experiments to measure the accuracy of the proposed approach. Experimental results demonstrate that the proposed approach is effective.
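The components listed in the abstract above (search strategy, node classifier, stopping criterion, queue manager) can be sketched generically as a best-first traversal over a priority queue. This is a minimal sketch and not the authors' crawler: it performs no network access, and `neighbors`, `score`, `threshold` and `max_nodes` are hypothetical stand-ins for the link extractor, the similarity metric, the classifier cut-off and the stopping criterion.

```python
import heapq


def best_first_crawl(seeds, neighbors, score, threshold=0.5, max_nodes=100):
    """Best-first traversal: always expand the highest-scoring frontier node.

    seeds      -- iterable of starting node ids
    neighbors  -- function: node -> iterable of linked nodes (hypothetical)
    score      -- function: node -> relevance in [0, 1] (hypothetical)
    threshold  -- nodes scoring below this are not expanded
    max_nodes  -- stopping criterion on the number of visited nodes
    """
    seeds = list(seeds)
    # heapq is a min-heap, so scores are negated to pop the best node first.
    frontier = [(-score(s), s) for s in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    relevant = []
    visited = 0
    while frontier and visited < max_nodes:
        neg_score, node = heapq.heappop(frontier)
        visited += 1
        if -neg_score < threshold:
            continue  # classified as irrelevant: do not expand
        relevant.append(node)
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-score(nxt), nxt))
    return relevant
```

The `seen` set plays the role of a simple queue manager, ensuring that each node enters the frontier at most once.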
Spatio-temporal quality issues for local search BIBAFull-Text 297-299
  Dirk Ahlers
Geographic search is routinely used in many services and applications that exploit the availability of Web content which is related to a real world place, region or object. However, do you trust the location information? Who has not had the experience that the restaurant they went to has just moved to another part of the city or shut down? Local search returns located results, e.g., extracted entities located in a certain spot or area, but their quality can be difficult to judge. Compared to normal Web search, local Web search has additional inherent issues due to factors such as insufficient semantics, ambiguity of references, imprecise mapping, or unknown status of the real-world entities described in documents. We present selected issues and features of geospatial quality and credibility based on spatial, temporal, and topical indicators as an additional measurement of spatial relevance.
A two-tier index architecture for fast processing large RDF data over distributed memory BIBAFull-Text 300-302
  Long Cheng; Spyros Kotoulas; Tomas E. Ward; Georgios Theodoropoulos
We propose an efficient method for fast processing of large RDF data over distributed memory. Our approach adopts a two-tier index architecture on each computation node: (1) a light-weight primary index, to keep loading times low, and (2) a dynamic, multi-level secondary index, calculated as a by-product of query execution, to decrease or remove inter-machine data movement for subsequent queries that contain the same graph patterns. Experimental results on a commodity cluster show that we can load large RDF data very quickly in memory while remaining within an interactive range for query processing with the secondary index.
Spatial hypertext modeling for dynamic contents authoring system based on transclusion BIBAFull-Text 303-304
  Ja-Ryoung Choi; Sungeun An; Soon-Bum Lim
This paper proposed a web content collecting model to reuse a variety of web contents based on Transclusion. Transclusion is a model for collecting existing web contents and including them into a new document. However, Transclusion lacks consideration of copyright issues and dynamic changes. Therefore, we classified Transclusions into three different types based on copyright restrictions: Trans-quotation, Trans-reference and Trans-annotation. Then we represented Transclusions in each different type of spatial hypertext model. Also, we designed RVS (ReVerse Syndication) model in order to trace the dynamic changes.
TagRec: towards a standardized tag recommender benchmarking framework BIBAFull-Text 305-307
  Dominik Kowald; Emanuel Lacic; Christoph Trattner
In this paper, we introduce TagRec, a standardized tag recommender benchmarking framework implemented in Java. The purpose of TagRec is to provide researchers with a framework that supports all steps of the development process of a new tag recommendation algorithm in a reproducible way, including methods for data pre-processing, data modeling, data analysis and recommender evaluation against state-of-the-art baseline approaches. We show the performance of the algorithms implemented in TagRec in terms of prediction quality and runtime using an evaluation of a real-world folksonomy dataset. Furthermore, TagRec contains two novel tag recommendation approaches based on models derived from human cognition and human memory theories.
SocRecM: a scalable social recommender engine for online marketplaces BIBAFull-Text 308-310
  Emanuel Lacic; Dominik Kowald; Christoph Trattner
This paper presents work-in-progress on SocRecM, a novel social recommendation framework for online marketplaces. We demonstrate that SocRecM is easy to integrate with existing Web technologies through a RESTful, scalable and easy-to-extend service-based architecture. Moreover, we reveal the extent to which various social features and recommendation approaches are useful in an online social marketplace environment.
Cross-hierarchical communication in Twitter conflicts BIBAFull-Text 311-312
  Zhe Liu; Ingmar Weber
Social hierarchy plays an important role in shaping the way individuals interact with each other. In this study, we propose three metrics (equality, diversity, and reciprocity) to evaluate the social hierarchical differences in cross-ideological communication on Twitter. We do this within the context of three diverse conflicts: Israel-Palestine, US Democrats-Republicans, and FC Barcelona-Real Madrid. In all cases, we collect data around a central pair of Twitter accounts representing the two main parties. Our results show in a quantitative manner that social hierarchy can be considered a factor that impacts individuals' communication in Twitter conflicts. As one of the first studies in this area, we demonstrate social hierarchy's effect in online environments.
A DSL based on CSS for hypertext adaptation BIBAFull-Text 313-315
  Alejandro Montes García; Paul De Bra; George H. L. Fletcher; Mykola Pechenizkiy
Personalization offered by Adaptive Hypermedia and Recommender Systems is effective for tackling the information overload problem. However, the development of Adaptive Web-Based Systems is cumbersome. In order to ease the development of such systems, we propose a language based on CSS to express personalization in web systems that captures current adaptation techniques.
Fake tweet buster: a webtool to identify users promoting fake news on Twitter BIBAFull-Text 316-317
  Diego Saez-Trumper
We present the "Fake Tweet Buster" (FTB), a web application that identifies tweets with fake images and users that are consistently uploading and/or promoting fake information on Twitter. To do that we mix three techniques: (i) reverse image searching, (ii) user analysis and (iii) a crowdsourcing approach to detect this kind of malicious user on Twitter. Using that information we provide a credibility classification for the tweet and the user.
Inferring social ties from common activities in Twitter BIBAFull-Text 318-320
  Umang Sharma; Abhishek Suman; Saswata Shannigrahi
We investigate the extent to which we can infer social ties between a pair of users in the online social network Twitter, based on their common activities, defined by the number of common celebrity profiles they are following. In this work, we analyze the list of celebrities that a set of Twitter users were following in December 2013 to infer the social ties that existed between these users until July 2009. We use two probabilistic models given by Kossinets et al. [Science, 2006] and Crandall et al. [PNAS, 2010] for this purpose. The model of Kossinets et al. is meant to give an upper bound for the probability of friendship between a pair of users, whereas the model by Crandall et al. is supposed to give an almost accurate estimate of the same. We observe that the model of Kossinets et al. is able to give an upper bound whereas the model given by Crandall et al. is unable to give an almost accurate estimate for our dataset. However, the model by Crandall et al. is observed to provide a correct estimate of the probability of friendship between the users when we consider following a particular type of celebrity profile, e.g. CEO, Author etc., as the activity of a user.
FoP: never-ending face recognition and data lifting BIBAFull-Text 321-323
  Julien Subercaze; Christophe Gravier
In this demonstration, we present Faces of Politics (FoP), a system that detects faces in pictures illustrating news articles. The first iteration of the face recognition model propelling FoP was trained using Freebase data about politicians and their pictures. FoP is a never-ending system: when a new face is recognized, the learned model is updated accordingly. At this step, FoP also gives data back to the LoD cloud that fed it in the first place: it leverages visual knowledge as Linked Data.
Why you follow: a classification scheme for Twitter follow links BIBAFull-Text 324-326
  Atsushi Tanaka; Hikaru Takemura; Keishi Tajima
Twitter is used for various purposes, such as information publishing/gathering, open discussions, and personal communications. As a result, there are various types of follow links. In this paper, we propose a scheme for classifying follow links according to the followers' intention. The scheme consists of three axes: user-orientation, content-orientation, and mutuality. The combination of these three axes can classify most major types of follow links. Our experimental results suggest that the type of a follow link does not solely depend on the type of the followee nor solely on the type of the follower. The results also suggest that the proposed three axes are highly independent of one another.
Buon appetito: recommending personalized menus BIBAFull-Text 327-329
  Michele Trevisiol; Luca Chiarandini; Ricardo Baeza-Yates
This paper deals with the problem of menu recommendation, namely recommending menus that a person is likely to consume at a particular restaurant. We mine restaurant reviews to extract food words, and we apply sentiment analysis to each sentence in order to compute the individual food preferences. Then we extract frequent combinations of dishes using a variation of the Apriori algorithm. Finally, we propose several recommender systems to provide suggestions of food items or entire menus, i.e. sets of dishes.
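The abstract above mentions a variation of the Apriori algorithm without giving details, so the sketch below shows only the plain textbook version applied to sets of dishes. The function name `apriori`, the use of an absolute support count, and the example dishes are assumptions for illustration, not the authors' variant.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(items): support_count} for all frequent itemsets.

    transactions -- list of sets of items (here: dishes eaten together)
    min_support  -- minimum number of transactions containing the itemset
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count how many transactions contain each candidate itemset.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {frozenset([i]) for t in transactions for i in t}
    frequent = count(items)
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result
```

For example, given the hypothetical menus `[{"pizza", "salad"}, {"pizza", "salad", "wine"}, {"pizza", "wine"}, {"salad"}]` and `min_support=2`, the pair pizza and salad is reported as frequent, while salad and wine, which co-occur only once, is not.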
AIRCacher: virtual geocaching powered with augmented reality BIBAFull-Text 330-332
  Gianluca Tursi; Martina Deplano; Giancarlo Ruffo
Nowadays, smartphones and digital networks are being heavily used as data sources for research on social networks. Our daily experiences, interactions and transactions are recorded thanks to the digital traces that users leave behind in their activities, both individual and social. In this work, we describe AiRCacher, a mobile app for virtual geocaching enhanced with Augmented Reality. By following gamification and Game With A Purpose design approaches, the aim is to bring people outside and make them move, by hiding and seeking virtual caches. As a side effect of their gaming activity, they become social sensors able to provide geo-located social data. Therefore, the aim of our work is to carry out data analyses of users' outdoor behaviors, looking for findings such as trending places for different cache typologies and the detection of interesting events emerging from the concentration of caches in specific places.