
Proceedings of the 2013 International Conference on Social Informatics

Fullname:SocInfo 2013: 5th International Conference on Social Informatics
Editors:Adam Jatowt; Ee-Peng Lim; Ying Ding; Asako Miura; Taro Tezuka; Gaël Dias; Katsumi Tanaka; Andrew Flanagin; Bing Tian Dai
Location:Kyoto, Japan
Dates:2013-Nov-25 to 2013-Nov-27
Publisher:Springer International Publishing
Series:Lecture Notes in Computer Science 8238
Standard No:DOI: 10.1007/978-3-319-03260-3 hcibib: SocInfo13; ISBN: 978-3-319-03259-7 (print), 978-3-319-03260-3 (online)
Links:Online Proceedings | Conference Website (defunct)

Proceedings of the 2013 International Conference Workshops on Social Informatics

Fullname:SocInfo 2013: 5th International Workshops on Social Informatics: QMC and HISTOINFORMATICS, Revised Selected Papers
Editors:Akiyo Nadamoto; Adam Jatowt; Adam Wierzbicki; Jochen L. Leidner
Location:Kyoto, Japan
Publisher:Springer Berlin Heidelberg
Series:Lecture Notes in Computer Science 8359
Standard No:DOI: 10.1007/978-3-642-55285-4 hcibib: SocInfo13w; ISBN: 978-3-642-55284-7 (print), 978-3-642-55285-4 (online)
Links:Online Proceedings | Conference Website (defunct)
Modeling Analogies for Human-Centered Information Systems, pp. 1-15
  Christoph Lofi; Christian Nieke
This paper introduces a conceptual model for representing queries, statements, and knowledge in an analogy-enabled information system. Analogies are considered to be one of the core concepts of human cognition and communication, and are very efficient at conveying complex information in a natural fashion. Integrating analogies into modern information systems paves the way for future truly human-centered paradigms for interacting with data and information, and opens up a number of interesting scientific challenges, especially due to the ambiguous and often consensual nature of analogy statements. Our proposed conceptual analogy model therefore provides a unified model for representing analogies of varying complexity and type, while an additional layer of interpretation models adapts and adjusts the operational semantics for different data sources and approaches, avoiding the shortcomings of any single approach. Here, especially the Social Web promises to be a premier source of analogical knowledge due to its rich variety and subjective content, and therefore we outline first steps for harnessing this valuable information for future human-centered information systems.
Resilience of Social Networks under Different Attack Strategies, pp. 16-29
  Mohammad Ayub Latif; Muhammad Naveed; Faraz Zaidi
Recent years have seen the world become a closely connected society with the emergence of different types of social networks. Online social networks have provided a way to bridge long distances and establish numerous communication channels which were not possible earlier. These networks exhibit interesting behavior under intentional attacks and random failures where different structural properties influence the resilience in different ways.
   In this paper, we perform two sets of experiments and draw conclusions from the results pertaining to the resilience of social networks. The first experiment performs a comparative analysis of four different classes of networks, namely small world networks, scale free networks, small world-scale free networks and random networks, with four semantically different social networks under different attack strategies. The second experiment compares the resilience of these semantically different social networks under different attack strategies. Empirical analysis reveals interesting behavior of different classes of networks with different attack strategies.
Keywords: Resilience of Networks; Targeted Attacks; Social Networks
Changing with Time: Modelling and Detecting User Lifecycle Periods in Online Community Platforms, pp. 30-39
  Matthew Rowe
In this paper we define the development of a user from entry to churn as his lifecycle that can be divided into discrete stages of development known as lifecycle periods. Prior work has examined how social networking site users have developed along isolated dimensions using lexical information [2] and social connections in the context of telecommunications networks [6]. We unify such dimensions by modelling and examining how users develop both socially and lexically, through contrasts of user properties (e.g. time-delimited in-degree distributions) against: (i) prior user properties; and (ii) the community in which the user is interacting. We identify salient traits of user development characterisable in the form of growth features, and demonstrate the applicability of such features within a vector space model by outperforming several baselines when detecting the lifecycle period of a given user.
Metro: Exploring Participation in Public Events, pp. 40-45
  Luca Chiarandini; Luca Maria Aiello; Neil O'Hare; Alejandro Jaimes
The structure of a social network is time-dependent, as relationships between entities change in time. In large networks, static or animated visualizations are often insufficient to capture all the information about the interactions between people over time, which could be captured better by interactive interfaces. We propose a novel system for exploring the interactions of entities over time, and support it with an application that displays interactions of public figures at events.
Follow My Friends This Friday! An Analysis of Human-Generated Friendship Recommendations, pp. 46-59
  Ruth Garcia Gavilanes; Neil O'Hare; Luca Maria Aiello; Alejandro Jaimes
Online social networks support users in a wide range of activities, such as sharing information and making recommendations. In Twitter, the hashtag #ff, or #followfriday, arose as a popular convention for users to create contact recommendations for others. Hitherto, there has not been any quantitative study of the effect of such human-generated recommendations. This paper is the first study of a large-scale corpus of human friendship recommendations based on such hashtags, using a large corpus of recommendations gathered over a 24-week period and involving a set of nearly 6 million users. We show that these explicit recommendations have a measurable effect on the process of link creation, increasing the chance of link creation between two and three times on average, compared with a recommendation-free scenario. Also, ties created after such recommendations have up to 6% more longevity than other Twitter ties. Finally, we build a supervised system to rank user-generated recommendations, surfacing the most valuable ones with high precision (0.52 MAP), and we find that features describing users and the relationships between them are discriminative for this task.
A Divide-and-Conquer Approach for Crowdsourced Data Enumeration, pp. 60-74
  Hideto Aoki; Atsuyuki Morishima
Crowdsourced data enumeration, in which the Web crowd is requested to enumerate data items within a specified range, is important in many Web applications such as hotel reviews. This paper presents a processing method for crowdsourced data enumeration on microtask-based crowdsourcing platforms. A general approach to achieving a high recall in data enumeration is to apply the divide-and-conquer principle. However, how to apply the principle to data enumeration on microtask-based crowdsourcing platforms is not trivial. The proposed method is unique in that the workers join the process of generating smaller tasks in a divide-and-conquer fashion, and the programmer does not need to provide many microtasks in advance. This paper explains the method, provides theoretical results to show the method works well with microtask-based platforms, and explains our experimental results, which suggest that the proposed method can achieve higher recall and produce appropriate tasks for microtask-based crowdsourcing.
Social Listening for Customer Acquisition, pp. 75-80
  Juan Du; Biying Tan; Feida Zhu; Ee-Peng Lim
Social network analysis has received much attention from corporations recently. Corporations are trying to utilize social media platforms such as Twitter, Facebook and Sina Weibo to expand their own markets. Our system is an online tool to help these corporations 1) find potential customers, and 2) track a list of users by specific events from social networks. We employ both textual and network information, and thus produce a keyword-based relevance score for each user in pre-defined dimensions, which indicates the probability of the adoption of a product. Based on the score and its trend, our tool is able to pick up the potential customers for different kinds of products, such as suits, which are time-insensitive, and diapers, which are time-sensitive. In order to detect the scenario of purchasing products as gifts, we filter the user network and only consider the off-line close friend network. In addition, we can track users in a more flexible way. Beyond the pre-defined dimensions, our tool is also able to track users by customized events and catch those who mention the event at an early stage.
Passive Participation in Communities of Practice: Scope and Motivations, pp. 81-94
  Azi Lev-On; Odelia Adler
In spite of its prevalence in online social media platforms, there have been very few studies of passive participation. The current study uses interviews to understand the motivations for passive participation in online communities of the Israeli Ministry of Social Services. In addition to the motivations commonly found when not participating "more actively", such as concerns of criticism, lack of need, lack of motivation and technological concerns, users stated two additional and secondary reasons for not contributing content in the communities: concern that their posts would not be addressed (i.e. lack of reciprocation), and reasons relating to the graphical interface of the communities.
Keywords: communication; online communities; communities of practice; participation; passive participation; lurking
An Ontology-Based Approach to Sentiment Classification of Mixed Opinions in Online Restaurant Reviews, pp. 95-108
  Hea-Jin Kim; Min Song
Consumers review other consumers' opinions and experiences of the quality of various products before making a purchase. Automatic sentiment analysis of word of mouth (WOM) in the form of user product reviews, blog posts and comments in online forums can support strategies in areas such as search engines, recommender systems, and market research, and can benefit both consumers and sellers. The ontology-based approach designed in this work aims to investigate how to detect and classify mixed positive and negative opinions by interpreting them with an ontology containing opinion information on terms. Our research question is whether disinterested subjectivity scores of a sentiment ontology are pertinent to sentiment orientations not affected by reviewers' linguistic bias. The experimental results adopting an opinion lexical resource achieve better and more stable performance in F-measure.
Keywords: opinion mining; sentiment analysis; ontology-based approach; SentiWordNet
A Novel Social Event Recommendation Method Based on Social and Collaborative Friendships, pp. 109-118
  Yu-Chun Sun; Chien Chin Chen
Many social network sites (SNSs) provide social event functions to facilitate user interactions. However, it is difficult for users to find interesting events among the huge number posted on such sites. In this paper, we investigate the problem and propose a social event recommendation method that exploits users' social and collaborative friendships to recommend events of interest. As events are one-and-only items, their ratings are not available until they are over. Hence, traditional recommendation methods are incapable of event recommendation because they need sufficient ratings to generate recommendations. Instead of using ratings, we analyze the behavior patterns of social network users to measure their social and collaborative friendships. The friendships are aggregated to identify the acquaintances of a user, and events relevant to the preferences of the acquaintances and the user are recommended. The results of experiments show that the proposed method is effective and that it outperforms many well-known recommendation methods.
Keywords: social network; recommendation systems; friendship analysis
Factors That Influence Social Networking Service Private Information Disclosure at Diverse Openness and Scopes, pp. 119-128
  Basilisa Mvungi; Mizuho Iwaihara
In this paper, we present findings about factors that influence private information disclosure activities in social networking services (SNS). Our study is based on two data sets: (1) Survey Data, responses to a questionnaire consisting of items such as users' privacy settings, motivations of using SNS (motivations), and risk awareness, and (2) Public Data, consisting of user profiles that were opened to the public. An openness score is calculated from the number of disclosed items. We study influential factors and their rankings at varying openness scores and disclosure scopes. Our findings reveal that gender, profile photo, certain motivations, and risk awareness highly affect private information disclosure activities. However, the ranking of influential factors is not uniform. Gender and profile photo have greater influence; however, their influence decreases and loses significance as openness increases, falling behind motivations and number of friends. We also observe consistent tendencies across both data sets.
Keywords: Information disclosure; motivations of using SNS; privacy; risk awareness; social networking service
An Approach to Building High-Quality Tag Hierarchies from Crowdsourced Taxonomic Tag Pairs, pp. 129-138
  Fahad Almoqhim; David E. Millard; Nigel Shadbolt
Building taxonomies for web content is costly. An alternative is to allow users to create folksonomies, collective social classifications. However, folksonomies lack structure and their use for searching and browsing is limited. Current approaches for acquiring latent hierarchical structures from folksonomies have had limited success. We explore whether asking users for tag pairs, rather than individual tags, can increase the quality of derived tag hierarchies. We measure the usability cost, and in particular the cognitive effort required to create tag pairs rather than individual tags. Our results show that a hierarchy creation algorithm (Heymann-Benz) performs better when applied to tag pairs than when applied to individual tags, with little impact on usability. However, the resulting hierarchies lack richness, and could be seen as less expressive than those derived from individual tags. This indicates that expressivity, not usability, is the limiting factor for collective tagging approaches aimed at crowdsourcing taxonomies.
Keywords: Folksonomies; Taxonomies; Collective Intelligence; Social Information Processing; Social Metadata; Tag similarities
Automating Credibility Assessment of Arabic News, pp. 139-152
  Mohamed Hammad; Elsayed Hemayed
During the past few years the internet has witnessed a massive increase in Arabic language users. Accompanying this increase in the number of users is an increase in e-publishing. However, necessary laws and regulations are not yet available to control the credibility of e-published content. Furthermore, many political conflicts have arisen after the Arab Spring. All of this has led to an increasing demand for assessing the credibility of news in general and e-news in particular.
   In this work, we present a system for automating credibility assessment of a news article based on two of the most important and most frequently violated criteria: (i) Does the news article indicate the source of its information? (ii) Does the news article indicate the time of occurrence of the reported event? For each of the chosen criteria, we build a classification model to classify a news article as either violating the criterion or not. News articles previously evaluated by MCE Watch (a manual service for news credibility assessment) are used in building and evaluating our model. Experimental evaluations show that our model has accuracy that exceeds 82% for both criteria.
Keywords: Arabic language; credibility; machine learning; natural language processing; news
Polarity Detection of Foursquare Tips, pp. 153-162
  Felipe Moraes; Marisa Vasconcelos; Patrick Prado; Daniel Dalip; Jussara M. Almeida; Marcos Gonçalves
In location-based social networks, such as Foursquare, users may post tips with their opinions about visited places. Tips may directly impact the behavior of future visitors, providing valuable feedback to business owners. Sentiment or polarity detection has attracted great attention due to its vast applicability in opinion summarization, ranking or recommendation. However, the automatic detection of polarity of tips faces challenges due to their short sizes and informal content. This paper presents an empirical study of supervised and unsupervised techniques to detect the polarity of Foursquare tips. We evaluate the effectiveness of four methods on two sets of tips, finding that a simpler lexicon-based approach, which does not require costly manual labeling, can be as effective as state-of-the-art supervised methods. We also find that a hybrid approach that combines all considered methods by means of stacking does not significantly outperform the best individual method.
Keywords: Web 2.0 applications; Sentiment Analysis; Micro-reviews
The Study of Social Mechanisms of Organization, Boundary Capabilities, and Information System, pp. 163-176
  Shiuann-Shuoh Chen; Pei-Yi Chen; Min Yu; Yu-Wei Chuang
Exploring how organizational antecedents affect boundary capabilities, this study identifies the differing effects for three components of boundary capabilities. The results indicate that the organizational mechanisms associated with the coordination capabilities primarily enhance a team's syntactic transfer, semantic translation, and pragmatic transformation. The organizational mechanisms associated with the socialization capabilities primarily increase a team's semantic translation and pragmatic transformation. The information system primarily enhances the coordination and socialization capabilities. Our findings reveal why teams may have difficulty managing the levels of syntactic transfer, semantic translation, and pragmatic transformation, and why they vary in their ability to create value from their boundary capabilities.
Keywords: organization mechanisms; boundary capabilities; knowledge sharing; information system; system capabilities; coordination capabilities; socialization capabilities
Predicting User's Political Party Using Ideological Stances, pp. 177-191
  Swapna Gottipati; Minghui Qiu; Liu Yang; Feida Zhu; Jing Jiang
Predicting users' political party in social media has important impacts on many real world applications such as targeted advertising, recommendation and personalization. Several political research studies indicate that political parties' ideological beliefs on sociopolitical issues may influence users' political leaning. In our work, we exploit users' ideological stances on controversial issues to predict the political party of online users. We propose a collaborative filtering approach to solve the data sparsity problem of users' stances on ideological topics and apply a clustering method to group the users with the same party. We evaluated several state-of-the-art methods for the party prediction task on the debate.org dataset. The experiments show that using ideological stances with the Probabilistic Matrix Factorization (PMF) technique achieves a high accuracy of 88.9% at a 22.9% data sparsity rate and 80.5% at a 70% data sparsity rate on the users' party prediction task.
Keywords: Collaborative Filtering; Ideological Stances; Memory-based CF; Model-based CF; Probabilistic Matrix Factorization
A Fast Method for Detecting Communities from Tripartite Networks, pp. 192-205
  Kyohei Ikematsu; Tsuyoshi Murata
This paper proposes a fast method for detecting communities from tripartite networks. Our method is based on an optimization of tripartite modularity, and the method combines both edge clustering and Blondel's Fast Unfolding. Experimental results on synthetic tripartite networks show that accurate communities are detected with our method. Furthermore, an experiment on a real tripartite network shows that our method is scalable to tripartite networks of tens of thousands of vertices. To the best of our knowledge, this is the first attempt at analyzing real tripartite networks composed of tens of thousands of vertices.
Keywords: community detection; modularity; tripartite networks
Predicting Social Density in Mass Events to Prevent Crowd Disasters, pp. 206-215
  Bernhard Anzengruber; Danilo Pianini; Jussi Nieminen; Alois Ferscha
Human mobility behavior emerging in social events involving huge masses of individuals bears potential hazards for irrational social densities. We study the emergence of such phenomena in the context of very large public sports events, analyzing how individual mobility decision making induces undesirable mass effects. A time series based approach is followed to predict mobility patterns in crowds of spectators, and related to the event agenda over the time it evolves. Evidence is collected from an experiment conducted in one of the biggest international sports events (the Vienna city marathon with 40,000 active participants and around 300,000 spectators). A smartphone app has been developed to voluntarily engage people to provide mobility data (1,503 high-quality GPS traces and 1,092,694 Bluetooth relations have been collected), based on which prediction analysis has been performed. Using this data as a training set, we compare density estimation approaches and evaluate them based on their forecasting precision. The most promising approach using Support Vector Regression (SMOreg) achieved prediction errors below 2 (root-mean-squared deviation) when compared to actual evidenced density distributions for a 12-minute forecasting interval.
Modeling Social Capital in Bureaucratic Hierarchy for Analyzing Promotion Decisions, pp. 216-226
  Jyi-Shane Liu; Zhuan-Yao Lin; Ke-Chih Ning
We report research results in applying social network analysis to develop a data-driven computational approach for social scientists to perform investigative exploration in analyzing bureaucratic promotion. We consider social capital as a primary determinant of promotion decisions in bureaucratic hierarchy and propose a hybrid multiplex social network model for representing relational and structural information among entities. The approach develops quantified assessment of social capital and provides objective evaluation of promotion decisions in anterior prediction. Experimental results with actual government officials' career data provide evidence of the effectiveness and the utility of social capital evaluation for bureaucratic promotion decisions.
Keywords: social network analysis; social capital assessment; bureaucratic promotion decision
Information vs Interaction: An Alternative User Ranking Model for Social Networks, pp. 227-240
  Wei Xie; Ai Phuong Hoang; Feida Zhu; Ee-Peng Lim
Recent years have seen an unprecedented boom of social network services, such as Twitter, which boasts over 200 million users. In such big social platforms, the influential users are ideal targets for viral marketing to potentially reach an audience of maximal size. Most proposed algorithms use the linkage structure of the underlying network to measure the information flow and hence evaluate a user's influence. Yet that is not the full story for social networks. In this paper, we propose to examine users' influence from a social interaction perspective. We built a ranking model based on the dynamic user interactions taking place on top of these underlying linkage structures. In particular, in the Twitter setting we supposed a principle of balanced retweet reciprocity, and then formulated it to re-evaluate the value of Twitter users. Our experiments on real Twitter data demonstrated that our proposed model presents different yet equally insightful user ranking results.
Keywords: Twitter; user ranking; retweet behaviour
Feature Extraction and Summarization of Recipes Using Flow Graph, pp. 241-254
  Yoko Yamakata; Shinji Imahori; Yuichi Sugiyama; Shinsuke Mori; Katsumi Tanaka
These days, there are more than a million recipes on the Web. When you search for a recipe with one query such as "nikujaga," the name of a typical Japanese food, you can find thousands of "nikujaga" recipes as the result. Even if you focus on only the top ten results, it is still difficult to find out the characteristic feature of each recipe because cooking is a workflow that includes parallel procedures. According to our survey, people place the most importance on the differences in cooking procedures when they compare recipes. However, such differences are difficult to extract just by comparing the recipe texts, as existing methods do. Therefore, our system extracts (i) a general way to cook as a summary of cooking procedures and (ii) the characteristic features of each recipe by analyzing the workflows of the top ten results. In the experiments, our method succeeded in extracting 54% of manually extracted features while the previous research addressed 37% of them.
Unsupervised Opinion Targets Expansion and Modification Relation Identification for Microblog Sentiment Analysis, pp. 255-267
  Jenq-Haur Wang; Ting-Wei Ye
Microblogs bring challenges to existing research on sentiment analysis. First, short microblog messages might contain fewer content features. Second, it's difficult to know what users want to express without suitable contexts. On the other hand, people tend to express their opinions in microblog messages, which could be helpful to sentiment analysis. In this paper, we propose a sentiment analysis approach based on opinion target finding and modification relation identification in microblogs. First, user comments on specific topics are collected from microblogs and preprocessed to reduce noise. Then, opinion targets are expanded by discovering the most frequently co-occurring terms, named entities, and synonyms of the topic. Finally, according to modification relations among part-of-speech (POS) tags, we extract entities or aspects of the entities about which an opinion has been expressed and calculate the overall score of sentiment orientation. In our experiment on 1,000 reviews of 50 movies collected from Twitter, the proposed method can achieve an average accuracy of 84.4% and an average precision of 87.1%, which is better than content similarity with SVM and Naive Bayes. This validates the higher precision in sentiment orientation identification for the proposed approach.
Pilot Study toward Realizing Social Effect in O2O Commerce Services, pp. 268-273
  Tse-Ming Tsai; Ping-Che Yang; Wen-Nan Wang
Social media has become the most convenient space for retrieving the tremendous volume of consumers' experiences, opinions and preferences toward individual brands, products or even specific features. The real-time, high-volume characteristics of social media provide a great opportunity for producers to know their customers (and potential ones) well. This paper proposes an Online to Offline (O2O) Commerce Service Model and takes the social relationship dashboard as a pilot study, which can help retailers or brands understand their customers via existing social network data (especially Facebook in this case), by which we can adapt the current social commerce marketing strategy more quickly and responsively.
Keywords: Social Effect; Social Relation Management; O2O
The Estimation of aNobii Users' Reading Diversity Using Book Co-ownership Data: A Social Analytical Approach, pp. 274-283
  Muh-Chyun Tang; Yi-Ling Ke; Yi-Jin Sie
Usage data available through social media provides a great many opportunities to capture users' preferences. Using books saved in users' online bookshelves, the study set out to explore social network analytical methods to capture the diversity of a reader's reading interests. "Reading diversity" denotes how widely scattered one's reading interests are. Drawing data from aNobii, a social networking site for book lovers, users' reading diversity was defined by the number of components created by the book co-ownership network of the books in their bookshelves. Five book-book similarity measures were proposed and their clustering results were tested against users' self-assessed reading diversity in order to identify the best suited similarity measure and threshold for such a task. One of the proposed similarity measures produces clustering results that are significantly correlated with users' self-assessed diversity. Furthermore, a multiple regression analysis showed that the proposed measure was able to provide explanatory power for reading diversity over and above merely counting the number of books in the bookshelf.
Keywords: social network analysis; preference structure; book co-ownership network
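The component-counting definition in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the Jaccard similarity, the 0.2 threshold, and the names `jaccard`, `reading_diversity`, and the toy `owners` data are all assumptions made for illustration, since the abstract does not name the five similarity measures or the chosen threshold.

```python
from itertools import combinations


def jaccard(owners_a, owners_b):
    """Jaccard similarity of two owner sets (an assumed measure, not from the paper)."""
    union = owners_a | owners_b
    return len(owners_a & owners_b) / len(union) if union else 0.0


def reading_diversity(shelf, owners, threshold=0.2):
    """Count connected components among the books on one user's shelf.

    shelf: iterable of book ids; owners: dict mapping book id -> set of user ids.
    Two books are linked when their similarity reaches `threshold` (hypothetical value).
    """
    books = list(dict.fromkeys(shelf))  # de-duplicate while keeping order
    parent = {b: b for b in books}

    def find(b):
        # Union-find lookup with path halving.
        while parent[b] != b:
            parent[b] = parent[parent[b]]
            b = parent[b]
        return b

    for a, b in combinations(books, 2):
        if jaccard(owners.get(a, set()), owners.get(b, set())) >= threshold:
            parent[find(a)] = find(b)

    return len({find(b) for b in books})


# Toy data: two science fiction titles share owners; the other books do not.
owners = {
    "scifi_1": {"u1", "u2", "u3"},
    "scifi_2": {"u1", "u2"},
    "cooking": {"u4", "u5"},
    "poetry": {"u6"},
}
diversity = reading_diversity(["scifi_1", "scifi_2", "cooking", "poetry"], owners)
print(diversity)
```

Under these assumptions, a shelf whose books fall into more separate components is read as more diverse; raising the threshold removes links and can only keep or increase the component count.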
An Ontology-Based Technique for Online Profile Resolution, pp. 284-298
  Keith Cortis; Simon Scerri; Ismael Rivera; Siegfried Handschuh
Instance matching targets the extraction, integration and matching of instances referring to the same real-world entity. In this paper we present a weighted ontology-based user profile resolution technique which targets the discovery of multiple online profiles that refer to the same person identity. The elaborate technique takes into account profile similarities at both the syntactic and semantic levels, employing text analytics on top of open data knowledge to improve its performance. A two-staged evaluation of the technique performs various experiments to determine the best out of alternative approaches. These results are then considered in an improved algorithm, which is evaluated by real users, based on their real social network data. Here, a profile matching precision rate of 0.816 is obtained. The presented Social Semantic Web technique has a number of useful applications, such as detection of untrusted known persons behind anonymous profiles, and information sharing management across multiple social networks.
Keywords: profile resolution; person identity; named entity recognition; semantic relatedness; social networks; linked open data; social semantic web
Aspects of Rumor Spreading on a Microblog Network, pp. 299-308
  Sejeong Kwon; Meeyoung Cha; Kyomin Jung; Wei Chen; Yajun Wang
Rumors have been studied for several decades in social and psychological fields, where most studies were theory-driven and relied on surveys due to difficulties in gathering data. Rumor research is now gaining new perspectives, because online social media enable researchers to examine closely various kinds of information dissemination on the Internet. In this paper, we review social psychology literature on rumors and try to identify the key differences in the dissemination of rumors and non-rumors. The insights from this study can shed light on improving automatic classification of rumors and better comprehending rumor theories in online social media.
Keywords: Rumor; Social Media; Diffusion Structure; Linguistic Properties
Traffic Condition Is More Than Colored Lines on a Map: Characterization of Waze Alerts, pp. 309-318
  Thiago H. Silva; Pedro O. S. Vaz de Melo; Aline Carneiro Viana; Jussara M. Almeida; Juliana Salles; Antonio A. F. Loureiro
Participatory sensor networks (PSNs) enable the understanding of city dynamics and the urban behavioral patterns of city inhabitants. In this work, we focus our analysis on a specific PSN, derived from Waze, for sensing traffic conditions. Our objective is to characterize the properties of this PSN, its broad and global spatial coverage as well as its limitations. We also discuss different opportunities for application design using this network. We claim that the PSN derived from Waze has the potential to help us better understand the causes of traffic problems. Besides that, it could be useful for improving algorithms used in navigation services: (1) by exploiting the provided real-time traffic information or (2) by helping in the identification of valuable pieces of information that are hard to detect with traditional sensors, such as car accidents and potholes.
Keywords: Urban social behavior; city dynamics; participatory sensing; mobile social networks; social big data
The Three Dimensions of Social Prominence BIBAFull-Text 319-332
  Diego Pennacchioli; Giulio Rossetti; Luca Pappalardo; Dino Pedreschi; Fosca Giannotti; Michele Coscia
One classic problem definition in social network analysis is the study of diffusion in networks, which enables us to tackle problems like favoring the adoption of positive technologies. Most of the attention has been turned to how to maximize the number of influenced nodes, but this approach misses the fact that different scenarios imply different diffusion dynamics, only slightly related to maximizing the number of nodes involved. In this paper we measure three different dimensions of social prominence: the Width, i.e. the ratio of neighbors influenced by a node; the Depth, i.e. the degrees of separation from a node to the nodes perceiving its prominence; and the Strength, i.e. the intensity of the prominence of a node. By defining a procedure to extract prominent users in complex networks, we detect associations between the three dimensions of social prominence and classical network statistics. We validate our results on a social network extracted from the Last.Fm music platform.
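As a rough illustration of the Width dimension described above, the following Python sketch computes the fraction of a node's neighbors that adopt an item after the node does. Treating "adopted later" as "influenced" is an assumption made here for illustration, and the names `width`, `neighbors`, and `adoption_time` are hypothetical; the abstract does not specify the authors' actual procedure.

```python
def width(neighbors, adoption_time, node):
    """Fraction of `node`'s neighbors that adopted the item after `node` did.

    neighbors: dict mapping node -> set of neighboring nodes (hypothetical input).
    adoption_time: dict mapping node -> time at which it adopted the item.
    Assumption: a neighbor counts as influenced if it adopted strictly later.
    """
    if node not in adoption_time:
        return 0.0  # a node that never adopted influences nobody
    nbrs = neighbors.get(node, set())
    if not nbrs:
        return 0.0  # no neighbors, so the ratio is defined as zero here
    t = adoption_time[node]
    influenced = sum(
        1 for v in nbrs if v in adoption_time and adoption_time[v] > t
    )
    return influenced / len(nbrs)


# Toy example: "a" adopts at time 1; "b" and "c" adopt later, "d" earlier, "e" never.
toy_neighbors = {"a": {"b", "c", "d", "e"}}
toy_times = {"a": 1, "b": 2, "c": 3, "d": 0}
toy_width = width(toy_neighbors, toy_times, "a")
```

Depth and Strength would need further modeling choices (path tracking and an intensity measure, respectively) that the abstract leaves open, so they are not sketched here.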
Automatic Thematic Content Analysis: Finding Frames in News BIBAFull-Text 333-345
  Daan Odijk; Björn Burscher; Rens Vliegenthart; Maarten de Rijke
Framing in news is the way in which journalists depict an issue in terms of a 'central organizing idea.' Frames can be a perspective on an issue. We explore the automatic classification of four generic news frames: conflict, human interest, economic consequences, and morality. Complex characteristics of messages such as frames have been studied using thematic content analysis. Indicator questions are formulated, which are then manually coded by humans after reading a text and combined into a characterization of the message. We operationalize this as a classification task and, inspired by the way-of-working of media analysts, we propose a two-stage approach, where we first rate a news article using indicator questions for a frame and then use the outcomes to predict whether a frame is present. We approach human accuracy on almost all indicator questions and frames.
Optimal Scales in Weighted Networks BIBAKFull-Text 346-359
  Diego Garlaschelli; Sebastian E. Ahnert; Thomas M. A. Fink; Guido Caldarelli
The analysis of networks characterized by links with heterogeneous intensity or weight suffers from two long-standing problems of arbitrariness. On one hand, the definitions of topological properties introduced for binary graphs can be generalized in non-unique ways to weighted networks. On the other hand, even when a definition is given, there is no natural choice of the (optimal) scale of link intensities (e.g. the money unit in economic networks). Here we show that these two seemingly independent problems can be regarded as intimately related, and propose a common solution to both. Using a formalism that we recently proposed in order to map a weighted network to an ensemble of binary graphs, we introduce an information-theoretic approach leading to the least biased generalization of binary properties to weighted networks, and at the same time fixing the optimal scale of link intensities. We illustrate our method on various social and economic networks.
Keywords: Weighted Networks; Maximum Entropy Principle; Graph Theory; Network Science
Why Do I Retweet It? An Information Propagation Model for Microblogs BIBAFull-Text 360-369
  Fabio Pezzoni; Jisun An; Andrea Passarella; Jon Crowcroft; Marco Conti
Microblogging platforms are Web 2.0 services that represent a suitable environment for studying how information is propagated in social networks and how users can become influential. In this work we analyse the impact of the network features and of the users' behaviour on the information diffusion. Our analysis highlights a strong relation between the level of visibility of a message in the flow of information seen by a user and the probability that the user further disseminates the message. In addition, we also highlight the existence of other latent factors that impact on the dissemination probability, correlated with the properties of the user that generated the message. Considering these results we define an information propagation model that generates information cascades (i.e. flows of messages propagated from user to user) whose statistical properties match empirical observations.
Society as a Life Teacher -- Automatic Recognition of Instincts Underneath Human Actions by Using Blog Corpus BIBAKFull-Text 370-376
  Rafal Rzepka; Kenji Araki
In this paper we introduce a method for generating a set of possible reasons for an action, needed by an AI program for reasoning about human behavior. We achieve this goal by using web-mining and lexicons of keywords reflecting the 14 instinct categories developed by psychologist William McDougall. We describe our system and the experiment, and analyze its results, in which 78% of retrievals were correct. The paper is also meant to be a message to social scientists who might be interested in testing their theories on the constantly growing group of Internet users.
Keywords: causal knowledge retrieval; human instincts; text-mining
Diversity-Based HITS: Web Page Ranking by Referrer and Referral Diversity BIBAFull-Text 377-390
  Yoshiyuki Shoji; Katsumi Tanaka
We propose a Web ranking method that considers the diversity of linked pages and linking pages. Typical link analysis algorithms such as HITS and PageRank calculate scores by the number of linking pages. However, even if the number of links is the same, there is a big difference between documents linked by pages with similar content and those linked by pages with very different content. We propose two types of link diversity, referral diversity (diversity of pages linked by the page) and referrer diversity (diversity of pages linking to the page), and use the resulting diversity scores to expand the basic HITS algorithm. The results of repeated experiments showed that the diversity-based method is more useful than the original HITS algorithm for finding useful information on the Web.
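For orientation, the baseline HITS iteration that the abstract builds on can be sketched in Python as follows. This is only the standard hub/authority update with L2 normalization; the diversity weighting proposed in the paper is not shown, since the abstract does not define it. The function name `hits` and the input format are assumptions for this sketch.

```python
import math


def hits(links, iterations=50):
    """Plain HITS: returns (hub, authority) score dicts.

    links: dict mapping a page to an iterable of pages it links to.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {p: a / norm for p, a in auth.items()}
        # Hub score of a page: sum of authority scores of the pages it links to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {p: h / norm for p, h in hub.items()}
    return hub, auth


# Toy graph: pages "a" and "b" both link to "c".
toy_hub, toy_auth = hits({"a": ["c"], "b": ["c"], "c": []})
```

A diversity-aware variant would presumably reweight each link's contribution by how dissimilar the linking (or linked) pages are, but the exact form of that weighting is left to the paper.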
The Babel of Software Development: Linguistic Diversity in Open Source BIBAFull-Text 391-404
  Bogdan Vasilescu; Alexander Serebrenik; Mark G. J. van den Brand
Open source software (OSS) development communities are typically very specialised, on the one hand, and experience high turnover, on the other. The combination of specialisation and turnover can cause parts of the system implemented in a certain programming language to become unmaintainable, if knowledge of that language has disappeared together with the retiring developers.
   Inspired by measures of linguistic diversity from the study of natural languages, we propose a method to quantify the risk of not having maintainers for code implemented in a certain programming language. To illustrate our approach, we studied risks associated with different languages in Emacs, and found examples of low risk due to high popularity (e.g., C, Emacs Lisp); low risk due to similarity with popular languages (e.g., C++, Java, Python); or high risk due to both low popularity and low similarity with popular languages (e.g., Lex). Our results show that methods from the social sciences can be successfully applied in the study of information systems, and open numerous avenues for future research.
Using and Asking: APIs Used in the Android Market and Asked about in StackOverflow BIBAFull-Text 405-418
  David Kavaler; Daryl Posnett; Clint Gibler; Hao Chen; Premkumar Devanbu; Vladimir Filkov
Programming is knowledge intensive. While it is well understood that programmers spend lots of time looking for information, with few exceptions, there is a significant lack of data on what information they seek, and why. Modern platforms, like Android, comprise complex APIs that often perplex programmers. We ask: which elements are confusing, and why? Increasingly, when programmers need answers, they turn to StackOverflow. This provides a novel opportunity. There are a vast number of applications for Android devices, which can be readily analyzed, and many traces of interactions on StackOverflow. These provide a complementary perspective on using and asking, and allow the two phenomena to be studied together. How does the market demand for the USE of an API drive the market for knowledge about it? Here, we analyze data from Android applications and StackOverflow together, to find out what it is that programmers want to know and why.
Temporal, Cultural and Thematic Aspects of Web Credibility BIBAKFull-Text 419-428
  Radoslaw Nielek; Aleksander Wawer; Michal Jankowski-Lorek; Adam Wierzbicki
Is trust in web pages related to nation-level factors? Do trust levels change over time, and how? What categories (topics) of pages tend to be evaluated as not trustworthy, and what categories of pages tend to be trustworthy? What could be the reasons for such evaluations? The goal of this paper is to answer these questions using large-scale data on the trustworthiness of web pages, two sets of websites, Wikipedia and an international survey.
Keywords: trust; language; Wikipedia; temporal; national; credibility
Social-Urban Neighborhood Search Based on Crowd Footprints Network BIBAFull-Text 429-442
  Shoko Wakamiya; Ryong Lee; Kazutoshi Sumiya
A neighborhood is generally a geographically localized community, often with face-to-face social interactions. However, modern cities and widespread social networks have been drastically changing the concept of neighborhood, far beyond spatial constraints. Specifically, due to complicated urban structures with entangled transportation networks and the resulting spatio-temporally extended crowd activities, it is a non-trivial task to examine neighborhood areas from a location of interest. As a promising approach to investigate such a social-urban structure, we propose a social-urban neighborhood search which aims at identifying neighborhood areas from a specific location, particularly considering social interactions between urban areas. We especially examine crowd movements through location-based social networks as an important indicator for measuring social interactions. We also introduce a data structure for aggregating crowd movements as a simplified graph, with which we can easily analyze crowd movements in a large-scale urban area. In the experiment, we look into neighborhoods for several urban areas of interest in terms of social interactions, focusing on how they are distorted from the general localized vicinity.
A Notification-Centric Mobile Interaction Survey and Framework BIBAKFull-Text 443-456
  Jonas Elslander; Katsumi Tanaka
In this paper we describe the results of a survey amongst smartphone owners into the use and perception of mobile notifications against 160 parameters. We conclude that not all notifications should be created or treated equally by mobile operating systems. The current generation of notifications proves not diverse enough and doesn't fit the needs and preferences of most smartphone users. Based on our findings, we offer a framework of design guidelines for more effectively engaging users with interactions initiated by the system.
Keywords: mobile; notifications; interaction; social; behaviour; perception; survey; questionnaire; framework; design; guidelines
How Do Students Search during Class and Homework? BIBAKFull-Text 457-466
  Rafael López-García; Makoto P. Kato; Yoko Yamakata; Katsumi Tanaka
Strong points, weak points and interests of students are precious data for their teachers, but it is hard to learn them quickly, especially when students do not cooperate in class. This paper explores a method for analysing the queries of students who are allowed to search during class and homework. For this purpose, we first established six hypotheses on the queries and the expertise of the students. Then, we collected 143 queries from several lectures of an IT subject at Kyoto University. The 36 students of this subject had been profiled before each lecture by means of questionnaires. When we checked our hypotheses against this collection of queries, we found that experts and novices often search the same way, although experts send more queries about different subjects. Some students also search for content that the teacher has not presented yet.
Keywords: query log; query log analysis; education; faculty development
On Constrained Adding Friends in Social Networks BIBAKFull-Text 467-477
  Bao-Thien Hoang; Abdessamad Imine
Online social networks are currently experiencing a peak and they resemble real platforms of social conversation and content delivery. Indeed, they are exploited in many ways: from conducting public opinion polls about any political issue to planning big social events for a large public. To securely perform these large-scale computations, current protocols use a simple secret sharing scheme which enables users to obfuscate their inputs. However, these protocols require a minimum number of friends, i.e. the minimum degree of the social graph should not be smaller than a given threshold. Not all social graphs satisfy this condition. Yet we can reuse these graphs after some structural modifications consisting in adding new friendship relations. In this paper, we provide the first definition and theoretical analysis of the "adding friends" problem. We formally describe this problem that, given a graph G and parameter c, asks for the graph satisfying the threshold c that results from G with the minimum number of edge-addition operations. We present algorithms for solving this problem in centralized social networks. An experimental evaluation on real-world social graphs demonstrates that our protocols are accurate and within the theoretical bounds.
Keywords: Social networks; Graph editing; Adding friends
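To make the "adding friends" problem statement concrete, here is a minimal Python sketch of a simple greedy heuristic: it repeatedly connects the lowest-degree deficient node to its lowest-degree non-neighbor until every node has degree at least c. This is not the authors' algorithm and carries no guarantee of using the minimum number of additions; the function name `add_friends_greedy` and the adjacency-dict input are assumptions for illustration.

```python
def add_friends_greedy(adj, c):
    """Add edges until every node has degree >= c; return the added edges.

    adj: dict mapping node -> set of neighbors (undirected; modified in place).
    Node labels are assumed to be mutually comparable (used for tie-breaking).
    Greedy heuristic only: the number of additions is not guaranteed minimal.
    """
    if c > len(adj) - 1:
        raise ValueError("threshold c exceeds the maximum possible degree")
    added = []
    while True:
        deficient = [u for u in adj if len(adj[u]) < c]
        if not deficient:
            return added
        # Pick the node that is furthest below the threshold.
        u = min(deficient, key=lambda n: (len(adj[n]), n))
        # Since deg(u) < c <= n - 1, u always has at least one non-neighbor.
        candidates = [v for v in adj if v != u and v not in adj[u]]
        # Prefer a low-degree partner so one edge can fix two deficits at once.
        v = min(candidates, key=lambda n: (len(adj[n]), n))
        adj[u].add(v)
        adj[v].add(u)
        added.append((u, v))


# Toy example: the path 1-2-3-4 with threshold c = 2.
toy_graph = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
toy_added = add_friends_greedy(toy_graph, 2)
```

On the path graph, joining the two endpoints closes the cycle, so a single added edge is enough to meet the threshold.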
Social Sensing for Urban Crisis Management: The Case of Singapore Haze BIBAFull-Text 478-491
  Philips Kokoh Prasetyo; Ming Gao; Ee-Peng Lim; Christie Napa Scollon
Sensing social media for trends and events has become possible as an increasing number of users rely on social media to share information. In the event of a major disaster or social event, one can therefore study the event quickly by gathering and analyzing social media data. One can also design appropriate responses such as allocating resources to the affected areas, sharing event-related information, and managing public anxiety. Past research on social event studies using social media often focused on one type of data analysis (e.g., hashtag clusters, diffusion of events, influential users, etc.) on a single social media data source. This paper adopts a comprehensive social event analysis framework covering content, emotion, activity, and network. We propose a set of measures for each dimension accordingly. The usefulness of these analyses is demonstrated through a haze event that severely affected Singapore and its neighbors in June 2013. The analysis, conducted on both Twitter and Foursquare data, shows that much user attention was given to the haze event. The event also had a substantial emotional and behavioral impact on social media users. These additional insights will help both the public and private sectors to prepare themselves for future haze-related events.
Measurement Quality of Online Collaboration in Presence of Negative Relationships BIBAFull-Text 3-13
  Mikolaj Morzy; Tomasz Bartkowski; Krzysztof Jedrzejewski
Online collaboration services usually focus on positive relationships between constituting actors. Many environments in which social mechanisms are present harness positive feedback of social recognition, status visibility, or collective action. Simple mechanisms of commenting on status updates and up-voting of resources attributed to an actor may result in proverbial karma flow in the socially aware online collaboration environment. On the other hand, many services allow users to also express their dislike, irritation and contempt towards resources provided by users. For instance, down-voting mechanics are crucial in online news aggregation services, such as Digg or Reddit, to maintain a certain level of quality of presented content. Despite the availability of data, not many works have been published on measuring negative network effects in social networks. In this paper we analyze a large body of data harvested from the Polish online news aggregation site Wykop.pl and we examine the effects of a more considerate approach to negative network construction when measuring the overall parameters and characteristics of the social network derived from positive (up-voting) and negative (down-voting) behaviors of users.
What Makes a Good Team of Wikipedia Editors? A Preliminary Statistical Analysis BIBAKFull-Text 14-28
  Leszek Bukowski; Michal Jankowski-Lorek; Szymon Jaroszewicz; Marcin Sydow
The paper studies the quality of teams of Wikipedia authors using a statistical approach. We report the preparation of a dataset containing numerous behavioural and structural attributes, its subsequent analysis, and its use to predict team quality. We have performed exploratory analysis using partial regression to remove the influence of attributes not related to the team itself. The analysis confirmed that the key factor significantly influencing an article's quality is discussion between team members. The second part of the paper successfully uses machine learning models to predict good articles based on features of the teams that created them.
Keywords: team quality; Wikipedia; dataset; statistical data mining
iPoster: A Collaborative Browsing Platform for Presentation Slides Based on Semantic Structure BIBAFull-Text 29-42
  Yuanyuan Wang; Kota Tomoyasu; Kazutoshi Sumiya
Coursera and SlideShare are crucial platforms for improving education; students are able to obtain various educational presentation materials through the Web. Recently, Prezi introduced a zoomable canvas as a substitute for traditional presentations that allows users to zoom in and out of the presentation media. Teachers then attempt to provide presentations in a nonlinear fashion to enhance user interaction through these presentations. However, creating nonlinear presentations can be time-consuming, besides posing design challenges. In this paper, in order to support collaborative browsing, we build a novel collaborative browsing platform that generates meaningfully structured presentations, called "iPoster"; this enables users to automatically navigate through slide-based educational materials. The system places elements such as text and graphics of presentation slides in a structural layout by semantically analyzing the slide structure. The structural layout can reveal the hierarchy of elements by moving from the overview to a detail using automatic transitions, such as zooms and pans. Through this, the collaborative browsing platform can support multiple students in interactively browsing an iPoster in cyberspace on their tablets. The navigation information maps each student's specific needs by considering the student's operations, and detects other students who have similar learning purposes to help them share their interests with each other.
Using E-mail Communication Network for Importance Measurement in Collaboration Environments BIBAFull-Text 43-54
  Pawel Lubarski; Mikolaj Morzy
Can we establish the importance of people by simply analyzing the set of sent and received emails, having no access to subject lines or contents of messages? The answer, apparently, is "yes we can". Intrinsic behavior of people reveals simple patterns in choosing which emails to answer next. Our theory is based on two assumptions. First, we assume that people do their email communication in bursts, answering several messages consecutively, and that they can freely choose the order of answers. Second, we believe that people use priority queues to manage their internal task lists, including the list of emails to be answered. Looking at the timing and ordering of responses, we derive individual rankings of importance of actors, because we posit that people have a tendency to reply to important actors first. These individual subjective rankings are significant because they reflect the relative importance of other actors as perceived by each actor. The individual rankings are aggregated into a global ranking of importance of all actors. We perform an experimental evaluation of our model by analyzing a dataset consisting of over 600,000 emails sent during a one-year period to 200 employees of our university. Our final ranking closely reflects the "true" importance of employees computed based on surveys. We think that our model is general and can be applied whenever behavioral data is available which includes any choice made by actors from a set of available alternatives, with the alternatives having varying degrees of importance to individual actors.
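The idea of turning reply order into an importance ranking can be illustrated with a small Python sketch. It assumes the reply bursts have already been extracted, and it aggregates them with a simple Borda-style count (a sender earns one point for every other sender answered later in the same burst). The abstract does not state how the authors aggregate individual rankings, so this scoring rule and the names `importance_scores` and `global_ranking` are assumptions for illustration.

```python
from collections import defaultdict


def importance_scores(bursts):
    """Borda-style scores from reply bursts.

    bursts: list of bursts; each burst is the ordered list of senders whose
    emails one actor answered consecutively (earliest reply first).
    Assumption: being answered earlier in a burst signals higher importance.
    """
    scores = defaultdict(int)
    for burst in bursts:
        k = len(burst)
        for position, sender in enumerate(burst):
            # One point per sender answered later in the same burst.
            scores[sender] += k - 1 - position
    return dict(scores)


def global_ranking(bursts):
    """Senders sorted by descending score, ties broken alphabetically."""
    scores = importance_scores(bursts)
    return sorted(scores, key=lambda s: (-scores[s], s))


# Toy data: three bursts observed across different repliers.
toy_bursts = [["dean", "alice", "bob"], ["dean", "bob"], ["alice", "bob"]]
toy_scores = importance_scores(toy_bursts)
toy_ranking = global_ranking(toy_bursts)
```

Only the order of replies is used here, which matches the abstract's premise that no subject lines or message contents are needed.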
Predicting Best Answerers for New Questions: An Approach Leveraging Topic Modeling and Collaborative Voting BIBAKFull-Text 55-68
  Yuan Tian; Pavneet Singh Kochhar; Ee-Peng Lim; Feida Zhu; David Lo
Community Question Answering (CQA) sites are becoming an increasingly important source of information where users can share knowledge on various topics. Although these platforms bring new opportunities for users to seek help or provide solutions, they also pose many challenges with the ever-growing size of the community. The sheer number of questions posted every day motivates the problem of routing questions to the appropriate users who can answer them. In this paper, we propose an approach to predict the best answerer for a new question on a CQA site. Our approach considers both user interest and user expertise relevant to the topics of the given question. A user's interests in various topics are learned by applying topic modeling to previous questions answered by the user, while the user's expertise is learned by leveraging the collaborative voting mechanism of CQA sites. We have applied our model to a dataset extracted from StackOverflow, one of the biggest CQA sites. The results show that our approach outperforms the TF-IDF based approach.
Keywords: CQA; expert recommendation; topic modeling; collaborative voting
A Digital Humanities Approach to the History of Science BIBAFull-Text 71-85
  Pim Huijnen; Fons Laan; Maarten de Rijke; Toine Pieters
Comparative historical research on the intensity, diversity and fluidity of public discourses has been severely hampered by the extraordinary task of manually gathering and processing large sets of opinionated data in news media in different countries. At most 50,000 documents have been systematically studied in a single comparative historical project in the subject area of heredity and eugenics. Digital techniques, like the text mining tools WAHSP and BILAND we have developed in two successive demonstrator projects, are able to perform advanced forms of multi-lingual text mining in much larger data sets of newspapers. We describe the development and use of WAHSP and BILAND to support historical discourse analysis in large digitized news media corpora. Furthermore, we argue that text mining techniques overcome a limitation of traditional historical research, namely that only documents explicitly referring to eugenics issues and debates can be incorporated. Our tools are able to provide information on ideas and notions about heredity, genetics and eugenics that circulate in discourses that are not directly related to eugenics (e.g., sport, education and economics).
Building the Social Graph of the History of European Integration BIBAKFull-Text 86-99
  Lars Wieneke; Marten Düring; Ghislain Silaume; Carine Lallemand; Vincenzo Croce; Marilena Lazzarro; Francesco Nucci; Chiara Pasini; Piero Fraternali; Marco Tagliasacchi; Mark Melenhorst; Jasminko Novak; Isabel Micheel; Erik Harloff; Javier Garcia Moron
The breadth and scale of multimedia archives provide a tremendous potential for historical research that hasn't been fully tapped until now. In this paper we discuss the approach taken by the History of Europe application, a demonstrator for the integration of human and machine computation that combines the power of face recognition technology with two distinctly different crowd-sourcing approaches to compute co-occurrences of persons in historical image sets. These co-occurrences are turned into a social graph that connects persons with each other and positions them, through information about the date and location of recording, in time and space. The resulting visualization of the graph as well as analytical tools can help historians to find new impulses for research and to unearth previously unknown relationships. As such, the integration of human expertise and machine computation enables a new class of applications for the exploration of multimedia archives with significant potential for the digital humanities.
Keywords: Face recognition; Entity linking; User centered design; Data visualization; Digital Humanities; Human-machine Interaction; History; European Studies
From Diagram to Network BIBAKFull-Text 100-109
  Yanan Sun
This paper aims to remove a constraint on applying a network approach to art history. First, it points out that, although old diagrams of art history did not use the language of modern network theory, they already showed ingenious network thinking to theorize the development of the arts. Meanwhile, the indirect visual devices and the embracive tradition of these diagrams, which include entities with various properties, prevent the application of computer-aided network methods to decipher and re-analyze the contents of this heritage of art historical research. To break this shackle, this paper suggests a multi-mode network approach to "translate" the traditional network thinking of art diagrams into the conceptualization of graph-theoretical network analysis. By doing so, this paper demonstrates how art historical research could benefit from a modern sociological approach to network theory. To explain the usefulness and advantages of this method, the diagrams of Covarrubias and Barr are taken as examples to be converted into graph-theoretical networks.
Keywords: multi-mode network; historic network research; art history; art-history diagram
Frame-Based Models of Communities and Their History BIBAKFull-Text 110-119
  Robert B. Allen
Previous models of communities and their history have focused on the entities in those communities such as their locations and people. We introduce models which incorporate behaviors and processes. We propose that approaches based on object-oriented modeling are particularly useful. Specifically, we explore the feasibility of developing object-oriented models which employ linguistic frames adapted from the FrameNet corpus. We apply these models to relatively straightforward and self-contained historical scenarios. We implement the models in Java and analyze some of the advantages and challenges of that approach. Historical newspapers are particularly rich sources of natural language descriptions about communities, but there are many sources of non-linguistic information about communities which may also be incorporated. We consider the possibilities of developing more coherent models of communities based on modeling processes, partonomies, systems, and situations. Finally, we consider enabling greater interactivity with the structured models and alternative architectures for the models.
Keywords: Behavior; Descriptive Modeling; Digital Humanities; Events; Functionality; FrameNet; Indexing; Information Organization; Java; Object-Oriented Modeling; Processes; Social Modeling
Documenting Social Unrest: Detecting Strikes in Historical Daily Newspapers BIBAFull-Text 120-133
  Kalliopi Zervanou; Marten Düring; Iris Hendrickx; Antal van den Bosch
The identification of relevant historical sources such as newspapers and letters and the extraction of information from them is an essential part of historical research. In this work, our aim is the detection of relevant primary sources with the goal to support researchers working on a specific historical event. We focus on the historical daily Dutch newspaper archive of the National Library of the Netherlands and strike events that happened in the Netherlands during the 1980s. Using a manually compiled database of strikes in the Netherlands, we first attempt to find reports on those strikes in historical daily newspapers by automatically associating database records to the daily press of the time covering the same strike. Then, we generalise our methodology to detect strike events in the press not currently covered by the strikes database, and support in this way the extension of secondary historical resources. Our methods are evaluated against the manually constructed database of strikes.
Collective Memory in Poland: A Reflection in Street Names BIBAKFull-Text 134-142
  Radoslaw Nielek; Aleksander Wawer; Adam Wierzbicki
Our article starts with the observation that street names fall into two general types: generic and historically inspired. We analyse the distributions of street names of the second type as a window onto nation-level collective memory in Poland. The process of selecting street names is determined socially, as the selections reflect the symbols considered important to the nation-level society, but it also has strong historical motivations and determinants. In the article, we look for these relationships in the available data sources. We use Wikipedia articles to match street names with their textual descriptions and assign them to time points. We then apply selected text mining and statistical techniques to reach quantitative conclusions. We also present a case study, the geographical distribution of two particular street names in Poland, to demonstrate the binding between history and the political orientation of regions.
Keywords: collective memory; Wikipedia; street names