
Proceedings of the 23rd ACM Conference on Hypertext and Social Media

Fullname: Proceedings of the 23rd ACM Conference on Hypertext and Social Media
Editors: Ethan Munson; Markus Strohmaier
Location: Milwaukee, Wisconsin
Dates: 2012-Jun-25 to 2012-Jun-28
Publisher: ACM
Standard No: ISBN 978-1-4503-1335-3; ACM DL: Table of Contents; hcibib: HYPER12
Papers: 46
Pages: 326
Links: Conference Website
Summary: It is our great pleasure to welcome you to ACM Hypertext 2012, the 23rd ACM Conference on Hypertext and Social Media (HT'12), held on the campus of the University of Wisconsin-Milwaukee in Milwaukee, Wisconsin, from June 25 to 28, 2012.
    The ACM Hypertext and Social Media conference is a premier venue for high-quality peer-reviewed research on hypertext theory, systems and applications. It is concerned with all aspects of modern hypertext research, including social media, the semantic web, dynamic and computed hypertext and hypermedia, and narrative systems and applications. This year's conference focuses on exploring, studying and shaping the relationships between four important dimensions of links in hypertextual systems and the World Wide Web: people, data, resources and stories. This focus is reflected in our four tracks:
    Social Media (Linking people) chaired by Claudia Müller-Birn and Munmun De Choudhury
    Semantic Data (Linking data) chaired by Harith Alani and Alexandre Passant
    Adaptive Hypertext and Hypermedia (Linking resources) chaired by Jill Freyne and Shlomo Berkovsky
    Hypertext and Narrative Connections (Linking stories) chaired by Andrew S. Gordon and Frank Nack
    Three keynote speakers for HT'12 present novel and stimulating ideas in this context: Steffen Staab, Jure Leskovec and Sinan Aral. Steffen Staab is a professor of databases and information systems at the University of Koblenz-Landau and director of the Institute for Web Science and Technologies (WeST). His interests lie in researching core technology for ontologies and the semantic web, as well as in applied research that exploits these technologies for knowledge management, multimedia and software technology. Jure Leskovec is an assistant professor of Computer Science at Stanford University. His research focuses on mining and modeling large social and information networks, their evolution, and the diffusion of information and influence over them. Sinan Aral is an Assistant Professor and Microsoft Faculty Fellow at the NYU Stern School of Business. His research focuses on social contagion, and on measuring and managing how information diffusion in massive social networks affects information worker productivity, consumer demand and viral marketing.
    HT'12 received 120 submissions in total: 89 full papers and 31 short papers. Each submission was reviewed by at least three reviewers, discussed by the track program committee, and finally discussed among the program chair, Markus Strohmaier, and the track co-chairs Claudia Müller-Birn, Munmun De Choudhury, Harith Alani, Alexandre Passant, Jill Freyne, Shlomo Berkovsky, Andrew S. Gordon and Frank Nack. We were able to accept 33 papers for oral presentation and publication in the proceedings (27 full papers, 6 short papers). Thus, the overall paper acceptance rate for HT'12 is 27.5%. In addition to this program, we have accepted a number of posters, and we will also feature an exciting program of workshops and tutorials thanks to the efforts of Arkaitz Zubiaga and Carlos Solis. Thanks to Alvin Chin, all information about important HT'12 deadlines and activities has been disseminated through online social media.
    The program, with (free) workshops and tutorials, three keynotes, social activities and generous student grants, has been made possible through the sponsorship of ACM SIGWEB. In addition to SIGWEB's support, Hypertext 2012 has received generous support from Taylor & Francis, the publisher of the New Review of Hypermedia and Multimedia, and from our host institution, the College of Engineering and Applied Science of the University of Wisconsin-Milwaukee.
    We hope that this program provides interesting ideas and novel stimuli for the attendees of HT'12, and we are looking forward to an exciting event.
  1. Keynote address
  2. Social media
  3. Semantic data
  4. Adaptive hypertext and hypermedia
  5. Hypertext and narrative connections
  6. Engelbart/Nelson award nominees
  7. Keynote address
  8. Social media
  9. Social media & hypertext
  10. Social media
  11. Semantic data
  12. Engelbart/Nelson award nominees
  13. Social media
  14. Posters

Keynote address

How to do things with triples BIBAFull-Text 1-2
  Steffen Staab
Representing and computing pragmatics for linked data requires the usage of various models, including ontology patterns and navigation models, as well as new programming language constructs.

Social media

Revisiting reverts: accurate revert detection in wikipedia BIBAFull-Text 3-12
  Fabian Flöck; Denny Vrandecic; Elena Simperl
Wikipedia is commonly used as a proving ground for research in collaborative systems. This is likely due to its popularity and scale, but also to the fact that large amounts of data about its formation and evolution are freely available to inform and validate theories and models of online collaboration. As part of the development of such approaches, revert detection is often performed as an important pre-processing step in tasks as diverse as the extraction of implicit networks of editors, the analysis of edit or editor features and the removal of noise when analyzing the emergence of the content of an article. The current state of the art in revert detection is based on a rather naive approach, which identifies revision duplicates based on MD5 hash values. This is an efficient, but not very precise technique that forms the basis for the majority of research based on revert relations in Wikipedia. In this paper we show that this method has a number of important drawbacks -- it detects only a limited number of reverts, while simultaneously misclassifying too many edits as reverts, and it does not distinguish between complete and partial reverts. This is very likely to hamper the accurate interpretation of the findings of revert-related research. We introduce an improved algorithm for the detection of reverts, based on the word tokens added or deleted by an edit, to address these drawbacks. We report on the results of a user study and other tests demonstrating the considerable gains in accuracy and coverage of our method, and argue for a positive trade-off, in certain research scenarios, between these improvements and our algorithm's increased runtime.
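The hash-duplicate baseline the abstract critiques can be sketched as follows. This is a minimal illustration, not the authors' code; `detect_identity_reverts` is a hypothetical helper, and it assumes `revisions` is a chronological list of full revision texts.

```python
import hashlib

def detect_identity_reverts(revisions):
    """Flag revision i as an identity revert if its text is byte-identical
    to an earlier, non-adjacent revision j (the MD5-duplicate heuristic)."""
    first_seen = {}  # md5 digest -> index of first revision with that text
    reverts = []
    for i, text in enumerate(revisions):
        h = hashlib.md5(text.encode("utf-8")).hexdigest()
        if h in first_seen and first_seen[h] < i - 1:
            reverts.append((i, first_seen[h]))  # i restores state first_seen[h]
        else:
            first_seen.setdefault(h, i)
    return reverts
```

The sketch makes the drawbacks visible: it only fires when an article returns to a byte-identical earlier state, so partial reverts are missed entirely, and any coincidental return to an old state is misclassified as a revert -- which is what motivates the token-level comparison the paper proposes.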
Leveraging editor collaboration patterns in wikipedia BIBAFull-Text 13-22
  Hoda Sepehri Rad; Aibek Makazhanov; Davood Rafiei; Denilson Barbosa
Predicting the positive or negative attitude of individuals towards each other in a social environment has long been of interest, with applications in many domains. We investigate this problem in the context of the collaborative editing of articles in Wikipedia, showing that there is enough information in the edit history of the articles that can be utilized for predicting the attitude of co-editors. We train a model using a distant supervision approach, by labeling interactions between editors as positive or negative depending on how these editors vote for each other in Wikipedia admin elections. We use the model to predict the attitude among other editors, who have neither run nor voted in an election. We validate our model by assessing its accuracy in the tasks of predicting the results of the actual elections, and identifying controversial articles. Our analysis reveals that the interactions in co-editing articles can accurately predict votes, although there are differences between positive and negative votes. For instance, the accuracy when predicting negative votes substantially increases by considering longer traces of the edit history. As for predicting controversial articles, we show that exploiting positive and negative interactions during the production of an article provides substantial improvements on previous attempts at detecting controversial articles in Wikipedia.
Slicepedia: providing customized reuse of open-web resources for adaptive hypermedia BIBAFull-Text 23-32
  Killian Levacher; Séamus Lawless; Vincent Wade
A key advantage of Adaptive Hypermedia Systems (AHS) is their ability to re-sequence and reintegrate content to satisfy particular user needs. However, this can require large volumes of content, with appropriate granularities and suitable meta-data descriptions. This represents a major impediment to the mainstream adoption of Adaptive Hypermedia. Open Adaptive Hypermedia systems have addressed this challenge by leveraging open corpus content available on the World Wide Web. However, the full reuse potential of such content is yet to be leveraged. Open corpus content is today still mainly available as only one-size-fits-all document-level information objects. Automatically customizing and right-fitting open corpus content with the aim of improving its amenability to reuse would enable AHS to more effectively utilise these resources.
   This paper presents a novel architecture and service called Slicepedia, which processes open corpus resources for reuse within AHS. The aim of this service is to improve the reuse of open corpus content by right-fitting it to the specific content requirements of individual systems. Complementary techniques from Information Retrieval, Content Fragmentation, Information Extraction and the Semantic Web are leveraged to convert the original resources into information objects called slices. The service has been applied in an authentic language e-learning scenario to validate the quality of the slicing and reuse. A user trial, involving language learners, was also conducted. The evidence clearly shows that the reuse of open corpus content in AHS is improved by this approach, with minimal decrease in the quality of the original content harvested.

Semantic data

Moving beyond SameAs with PLATO: partonomy detection for linked data BIBAFull-Text 33-42
  Prateek Jain; Pascal Hitzler; Kunal Verma; Peter Z. Yeh; Amit P. Sheth
The Linked Open Data (LOD) Cloud has gained significant traction over the past few years. With over 275 interlinked datasets across diverse domains such as life science, geography, politics, and more, the LOD Cloud has the potential to support a variety of applications ranging from open domain question answering to drug discovery.
   Despite its significant size (approx. 30 billion triples), the data is relatively sparsely interlinked (approx. 400 million links). A semantically richer LOD Cloud is needed to fully realize its potential. Data in the LOD Cloud are currently interlinked mainly via the owl:sameAs property, which is inadequate for many applications. Additional properties capturing relations based on causality or partonomy are needed to enable the answering of complex questions and to support applications.
   In this paper, we present a solution to enrich the LOD Cloud by automatically detecting partonomic relationships, which are well-established, fundamental properties grounded in linguistics and philosophy. We empirically evaluate our solution across several domains, and show that our approach performs well on detecting partonomic properties between LOD Cloud data.
Foundations of traversal based query execution over linked data BIBAFull-Text 43-52
  Olaf Hartig; Johann-Christoph Freytag
Query execution over the Web of Linked Data has attracted much attention recently. A particularly interesting approach is link traversal based query execution which proposes to integrate the traversal of data links into the creation of query results. Hence -- in contrast to traditional query execution paradigms -- this does not assume a fixed set of relevant data sources beforehand; instead, the traversal process discovers data and data sources on the fly and, thus, enables applications to tap the full potential of the Web.
   While several authors have studied possibilities to implement the idea of link traversal based query execution and to optimize query execution in this context, no work exists that discusses theoretical foundations of the approach in general. Our paper fills this gap.
   We introduce a well-defined semantics for queries that may be executed using a link traversal based approach. Based on this semantics we formally analyze properties of such queries. In particular, we study the computability of queries as well as the implications of querying a potentially infinite Web of Linked Data. Our results show that query computation in general is not guaranteed to terminate and that for any given query it is undecidable whether the execution terminates. Furthermore, we define an abstract execution model that captures the integration of link traversal into the query execution process. Based on this model we prove the soundness and completeness of link traversal based query execution and analyze an existing implementation approach.
Building enriched web page representations using link paths BIBAFull-Text 53-62
  Tim Weninger; ChengXiang Zhai; Jiawei Han
Anchor text has a history of enriching documents for a variety of tasks within the World Wide Web. Anchor texts are useful because they are similar to typical Web queries, and because they express the document's context. Therefore, it is a common practice for Web search engines to incorporate incoming anchor text into the document's standard textual representation. However, this approach will not suffice for documents with very few inlinks, and it does not incorporate the document's full context. To mediate these problems, we employ link paths, which contain anchor texts from paths through the Web ending at the document in question. We propose and study several different ways to aggregate anchor text from link paths, and we show that the information from link paths can be used to (1) improve known item search in site-specific search, and (2) map Web pages to database records. We rigorously evaluate our proposed approach on several real world test collections. We find that our approach significantly improves performance over baseline and existing techniques in both tasks.
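Aggregating anchor text along link paths, as described above, could be sketched as a breadth-first walk over inlinks. This is a generic illustration under assumed data structures (the paper's actual aggregation schemes are not reproduced); `inlinks` maps a page to the pages linking to it and the anchor text they use.

```python
def aggregate_anchor_text(inlinks, doc, max_depth=2):
    """Collect anchor texts along link paths of length <= max_depth that
    end at doc. `inlinks` maps a page to a list of (source_page, anchor)
    pairs, i.e. the pages that link to it and the anchor text they use."""
    texts, frontier, seen = [], [(doc, 0)], {doc}
    while frontier:  # breadth-first walk backwards along incoming links
        page, depth = frontier.pop(0)
        if depth == max_depth:
            continue
        for src, anchor in inlinks.get(page, []):
            texts.append(anchor)  # anchor used at distance depth + 1 from doc
            if src not in seen:
                seen.add(src)
                frontier.append((src, depth + 1))
    return texts
```

The collected texts would then be appended to (or weighted into) the document's textual representation, which is how anchor text from beyond the immediate inlinks can enrich pages that have few direct incoming links.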

Adaptive hypertext and hypermedia

Navigational efficiency of broad vs. narrow folksonomies BIBAFull-Text 63-72
  Denis Helic; Christian Körner; Michael Granitzer; Markus Strohmaier; Christoph Trattner
Although many social tagging systems share a common tripartite graph structure, the collaborative processes that are generating these structures can differ significantly. For example, while resources on Delicious are usually tagged by all users who bookmark the web page cnn.com, photos on Flickr are usually tagged just by a single user who uploads the photo. In the literature, this distinction has been described as a distinction between broad vs. narrow folksonomies. This paper sets out to explore navigational differences between broad and narrow folksonomies in social hypertextual systems. We study both kinds of folksonomies on a dataset provided by Mendeley -- a collaborative platform where users can annotate and organize scientific articles with tags. Our experiments suggest that broad folksonomies are more useful for navigation, and that the collaborative processes that are generating folksonomies matter qualitatively. Our findings are relevant for system designers and engineers aiming to improve the navigability of social tagging systems.
Measuring the influence of tag recommenders on the indexing quality in tagging systems BIBAFull-Text 73-82
  Klaas Dellschaft; Steffen Staab
In this paper, we investigate a methodology for measuring the influence of tag recommenders on the indexing quality in collaborative tagging systems. We propose to use the inter-resource consistency as an indicator of indexing quality. The inter-resource consistency measures the degree to which the tag vectors of indexed resources reflect how the users understand the resources. We use this methodology for evaluating how tag recommendations coming from (1) the popular tags at a resource or from (2) the user's own vocabulary influence the indexing quality. We show that recommending popular tags decreases the indexing quality and that recommending the user's own vocabulary increases the indexing quality.
Short links under attack: geographical analysis of spam in a URL shortener network BIBAFull-Text 83-88
  Florian Klien; Markus Strohmaier
URL shortener services today have come to play an important role in our social media landscape. They direct user attention and disseminate information in online social media such as Twitter or Facebook. Shortener services typically provide short URLs in exchange for long URLs. These short URLs can then be shared and diffused by users via online social media, e-mail or other forms of electronic communication. When another user clicks on the shortened URL, she will be redirected to the underlying long URL. Shortened URLs can serve many legitimate purposes, such as click tracking, but can also serve illicit behavior such as fraud, deceit and spam. Although usage of URL shortener services today is ubiquitous, our research community knows little about how exactly these services are used and what purposes they serve. In this paper, we study usage logs of a URL shortener service that has been operated by our group for more than a year. We expose the extent of spamming taking place in our logs, and provide first insights into the planetary-scale of this problem. Our results are relevant for researchers and engineers interested in understanding the emerging phenomenon and dangers of spamming via URL shortener services.

Hypertext and narrative connections

Storyspace: a story-driven approach for creating museum narratives BIBAFull-Text 89-98
  Annika Wolff; Paul Mulholland; Trevor Collins
In a curated exhibition of a museum or art gallery, a selection of heritage objects and associated information is presented to a visitor for the purpose of telling a story about them. The same underlying story can be presented in a number of different ways. This paper describes techniques for creating multiple alternative narrative structures from a single underlying story, by selecting different organising principles for the events and plot structures of the story. These authorial decisions can produce different dramatic effects. Storyspace is a web interface to an ontology for describing curatorial narratives. We describe how the narrative component of the Storyspace software can produce multiple narratives from the underlying stories and plots of curated exhibitions. Based on the curator's choice, the narrative module suggests a coherent ordering for the events of a story and its associated heritage objects. Narratives constructed through Storyspace can be tailored to suit different audiences and can be presented in different forms, such as physical exhibitions, museum tours, leaflets and catalogues, or as online experiences.
Story/story BIBAFull-Text 99-102
  David A. Kolb
This paper starts with an introductory essay stating the issues and discussing the notion of metafiction. Then it continues in an online hypertext narrative demonstration of the interweaving of story and meta-story. The hypertext attempts to show in action how seemingly unified narratives and narrative voices are surrounded and influenced by other voices and meta-stories. No narrative is un-mediated and no narrative voice is alone. The hypertext concludes with some musings on the complexities of narrative reading and writing, also with counterpoint voices. Throughout, the text comments on issues about the reading and writing of hypertext narratives.
The paradox of rereading in hypertext fiction BIBAFull-Text 103-112
  Alex Mitchell; Kevin McGee
Rereading often involves reading the same thing again to see something new. This paradox becomes more pronounced in an interactive story, where a reader's choices can literally change what the reader sees in each reading. There has been some discussion of rereading in both non-interactive and interactive stories. There has not, however, been any detailed study of what readers think they are doing as they reread hypertext fiction that changes dynamically as the result of reader choice. An understanding of this would help authors/designers of hypertext fiction create better hypertext that is explicitly intended to encourage rereading.
   To explore this issue, we conducted semi-structured interviews with participants who repeatedly read a complex hypertext fiction. Participants had trouble describing what they were doing as "rereading", and were looking for either the text, or their understanding of the story, to remain constant between readings. This difficulty highlights the paradoxical nature of rereading in interactive stories, and suggests the need for further research into this phenomenon.

Engelbart/Nelson award nominees

Evaluating tag-based information access in image collections BIBAFull-Text 113-122
  Christoph Trattner; Yi-ling Lin; Denis Parra; Zhen Yue; William Real; Peter Brusilovsky
The availability of social tags has greatly enhanced access to information. Tag clouds have emerged as a new "social" way to find and visualize information, providing both one-click access to information and a snapshot of the "aboutness" of a tagged collection. A range of research projects has explored and compared different tag artifacts for information access, ranging from regular tag clouds to tag hierarchies. At the same time, there is a lack of user studies that compare the effectiveness of different types of tag-based browsing interfaces from the users' point of view. This paper contributes to the research on tag-based information access by presenting a controlled user study that compared three types of tag-based interfaces on two recognized types of search tasks -- lookup and exploratory search. Our results demonstrate that tag-based browsing interfaces significantly outperform traditional search interfaces in both performance and user satisfaction. At the same time, the differences between the two types of tag-based browsing interfaces explored in our study are not as clear.
Understanding factors that affect response rates in Twitter BIBAFull-Text 123-132
  Giovanni Comarela; Mark Crovella; Virgilio Almeida; Fabricio Benevenuto
In information networks where users send messages to one another, the issue of information overload naturally arises: which are the most important messages? In this paper we study the problem of understanding the importance of messages in Twitter. We approach this problem in two stages. First, we perform an extensive characterization of a very large Twitter dataset which includes all users, social relations, and messages posted from the beginning of the service up to August 2009. We show evidence that information overload is present: users sometimes have to search through hundreds of messages to find those that are interesting to reply to or retweet. We then identify factors that influence user response or retweet probability: previous responses to the same tweeter, the tweeter's sending rate, the age and some basic text elements of the tweet. In our second stage, we show that some of these factors can be used to improve the presentation order of tweets to the user. First, by inspecting user activity over time, we construct a simple on-off model of user behavior that allows us to infer when a user is actively using Twitter. Then, we explore two methods from machine learning for ranking tweets: a Naive Bayes predictor and a Support Vector Machine classifier. We show that it is possible to reorder tweets to increase the fraction of replied or retweeted messages appearing in the first p positions of the list by as much as 50-60%.
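The Naive Bayes ranking stage could be illustrated with a hand-rolled Bernoulli model over binary tweet features. This is a generic sketch, not the paper's trained model; the class, its methods, and the example features ("replied to this tweeter before", "tweet contains a URL") are all invented for illustration.

```python
import math

class BernoulliNB:
    """Minimal Bernoulli Naive Bayes over binary tweet features."""
    def fit(self, X, y):
        n = len(y)
        self.classes = sorted(set(y))
        self.prior = {c: math.log(sum(yi == c for yi in y) / n) for c in self.classes}
        self.cond = {}
        for c in self.classes:
            rows = [x for x, yi in zip(X, y) if yi == c]
            for j in range(len(X[0])):
                ones = sum(r[j] for r in rows)
                p = (ones + 1) / (len(rows) + 2)  # Laplace smoothing
                self.cond[(c, j)] = (math.log(p), math.log(1 - p))
        return self

    def score(self, x, c=1):
        """Log-score of feature vector x under class c (e.g. 'will be replied to')."""
        s = self.prior[c]
        for j, v in enumerate(x):
            s += self.cond[(c, j)][0 if v else 1]
        return s
```

Ranking would then amount to sorting the candidate tweets by `score(x, 1)` in descending order, so that tweets most likely to be replied to or retweeted appear in the first p positions.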
Graph and matrix metrics to analyze ergodic literature for children BIBAFull-Text 133-142
  Eugenia-Maria Kontopoulou; Maria Predari; Thymios Kostakis; Efstratios Gallopoulos
What can graph and matrix based mathematical models tell us about ergodic literature? A digraph of storylets connected by links, and the corresponding adjacency matrix encoding, are used to formulate some queries regarding hypertexts of this type. It is reasoned that the Google random surfer provides a useful model for the behavior of the reader of such fiction. This motivates the use of graph and Web based metrics for ranking storylets and some other tasks. A dataset, termed childif, based on printed books from three series popular with children and young adults, is described along with its characteristics. Two link-based metrics, SMrank and versions of PageRank, are described and applied on childif to rank storylets. It is shown that several characteristics of these stories can be expressed as, and computed with, matrix operations. An interpretation of the ranking results is provided. Results on some acyclic digraphs indicate that the rankings convey useful information regarding plot development. In conclusion, using matrix and graph theoretic techniques one can extract useful information from this type of ergodic literature that would be harder to obtain by simply reading it or by examining the underlying digraph.
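The random-surfer ranking applied to storylets can be illustrated with standard power-iteration PageRank on the adjacency matrix. This is a generic sketch, not the paper's SMrank or its specific PageRank variants; the damping factor and dangling-node handling shown here are common defaults, assumed rather than taken from the paper.

```python
import numpy as np

def pagerank(A, d=0.85, tol=1e-9):
    """Power-iteration PageRank. A[i, j] = 1 if storylet i links to j."""
    n = A.shape[0]
    out = A.sum(axis=1)
    # Row-stochastic transition matrix; dangling storylets (no outgoing
    # links) are treated as linking uniformly to every storylet.
    P = np.where(out[:, None] > 0, A / np.maximum(out, 1)[:, None], 1.0 / n)
    r = np.full(n, 1.0 / n)
    while True:
        r_new = (1 - d) / n + d * (P.T @ r)  # surfer follows a link w.p. d
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
```

For an acyclic digraph of storylets, the resulting scores concentrate on nodes reachable along many reading paths, which is the kind of plot-development signal the abstract describes.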

Keynote address

Human navigation in networks BIBAFull-Text 143-144
  Jure Leskovec
The world around us is interconnected in giant networks, and we navigate and find paths through such networks daily. For example, we browse the Web [2], search for connections among friends in social networks, follow leads in citation networks of scientific literature [6,3], and look up things in cross-referenced dictionaries and encyclopedias. Even though navigating networks is an essential part of our everyday lives, little is known about the mechanisms humans use to navigate networks, or about the properties of networks that allow for efficient navigation.
   We conduct two large-scale studies of human navigation in networks. First, we present a study of an instance of Milgram's small-world experiment, where the task is to navigate from a given source to a given target node using only local network information [5]. We perform a computational analysis of a planetary-scale social network of 240 million people and 1.3 billion edges and investigate the importance of geographic cues for navigating the network. Second, we discuss a large-scale study of human wayfinding in which, given the network of links between the concepts of Wikipedia, people play a game of finding a short path from a given start to a given target concept by following hyperlinks (Figure 1) [7]. We study more than 30,000 goal-directed human search paths through the Wikipedia network and identify strategies people use when navigating information spaces.
   Even though the domains of social and information networks are very different, we find many commonalities in the navigation of the two networks. Humans tend to be good at finding short paths, despite the fact that the networks are very large [8]. Human paths differ from shortest paths in characteristic ways. At the early stages of the search, navigating to a high-degree hub node helps, while in the later stages, content features and geography provide the most important clues. We also observe a trade-off between simplicity and efficiency: conceptually simple solutions are more common but tend to be less efficient than more complex ones [9].
   One potential reason for good human performance could be that humans possess vast amounts of background knowledge about the network, which they leverage to make good guesses about possible paths. So we ask the question: Are human-like high-level reasoning skills really necessary for finding short paths? To answer this question, we design a number of navigation agents without such skills, which use only simple numerical features [8]. We evaluate the agents on the task of navigating both networks. We observe that the agents find shorter paths than humans on average and therefore conclude that, perhaps surprisingly, no sophisticated background knowledge or high-level reasoning is required for navigating a complex network.
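The navigation agents are only characterized above as using "simple numerical features". A degree-greedy agent of that flavor, one of the simplest conceivable, might look like the following hypothetical sketch (the real agents [8] and their feature sets are not reproduced here).

```python
def greedy_navigate(graph, source, target, degree, max_steps=50):
    """Degree-greedy agent: at each node, move to the unvisited neighbor
    with the highest degree, stopping as soon as the target is adjacent."""
    path, current, visited = [source], source, {source}
    for _ in range(max_steps):
        if current == target:
            return path
        nbrs = graph[current]
        if target in nbrs:
            path.append(target)
            return path
        candidates = [v for v in nbrs if v not in visited]
        if not candidates:
            return None  # dead end; a real agent would backtrack
        current = max(candidates, key=lambda v: degree[v])
        visited.add(current)
        path.append(current)
    return None
```

Heading for hubs first matches the human strategy observed in the early stages of search; the surprising finding is that even agents this simple can, on average, find shorter paths than humans.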

Social media

TrustSplit: usable confidentiality for social network messaging BIBAFull-Text 145-154
  Sascha Fahl; Marian Harbach; Thomas Muders; Matthew Smith
It is well known that online social networking sites (OSNs) such as Facebook pose risks to their users' privacy. OSNs store vast amounts of users' private data and activities and therefore subject the user to the risk of undesired disclosure. The regular, non-tech-savvy Facebook user either has little awareness of his privacy needs or is not willing or able to invest much extra effort into securing his online activities.
   In this paper, we present a non-disruptive and easy-to-use service that helps to protect users' most private information, namely their private messages and chats, against the OSN provider itself and against external adversaries. Our novel Confidentiality as a Service paradigm was designed with usability and non-obtrusiveness in mind and requires little to no additional knowledge on the part of the users. The simplicity of the service is achieved through a novel trust splitting approach integrated into the Confidentiality as a Service paradigm. To show the feasibility of our approach we present a fully-working prototype for Facebook and an initial usability study. All of the participating subjects completed the study successfully without any problems or errors and required only three minutes on average for the entire installation and setup procedure.
Maximizing circle of trust in online social networks BIBAFull-Text 155-164
  Yilin Shen; Yu-Song Syu; Dung T. Nguyen; My T. Thai
As an imperative channel for fast information propagation, Online Social Networks (OSNs) also have their defects. One of them is information leakage, i.e., information can be spread via OSNs to users with whom we are not willing to share it. Thus the problem of constructing a circle of trust, in order to share information with as many friends as possible without further spreading it to unwanted targets, has become a challenging research topic but has remained open.
   Our work is the first attempt to study the Maximum Circle of Trust problem, which seeks to share information with the maximum expected number of the poster's friends while keeping the spread of information to unwanted targets to a minimum. First, we consider a special and more practical case with two-hop information propagation and a single unwanted target. In this case, we show that this problem is NP-hard, which denies the existence of an exact polynomial-time algorithm. We thus propose a Fully Polynomial-Time Approximation Scheme (FPTAS), which can not only accommodate any allowable performance error bound but also runs in time polynomial in both the input size and the allowed error. An FPTAS is the best approximation solution one can wish for an NP-hard problem. We next consider the case where the number of unwanted targets is bounded and prove that no FPTAS exists in this case. Instead, we design a Polynomial-Time Approximation Scheme (PTAS) in which the allowable error can also be controlled. Finally, we consider the general case with many-hop information propagation, show its #P-hardness, and propose an effective Iterative Circle of Trust Detection (ICTD) algorithm based on a novel greedy function. An extensive experiment on various real-world OSNs has validated the effectiveness of our proposed approximation and ICTD algorithms.
Cheap, easy, and massively effective viral marketing in social networks: truth or fiction? BIBAFull-Text 165-174
  Thang N. Dinh; Dung T. Nguyen; My T. Thai
Online social networks (OSNs) have become one of the most effective channels for marketing and advertising. Since users are often influenced by their friends, "word-of-mouth" exchanges, so-called viral marketing, can be used in social networks to increase product adoption or to spread content widely over the network. The common perception of viral marketing as cheap, easy, and massively effective makes it an ideal replacement for traditional advertising. However, recent studies have revealed that the propagation often fades quickly within only a few hops from the sources, counteracting the assumption of self-perpetuating influence common in the literature. With only limited influence propagation, is massively reaching customers via viral marketing still affordable? How can resources be spent economically to increase the spreading speed?
   We investigate the cost-effective massive viral marketing problem, taking into consideration this limited influence propagation. Both analytical results based on power-law network theory and numerical analysis demonstrate that viral marketing might involve costly seeding. To minimize the seeding cost, we provide a mathematical programming formulation to find optimal seedings for medium-size networks and propose VirAds, an efficient algorithm, to tackle the problem on large-scale networks. VirAds guarantees a relative error bound of O(1) from the optimal solutions in power-law networks and outperforms greedy heuristics based on degree centrality. Moreover, we also show that, in general, approximating the optimal seeding within a ratio better than O(log n) is unlikely to be possible.

Social media & hypertext

Graph data partition models for online social networks BIBAFull-Text 175-180
  Prima Chairunnanda; Simon Forsyth; Khuzaima Daudjee
Online social networks have become important vehicles for connecting people for work and leisure. As these networks grow, data that are stored over these networks also grow, and management of these data becomes a challenge. Graph data models are a natural fit for representing online social networks but need to support distribution to allow the associated graph databases to scale while offering acceptable performance. We provide scalability by considering methods for partitioning graph databases and implement one within the Neo4j architecture based on distributing the vertices of the graph. We evaluate its performance in several simple scenarios and demonstrate that it is possible to partition a graph database without incurring significant overhead other than that required by network delays. We identify and discuss several methods to reduce the observed network delays in our prototype.
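The vertex-distribution idea described in the abstract can be sketched minimally as follows. This is an illustrative toy, not Neo4j's actual partitioning strategy: vertices carry integer ids, each is placed on a shard by a simple modulo rule, and cross-shard edges are counted, since traversing them is what incurs the network delays the authors discuss.

```python
def partition_graph(edges, n_parts):
    """Assign each integer vertex id v to shard v % n_parts and count
    cross-shard edges, whose traversal would require network hops.
    Illustrative sketch only; the modulo scheme is an assumption."""
    shard = {}
    cross_edges = 0
    for u, v in edges:
        shard[u], shard[v] = u % n_parts, v % n_parts
        if shard[u] != shard[v]:
            cross_edges += 1
    return shard, cross_edges
```

For example, partitioning the edges [(0, 2), (1, 3), (0, 1)] over two shards places even ids on one shard and odd ids on the other, leaving a single cross-shard edge.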
Exploring (the poetics of) strange (and fractal) hypertexts BIBAFull-Text 181-186
  Charlie Hargood; Rosamund Davies; David E. Millard; Matt R. Taylor; Samuel Brooker
The ACM Hypertext conference has a rich history of challenging the node-link hegemony of the web. At Hypertext 2011 Pisarski [12] suggested that to refocus on nodes in hypertext might unlock a new poetics, and at Hypertext 2001 Bernstein [3] lamented the lack of strange hypertexts: playful tools that experiment with hypertext structure and form. As part of the emerging Strange Hypertexts community project we have been exploring a number of exotic hypertext tools, and in this paper we set out an early experiment with media and creative writing undergraduates to see what effect one particular form -- Fractal Narratives, a hypertext where readers drill down into text in a recurring pattern -- would have on their writing. In this particular trial, we found that most students did not engage with the structure from a storytelling point of view, although they did find value from a planning point of view. Participants conceptually saw the value in non-linear storytelling but few exploited the fractal structure to actually do this. Participant feedback leads us to conclude that while new poetics do emerge from strange hypertexts, this should be viewed as an ongoing process that can be reinforced and encouraged by designing tools that highlight and support those emerging poetics in a series of feedback loops, and by providing writing contexts where they can be highlighted and collaboratively explored.
Content vs. context for sentiment analysis: a comparative analysis over microblogs BIBAFull-Text 187-196
  Fotis Aisopos; George Papadakis; Konstantinos Tserpes; Theodora Varvarigou
Microblog content poses serious challenges to the applicability of traditional sentiment analysis and classification methods, due to its inherent characteristics. To tackle them, we introduce a method that relies on two orthogonal, but complementary sources of evidence: content-based features captured by n-gram graphs and context-based ones captured by polarity ratio. Both are language-neutral and noise-tolerant, guaranteeing high effectiveness and robustness in the settings we are considering. To ensure our approach can be integrated into practical applications with large volumes of data, we also aim at enhancing its time efficiency: we propose alternative sets of features with low extraction cost, explore dimensionality reduction and discretization techniques and experiment with multiple classification algorithms. We then evaluate our methods over a large, real-world data set extracted from Twitter, with the outcomes indicating significant improvements over the traditional techniques.

Social media

Evaluation of a domain-aware approach to user model interoperability BIBAFull-Text 197-206
  Eddie Walsh; Alexander O'Connor; Vincent Wade
It is becoming increasingly important to facilitate the integrated management of user information. Exchanging user information across heterogeneous systems has many benefits, particularly in enhancing the quality and quantity of user information available for personalization. One common approach to user model interoperability is the use of mapping tools to manually build rich executable mappings between user models. A key problem with existing approaches is that the mapping tools are often too generic for these specialized tasks and do not provide any support to an administrator mapping in a specific domain such as user models. This paper presents a novel approach to user model interoperability which lowers the complexity and provides support to administrators in completing user model mappings. The domain-aware approach to user model interoperability incorporates interchangeable domain knowledge directly into the integration tools. This approach was implemented in a system called FUMES, which is a mapping creation and execution environment that includes two domain-aware mechanisms: a canonical user model and user model mapping types. FUMES was deployed in an integration of existing user models and the domain-aware approach was then evaluated in a user study. The evaluation consisted of a direct comparison with a generic approach to user model interoperability which was applied using the commercial mapping tool, Altova Mapforce. The results of this evaluation demonstrate improvements in mapping accuracy and usability when using the domain-aware approach compared to the generic mapping approach.
Learning user characteristics from social tagging behavior BIBAFull-Text 207-212
  Karin Schöfegger; Christian Körner; Philipp Singer; Michael Granitzer
In social tagging systems the tagging activities of users leave a huge amount of implicit information about them. The users choose tags for the resources they annotate based on their interests, background knowledge, personal opinion and other criteria. Whilst existing research in mining social tagging data mostly focused on gaining a deeper understanding of the user's interests and the emerging structures in those systems, little work has yet been done to use the rich implicit information in tagging activities to unveil to what degree users' tags convey information about their background. The automatic inference of user background information can be used to complete user profiles which in turn supports various recommendation mechanisms. This work illustrates the application of supervised learning mechanisms to analyze a large online corpus of tagged academic literature for extraction of user characteristics from tagging behavior. As a representative example of background characteristics we mine the user's research discipline. Our results show that tags convey rich information that can help designers of those systems to better understand and support their prolific users -- users that tag actively -- beyond their interests.
Detecting overlapping communities in folksonomies BIBAFull-Text 213-218
  Abhijnan Chakraborty; Saptarshi Ghosh; Niloy Ganguly
Folksonomies like Delicious and LastFm are modeled as tripartite (user-resource-tag) hypergraphs for studying their network properties. Detecting communities of similar nodes from such networks is a challenging problem. Most existing algorithms for community detection in folksonomies assign unique communities to nodes, whereas in reality, users have multiple topical interests and the same resource is often tagged with semantically different tags. The few attempts to detect overlapping communities work on projections of the hypergraph, which results in significant loss of information contained in the original tripartite structure. We propose the first algorithm to detect overlapping communities in folksonomies using the complete hypergraph structure. Our algorithm converts a hypergraph into its corresponding line-graph, using measures of hyperedge similarity, whereby any community detection algorithm on unipartite graphs can be used to produce overlapping communities in the folksonomy. Through extensive experiments on synthetic as well as real folksonomy data, we demonstrate that the proposed algorithm can detect better community structures as compared to existing state-of-the-art algorithms for folksonomies.
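The hypergraph-to-line-graph conversion at the heart of the abstract above can be sketched as follows. Jaccard similarity over the hyperedges' node sets is one plausible hyperedge-similarity measure; the paper's exact measure may differ.

```python
from itertools import combinations

def hyperedge_line_graph(hyperedges, threshold=0.0):
    """Build a weighted line graph: one node per (user, resource, tag)
    hyperedge, with edges between hyperedges whose node sets overlap
    (Jaccard similarity above a threshold). Any unipartite community
    detection algorithm can then be run on the result."""
    graph = {i: {} for i in range(len(hyperedges))}
    for i, j in combinations(range(len(hyperedges)), 2):
        a, b = set(hyperedges[i]), set(hyperedges[j])
        sim = len(a & b) / len(a | b)
        if sim > threshold:
            graph[i][j] = sim
            graph[j][i] = sim
    return graph
```

Two tag assignments sharing a user and a resource end up strongly connected in the line graph, while unrelated assignments stay disconnected, so overlapping communities of hyperedges translate back into overlapping node communities.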

Semantic data

Predicting semantic annotations on the real-time web BIBAFull-Text 219-228
  Elham Khabiri; James Caverlee; Krishna Y. Kamath
The explosion of the real-time web has spurred a growing need for new methods to organize, monitor, and distill relevant information from these large-scale social streams. One especially encouraging development is the self-curation of the real-time web via user-driven linking, in which users annotate their own status updates with lightweight semantic annotations -- or hashtags. Unfortunately, there is evidence that hashtag growth is not keeping pace with the growth of the overall real-time web. In a random sample of 3 million tweets, we find that only 10.2% contain at least one hashtag. Hence, in this paper we explore the possibility of predicting hashtags for un-annotated status updates. Toward this end, we propose and evaluate a graph-based prediction framework. Three of the unique features of the approach are: (i) a path aggregation technique for scoring the closeness of terms and hashtags in the graph; (ii) pivot term selection, for identifying high value terms in status updates; and (iii) a dynamic sliding window for recommending hashtags reflecting the current status of the real-time web. Experimentally we find encouraging results in comparison with Bayesian and data mining-based approaches.
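A much-simplified baseline for the prediction task above can be sketched as follows: count term-hashtag co-occurrences in annotated tweets, then score candidate hashtags for an un-annotated update by summing the counts of its terms. This is a plain co-occurrence baseline, not the paper's path-aggregation framework with pivot terms and a sliding window.

```python
from collections import defaultdict

def build_cooccurrence(tagged_tweets):
    """tagged_tweets: iterable of (words, hashtags) pairs from annotated
    status updates; counts how often each term co-occurs with each hashtag."""
    co = defaultdict(lambda: defaultdict(int))
    for words, tags in tagged_tweets:
        for w in set(words):
            for h in set(tags):
                co[w][h] += 1
    return co

def recommend_hashtags(co, words, k=1):
    """Score candidate hashtags by summing their co-occurrence counts
    with the update's terms, and return the top k."""
    scores = defaultdict(int)
    for w in set(words):
        for h, c in co[w].items():
            scores[h] += c
    return [h for h, _ in sorted(scores.items(), key=lambda x: (-x[1], x[0]))[:k]]
```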
Understanding and leveraging tag-based relations in on-line social networks BIBAFull-Text 229-238
  Marek Lipczak; Borkur Sigurbjornsson; Alejandro Jaimes
In most social networks, measuring similarity between users is crucial for providing new functionalities, understanding the dynamics of such networks, and growing them (e.g., people you may know recommendations depend on similarity, as does link prediction). In this paper, we study a large sample of Flickr user actions and compare tags across different explicit and implicit network relations. In particular, we compare tag similarities in explicit networks (based on contact, friend, and family links), and implicit networks (created by actions such as comments and selecting favorite photos). We perform an in-depth analysis of these five types of links specifically focusing on tagging, and compare different tag similarity metrics. Our motivation is that understanding the differences in such networks, as well as how different similarity metrics perform, can be useful in similarity-based recommendation applications (e.g., collaborative filtering), and in traditional social network analysis problems (e.g., link prediction). We specifically show that different types of relationships require different similarity metrics. Our findings could lead to the construction of better user models, among others.
Using the overlapping community structure of a network of tags to improve text clustering BIBAFull-Text 239-244
  Nuno Cravino; José Devezas; Álvaro Figueira
Breadcrumbs is a folksonomy of news clips, where users can aggregate fragments of text taken from online news. Besides the textual content, each news clip contains a set of metadata fields associated with it. User-defined tags are one of the most important of those information fields. Based on a small data set of news clips, we build a network of co-occurrence of tags in news clips, and use it to improve text clustering. We do this by defining a weighted cosine similarity proximity measure that takes into account both the clip vectors and the tag vectors. The tag weight is computed using the related tags that are present in the discovered community. We then use the resulting vectors together with the new distance metric, which allows us to identify socially biased document clusters. Our study indicates that using the structural features of the network of tags leads to a positive impact in the clustering process.
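The weighted proximity measure described above can be sketched as a convex blend of two cosine similarities, one over clip term vectors and one over tag vectors. The mixing weight `alpha` and the flat blend are assumptions for illustration; the paper derives its tag weights from the discovered tag communities.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def weighted_similarity(clip_a, clip_b, tags_a, tags_b, alpha=0.7):
    """Blend content similarity (clip term vectors) with social similarity
    (tag vectors) into one proximity measure; alpha is an assumed weight."""
    return alpha * cosine(clip_a, clip_b) + (1 - alpha) * cosine(tags_a, tags_b)
```

Clustering with this measure pulls together clips that share either vocabulary or socially related tags, which is what yields the "socially biased" document clusters mentioned above.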

Engelbart/Nelson award nominees

Anatomy of a conference BIBAFull-Text 245-254
  Bjoern-Elmar Macek; Christoph Scholz; Martin Atzmueller; Gerd Stumme
This paper presents an anatomy of Hypertext 2011 -- focusing on the dynamic and static behavior of the participants. We consider data collected by the CONFERATOR system at the conference, and provide statistics concerning participants, presenters, session chairs, different communities, and according roles. Additionally, we perform an in-depth analysis of these actors during the conference concerning their communication and track visiting behavior.
Diversity dynamics in online networks BIBAFull-Text 255-264
  Jérôme Kunegis; Sergej Sizov; Felix Schwagereit; Damien Fay
Diversity is an important characterization aspect for online social networks that usually denotes the homogeneity of a network's content and structure. This paper addresses the fundamental question of diversity evolution in large-scale online communities over time. In doing so, we study different established notions of network diversity, based on paths in the network, degree distributions, eigenvalues, cycle distributions, and control models. This leads to five appropriate characteristic network statistics that capture corresponding aspects of network diversity: effective diameter, Gini coefficient, fractional network rank, weighted spectral distribution, and number of driver nodes of a network. Consequently, we present and discuss comprehensive experiments with a broad range of directed, undirected, and bipartite networks from several different network categories -- including hyperlink, interaction, and social networks. An important general observation is that network diversity shrinks over time. From the conceptual perspective, our work generalizes previous work on shrinking network diameters, putting it in the context of network diversity. We explain our observations by means of established network models and introduce the novel notion of eigenvalue centrality preferential attachment.
An evaluation of tailored web materials for public administration BIBAFull-Text 265-274
  Nathalie Colineau; Cécile Paris; Keith Vander Linden
Public Administration organizations generally write their citizen-focused, informational materials for generic audiences because they don't have the resources to produce personalized materials for everyone. The goal of this project is to replace these generic materials, which must include careful discussions of the conditions distinguishing the various constituencies within the generic audience, with tailored materials, which can be automatically personalized to focus on the information relevant to an individual reader. Two key questions must be addressed. First, are the automatically produced, tailored forms more effective than the generic forms they replace, and second, is the time the reader spends specifying the demographic information on which the tailoring is based too costly to be worth the effort. This paper describes an adaptive hypermedia application that produces tailored materials for students exploring government educational entitlement programs, and focuses in particular on the effectiveness of the generated tailored material.

Social media

Early detection of buzzwords based on large-scale time-series analysis of blog entries BIBAFull-Text 275-284
  Shinsuke Nakajima; Jianwei Zhang; Yoichi Inagaki; Reyn Nakamoto
In this paper, we discuss a method for early detection of "gradual buzzwords" by analyzing time-series data of blog entries. We observe the process in which certain topics grow to become major buzzwords and determine the key indicators that are necessary for their early detection. From the analysis results based on 81,922,977 blog entries from 3,776,154 blog websites posted in the past two years, we find that as topics grow to become major buzzwords, the percentages of blog entries from the blogger communities closely related to the target buzzword decrease gradually, and the percentages of blog entries from the weakly related blogger communities increase gradually. We then describe a method for early detection of these buzzwords, which is dependent on identifying the blogger communities which are closely related to these buzzwords. Moreover, we verify the effectiveness of the proposed method through experimentation that compares the rankings of several buzzword candidates with a real-life idol group popularity competition.
Semantics + filtering + search = twitcident. exploring information in social web streams BIBAFull-Text 285-294
  Fabian Abel; Claudia Hauff; Geert-Jan Houben; Richard Stronkman; Ke Tao
Automatically filtering relevant information about a real-world incident from Social Web streams and making the information accessible and findable in the given context of the incident are non-trivial scientific challenges. In this paper, we engineer and evaluate solutions that analyze the semantics of Social Web data streams to solve these challenges. We introduce Twitcident, a framework and Web-based system for filtering, searching and analyzing information about real-world incidents or crises. Given an incident, our framework automatically starts tracking and filtering information that is relevant for the incident from Social Web streams and Twitter particularly. It enriches the semantics of streamed messages to profile incidents and to continuously improve and adapt the information filtering to the current temporal context. Faceted search and analytical tools allow people and emergency services to retrieve particular information fragments and overview and analyze the current situation as reported on the Social Web.
   We put our Twitcident system into practice by connecting it to emergency broadcasting services in the Netherlands to allow for the retrieval of relevant information from Twitter streams for any incident that is reported by those services. We conduct large-scale experiments in which we evaluate (i) strategies for filtering relevant information for a given incident and (ii) search strategies for finding particular information pieces. Our results prove that the semantic enrichment offered by our framework leads to major and significant improvements of both the filtering and the search performance. A demonstration is available via: http://wis.ewi.tudelft.nl/twitcident/
Finding and exploring memes in social media BIBAFull-Text 295-304
  Hohyon Ryu; Matthew Lease; Nicholas Woodward
Online critical literacy challenges readers to recognize and question how online textual information has been shaped by its greater context. While comparing information from multiple sources provides a foundation for such awareness, keeping pace with everything being written is a daunting proposition, especially for the casual reader. We propose a new form of technological assistance for critical literacy which automatically discovers and displays underlying memes: ideas represented by similar phrases which occur across different information sources. By surfacing these memes to users, we create a rich hypertext representation in which underlying memes can be explored in context. Given the vast scale of social media, we describe a highly-scalable system architecture designed for MapReduce distributed computing. To validate our approach, we report on use of our system to discover and browse memes in a 1.5 TB collection of crawled social media. Our primary contributions include: 1) a novel technological approach and hypertext browsing design for supporting critical literacy; and 2) a highly-scalable system architecture for meme discovery, providing a solid foundation for further system extensions and refinements.

Posters

On the rise of artificial trending topics in Twitter BIBAFull-Text 305-306
  Raquel Recuero; Ricardo Araujo
We present a quanti-qualitative study of Trending Topics in Twitter. Our goal was to investigate how social networks can interfere in Trending Topics in the pursuit of visibility, drawing on social capital through bridging and bonding ties. We collected, analyzed and classified 460 topics from the Brazilian Trending Topics list and the social networks associated with 40 of those. Our results point to two types of topics: artificial topics, created by groups of users consciously acting to put their message among the Trending Topics, usually to make statements and gain visibility for their causes; and organic topics, which emerge without effortful coordination by a group of people. While organic topics rely on values such as novelty and spread through bridging ties, artificial topics are based on bonding ties, with associated values such as engagement, cooperation and trust among the actors.
QualityRank: assessing quality of wikipedia articles by mutually evaluating editors and texts BIBAFull-Text 307-308
  Yu Suzuki; Masatoshi Yoshikawa
In this paper, we propose a method to identify high-quality Wikipedia articles by mutually evaluating editors and texts. A major approach to assessing articles using edit history is based on text survival ratios. Its problem is that many high-quality articles are identified as low quality, because vandals delete high-quality texts and thereby decrease those texts' survival ratios. Our approach's strongest point is its resistance to vandalism. In our method, text quality values are calculated from editor quality values, so vandals do not affect the quality values of other editors, and the accuracy of text quality values should improve. However, editor quality values are in turn calculated from text quality values, and text quality values from editor quality values. To resolve this circular dependency, we mutually calculate editor and text quality values until they converge. Using this method, we can calculate a quality value for a text that takes into consideration those of its editors.
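The mutual-reinforcement iteration described above can be sketched as follows. This is an illustrative simplification: seeding text quality with observed survival ratios and using plain averages are assumptions, not the paper's exact formulation.

```python
def quality_rank(editors_of, survival, iters=100):
    """editors_of: text_id -> list of editor ids; survival: text_id ->
    observed survival ratio, used only to seed the iteration. Editor and
    text quality are then refined mutually, HITS-style, until stable."""
    tq = dict(survival)
    eq = {}
    texts_of = {}
    for t, es in editors_of.items():
        for e in es:
            texts_of.setdefault(e, []).append(t)
    for _ in range(iters):
        # an editor's quality: mean quality of the texts they contributed to
        eq = {e: sum(tq[t] for t in ts) / len(ts) for e, ts in texts_of.items()}
        # a text's quality: mean quality of its editors
        tq = {t: sum(eq[e] for e in es) / len(es) for t, es in editors_of.items()}
    return eq, tq
```

With disjoint editors the iteration simply reproduces the survival ratios; when editors share texts, quality flows between them, which is what makes a vandal's deletions stop propagating to the other editors' scores.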
A real-time architecture for detection of diseases using social networks: design, implementation and evaluation BIBAFull-Text 309-310
  Mustafa Sofean; Matthew Smith
In this work we developed a surveillance architecture to detect disease-related postings in social networks, using Twitter as an example of a high-traffic social network. Our real-time architecture uses the Twitter streaming API to crawl Twitter messages as they are posted. Data mining techniques are used to index, extract and classify postings. Finally, we evaluate the performance of the classifier with a dataset of public health postings, and also evaluate the run-time performance of the whole system with respect to latency and throughput.
SHI3LD: an access control framework for the mobile web of data BIBAFull-Text 311-312
  Luca Costabello; Serena Villata; Nicolas Delaforge; Fabien Gandon
We present Shi3ld, a context-aware access control framework for consuming the Web of Data from mobile devices.
Adaptive spatial hypermedia in computational journalism BIBAFull-Text 313-314
  Luis Francisco-Revilla; Alvaro Figueira
Computational journalism allows journalists to collect large collections of information chunks from separate sources. The analysis of these collections can reveal hidden relationships, but due to their size, diversity, and varying nuances it is necessary to use both computational and human analysis. Breadcrumbs PDL is an adaptive spatial hypermedia system that brings together human cognition and machine computation in order to analyze a collection of user-generated news clips. The project demonstrates the effectiveness of spatial hypermedia in the domain of computational journalism.
Structuring folksonomies with implicit tag relations BIBAFull-Text 315-316
  Florian Matthes; Christian Neubert; Alexander Steinhoff
Tagging systems allow users to assign arbitrary text labels (i.e., tags) to various types of resources, such as photos or web pages, to facilitate future retrieval and selective sharing of contents. The resulting system of classification is referred to as a folksonomy. The uncontrolled nature of tags leads to inconsistencies in the usage of terms which impairs the utility of the system. Approaches to this problem that map tags to concepts of external knowledge representations, such as ontologies, are often inapplicable since they require that corresponding concepts exist and that they reflect the meaning of tags as intended by the users. In this paper, we present the notion of implicit tag relations. Our aim is to improve the accessibility of contents in tagging systems without significantly reducing the flexibility and universal applicability of tags. Instead of explicitly relating tags to each other, we propose to give users the ability to retroactively alter folksonomies by changing the tags of many resources with a single operation. This way, the usage of tags can be harmonized and it can be controlled how they are used in combination. We highlight the benefits of our approach compared to explicit tag relations and discuss important implications as well as its limitations.
Following the follower: detecting communities with common interests on Twitter BIBAFull-Text 317-318
  Kwan Hui Lim; Amitava Datta
We propose an efficient approach for detecting communities that share common interests on Twitter, based on linkages among followers of celebrities representing an interest category. This approach differs from existing ones that detect all communities before determining the interests of those communities, a computationally intensive process given the large scale of online social networks. In addition, we also study the characteristics of these communities and the effects of deepening or specialization of interest.
Towards real-time summarization of scheduled events from Twitter streams BIBAFull-Text 319-320
  Arkaitz Zubiaga; Damiano Spina; Enrique Amigó; Julio Gonzalo
We deal with shrinking the stream of tweets for scheduled events in real-time, following two steps: (i) sub-event detection, which determines if something new has occurred, and (ii) tweet selection, which picks a tweet to describe each sub-event. By comparing summaries in three languages to live reports by journalists, we show that simple text analysis methods which do not involve external knowledge lead to summaries that cover 84% of the sub-events on average, and 100% of key types of sub-events (such as goals in soccer).
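The two-step pipeline above can be sketched minimally as follows. Volume-spike detection and frequent-term coverage are assumed stand-ins for the paper's sub-event detection and tweet selection methods.

```python
from collections import Counter

def summarize_stream(windows, spike_ratio=2.0):
    """windows: chronological lists of tweets, one list per fixed time window.
    Step (i): flag a sub-event when a window's tweet volume spikes relative
    to the previous window. Step (ii): pick the tweet whose words best cover
    the window's frequent terms to describe the sub-event."""
    summary, prev = [], None
    for tweets in windows:
        if prev is not None and tweets and len(tweets) >= spike_ratio * max(prev, 1):
            terms = Counter(w for t in tweets for w in t.lower().split())
            best = max(tweets,
                       key=lambda t: sum(terms[w] for w in set(t.lower().split())))
            summary.append(best)
        prev = len(tweets)
    return summary
```

In a soccer stream, a goal produces a burst of tweets sharing vocabulary, so the burst window is flagged and its most representative tweet is emitted, without any external knowledge, consistent with the text-analysis-only claim above.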
Linked open corpus models, leveraging the semantic web for adaptive hypermedia BIBAFull-Text 321-322
  Ian O'Keeffe; Alexander O'Connor; Philip Cass; Séamus Lawless; Vincent Wade
Despite the recent interest in extending Adaptive Hypermedia beyond the closed corpus domain and into the open corpus world of the web, many current approaches are limited by their reliance on closed metadata model repositories. The need to produce large quantities of high quality metadata is an expensive task which results in silos of high quality metadata. These silos are often underutilized due to the proprietary nature of the content described by the metadata and the perceived value of the metadata itself. Meanwhile, the Linked Open Data movement is promoting a pragmatic approach to exposing, sharing and connecting pieces of machine-readable data and knowledge on the WWW using an agreed set of best practices. In this paper we identify the potential issues that arise from building personalization systems based on Linked Open Data.
A gender based study of tagging behavior in Twitter BIBAFull-Text 323-324
  Evandro Cunha; Gabriel Magno; Virgilio Almeida; Marcos André Gonçalves; Fabricio Benevenuto
Gender plays a key role in the process of language variation. Men and women use language in different ways, according to the expected behavior patterns associated with their status in the communities. In this paper, we present a first description of gender distinctions in the usage of Twitter hashtags. After analyzing data collected from more than 650,000 tagged tweets concerning three different subjects, we concluded that gender can be considered a social factor that influences the user's choice of particular hashtags about a given topic. This study aims to increase knowledge about human behavior in free tagging environments and may be useful to the development of tag recommendation systems based on users' collective preferences.
Query prediction with context models for populating personal linked data caches BIBAFull-Text 325-326
  Olaf Hartig; Tom Heath
The emergence of a Web of Linked Data [2] enables new forms of application that require expressive query access, for which mature, Web-scale information retrieval techniques may not be suited. Rather than attempting to deliver expressive query capabilities at Web-scale, we propose the use of smaller, pre-populated data caches whose contents are personalized to the needs of an individual user. Such caches can act as personal data stores supporting a range of different applications. In this paper we formally introduce a strategy for predicting queries that can then be used to inform an a priori population of a personal cache of Linked Data harvested from the Web. Based on a comprehensive user evaluation we demonstrate that our approach can accurately predict queries and their execution probability, thereby optimizing the cache population process.