Proceedings of the 2006 ACM International Workshop on Human-Centered Multimedia

Fullname: HCM'06: Proceedings of the 1st ACM International Workshop on Human-Centered Multimedia
Editors: Daniel Gatica-Perez; Alejandro Jaimes; Nicu Sebe
Location: Santa Barbara, California, USA
Standard No: ISBN 1-59593-500-2; hcibib: HCM06
  1. Invited presentation: interaction
  2. Invited presentation: content production
  3. Invited presentation: content analysis
  4. Regular contributions

Invited presentation: interaction

Human-centered collaborative interaction 1-8
  Paulo Barthelmess; Edward Kaiser; Rebecca Lunsford; David McGee; Philip Cohen; Sharon Oviatt
Recent years have witnessed an increasing shift in interest from single user multimedia/multimodal interfaces towards support for interaction among groups of people working closely together, e.g. during meetings or problem solving sessions. However, the introduction of technology to support collaborative practices has not been devoid of problems. It is not uncommon that technology meant to support collaboration may introduce disruptions and reduce group effectiveness.
   Human-centered multimedia and multimodal approaches hold a promise of providing substantially enhanced user experiences by focusing attention on human perceptual and motor capabilities, and on actual user practices. In this paper we examine the problem of providing effective support for collaboration, focusing on the role of human-centered approaches that take advantage of multimodality and multimedia. We show illustrative examples that demonstrate human-centered multimodal and multimedia solutions that provide mechanisms for dealing with the intrinsic complexity of human-human interaction support.
Keywords: design, guidelines, human-centered systems, multimedia, multimodal systems

Invited presentation: content production

Multimedia: is it always better? 9-10
  Nahum Gershon
As with almost every new medium and technology, we become enchanted with it and very easily take the proclaimed benefits of the new creation for granted. Slide presentation tools are one example. Since they became available, most presentations in the technical and professional communities have transitioned to using them. One of the reasons is that they make preparing ("producing") and delivering presentations seem easy. Making the content of the presentation understood and getting its messages across, however, is another matter. The big question with presentation tools is: when is it better to use them, and when are other modes of presentation more appropriate? Now that almost everyone can produce a multimedia presentation, a similar trend might be developing. This mindless transition, I feel, must be stopped. First and foremost, we need to understand the advantages and, yes, the disadvantages of multimedia. Since multimedia use and production are human-centered activities, practical knowledge of how human beings perceive, process information, and understand is essential to understanding the advantages and disadvantages of this medium. Once we know them, we should use multimedia only when it offers advantages to a particular presentation over other media. Sometimes, we might find that a simple oral (or even audio) presentation without a single visual does the trick. Sometimes not. As Neil Postman pointed out, for example, it can be more difficult to present a series of logical arguments effectively using video than with text or oral deliberations. On other occasions, we might find that a silent presentation of pictorial slides delivers the message quite effectively. Multimedia is not only about presentation. It is also about production and thinking. As with writing or drawing, composing a multimedia vignette could help the creative and critical thinking process about a topic.
This too needs an understanding of when multimedia is appropriate and when it is not. The developer community is not exempt from the need to develop this type of understanding. Without it, the tools will not be very useful. All of these communities (multimedia users, the production crowd, and the multimedia developer community) need to become more multimedia literate through training in school, college, and work or through a personal transformative quest. This, I believe, is essential yet possible.
Keywords: human-centered multimedia, multimedia advantages, multimedia disadvantages, multimedia literacy, multimedia presentation, multimedia production, thinking through multimedia production

Invited presentation: content analysis

Human-centered multimedia: representations and challenges 11-18
  Ahmed Elgammal
Humans have always been a part of the computational loop. So, what do we mean by human-centered computing (HCC)? Aren't humans always the focus of computation somehow? The goal of this paper is to help answer this question within the context of multimedia applications: what do we mean by human-centered multimedia systems? We discuss some issues and challenges in developing real human-centered multimedia applications.
Keywords: human-centered computing, multimedia systems

Regular contributions

What should be automated?: The fundamental question underlying human-centered computing 19-24
  Matti Tedre
In 1989 the ACM task force on the Core of Computer Science argued that "What can be (effectively) automated?" is "the fundamental question underlying all of computing". The task force's view of computing was a machine-oriented one; the task force recognized the theoretical, empirical, and design-oriented aspects of computer science. The question "What can be effectively automated?" indeed draws some fundamental limits of automatic computation. However, since the 1980s there has been an ongoing shift away from the machine-centered view of computing, towards a human-centered view of computing. In this paper I argue that human-centered computing necessitates a perspective shift in computer science. I note that the central question of machine-centered computing fails to recognize the driving issues of human-centered computing. I argue that in all branches of human-centered computing there is another fundamental question that should be asked: "What should be automated?"
Keywords: ethical questions, fundamental questions, human-centered computing, normative questions
Lifetrak: music in tune with your life 25-34
  Sasank Reddy; Jeff Mascia
Advances in sensing technology and wider availability of network services are beckoning the use of context-awareness in ubiquitous computing applications. One area in which these technologies can play a major role is entertainment. In particular, context-awareness can be used to provide higher quality interaction between humans and the media they use. We propose a music player, Lifetrak, that is in tune with a person's life by using a context-sensitive music engine to drive what music is played. This context engine is influenced by (i) the location of the user, (ii) the time of operation, (iii) the velocity of the user, and (iv) urban environment information such as traffic, weather, and sound modalities. Furthermore, we adjust the context engine by implementing a learning model that is based on user feedback on whether a certain song is appropriate for a particular context. We also introduce the idea of a context equalizer that adjusts how much a certain sensing modality affects what song is chosen. Since the music player will be implemented on a mobile device, there is a strong focus on creating a user interface that can be manipulated by users on the go. The goal of Lifetrak is to liberate users from having to consciously specify the music that they want to play. Instead, Lifetrak intends to create a music experience that is in rhythm with the user and the space they reside in.
Keywords: context, entertainment, mobile, music, sensors
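The context equalizer idea in the Lifetrak abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the names `ContextEqualizer`, `pick_song`, and `apply_feedback`, the weighted-average scoring rule, and the feedback update rate are all hypothetical.

```python
from dataclasses import dataclass, field

# The four context sources named in the abstract.
MODALITIES = ("location", "time", "velocity", "environment")


@dataclass
class ContextEqualizer:
    # Hypothetical per-modality weights; a larger weight means the
    # modality has more influence on which song is chosen.
    weights: dict = field(default_factory=lambda: {m: 1.0 for m in MODALITIES})

    def score(self, match):
        """Weighted average of per-modality match scores in [0, 1]."""
        total = sum(self.weights.values())
        if total == 0:
            return 0.0
        weighted = sum(self.weights[m] * match.get(m, 0.0) for m in MODALITIES)
        return weighted / total


def pick_song(equalizer, candidates):
    """Return the song whose context match scores best under the equalizer.

    candidates maps a song name to a dict of per-modality match scores.
    """
    return max(candidates, key=lambda song: equalizer.score(candidates[song]))


def apply_feedback(match, modality, appropriate, rate=0.2):
    """Nudge one match score toward 1 (appropriate) or 0 (not appropriate).

    This is an assumed stand-in for the learning model; the abstract does
    not specify the update rule.
    """
    target = 1.0 if appropriate else 0.0
    old = match.get(modality, 0.0)
    match[modality] = old + rate * (target - old)
```

With equal weights, a song that matches location and time well can win; turning the other weights down so that velocity dominates shifts the choice toward songs that match the user's current speed.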
Human-centered interaction with documents 35-44
  Andreas Dengel; Stefan Agne; Bertin Klein; Achim Ebert; Matthias Deller
In this paper, we discuss a new user interface, a complementary environment for working with personal document archives, i.e. for document filing and retrieval. We introduce our implementation of a spatial medium for document interaction, explorative search and active navigation, which exploits and further stimulates the human strengths of visual information processing. Our system achieves a high degree of immersion, so that the user forgets the artificiality of his/her environment. This is done by means of a tripartite ensemble: allowing users to interact naturally with gestures and postures (as an option, gestures and postures can be individually taught to the system by users), exploiting 3D technology, and helping the user maintain structures he/she discovers while also providing computer-calculated semantic structures. Our ongoing evaluation shows that even non-expert users can work efficiently with the information in a document collection, and have fun.
Keywords: 3D displays, 3D user interface, data glove, gesture recognition, immersion
Creating serendipitous encounters in a geographically distributed community 45-54
  Adithya Renduchintala; Aisling Kelliher; Hari Sundaram
This paper is focused on the development of serendipitous interfaces that promote casual and chance encounters within a geographically distributed community. The problem is particularly important for distributed workforces, where there is little opportunity for the chance encounters that are crucial to the formation of a sense of community. This paper makes three contributions: (a) development of a robust communication architecture facilitating serendipitous casual interaction using online media repositories coupled to two multimodal interfaces, (b) development of multimodal interfaces that allow users to browse, leave audio comments, and asynchronously listen to other community members, and (c) a multimodal gesture-driven control (vision and ultrasonic) of the audio-visual display. Our user studies reveal that the interfaces are well liked and promote social interaction.
Keywords: image repository, mediated communication, online media repository, remote interfaces, serendipitous interaction, social computing
Discovering groups of people in Google news 55-64
  Dhiraj Joshi; Daniel Gatica-Perez
In this paper, we study the problem of content-based social network discovery among people who frequently appear in world news. Google news is used as the source of data. We describe a probabilistic framework for associating people with groups. A low-dimensional topic-based representation is first obtained for news stories via probabilistic latent semantic analysis (PLSA). This is followed by construction of semantic groups by clustering such representations. Unlike many existing social network analysis approaches, which discover groups based only on binary relations (e.g. co-occurrence of people in a news article), our model clusters people using their topic distribution, which introduces contextual information in the group formation process (e.g. some people belong to several groups depending on the specific subject). The model has been used to study evolution of people with respect to topics over time. We also illustrate the advantages of our approach over a simple co-occurrence-based social network extraction method.
Keywords: probabilistic latent semantic indexing, social network analysis, text mining, topic evolution
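The contrast the abstract draws between binary co-occurrence and topic-based grouping can be illustrated with a small sketch. It does not implement PLSA or the authors' clustering: the topic distributions below are made-up numbers, and cosine similarity is used only as an assumed stand-in for whatever comparison the paper's clustering performs.

```python
import math


def cooccur(articles, a, b):
    """Binary relation: do persons a and b appear in the same article?"""
    return any(a in people and b in people for people in articles)


def cosine(p, q):
    """Cosine similarity between two topic distributions."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0


# Hypothetical data: each article is the set of people it mentions.
articles = [{"A", "B"}, {"C"}]

# Hypothetical per-person distributions over three topics, of the kind a
# topic model could produce from the stories each person appears in.
topics = {
    "A": [0.8, 0.1, 0.1],
    "B": [0.1, 0.8, 0.1],
    "C": [0.7, 0.2, 0.1],
}
```

In this toy data, A and C never share an article, so a co-occurrence method would not link them, yet their topic distributions are close. A and B co-occur once but are associated with different topics.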
Interactive video authoring and sharing based on two-layer templates 65-74
  Xian-Sheng Hua; Shipeng Li
The rapid adoption of digital cameras and camcorders has led to a huge demand for new tools and systems that enable average users to process, manage, author and share digital media content more efficiently and effectively, in particular a powerful video authoring tool that can dramatically reduce users' efforts in editing and sharing home video. Though there are many commercial video authoring tools available today, video authoring remains a tedious and extremely time-consuming task that often requires trained professional skills. To tackle this problem, this paper presents a novel interactive end-to-end system that enables fast, flexible and personalized video authoring and sharing. The system, called LazyMedia, is based on both content analysis techniques and the proposed content-aware two-layer authoring templates: Composition Template and Presentation Template. Moreover, it is designed as an open and extensible framework that can support dynamic update of core components such as content analysis algorithms, editing methods, and the two-layer authoring templates. Furthermore, the two layers of authoring templates separate video authoring from video presentation. Once authored with LazyMedia, the video content can be easily and flexibly presented in other forms according to users' preferences. LazyMedia provides a semiautomatic video authoring and sharing system that significantly reduces users' efforts in video editing while preserving sufficient flexibility and personalization.
Keywords: interactive multimedia, multimedia authoring, multimedia management, template, video editing
User modeling in a speech translation driven mediated interaction setting 75-80
  JongHo Shin; Panayiotis G. Georgiou; Shrikanth Narayanan
The paper addresses user behavior modeling in a machine-mediated setting involving bidirectional speech translation. Specifically, usability data from doctor-patient dialogs involving a two-way English-Persian speech translation system are analyzed to understand the nature and extent of user accommodation to machine errors. We consider user type (categorized along the classes of Accommodating, Normal and Picky) as it relates to the user's tendency to accept poor speech recognition and translation or to retry speaking the utterance. For modeling, we employ a dynamic Bayesian network that can identify the user type with high accuracy after a few interactions of consistent user behavioral patterns. This model can be utilized for the design of machine strategies that can aid a user in operating the device more efficiently.
Keywords: dynamic Bayesian network, inference, reasoning, speech-to-speech, translation, user interaction, user modeling, user type, user-centered
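How a user type might be identified after a few consistent interactions can be sketched with a plain recursive Bayesian update. This is a deliberate simplification and not the paper's dynamic Bayesian network: the user type is treated as fixed over the session, the only observation is whether the user retried after a poor system output, and the retry probabilities in `P_RETRY` are invented for illustration.

```python
USER_TYPES = ("Accommodating", "Normal", "Picky")

# Hypothetical P(user retries | poor recognition or translation, user type).
P_RETRY = {"Accommodating": 0.2, "Normal": 0.5, "Picky": 0.8}


def update(belief, retried):
    """One Bayesian update of the belief over user types.

    belief maps each user type to its current probability; retried is True
    if the user repeated the utterance instead of accepting the output.
    """
    posterior = {}
    for user_type in USER_TYPES:
        likelihood = P_RETRY[user_type] if retried else 1.0 - P_RETRY[user_type]
        posterior[user_type] = belief[user_type] * likelihood
    norm = sum(posterior.values())
    return {user_type: p / norm for user_type, p in posterior.items()}


def infer_user_type(observations):
    """Start from a uniform prior and fold in each observed interaction."""
    belief = {user_type: 1.0 / len(USER_TYPES) for user_type in USER_TYPES}
    for retried in observations:
        belief = update(belief, retried)
    return belief
```

Calling `infer_user_type([True, True, True])` concentrates the belief on Picky, while three accepted outputs concentrate it on Accommodating, which mirrors the abstract's point that a few consistent interactions can be enough to separate the types.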
Tillarom: an AJAX based folk song search and retrieval system with gesture interface based on Kodály hand 81-88
  Attila Licsár; Tamás Szirányi; László Kovács; Balázs Pataki
A digital folk song search and retrieval system with a hand gesture based interface is presented. Tillarom is a comprehensive collection of original Hungarian folk songs recorded using different technologies such as phonographs and/or stereo DAT cassettes. This digital archive contains professional quality metadata records as well as MIDI recordings for presenting the different types of clustered folk songs. An AJAX based search and retrieval interface was developed that can be used together with optically recognized Kodály's hand signs to formulate queries through a web browser. The appearance based recognition of hand gestures utilizes contour analysis and SVM based classification. We evaluated the performance of the recognition of hand signs and investigated the main problems of their usage in our system.
Keywords: computer vision, digital archive, vision based hand gesture recognition, web based information search and retrieval
Community annotation and remix: a research platform and pilot deployment 89-98
  Ryan Shaw; Patrick Schmitz
We present a platform for community-supported media annotation and remix, including a pilot deployment with a major film festival. The platform was well received by users as fun and easy to use. An analysis of the resulting data yielded insights into user behavior. Completed remixes exhibited a range of genres, with over a third showing thematic unity and a quarter showing some attempt at narrative. Remixes were often complex, using many short segments taken from various source media. Reuse of spoken and written language in source media, and the use of written language in user-defined overlay text segments proved to be essential for most users. We describe how community remix statistics can be leveraged for media summarization, browsing, and editing support. Further, the platform as a whole provides a solid base for a range of ongoing research into community annotation and remix including analysis of remix syntax, identification of reusable segments, media and segment tagging, structured annotation of media, collaborative media production, and hybrid content-based and community-in-the-loop approaches to understanding media semantics.
Keywords: HCM, UGC, community media, human-centered multimedia, remix, tagging, video annotation
Toward multimodal fusion of affective cues 99-108
  Marco Paleari; Christine L. Lisetti
It has been suggested that during face-to-face communication as much as 70% of what people communicate is conveyed through paralanguage involving multiple modalities combined together (e.g. voice tone and volume, body language). In an attempt to render human-computer interaction more similar to human-human communication and enhance its naturalness, research on sensory acquisition and interpretation of single modalities of human expression has seen ongoing progress over the last decade. This progress is making artificial sensor fusion of multiple modalities an increasingly important research domain, both to interpret congruent messages more accurately and, possibly, to detect incongruent messages across multiple modalities (incongruency being itself a message about the nature of the information being conveyed). Accurate interpretation of emotional signals, which are quintessentially multimodal, would hence particularly benefit from multimodal sensor fusion and interpretation algorithms. In this paper we review the state of the art in multimodal fusion and describe one way to implement a generic framework for multimodal emotion recognition. The system is developed within the MAUI framework [31] and Scherer's Component Process Theory (CPT) [49, 50, 51, 24, 52], with the goal of being modular and adaptive. We want the designed framework to be able to accept different single- and multi-modality recognition systems and to automatically adapt the fusion algorithm to find optimal solutions. The system also aims to be adaptive to channel (and system) reliability.
Keywords: HCI, affective computing, emotion recognition, multimodal fusion
Using model trees for evaluating dialog error conditions based on acoustic information 109-114
  Abe Kazemzadeh; Sungbok Lee; Shrikanth Narayanan
This paper examines the use of model trees for evaluating user utterances for response to system error in dialogs from the Communicator 2000 corpus. The features used by the model trees are limited to those which can be automatically obtained through acoustic measurements. These features are derived from pitch and energy measurements. The curve of the model tree output versus dialog turn is interpreted as a measure of the level of user activation in the dialog. We test the premise that user response to error at the utterance level is related to user satisfaction at the dialog level. Several different evaluation tasks are investigated: at the utterance level we applied the model tree output to detecting response to error, and at the dialog level we analyzed the relation of model tree output to user satisfaction. For the former, we achieve 65% precision and 63% recall, and for the latter our predictions show a significant correlation of 0.48 with user surveys.
Keywords: evaluation of human-computer dialog systems, paralinguistic feedback, user response to error
Driver monitoring for a human-centered driver assistance system 115-122
  Joel McCall; Mohan M. Trivedi
Driving is a very complex task which, at its core, involves the interaction between the driver and his/her environment. It is therefore extremely important to develop driver assistance systems that are centered around the driver from the ground up. In this paper, we explore one aspect of such a system. Specifically, we focus on monitoring the driver's face and facial regions. We demonstrate a real-world system for tracking the face and facial regions and provide insight into its importance and placement in human-centered driver assistance systems. Results demonstrating its impact on driver assistance systems as well as its performance in real-world driving scenarios are shown.
A methodological study of situation understanding utilizing environments for multimodal observation of infant behavior 123-130
  Shogo Ishikawa; Shinya Kiriyama; Hiroaki Horiuchi; Shigeyoshi Kitazawa; Yoichi Takebayashi
We have developed a framework to understand the situations and intentions of speakers, focusing on utterances of demonstratives. We aim at constructing a 'Multimodal Infant Behavior Corpus', which makes a valuable contribution to the elucidation of human commonsense knowledge and its acquisition mechanism. For this purpose, we have constructed environments for multimodal observation of infant behavior, in particular environments for infant behavior recording; we have set up multiple cameras and microphones in the Cedar yurt. We have also developed a high-quality wearable speech recording device to capture infant utterances clearly. Moreover, we have developed a comment-collecting system which allows everyone to make comments easily from multiple viewpoints. This construction and these developments make it possible to realize a framework for multimodal observation of infant behavior. Utilizing the multimodal environments, we propose a situation description model based on observation of demonstratives uttered by infants, since demonstratives appear frequently in their conversations and are a valuable clue for understanding situations. The proposed model, which represents the mental distances of speakers and listeners to objects in a general and simple form, enables us to predict speakers' next behavior. Our analysis leads us to conclude that the constructed environments support the development and realization of human interaction models applicable to spoken dialog systems for supporting elderly people.
Keywords: human interaction modeling, multimodal observation, situation understanding model