
Proceedings of the 2014 Workshop on Eye Gaze in Intelligent Human Machine Interaction

Fullname: Proceedings of the 7th Workshop on Eye Gaze in Intelligent Human Machine Interaction: Gaze in Multimodal Interaction
Editors: Hung-Hsuan Huang; Roman Bednarik; Kristiina Jokinen; Yukiko I. Nakano
Location: Istanbul, Turkey
Standard No: ISBN: 978-1-4503-0125-1; ACM DL: Table of Contents; hcibib: GazeIn14
Links: Workshop Website | Conference Website
  1. Keynote Talk
  2. Long Papers
  3. Short Paper

Keynote Talk

Attention and Gaze in Situated Language Interaction BIBAFull-Text 1
  Dan Bohus
The ability to engage in natural language interaction in physically situated settings hinges on a set of competencies such as managing conversational engagement, turn taking, understanding, language and behavior generation, and interaction planning. In human-human interaction these are mixed-initiative, collaborative processes that often involve a wide array of finely coordinated verbal and non-verbal actions. Eye gaze, and more generally attention, among many other channels, play a fundamental role. In this talk, I will discuss samples of research work we have conducted over the last few years on developing models for supporting physically situated dialog in relatively unconstrained environments. Throughout, I will highlight the role that gaze and attention play in these models. I will discuss and showcase several prototype systems that we have developed, and describe opportunities for reasoning about, interpreting and producing gaze signals in support of fluid, seamless spoken language interaction.

Long Papers

Spatio-Temporal Event Selection in Basic Surveillance Tasks using Eye Tracking and EEG BIBAFull-Text 3-8
  Jutta Hild; Felix Putze; David Kaufman; Christian Kühnle; Tanja Schultz; Jürgen Beyerer
In safety- and security-critical applications like video surveillance, it is crucial that human operators detect task-relevant events in the continuous video streams and select them for report or dissemination to other authorities. Usually, the selection operation is performed using a manual input device like a mouse or a joystick. Due to the visually rich and dynamic input, the required high attention, the long working time, and the challenging manual selection of moving objects, relevant events are sometimes missed. To alleviate this problem we propose adding another event selection process, using eye-brain input. Our approach is based on eye tracking and EEG, providing spatio-temporal event selection without any manual intervention. We report ongoing research, building on prior work where we showed the general feasibility of the approach. In this contribution, we extend that work, testing the feasibility of the approach with more advanced and less artificial experimental paradigms that simulate frequently occurring, basic types of real surveillance tasks. The paradigms are much closer to a real surveillance task in terms of the visual stimuli used, the more subtle cues for event indication, and the required viewing behavior. We perform an experiment (N=10) with non-experts. The results confirm the feasibility of the approach for event selection in the advanced tasks. We achieve spatio-temporal event selection accuracy scores of up to 77% and 60% for different stages of event indication.
Gaze-Based Virtual Task Predictor BIBAFull-Text 9-14
  Çagla Çig; Tevfik Metin Sezgin
Pen-based systems promise an intuitive and natural interaction paradigm for tablet PCs and stylus-enabled phones. However, typical pen-based interfaces require users to switch modes frequently in order to complete ordinary tasks. Mode switching is usually achieved through hard or soft modifier keys, buttons, and soft-menus. Frequent invocation of these auxiliary mode switching elements goes against the goal of intuitive, fluid, and natural interaction. In this paper, we present a gaze-based virtual task prediction system that has the potential to alleviate dependence on explicit mode switching in pen-based systems. In particular, we show that a range of virtual manipulation commands that would otherwise require auxiliary mode switching elements can be issued with an 80% success rate with the aid of users' natural eye gaze behavior during pen-only interaction.
Analysis of Timing Structure of Eye Contact in Turn-changing BIBAFull-Text 15-20
  Ryo Ishii; Kazuhiro Otsuka; Shiro Kumano; Junji Yamato
With the aim of constructing a model for predicting the next speaker and the start of the next utterance in multi-party meetings, we focus on the timing structure of eye contact between the speaker, the listeners, and the next speaker: who looks at whom first, who looks away first, and when the eye contact happens. We analyze how this timing structure differs between turn-changing and turn-keeping. The results show that, when eye contact with the speaker happens, listeners in turn-keeping are more likely to look at the speaker before the speaker looks at them than the next speaker in turn-changing is. Likewise, listeners in turn-keeping tend more often to look away from the speaker only after the speaker has looked away than listeners and the next speaker in turn-changing do. In addition, the intervals between the end of eye contact, the end of the speaker's utterance, and the start of the next speaker's utterance differ between the listener in turn-keeping, the listener in turn-changing, and the next speaker in turn-changing.
Fusing Multimodal Human Expert Data to Uncover Hidden Semantics BIBAFull-Text 21-26
  Xuan Guo; Qi Yu; Rui Li; Cecilia Ovesdotter Alm; Anne R. Haake
Problem solving in complex visual domains involves multiple levels of cognitive processing. Analyzing and representing these cognitive processes requires the elicitation and study of multimodal human data. We have developed methods for extracting experts' visual behaviors and verbal descriptions during medical image inspection. Now we address fusion of these data towards building a novel framework for organizing elements of expertise as a foundation for knowledge-dependent computational systems. In this paper, a multimodal graph-regularized non-negative matrix factorization approach is developed and used to fuse multimodal data collected during medical image inspection. Our experimental results on the new data representation demonstrate the effectiveness of the proposed data fusion approach.
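The abstract names graph-regularized non-negative matrix factorization as the fusion technique but gives no implementation details. As a hedged illustration of the basic single-graph (non-multimodal) variant only, with multiplicative updates in the style commonly attributed to Cai et al., the idea can be sketched as follows; the function name and parameters are placeholders, not the authors' code:

```python
import numpy as np

def graph_regularized_nmf(X, k, W, lam=0.1, n_iter=200, seed=0):
    """Sketch of graph-regularized NMF with multiplicative updates.

    Minimizes ||X - U V^T||_F^2 + lam * tr(V^T L V), where L = D - W is the
    graph Laplacian over the n samples (columns of X).

    X : (m, n) nonnegative data matrix, columns are samples.
    W : (n, n) symmetric nonnegative affinity graph over samples.
    Returns U (m, k) basis and V (n, k) coefficients with X ~ U @ V.T.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    D = np.diag(W.sum(axis=1))  # degree matrix of the affinity graph
    eps = 1e-9                  # avoid division by zero
    for _ in range(n_iter):
        # Multiplicative update for the basis U
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        # Update for V: the lam-terms pull graph-neighboring samples
        # toward similar low-dimensional coefficients
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V
```

The graph term is what encodes the "hidden semantics" intuition: samples connected in the affinity graph (e.g., co-occurring gaze and verbal features) are encouraged to share similar factor loadings. A multimodal extension would combine several data matrices and graphs, which the sketch above does not attempt.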
Evaluating the Impact of Embodied Conversational Agents (ECAs) Attentional Behaviors on User Retention of Cultural Content in a Simulated Mobile Environment BIBAFull-Text 27-32
  Ioannis Doumanis; Serengul Smith
The paper presents an evaluation study of the impact of an ECA's attentional behaviors using a custom research method that combines facial expression analysis, eye-tracking and a retention test. The method provides additional channels to EEG-based methods (e.g., [8]) for the study of user attention and emotions. In order to validate the proposed approach, two tour guide applications were created with an embodied conversational agent (ECA) that presents cultural content about a real tourist attraction. The agent simulates two attention-grabbing mechanisms -- humorous and serious -- to attract the users' attention. A formal study was conducted to compare the two tour guide applications in the lab. The data collected from the facial expression analysis and eye-tracking helped to explain particularly good and bad performances in the retention tests. In terms of the study results, strong quantitative and qualitative evidence was found that an ECA should not attract more attention to itself than necessary, to avoid becoming a distraction from the flow of the content. It was also found that the ECA had opposite effects on the retention performance of male and female participants, and that the use of such agents in computer interfaces is not advisable for elderly users.
Analyzing Co-occurrence Patterns of Nonverbal Behaviors in Collaborative Learning BIBAFull-Text 33-37
  Sakiko Nihonyanagi; Yuki Hayashi; Yukiko I. Nakano
In collaborative learning, participants work on the learning task together. In this environment, linguistic information conveyed via speech as well as non-verbal information such as gaze and writing actions are important elements. Integrating the information from these behaviors is expected to help assess the learning activity and the characteristics of each participant in a more objective manner. With the objective of characterizing participants in the collaborative learning activity, this study analyzed verbal and nonverbal behaviors and found that the gaze behaviors of individual participants, as well as those between participants, provide useful information for distinguishing the leader of the group, those who follow the leader, and those who attend to other participants who do not appear to understand.

Short Paper

Study on Participant-controlled Eye Tracker Calibration Procedure BIBAFull-Text 39-41
  Pawel Kasprowski; Katarzyna Harezlak
The analysis of the eye movement signal, which can reveal a lot of information about the way the human brain works, has recently attracted the attention of many researchers. The basis for such studies is data returned by specialized devices called eye trackers. The first step in their usage is a calibration process, which maps measured eye positions to points of regard. The main research problem analyzed in this paper is whether and how the chosen calibration scenario influences the calibration result (calibration errors). Based on this analysis of possible scenarios, a new user-controlled calibration procedure was developed. It was tested and compared with a classic approach in pilot studies using The Eye Tribe system as the eye-tracking device. The results obtained for both methods were examined in terms of the accuracy they provided.
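The calibration step referred to above is commonly implemented as a regression from raw eye-position estimates, recorded while the participant fixates known on-screen targets, to screen coordinates. As a rough, generic sketch (not the specific procedure studied in the paper; the function names are illustrative), a second-order polynomial mapping fitted by least squares might look like:

```python
import numpy as np

def fit_calibration(raw, screen):
    """Fit a second-order polynomial mapping from raw eye-position
    features to screen coordinates via least squares.

    raw    : (n, 2) raw eye-position estimates at the calibration targets.
    screen : (n, 2) known target positions on screen (n >= 6).
    Returns a (6, 2) coefficient matrix C with phi(raw) @ C ~ screen.
    """
    x, y = raw[:, 0], raw[:, 1]
    # Design matrix of polynomial terms: [1, x, y, x*y, x^2, y^2]
    phi = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    C, *_ = np.linalg.lstsq(phi, screen, rcond=None)
    return C

def apply_calibration(C, raw):
    """Map raw eye positions to estimated points of regard on screen."""
    x, y = raw[:, 0], raw[:, 1]
    phi = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    return phi @ C
```

The choice of calibration scenario, in the sense the abstract discusses, determines where the target points come from and how they are presented (e.g., a fixed 9-point grid shown automatically versus points the participant controls), which in turn determines how well-conditioned and representative the fitted mapping is.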