
Proceedings of the 2013 Workshop on Eye Gaze in Intelligent Human Machine Interaction

Fullname: Proceedings of the 6th Workshop on Eye Gaze in Intelligent Human Machine Interaction: Gaze in Multimodal Interaction
Editors: Roman Bednarik; Hung-Hsuan Huang; Kristiina Jokinen; Yukiko I. Nakano
Location: Sydney, Australia
Standard No: ISBN: 978-1-4503-2563-9; ACM DL: Table of Contents; hcibib: GazeIn13
Links: Workshop Website | Conference Website
  1. Conversation
  2. Applications
  3. Gaze and mind

Conversation

Context aware addressee estimation for human robot interaction BIBAFull-Text 1-6
  Samira Sheikhi; Dinesh Babu Jayagopi; Vasil Khalidov; Jean-Marc Odobez
The paper investigates the problem of addressee recognition -- identifying to whom a speaker's utterance is intended -- in a setting involving a humanoid robot interacting with multiple persons. More specifically, since the addressee can primarily be derived from the speaker's visual focus of attention (VFOA), defined as whom or what a person is looking at, we address the following questions: how much does performance degrade when using VFOA automatically extracted from head pose instead of the VFOA ground truth? Can the conversational context improve addressee recognition, either directly as a side cue in the addressee classifier, indirectly by improving VFOA recognition, or in both ways? Finally, from a computational perspective, which VFOA features and normalizations work best, and does it matter whether the VFOA recognition module only monitors whether a person looks at potential addressee targets (the robot, people), or whether it also considers objects of interest in the environment (paintings, in our case) as additional VFOA targets? Experiments on the public Vernissage database, in which the humanoid robot Nao conducts a quiz with two participants, show that reducing VFOA confusion (either through context or by ignoring VFOA targets) improves addressee recognition.
The acoustics of eye contact: detecting visual attention from conversational audio cues BIBAFull-Text 7-12
  Florian Eyben; Felix Weninger; Lucas Paletta; Björn W. Schuller
An important aspect of short dialogues is attention, as manifested by eye contact between subjects. In this study we provide a first analysis of whether such visual attention is evident in the acoustic properties of a speaker's voice. We thereby introduce the multi-modal GRAS2 corpus, which was recorded for analysing attention in short daily-life human-to-human interactions with strangers in public places in Graz, Austria. The corpus contains recordings of four test subjects equipped with eye-tracking glasses, three audio recording devices, and motion sensors. We describe how we robustly identify speech segments from the subjects and other people in an unsupervised manner from multi-channel recordings. We then discuss correlations between the acoustics of the voice in these segments and the point of visual attention of the subjects. A significant relation is found between the acoustic features and the distance between the point of view and the eye region of the dialogue partner. Further, we show that automatic binary classification of eye contact vs. no eye contact from acoustic features alone is feasible, with an Unweighted Average Recall of up to 70%.
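The Unweighted Average Recall (UAR) reported above is the mean of per-class recalls, a standard choice when classes such as eye-contact vs. no eye-contact are imbalanced. A minimal sketch (the labels below are invented toy data, not from the GRAS2 corpus):

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; does not reward majority-class guessing."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

# Imbalanced toy data: always predicting the majority class scores
# 80% accuracy but only 50% UAR (recall 1.0 and 0.0 averaged).
y_true = ["no-contact"] * 8 + ["eye-contact"] * 2
y_pred = ["no-contact"] * 10
print(unweighted_average_recall(y_true, y_pred))  # 0.5
```

This is why a 70% UAR on a two-class problem is informative regardless of how skewed the eye-contact distribution is: chance level is 50% by construction.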
A dominance estimation mechanism using eye-gaze and turn-taking information BIBAFull-Text 13-18
  Misato Yatsushiro; Naoya Ikeda; Yuki Hayashi; Yukiko I. Nakano
With the goal of contributing to multiparty conversation management, this paper proposes a mechanism for estimating conversational dominance in group interaction. Based on our corpus analysis, we have already established a regression model for dominance estimation using speech and gaze information. In this study, we implement the model as a dominance estimation mechanism and propose using it to moderate multiparty conversations between a conversational robot and three human users. The system decides whom it should talk to based on the dominance level of each user.
Finding the timings for a guide agent to intervene inter-user conversation in considering their gaze behaviors BIBAFull-Text 19-24
  Shochi Otogi; Hung-Hsuan Huang; Ryo Hotta; Kyoji Kawagoe
With the advance of embodied conversational agent (ECA) technologies, there are more and more real-world deployments of ECAs, such as guides in museums or exhibitions. In those situations, however, agent systems are usually used by groups of visitors rather than individuals. Such multi-user situations are much more complex than single-user ones and require specific features. One of them is the ability for the agent to smoothly intervene in user-user conversation, which is expected to facilitate mixed-initiative human-agent conversation and more proactive service for the users. This paper presents the results of the first step of our project, which aims to build an information-providing agent for collaborative decision-making tasks: finding the timings at which the agent can intervene in user-user conversation to provide active support, by focusing on the users' gaze. To realize this, a Wizard-of-Oz (WOZ) experiment was first conducted to collect human interaction data. By analyzing the collected corpus, eight kinds of timings that potentially allow the agent to intervene were found. Second, a method was developed to automatically identify four of the eight kinds of timings using only nonverbal cues: gaze direction, body posture, and speech information. Although the performance of the method is moderate (F-measure 0.4), it should be possible to improve it by integrating context information in the future.

Applications

Situated multi-modal dialog system in vehicles BIBAFull-Text 25-28
  Teruhisa Misu; Antoine Raux; Ian Lane; Joan Devassy; Rakesh Gupta
In this paper, we present Townsurfer, a situated multi-modal dialog system in vehicles. The system integrates the multi-modal inputs of speech, geo-location, gaze (face direction), and dialog history to answer drivers' queries about their surroundings. To select the appropriate data source for answering a query, we apply belief tracking across the above modalities. We conducted a preliminary data collection and an evaluation focusing on the effect of gaze (head direction) and geo-location estimation. We report the results and an analysis of the data.
Agent-assisted multi-viewpoint video viewer and its gaze-based evaluation BIBAFull-Text 29-34
  Takatsugu Hirayama; Takafumi Marutani; Daishi Tanoue; Shogo Tokai; Sidney Fels; Kenji Mase
Humans see things from various viewpoints but nobody attempts to see anything from every viewpoint owing to physical restrictions and the great effort required. Intelligent interfaces for viewing multi-viewpoint videos may remove the restrictions in effective ways and direct us toward a new visual world. We propose an agent-assisted multi-viewpoint video viewer that incorporates (1) target-centered viewpoint switching and (2) social viewpoint recommendation. The viewer stabilizes an object at the center of the display field using the former function, which helps to fix the user's gaze on the target object. To identify the popular viewing behavior for particular content, the latter function exploits a histogram of the viewing log in terms of time, viewpoints, and the target of many personal viewing experiences. We call this knowledge source of the director agent a viewgram. The agent automatically constructs the preferred viewpoint sequence for each target. We conducted user studies to analyze user behavior, especially eye movement, while using the viewer. The results of statistical analyses showed that the viewpoint sequence extracted from a viewgram includes a more distinct perspective for each target, and the target-centered viewpoint switching encourages the user to gaze at the display center where the target is located during the viewing. The proposed viewer can provide more effective perspectives for the main attractions in scenes.
Mutual disambiguation of eye gaze and speech for sight translation and reading BIBAFull-Text 35-40
  Rucha Kulkarni; Kritika Jain; Himanshu Bansal; Srinivas Bangalore; Michael Carl
Researchers are proposing interactive machine translation as a potential method to make the language translation process more efficient and usable. The introduction of different modalities such as eye gaze and speech is being explored to add to the interactivity of language translation systems. Unfortunately, the raw data provided by Automatic Speech Recognition (ASR) and eye tracking are very noisy and erroneous. This paper describes a technique for reducing the errors of the two modalities, speech and eye gaze, with the help of each other, in the context of sight translation and reading. Lattice representation and composition of the two modalities were used for integration. F-measure for eye gaze and Word Accuracy for ASR were used as metrics to evaluate our results. In the reading task, we demonstrate a significant improvement in both eye-gaze F-measure and speech Word Accuracy. In the sight translation task, a significant improvement was found in gaze F-measure but not in ASR Word Accuracy.
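The mutual-disambiguation idea behind this line of work can be illustrated with a deliberately simplified stand-in for lattice composition: if each modality yields a weighted set of word hypotheses for a time span, composing them keeps only hypotheses both modalities support, with combined weights. (The words and confidences below are invented for illustration; the paper itself uses full finite-state lattices, not single time slots.)

```python
def compose(asr_hyps, gaze_hyps):
    """Keep hypotheses supported by both modalities; multiply weights.

    Each argument maps a candidate word to a confidence in [0, 1];
    the result is renormalised so the composed scores sum to 1.
    """
    shared = set(asr_hyps) & set(gaze_hyps)
    scores = {w: asr_hyps[w] * gaze_hyps[w] for w in shared}
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()} if z else {}

# Noisy ASR confuses "ship"/"sheep", but gaze fixated near "sheep"
# on the displayed text, so the composition resolves the ambiguity.
asr = {"ship": 0.6, "sheep": 0.4}
gaze = {"sheep": 0.7, "shepherd": 0.3}
composed = compose(asr, gaze)
print(max(composed, key=composed.get))  # sheep
```

Real lattice composition generalises this intersect-and-reweight step over whole paths of timed word hypotheses rather than a single slot.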

Gaze and mind

Learning aspects of interest from gaze BIBAFull-Text 41-44
  Kei Shimonishi; Hiroaki Kawashima; Ryo Yonetani; Erina Ishikawa; Takashi Matsuyama
This paper presents a probabilistic framework to model the gaze generative process when a user is browsing content consisting of multiple regions. The model enables us to learn multiple aspects of interest from gaze data, to represent and estimate the user's interest as a mixture of aspects, and to predict gaze behavior in a unified framework. We recorded gaze data from subjects while they browsed a digital pictorial book, and confirmed the effectiveness of the proposed model in terms of predicting the gaze target.
Feature selection for gaze, pupillary, and EEG signals evoked in a 3D environment BIBAFull-Text 45-50
  David C. Jangraw; Paul Sajda
As we navigate our environment, we are constantly assessing the objects we encounter and deciding on their subjective interest to us. In this study, we investigate the neural and ocular correlates of this assessment as a step towards their potential use in a mobile human-computer interface (HCI). Past research has shown that multiple physiological signals are evoked by objects of interest during visual search in the laboratory, including gaze, pupil dilation, and neural activity; these have been exploited for use in various HCIs. We use a virtual environment to explore which of these signals are also evoked during exploration of a dynamic, free-viewing 3D environment. Using a hierarchical classifier and sequential forward floating selection (SFFS), we identify a small, robust set of features across multiple modalities that can be used to distinguish targets from distractors in the virtual environment. The identification of these features may serve as an important factor in the design of mobile HCIs.
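Sequential forward floating selection, as used above, alternates a greedy forward step (add the most helpful feature) with a conditional backward step (drop an earlier feature if that improves the score). A minimal sketch, with one backward attempt per forward step rather than the full floating loop; the feature names and scoring function are invented placeholders, not the paper's:

```python
def sffs(features, score, k):
    """Simplified sequential forward floating selection.

    `score(subset)` is any quality estimate, e.g. cross-validated
    classifier accuracy on held-out data; higher is better.
    """
    selected = []
    while len(selected) < k:
        # Forward step: add the single most helpful remaining feature.
        add = max((f for f in features if f not in selected),
                  key=lambda f: score(selected + [f]))
        selected.append(add)
        # Conditional backward step: drop an earlier feature (never the
        # one just added) if the reduced subset actually scores better.
        if len(selected) > 2:
            drop = max(selected[:-1],
                       key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != drop]
            if score(reduced) > score(selected):
                selected = reduced
    return selected
```

With a scoring function that rewards informative features and mildly penalises subset size, the sketch converges on the informative ones; full SFFS additionally tracks the best score seen at each subset size to guarantee termination.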
Lying through the eyes: detecting lies through eye movements BIBAFull-Text 51-56
  Kai Keat Lim; Max Friedrich; Jenni Radun; Kristiina Jokinen
In this pilot study, we investigated whether it is possible to detect lies through eye-gaze behavior. Earlier research suggests that lying increases cognitive load, resulting in fewer eye movements and shorter saccade amplitudes. To investigate these findings further, a structured interview was conducted with three subjects, who were instructed to lie in half of their answers. The subjects' eye gaze was tracked during the interview session. We hypothesized that people show shorter saccade amplitudes and tend to engage in fewer eye movements when lying. A significant difference in saccade amplitudes was observed between the truth-telling and lie-telling situations. The overall results support the theory that cognitive load decreases the number of eye movements, but our analysis also revealed significant individual differences. This raises the question of whether different individuals handle deception in different ways and whether distinct viewing-behavior patterns could be found for different groups of individuals.
Unravelling the interaction strategies and gaze in collaborative learning with online video lectures BIBAFull-Text 57-62
  Roman Bednarik; Marko Kauppinen
Using dual eye tracking, we performed a study characterising the differences in interaction patterns while learning from online materials individually or with a peer. The findings show that in the majority of cases, users prefer to use the online learning materials in parallel when working on a learning task with their own tool. Collaborative learning took longer due to negotiation overheads, and most attention was paid to the materials. However, collaboration did not affect the overall distribution of gaze.