GazeIn Tables of Contents: 12 | 13 | 14

Proceedings of the 2012 Workshop on Eye Gaze in Intelligent Human Machine Interaction

Fullname: Proceedings of the 4th Workshop on Eye Gaze in Intelligent Human Machine Interaction
Location: Santa Monica, California
Dates: 2012-Oct-26
Publisher: ACM
Standard No: ISBN: 978-1-4503-1516-6; ACM DL: Table of Contents; hcibib: GazeIn12
Papers: 18
Links: Workshop Website | Conference Website
Summary: Eye gaze is one of the most important aspects in modeling human-human communication, and has great potential in improving human-humanoid interaction. In human face-to-face communication, eye gaze plays an important role in floor management, grounding, and engagement in conversation. In human-robot interaction research, social gaze, gaze directed at an interaction partner, has been a subject of increased attention.
    This is the fourth workshop on Eye Gaze in Intelligent Human Machine Interaction. In the previous workshops, we discussed a wide range of issues for eye gaze: technologies for sensing human attentional behaviors, the roles of attentional behaviors as social gaze in human-human and human-humanoid interaction, attentional behaviors in problem-solving and task-performing, gaze-based intelligent user interfaces, and the evaluation of gaze-based UI. In addition to these topics, this year's workshop focuses on eye gaze in multimodal interpretation and generation. Since eye gaze is one of the facial communication modalities, gaze information can be combined with other modalities or bodily motions to contribute to the meanings of utterances and serve as communication signals.
    This workshop aims to continue exploring this growing area of research by bringing together researchers from fields including human sensing, multimodal processing, humanoid interfaces, intelligent user interfaces, and communication science. We will exchange ideas to develop and improve methodologies for this research area, with the long-term goal of establishing a strong interdisciplinary research community in "attention aware interactive systems".
Brain-enhanced synergistic attention (BESA) BIBAFull-Text 1
  Deepak Khosla; Matthew Keegan; Lei Zhang; Kevin R. Martin; Darrel J. VanBuer; David J. Huber
In this paper, we describe a hybrid human-machine system for searching for and detecting Objects of Interest (OI) in imagery. Automated methods for OI detection based on models of human visual attention have received much interest, but they are inherently bottom-up and feature-driven. Humans fixate on regions of imagery based on a much stronger top-down component. While it may be possible to include some aspects of top-down cognition in these methods, it is difficult to fully capture all aspects of human cognition in an automated algorithm. Our hypothesis is that a combination of automated methods with human fixations will provide a better solution than either alone. In this work, we describe a Brain-Enhanced Synergistic Attention (BESA) system that combines models of visual attention with real-time eye fixations from a human for accurate search and detection of OI. We describe two different BESA schemes and provide implementation details. Preliminary studies were conducted to determine the efficacy of the system, and initial results are promising. Typical applications of this technology are in surveillance, reconnaissance, and intelligence analysis.
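The abstract does not detail the fusion scheme, so the following is only a minimal illustrative sketch of the general idea of blending a bottom-up saliency map with human fixations; the function names, the Gaussian density, and the mixing weight alpha are assumptions of this sketch, not the authors' BESA implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_density(fixations, shape, sigma=15.0):
    """Build a smoothed density map from (row, col) human fixation points."""
    density = np.zeros(shape, dtype=float)
    for r, c in fixations:
        density[int(r), int(c)] += 1.0
    return gaussian_filter(density, sigma=sigma)

def fuse_attention(saliency, fixations, alpha=0.5):
    """Blend a bottom-up saliency map with top-down human fixations.

    alpha is a hypothetical mixing weight; the paper's actual fusion
    scheme is not specified in the abstract.
    """
    top_down = fixation_density(fixations, saliency.shape)
    # Normalize both maps to [0, 1] before mixing.
    s = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-9)
    t = (top_down - top_down.min()) / (top_down.max() - top_down.min() + 1e-9)
    return alpha * s + (1.0 - alpha) * t

# Toy usage: a random "saliency map" and two simulated fixations.
rng = np.random.default_rng(0)
saliency = rng.random((240, 320))
combined = fuse_attention(saliency, fixations=[(120, 160), (60, 200)])
peak = np.unravel_index(np.argmax(combined), combined.shape)
print("Most likely object-of-interest location:", peak)
```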
Multi-modal object of interest detection using eye gaze and RGB-D cameras BIBAFull-Text 2
  Christopher McMurrough; Jonathan Rich; Christopher Conly; Vassilis Athitsos; Fillia Makedon
This paper presents a low-cost, wearable headset for mobile 3D Point of Gaze (PoG) estimation in assistive applications. The device consists of an eye tracking camera and a forward-facing RGB-D scene camera, which together provide an estimate of the user's gaze vector and its intersection with a 3D point in space. A computational approach that considers object 3D information and visual appearance together with the visual gaze interactions of the user is also given to demonstrate the utility of the device. The resulting system is able to identify, in real time, known objects within a scene that intersect with the user's gaze vector.
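As a rough, hypothetical illustration of the geometric step described (testing whether a gaze ray passes close to the 3D points of a known object), not the authors' implementation, one might write the intersection test as follows; the object models, coordinate-frame assumption, and distance threshold are invented for this sketch.

```python
import numpy as np

def ray_point_distances(origin, direction, points):
    """Perpendicular distance from each 3D point to the gaze ray."""
    d = direction / np.linalg.norm(direction)
    v = points - origin                      # vectors from eye to points
    t = np.clip(v @ d, 0.0, None)            # projection length, in front of the user only
    closest = origin + np.outer(t, d)        # nearest points on the ray
    return np.linalg.norm(points - closest, axis=1)

def gazed_object(origin, direction, objects, threshold=0.05):
    """Return the name of the object whose point cloud lies nearest the gaze ray.

    `objects` maps object names to (N, 3) point arrays assumed to be in the
    same camera coordinate frame as the gaze vector.
    """
    best_name, best_dist = None, np.inf
    for name, pts in objects.items():
        dist = ray_point_distances(origin, direction, pts).min()
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None

# Toy usage with two synthetic point clouds (metres).
cup = np.random.rand(100, 3) * 0.05 + np.array([0.2, 0.0, 1.0])
book = np.random.rand(100, 3) * 0.05 + np.array([-0.3, 0.1, 1.5])
print(gazed_object(np.zeros(3), np.array([0.2, 0.0, 1.0]),
                   {"cup": cup, "book": book}))
```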
Perception of gaze direction for situated interaction BIBAFull-Text 3
  Samer Al Moubayed; Gabriel Skantze
Accurate human perception of a robot's gaze direction is crucial for the design of natural and fluent situated multimodal face-to-face interaction between humans and machines. In this paper, we present an experiment, with 18 test subjects, targeted at quantifying the effects of different gaze cues synthesized using the Furhat back-projected robot head on the accuracy with which humans perceive the spatial direction of gaze. The study first quantifies the accuracy of perceived gaze direction in a human-human setup, and compares that to synthesized gaze movements in different conditions: viewing the robot's eyes frontally or from a 45-degree side view. We also study the effect of 3D gaze by controlling both eyes to indicate the depth of the focal point (vergence), the use of gaze or head pose, and the use of static or dynamic eyelids. The findings of the study are highly relevant to the design and control of robots and animated agents in situated face-to-face interaction.
A head-eye coordination model for animating gaze shifts of virtual characters BIBAFull-Text 4
  Sean Andrist; Tomislav Pejsa; Bilge Mutlu; Michael Gleicher
We present a parametric, computational model of head-eye coordination that can be used in the animation of directed gaze shifts for virtual characters. The model is based on research in human neurophysiology. It incorporates control parameters that allow for adapting gaze shifts to the characteristics of the environment, the gaze targets, and the idiosyncratic behavioral attributes of the virtual character. A user study confirms that the model communicates gaze targets as effectively as real humans do, while being preferred subjectively to state-of-the-art models.
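The model's actual parameterization is given in the paper; purely to illustrate the underlying idea of splitting a directed gaze shift between eye and head rotation, here is a minimal sketch with a hypothetical head-alignment parameter and an assumed oculomotor range.

```python
from dataclasses import dataclass

@dataclass
class GazeShift:
    """Split a desired gaze shift between eye and head rotation.

    `head_alignment` in [0, 1] is a hypothetical control parameter: 0 moves
    only the eyes (within their motor range), 1 aligns the head fully with
    the target. The paper's model is richer (velocity profiles, latencies,
    eyelid control); this only illustrates the decomposition.
    """
    head_alignment: float = 0.3
    eye_range_deg: float = 45.0   # assumed oculomotor range

    def decompose(self, target_angle_deg: float):
        head = self.head_alignment * target_angle_deg
        eyes = target_angle_deg - head
        # Clamp the eye contribution and let the head absorb the remainder.
        if abs(eyes) > self.eye_range_deg:
            eyes = self.eye_range_deg if eyes > 0 else -self.eye_range_deg
            head = target_angle_deg - eyes
        return head, eyes

shift = GazeShift(head_alignment=0.3)
print(shift.decompose(60.0))   # (18.0, 42.0) degrees of head vs. eye rotation
```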
From the eye to the heart: eye contact triggers emotion simulation BIBAFull-Text 5
  Magdalena Rychlowska; Leah Zinner; Serban C. Musca; Paula M. Niedenthal
Smiles are complex facial expressions that carry multiple meanings. Recent literature suggests that deep processing of smiles via embodied simulation can be triggered by achieved eye contact. Three studies supported this prediction. In Study 1, participants rated the emotional impact of portraits, which varied in eye contact and smiling. Smiling portraits that achieved eye contact were more emotionally impactful than smiling portraits that did not achieve eye contact. In Study 2, participants saw photographs of smiles in which eye contact was manipulated. The same smile of the same individual caused more positive emotion and higher ratings of authenticity when eye contact was achieved than when it was not. In Study 3, participants' facial EMG was recorded. Activity over the zygomatic major (i.e. smile) muscle was greater when participants observed smiles that achieved eye contact compared to smiles that did not. These results support the role of eye contact as a trigger of embodied simulation. Implications for human-machine interactions are discussed.
Addressee identification for human-human-agent multiparty conversations in different proxemics BIBAFull-Text 6
  Naoya Baba; Hung-Hsuan Huang; Yukiko I. Nakano
This paper proposes a method for identifying the addressee based on speech and gaze information, and shows that the proposed method is applicable to human-human-agent multiparty conversations in different proxemics. First, we collected human-human-agent interactions in different proxemics, and by analyzing the data, we found that people spoke with a higher tone of voice, more loudly, and more slowly when they talked to the agent. We also confirmed that this speech style was consistent regardless of the proxemics. Then, employing an SVM, we propose a general addressee estimation model that can be used in different proxemics; the model achieved over 80% accuracy in 10-fold cross-validation.
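As a generic illustration of the classification setup described (an SVM over speech and gaze features evaluated with 10-fold cross-validation), not the authors' feature set or data, a scikit-learn sketch might look like this; the features and labels below are synthetic placeholders.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Hypothetical feature matrix: one row per utterance, with columns such as
# mean pitch, intensity, speech rate, and proportion of gaze at the agent.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
# Labels: 1 = addressed to the agent, 0 = addressed to the other human.
y = rng.integers(0, 2, size=200)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=10)   # 10-fold cross-validation
print(f"mean accuracy: {scores.mean():.2f}")
```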
Hard lessons learned: mobile eye-tracking in cockpits BIBAFull-Text 7
  Hana Vrzakova; Roman Bednarik
Eye tracking is an attractive tool for testing design alternatives at all stages of interface evaluation. Access to the operator's visual attention behaviors provides information supporting design decisions. While mobile eye-tracking increases ecological validity, it also brings numerous constraints. In this work, we discuss mobile eye-tracking issues in the complex environment of a business jet flight simulator in an industrial research setting. The cockpit and low illumination directly limited the setup of the eye-tracker and the quality of recordings and evaluations. Here we present lessons learned and best practices for setting up an eye-tracker under challenging simulation conditions.
Analysis on learners' gaze patterns and the instructor's reactions in ballroom dance tutoring BIBAFull-Text 8
  Kosuke Kimura; Hung-Hsuan Huang; Kyoji Kawagoe
Virtual conversational agents are expected to be used in the tutoring of physical skills such as sports or dance. This paper describes an ongoing project aiming to realize a virtual instructor for ballroom dance. First, a human-human experiment was conducted to collect an interaction corpus between a professional instructor and six learners. The verbal and non-verbal behaviors of the instructor are analyzed and serve as the basis of a state transition model for ballroom dance tutoring. In order to achieve intuitive and efficient instruction during the multi-modal interaction between the virtual instructor and the learner, the eye gaze patterns of the learners and the reactions of the instructor were analyzed. The analysis showed that the learners' attitude (confidence and concentration) could be approximated from their gaze patterns, and that the instructor's tutoring strategy was consistent with this.
Multimodal corpus of conversations in mother tongue and second language by same interlocutors BIBAFull-Text 9
  Kosuke Kabashima; Kristiina Jokinen; Masafumi Nishida; Seiichi Yamamoto
In this paper, we describe multi-modal data collected from conversations held both in the mother tongue and in a second language. We also compare eye movements and utterance styles between communication in the mother tongue and in the second language. The results obtained from analyzing the eye movements and utterance styles are presented.
Gaze and conversational engagement in multiparty video conversation: an annotation scheme and classification of high and low levels of engagement BIBAFull-Text 10
  Roman Bednarik; Shahram Eivazi; Michal Hradis
When using a multiparty video-mediated system, interacting participants assume a range of roles and exhibit behaviors according to how engaged in the communication they are. In this paper we focus on the estimation of conversational engagement from the gaze signal. In particular, we present an annotation scheme for conversational engagement and a statistical analysis of gaze behavior across varying levels of engagement, and we classify vectors of computed eye-tracking measures. The results show that in 74% of cases the level of engagement can be correctly classified as either high or low. In addition, we describe the nuances of gaze during distinct levels of engagement.
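The paper classifies vectors of eye-tracking measures computed per annotated interval; purely as a hypothetical sketch of what such feature extraction could look like (the actual measure set is defined in the paper, and the fixation labels here are assumed inputs), one might compute per-interval gaze statistics like this:

```python
import numpy as np

def engagement_features(timestamps, x, y, fixation_labels):
    """Compute a small feature vector from gaze samples in one annotated interval.

    `fixation_labels` assigns each sample a fixation id (or -1 for saccade/noise);
    both the labels and the chosen measures are illustrative assumptions.
    """
    ids = [i for i in np.unique(fixation_labels) if i >= 0]
    durations = []
    for i in ids:
        sel = fixation_labels == i
        durations.append(timestamps[sel].max() - timestamps[sel].min())
    return {
        "fixation_count": len(ids),
        "mean_fixation_duration": float(np.mean(durations)) if durations else 0.0,
        "gaze_dispersion": float(np.std(x) + np.std(y)),
    }

# Toy interval: 1 second of 60 Hz gaze data containing two fixations.
t = np.linspace(0.0, 1.0, 60)
labels = np.array([0] * 25 + [-1] * 10 + [1] * 25)
print(engagement_features(t, np.random.rand(60), np.random.rand(60), labels))
```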
Visual interaction and conversational activity BIBAFull-Text 11
  Andres Levitski; Jenni Radun; Kristiina Jokinen
In addition to the contents of their speech, people who are engaged in a conversation express themselves in many nonverbal ways. This means that people interact and are attended to even when they are not speaking. In this pilot study, we created an experimental setup for a three-party interactive situation where one of the participants remained silent throughout the session, and the gaze of one of the active subjects was tracked. The eye-tracked subject was unaware of the setup. The pilot study used only two test subjects, but the results provide some clues towards estimating how the behavior and activity of the non-speaking participant might affect other participants' conversational activity and the situation itself. We also found that the speaker's gaze activity is different in the beginning of the utterance than at the end of the utterance, indicating that the speaker's focus of attention towards the partner differs depending on the turn taking situation. Using the experience gained in this trial, we point out several things to consider that might help to avoid pitfalls when designing a more extensive study into the subject.
Move it there, or not?: the design of voice commands for gaze with speech BIBAFull-Text 12
  Monika Elepfandt; Martin Grund
This paper presents an experiment that was conducted to investigate gaze combined with voice commands. There has been very little research on the design of voice commands for this kind of input, and it is not yet known whether users prefer longer sentences, as in natural dialogues, or short commands. In the experiment, three different voice commands were compared in a simple task in which participants had to drag & drop, rotate, and resize objects. It turned out that the shortness of a voice command -- in terms of number of words -- is more important than it being absolutely natural. Participants preferred the voice command with the fewest words and the fewest syllables. Among the voice commands with the same number of syllables, users also preferred the one with the fewest words, even though there were no large differences in time or errors.
Eye gaze assisted human-computer interaction in a hand gesture controlled multi-display environment BIBAFull-Text 13
  Tong Cha; Sebastian Maier
A human-computer interaction (HCI) framework for processing user input in a multi-display environment is able to detect and interpret dynamic hand gesture input; in an environment equipped with large displays, it enables fully contactless application control. We extended this framework with a new input modality that brings human gaze into the interaction. The main contribution of this work is the possibility to unite arbitrary types of computer input and obtain a detailed view of the behaviour of every modality, with the information available as high-speed data samples received in real time. The framework is designed with special regard to gaze and hand gesture input in multi-display environments with large-area screens.
A framework of personal assistant for computer users by analyzing video stream BIBAFull-Text 14
  Zixuan Wang; Jinyun Yan; Hamid Aghajan
Time spent at the computer is increasing steadily with the rapid development of the Internet. During long periods in front of the computer, bad posture and habits create health risks, and unnoticed fatigue impairs work efficiency. We investigate how users behave in front of the computer using a camera, considering face pose, eye gaze, eye blinking, and yawn frequency. These visual cues are then used to suggest that users correct poor posture or take a break. We propose a novel framework for a personal assistant that supports users who spend long periods at the computer. The camera produces a video stream that records the user's behavior, and the assistant system automatically analyzes this visual input and gives suggestions at the right time. Our experiments show that the system detects visual cues with high accuracy and makes reasonable suggestions to users. This work initiates the area of assistant systems for individuals who use computers frequently.
Simple multi-party video conversation system focused on participant eye gaze: "Ptolemaeus" provides participants with smooth turn-taking BIBAFull-Text 15
  Saori Yamamoto; Nazomu Teraya; Yumika Nakamura; Narumi Watanabe; Yande Lin; Mayumi Bono; Yugo Takeuchi
This paper presents a prototype system that provides a natural multi-party conversation environment among participants in different places. Eye gaze is an important feature for maintaining smooth multi-party conversations because it indicates the addressee of an utterance or nominates the next speaker. Nevertheless, the most popular video conversation systems, such as Skype or FaceTime, do not support eye gaze interaction, which causes serious confusion in multi-party video conversations: who is the addressee of the speech, and who is the next speaker? We propose a simple multi-party video conversation environment called Ptolemaeus that realizes eye gaze interaction among more than three participants without any special equipment. This system provides natural turn-taking in face-to-face video conversations and can be implemented more easily than previous schemes for eye gaze interaction.
Sensing visual attention using an interactive bidirectional HMD BIBAFull-Text 16
  Tobias Schuchert; Sascha Voth; Judith Baumgarten
This paper presents a novel system for sensing attentional behavior in Augmented Reality (AR) environments by analyzing eye movement. The system is based on lightweight head-mounted optical see-through glasses containing bidirectional microdisplays, which allow image display and eye tracking on a single chip. The sensing and interaction application has been developed in the European project ARtSENSE in order to (1) detect museum visitors' attention to and interest in artworks as well as in presented AR content, (2) present appropriate personalized information as augmented overlays based on the detected attention, and (3) allow museum visitors gaze-based interaction with the system and the AR content. In this paper we present a novel algorithm for pupil estimation in low-resolution eye-tracking images and show first results on attention estimation by eye movement analysis and on interaction with the system by gaze.
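The paper presents its own pupil-estimation algorithm for low-resolution images; as an unrelated, generic baseline for the same task (thresholding the dark pupil region and taking its centroid with OpenCV), offered only for orientation and with an arbitrary threshold offset, a sketch could look like this:

```python
import cv2
import numpy as np

def estimate_pupil_center(eye_image_gray):
    """Rough pupil estimate: threshold the darkest region and take its centroid.

    A simple illustrative baseline only; the paper describes its own algorithm
    tailored to low-resolution bidirectional-display eye images.
    """
    blurred = cv2.GaussianBlur(eye_image_gray, (5, 5), 0)
    # The pupil is usually the darkest blob; threshold near the minimum intensity.
    thresh_val = int(blurred.min()) + 25
    _, mask = cv2.threshold(blurred, thresh_val, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    m = cv2.moments(largest)
    if m["m00"] == 0:
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])  # (x, y) in pixels

# Toy usage: a synthetic grey image with a dark disc as the "pupil".
img = np.full((60, 80), 180, dtype=np.uint8)
cv2.circle(img, (50, 30), 8, 20, -1)
print(estimate_pupil_center(img))   # roughly (50.0, 30.0)
```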
Semantic interpretation of eye movements using designed structures of displayed contents BIBAFull-Text 17
  Erina Ishikawa; Ryo Yonetani; Hiroaki Kawashima; Takatsugu Hirayama; Takashi Matsuyama
This paper presents a novel framework for interpreting eye movements using the semantic relations and spatial layouts of displayed contents, i.e., their designed structure. We represent eye movements in a multi-scale, interval-based manner and associate them with various semantic relations derived from the designed structure. In preliminary experiments, we apply the proposed framework to eye movements recorded while browsing catalog contents, and confirm the effectiveness of the framework via user-state estimation.
A communication support interface based on learning awareness for collaborative learning BIBAFull-Text 18
  Yuki Hayashi; Tomoko Kojiri; Toyohide Watanabe
The development of information communication technologies allows learners to study together with others through networks. To realize successful collaborative learning in such distributed environments, supporting communication is important because participants acquire knowledge through exchanging utterances. To address this issue, this paper proposes a communication support interface for network-based remote collaborative learning. In order to make the most of communication opportunities, it is desirable that participants be aware of the information in the collaborative learning environment and feel a sense of togetherness with others. Our interface therefore provides three types of awareness: awareness of participants, awareness of utterances, and awareness of contributions to the discussion. We believe our system facilitates communication among participants in a CSCL environment.