
Proceedings of the 2012 Symposium on Eye Tracking Research & Applications

Fullname:Proceedings of the 2012 Symposium on Eye Tracking Research & Applications
Editors:Carlos H. Morimoto; Howell Istance; Stephen N. Spencer; Jeffrey B. Mulligan; Pernilla Qvarfordt
Location:Santa Barbara, California
Dates:2012-Mar-28 to 2012-Mar-30
Standard No:ISBN: 1-4503-1221-7, 978-1-4503-1221-9; hcibib: ETRA12
Summary:On its seventh occasion, it is our pleasure to bring ETRA 2012 to Santa Barbara, CA. The series of ETRA symposiums has become the leading international conference in eye tracking technology and its applications, bringing together people from a wide range of backgrounds. Authors have been encouraged to submit papers on topics such as advances in eye tracking hardware and software, eye movement data analysis, visual attention, and eye movement control.
    ETRA 2012's special theme has been Mobile Eye Tracking. Mobile devices such as smartphones and tablet computers are becoming increasingly powerful. Embedding the capability to track eye movements and support gaze-based applications in these devices raises new challenges and opportunities for many aspects of eye tracking research. These proceedings contain several papers that address these problems.
    As in previous years, ETRA 2012 has had two kinds of submissions: Long Papers (8 pages), and Short Papers (4 pages). Authors were requested to send an abstract in advance, and submit their papers in blind-format. For the second consecutive occasion, ETRA has received more submissions than for any of the previous symposia, with 53 long and 101 short papers being submitted. These proceedings contain the 18 long and 22 short papers that were accepted for oral presentations, and the 43 short papers accepted as posters. These papers were selected after a rigorous and impartial double-blind review process, where each original submission was reviewed by at least 3 reviewers, followed by careful examination from one of our 7 Area Chairs. Each Area Chair wrote a meta-review for the papers within their area of expertise, and the final selection was made by the Program Chairs and Area Chairs, on the basis of the reviews and the meta-reviews.
  1. Gaze visualization
  2. Eye tracking systems
  3. Gaze informed user interfaces
  4. Visual attention: studies, tools, methods
  5. Gaze based interaction
  6. Eye tracking systems issues I
  7. Eye tracking applications I
  8. Eye tracking systems issues II
  9. Eye tracking applications II
  10. Systems, tools, methods
  11. Uses and applications

Gaze visualization

Aggregate gaze visualization with real-time heatmaps pp. 13-20
  Andrew T. Duchowski; Margaux M. Price; Miriah Meyer; Pilar Orero
A GPU implementation is given for real-time visualization of aggregate eye movements (gaze) via heatmaps. Parallelization of the algorithm leads to substantial speedup over its CPU-based implementation and, for the first time, allows real-time rendering of heatmaps atop video. GLSL shader colorization allows the choice of color ramps. Several luminance-based color maps are advocated as alternatives to the popular rainbow color map, considered inappropriate (harmful) for depiction of (relative) gaze distributions.
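Illustrative sketch (not from the paper): the abstract describes accumulating gaze into a heatmap and colorizing it with a luminance-based ramp. The CPU reference below shows the per-pixel computation that a GPU version would parallelize. It assumes an isotropic Gaussian kernel around each gaze point; the function names, the `sigma` parameter, and the grayscale ramp are hypothetical choices for illustration, not the authors' GLSL shaders.

```python
import math

def gaze_heatmap(points, width, height, sigma=2.0):
    """Sum a Gaussian around each gaze point, then normalize the grid to [0, 1]."""
    grid = [[0.0] * width for _ in range(height)]
    for gx, gy in points:
        for y in range(height):
            for x in range(width):
                # Each pixel is independent, which is what makes the
                # accumulation easy to parallelize.
                d2 = (x - gx) ** 2 + (y - gy) ** 2
                grid[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    peak = max(max(row) for row in grid)
    if peak > 0:
        grid = [[v / peak for v in row] for row in grid]
    return grid

def luminance_ramp(value):
    """Map a normalized intensity to a gray level 0..255 (assumed single-hue ramp)."""
    return int(round(255 * value))
```

A call such as `gaze_heatmap([(2, 2)], 5, 5)` yields a grid whose maximum sits at the gaze point; `luminance_ramp` then turns each cell into a brightness value, so relative gaze density is read from lightness rather than hue.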
A method to construct an importance map of an image using the saliency map model and eye movement analysis pp. 21-28
  Akira Egawa; Susumu Shirayama
Interpretability and recognizability of images have played important roles in applications such as the analysis of surveillance images, medical image diagnosis, and visual communication in education. In order to make an image as interpretable and recognizable as possible, unimportant visual information is removed or minimized, and regions that are of higher importance than others are clearly identified. Several methods have been developed to identify the important regions in an image. Most of these methods consist of two stages: segmentation of the image and ordering the segments hierarchically according to their relative importance. In the present paper, we propose a new method by which an importance map of a source image can be constructed. First, the source image is divided into segments based on a saliency map model that indicates high-saliency regions. Second, the segments are ordered according to the attention shift induced by the saliency map. Third, eye movement data is acquired and mapped into the segments. A network for the eye movements is generated by regarding the segments as nodes. The importance score can be calculated by the PageRank algorithm. Finally, an importance map of the image is constructed by combining the attention shift among the segments and the scores determined from eye movements. The usefulness of the proposed method is then investigated through several experiments.
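Illustrative sketch (not from the paper): the abstract treats image segments as nodes of an eye movement network and scores them with PageRank. The code below shows one plausible reading, assuming edges are weighted by the number of gaze transitions between consecutive, different segments. The segment labels, the damping factor of 0.85, and the function names are assumptions made for illustration.

```python
def transition_counts(fixation_segments):
    """Count gaze transitions between consecutive fixations on different segments."""
    counts = {}
    for a, b in zip(fixation_segments, fixation_segments[1:]):
        if a != b:
            counts.setdefault(a, {})
            counts[a][b] = counts[a].get(b, 0) + 1
    return counts

def pagerank(nodes, counts, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration; returns a dict of node scores."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            out = counts.get(u, {})
            total = sum(out.values())
            if total == 0:
                # Dangling segment: spread its score uniformly.
                for v in nodes:
                    new[v] += damping * rank[u] / n
            else:
                for v, w in out.items():
                    new[v] += damping * rank[u] * w / total
        rank = new
    return rank
```

Given a fixation sequence mapped to segment labels, `pagerank(nodes, transition_counts(sequence))` returns scores that sum to 1, with segments that gaze returns to most often ranked highest.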
Measuring and visualizing attention in space with 3D attention volumes pp. 29-36
  Thies Pfeiffer
Knowledge about the point of regard is a major key for the analysis of visual attention in areas such as psycholinguistics, psychology, neurobiology, computer science and human factors. Eye tracking is thus an established methodology in these areas, e.g., for investigating search processes, human communication behavior, product design or human-computer interaction. As eye tracking is a process which depends heavily on technology, the progress of gaze use in these scientific areas is tied closely to the advancements of eye-tracking technology. It is thus not surprising that in the last decades, research was primarily based on 2D stimuli and rather static scenarios, regarding both content and observer.
   Only with the advancements in mobile and robust eye-tracking systems is the observer freed to physically interact in a 3D target scenario. Measuring and analyzing the point of regard in 3D space, however, requires additional techniques for data acquisition and scientific visualization. We describe the process for measuring the 3D point of regard and provide our own implementation of this process, which extends recent approaches of combining eye tracking with motion capturing, including holistic estimations of the 3D point of regard. In addition, we present a refined version of 3D attention volumes for representing and visualizing attention in 3D space.
Automatic analysis of 3D gaze coordinates on scene objects using data from eye-tracking and motion-capture systems pp. 37-44
  Kai Essig; Daniel Dornbusch; Daniel Prinzhorn; Helge Ritter; Jonathan Maycock; Thomas Schack
We implemented a system, called the VICON-EyeTracking Visualizer, that combines mobile eye tracking data with motion capture data to calculate and visualize the 3D gaze vector within the motion capture co-ordinate system. To ensure that both devices were temporally synchronized, we used software that we had developed previously. Placing reflective markers on objects in the scene makes their positions known, and spatially synchronizing the eye tracker and the motion capture system allows us to automatically compute how many times and where fixations occur, thus overcoming the time-consuming and error-prone disadvantages of the traditional manual annotation process. We evaluated our approach by comparing its outcome for a simple looking task and a more complex grasping task against the average results produced by the manual annotation process. Preliminary data reveal that the program differed from the average manual annotation results by only approximately 3 percent in the looking task with regard to the number of fixations and cumulative fixation duration on each point in the scene. In the more complex grasping task the results depend on the object size: for larger objects there was good agreement (less than 16 percent, or 950 ms), but this degraded for smaller objects, where there are more saccades towards object boundaries. The advantages of our approach are easy user calibration, the ability to have unrestricted body movements (due to the mobile eye-tracking system), and that it can be used with any wearable eye tracker and marker-based motion tracking system. Extending existing approaches, our system is also able to monitor fixations on moving objects. The automatic analysis of gaze and movement data in complex 3D scenes can be applied to a variety of research domains, e.g., Human Computer Interaction, Virtual Reality or grasping and gesture research.

Eye tracking systems

Eye tracker data quality: what it is and how to measure it pp. 45-52
  Kenneth Holmqvist; Marcus Nyström; Fiona Mulvey
Data quality is essential to the validity of research results and to the quality of gaze interaction. We argue that the lack of standard measures for eye data quality makes several aspects of manufacturing and using eye trackers, as well as researching eye movements and vision, more difficult than necessary. Uncertainty regarding the comparability of research results is a considerable impediment to progress in the field. In this paper, we illustrate why data quality matters and review previous work on how eye data quality has been measured and reported. The goal is to achieve a common understanding of what data quality is and how it can be defined, measured, evaluated, and reported.
A probabilistic approach for the estimation of angle kappa in infants pp. 53-58
  Dmitri Model; Moshe Eizenman
This paper presents a probabilistic approach for the estimation of the angle between the optical and visual axes (angle kappa) in infants. The approach assumes that when patterned calibration targets are presented on a uniform background, subjects are more likely to look at the calibration targets than at the uniform background, but it does not require accurate and continuous fixation on presented targets. Simulation results show that when subjects attend to roughly half of the presented targets, angle kappa can be estimated accurately with low probability (< 1%) of false detection. In experiments with five babies who attended to the calibration target for only 47% of the time (range from 26% to 70%), the average difference between repeated measurements of angle kappa was 0.04 ± 0.31°.
Augmenting the robustness of cross-ratio gaze tracking methods to head movement pp. 59-66
  Flávio Luiz Coutinho; Carlos H. Morimoto
Remote gaze estimation with a single non-calibrated camera, simple or no user calibration, and robustness to head movements are very desirable features of eye tracking systems. Because cross-ratio (CR) is an invariant property of projective geometry, gaze estimation methods that rely on this property have the potential to provide these features, though most current implementations rely on a few simplifications that compromise the performance of the method. In this paper, the CR method for gaze tracking is revisited, and we introduce a new method that explicitly compensates for head movements using a simple 3-parameter eye model. The method uses a single non-calibrated camera and requires a simple calibration procedure per user to estimate the eye parameters. We have conducted simulations and experiments with real users that show significant improvements over current state-of-the-art CR methods that do not explicitly compensate for head motion.

Gaze informed user interfaces

Impact of subtle gaze direction on short-term spatial information recall pp. 67-74
  Reynold Bailey; Ann McNamara; Aaron Costello; Srinivas Sridharan; Cindy Grimm
Contents of Visual Short-Term Memory depend highly on viewer attention. It is possible to influence where attention is allocated using a technique called Subtle Gaze Direction (SGD). SGD combines eye tracking with subtle image-space modulations to guide viewer gaze about a scene. Modulations are terminated before the viewer can scrutinize them with high acuity foveal vision. This approach is preferred to overt techniques that require permanent alterations to images to highlight areas of interest. In our study, participants were asked to recall the location of objects or regions in images. We investigated if using SGD to guide attention to these regions would improve recall. Results showed that the influence of SGD significantly improved accuracy of target count and spatial location recall. This has implications for a wide range of applications including spatial learning in virtual environments as well as image search applications, virtual training and perceptually based rendering.
Subtle gaze manipulation for improved mammography training pp. 75-82
  Srinivas Sridharan; Ann McNamara; Cindy Grimm
We use the Subtle Gaze Direction technique (SGD) to guide novices as they try to find abnormalities in mammograms. SGD works by performing image-space modulations on specific regions of the peripheral vision to attract attention. Gaze is monitored and modulations are terminated before they are scrutinized with high-acuity foveal vision. This approach is preferred to overt techniques which permanently alter images to highlight areas of interest. SGD is used to guide novices along the scanpath of an expert radiologist. We hypothesized that this would increase the likelihood of novices correctly identifying irregularities. Results reveal that novices who were guided in this manner performed significantly better than the control group (no gaze manipulation). Furthermore, a short-term post-training lingering effect was observed among subjects guided using SGD. They continued to perform better than the control group once the training was complete and gaze manipulation was disabled.
What do you want to do next: a novel approach for intent prediction in gaze-based interaction pp. 83-90
  Roman Bednarik; Hana Vrzakova; Michal Hradis
Interaction intent prediction and the Midas touch have been longstanding challenges for eye-tracking researchers and users of gaze-based interaction. Inspired by machine learning approaches in biometric person authentication, we developed and tested an offline framework for task-independent prediction of interaction intents. We describe the principles of the method, the features extracted, normalization methods, and evaluation metrics. We systematically evaluated the proposed approach on an example dataset of gaze-augmented problem-solving sessions. We present results of three normalization methods, different feature sets and fusion of multiple feature types. Our results show that accuracy of up to 76% can be achieved with Area Under Curve around 80%. We discuss the possibility of applying the results for an online system capable of interaction intent prediction.
Gaze guided object recognition using a head-mounted eye tracker pp. 91-98
  Takumi Toyama; Thomas Kieninger; Faisal Shafait; Andreas Dengel
Wearable eye trackers open up a large number of opportunities to cater for the information needs of users in today's dynamic society. Users no longer have to sit in front of a traditional desk-mounted eye tracker to benefit from the direct feedback given by the eye tracker about users' interest. Instead, eye tracking can be used as a ubiquitous interface in a real-world environment to provide users with supporting information that they need. This paper presents a novel application of intelligent interaction with the environment by combining eye tracking technology with real-time object recognition. In this context we present i) algorithms for guiding object recognition by using fixation points, ii) algorithms for generating evidence of users' gaze on particular objects, and iii) a next-generation museum guide called Museum Guide 2.0 as a prototype application of gaze-based information provision in a real-world environment. We performed several experiments to evaluate our gaze-based object recognition methods. Furthermore, we conducted a user study in the context of Museum Guide 2.0 to evaluate the usability of the new gaze-based interface for information provision. These results show that an enormous amount of potential exists for using a wearable eye tracker as a human-environment interface.

Visual attention: studies, tools, methods

Audio description as an aural guide of children's visual attention: evidence from an eye-tracking study pp. 99-106
  Izabela Krejtz; Agnieszka Szarkowska; Krzysztof Krejtz; Agnieszka Walczak; Andrew Duchowski
Audio description (AD) has become a cultural revolution for the visually impaired; however, the range of AD beneficiaries can be much broader. We claim that AD is useful for guiding children's attention. The paper presents an eye-tracking study testing the usefulness of AD in selective attention to described elements of a video scene. Forty-four children watched 2 clips from an educational animation series while their eye movements were recorded. Average fixation duration, fixation count, and saccade amplitude served as primary dependent variables. The results confirmed that AD guides children's attention towards described objects, resulting, e.g., in more fixations on specific regions of interest. We also evaluated eye movement patterns in terms of switching between focal and ambient processing. We postulate that audio description could complement regular teaching tools for guiding and focusing children's attention, especially when new concepts are introduced.
Let's look at the cockpit: exploring mobile eye-tracking for observational research on the flight deck pp. 107-114
  Nadir Weibel; Adam Fouse; Colleen Emmenegger; Sara Kimmich; Edwin Hutchins
As part of our research on multimodal analysis and visualization of activity dynamics, we are exploring the integration of data produced by a variety of sensor technologies within ChronoViz, a tool aimed at supporting the simultaneous visualization of multiple streams of time series data. This paper reports on the integration of a mobile eye-tracking system with data streams collected from HD video cameras, microphones, digital pens, and simulation environments. We focus on the challenging environment of the commercial airline flight deck, analyzing the use of mobile eye tracking systems in aviation human factors and reporting on techniques and methods that can be applied in this and other domains in order to successfully collect, analyze and visualize eye-tracking data in combination with the array of data types supported by ChronoViz.
Multi-mode saliency dynamics model for analyzing gaze and attention pp. 115-122
  Ryo Yonetani; Hiroaki Kawashima; Takashi Matsuyama
We present a method to analyze the relationship between eye movements and saliency dynamics in videos for estimating attentive states of users while they watch the videos. The multi-mode saliency-dynamics model (MMSDM) is introduced to segment spatio-temporal patterns of the saliency dynamics into multiple sequences of primitive modes underlying the saliency patterns. The MMSDM enables us to describe the relationship by the local saliency dynamics around gaze points, which is modeled by a set of distances between gaze points and salient regions characterized by the extracted modes. Experimental results show the effectiveness of the proposed model in classifying the attentive states of users by learning the statistical difference of the local saliency dynamics on gaze-paths at each level of attentiveness.
A robust realtime reading-skimming classifier pp. 123-130
  Ralf Biedert; Jörn Hees; Andreas Dengel; Georg Buscher
Distinguishing whether eye tracking data reflects reading or skimming has already proved to be of high analytical value. But with a potentially more widespread usage of eye tracking systems at home, in the office or on the road, the amount of environmental and experimental control tends to decrease. This in turn leads to an increase in eye tracking noise and inaccuracies which are difficult to address with current reading detection algorithms. In this paper we propose a method for constructing and training a classifier that is able to robustly distinguish reading from skimming patterns. It operates in real time, considering a window of saccades and computing features such as the average forward speed and angularity. The algorithm inherently deals with distorted eye tracking data and provides a robust, linear classification into the two classes read and skimmed. It facilitates reaction times of 750 ms on average, is adjustable in its horizontal sensitivity and provides confidence values for its classification results; it is also straightforward to implement. Trained on a set of six users and evaluated on an independent test set of six different users, it achieved an 86% classification accuracy and outperformed two other methods.

Gaze based interaction

Designing gaze-based user interfaces for steering in virtual environments pp. 131-138
  Sophie Stellmach; Raimund Dachselt
Since eye gaze may serve as an efficient and natural input for steering in virtual 3D scenes, we investigate the design of eye gaze steering user interfaces (UIs) in this paper. We discuss design considerations and propose design alternatives based on two selected steering approaches differing in input condition (discrete vs. continuous) and velocity selection (constant vs. gradient-based). The proposed UIs have been iteratively advanced based on two user studies with twelve participants each. In particular, the combination of continuous and gradient-based input shows a high potential, because it allows for gradually changing the moving speed and direction depending on a user's point-of-regard. This has the advantage of reducing overshooting problems and dwell-time activations. We also investigate discrete constant input for which virtual buttons are toggled using gaze dwelling. As an alternative, we propose the Sticky Gaze Pointer as a more flexible way of discrete input.
Eye-based head gestures pp. 139-146
  Diako Mardanbegi; Dan Witzner Hansen; Thomas Pederson
A novel method for video-based head gesture recognition using eye information from an eye tracker is proposed. The method uses a combination of gaze and eye movement to infer head gestures. Compared to other gesture-based methods, a major advantage of the method is that the user keeps the gaze on the interaction object while interacting. This method has been implemented on a head-mounted eye tracker for detecting a set of predefined head gestures. The accuracy of the gesture classifier is evaluated and verified for gaze-based interaction in applications intended for both large public displays and small mobile phone screens. The user study shows that the method detects a set of defined gestures reliably.
Simple gaze gestures and the closure of the eyes as an interaction technique pp. 147-154
  Henna Heikkilä; Kari-Jouko Räihä
We created a set of gaze gestures that utilize the following three elements: simple one-segment gestures, off-screen space, and the closure of the eyes. These gestures are to be used as the moving tool in a gaze-only controlled drawing application. We tested our gaze gestures with 24 participants and analyzed the gesture durations, the accuracy of the stops, and the gesture performance. We found that the difference in gesture durations between short and long gestures was so small that there is no need to choose between them. The stops made by closing both eyes were accurate, and the input method worked well for this purpose. With some adjustments and with the possibility for personal settings, the gesture performance and the accuracy of the stops can become even better.

Eye tracking systems issues I

Self-localization using fixations as landmarks pp. 155-160
  Lisa M. Tiberio; Roxanne L. Canosa
Self-localization is the process of knowing your position and location relative to your surroundings. This research integrated artificial intelligence techniques into a custom-built portable eye tracker for the purpose of automating the process of determining indoor self-localization. Participants wore the eye tracker and walked a series of corridors while a video of the scene was recorded along with fixation locations. Patches of the scene video without fixation information were used to train the classifier by creating feature maps of the corridors. For testing the classifier, fixation locations in the scene were extracted and used to determine the location of the participant. Scene patches surrounding fixations were used for the classification instead of objects in the environment. This eliminated the need for complex computer vision object recognition algorithms and made scene classification less dependent upon objects and their placement in the environment. This allowed for a sparse representation of the scene since image processing to detect and recognize objects was not necessary to determine location. Experimentally, image patches surrounding fixations were found to be a highly reliable indicator of location, as compared to random image patches, non-fixated salient image patches, or other non-salient scene locations. In some cases, only a single fixation was needed to accurately identify the correct location of the participant. To the best of our knowledge, this technique has not been used before for determining human self-localization in either indoor or outdoor settings.
Measuring cognitive workload across different eye tracking hardware platforms pp. 161-164
  Michael Bartels; Sandra P. Marshall
As pertinent technologies continue to evolve, eye tracking hardware options grow more diverse. Consequently, it is important that researchers verify that new systems and parameters used in testing meet data collection quality standards. The current study evaluated hardware from four manufacturers: SR Research, Seeing Machines, SensoMotoric Instruments and Tobii Technology. The eye trackers included different system types and different sampling rates. The purpose of this research was to determine whether or not the pupil recording of each system was precise enough to effectively utilize the Index of Cognitive Activity, a validated cognitive workload metric. Results indicated that each system effectively captured Index of Cognitive Activity data. System factors such as system type and sampling rate did not affect the metric. To maintain the integrity of data collected by succeeding generations of eye trackers, it is important that this type of quality-control research continues.
Parallel scan-path visualization pp. 165-168
  Michael Raschke; Xuemei Chen; Thomas Ertl
Eye tracking analysis is the state-of-the-art technique to study questions of usability and cognition of graphical user interfaces. This paper presents a new technique for the visualization of eye tracking data, the Parallel Scan-Path Visualization. A key feature is the visualization of eye movements of many subjects on a single screen in a parallel layout. The visualization presents various properties of scan-paths, such as fixations, gaze durations and eye shift frequencies, at a glance. The paper concludes with an example of use of the Parallel Scan-Path Visualization technique.
Permutation test for groups of scanpaths using normalized Levenshtein distances and application in NMR questions pp. 169-172
  Hui Tang; Joseph J. Topczewski; Anna M. Topczewski; Norbert J. Pienta
This paper presents a permutation test that statistically compares two groups of scanpaths. The test uses normalized Levenshtein distances when the lengths of scanpaths are not the same. This method was applied in a recent eye-tracking experiment in which two groups of chemistry students viewed nuclear magnetic resonance (NMR) spectroscopic signals and chose the corresponding molecular structure from the candidates. A significant difference was detected between the two groups, which is consistent with the fact that students in the expert group showed more efficient scan patterns in the experiment than the novice group. Various numbers of permutations were tested and the results showed that p-values only varied in a small range with different permutation numbers and that the statistical significance was not affected.
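Illustrative sketch (not from the paper): a permutation test over two groups of scanpaths using Levenshtein distances normalized by the longer scanpath's length. The abstract does not state the test statistic, so the one used here (mean between-group distance minus mean within-group distance) is an assumption, as are the encoding of scanpaths as strings of area-of-interest letters and all function names.

```python
import random
from itertools import combinations, product

def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def normalized_distance(a, b):
    """Divide by the longer length so scanpaths of unequal length are comparable."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0

def statistic(g1, g2):
    """Assumed statistic: mean between-group minus mean within-group distance.
    Needs at least one within-group pair, i.e. some group with two or more scanpaths."""
    between = [normalized_distance(a, b) for a, b in product(g1, g2)]
    within = [normalized_distance(a, b)
              for g in (g1, g2) for a, b in combinations(g, 2)]
    return sum(between) / len(between) - sum(within) / len(within)

def permutation_test(g1, g2, n_perm=1000, seed=0):
    """Return (observed statistic, one-sided p-value) under random relabeling."""
    rng = random.Random(seed)
    observed = statistic(g1, g2)
    pooled = list(g1) + list(g2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistic(pooled[:len(g1)], pooled[len(g1):]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

Rerunning `permutation_test` with different values of `n_perm` is a simple way to check how sensitive the p-value is to the number of permutations, which is the kind of stability check the abstract reports.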
Robust real-time pupil tracking in highly off-axis images pp. 173-176
  Lech Swirski; Andreas Bulling; Neil Dodgson
Robust, accurate, real-time pupil tracking is a key component for online gaze estimation. On head-mounted eye trackers, existing algorithms that rely on circular pupils or contiguous pupil regions fail to detect or accurately track the pupil. This is because the pupil ellipse is often highly eccentric and partially occluded by eyelashes. We present a novel, real-time dark-pupil tracking algorithm that is robust under such conditions. Our approach uses a Haar-like feature detector to roughly estimate the pupil location, performs a k-means segmentation on the surrounding region to refine the pupil centre, and fits an ellipse to the pupil using a novel image-aware Random Sample Consensus (RANSAC) ellipse fitting. We compare our approach against existing real-time pupil tracking implementations, using a set of manually labelled infra-red dark-pupil eye images. We show that our technique has a higher pupil detection rate and greater pupil tracking accuracy.
Detection of smooth pursuits using eye movement shape features pp. 177-180
  Mélodie Vidal; Andreas Bulling; Hans Gellersen
Smooth pursuit eye movements hold information about the health, activity and situation of people, but to date there has been no efficient method for their automated detection. In this work we present a method to tackle the problem, based on machine learning. At the core of our method is a novel set of shape features that capture the characteristic shape of smooth pursuit movements over time. The features individually represent incomplete information about smooth pursuits but are combined in a machine learning approach. In an evaluation with eye movements collected from 18 participants, we show that our method can detect smooth pursuit movements with an accuracy of up to 92%, depending on the size of the feature set used for their prediction. Our results have twofold significance. First, they demonstrate a method for smooth pursuit detection in mainstream eye tracking, and secondly they highlight the utility of machine learning for eye movement analysis.

Eye tracking applications I

Parsing visual stimuli into temporal units through eye movements pp. 181-184
  Carlo Robino; Sofia Crespi; Ottavia Silva; Claudio de'Sperati
Automatic segmentation of a video stream poses a serious challenge to multimedia research. Here we explore the idea that temporal segmentation might include the observers' watching style. We propose a way to parse a visual stimulus into temporally-defined units by exploiting the difference of exploratory eye movements between novice and expert observers. The difference was condensed into a single quantity, the quasi-instantaneous spatial extension of the regions fixated significantly longer by either group of observers, which we termed Visual Differential Attractor (VDA). As a test-bed, we presented a videotaped billiard match to novice and professional players, and recorded their eye movements. We assessed whether VDA, in tracing over time the oculomotor difference between experts and novices, would mark the individual shots embedded in the movie. Indeed, VDA showed systematic modulations over time, with peaks and troughs occurring before and after the shots, respectively. The effect disappeared when the scanpaths of novices and experts were analyzed separately. This finding suggests that it is possible to parse a visual stimulus into behaviorally relevant temporal units by comparing the gaze of expert and naïf observers.
Methodological triangulation to assess sign placement BIBAFull-Text 185-188
  Simon J. Buechner; Jan Wiener; Christoph Hölscher
This paper presents a study that investigated the potential effect of an additional sign on people's simulated wayfinding behavior in a transfer situation at an airport. Participants were presented with photographs of the status quo and digitally edited images of the potential redesign. Path choice behavior, gaze behavior and confidence ratings were analyzed. The combination of the three methods proved to capture the situation better than any of the methods alone. The results provide evidence that the re-design has a positive effect on passengers' wayfinding behavior.
Goal-driven and bottom-up gaze in an active real-world search task BIBAFull-Text 189-192
  Tom Foulsham; Alan Kingstone
Mobile eye tracking has become a useful tool in studies of vision and attention in real-world tasks. However, there remains a disconnection between such studies and the laboratory paradigms used by cognitive psychology. In particular, visual search has been studied intensively, but lab search often differs from search in the real world in many respects (e.g., in reality one must walk and move head and eyes to find the target, target and distractors are not equally visible, and objects are frequently occluded). Here, we took a broader view of search behaviour and analyzed the gaze of participants who were asked to walk around within a building, find a room, and then locate a target mailbox. Our aim was to describe the differences in behaviour according to principles of (lab-based) visual search, and we did this by testing the effects of top-down instructions (i.e. having more or less information about where to go) and target saliency (i.e. having a more or less distinctive target to look for). These factors made a difference in a real world context by changing the frequency with which signs and cues in the environment were fixated, and by affecting head and eye movements in the mail-room. Bottom-up saliency had little effect on search time, but our approach revealed how it influenced the coordination of gaze, while still allowing us to make contact with laboratory paradigms.
Using ScanMatch scores to understand differences in eye movements between correct and incorrect solvers on physics problems BIBAFull-Text 193-196
  Adrian Madsen; Adam Larson; Lester Loschky; N. Sanjay Rebello
Using a ScanMatch algorithm we investigate scan path differences between subjects who answer physics problems correctly and incorrectly. This algorithm bins a saccade sequence spatially and temporally, recodes this information to create a sequence of letters representing fixation location, duration and order, and compares two sequences to generate a similarity score. We recorded eye movements of 24 individuals on six physics problems containing diagrams with areas consistent with a novice-like response and areas of high perceptual salience. We calculated average ScanMatch similarity scores comparing correct solvers to one another (C-C), incorrect solvers to one another (I-I), and correct solvers to incorrect solvers (C-I). We found statistically significant differences between the C-C and I-I comparisons on only one of the problems. This seems to imply that top-down processes relying on incorrect domain knowledge, rather than bottom-up processes driven by perceptual salience, determine the eye movements of incorrect solvers.
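The binning-and-comparison idea described in this abstract can be illustrated with a simplified sketch. It is not the ScanMatch implementation: the grid size, the 100 ms duration bin, the flat match/mismatch/gap scores, and the function names `encode` and `similarity` are all assumptions chosen for illustration. Fixations are mapped to grid letters, each letter is repeated in proportion to fixation duration, and two letter strings are compared with a global (Needleman-Wunsch style) alignment.

```python
from typing import List, Tuple

Fixation = Tuple[float, float, float]  # x (px), y (px), duration (ms)


def encode(fixations: List[Fixation], width: float, height: float,
           cols: int = 4, rows: int = 3, bin_ms: float = 100.0) -> str:
    """Recode fixations as letters: one letter per grid cell,
    repeated once per duration bin (hypothetical grid and bin size)."""
    letters = []
    for x, y, dur in fixations:
        col = min(int(x / width * cols), cols - 1)
        row = min(int(y / height * rows), rows - 1)
        letter = chr(ord("A") + row * cols + col)
        repeats = max(1, round(dur / bin_ms))
        letters.append(letter * repeats)
    return "".join(letters)


def similarity(a: str, b: str, match: float = 1.0,
               mismatch: float = -1.0, gap: float = -1.0) -> float:
    """Global alignment score, normalized by the best score the
    longer sequence could reach (1.0 means identical sequences)."""
    if not a or not b:
        return 0.0
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align two letters
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1] / (match * max(len(a), len(b)))
```

A flat match/mismatch score is the simplest possible choice; a substitution score that depends on the distance between grid cells would penalize near misses less than distant ones.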
Visual attention patterns during program debugging with an IDE BIBAFull-Text 197-200
  Prateek Hejmady; N. Hari Narayanan
Integrated Development Environments (IDEs) generate multiple graphical and textual representations of programs. Co-ordination of these representations during program comprehension and debugging can be a complex task. In order to better understand the role and effectiveness of multiple representations, we conducted an empirical study of Java program debugging with a professional, multi-representation IDE. We found that program code and dynamic representations (dynamic viewer, variable watch and output) attracted the most attention of programmers. Static representations like Unified Modeling Language (UML) Diagrams and Control Structure Diagrams (CSD) saw significantly less usage. We analyzed gaze patterns by segmenting the debugging sessions into three, five and fifteen minute intervals, and classifying gazes into short and long gazes. Novel data mining techniques were used to detect high frequency patterns from eye tracking data. Visual pattern differences were found among participants based on their programming experience, familiarity with the IDE and debugging performance.
Towards robust gaze-based objective quality measures for text BIBAFull-Text 201-204
  Ralf Biedert; Andreas Dengel; Mostafa Elshamy; Georg Buscher
An increasing amount of text is being read digitally. In this paper we explore how eye tracking devices can be used to aggregate reading data of many readers in order to provide authors and editors with objective and implicitly gathered quality feedback. We present a robust way to jointly evaluate the gaze data of multiple readers, with respect to various reading-related features. We conducted an experiment in which a group of high school students composed essays that were subsequently read and rated by a group of seven other students. Analyzing the recorded data, we find that the number of regression targets, the reading-to-skimming ratio, reading speed and reading count are the most discriminative features to distinguish very comprehensible from barely comprehensible text passages. By employing machine learning techniques, we are able to classify the comprehensibility of text automatically with an overall accuracy of 62%.

Eye tracking systems issues II

Error characterization and compensation in eye tracking systems BIBAFull-Text 205-208
  Juan J. Cerrolaza; Arantxa Villanueva; Maria Villanueva; Rafael Cabeza
The development of systems that track the eye while allowing head movement is one of the most challenging objectives of gaze tracking researchers. Tracker accuracy decreases as the subject moves from the calibration position and is especially influenced by changes in depth with respect to the screen. In this paper, we demonstrate that the pattern of error produced due to user movement mainly depends on the system configuration and hardware element placement rather than the user. Thus, we suggest alternative calibration techniques for error reduction that compensate for the lack of accuracy due to subject movement. Using these techniques, we can achieve an error reduction of more than 50%.
Shifts in reported gaze position due to changes in pupil size: ground truth and compensation BIBAFull-Text 209-212
  Jan Drewes; Guillaume S. Masson; Anna Montagnini
Camera-based eye trackers are the mainstay of today's eye movement research and countless practical applications of eye tracking. Recently, a significant impact of changes in pupil size on the accuracy of camera-based eye trackers during fixation has been reported [Wyatt 2010]. We compared the pupil-size effect between a scleral search coil based eye tracker (DNI) and an up-to-date infrared camera-based eye tracker (SR Research Eyelink 1000) by simultaneously recording human eye movements with both techniques. Between pupil-constricted and pupil-relaxed conditions we find a subject-specific shift in reported gaze position exceeding 2 degrees only with the camera based eye tracker, while the scleral search coil system simultaneously reported steady fixation. This confirms that the actual point of fixation did not change during pupil constriction/relaxation, and the resulting shift in measured gaze position is solely an artifact of the camera-based eye tracking system. We demonstrate a method to partially compensate the pupil-based shift using separate calibrations in pupil-constricted and pupil-dilated conditions, with pupil size as an index to dynamically weight the two calibrations.
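The compensation idea in this abstract, two calibrations weighted by pupil size, might look roughly like the sketch below. The linear weighting, the clamping, and every name here (`blend_weight`, `compensated_gaze`, the calibration callables) are assumptions for illustration; the abstract does not specify the weighting function.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]
Calibration = Callable[[Point], Point]  # maps raw tracker output to gaze


def blend_weight(pupil: float, pupil_small: float, pupil_large: float) -> float:
    """Weight of the dilated-pupil calibration, clamped to [0, 1].
    Assumes a linear dependence on pupil size (not stated in the source)."""
    if pupil_large == pupil_small:
        raise ValueError("reference pupil sizes must differ")
    w = (pupil - pupil_small) / (pupil_large - pupil_small)
    return min(1.0, max(0.0, w))


def compensated_gaze(raw: Point, pupil: float,
                     calib_small: Calibration, calib_large: Calibration,
                     pupil_small: float, pupil_large: float) -> Point:
    """Blend the gaze estimates of the constricted-pupil and
    dilated-pupil calibrations according to the current pupil size."""
    w = blend_weight(pupil, pupil_small, pupil_large)
    xs, ys = calib_small(raw)
    xl, yl = calib_large(raw)
    return ((1.0 - w) * xs + w * xl, (1.0 - w) * ys + w * yl)
```

With a pupil size halfway between the two reference sizes, the two calibrations contribute equally; outside the reference range the nearer calibration is used alone.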
Automatic acquisition of a 3D eye model for a wearable first-person vision device BIBAFull-Text 213-216
  Akihiro Tsukada; Takeo Kanade
A wearable gaze tracking device can work with users in daily life. For long-term use, a non-active method that does not employ an infrared illumination system is desirable from a safety standpoint. It is well known that eye model constraints substantially improve the accuracy and robustness of gaze estimation. However, the eye model needs to be calibrated for each person and each device. We propose a method to automatically build the eye model for a wearable gaze tracking device. The key idea is that the eye model, which includes the eye structure and eye-camera relationship, imposes constraints on image analysis even when it is incomplete, so we adopt an iterative eye model building process with gradually increasing eye model constraints. Performance of the proposed method is evaluated in various situations, including different eye colors of users and camera configurations. We have confirmed that the gaze tracking system using our eye model works well in general situations: indoor, outdoor and driving scenes.
Evaluation of pupil center-eye corner vector for gaze estimation using a web cam BIBAFull-Text 217-220
  Laura Sesma; Arantxa Villanueva; Rafael Cabeza
Low-cost eye tracking is a current and challenging research topic for the eye tracking community. Gaze tracking based on a web cam and without infrared light is a sought-after goal that would broaden the applications of eye tracking systems. Web cam based eye tracking poses new challenges, such as a wider field of view and a lower image quality. In addition, without infrared light, glints can no longer be used as a tracking feature. In this paper, a thorough study has been carried out to evaluate the pupil (iris) center-eye corner (PC-EC) vector as a feature for gaze estimation based on interpolation methods in low-cost eye tracking, as it is considered to be partially equivalent to the pupil center-corneal reflection (PC-CR) vector. The analysis is based on both simulated and real data. The experiments show that eye corner positions in the image move slightly when the user is looking at different points of the screen, even with a static head position. This lowers the achievable accuracy of the gaze estimation, significantly reducing the accuracy of the system under standard working conditions to 2-3 degrees.
Ego-motion compensation improves fixation detection in wearable eye tracking BIBAFull-Text 221-224
  Thomas Kinsman; Karen Evans; Glenn Sweeney; Tommy Keane; Jeff Pelz
The objective is an efficient means to improve the accuracy of detected fixations. The context is studies of natural behavior of subjects wearing eye trackers while observing distant objects. Fixation detection algorithms try to determine when the image on the retina is stable. Previous algorithms for wearable eye trackers consider only eye-in-head motion. In the presence of the vestibular-ocular response (VOR), however, the motion of the head counteracts eye-in-head rotation. Compensating for this ego-motion increases the number of detected fixations for all subjects. This compensation significantly affects the number and size of the fixations detected, more accurately reflecting mobile observers' natural gaze behavior.

Eye tracking applications II

Gaze input for mobile devices by dwell and gestures BIBAFull-Text 225-228
  Morten Lund Dybdal; Javier San Agustin; John Paulin Hansen
This paper investigates whether it is feasible to interact with the small screen of a smartphone using eye movements only. Two of the most common gaze-based selection strategies, dwell time selections and gaze gestures, are compared in a target selection experiment. Finger-strokes and accelerometer-based interaction, i.e. tilting, are also considered. In an experiment with 11 subjects we found gaze interaction to have lower performance than touch interaction, but an error rate and completion time comparable to those of accelerometer (i.e. tilt) interaction. Gaze gestures had a lower error rate and were faster than dwell selections by gaze, especially for small targets, suggesting that this method may be the best option for hands-free gaze control of smartphones.
Gaze gestures or dwell-based interaction? BIBAFull-Text 229-232
  Aulikki Hyrskykari; Howell Istance; Stephen Vickers
The two cardinal problems recognized with gaze-based interaction techniques are: how to avoid unintentional commands, and how to overcome the limited accuracy of eye tracking. Gaze gestures are a relatively new technique for giving commands, which has the potential to overcome these problems. We present a study that compares gaze gestures with dwell selection as an interaction technique. The study involved 12 participants and was performed in the context of using an actual application. The participants gave commands to a 3D immersive game using gaze gestures and dwell icons. We found that gaze gestures are not only a feasible means of issuing commands in the course of game play, but they also exhibited performance that was at least as good as or better than dwell selections. The gesture condition produced less than half of the errors when compared with the dwell condition. The study shows that gestures provide a robust alternative to dwell-based interaction with the reliance on positional accuracy being substantially reduced.
The validity of using non-representative users in gaze communication research BIBAFull-Text 233-236
  Howell Istance; Stephen Vickers; Aulikki Hyrskykari
Gaze-based interaction techniques have been investigated for the last two decades, and in many cases the evaluation of these has been based on trials with able-bodied users and conventional usability criteria, mainly speed and accuracy. The target user group of many of the gaze-based techniques investigated is, however, people with different types of physical disabilities. We present the outcomes of two studies that compare the performance of two groups of participants with a type of physical disability (one being cerebral palsy and the other muscular dystrophy) with that of a control group of able-bodied participants doing a task using a particular gaze interaction technique. One study used a task based on dwell-time selection, and the other used a task based on gaze gestures. In both studies, the groups of participants with physical disabilities performed significantly worse than the able-bodied control participants. We question the ecological validity of research into gaze interaction intended for people with physical disabilities that only uses able-bodied participants in evaluation studies without any testing using members of the target user population.
Eye typing of Chinese characters BIBAFull-Text 237-240
  Zhen Liang; Qiang Fu; Zheru Chi
Eye typing is one of the most intensively investigated topics in eye tracking technology. Currently, almost all eye typing systems are developed for English typing. Some preliminary studies have been made on developing eye typing systems for inputting Chinese characters/text. In this paper, a novel eye typing system is proposed for inputting Chinese characters, where a software keyboard is specially designed based on a study of Chinese Pinyin. Experimental results show the efficiency and usability of the proposed system.
The potential of dwell-free eye-typing for fast assistive gaze communication BIBAFull-Text 241-244
  Per Ola Kristensson; Keith Vertanen
We propose a new research direction for eye-typing which is potentially much faster: dwell-free eye-typing. Dwell-free eye-typing is in principle possible because we can exploit the high redundancy of natural languages to allow users to simply look at or near their desired letters without stopping to dwell on each letter. As a first step we created a system that simulated a perfect recognizer for dwell-free eye-typing. We used this system to investigate how fast users can potentially write using a dwell-free eye-typing interface. We found that after 40 minutes of practice, users reached a mean entry rate of 46 wpm. This indicates that dwell-free eye-typing may be more than twice as fast as the current state-of-the-art methods for writing by gaze. A human performance model further demonstrates that it is highly unlikely traditional eye-typing systems will ever surpass our dwell-free eye-typing performance estimate.

Systems, tools, methods

Analysing the potential of adapting head-mounted eye tracker calibration to a new user BIBAFull-Text 245-248
  Benedict Fehringer; Andreas Bulling; Antonio Krüger
A key issue with state-of-the-art mobile eye trackers, particularly during long-term recordings in daily life, is the need for cumbersome and time consuming (re)calibration. To reduce this burden, in this paper we investigate the feasibility of adapting the calibration obtained for one user to another. Calibration adaptation is automatically performed using a light-weight linear translation. We compare three different methods to compute the translation: "multi-point", where all calibration-points are used, "1-point", and "0-point" that uses only an external parameter. We evaluate these methods in a 6-participant user study in a controlled laboratory setting by measuring the error in visual angle between the predicted gaze point and the true gaze point. Our results show that, averaged across all participants, the best adapted calibration is only 0.8° (mean) off the calibration obtained for that specific user. We also show the potential of the 1-point and 0-point methods compared to the time-consuming multi-point computation.
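A rough sketch of the translation-based adaptation follows. It reads "linear translation" as a constant 2-D offset added to the gaze predicted by another user's calibration, which is an interpretation rather than something the abstract states; the function names are hypothetical. Passing all calibration points gives a "multi-point" style estimate, and passing a single pair gives a "1-point" style estimate.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def translation_offset(predicted: List[Point], targets: List[Point]) -> Point:
    """Mean displacement from gaze predicted with the other user's
    calibration to the true target positions for the new user."""
    if not predicted or len(predicted) != len(targets):
        raise ValueError("need equally many predicted and target points")
    n = len(predicted)
    dx = sum(t[0] - p[0] for p, t in zip(predicted, targets)) / n
    dy = sum(t[1] - p[1] for p, t in zip(predicted, targets)) / n
    return dx, dy


def adapt(point: Point, offset: Point) -> Point:
    """Apply the translation to a newly predicted gaze point."""
    return point[0] + offset[0], point[1] + offset[1]
```

For example, `translation_offset(predicted[:1], targets[:1])` uses only the first calibration point, trading accuracy for a much shorter calibration procedure.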
Long range eye tracking: bringing eye tracking into the living room BIBAFull-Text 249-252
  Craig Hennessey; Jacob Fiset
The demand for improved human computer interaction will lead to increasing adoption of eye tracking in everyday devices. For interaction with devices such as Smart TVs, the eye tracker must operate in more challenging environments such as the home living room. In this paper we present a non-contact eye tracking system that allows for freedom of viewer motion in a living room environment. A pan and tilt mechanism is used to orient the eye tracker, guided by face tracking information from a wide-angle camera. The estimated point of gaze is corrected for viewer movement in real time, avoiding the need for recalibration. The proposed technique achieves accuracy comparable to desktop systems, less than 1° of visual angle near the calibration position, and accuracy of less than 2° of visual angle when the viewer moved a large distance, such as standing or sitting on the other side of the couch. The system performance achieved was more than sufficient to operate a novel, hands-free Smart TV interface.
A general framework for extension of a tracking range of user-calibration-free remote eye-gaze tracking systems BIBAFull-Text 253-256
  Dmitri Model; Moshe Eizenman
Stereo-camera Remote Eye-Gaze Tracking (REGT) systems can provide calibration-free estimation of gaze. However, such systems have a limited tracking range due to the requirement for the eye to be tracked in both cameras. This paper presents a general framework for extension of a tracking range of stereo-camera user-calibration-free REGT systems. The proposed method consists of two distinct phases. In the brief initial phase, estimates of eye-features [the center of the pupil and corneal reflections] in pairs of stereo-images are used to estimate automatically a set of subject-specific eye parameters. In the second phase, these subject-specific eye parameters are used with estimates of eye-features in images from any one of the systems' cameras to compute the Point-of-Gaze (PoG). Experiments were conducted with a system that includes two cameras in a horizontal plane. The experimental results demonstrate that the tracking range for horizontal gaze directions can be extended by more than 50%: from ±23.2° when the two cameras are used as a stereo pair to ±35.5° when the two cameras are used independently to estimate the PoG. By adding more cameras to the system, the proposed framework allows further extension of the tracking range in both horizontal and vertical direction, while preserving a user-calibration-free status of a REGT system.
Mathematical model for wide range gaze tracking system based on corneal reflections and pupil using stereo cameras BIBAFull-Text 257-260
  Takashi Nagamatsu; Michiya Yamamoto; Ryuichi Sugano; Junzo Kamahara
In this paper, we propose a mathematical model for a wide range gaze tracking system based on corneal reflections and pupil using calibrated stereo cameras and light sources. We demonstrate a general calculation method for estimating the optical axis of the eye for a combination of non-coaxial and coaxial configurations of many cameras and light sources. Gaze estimation is possible only when light is reflected from the spherical surface of the cornea. Moreover, we provide a method for calculating the eye rotation range where gaze tracking can be achieved, which is useful for positioning cameras and light sources in real world applications.
Towards pervasive eye tracking using low-level image features BIBAFull-Text 261-264
  Yanxia Zhang; Andreas Bulling; Hans Gellersen
We contribute a novel gaze estimation technique, which is adaptable for person-independent applications. In a study with 17 participants, using a standard webcam, we recorded the subjects' left eye images for different gaze locations. From these images, we extracted five types of basic visual features. We then sub-selected a set of features with minimum Redundancy Maximum Relevance (mRMR) for the input of a 2-layer regression neural network for estimating the subjects' gaze. We investigated the effect of different visual features on the accuracy of gaze estimation. Using machine learning techniques, by combining different features, we achieved an average gaze estimation error of 3.44° horizontally and 1.37° vertically for person-dependent estimation.
A GPU-accelerated software eye tracking system BIBAFull-Text 265-268
  Jeffrey B. Mulligan
Current microcomputers are powerful enough to implement a real-time eye tracking system, but the computational throughput still limits the types of algorithms that can be implemented in real time. Many of the image processing algorithms that are typically used in eye tracking applications can be significantly accelerated when the processing is delegated to a graphics processing unit (GPU). This paper describes a real-time gaze tracking system developed using the CUDA programming environment distributed by nVidia. The current implementation of the system is capable of processing a 640 by 480 image in less than 4 milliseconds, and achieves an average accuracy close to 0.5 degrees of visual angle.
Extending the visual field of a head-mounted eye tracker for pervasive eye-based interaction BIBAFull-Text 269-272
  Jayson Turner; Andreas Bulling; Hans Gellersen
Pervasive eye-based interaction refers to the vision of eye-based interaction becoming ubiquitously usable in everyday life, e.g. across multiple displays in the environment. While current head-mounted eye trackers work well for interaction with displays at similar distances, the scene camera often fails to cover both remote and close proximity displays, e.g. a public display on a wall and a handheld portable device. In this paper we describe an approach that allows for robust detection and gaze mapping across multiple such displays. Our approach uses an additional scene camera to extend the viewing and gaze mapping area of the eye tracker and automatically switches between both cameras depending on the display in view. Results from a pilot study show that our system achieves a similar gaze estimation accuracy to a single-camera system while at the same time increasing usability.
Gaze tracking in wide area using multiple camera observations BIBAFull-Text 273-276
  Akira Utsumi; Kotaro Okamoto; Norihiro Hagita; Kazuhiro Takahashi
We propose a multi-camera-based gaze tracking system that provides a wide observation area. In our system, multiple camera observations are used to expand the detection area by employing mosaic observations. Each facial feature and eye region image can be observed by different cameras, and in contrast to stereo-based systems, no shared observations are required. This feature relaxes the geometrical constraints in terms of head orientation and camera viewpoints and realizes wide availability of gaze tracking with a small number of cameras. In experiments, we confirmed that our implemented system can track head rotation of 120° with two cameras. The gaze estimation accuracy is 5.4° horizontally and 9.7° vertically.
Eye tracking on unmodified common tablets: challenges and solutions BIBAFull-Text 277-280
  Corey Holland; Oleg Komogortsev
This work describes the design and implementation of an eye tracking system on an unmodified common tablet PC. A neural network eye tracker is employed as a solution to eye tracking in the visible spectrum of light. We discuss the challenges related to image recognition and processing, and provide an objective evaluation of the accuracy and sampling rate of eye-gaze-based interaction with such an eye tracker. The results indicate that it is possible to obtain an average accuracy of 4.42° and a sampling rate of 0.70 Hz with the described system.
Comparison of eye movement filters used in HCI BIBAFull-Text 281-284
  Oleg Spakov
We compared various real-time filters designed to denoise eye movements from low-sampling devices. Most of the filters found in literature were implemented and tested on data gathered in a previous study. An improvement was proposed for one of the filters. Parameters of each filter were adjusted to ensure their best performance. Four estimation parameters were proposed as criteria for comparison. The output from the filters was compared against two idealized signals (the signals denoised offline). The study revealed that FIR filters with triangular or Gaussian kernel (weighting) functions and parameters dependent on signal state show the best performance.
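As an illustration of the kind of filter this abstract favors, here is a minimal causal FIR smoother with a triangular kernel. The one-sided linear ramp (newest sample weighted most), the window length, and the class name `TriangularFIR` are assumptions; the signal-state-dependent parameters mentioned in the abstract are not modeled.

```python
from collections import deque
from typing import Deque, Tuple

Point = Tuple[float, float]


class TriangularFIR:
    """Causal FIR filter over the last `window` gaze samples, with
    weights rising linearly from the oldest to the newest sample."""

    def __init__(self, window: int = 5) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self.weights = list(range(1, window + 1))  # oldest -> newest
        self.buffer: Deque[Point] = deque(maxlen=window)

    def update(self, x: float, y: float) -> Point:
        """Add one raw sample and return the filtered gaze point."""
        self.buffer.append((x, y))
        # Until the buffer is full, use only the largest weights.
        w = self.weights[-len(self.buffer):]
        total = sum(w)
        fx = sum(wi * p[0] for wi, p in zip(w, self.buffer)) / total
        fy = sum(wi * p[1] for wi, p in zip(w, self.buffer)) / total
        return fx, fy
```

A longer window smooths more but adds lag after saccades, which is one reason a filter might switch parameters depending on whether the eye is currently fixating or moving.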
Bayesian online clustering of eye movement data BIBAFull-Text 285-288
  Enkelejda Tafaj; Gjergji Kasneci; Wolfgang Rosenstiel; Martin Bogdan
The task of automatically tracking visual attention in dynamic visual scenes is highly challenging. To approach it, we propose a Bayesian online learning algorithm. As the visual scene changes and new objects appear, the algorithm, which is based on a mixture model, can distinguish visual saccades (transitions) from visual fixation clusters (regions of interest). The approach is evaluated on real-world data, collected from eye-tracking experiments in driving sessions.
The precision of eye-trackers: a case for a new measure BIBAFull-Text 289-292
  Pieter Blignaut; Tanya Beelders
Several possible measures for the precision of an eye-tracker exist. The commonly used measures of standard deviation and RMS do not produce replicable results under varying frame rate, gaze distance and arrangement of samples within a fixation, which makes it difficult to compare eye-trackers. It is proposed that an area-based measure, BCEA, be adapted to provide a one-dimensional quantity that is intuitive, independent of frame rate and sensitive to small jerks in the reported fixation position.
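For reference, the bivariate contour ellipse area (BCEA) in its commonly used form is 2kπ·σx·σy·√(1 − ρ²), where σx and σy are the standard deviations of the gaze samples, ρ is their correlation, and k is set by the enclosed probability P = 1 − e^(−k). The sketch below computes that standard area only; the one-dimensional adaptation proposed in the paper is not reproduced, and the default P = 0.68 and the function name `bcea` are assumptions.

```python
import math
from typing import Sequence


def bcea(xs: Sequence[float], ys: Sequence[float], p: float = 0.68) -> float:
    """Standard BCEA of gaze samples (same units as the input, squared).

    Uses sigma_x * sigma_y * sqrt(1 - rho^2) == sqrt(var_x * var_y - cov^2),
    which avoids dividing by a zero standard deviation.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired samples")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    mx = sum(xs) / n
    my = sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - my) ** 2 for y in ys) / (n - 1)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    k = -math.log(1.0 - p)  # from P = 1 - exp(-k)
    det = max(var_x * var_y - cov ** 2, 0.0)  # guard against rounding
    return 2.0 * k * math.pi * math.sqrt(det)
```

Because the result is an area, it scales with the square of the sample dispersion, which is one reason a one-dimensional derived quantity can be easier to compare with standard deviation or RMS values.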
TrackStick: a data quality measuring tool for Tobii eye trackers BIBAFull-Text 293-296
  Pieter Blignaut; Tanya Beelders
It is important that eye-tracking studies report the accuracy and precision of the eye tracker. It is argued that the values provided by the manufacturers are representative of the best possible capability of the eye tracker under ideal conditions and for participants with good tracking probabilities. A tool is introduced that will allow researchers to determine the actual data quality as it applies to individual participants at the time of data capture. Results of a study in which the tool was used are discussed and compared with the accuracy and precision values reported by the manufacturer for the same model of eye-tracker.
Entropy-based correction of eye tracking data for static scenes BIBAFull-Text 297-300
  Samuel John; Erik Weitnauer; Hendrik Koesling
In a typical head-mounted eye tracking system, any small slippage of the eye tracker headband on the participant's head leads to a systematic error in the recorded gaze positions. While various approaches exist that reduce these errors at recording time, only few methods reduce the errors of a given tracking system after recording. In this paper we introduce a novel correction algorithm that can significantly reduce the drift in recorded gaze data for eye tracking experiments that use static stimuli. The algorithm is entropy-based and needs no prior knowledge about the stimuli shown or the tasks participants accomplish during the experiment.
A flexible gaze tracking algorithm evaluation workbench BIBAFull-Text 301-304
  Detlev Droege; Dietrich Paulus
The development of gaze tracking algorithms is very much bound to the specific setup and properties of the respective system they are used in. This makes it hard, for example, to compare their performance. We propose Gazelnut, a modular system to ease the development and comparison of gaze tracking algorithms, which also makes development independent of permanent access to specific hardware.
   Building on the message passing architecture of the "robot operating system" (ROS) the system provides a flexible base to record and replay sessions, record the input from multiple cameras, run exchangeable algorithms on such sessions, store their individual results on the recorded (or live) scene, run different algorithms in parallel to compare their results and attach additional diagnostic modules to the running system.
An eye tracking dataset for point of gaze detection BIBAFull-Text 305-308
  Christopher D. McMurrough; Vangelis Metsis; Jonathan Rich; Fillia Makedon
This paper presents a new, publicly available eye tracking dataset, intended as a benchmark for Point of Gaze (PoG) detection algorithms. The dataset consists of a set of videos recording the eye motion of human test subjects as they were looking at, or following, a set of predefined points of interest on a computer visual display unit. The eye motion was recorded using a Mobile Eye, head mounted, infrared monocular camera. The ground truth of the point of gaze and head location and direction in three-dimensional space is provided together with the data. The ground truth for the point of gaze is known in advance since the subjects are always looking at predefined targets, whereas the head position in 3D is captured using a Vicon Motion Tracking System.
Measuring gaze overlap on videos between multiple observers BIBAFull-Text 309-312
  Geoffrey Tien; M. Stella Atkins; Bin Zheng
For gaze-based training in surgery to be meaningful, the similarity between a trainee's gaze and an expert's gaze during performance of surgical tasks must be assessed. As it is difficult to record two people's gaze simultaneously, we produced task videos made by experts, and measured the amount of overlap between the gaze path of the expert surgeon and third-party observers while watching the videos. For this investigation, we developed a new, simple method for displaying and summarizing the proportion of time during which two observers' points of gaze on a common stimulus were separated by no more than a specified visual angle.
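The overlap measure described here, the proportion of time two observers' gaze points lie within a given visual angle of each other, can be sketched as follows. The sample-by-sample pairing, the handling of missing samples as `None`, the pixel-to-angle conversion parameters (`px_per_cm`, `distance_cm`), and the function names are assumptions made for illustration.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # gaze position on the stimulus, in pixels


def angular_separation_deg(p: Point, q: Point,
                           px_per_cm: float, distance_cm: float) -> float:
    """Visual angle subtended by the distance between two on-screen points,
    assuming the viewer is centered at `distance_cm` from the screen."""
    d_cm = math.hypot(p[0] - q[0], p[1] - q[1]) / px_per_cm
    return math.degrees(2.0 * math.atan(d_cm / (2.0 * distance_cm)))


def overlap_fraction(gaze_a: List[Optional[Point]],
                     gaze_b: List[Optional[Point]],
                     max_deg: float, px_per_cm: float,
                     distance_cm: float) -> float:
    """Fraction of jointly valid samples in which the two gaze points
    are separated by no more than `max_deg` degrees of visual angle."""
    pairs = [(a, b) for a, b in zip(gaze_a, gaze_b)
             if a is not None and b is not None]
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs
               if angular_separation_deg(a, b, px_per_cm, distance_cm) <= max_deg)
    return hits / len(pairs)
```

Both gaze streams are assumed to be resampled to the same video frame times before comparison; samples lost to blinks or tracking failure in either stream are excluded rather than counted as non-overlap.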
   In a study of single-observer self-review and multiple-observer initial view of a laparoscopic training task, we predicted that self-review would produce the highest overlap. We found relatively low overlap between watchers and the task performer; even operators with detailed task knowledge produce low overlap when watching their own videos. Conversely, there was a high overlap among all watchers. Results indicate that it may be insufficient to improve trainees' eye-hand coordination by just watching a video. Gaze training will need to be integrated with other teaching methods to be effective.
Towards location-aware mobile eye tracking BIBAFull-Text 313-316
  Peter Kiefer; Florian Straub; Martin Raubal
This paper considers the impact of location as context in mobile eye tracking studies that extend to large-scale spaces, such as pedestrian wayfinding studies. It shows how adding a subject's location to her gaze data enhances the possibilities for data visualization and analysis. Results from an explorative pilot study on mobile map usage with a pedestrian audio guide demonstrate that the combined recording and analysis of gaze and position can help to tackle research questions on human spatial problem solving in a novel way.
Identifying parameter values for an I-VT fixation filter suitable for handling data sampled with various sampling frequencies BIBAFull-Text 317-320
  Anneli Olsen; Ricardo Matos
Selecting values for fixation filters is a difficult task, as not only the specifics of the selected filter algorithm have to be taken into account, but also what it is going to be used for and by whom. In this paper the selection and testing process of values for an I-VT fixation filter algorithm implementation is described.
Comparison of eye movement metrics recorded at different sampling rates BIBAFull-Text 321-324
  Andrew D. Ouzts; Andrew T. Duchowski
Previous work has shown significant differences in eye movement metrics recorded by devices differing in sampling rates. Two schools of thought have emerged on how to effectively compare such apparently disparate data. The first, termed here upsampling, strives to process eye movement data recorded at a low sampling rate to allow comparison with data recorded at a high sampling rate, e.g., by fitting a cubic spline to the signal derivative (i.e., velocity). Instead, we suggest downsampling based on a two-pass solution in which data is first downsampled and smoothed prior to its velocity-based classification. Results indicate that, given a similar experimental task, this approach gives more equitable results than other single-pass classification methods, as they typically do not explicitly consider sampling rates.
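The two-pass idea in this abstract (downsample and smooth first, then classify by velocity) might be sketched as follows. This is only an illustration under stated assumptions: gaze is reduced to a one-dimensional angle in degrees, downsampling is plain decimation, smoothing is a moving average, and the 30 deg/s threshold is a placeholder. None of these choices are confirmed as the authors' method.

```python
def downsample(samples, src_hz, dst_hz):
    """Keep every (src_hz // dst_hz)-th sample (simple decimation)."""
    if src_hz % dst_hz != 0:
        raise ValueError("this sketch assumes an integer rate ratio")
    return samples[::src_hz // dst_hz]

def smooth(samples, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def classify(samples, hz, threshold_deg_s=30.0):
    """Label each inter-sample interval by its angular velocity."""
    labels = []
    for prev, cur in zip(samples, samples[1:]):
        velocity = abs(cur - prev) * hz  # degrees per second
        labels.append("saccade" if velocity > threshold_deg_s else "fixation")
    return labels

def two_pass(samples, src_hz, dst_hz, window=3, threshold_deg_s=30.0):
    """First pass: downsample and smooth. Second pass: velocity classification."""
    reduced = smooth(downsample(samples, src_hz, dst_hz), window)
    return classify(reduced, dst_hz, threshold_deg_s)
```

Running recordings from a fast and a slow tracker through `two_pass` with the same `dst_hz` puts both on a common sampling rate before any metric is computed, which is the point of the downsampling approach.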
On the conspicuity of 3-D fiducial markers in 2-D projected environments BIBAFull-Text 325-328
  Andrew D. Ouzts; Andrew T. Duchowski; Toni Gomes; Rupert A. Hurley
Fiducial markers are used with head-mounted eye trackers to facilitate eye movement data aggregation for quantitative analysis. However, use of these markers may be problematic in some situations (e.g., natural tasks) as the markers may be visually distracting. To date, we are aware of no study that has examined the conspicuity of such markers to determine how much (if any) effort should be expended on concealing them from view. This paper presents a study that examines Tobii's infra-red (IR) markers' conspicuity in a 2-D projected environment. Results indicate that even when these 3-D markers are superimposed on a canvas on which the 2-D environment is projected, and no effort is taken to hide them (i.e., by minimizing contrast with the background), the presence of markers does not significantly alter the number or duration of fixations on the location of the markers when a specific task is given.
Voice activity detection from gaze in video mediated communication BIBAFull-Text 329-332
  Michal Hradis; Shahram Eivazi; Roman Bednarik
This paper discusses estimation of the active speaker in multi-party video-mediated communication from gaze data of one of the participants. In the explored settings, we predict voice activity of participants in one room based on gaze recordings of a single participant in another room. The two rooms were connected by high definition, low delay audio and video links and the participants engaged in different activities ranging from casual discussion to simple problem-solving games. We treat the task as a classification problem. We evaluate several types of features and parameter settings in the context of a Support Vector Machine classification framework. The results show that, using the proposed approach, the vocal activity of a speaker can be correctly predicted for 89% of the time for which the gaze data are available.
Incorporating visual field characteristics into a saliency map BIBAFull-Text 333-336
  Hideyuki Kubota; Yusuke Sugano; Takahiro Okabe; Yoichi Sato; Akihiro Sugimoto; Kazuo Hiraki
Characteristics of the human visual field are well known to be different in central (fovea) and peripheral areas. Existing computational models of visual saliency, however, do not take into account this biological evidence. The existing models compute visual saliency uniformly over the retina and, thus, have difficulty in accurately predicting the next gaze (fixation) point. This paper proposes to incorporate human visual field characteristics into visual saliency, and presents a computational model for producing such a saliency map. Our model integrates image features obtained by bottom-up computation in such a way that weights for the integration depend on the distance from the current gaze point where the weights are optimally learned using actual saccade data. The experimental results using a large number of fixation/saccade data with wide viewing angles demonstrate the advantage of our saliency map, showing that it can accurately predict the point where one looks next.

Uses and applications

Measuring the performance of gaze and speech for text input BIBAFull-Text 337-340
  T. R. Beelders; P. J. Blignaut
A popular word processor application was adapted to include the use of eye gaze and speech as a modality for text entry. An onscreen keyboard was used whereby users were expected to focus on the desired character and then issue a verbal command in order to type the character in the document. Measures of speed and accuracy were captured and analyzed. Results indicate that the keyboard is superior to the gaze and speech entry method in terms of both speed and accuracy. Keyboard button sizes and spacing between the buttons did not affect either measure in any way.
Typing with eye-gaze and tooth-clicks BIBAFull-Text 341-344
  Xiaoyu (Amy) Zhao; Elias D. Guestrin; Dimitry Sayenko; Tyler Simpson; Michel Gauthier; Milos R. Popovic
In eye-gaze-based human-computer interfaces, the most commonly used mechanism for generating activation commands (i.e., mouse clicks) is dwell time (DT). While DT can be relatively efficient and easy to use, it is also associated with the possibility of generating unintentional activation commands -- an issue that is known as the Midas' touch problem. To address this problem, we proposed to use a "tooth-clicker" (TC) device as a mechanism for generating activation commands independently of the activity of the eyes.
   This paper describes a pilot study that verifies the feasibility of using an eye-gaze tracker (EGT) and a TC to type on an on-screen keyboard, and compares the performance of the EGT-TC system with that of the EGT with two different DT thresholds (880 ms and 490 ms). The six subjects that participated in the study were able to attain typing speeds using the EGT-TC system that were slower than but comparable to the typing speeds that they attained using the EGT with the shorter DT threshold.
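To make the dwell-time (DT) mechanism and the Midas touch problem concrete, here is a minimal sketch of a dwell-based click detector. The sample format, the 40-pixel tolerance radius, and the function name are hypothetical; only the 880 ms and 490 ms thresholds come from the abstract.

```python
import math

def dwell_clicks(samples, dwell_ms, radius_px=40.0):
    """Return (t_ms, x, y) activations from a list of (t_ms, x, y) gaze samples.

    A click fires once when gaze stays within radius_px of the point where
    the current dwell began for at least dwell_ms. radius_px is an assumed
    tolerance, not a value from the paper.
    """
    clicks = []
    anchor = None   # (t, x, y) where the current dwell started
    fired = False   # whether this dwell has already produced a click
    for t, x, y in samples:
        if anchor is None or math.hypot(x - anchor[1], y - anchor[2]) > radius_px:
            # Gaze moved away: start a new dwell at this sample.
            anchor = (t, x, y)
            fired = False
        elif not fired and t - anchor[0] >= dwell_ms:
            clicks.append((t, anchor[1], anchor[2]))
            fired = True
    return clicks
```

Any sufficiently long look triggers an activation whether or not the user intended one, which is the unintentional-click issue that a separate trigger such as the tooth-clicker is meant to avoid.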
The effect of clicking by smiling on the accuracy of head-mounted gaze tracking BIBAFull-Text 345-348
  Ville Rantanen; Jarmo Verho; Jukka Lekkala; Outi Tuisku; Veikko Surakka; Toni Vanhala
The effect of facial behaviour on gaze tracking accuracy was studied while using a prototype system that integrated head-mounted, video-based gaze tracking and capacitive facial movement detection for, respectively, pointing at and selecting objects in a simple graphical user interface. Experiments were carried out to determine how voluntary smiling movements that were used to indicate clicks affect the accuracy of gaze tracking due to the combination of user eye movement behaviour and the operation of gaze tracking algorithms. The results showed no observable degradation of the gaze tracking accuracy when using voluntary smiling for object selections.
Using eye gaze and speech to simulate a pointing device BIBAFull-Text 349-352
  T. R. Beelders; P. J. Blignaut
The performance of eye gaze and speech when used as a pointing device was tested using the ISO multi-directional tapping task. Eye gaze and speech were used for target selection as is, as well as with the use of a gravitational well and in conjunction with magnification. These selection methods were then compared to the mouse. The mouse was far superior in terms of performance when selecting targets, although the use of a gravitational well did increase the performance of eye gaze and speech. However, magnification did not improve the use of gaze and speech as a pointing device.
Dynamic context switching for gaze based interaction BIBAFull-Text 353-356
  Antonio Diaz Tula; Filipe M. S. de Campos; Carlos H. Morimoto
This paper introduces Dynamic Context Switching (DCS) as an extension of the Context Switching (CS) paradigm for gaze-based interaction. CS replicates information in each context. The user can freely explore one context without worrying about the Midas touch problem, and a saccade to the other context triggers the selection of the item under focus. Because CS has to display two contexts simultaneously, the amount of useful screen space is limited. DCS dynamically adjusts the context sizes, where the context that has the focus is displayed in full size, while the other is minimized, thus improving useful screen space. A saccade to the minimized context triggers selection, and properly readjusts the sizes of the contexts. Results from a pilot user experiment show that DCS improves user performance and does not cause disorientation due to the dynamic context resizing.
Investigating gaze-supported multimodal pan and zoom BIBAFull-Text 357-360
  Sophie Stellmach; Raimund Dachselt
Remote pan-and-zoom control for the exploration of large information spaces is of interest for various application areas, such as browsing through medical data in sterile environments or investigating geographic information systems on a distant display. In this context, considering a user's visual attention for pan-and-zoom operations could be of interest. In this paper, we investigate the potential of gaze-supported panning in combination with different zooming modalities: (1) a mouse scroll wheel, (2) tilting a handheld device, and (3) touch gestures on a smartphone. Thereby, it is possible to zoom in at a location a user currently looks at (i.e., gaze-directed pivot zoom). These techniques have been tested with Google Earth by ten participants in a user study. While participants were fastest with the already familiar mouse-only base condition, the user feedback indicates a particularly high potential of the gaze-supported pivot zooming in combination with a scroll wheel or touch gesture.
Universal eye-tracking based text cursor warping BIBAFull-Text 361-364
  Ralf Biedert; Andreas Dengel; Christoph Käding
In this paper we present an approach to build an eye-tracking based text cursor placement system. When triggered, the system employs a computer vision based analysis of the screen's content around the current gaze position to find the most likely designated gaze target. Eventually it synthesizes a mouse event at that position, allowing for a rapid text cursor repositioning even in applications which do not support eye tracking explicitly. For our system we compared three different computer vision methods in a simulation run and evaluated the best candidate in two double-blinded user studies. We used a total of 19 participants to assess the system's objective and perceived end user speed-up. We can demonstrate that in terms of reposition time the OCR based method is superior to the other tested methods; it also beats common keyboard-mouse interaction for some users. We conclude that while the tool was almost universally preferred subjectively over keyboard-mouse interaction, the highest speed can be achieved by using the right amount of eye tracking.
Gaming with gaze and losing with a smile BIBAFull-Text 365-368
  Anders Møller Nielsen; Anders Lerchedahl Petersen; John Paulin Hansen
This paper presents an experiment comparing performance and user experience of gaze and mouse interaction in a minimalistic 3D flying game that only required steering. Mouse interaction provided better performance and participants considered it less physically and mentally demanding, less frustrating and less difficult to maneuver. Gaze interaction, however, yielded higher levels of entertainment and engagement. The paper suggests that gaze steering provides a high kinesthetic pleasure both because it is difficult to master and because it presents a unique mapping between fixation and locomotion.
Content based recommender system by using eye gaze data BIBAFull-Text 369-372
  Daniela Giordano; Isaak Kavasidis; Carmelo Pino; Concetto Spampinato
In this work, we present a proactive content based recommender system that employs web document clustering performed by using eye gaze data. Generally, recommender systems are used in commercial applications, where information about the user's habits and interests is of crucial importance in order to plan marketing strategies, or in information retrieval systems in order to suggest similar resources a user is interested in. Commonly, these systems use explicit relevance feedback techniques (e.g. mouse or keyboard) to improve their performance and to recommend products. In contrast, the proposed system captures the user's interest by using implicit relevance feedback, based on data acquired by a Tobii T60 eye tracker. The purpose of the system is to collect eye gaze data during web navigation and, by employing clustering techniques, to suggest web documents similar to those in which the user implicitly expressed greater interest. Performance evaluation was carried out on 30 users and the results show that the proposed system enhanced navigation experience in about 73% of the cases.
Using eye-tracking data for automatic film comic creation BIBAFull-Text 373-376
  Masahiro Toyoura; Tomoya Sawada; Mamoru Kunihiro; Xiaoyang Mao
A film comic is a kind of art work representing a movie story as a comic. It uses the images of the movie as panels. Verbal information such as dialogue and narration is represented in word balloons. A key issue in creating film comics is how to select images which are significant in conveying the story of the movie. Such significance of images is inherently semantic and context-dependent and hence, technologies purely based on image analysis usually fail to produce good results. On the other hand, the word balloon arrangement requires understanding not only the semantics of images but also the verbal information, which is difficult unless the script of the movie is available. This paper describes a new attempt to use eye-tracking data for the automatic creation of a film comic from a movie. Patterns of eye movement are analyzed for detecting the change of scenes and gaze information is used for automatically finding the location for inserting and directing the word balloons. Our experiments showed that the proposed technique can largely improve the selection of significant images compared with the method using image features only and realize the automatic balloon arrangement.
Gaze behaviour of expert and novice microneurosurgeons differs during observations of tumor removal recordings BIBAFull-Text 377-380
  Shahram Eivazi; Roman Bednarik; Markku Tukiainen; Mikael von und zu Fraunberg; Ville Leinonen; Juha E. Jääskeläinen
Differences between visual attention strategies of experts and novices have been investigated in many fields, but little has been done in the field of microneurosurgery. In the hands of an experienced surgeon, microneurosurgery seems like an elegant, routine and clean procedure with minimal blood loss. However, microneurosurgery is a multifaceted task with clinical risks associated with surgeons' skills. In a preliminary study, eye movements of eight surgeons were recorded while observing four images representing four phases in a tumor removal surgery. A comparison of the eye movement strategies shows clear markers of expertise depending on the phase of the surgery.
An eye-tracking study on the role of scan time in finding source code defects BIBAFull-Text 381-384
  Bonita Sharif; Michael Falcone; Jonathan I. Maletic
An eye-tracking study is presented that investigates how individuals find defects in source code. This work partially replicates a previous eye-tracking study by Uwano et al. [2006]. In the Uwano study, eye movements are used to characterize the performance of individuals in reviewing source code. Their analysis showed that subjects who did not spend enough time initially scanning the code tend to take more time finding defects. The study here follows a similar setup with added eye-tracking measures and analyses on effectiveness and efficiency of finding defects with respect to eye gaze. The subject pool is larger and covers varied skill levels. Results indicate that scanning significantly correlates with defect detection time as well as visual effort on relevant defect lines. Results of the study are compared and contrasted to the Uwano study.
Reading and estimating gaze on smart phones BIBAFull-Text 385-388
  Ralf Biedert; Andreas Dengel; Georg Buscher; Arman Vartan
While lots of reading happens on mobile devices, little research has been performed on how the reading-interaction actually takes place. Therefore we describe our findings on a study conducted with 18 users who were asked to read a number of texts while their touch and gaze data were being recorded. We found three reader types and identified their preferred alignment of text on the screen. Based on our findings we are able to computationally estimate the reading area with an approximate .81 precision and .89 recall. Our computed reading speed estimate has an average 10.9% wpm error in contrast to the measured speed, and combining both techniques we can pinpoint the reading location at a given time with an overall word error of 9.26 words, or about three lines of text on our device.
Revisiting Russo and Leclerc BIBAFull-Text 389-392
  Poja Shams; Erik Wästlund; Lars Witell
In this paper, we revisit a seminal research contribution by Russo and Leclerc [1994], which identified three stages of the consumer choice process: (1) orientation, (2) evaluation, and (3) verification. Their three stage model broke with previous research favoring two stage models and it disconfirmed the models of planned analysis of choice in favor of an adaptive and constructive process [Wedel and Pieters 2008]. The aim of this paper is to replicate the original study by Russo and Leclerc [1994] to better understand the characteristics of the different stages of the consumer choice process. We argue that such a replication is needed due to the advancements in the technology of eye-tracking during the last 15 years and the detrimental effects of think-aloud protocols. In general, our replication of the research by Russo and Leclerc [1994] confirms the three stage model they suggested, but we identify some noteworthy differences regarding the time it takes to make a decision and the mean observation time in the three stages.
Learning eye movement patterns for characterization of perceptual expertise BIBAFull-Text 393-396
  Rui Li; Jeff Pelz; Pengcheng Shi; Cecilia Ovesdotter Alm; Anne R. Haake
Human perceptual expertise has significant influence on medical image inspection. However, little is known regarding whether experts differ in their cognitive processing or what effective visual strategies they employ for examining medical images. To remedy this, we conduct an eye tracking experiment and collect both eye movement and verbal description data from three groups of subjects with different medical training levels. Each subject examines and describes 42 photographic dermatological images. We then develop a hierarchical probabilistic framework to extract the common and unique eye movement patterns exhibited among multiple subjects' fixation and saccadic eye movements within each expertise-specific group. Furthermore, experts' annotations of thought units on the transcribed verbal descriptions are time-aligned with these eye movement patterns to identify their semantic meanings. In this work, we are able to uncover the manner in which these subjects alternated their viewing strategies over the course of inspection, and additionally extract their perceptual expertise so that it can be used for advanced medical image understanding.
Visual attention to television programs with a second-screen application BIBAFull-Text 397-400
  Michael E. Holmes; Sheree Josephson; Ryan E. Carney
This study examined participants' visual attention via eye-movement patterns as they watched two television shows -- one a drama, the other a documentary -- while interacting with synchronized second-screen applications introduced in spring 2011. The second screen garnered considerable visual attention, about 30% of the total viewing session. Visual attention went to the tablet screen even without a recent "push" of interactive content and without advertising content on the TV screen. However, interactive content and TV advertising did trigger more attention to the tablet app. The presence of the second screen also dramatically decreased the average gaze length on TV as described in previous research.
Prisoners and chickens: gaze locations indicate bounded rationality BIBAFull-Text 401-404
  Peter G. Mahon; Roxanne L. Canosa
Eye-tracking was used to predict choices made during play of a series of computer-generated simultaneous normal-form games. Four normal-form games were used as the test bed for the eye-tracking experiment: the Coordination Game, Battle of the Sexes, the Game of Chicken, and Prisoner's Dilemma. These games are abstractions of real-life scenarios where a person must make a choice to either cooperate with another person for some common good, or not cooperate, given a specific "payoff" for cooperating or not cooperating. The other player was always an automated agent whose goal was to predict the choice of the human player. Players were found to cluster into different types according to a numeric index specific to the game played. An eye-tracking experiment confirms that attention deployed to particular areas of interest varies according to the game played and the type to which a player belongs. This enabled a decision tree to be created from the eye-tracking data which was used by the agent to classify each player as a specific type, allowing a prediction to be made about a player's likely choice.
Saccadic delays on targets while watching videos BIBAFull-Text 405-408
  M. Stella Atkins; Xianta Jiang; Geoffrey Tien; Bin Zheng
To observe whether there is a difference in eye gaze between doing a task, and watching a video of the task, we recorded the gaze of 17 subjects performing a simple surgical eye-hand coordination task. We also recorded eye gaze of the same subjects later while they were watching videos of their performance.
   We divided the task into 9 or more sub-tasks, each of which involved a large hand movement to a new target location. We analyzed the videos manually and located the video frame for each sub-task where the operator's saccadic movement began, and the frame where the watcher's eye movement began. We found a consistent delay of about 600 ms between initial eye movement when doing the task, and initial eye movement when watching the task, observed in 96.3% of the sub-tasks.
   For the first time, we have quantified the differences between doing and watching a manual task. This will help develop gaze-based training strategies for manual tasks.
How to measure monitoring performance of pilots and air traffic controllers BIBAFull-Text 409-412
  Catrin Hasse; Dietrich Grasshoff; Carmen Bruder
In prior research on the future of aviation it was established that operators will have to work with highly automated systems. Increasing automation will require operators monitoring appropriately (OMA). OMA are expected to demonstrate the use of distinctly different monitoring phases (orientation, anticipation, detection, and recheck). Within these phases, they must grasp in time the relevant information that would enable them to take control should automation fail. The presented study aims at finding appropriate measurements for the identification of OMA on the basis of eye tracking. In order to do this, a normative model of adequate monitoring behavior was designed including the definition of areas of interest. We tested 90 participants who had to monitor a dynamic automatic process, and then take control. In order to decide on suitable eye tracking parameters it was asked which parameters are significantly related to manual control performance. The results show that the suitability of parameters depends on the specific phase of the monitoring process. Gaze durations allow for differentiating between high and low performing subjects during orientation phases. In contrast, relative fixation counts are suitable for predicting monitoring performance during detection phases. In general, the results support the assumption that eye tracking parameters are appropriate for identifying OMA.
Exploring the effects of visual cognitive load and illumination on pupil diameter in driving simulators BIBAFull-Text 413-416
  Oskar Palinko; Andrew L. Kun
Pupil diameter is an important measure of cognitive load. However, pupil diameter is also influenced by the amount of light reaching the retina. In this study we explore the interaction between these two effects in a simulated driving environment. Our results indicate that it is possible to separate the effects of illumination and visual cognitive load on pupil diameter, at least in certain situations.