
Proceedings of the 2001 Workshop on Perceptive User Interfaces

Fullname: Proceedings of the 2001 Workshop on Perceptive User Interfaces
Location: Orlando, Florida
Dates: 2001-Nov-15 to 2001-Nov-16
Standard No: ACM DL: Table of Contents; hcibib: PUI01
  1. Paper session #1
  2. Panel on augmented cognition
  3. Posters & demos
  4. Posters & demos
  5. Paper session #2
  6. Paper session #3
  7. Posters & demos
  8. Paper session #4

Paper session #1

Experimental evaluation of vision and speech based multimodal interfaces
  Emilio Schapira; Rajeev Sharma
Progress in computer vision and speech recognition technologies has recently enabled multimodal interfaces that use speech and gestures. These technologies offer promising alternatives to existing interfaces because they emulate the natural way in which humans communicate. However, no systematic work has been reported that formally evaluates the new speech/gesture interfaces. This paper is concerned with formal experimental evaluation of new human-computer interactions enabled by speech and hand gestures.
   The paper describes an experiment conducted with 23 subjects that evaluates selection strategies for interaction with large-screen displays. The multimodal interface designed for this experiment does not require the user to be in physical contact with any device. Video cameras and long-range microphones are used as input for the system. Three selection strategies are evaluated, and results for different target sizes and positions are reported in terms of accuracy, selection time, and user preference. Design implications for vision/speech-based interfaces are inferred from these results. This study also raises new questions and topics for future research.
Note: 9 pages
Human-robot interface based on the mutual assistance between speech and vision
  Mitsutoshi Yoshizaki; Yoshinori Kuno; Akio Nakamura
This paper presents a user interface for a service robot that can bring objects requested by the user. A speech-based interface is appropriate for this application, but speech alone is not sufficient: the system also needs a vision-based interface to recognize gestures. Moreover, it needs vision capabilities to obtain real-world information about the objects mentioned in the user's speech. For example, the robot must find the target object named in a spoken command in order to carry out the task; in this sense, vision assists speech. However, vision sometimes fails to detect the objects, and for some objects vision cannot be expected to work well. In these cases, the robot reports its current status to the user, who can then give the robot advice by speech; in this sense, speech assists vision through the user. This paper presents how this mutual assistance between speech and vision works and demonstrates promising results through experiments.
Note: 4 pages
A visual modality for the augmentation of paper
  David R. McGee; Misha Pavel; Adriana Adami; Guoping Wang; Philip R. Cohen
In this paper we describe how we have enhanced our multimodal paper-based system, Rasa, with visual perceptual input. We briefly explain how Rasa improves upon current decision-support tools by augmenting, rather than replacing, the paper-based tools that people in command and control centers have come to rely upon. We note shortcomings in our initial approach, discuss how we have added computer-vision as another input modality in our multimodal fusion system, and characterize the advantages that it has to offer. We conclude by discussing our current limitations and the work we intend to pursue to overcome them in the future.
Note: 7 pages
Signal level fusion for multimodal perceptual user interface
  John W. Fisher; Trevor Darrell
Multimodal fusion is an important yet challenging task for perceptual user interfaces. Humans routinely perform both simple and complex tasks in which ambiguous auditory and visual data are combined to support accurate perception. By contrast, automated approaches for processing multimodal data sources lag far behind, primarily because few methods adequately model the complexity of the audio/visual relationship. We present an information-theoretic approach to fusing multiple modalities and discuss a statistical model under which this approach to fusion is justified. We present empirical results demonstrating audio-video localization and consistency measurement, with examples that determine where a speaker is within a scene and whether that person is producing the specified audio stream.
Note: 7 pages
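The Fisher and Darrell abstract uses an information-theoretic criterion to decide which part of a scene produced an audio stream. The sketch below illustrates that idea only under a strong simplifying assumption of ours, not the paper's model: if an audio feature (for example frame energy) and a per-region video feature (for example frame-to-frame pixel change) are treated as jointly Gaussian, their mutual information reduces to -0.5 * log(1 - rho^2), where rho is their correlation. All function and variable names are hypothetical.

```python
import math


def pearson(x, y):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant signal carries no information about the other
    return sxy / math.sqrt(sxx * syy)


def gaussian_mutual_information(x, y):
    """Mutual information (nats) between two signals, assuming they are jointly Gaussian."""
    rho = pearson(x, y)
    rho2 = min(rho * rho, 1.0 - 1e-12)  # avoid log(0) for perfectly correlated signals
    return -0.5 * math.log(1.0 - rho2)


def most_likely_speaker_region(audio_feature, region_features):
    """Index of the video region whose feature shares the most information with the audio."""
    scores = [gaussian_mutual_information(audio_feature, r) for r in region_features]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: region 1 moves in step with the audio energy, region 0 barely moves.
audio = [0.1, 0.9, 0.2, 0.8, 0.1, 0.7]
regions = [[0.5, 0.5, 0.4, 0.6, 0.5, 0.5], [0.0, 1.0, 0.1, 0.9, 0.0, 0.8]]
print(most_likely_speaker_region(audio, regions))  # → 1
```

A Gaussian estimate captures only linear dependence between the two signals, so it is a much weaker model of the audio/visual relationship than the one the abstract argues for.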

Panel on augmented cognition

Perceptive user interfaces workshop
  Dylan Schmorrow; Jim Patrey

Posters & demos

Sketch based interfaces: early processing for sketch understanding
  Tevfik Metin Sezgin; Thomas Stahovich; Randall Davis
Freehand sketching is a natural and crucial part of everyday human interaction, yet is almost totally unsupported by current user interfaces. We are working to combine the flexibility and ease of use of paper and pencil with the processing power of a computer, to produce a user interface for design that feels as natural as paper, yet is considerably smarter. One of the most basic steps in accomplishing this is converting the original digitized pen strokes in a sketch into the intended geometric objects. In this paper we describe an implemented system that combines multiple sources of knowledge to provide robust early processing for freehand sketching.
Note: 8 pages
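One way to picture the early-processing step in the Sezgin, Stahovich and Davis abstract (turning digitized pen strokes into intended geometric objects) is vertex detection from two commonly used cues: the pen tends to slow down at intended corners, and the stroke direction changes sharply there. The sketch below combines both cues with simple thresholds; the threshold values and the function name are hypothetical, and this is not the authors' implementation.

```python
import math


def find_corners(points, speed_frac=0.4, min_turn_deg=45.0):
    """Return indices of likely vertices in a pen stroke.

    points: list of (x, y, t) samples. A sample is reported when the pen is
    slow there (speed below speed_frac * mean speed) AND the stroke turns
    sharply there (turn angle above min_turn_deg)."""
    n = len(points)
    if n < 3:
        return []
    # Speed at interior sample i: distance between its neighbours over elapsed time.
    speeds = []
    for i in range(1, n - 1):
        (x0, y0, t0), (x1, y1, t1) = points[i - 1], points[i + 1]
        dt = max(t1 - t0, 1e-9)
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    mean_speed = sum(speeds) / len(speeds)

    corners = []
    for i in range(1, n - 1):
        xa, ya, _ = points[i - 1]
        xb, yb, _ = points[i]
        xc, yc, _ = points[i + 1]
        a1 = math.atan2(yb - ya, xb - xa)
        a2 = math.atan2(yc - yb, xc - xb)
        # Absolute change of direction, wrapped into [0, 180] degrees.
        turn = abs(math.degrees((a2 - a1 + math.pi) % (2 * math.pi) - math.pi))
        if speeds[i - 1] < speed_frac * mean_speed and turn > min_turn_deg:
            corners.append(i)
    return corners


# An L-shaped stroke whose samples slow down around the bend at (4, 0).
stroke = [(0, 0, 0), (1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 0, 6),
          (4, 1, 9), (4, 2, 10), (4, 3, 11), (4, 4, 12)]
print(find_corners(stroke))  # → [4]
```

Requiring both cues is the design point: a slow but straight segment, or a fast curved flourish, is not reported as a vertex.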

Posters & demos

Speech driven facial animation
  P. Kakumanu; R. Gutierrez-Osuna; A. Esposito; R. Bryll; A. Goshtasby; O. N. Garcia
The results reported in this article are an integral part of a larger project aimed at achieving perceptually realistic animations, including the individualized nuances, of three-dimensional human faces driven by speech. The audiovisual system that has been developed for learning the spatio-temporal relationship between speech acoustics and facial animation is described, including video and speech processing, pattern analysis, and MPEG-4 compliant facial animation for a given speaker. In particular, we propose a perceptual transformation of the speech spectral envelope, which is shown to capture the dynamics of articulatory movements. An efficient nearest-neighbor algorithm is used to predict novel articulatory trajectories from the speech dynamics. The results are very promising and suggest a new way to approach the modeling of synthetic lip motion of a given speaker driven by his/her speech. This would also provide clues toward a more general cross-speaker realistic animation.
Note: 5 pages
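The Kakumanu et al. abstract says a nearest-neighbor algorithm predicts articulatory trajectories from speech dynamics, without giving details. A minimal k-nearest-neighbour regression sketch, assuming each audio frame has already been reduced to a numeric feature vector and each training frame is paired with a vector of facial-animation parameters: the prediction for a new frame is the mean facial vector of its k closest training frames. Names and the value of k are hypothetical; the paper's efficient search is not reproduced, and this brute-force version scans all training frames for every query (requires Python 3.8+ for `math.dist`).

```python
import math


def knn_predict(train_audio, train_face, query, k=3):
    """Facial-parameter vector for one audio frame: the mean of the facial
    vectors belonging to the k training frames with the closest audio features."""
    dists = sorted((math.dist(a, query), i) for i, a in enumerate(train_audio))
    nearest = [i for _, i in dists[:k]]
    dim = len(train_face[0])
    return [sum(train_face[i][d] for i in nearest) / len(nearest) for d in range(dim)]


def predict_trajectory(train_audio, train_face, audio_frames, k=3):
    """Map a sequence of audio feature frames to a trajectory of facial parameters."""
    return [knn_predict(train_audio, train_face, q, k) for q in audio_frames]


# Toy 1-D audio features paired with 2-D facial parameters.
train_audio = [[0.0], [1.0], [2.0], [10.0]]
train_face = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [10.0, 20.0]]
print(knn_predict(train_audio, train_face, [1.0], k=3))  # → [1.0, 2.0]
```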
An experimental multilingual speech translation system
  Kenji Matsui; Yumi Wakita; Tomohiro Konuma; Kenji Mizutani; Mitsuru Endo; Masashi Murata
In this paper, we describe an experimental speech translation system that runs on small, PC-based hardware with a multimodal user interface. The two major problems for people using an automatic speech translation device are speech recognition errors and language translation errors, and we focus on developing techniques to overcome them. The techniques include a new language translation approach based on example sentences, simplified expression rules, and a multimodal user interface that shows possible speech recognition candidates retrieved from the example sentences. The combination of these techniques can provide accurate translation even when the speech recognition result contains some errors. We propose using keyword classes, examining the dependencies between keywords to detect misrecognized keywords and to search the example expressions. The user then chooses the appropriate example expression using a touch panel or buttons. The translation step retrieves the corresponding expression in the other language, which should always be grammatically correct. Simplified translated expressions are produced by speech-act-based simplification rules, so that the system can avoid various redundant expressions. A simple comparison study showed that the proposed method produces output roughly 2 to 10 times faster than a conventional translation device.
Note: 4 pages
A multimodal presentation planner for a home entertainment environment
  Christian Elting; Georg Michelitsch
In this paper we outline the design and the implementation of the multimodal presentation planner PMO, which is part of the EMBASSI intelligent user interface for home entertainment devices. We provide details about the concepts we use to produce cohesive and coherent output as well as illustrate the software architecture of the PMO. We compare our approach with the state of the art in presentation planning and conclude with an illustration of our future work.
Note: 5 pages
Physiological data feedback for application in distance education
  Martha E. Crosby; Brent Auernheimer; Christoph Aschwanden; Curtis Ikehara
This paper describes initial experiments collecting physiological data from subjects performing computer tasks. A prototype real-time Emotion Mouse collected skin temperature, galvanic skin response (GSR), and heartbeat data. Possible applications to distance education and a second-generation system are discussed.
Note: 5 pages
The Bayes Point Machine for computer-user frustration detection via PressureMouse
  Yuan Qi; Carson Reynolds; Rosalind W. Picard
We mount eight pressure sensors on a computer mouse and collect mouse pressure signals from subjects who fill out web forms containing usability bugs. This approach is based on the hypothesis that subjects tend to apply excess pressure to the mouse after encountering frustrating events. We then train a Bayes Point Machine in an attempt to classify two regions of each user's behavior: mouse pressure while the form-filling process is proceeding smoothly, and mouse pressure following a usability bug. Unlike currently popular classifiers such as the Support Vector Machine, the Bayes Point Machine is a new classification technique rooted in Bayesian theory. Trained with a new, efficient Bayesian approximation algorithm, Expectation Propagation, the Bayes Point Machine achieves a person-dependent classification accuracy of 88%, outperforming the Support Vector Machine in our experiments. The resulting system can be used for many applications in human-computer interaction, including adaptive interface design.
Note: 5 pages
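The Qi, Reynolds and Picard abstract trains a Bayes Point Machine with Expectation Propagation, which is too involved for a short example. The sketch below swaps in a simpler approximation of the Bayes point: train several perceptrons on random orderings of the data and average their unit-normalised weight vectors. When the data are linearly separable and every perceptron converges, each solution classifies all training samples correctly, and so does any positive combination of them, including the average. The feature vectors (for example summary statistics of the eight pressure channels) and the labels are hypothetical.

```python
import random


def train_perceptron(X, y, order, epochs=100):
    """Plain perceptron with a bias term, visiting samples in the given order."""
    w = [0.0] * (len(X[0]) + 1)  # last weight multiplies a constant 1.0 (bias)
    for _ in range(epochs):
        mistakes = 0
        for i in order:
            xi = list(X[i]) + [1.0]
            if y[i] * sum(wj * xj for wj, xj in zip(w, xi)) <= 0:
                w = [wj + y[i] * xj for wj, xj in zip(w, xi)]
                mistakes += 1
        if mistakes == 0:  # a clean pass: w separates the training data
            break
    return w


def approx_bayes_point(X, y, n_samples=20, seed=0):
    """Average unit-normalised perceptron solutions from random data orderings."""
    rng = random.Random(seed)
    avg = [0.0] * (len(X[0]) + 1)
    for _ in range(n_samples):
        order = list(range(len(X)))
        rng.shuffle(order)
        w = train_perceptron(X, y, order)
        norm = sum(v * v for v in w) ** 0.5 or 1.0
        avg = [a + v / norm for a, v in zip(avg, w)]
    return [a / n_samples for a in avg]


def predict(w, x):
    """Return +1 ('after a usability bug') or -1 ('smooth'); labels are hypothetical."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score > 0 else -1


# Toy, linearly separable 2-D features.
X = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.9, 0.8], [0.8, 0.9], [1.0, 0.7]]
y = [-1, -1, -1, 1, 1, 1]
w = approx_bayes_point(X, y)
print([predict(w, x) for x in X])
```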
Using eye movements to determine referents in a spoken dialogue system
  Ellen Campana; Jason Baldridge; John Dowding; Beth Ann Hockey; Roger W. Remington; Leland S. Stone
Most computational spoken dialogue systems take a "literary" approach to reference resolution. With this type of approach, entities that are mentioned by a human interactor are unified with elements in the world state based on the same principles that guide the process during text interpretation. In human-to-human interaction, however, referring is a much more collaborative process. Participants often under-specify their referents, relying on their discourse partners for feedback if more information is needed to uniquely identify a particular referent. By monitoring eye-movements during this interaction, it is possible to improve the performance of a spoken dialogue system on referring expressions that are underspecified according to the literary model. This paper describes a system currently under development that employs such a strategy.
Note: 5 pages
An automatic sign recognition and translation system
  Jie Yang; Jiang Gao; Ying Zhang; Xilin Chen; Alex Waibel
A sign is something that suggests the presence of a fact, condition, or quality. Signs are everywhere in our lives. They make our lives easier when we are familiar with them, but sometimes they pose problems. For example, a tourist might not be able to understand signs in a foreign country. This paper discusses problems of automatic sign recognition and translation. We present a system capable of capturing images, detecting and recognizing signs, and translating them into a target language. We describe methods for automatic sign extraction and translation. We use a user-centered approach in system development, which draws on human intelligence when needed and leverages human capabilities. We are currently working on Chinese sign translation. We have developed a prototype system that can recognize Chinese signs captured by a video camera, a common gadget for tourists, and translate them into English text or a voice stream. Sign translation, in conjunction with spoken language translation, can help international tourists overcome language barriers. The technology can also help visually handicapped people increase their environmental awareness.
Note: 8 pages
Multimodal optimizations: can legacy systems defeat them?
  John Harper; Donal Sweeney
This paper describes several results obtained during the implementation and evaluation of a speech-complemented interface to a vehicle monitoring system. A speech-complemented interface is one in which the operations at the interface (keyboard and mouse, for instance) are complemented by operator speech that is not directly processed by the computer. From an interface perspective, such systems have 'low-brow' multimodal characteristics. Typical domains include vehicle tracking applications (taxis, buses, freight) where operators frequently use speech to confirm displayed vehicle properties with a driver.
Note: 8 pages
Using multimodal interaction to navigate in arbitrary virtual VRML worlds
  Frank Althoff; Gregor McGlaun; Björn Schuller; Peter Morguet; Manfred Lang
In this paper we present a multimodal interface for navigating in arbitrary virtual VRML worlds. Conventional haptic devices such as keyboard, mouse, joystick, and touchscreen can be freely combined with special virtual-reality hardware such as a spacemouse, data glove, and position tracker. As a key feature, the system additionally provides intuitive input through command and natural speech utterances as well as dynamic head and hand gestures. Communication between the interface components is based on the abstract formalism of a context-free grammar, allowing the representation of device-independent information. Taking into account the current system context, user interactions are combined in a semantic unification process and mapped onto a model of the viewer's functionality vocabulary. To integrate the continuous multimodal information stream, we use a straightforward rule-based approach and a new technique based on evolutionary algorithms. Our navigation interface has been evaluated extensively in usability studies, with excellent results.
Note: 8 pages

Paper session #2

Towards reliable multimodal sensing in aware environments
  Scott Stillman; Irfan Essa
A prototype system for implementing a reliable sensor network for large-scale smart environments is presented. Most applications in any form of smart environment (rooms, offices, homes, etc.) depend on reliable who, where, when, and what information about the environment's inhabitants (users). This information can be inferred from different sensors spread throughout the space. However, isolated sensing technologies provide limited information under the varying, dynamic, and long-term (24/7) scenarios that are inherent in applications for intelligent environments. In this paper, we present a prototype system that provides an infrastructure for leveraging the strengths of different sensors and of the processes used to interpret their collective data. We describe the needs of such systems, propose an architecture to handle this kind of multimodal fusion, and discuss the initial set of sensors and processes used to address those needs.
Note: 6 pages
Visually prototyping perceptual user interfaces through multimodal storyboarding
  Anoop K. Sinha; James A. Landay
We are applying our knowledge in designing informal prototyping tools for user interface design to create an interactive visual prototyping tool for perceptual user interfaces. Our tool allows a designer to quickly map out certain types of multimodal, cross-device user interface scenarios. These sketched designs form a multimodal storyboard that can then be executed, quickly testing the interaction and collecting feedback about refinements necessary for the design. By relying on visual prototyping, our multimodal storyboarding tool simplifies and speeds perceptual user interface prototyping and opens up the challenging space of perceptual user interface design to non-programmers.
Note: 4 pages
Naturally conveyed explanations of device behavior
  Michael Oltmans; Randall Davis
Designers routinely explain their designs to one another using sketches and verbal descriptions of behavior, both of which can be understood long before the device has been fully specified. But current design tools fail almost completely to support this sort of interaction, instead not only forcing designers to specify details of the design, but typically requiring that they do so by navigating a forest of menus and dialog boxes, rather than directly describing the behaviors with sketches and verbal explanations. We have created a prototype system, called ASSISTANCE, capable of interpreting multimodal explanations for simple 2-D kinematic devices. The program generates a model of the events and the causal relationships between events that have been described via hand-drawn sketches, sketched annotations, and verbal descriptions. Our goal is to make the designer's interaction with the computer more like interacting with another designer. This requires the ability not only to understand physical devices but also to understand the means by which the explanations of these devices are conveyed.
Note: 8 pages
Audio-video array source separation for perceptual user interfaces
  Kevin Wilson; Neal Checka; David Demirdjian; Trevor Darrell
Steerable microphone arrays provide a flexible infrastructure for audio source separation. In order for them to be used effectively in perceptual user interfaces, there must be a mechanism in place for steering the focus of the array to the sound source. Audio-only steering techniques often perform poorly in the presence of multiple sound sources or strong reverberation. Video-only techniques can achieve high spatial precision but require that the audio and video subsystems be accurately calibrated to preserve this precision. We present an audio-video localization technique that combines the benefits of the two modalities. We implement our technique in a test environment containing multiple stereo cameras and a room-sized microphone array. Our technique achieves an 8.9 dB improvement over a single far-field microphone and a 6.7 dB improvement over source separation based on video-only localization.
Note: 7 pages
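The Wilson et al. abstract steers a microphone array toward a source location obtained from video. A minimal delay-and-sum sketch, assuming the video subsystem already reports the talker's 3-D position in the same coordinate frame as the microphones (the calibration the abstract notes that video-based steering depends on): each channel is advanced by its extra propagation delay relative to the nearest microphone, and the aligned channels are averaged. Delays are rounded to whole samples, the speed of sound is fixed at 343 m/s, and all names are hypothetical; this is not the authors' beamformer.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for room-temperature air


def steering_delays(mic_positions, source_position, sample_rate):
    """Whole-sample arrival delay of each microphone relative to the nearest one."""
    dists = [math.dist(m, source_position) for m in mic_positions]
    nearest = min(dists)
    return [round((d - nearest) / SPEED_OF_SOUND * sample_rate) for d in dists]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay so the source aligns, then average.

    channels: equal-length sample lists, one per microphone. The output is
    shortened so that every output sample uses all channels."""
    n = len(channels[0]) - max(delays)
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(n)
    ]


# Hypothetical setup: two mics 0.343 m apart, talker position reported by video.
mics = [(0.0, 0.0, 0.0), (0.343, 0.0, 0.0)]
talker = (-1.0, 0.0, 0.0)
print(steering_delays(mics, talker, 16000))  # → [0, 16]
```

Here the far microphone hears the talker 1 ms later, i.e. 16 samples at 16 kHz. Separately, the two figures in the abstract imply that, on the same measure, video-only localization was itself about 8.9 - 6.7 = 2.2 dB better than the single far-field microphone.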

Paper session #3

Estimating focus of attention based on gaze and sound
  Rainer Stiefelhagen; Jie Yang; Alex Waibel
Estimating a person's focus of attention is useful for various human-computer interaction applications, such as smart meeting rooms, where a user's goals and intent have to be monitored. In the work presented here, we are interested in modeling focus of attention in a meeting situation. We have developed a system capable of estimating participants' focus of attention from multiple cues. We employ an omnidirectional camera to simultaneously track participants' faces around a meeting table and use neural networks to estimate their head poses. In addition, we use microphones to detect who is speaking. The system predicts participants' focus of attention from acoustic and visual information separately and then combines the outputs of the audio- and video-based focus-of-attention predictors. We have evaluated the system using data from three recorded meetings. Adding the acoustic information reduced the error by 8% on average compared with using a single modality.
Note: 9 pages
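The Stiefelhagen, Yang and Waibel abstract combines the outputs of an audio-based and a video-based focus-of-attention predictor but does not state the combination rule. A minimal sketch, assuming each predictor outputs a probability distribution over possible targets (here, other meeting participants) and that the two are merged by a weighted sum; the weight `alpha` and the participant names are hypothetical.

```python
def combine_focus(p_video, p_audio, alpha=0.5):
    """Weighted sum of two focus-of-attention distributions, renormalised.

    p_video, p_audio: dicts mapping a target (e.g. a participant's name) to a
    probability. alpha: weight given to the video-based predictor (hypothetical)."""
    targets = set(p_video) | set(p_audio)
    combined = {
        t: alpha * p_video.get(t, 0.0) + (1.0 - alpha) * p_audio.get(t, 0.0)
        for t in targets
    }
    total = sum(combined.values())
    return {t: v / total for t, v in combined.items()}


def predicted_focus(p_video, p_audio, alpha=0.5):
    """Most probable focus-of-attention target after combining both predictors."""
    combined = combine_focus(p_video, p_audio, alpha)
    return max(combined, key=combined.get)


# Hypothetical frame: head pose slightly favours B, but A is currently speaking.
p_video = {"A": 0.45, "B": 0.55}
p_audio = {"A": 0.8, "B": 0.2}
print(predicted_focus(p_video, p_audio))  # → A
```

With `alpha=1.0` the result falls back to the video-only prediction, which makes it easy to compare single-modality and combined error rates on the same data.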
A pneumatic tactile alerting system for the driving environment
  Mario Enriquez; Oleg Afonin; Brent Yager; Karon MacLean
Sensory-overloaded environments present an opportunity for innovative design in the area of human-machine interaction. In this paper we study the usefulness of a tactile display in the automobile environment. Our approach uses a simple pneumatic pump to produce pulsations of varying frequencies on the driver's hands through a car steering wheel fitted with inflatable pads. The goal of the project is to evaluate how effectively such a system alerts the driver to a possible problem when it is used to augment the visual displays presently used in automobiles. A steering wheel that provides haptic feedback using pneumatic pockets was developed to test our hypothesis; it can pulsate at different frequencies. The system was tested with several subjects in a simple multitasking paradigm, and their reaction times to different stimuli were measured and analyzed. In these experiments, we found that using a tactile feedback device lowers reaction time significantly and that modulating the vibration frequency provides extra information that can reduce the time needed to identify a problem.
Note: 7 pages
A robust algorithm for reading detection
  Christopher S. Campbell; Paul P. Maglio
As video cameras become cheaper and more pervasive, there is now increased opportunity for user interfaces to take advantage of user gaze data. Eye movements provide a powerful source of information that can be used to determine user intentions and interests. In this paper, we develop and test a method for recognizing when users are reading text based solely on eye-movement data. The experimental results show that our reading detection method is robust to noise, individual differences, and variations in text difficulty. Compared to a simple detection algorithm, our algorithm reliably, quickly, and accurately recognizes and tracks reading. Thus, we provide a means to capture normal user activity, enabling interfaces that incorporate more natural interactions of human and computer.
Note: 7 pages
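The Campbell and Maglio abstract gives no details of its reading detector. The sketch below shows one plausible, entirely hypothetical structure for such a detector: each saccade between consecutive fixations is classified by its horizontal and vertical displacement (a short rightward step, a return sweep to the next line, a short regression, or anything else), each class adds or subtracts evidence, and reading is reported while the accumulated evidence stays at or above a threshold. The pixel thresholds and scores are assumptions, not values from the paper.

```python
def classify_saccade(dx, dy):
    """Label one saccade from its displacement in screen pixels (thresholds hypothetical)."""
    if 0 < dx <= 120 and abs(dy) <= 20:
        return "forward"        # short left-to-right step along a line
    if dx < -200 and 0 < dy <= 60:
        return "return_sweep"   # jump back to the start of the next line
    if -120 <= dx < 0 and abs(dy) <= 20:
        return "regression"     # short backward re-read
    return "other"              # scanning, large vertical jumps, etc.


SCORES = {"forward": 1, "return_sweep": 5, "regression": -1, "other": -5}


def detect_reading(fixations, threshold=10):
    """Per-saccade booleans: True while reading is detected.

    fixations: list of (x, y) fixation centres in screen pixels."""
    evidence, states = 0, []
    for (x0, y0), (x1, y1) in zip(fixations, fixations[1:]):
        label = classify_saccade(x1 - x0, y1 - y0)
        evidence = max(0, evidence + SCORES[label])  # evidence never goes negative
        states.append(evidence >= threshold)
    return states


# Twelve short rightward saccades along one line of text.
line = [(100 + 80 * i, 300) for i in range(13)]
print(detect_reading(line)[-1])  # → True
```

Because a single non-reading saccade subtracts more than a forward step adds, this structure needs several consistent forward saccades before it reports reading and leaves the reading state after only a few non-reading saccades.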
A perceptual user interface for recognizing head gesture acknowledgements
  James W. Davis; Serge Vaks
We present the design and implementation of a perceptual user interface for a responsive dialog-box agent that employs real-time computer vision to recognize user acknowledgements from head gestures (e.g., nod = yes). IBM Pupil-Cam technology, together with anthropometric head and face measures, is used to first detect the location of the user's face. Salient facial features are then identified and tracked to compute the global 2-D motion direction of the head. For recognition, timings of natural gesture motion are incorporated into a state-space model. The interface is presented in the context of an enhanced text editor employing a perceptual dialog-box agent.
Note: 7 pages
Perception and haptics: towards more accessible computers for motion-impaired users
  Faustina Hwang; Simeon Keates; Patrick Langdon; P. John Clarkson; Peter Robinson
For people with motion impairments, access to and independent control of a computer can be essential. Symptoms such as tremor and spasm, however, can make the typical keyboard and mouse arrangement for computer interaction difficult or even impossible to use. This paper describes three approaches to improving computer input effectiveness for people with motion impairments. The three approaches are: (1) to increase the number of interaction channels, (2) to enhance commonly existing interaction channels, and (3) to make more effective use of all the available information in an existing input channel. Experiments in multimodal input, haptic feedback, user modelling, and cursor control are discussed in the context of the three approaches. A haptically enhanced keyboard emulator with perceptive capability is proposed, combining approaches in a way that improves computer access for motion-impaired users.
Note: 9 pages

Posters & demos

A real-time head nod and shake detector
  Ashish Kapoor; Rosalind W. Picard
Head nods and head shakes are nonverbal gestures often used to communicate intent and emotion and to perform conversational functions. We describe a vision-based system that detects head nods and head shakes in real time and can act as a useful and basic interface to a machine. We use an infrared-sensitive camera equipped with infrared LEDs to track pupils. The directions of head movements, determined using the position of the pupils, are used as observations by a discrete Hidden Markov Model (HMM) based pattern analyzer to detect when a head nod or shake occurs. The system is trained and tested on natural data from ten users gathered in the presence of varied lighting and varied facial expressions. The system as described achieves a real-time recognition accuracy of 78.46% on the test dataset.
Note: 5 pages
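The Kapoor and Picard abstract feeds sequences of head-movement directions to a discrete HMM pattern analyzer. A minimal sketch, assuming one two-state HMM per gesture (nod: alternating up/down; shake: alternating left/right) with hand-set probabilities instead of trained parameters: the scaled forward algorithm computes each model's log-likelihood for an observation sequence, and the more likely model wins. The symbol set, probabilities, and function names are hypothetical.

```python
import math

SYMBOLS = ("up", "down", "left", "right", "still")


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-output HMM."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numeric underflow
    return log_p


def alternating_model(a, b):
    """Two-state model whose states favour symbol a and symbol b in turn (hand-set values)."""
    def emissions(main, other):
        e = {sym: 0.05 for sym in SYMBOLS}
        e.update({main: 0.7, other: 0.1, "still": 0.1})
        return e
    start = [0.5, 0.5]
    trans = [[0.4, 0.6], [0.6, 0.4]]  # favour switching direction
    return start, trans, [emissions(a, b), emissions(b, a)]


MODELS = {"nod": alternating_model("up", "down"), "shake": alternating_model("left", "right")}


def classify_gesture(obs):
    """Name of the model with the highest log-likelihood for the observed directions."""
    return max(MODELS, key=lambda name: log_likelihood(obs, *MODELS[name]))


print(classify_gesture(["up", "down", "up", "down", "still"]))  # → nod
```

A practical detector would also need a way to reject sequences that are neither a nod nor a shake, for example a likelihood threshold.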
"Those look similar!" issues in automating gesture design advice BIBAFull-Text 26
  A. Chris Long; James A. Landay; Lawrence A. Rowe
Today, state-of-the-art user interfaces often include new interaction technologies, such as speech recognition, computer vision, or gesture recognition. Unfortunately, these technologies are difficult for most interface designers to incorporate into their interfaces, and traditional tools do not help designers with these technologies. One such technology is pen gestures, which are valuable as a powerful pen-based interaction technique, but are difficult to design well. We developed an interface design tool that uses unsolicited advice to help designers of pen-based user interfaces create pen gestures. Specifically, the tool warns designers when their gestures will be perceived to be similar and advises designers how to make their gestures less similar. We believe that the issues we encountered while designing an interface for advice and implementing this advice will reappear in design tools for other novel input technologies, such as hand and body gestures.
Note: 5 pages
Design issues for vision-based computer interaction systems
  Rick Kjeldsen; Jacob Hartman
Computer Vision and other direct sensing technologies have progressed to the point where we can detect many aspects of a user's activity reliably and in real time. Simply recognizing the activity is not enough, however. If perceptual interaction is going to become a part of the user interface, we must turn our attention to the tasks we wish to perform and methods to effectively perform them.
   This paper attempts to further our understanding of vision-based interaction by looking at the steps involved in building practical systems, giving examples from several existing systems. We classify the types of tasks well suited to this type of interaction as pointing, control or selection, and discuss interaction techniques for each class. We address the factors affecting the selection of the control action, and various types of control signals that can be extracted from visual input. We present our design for widgets to perform different types of tasks, and techniques, similar to those used with established user interface devices, to give the user the type of control they need to perform the task well. We look at ways to combine individual widgets into Visual Interfaces that allow the user to perform these tasks both concurrently and sequentially.
Note: 8 pages
Hand tracking for human-computer interaction with Graylevel VisualGlove: turning back to the simple way
  Giancarlo Iannizzotto; Massimo Villari; Lorenzo Vita
Recent developments in the manufacturing and marketing of low-power-consumption computers, small enough to be "worn" by users and remain almost invisible, have reintroduced the problem of overcoming the outdated paradigm of human-computer interaction based on a keyboard and a mouse. Approaches based on visual tracking seem to be the most promising, as they do not require any additional devices (gloves, etc.) and can be implemented with off-the-shelf devices such as webcams. Unfortunately, extremely variable lighting conditions and the high computational complexity of most available algorithms make these techniques hard to use in systems where CPU power consumption is a major issue (e.g. wearable computers) and in situations where lighting conditions are critical (outdoors, in the dark, etc.). This paper describes the work carried out at VisiLAB at the University of Messina as part of the VisualGlove Project to develop a real-time, vision-based device able to operate as a substitute for the mouse and other similar input devices. It is able to operate in a wide range of lighting conditions, using a low-cost webcam and running on an entry-level PC. Particular care has been taken to reduce computational complexity, in an attempt to reduce the resources the whole system needs in order to work.
Note: 7 pages
Robust finger tracking for wearable computer interfacing
  Sylvia M. Dominguez; Trish Keaton; Ali H. Sayed
Key to the design of human-machine gesture interface applications is the machine's ability to quickly and efficiently identify and track the hand movements of its user. In a wearable computer system equipped with head-mounted cameras, this task is extremely difficult because of the uncertain camera motion caused by the user's head movement, the user alternating between standing still and walking randomly, and the user's hand or pointing finger abruptly changing direction at variable speeds. This paper presents a tracking methodology based on a robust state-space estimation algorithm, which attempts to limit the influence of uncertain environmental conditions on the system's performance by adapting the tracking model to compensate for the uncertainties inherent in the data. Our system tracks a user's pointing gesture from a single head-mounted camera, allowing the user to encircle an object of interest and thereby coarsely segment it. The snapshot of the object is then passed to a recognition engine for identification and for retrieval of any pre-stored information about the object. A comparison of our robust tracker against a plain Kalman tracker showed a 15% improvement in the estimated position error and a faster response time.
Note: 5 pages
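The Dominguez, Keaton and Sayed abstract compares its robust tracker with a plain Kalman tracker. For reference, a minimal sketch of such a baseline: a constant-velocity Kalman filter for one image axis (run one instance per axis for 2-D fingertip tracking). The noise parameters `q` and `r` are hypothetical, and the paper's robust state-space estimator is not shown.

```python
class ConstantVelocityKalman1D:
    """Plain Kalman filter with state [position, velocity] for one image axis.

    q: process-noise intensity (white acceleration), r: measurement-noise
    variance. Both values are hypothetical tuning choices."""

    def __init__(self, x0, q=1.0, r=4.0, dt=1.0):
        self.x = [x0, 0.0]                  # state estimate
        self.P = [[r, 0.0], [0.0, 100.0]]   # covariance; velocity initially uncertain
        self.q, self.r, self.dt = q, r, dt

    def predict(self):
        dt, q = self.dt, self.q
        p, v = self.x
        self.x = [p + dt * v, v]
        (a, b), (c, d) = self.P
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]] and white-acceleration Q.
        self.P = [
            [a + dt * (b + c) + dt * dt * d + q * dt**3 / 3, b + dt * d + q * dt**2 / 2],
            [c + dt * d + q * dt**2 / 2, d + q * dt],
        ]
        return self.x[0]

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r              # innovation variance for H = [1, 0]
        k0, k1 = a / s, c / s       # Kalman gain
        innovation = z - self.x[0]
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]
        return self.x[0]


# Noiseless target moving 2 pixels per frame along one axis.
kf = ConstantVelocityKalman1D(0.0)
for t in range(1, 31):
    kf.predict()
    kf.update(2.0 * t)
print(kf.x)
```

A filter with fixed `q` and `r` assumes the constant-velocity model holds; abrupt head motion or sudden changes in finger direction violate that assumption, which is the situation the abstract's adaptive, robust estimator targets.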
Privacy protection by concealing persons in circumstantial video image
  Suriyon Tansuriyavong; Shin-ichi Hanaki
A circumstantial video image should convey sufficient information about the situation while protecting the privacy of specific persons in the scene. This paper proposes a system that automatically identifies a person by face recognition, tracks him or her, and displays the person's image in a modified form, such as a silhouette with or without a name, or only the name in characters (i.e. an invisible person). A subjective evaluation experiment was carried out to determine which modified video images people prefer, from both the observer's and the subject's viewpoint. The silhouette display with a name list appeared to be the most appropriate, balancing the protection of privacy against conveying situation information in the circumstantial video image.
Note: 4 pages
Bare-hand human-computer interaction
  Christian von Hardenberg; François Bérard
In this paper, we describe techniques for barehanded interaction between human and computer. Barehanded means that no device and no wires are attached to the user, who controls the computer directly with the movements of his/her hand.
   Our approach is centered on the needs of the user. We therefore define requirements for real-time barehanded interaction, derived from application scenarios and usability considerations. Based on those requirements a finger-finding and hand-posture recognition algorithm is developed and evaluated.
   To demonstrate the strength of the algorithm, we built three sample applications. Finger tracking and hand posture recognition are used to paint virtually onto the wall, to control a presentation with hand postures, and to move virtual items on the wall during a brainstorming session. The paper concludes with user tests conducted to demonstrate the usability of bare-hand human-computer interaction.
Note: 8 pages
User and social interfaces by observing human faces for intelligent wheelchairs BIBAFull-Text 32
  Yoshinori Kuno; Yoshifumi Murakami; Nobutaka Shimada
With the increase in the number of senior citizens, there is a growing demand for human-friendly wheelchairs as mobility aids, and several intelligent wheelchairs have been proposed recently. However, these consider friendliness only toward their users. Since wheelchairs move among people, they should also be friendly to the people around them; in other words, they should have a social-friendly interface as well as a user-friendly interface. We propose an intelligent wheelchair that is friendly to both its user and the people around it by observing the faces of both. The user can control the wheelchair by turning his/her face in the direction he/she would like to turn. The wheelchair also observes pedestrians' faces and changes its collision-avoidance method depending on whether or not each pedestrian has noticed it. Here we assume that a pedestrian has noticed the wheelchair if his/her face is often directed toward it.
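The "noticed if the face often faces toward the wheelchair" rule can be made concrete as a sliding-window vote. The sketch below is a hypothetical reading of that rule, not the authors' implementation: the window length, the yaw tolerance, the minimum fraction, and the two avoidance-mode names are all assumptions.

```python
from collections import deque


class NoticeEstimator:
    """Hypothetical 'has this pedestrian noticed the wheelchair?' vote.

    A frame counts as 'facing' when the angle between the pedestrian's face
    direction and the line to the wheelchair is within max_yaw_deg. The
    pedestrian is judged to have noticed the wheelchair when at least
    min_fraction of the last `window` frames were facing frames.
    """

    def __init__(self, window=30, max_yaw_deg=20.0, min_fraction=0.5):
        self.history = deque(maxlen=window)  # oldest frames drop out automatically
        self.max_yaw_deg = max_yaw_deg
        self.min_fraction = min_fraction

    def update(self, face_yaw_deg):
        # face_yaw_deg: assumed output of a face-orientation estimator
        self.history.append(abs(face_yaw_deg) <= self.max_yaw_deg)
        return self.noticed()

    def noticed(self):
        if not self.history:
            return False
        return sum(self.history) / len(self.history) >= self.min_fraction


def avoidance_mode(noticed):
    # Hypothetical policy labels; the abstract only says the method changes.
    return "pass_normally" if noticed else "give_wide_berth"
```

Using a fraction over a window, rather than a single frame, reflects the word "often" in the abstract and keeps one glance (or one misdetection) from flipping the avoidance behavior.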
Note: 4 pages
First steps towards automatic recognition of spontaneous facial action units BIBAFull-Text 33
  B. Braathen; M. S. Bartlett; G. Littlewort; J. R. Movellan
We present ongoing work on a project for automatic recognition of spontaneous facial actions (FACs). Current methods for automatic facial expression recognition assume that images are collected in controlled environments in which the subjects deliberately face the camera. Since people often nod or turn their heads, automatic recognition of spontaneous facial behavior requires methods for handling out-of-image-plane head rotations. There are many promising approaches to this problem. In this paper we explore an approach based on 3-D warping of images into canonical views. Since our goal is to explore the potential of this approach, we first worked with images containing 8 hand-labeled facial landmarks; however, the approach can be generalized in a straightforward manner to work automatically from the output of automatic feature detectors. A front-end system was developed that jointly estimates camera parameters, head geometry and 3-D head pose across entire sequences of video images. Head geometry and image parameters were assumed constant across images, while 3-D head pose was allowed to vary. First, a small set of images was used to estimate camera parameters and 3-D face geometry. Markov chain Monte Carlo methods were then used to recover the most likely sequence of 3-D poses given a sequence of video images. Once the 3-D pose was known, we warped each image into a frontal view with a canonical face geometry. We evaluate the performance of the approach as a front end for a spontaneous expression recognition task.
Note: 5 pages
A video joystick from a toy BIBAFull-Text 34
  Gary Bradski; Victor Eruhimov; Sergey Molinov; Valery Mosyagin; Vadim Pisarevsky
The paper describes an algorithm for 3D reconstruction of a toy composed of rigid, brightly colored blocks, using a conventional video camera. The blocks are segmented using histogram thresholds and merged into one connected component corresponding to the whole toy. We also present an algorithm for extracting the color structure and matching feature points across frames, and discuss robust structure from motion and the related recognition problem.
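The segmentation step (threshold from a histogram, then keep one connected component for the whole toy) can be sketched as follows. This is an illustrative stand-in, not the authors' algorithm: it assumes a single 8-bit saturation channel in which the bright blocks score high, picks the threshold with Otsu's method, and keeps the largest 4-connected foreground region. All function names are hypothetical.

```python
from collections import deque


def histogram_threshold(values, bins=256):
    """Otsu threshold on an 8-bit histogram (assumed choice of threshold rule)."""
    hist = [0] * bins
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(bins):
        w0 += hist[t]            # pixels at or below t (background class)
        sum0 += t * hist[t]
        w1 = total - w0          # pixels above t (foreground class)
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2  # between-class variance (unnormalized)
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def largest_component(mask):
    """Pixels of the largest 4-connected True region, found by BFS."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            comp, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    return best


def segment_toy(saturation):
    """saturation: 2-D list of 0-255 ints; touching blocks merge into one region."""
    t = histogram_threshold([v for row in saturation for v in row])
    mask = [[v > t for v in row] for row in saturation]
    return largest_component(mask)
```

Adjacent blocks of different colors still land in one component because they share the "above threshold" label, which mirrors the abstract's merge into a single toy region; small isolated bright spots are discarded.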
Note: 4 pages
WebContext: remote access to shared context BIBAFull-Text 35
  Robert G. Capra III; Manuel A. Pérez-Quiñones; Naren Ramakrishnan
In this paper, we describe a system and architecture for building and remotely accessing shared context between a user and a computer. The system is designed to allow a user to browse web pages on a personal computer and then remotely make queries about information seen on the web pages using a telephone-based voice user interface.
Note: 9 pages

Paper session #4

Recognizing movements from the ground reaction force BIBAFull-Text 36
  Robert Headon; Rupert Curwen
This paper presents a novel approach to movement recognition using the vertical component of a person's Ground Reaction Force (GRF). Typical primitive movements such as taking a step, jumping, drop-landing, sitting down, rising to stand and crouching are decomposed and recognized in terms of the GRF signal observed by a weight-sensitive floor. Previous work has focused on vision processing for movement recognition. This work provides a new sensor modality for a larger research effort, sentient computing, which is concerned with giving computers awareness of their environment and its inhabitants.
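One physical cue that makes the vertical GRF informative is that it falls to roughly zero while a person is airborne, so the flight phase of a jump shows up as a run of near-zero samples between take-off and landing peaks. The sketch below detects such runs; it is an illustration of that cue, not the authors' recognizer. The body-weight estimate (median sample), the 5% threshold, and the minimum run length are assumptions.

```python
from statistics import median


def flight_phases(grf, min_len=3, frac=0.05):
    """Return (start, end) index pairs where the vertical GRF is near zero.

    Assumptions: grf is a list of vertical force samples from a weight-sensitive
    floor, and the person stands on the floor for most of the recording, so the
    median sample approximates body weight.
    """
    body_weight = median(grf)
    airborne = [f < frac * body_weight for f in grf]
    phases, start = [], None
    for i, a in enumerate(airborne + [False]):  # sentinel closes a trailing run
        if a and start is None:
            start = i
        elif not a and start is not None:
            if i - start >= min_len:  # ignore brief dips (e.g. sensor noise)
                phases.append((start, i - 1))
            start = None
    return phases
```

A full recognizer would combine cues like this with the shape of the force curve around them (for example, the loading peak on landing) to separate jumps from drop-landings and the other listed movements.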
Note: 8 pages
The Infocockpit: providing location and place to aid human memory BIBAFull-Text 37
  Desney S. Tan; Jeanine K. Stefanucci; Dennis R. Proffitt; Randy Pausch
Our work focuses on building and evaluating computer system interfaces that make information memorable. Psychology research shows that people remember spatially distributed information based on its location relative to their body, as well as on the environment in which the information was learned. We apply these principles in the implementation of a multimodal prototype system, the Infocockpit (for "Information Cockpit"). The Infocockpit not only uses multiple monitors surrounding the user to engage human memory for location, but also provides ambient visual and auditory displays to engage human memory for place. We report a user study demonstrating a 56% increase in memory for information presented with our Infocockpit system as compared to a standard desktop system.
Note: 4 pages
Visual panel: virtual mouse, keyboard and 3D controller with an ordinary piece of paper BIBAFull-Text 38
  Zhengyou Zhang; Ying Wu; Ying Shan; Steven Shafer
This paper presents a vision-based interface system, VISUAL PANEL, which employs an arbitrary quadrangle-shaped panel (e.g., an ordinary piece of paper) and a tip pointer (e.g., a fingertip) as an intuitive, wireless and mobile input device. The system can accurately and reliably track the panel and the tip pointer. The panel tracking continuously determines the projective mapping between the panel at its current position and the display, which in turn maps the tip position to the corresponding position on the display. By detecting clicking and dragging actions, the system can fulfill many tasks, such as controlling a remote large display and simulating a physical keyboard. Users can naturally use their fingers or other tip pointers to issue commands and type text. Furthermore, by tracking the 3D position and orientation of the visual panel, the system can also provide 3D information, serving as a virtual joystick, to control 3D virtual objects.
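The projective mapping from panel to display is a 3x3 homography, which is fully determined by four point correspondences. A minimal sketch, assuming the four panel corners have already been tracked in image coordinates and listed in the same order as the display corners, solves the standard 8-unknown linear system (with the bottom-right entry fixed to 1) and then maps the tip. The function names are hypothetical, and the actual system may use a more robust estimator.

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def homography(src, dst):
    """3x3 homography H (h33 = 1) mapping 4 src points onto 4 dst points.

    Each pair gives u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1) and likewise
    for v, which rearranges into two linear equations in h0..h7.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def map_point(H, x, y):
    """Apply H to (x, y) in homogeneous coordinates, then divide by w."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

Per frame, one would recompute `H = homography(panel_corners, display_corners)` and call `map_point(H, tip_x, tip_y)` to get the cursor position; recomputing every frame is what lets the paper move freely. The four corners must not include three collinear points, or the system becomes singular.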
Note: 8 pages