HCI Bibliography : Search Results
Database updated: 2016-05-10 Searches since 2006-12-01: 32,876,218
director@hcibib.org
Hosted by ACM SIGCHI
The HCI Bibliography was moved to a new server on 2015-05-12 and again on 2016-01-05, substantially degrading the environment for making updates.
There are no plans to add to the database.
Please send questions or comments to director@hcibib.org.
Query: nakamura_s* | Results: 42 | Sorted by: Date
Records: 1 to 25 of 42
[1] An Enhanced Electrolarynx with Automatic Fundamental Frequency Control based on Statistical Prediction Demo Session / Tanaka, Kou / Toda, Tomoki / Neubig, Graham / Sakti, Sakriani / Nakamura, Satoshi Seventeenth International ACM SIGACCESS Conference on Computers and Accessibility 2015-10-26 p.435-436
ACM Digital Library Link
Summary: An electrolarynx is a type of speaking aid device which is able to mechanically generate excitation sounds to help laryngectomees produce electrolaryngeal (EL) speech. Although EL speech is quite intelligible, its naturalness suffers from monotonous fundamental frequency patterns of the mechanical excitation sounds. To make it possible to generate more natural excitation sounds, we have proposed a method to automatically control the fundamental frequency of the sounds generated by the electrolarynx based on a statistical prediction model, which predicts the fundamental frequency patterns from the produced EL speech in real-time. In this paper, we develop a prototype system by implementing the proposed control method in an actual, physical electrolarynx and evaluate its performance.

[2] A model-based approach to support smart and social home living Smarter homes and vehicles / Nakamura, Shoko / Shigaki, Saeko / Hiromori, Akihito / Yamaguchi, Hirozumi / Higashino, Teruo Proceedings of the 2015 International Conference on Ubiquitous Computing 2015-09-07 p.1101-1105
ACM Digital Library Link
Summary: A system to improve the quality of human life is developed and proposed. A model-based approach is used in which smart-home residents, appliances, energy sources, and the correlations among them are comprehensively modeled. The model was integrated with activity-recognition information, enabling the system to suggest smart-life tips that advise residents in a non-intrusive way. A crowd-sourced large-scale survey of 1,000 subjects was conducted, from which important tips for improving the quality of human life were quantified. On the basis of the survey results, quantitative metrics and strategies were designed for presenting suitable tips in a timely manner depending on each subject's lifestyle. The system was evaluated by (1) 34 actual subjects in virtual smart homes and (2) family members in an actual house, in an experiment lasting more than one month in which actual sensors were deployed.

[3] Proposal of an Instructional Design Support System Based on Consensus Among Academic Staff and Students Information and Interaction for Learning and Education / Nakamura, Shuya / Tomoto, Takahito / Akakura, Takako HIMI 2015: 17th International Conference on Human Interface and the Management of Information, Symposium on Human Interface, Part II: Information and Knowledge in Context 2015-08-02 v.2 p.370-377
Keywords: Instructional design; Co-creation; Service engineering
Link to Digital Content at Springer
Summary: In this paper, we propose an instructional design-based method for supporting academic staff and students in value co-creation within the university setting. The term co-creation adopted in this study comes from the field of service engineering and is defined as the mutual creation of value by service providers and service beneficiaries. Co-creation is realized by consensus among them. Within the university setting, the service providers are the academic staff and the beneficiaries are students. Here, we propose a model of co-creation in universities and then present a support method based on a syllabus and learning motivation for co-creation. Finally, we discuss the co-creation support system.

[4] Automated Social Skills Trainer Education / Crowdsourcing / Social / Tanaka, Hiroki / Sakti, Sakriani / Neubig, Graham / Toda, Tomoki / Negoro, Hideki / Iwasaka, Hidemi / Nakamura, Satoshi Proceedings of the 2015 International Conference on Intelligent User Interfaces 2015-03-29 v.1 p.17-27
ACM Digital Library Link
Summary: Social skills training is a well-established method for decreasing anxiety and discomfort in social interaction and for acquiring social skills. In this paper, we attempt to automate the process of social skills training by developing a dialogue system named "automated social skills trainer," which provides social skills training through human-computer interaction. The system includes a virtual avatar that recognizes user speech and language information and gives feedback to users to improve their social skills. Its design is based on conventional social skills training performed by human participants, including defining target skills, modeling, role-play, feedback, reinforcement, and homework. An experimental evaluation measuring the relationship between social skills and speech and language features shows that these features are related to autistic traits. Additional experiments measuring the effect of performing social skills training with the proposed application show that most participants improved their skills by using the system for 50 minutes.

[5] VRMixer: mixing video and real world with video segmentation Entertainment environment / Hirai, Tatsunori / Nakamura, Satoshi / Yumura, Tsubasa / Morishima, Shigeo Proceedings of the 2014 International Conference on Advances in Computer Entertainment Technology 2014-11-11 p.30
ACM Digital Library Link
Summary: This paper presents VRMixer, a system that mixes the real world with a video clip, letting a user enter the clip and virtually co-star with the people appearing in it. Our system constructs a simple virtual space by allocating video frames and the people appearing in the clip within the user's 3D space. By measuring the user's 3D depth in real time, the time-space of the video clip and the user's 3D space become mixed. VRMixer automatically extracts human images from a video clip using a video segmentation technique based on 3D graph-cut segmentation that employs face detection to detach the human area from the background. A virtual 3D space (i.e., 2.5D space) is constructed by positioning the background in the back and the people in the front. Using a depth camera, the user can stand in front of or behind the people in the video clip. Real objects that are closer than the distance of the clip's background become part of the constructed virtual 3D space. This synthesis creates a new image in which the user appears to be part of the video clip, or in which people in the clip appear to enter the real world. We aim to realize "video reality," i.e., a mixture of reality and video clips, using VRMixer.

[6] EDITED BOOK Natural Interaction with Robots, Knowbots and Smartphones: Putting Spoken Dialog Systems into Practice / Mariani, Joseph / Rosset, Sophie / Garnier-Rizet, Martine / Devillers, Laurence 2014 p.397 Springer New York
ISBN: 978-1-4614-8279-6 (print), 978-1-4614-8280-2 (online)
Link to Digital Content at Springer
== Spoken Dialog Systems in Everyday Applications ==
Spoken Language Understanding for Natural Interaction: The Siri Experience (3-14)
	+ Bellegarda, Jerome R.
Development of Speech-Based In-Car HMI Concepts for Information Exchange Internet Apps (15-28)
	+ Hofmann, Hansjörg
	+ Silberstein, Anna
	+ Ehrlich, Ute
	+ Berton, André
	+ Müller, Christian
	+ Mahr, Angela
Real Users and Real Dialog Systems: The Hard Challenge for SDS (29-36)
	+ Black, Alan W.
	+ Eskenazi, Maxine
A Multimodal Multi-device Discourse and Dialogue Infrastructure for Collaborative Decision-Making in Medicine (37-47)
	+ Sonntag, Daniel
	+ Schulz, Christian
== Spoken Dialog Prototypes and Products ==
Yochina: Mobile Multimedia and Multimodal Crosslingual Dialogue System (51-57)
	+ Xu, Feiyu
	+ Schmeier, Sven
	+ Ai, Renlong
	+ Uszkoreit, Hans
Walk This Way: Spatial Grounding for City Exploration (59-67)
	+ Boye, Johan
	+ Fredriksson, Morgan
	+ Götze, Jana
	+ Gustafson, Joakim
	+ Königsmann, Jürgen
Multimodal Dialogue System for Interaction in AmI Environment by Means of File-Based Services (69-77)
	+ Ábalos, Nieves
	+ Espejo, Gonzalo
	+ López-Cózar, Ramón
	+ Ballesteros, Francisco J.
	+ Soriano, Enrique
	+ Guardiola, Gorka
Development of a Toolkit Handling Multiple Speech-Oriented Guidance Agents for Mobile Applications (79-85)
	+ Hara, Sunao
	+ Kawanami, Hiromichi
	+ Saruwatari, Hiroshi
	+ Shikano, Kiyohiro
Providing Interactive and User-Adapted E-City Services by Means of Voice Portals (87-98)
	+ Griol, David
	+ García-Jiménez, María
	+ Callejas, Zoraida
	+ López-Cózar, Ramón
== Multi-domain, Crosslingual Spoken Dialog Systems ==
Efficient Language Model Construction for Spoken Dialog Systems by Inducting Language Resources of Different Languages (101-110)
	+ Misu, Teruhisa
	+ Matsuda, Shigeki
	+ Mizukami, Etsuo
	+ Kashioka, Hideki
	+ Li, Haizhou
Towards Online Planning for Dialogue Management with Rich Domain Knowledge (111-123)
	+ Lison, Pierre
A Two-Step Approach for Efficient Domain Selection in Multi-Domain Dialog Systems (125-131)
	+ Lee, Injae
	+ Kim, Seokhwan
	+ Kim, Kyungduk
	+ Lee, Donghyeon
	+ Choi, Junhwi
	+ Ryu, Seonghan
	+ Lee, Gary Geunbae
== Human-Robot Interaction ==
From Informative Cooperative Dialogues to Long-Term Social Relation with a Robot (135-151)
	+ Buendia, Axel
	+ Devillers, Laurence
Integration of Multiple Sound Source Localization Results for Speaker Identification in Multiparty Dialogue System (153-165)
	+ Nakashima, Taichi
	+ Komatani, Kazunori
	+ Sato, Satoshi
Investigating the Social Facilitation Effect in Human--Robot Interaction (167-177)
	+ Wechsung, Ina
	+ Ehrenbrink, Patrick
	+ Schleicher, Robert
	+ Möller, Sebastian
More Than Just Words: Building a Chatty Robot (179-185)
	+ Gilmartin, Emer
	+ Campbell, Nick
Predicting When People Will Speak to a Humanoid Robot (187-198)
	+ Sugiyama, Takaaki
	+ Komatani, Kazunori
	+ Sato, Satoshi
Designing an Emotion Detection System for a Socially Intelligent Human-Robot Interaction (199-211)
	+ Chastagnol, Clément
	+ Clavel, Céline
	+ Courgeon, Matthieu
	+ Devillers, Laurence
Multimodal Open-Domain Conversations with the Nao Robot (213-224)
	+ Jokinen, Kristiina
	+ Wilcock, Graham
Component Pluggable Dialogue Framework and Its Application to Social Robots (225-237)
	+ Jiang, Ridong
	+ Tan, Yeow Kee
	+ Limbu, Dilip Kumar
	+ Dung, Tran Anh
	+ Li, Haizhou
== Spoken Dialog Systems Components ==
Visual Contribution to Word Prominence Detection in a Playful Interaction Setting (241-247)
	+ Heckmann, Martin
Label Noise Robustness and Learning Speed in a Self-Learning Vocal User Interface (249-259)
	+ Ons, Bart
	+ Gemmeke, Jort F.
	+ Van hamme, Hugo
Topic Classification of Spoken Inquiries Using Transductive Support Vector Machine (261-267)
	+ Torres, Rafael
	+ Kawanami, Hiromichi
	+ Matsui, Tomoko
	+ Saruwatari, Hiroshi
	+ Shikano, Kiyohiro
Frame-Level Selective Decoding Using Native and Non-native Acoustic Models for Robust Speech Recognition to Native and Non-native Speech (269-274)
	+ Oh, Yoo Rhee
	+ Chung, Hoon
	+ Kang, Jeom-ja
	+ Lee, Yun Keun
Analysis of Speech Under Stress and Cognitive Load in USAR Operations (275-281)
	+ Charfuelan, Marcela
	+ Kruijff, Geert-Jan
== Dialog Management ==
Does Personality Matter? Expressive Generation for Dialogue Interaction (285-301)
	+ Walker, Marilyn A.
	+ Sawyer, Jennifer
	+ Lin, Grace
	+ Wing, Sam
Application and Evaluation of a Conditioned Hidden Markov Model for Estimating Interaction Quality of Spoken Dialogue Systems (303-312)
	+ Ultes, Stefan
	+ ElChab, Robert
	+ Minker, Wolfgang
FLoReS: A Forward Looking, Reward Seeking, Dialogue Manager (313-325)
	+ Morbini, Fabrizio
	+ DeVault, David
	+ Sagae, Kenji
	+ Gerten, Jillian
	+ Nazarian, Angela
	+ Traum, David
A Clustering Approach to Assess Real User Profiles in Spoken Dialogue Systems (327-334)
	+ Callejas, Zoraida
	+ Griol, David
	+ Engelbrecht, Klaus-Peter
	+ López-Cózar, Ramón
What Are They Achieving Through the Conversation? Modeling Guide--Tourist Dialogues by Extended Grounding Networks (335-341)
	+ Mizukami, Etsuo
	+ Kashioka, Hideki
Co-adaptation in Spoken Dialogue Systems (343-353)
	+ Chandramohan, Senthilkumar
	+ Geist, Matthieu
	+ Lefèvre, Fabrice
	+ Pietquin, Olivier
Developing Non-goal Dialog System Based on Examples of Drama Television (355-361)
	+ Nio, Lasguido
	+ Sakti, Sakriani
	+ Neubig, Graham
	+ Toda, Tomoki
	+ Adriani, Mirna
	+ Nakamura, Satoshi
A User Model for Dialog System Evaluation Based on Activation of Subgoals (363-374)
	+ Engelbrecht, Klaus-Peter
Real-Time Feedback System for Monitoring and Facilitating Discussions (375-387)
	+ Sarda, Sanat
	+ Constable, Martin
	+ Dauwels, Justin
	+ Dauwels (Okutsu), Shoko
	+ Elgendi, Mohamed
	+ Mengyu, Zhou
	+ Rasheed, Umer
	+ Tahir, Yasir
	+ Thalmann, Daniel
	+ Magnenat-Thalmann, Nadia
Evaluation of Invalid Input Discrimination Using Bag-of-Words for Speech-Oriented Guidance System (389-397)
	+ Majima, Haruka
	+ Torres, Rafael
	+ Kawanami, Hiromichi
	+ Hara, Sunao
	+ Matsui, Tomoko
	+ Saruwatari, Hiroshi
	+ Shikano, Kiyohiro

[7] Leveraging viewer comments for mood classification of music video clips Short papers 1 -- multimedia IR / Yamamoto, Takehiro / Nakamura, Satoshi Proceedings of the 2013 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2013-07-28 p.797-800
ACM Digital Library Link
Summary: This short paper proposes a method to classify music video clips uploaded to a video sharing service into music mood categories such as 'cheerful,' 'wistful,' and 'aggressive.' The method leverages viewer comments posted to the music video clips for the music mood classification. It extracts specific features from the comments: (1) adjectives in comments, (2) lengthened words in comments, and (3) comments in chorus sections. Our experimental results classifying 695 video clips into six mood categories showed that our method outperformed the baseline in terms of macro and micro averaged F-measures. In addition, our method outperformed the existing approaches that utilize lyrics and audio signals of songs.
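The three comment feature groups described above can be sketched as follows. This is an illustrative approximation, not the authors' code: the adjective lexicon, tokenization, and chorus interval are all assumptions (the paper presumably identifies adjectives with a proper morphological analyzer).

```python
import re
from collections import Counter

# Hypothetical adjective lexicon; a stand-in for real adjective detection.
ADJECTIVES = {"cheerful", "wistful", "aggressive", "happy", "sad"}

def is_lengthened(word):
    # A word counts as "lengthened" if any character repeats 3+ times ("yaaay").
    return re.search(r"(.)\1\1", word) is not None

def comment_features(comments, chorus_start, chorus_end):
    """comments: iterable of (text, playback_time_sec); returns a feature Counter."""
    feats = Counter()
    for text, t in comments:
        for w in re.findall(r"[a-z]+", text.lower()):
            if w in ADJECTIVES:
                feats["adj:" + w] += 1      # feature group (1): adjectives
            if is_lengthened(w):
                feats["lengthened"] += 1    # feature group (2): lengthened words
        if chorus_start <= t < chorus_end:
            feats["in_chorus"] += 1         # feature group (3): chorus comments
    return feats
```

The resulting counts would then feed a standard classifier over the six mood categories.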

[8] Design Considerations for Leveraging Over-familiar Items for Elderly Health Monitors Design and Evaluation of Smart and Intelligent Environments / Wang, Edward / Ipser, Samantha / Little, Patrick / Duncan, Noah / Liu, Benjamin / Nakamura, Shinsaku DAPI 2013: 1st International Conference on Distributed, Ambient, and Pervasive Interactions 2013-07-21 p.255-261
Keywords: User Interface; Health Monitoring; Gerontechnology
Link to Digital Content at Springer
Summary: Japan is facing the phenomenon of an aging population. Elderly individuals in Japan are becoming increasingly isolated, with no one to look after them as their health deteriorates. To counter this decline, the Japanese government has been introducing various devices to monitor the health of elderly individuals. However, existing products in Japan do not fully address customer needs because they focus solely on functionality. As a result, elderly individuals who do not depend on monitoring may find such systems too inconvenient. However, it is still important for elderly individuals in good health to be monitored to identify risks and prevent a decline in health. Therefore, health monitor designers must reduce the inconvenience that monitoring systems impose on elderly users.

[9] Facial design for humanoid robot Robot and VR / Kanaya, Ichiroh / Doi, Shoichi / Nakamura, Shohei / Kawasaki, Kazuo Proceedings of the 2012 Asia Pacific Conference on Computer Human Interaction 2012-08-28 p.141-148
ACM Digital Library Link
Summary: In this research, the authors succeeded in creating facial expressions with the minimum elements necessary for recognizing a face. The elements are two eyes and a mouth made from precise circles, which are transformed geometrically, through rotation and vertical scaling, to make facial expressions. The facial expression patterns made by these geometric elements and transformations were composed using three dimensions of visual information suggested by much previous research: slantedness of the mouth, openness of the face, and slantedness of the eyes. In addition, the relationships between the affective meanings of the visual information also corresponded to the results of previous research.
    The authors found that the facial expressions can be classified into 10 emotions: happy, angry, sad, disgust, fear, surprised, angry*, fear*, neutral (pleasant) indicating positive emotion, and neutral (unpleasant) indicating negative emotion. These emotions were portrayed by different geometric transformations. Furthermore, the authors discovered the "Tetrahedral model," which most clearly expresses the geometric relationships between facial expressions. In this model, each side connecting the faces is an axis that controls the rotational and vertical-scaling transformations of the eyes and mouth.

[10] Search intent estimation from user's eye movements for supporting information seeking User-system interaction / Umemoto, Kazutoshi / Yamamoto, Takehiro / Nakamura, Satoshi / Tanaka, Katsumi Proceedings of the 2012 International Conference on Advanced Visual Interfaces 2012-05-22 p.349-356
ACM Digital Library Link
Summary: In this paper, we propose a two-stage system that uses a user's eye movements to meet the increasing demand for obtaining information from the Web efficiently. In the first stage, the system estimates a user's search intent as a set of weighted terms extracted from the user's eye movements while browsing Web pages. In the second stage, the system shows relevant information to the user by using the estimated intent to re-rank search results, suggest intent-based queries, and emphasize relevant parts of Web pages. The system aims to help users efficiently obtain what they need by repeating these steps throughout the information-seeking process. We propose four search intent estimation methods (MLT, nMLT, DLT, and nDLT) that consider the relationships among intents, term frequencies, and eye movements. In an experiment designed to evaluate the accuracy of each method with a prototype system, we confirmed that the nMLT method works best. In addition, by analyzing the intent terms extracted for the eight subjects in the experiment, we found that the system could estimate each user's unique search intent even when they performed the same search tasks.

[11] Study of information clouding methods to prevent spoilers of sports match Interactive posters / Nakamura, Satoshi / Komatsu, Takanori Proceedings of the 2012 International Conference on Advanced Visual Interfaces 2012-05-22 p.661-664
ACM Digital Library Link
Summary: Seeing the final score of a sports match on the Web often spoils the pleasure of a user who is waiting to watch a recording of that match on TV. This paper proposes four information clouding methods to block spoiling information, and describes the implementation of a system using these methods as a browser extension. We then experimentally investigate the usefulness of the methods, taking into account their differences, differences in the variety of content, and differences in the user's interest in sports.

[12] Personal photo browser that can classify photos by participants and situations System paper demos / Onishi, Tomoya / Tokuami, Ryosuke / Kono, Yasuyuki / Nakamura, Satoshi Proceedings of the 2012 International Conference on Advanced Visual Interfaces 2012-05-22 p.798-799
ACM Digital Library Link
Summary: This paper demonstrates a photo browser that rearranges photos according to the persons who were close to the photographer when the photos were taken, by consulting Bluetooth device detection information. Most Bluetooth devices accompany their owners. Each photo is tagged with the Bluetooth device IDs detected around the moment it was taken. Using this tag information, the system classifies the user's photo archive into a layered cluster tree by tag similarity, and shows the user the photos of a selected cluster on either a map or a timeline.
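A minimal sketch of the tag-similarity grouping described above, under stated assumptions: Jaccard similarity over device-ID sets and a greedy single-level merge (the paper builds a layered cluster tree; the threshold and merge strategy here are illustrative, not the authors').

```python
def jaccard(a, b):
    # Jaccard similarity of two tag sets (0.0 when both are empty).
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_photos(photos, threshold=0.5):
    """photos: dict photo_id -> set of Bluetooth device IDs detected at capture.
    Greedily assigns each photo to the most similar existing cluster."""
    clusters = []  # each entry: [set of photo ids, union of their device-ID tags]
    for pid, tags in photos.items():
        best, best_sim = None, threshold
        for c in clusters:
            sim = jaccard(tags, c[1])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is not None:
            best[0].add(pid)
            best[1] |= tags
        else:
            clusters.append([{pid}, set(tags)])
    return [c[0] for c in clusters]
```

Photos taken among the same companions (same nearby devices) end up in the same cluster, while photos with disjoint device sets stay separate.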

[13] RerankEverything: a reranking interface for exploring search results Poster session: information retrieval / Yamamoto, Takehiro / Nakamura, Satoshi / Tanaka, Katsumi Proceedings of the 2011 ACM Conference on Information and Knowledge Management 2011-10-24 p.1913-1916
ACM Digital Library Link
Summary: This paper proposes a system called "RerankEverything", which enables users to rerank search results in any search service, such as a Web search engine, an e-commerce site, a hotel reservation site, and so on. This system helps users explore diverse search results. In conventional search services, interactions between users and systems are quite limited and complicated. By using RerankEverything, users can interactively explore search results in accordance with their interests by reranking them from various viewpoints. Experimental results show that our system can potentially help users search more proactively. When using our system, users were more likely to click search results that were initially ranked low. Users also browsed more diverse search results by reranking them after giving various types of feedback with our system.

[14] Extracting adjective facets from community Q&A corpus Poster session: information retrieval / Yamamoto, Takehiro / Nakamura, Satoshi / Tanaka, Katsumi Proceedings of the 2011 ACM Conference on Information and Knowledge Management 2011-10-24 p.2021-2024
ACM Digital Library Link
Summary: In this paper, we propose a method for helping users explore information via Web searches by using a question and answer (Q&A) corpus archived in a community Q&A site. When users do not have clear information needs and have little knowledge about the task domain, it is difficult for them to create queries that adequately reflect their information needs. We focused on terms like "famous temples," "historical townscapes," and "delicious sweets," which we call "adjective facets", and developed a method of extracting these facets from question and answer archives at a community Q&A site. We evaluated the effectiveness of our adjective facets by comparing them with several baselines.

[15] 3-D Sound Reproduction System for Immersive Environments Based on the Boundary Surface Control Principle Virtual and Immersive Environments / Enomoto, Seigo / Ikeda, Yusuke / Ise, Shiro / Nakamura, Satoshi VMR 2011: 4th International Conference on Virtual and Mixed Reality, Part I: New Trends 2011-07-09 v.1 p.174-184
Keywords: Boundary surface control principle; Immersive environments; Virtual reality; Stereophony; Surround sound
Link to Digital Content at Springer
Summary: We constructed a 3-D sound reproduction system containing a 62-channel loudspeaker array and 70-channel microphone array based on the boundary surface control principle (BoSC). The microphone array can record the volume of the 3-D sound field and the loudspeaker array can accurately recreate it in other locations. Using these systems, we realized immersive acoustic environments similar to cinema or television sound spaces. We also recorded real 3-D acoustic environments, such as an orchestra performance and forest sounds, by using the microphone array. Recreated sound fields were evaluated by demonstration experiments using the 3-D sound field. Subjective assessments of 390 subjects confirm that these systems can achieve high presence for 3-D sound reproduction and provide the listener with deep immersion.

[16] Providing Immersive Virtual Experience with First-Person Perspective Omnidirectional Movies and Three Dimensional Sound Field Virtual and Immersive Environments / Kondo, Kazuaki / Mukaigawa, Yasuhiro / Ikeda, Yusuke / Enomoto, Seigo / Ise, Shiro / Nakamura, Satoshi / Yagi, Yasushi VMR 2011: 4th International Conference on Virtual and Mixed Reality, Part I: New Trends 2011-07-09 v.1 p.204-213
Keywords: First-person Perspective; Omnidirectional Vision; Three Dimensional Sound Reproduction; Boundary Surface Control Principle
Link to Digital Content at Springer
Summary: Techniques for providing audiences with a highly immersive experience have advanced alongside developments in video and audio media. In our proposal, we record and reproduce omnidirectional movies captured from an actor's perspective, together with the three-dimensional sound field around the actor, to convey a more impressive experience. We propose a sequence of techniques to achieve this, including recording equipment, video and acoustic processing, and a presentation system. The effectiveness of, and demand for, our system have been demonstrated through evaluation experiments with ordinary people.

[17] Personalized Voice Assignment Techniques for Synchronized Scenario Speech Output in Entertainment Systems VR for Culture and Entertainment / Kawamoto, Shinichi / Yotsukura, Tatsuo / Nakamura, Satoshi / Morishima, Shigeo VMR 2011: 4th International Conference on Virtual and Mixed Reality, Part II: Systems and Applications 2011-07-09 v.2 p.177-186
Keywords: Instant casting movie system; post-recording; speaker similarity; voice morphing; synchronized speech output
Link to Digital Content at Springer
Summary: The paper describes voice assignment techniques for synchronized scenario speech output in an instant casting movie system that enables anyone to be a movie star using his or her own voice and face. Two prototype systems were implemented, and both systems worked well for various participants, ranging from children to the elderly.

[18] Instant Movie Casting with Personality: Dive into the Movie System VR for Culture and Entertainment / Morishima, Shigeo / Yagi, Yasushi / Nakamura, Satoshi VMR 2011: 4th International Conference on Virtual and Mixed Reality, Part II: Systems and Applications 2011-07-09 v.2 p.187-196
Keywords: Personality Modeling; Gait Motion; Entertainment; Face Capture
Link to Digital Content at Springer
Summary: "Dive into the Movie (DIM)" is a project that aims to realize an innovative entertainment system providing an immersive experience of a story: every audience member can participate in the story as a cast member, and share the impression with family or friends while watching the movie. To realize this system, we are trying to model and capture personal characteristics (face, body, gait, hair, and voice) instantly and precisely. All modeling, character synthesis, rendering, and compositing processes must be performed in real time without any manual operation. In this paper, a novel entertainment system, the Future Cast System (FCS), is introduced as a prototype of DIM. The first experimental trial demonstration of FCS was performed at the World Exposition 2005, where 1,630,000 people experienced the event over six months. Finally, an up-to-date DIM system that realizes a more realistic sensation is introduced.

[19] RerankEverything: a reranking interface for browsing search results WWW posters / Yamamoto, Takehiro / Nakamura, Satoshi / Tanaka, Katsumi Proceedings of the 2010 International Conference on the World Wide Web 2010-04-26 v.1 p.1209-1210
Keywords: reranking, search user interfaces, wrapper generation
ACM Digital Library Link
Summary: This paper proposes a system called RerankEverything, which enables users to rerank search results in any search service, such as a Web search engine, an e-commerce site, a hotel reservation site and so on. In conventional search services, interactions between users and services are quite limited and complicated. In addition, search functions and interactions to refine search results differ depending on the services. By using RerankEverything, users can interactively explore search results in accordance with their interests by reranking search results from various viewpoints.

[20] Normalization on the modulation spectrum of the subband temporal envelopes for automatic speech recognition in reverberant environments Search and interface / Lu, X. / Unoki, M. / Nakamura, S. Proceedings of the 3rd International Universal Communication Symposium 2009-12-03 p.247-254
Keywords: automatic speech recognition, dereverberation, subband temporal envelope, temporal modulation
ACM Digital Library Link
Summary: In this study, we proposed a feature extraction method based on subband temporal envelopes (STEs) and their normalization for reverberated speech recognition. The STEs were extracted using a series of constant-bandwidth band-pass filters with the Hilbert transform, followed by low-pass filtering. In the normalization, the modulation spectrum (MS) of the subband temporal envelopes of both clean and reverberated speech is normalized to a reference MS calculated from a clean speech data set. From the normalized subband MS, the inverse Fourier transform was used to restore the subband temporal envelopes. We tested the proposed method on speech recognition in a reverberant room with different speaker-to-microphone distances (SMDs). For comparison, recognition using traditional Mel-cepstral coefficients with mean and variance normalization served as the baseline. Experimental results showed that, averaging over SMDs from 50 cm to 400 cm, there was a 44.96% relative improvement from subband temporal envelope processing alone, and a further 15.68% relative improvement from normalization of the subband modulation spectrum. In total, there was about a 53.59% relative improvement, which was better than other temporal filtering and normalization methods.
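The envelope extraction and MS normalization steps above can be sketched numerically. This is a hedged toy version, not the authors' implementation: it computes one subband's temporal envelope via the analytic signal (Hilbert transform in the frequency domain) and forces its modulation spectrum magnitude to a reference while keeping the phase, restoring the envelope by inverse FFT.

```python
import numpy as np

def hilbert_envelope(x):
    """Temporal envelope of a (single-subband) signal via the analytic signal."""
    n = len(x)
    spec = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0   # double positive frequencies, zero negative ones
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spec * h))  # magnitude of the analytic signal

def normalize_modulation_spectrum(envelope, ref_magnitude, eps=1e-12):
    """Replace the envelope's modulation-spectrum magnitude with a reference,
    keep its phase, and restore the envelope by inverse FFT."""
    spec = np.fft.fft(envelope)
    scaled = spec * ref_magnitude / (np.abs(spec) + eps)
    return np.fft.ifft(scaled).real
```

In the paper the reference magnitude comes from clean speech and the procedure is applied per subband; band-pass filtering and the ASR front end are omitted here.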

[21] Evaluation for WFST-based dialog management Search and interface / Hori, Chiori / Ohtake, Kiyonori / Misu, Teruhisa / Kashioka, Hideki / Nakamura, Satoshi Proceedings of the 3rd International Universal Communication Symposium 2009-12-03 p.255-260
Keywords: WFST optimization operation, interchange format (IF), spoken dialog, statistical dialog management, weighted finite-state transducer (WFST)
ACM Digital Library Link
Summary: To construct an expandable and adaptable dialog system that handles multiple tasks, we propose a dialog system using a weighted finite-state transducer (WFST), in which user concept tags and system action tags are the input and output of the transducer, respectively. To test the potential of the WFST-based dialog management (DM) platform with statistical DM models, we construct a dialog system from a human-to-human spoken dialog corpus for hotel reservation, annotated with the Interchange Format (IF). A scenario WFST, a Spoken Language Understanding (SLU) WFST, and a Sentence Generation (SG) WFST are obtained from the corpus, then composed and optimized to generate a Dialog Management (DM) WFST. We evaluate the detection accuracy of the system's next actions using Mean Reciprocal Rank (MRR). We evaluated how WFST optimization operations contribute to dialog systems and confirmed that optimization enhances the accuracy of next-action detection.

[22] Dialogue act annotation for consulting dialogue corpus Poster session / Ohtake, Kiyonori / Misu, Teruhisa / Hori, Chiori / Kashioka, Hideki / Nakamura, Satoshi Proceedings of the 3rd International Universal Communication Symposium 2009-12-03 p.372-378
ACM Digital Library Link
Summary: This paper introduces a new corpus of consulting dialogues, designed for training a dialogue manager that can handle consulting dialogues through spontaneous interactions from the tagged dialogue corpus. We have collected 130 hours of consulting dialogues in the tourist guidance domain. This paper outlines our taxonomy of dialogue act annotation, which describes two aspects of an utterance: its communicative function (speech act) and its semantic content. We provide an overview of the Kyoto tour guide dialogue corpus and a preliminary analysis using the dialogue act tags.

[23] Hyperbolic structure of fundamental frequency contour Poster session / Ni, Jinfu / Sakai, Shinsuke / Kawai, Hisashi / Nakamura, Satoshi Proceedings of the 3rd International Universal Communication Symposium 2009-12-03 p.389-394
Keywords: F0 control, intonation, speech prosody, speech synthesis
ACM Digital Library Link
Summary: In this paper, we propose an approach to the transformation of fundamental frequency (F0) contours for conversational speech synthesis. The curve of F0 as a function of the period of the sound-wave cycles is one branch of a rectangular hyperbola. Based on a few symmetry assumptions on this hyperbolic property, we derive a generalized hyperbolic structure that allows flexible manipulation of F0 contours. The model provides an equivalent expression of the resonance mechanism capable of dealing with the interaction of tone and intonation. It is also language-independent, because no language-dependent hypothesis is necessary. This paper describes two applications of the hyperbolic structures of F0 contours to prosodic information processing. One modulates the baseline F0 contours when fusing additional makeup information onto them without altering the underlying linguistic information. The other separates local rise/fall F0 movements and a global scale component from observed F0 contours, both of which are useful for estimating dynamic F0 variation. Our experimental results are very positive.
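The hyperbolic relation the abstract starts from is the elementary one between frequency and period: F0 = 1/T, i.e. every (T, F0) point lies on the rectangular hyperbola F0 · T = 1. A minimal numeric sketch (illustrative only; the authors' generalized structure and its symmetry assumptions go well beyond this):

```python
# Fundamental frequency is the reciprocal of the glottal cycle period:
# F0 * T = 1, one branch of a rectangular hyperbola (T in seconds, F0 in Hz).
periods = [0.004, 0.005, 0.008, 0.010]       # cycle periods (s)
f0 = [1.0 / t for t in periods]              # ~250, 200, 125, 100 Hz

# Every point lies on the hyperbola F0 * T = 1:
assert all(abs(f * t - 1.0) < 1e-9 for f, t in zip(f0, periods))

# Scaling all periods by a factor k divides every F0 value by k --
# one simple global manipulation of a contour along the hyperbola:
k = 1.25
shifted = [1.0 / (k * t) for t in periods]   # ~200, 160, 100, 80 Hz
print(shifted)
```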

[24] Spoken document retrieval using topic models Poster session / Hu, Xinhui / Isotani, Ryosuke / Nakamura, Satoshi Proceedings of the 3rd International Universal Communication Symposium 2009-12-03 p.400-403
Keywords: NMF, document topic model, spoken document retrieval
ACM Digital Library Link
Summary: In this paper, we propose a document topic model (DTM) based on the non-negative matrix factorization (NMF) approach to explore spontaneous spoken document retrieval. The model uses latent semantic indexing to detect underlying semantic relationships within documents. Each document is interpreted as a generative topic model belonging to many topics. The relevance of a document to a query is expressed by the probability of the query being generated by the model. The term-document matrix used for NMF is built stochastically from the speech recognition N-best results, so that multiple recognition hypotheses can be utilized to compensate for word recognition errors. Using this approach, experiments are conducted on a test collection from the Corpus of Spontaneous Japanese (CSJ), with 39 queries over 600 hours of spontaneous Japanese speech. The retrieval performance of this model proves superior to the conventional vector space model (VSM) when the dimension or topic number exceeds a certain threshold. Moreover, in terms of both retrieval performance and topic expressiveness, the NMF-based topic model surpasses another latent indexing method based on singular value decomposition (SVD). The extent to which this topic model can resist speech recognition errors, a problem specific to spoken document retrieval, is also investigated.
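The core machinery — factoring a term-document matrix V into non-negative term-topic and topic-document factors, then scoring documents against a query in topic space — can be sketched with standard multiplicative updates. This is an illustrative toy, not the paper's system: the matrix below is a tiny hand-made count matrix rather than one built from ASR N-best hypotheses, and the scoring is a plain dot product rather than the paper's probabilistic relevance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy term-document count matrix V (terms x documents).
V = np.array([[3, 0, 1, 0],
              [2, 0, 0, 1],
              [0, 4, 0, 2],
              [0, 3, 1, 3]], dtype=float)

def nmf(V, k, iters=500, eps=1e-9):
    """Factor V ~ W @ H with non-negative multiplicative updates."""
    n_terms, n_docs = V.shape
    W = rng.random((n_terms, k)) + eps   # term-topic weights
    H = rng.random((k, n_docs)) + eps    # topic-document weights
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

W, H = nmf(V, k=2)

def score(query_terms, W, H):
    """Rank documents by how strongly they express the query's terms
    through the latent topics (query -> topic -> document)."""
    q = np.zeros(W.shape[0])
    q[query_terms] = 1.0
    return (q @ W) @ H

s = score([0, 1], W, H)   # a query containing terms 0 and 1
print("best document:", int(np.argmax(s)))
```

Document 0 contains almost all the mass of terms 0 and 1 in this toy matrix, so it comes out on top.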

[25] Soft margin estimation on improving environment structures for ensemble speaker and speaking environment modeling Poster session / Tsao, Yu / Li, Jinyu / Lee, Chin-Hui / Nakamura, Satoshi Proceedings of the 3rd International Universal Communication Symposium 2009-12-03 p.404-408
Keywords: ASR, ESSEM, SME, model adaptation, noise robustness
ACM Digital Library Link
Summary: Recently, we proposed an ensemble speaker and speaking environment modeling (ESSEM) approach to enhance the robustness of automatic speech recognition (ASR) under adverse conditions. The ESSEM framework comprises two phases: offline and online. In the offline phase, we prepare an environment structure formed by multiple sets of hidden Markov models (HMMs), each representing a particular speaker and speaking environment. In the online phase, ESSEM estimates a mapping function to transform the prepared environment structure into a set of HMMs for the unknown testing condition. In this study, we incorporate soft margin estimation (SME) to increase the discriminative power of the environment structure in the offline phase and thereby enhance the overall ESSEM performance. We evaluated the performance on the Aurora-2 connected digit database. With the SME-refined environment structure, ESSEM provides better performance than the original framework. Using our best online mapping function, ESSEM achieves a word error rate (WER) of 4.62%, corresponding to a 14.60% relative WER reduction (from 5.41% to 4.62%) over the best baseline performance of 5.41% WER.
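The reported relative reduction follows directly from the two WER figures in the abstract; a one-line arithmetic check:

```python
# Verifying the reported relative WER reduction from the abstract's numbers.
baseline_wer = 5.41   # % WER, best baseline
essem_wer = 4.62      # % WER, SME-refined ESSEM with the best mapping function

relative_reduction = 100 * (baseline_wer - essem_wer) / baseline_wer
print(f"{relative_reduction:.2f}% relative WER reduction")
```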