[1]
An Enhanced Electrolarynx with Automatic Fundamental Frequency Control based
on Statistical Prediction
Demo Session
/
Tanaka, Kou
/
Toda, Tomoki
/
Neubig, Graham
/
Sakti, Sakriani
/
Nakamura, Satoshi
Seventeenth International ACM SIGACCESS Conference on Computers and
Accessibility
2015-10-26
p.435-436
© Copyright 2015 ACM
Summary: An electrolarynx is a type of speaking aid device which is able to
mechanically generate excitation sounds to help laryngectomees produce
electrolaryngeal (EL) speech. Although EL speech is quite intelligible, its
naturalness suffers from monotonous fundamental frequency patterns of the
mechanical excitation sounds. To make it possible to generate more natural
excitation sounds, we have proposed a method to automatically control the
fundamental frequency of the sounds generated by the electrolarynx based on a
statistical prediction model, which predicts the fundamental frequency patterns
from the produced EL speech in real-time. In this paper, we develop a prototype
system by implementing the proposed control method in an actual, physical
electrolarynx and evaluate its performance.
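The abstract does not detail the statistical prediction model beyond "statistical prediction," so the sketch below is a minimal stand-in: a frame-wise ridge-regression map from spectral features to log-F0, predicting F0 in Hz. All names, the linear-model choice, and the ridge constant are illustrative, not the authors' method.

```python
import numpy as np

def train_f0_predictor(spectral_frames, log_f0, ridge=1e-3):
    """Fit a least-squares linear map from per-frame spectral features to
    log-F0 (an illustrative stand-in for the paper's statistical model)."""
    X = np.hstack([spectral_frames, np.ones((len(spectral_frames), 1))])  # add bias column
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ log_f0)
    return W

def predict_f0(spectral_frames, W):
    """Predict F0 in Hz for new frames with the trained map."""
    X = np.hstack([spectral_frames, np.ones((len(spectral_frames), 1))])
    return np.exp(X @ W)  # back from log-F0 to Hz
```

In a real-time setting such as the one the paper targets, the same map would be applied frame by frame as EL speech is produced.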
[2]
A model-based approach to support smart and social home living
Smarter homes and vehicles
/
Nakamura, Shoko
/
Shigaki, Saeko
/
Hiromori, Akihito
/
Yamaguchi, Hirozumi
/
Higashino, Teruo
Proceedings of the 2015 International Conference on Ubiquitous Computing
2015-09-07
p.1101-1105
© Copyright 2015 ACM
Summary: A system to improve residents' quality of life is proposed and developed. A
model-based approach is used in which smart-home residents, appliances, energy
sources, and the correlations among them are comprehensively modeled. The model was
integrated with activity recognition information that enables the system to
suggest smart life tips that provide advice to residents in a non-intrusive
way. A crowd-sourced large-scale survey of 1,000 subjects was conducted that
enabled important tips for improving the quality of human life to be
quantified. On the basis of the survey results, quantitative metrics and
strategies were designed for presenting suitable tips in a timely manner
depending on the lifestyle of subjects. The system was evaluated by (1) 34
actual subjects in virtual smart homes and (2) family members in an actual
house in an experiment lasting more than one month in which actual sensors were
deployed.
[3]
Proposal of an Instructional Design Support System Based on Consensus Among
Academic Staff and Students
Information and Interaction for Learning and Education
/
Nakamura, Shuya
/
Tomoto, Takahito
/
Akakura, Takako
HIMI 2015: 17th International Conference on Human Interface and the
Management of Information, Symposium on Human Interface, Part II: Information
and Knowledge in Context
2015-08-02
v.2
p.370-377
Keywords: Instructional design; Co-creation; Service engineering
© Copyright 2015 Springer International Publishing Switzerland
Summary: In this paper, we propose an instructional design-based method for
supporting academic staff and students in value co-creation within the
university setting. The term co-creation adopted in this study comes from the
field of service engineering and is defined as the mutual creation of value by
service providers and service beneficiaries. Co-creation is realized by
consensus among them. Within the university setting, the service providers are
the academic staff and the beneficiaries are students. Here, we propose a model
of co-creation in universities and then present a support method based on a
syllabus and learning motivation for co-creation. Finally, we discuss the
co-creation support system.
[4]
Automated Social Skills Trainer
Education / Crowdsourcing / Social
/
Tanaka, Hiroki
/
Sakti, Sakriani
/
Neubig, Graham
/
Toda, Tomoki
/
Negoro, Hideki
/
Iwasaka, Hidemi
/
Nakamura, Satoshi
Proceedings of the 2015 International Conference on Intelligent User
Interfaces
2015-03-29
v.1
p.17-27
© Copyright 2015 ACM
Summary: Social skills training is a well-established method for decreasing
anxiety and discomfort in social interaction and for acquiring social skills. In
this paper, we attempt to automate the process of social skills training by
developing a dialogue system named "automated social skills trainer," which
provides social skills training through human-computer interaction. The system
includes a virtual avatar that recognizes user speech and language information
and gives feedback to users to improve their social skills. Its design is based
on conventional social skills training performed by human participants,
including defining target skills, modeling, role-play, feedback, reinforcement,
and homework. An experimental evaluation measuring the relationship between
social skill and speech and language features shows that these features have a
relationship with autistic traits. Additional experiments measuring the effect
of performing social skills training with the proposed application show that
most participants improve their skill by using the system for 50 minutes.
[5]
VRMixer: mixing video and real world with video segmentation
Entertainment environment
/
Hirai, Tatsunori
/
Nakamura, Satoshi
/
Yumura, Tsubasa
/
Morishima, Shigeo
Proceedings of the 2014 International Conference on Advances in Computer
Entertainment Technology
2014-11-11
p.30
© Copyright 2014 Authors
Summary: This paper presents VRMixer, a system that mixes the real world and a video
clip, letting a user enter the clip and play a virtual co-starring role with the
people appearing in it. Our system constructs a simple virtual space by
allocating video frames and the people appearing in the clip within the user's
3D space. By measuring the user's 3D depth in real time, the time space of the
video clip and the user's 3D space become mixed. VRMixer automatically extracts
human images from a video clip by using a video segmentation technique based on
3D graph cut segmentation that employs face detection to detach the human area
from the background. A virtual 3D space (i.e., 2.5D space) is constructed by
positioning the background in the back and the people in the front. In the
video clip, the user can stand in front of or behind the people by using a
depth camera. Real objects that are closer than the distance of the clip's
background will become part of the constructed virtual 3D space. This synthesis
creates a new image in which the user appears to be a part of the video clip,
or in which people in the clip appear to enter the real world. We aim to
realize "video reality," i.e., a mixture of reality and video clips using
VRMixer.
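The per-pixel layering VRMixer describes, with the user standing in front of or behind the extracted people depending on measured depth, can be sketched as a nearest-layer compositing rule. The layer depths, array layout, and function name below are assumptions for illustration, not values from the paper.

```python
import numpy as np

def composite_25d(user_rgb, user_depth, person_rgb, person_mask,
                  background_rgb, person_depth=2.0, background_depth=4.0):
    """Nearest-layer compositing of a 2.5D scene: clip background at the back,
    extracted people in front, and the live user inserted by measured depth."""
    out = background_rgb.copy()
    # people extracted from the clip sit in front of the clip background
    out[person_mask] = person_rgb[person_mask]
    # per-pixel depth of whichever clip layer is visible there
    layer_depth = np.where(person_mask, person_depth, background_depth)
    # the live user occludes any clip layer farther than their measured depth
    user_in_front = user_depth < layer_depth
    out[user_in_front] = user_rgb[user_in_front]
    return out
```

Real objects closer than the clip's background depth enter the mixed space automatically under this rule, matching the behavior the abstract describes.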
[6]
EDITED BOOK
Natural Interaction with Robots, Knowbots and Smartphones: Putting Spoken
Dialog Systems into Practice
/
Mariani, Joseph
/
Rosset, Sophie
/
Garnier-Rizet, Martine
/
Devillers, Laurence
2014
p.397
Springer New York
== Spoken Dialog Systems in Everyday Applications ==
Spoken Language Understanding for Natural Interaction: The Siri Experience (3-14)
+ Bellegarda, Jerome R.
Development of Speech-Based In-Car HMI Concepts for Information Exchange Internet Apps (15-28)
+ Hofmann, Hansjörg
+ Silberstein, Anna
+ Ehrlich, Ute
+ Berton, André
+ Müller, Christian
+ Mahr, Angela
Real Users and Real Dialog Systems: The Hard Challenge for SDS (29-36)
+ Black, Alan W.
+ Eskenazi, Maxine
A Multimodal Multi-device Discourse and Dialogue Infrastructure for Collaborative Decision-Making in Medicine (37-47)
+ Sonntag, Daniel
+ Schulz, Christian
== Spoken Dialog Prototypes and Products ==
Yochina: Mobile Multimedia and Multimodal Crosslingual Dialogue System (51-57)
+ Xu, Feiyu
+ Schmeier, Sven
+ Ai, Renlong
+ Uszkoreit, Hans
Walk This Way: Spatial Grounding for City Exploration (59-67)
+ Boye, Johan
+ Fredriksson, Morgan
+ Götze, Jana
+ Gustafson, Joakim
+ Königsmann, Jürgen
Multimodal Dialogue System for Interaction in AmI Environment by Means of File-Based Services (69-77)
+ Ábalos, Nieves
+ Espejo, Gonzalo
+ López-Cózar, Ramón
+ Ballesteros, Francisco J.
+ Soriano, Enrique
+ Guardiola, Gorka
Development of a Toolkit Handling Multiple Speech-Oriented Guidance Agents for Mobile Applications (79-85)
+ Hara, Sunao
+ Kawanami, Hiromichi
+ Saruwatari, Hiroshi
+ Shikano, Kiyohiro
Providing Interactive and User-Adapted E-City Services by Means of Voice Portals (87-98)
+ Griol, David
+ García-Jiménez, María
+ Callejas, Zoraida
+ López-Cózar, Ramón
== Multi-domain, Crosslingual Spoken Dialog Systems ==
Efficient Language Model Construction for Spoken Dialog Systems by Inducting Language Resources of Different Languages (101-110)
+ Misu, Teruhisa
+ Matsuda, Shigeki
+ Mizukami, Etsuo
+ Kashioka, Hideki
+ Li, Haizhou
Towards Online Planning for Dialogue Management with Rich Domain Knowledge (111-123)
+ Lison, Pierre
A Two-Step Approach for Efficient Domain Selection in Multi-Domain Dialog Systems (125-131)
+ Lee, Injae
+ Kim, Seokhwan
+ Kim, Kyungduk
+ Lee, Donghyeon
+ Choi, Junhwi
+ Ryu, Seonghan
+ Lee, Gary Geunbae
== Human-Robot Interaction ==
From Informative Cooperative Dialogues to Long-Term Social Relation with a Robot (135-151)
+ Buendia, Axel
+ Devillers, Laurence
Integration of Multiple Sound Source Localization Results for Speaker Identification in Multiparty Dialogue System (153-165)
+ Nakashima, Taichi
+ Komatani, Kazunori
+ Sato, Satoshi
Investigating the Social Facilitation Effect in Human--Robot Interaction (167-177)
+ Wechsung, Ina
+ Ehrenbrink, Patrick
+ Schleicher, Robert
+ Möller, Sebastian
More Than Just Words: Building a Chatty Robot (179-185)
+ Gilmartin, Emer
+ Campbell, Nick
Predicting When People Will Speak to a Humanoid Robot (187-198)
+ Sugiyama, Takaaki
+ Komatani, Kazunori
+ Sato, Satoshi
Designing an Emotion Detection System for a Socially Intelligent Human-Robot Interaction (199-211)
+ Chastagnol, Clément
+ Clavel, Céline
+ Courgeon, Matthieu
+ Devillers, Laurence
Multimodal Open-Domain Conversations with the Nao Robot (213-224)
+ Jokinen, Kristiina
+ Wilcock, Graham
Component Pluggable Dialogue Framework and Its Application to Social Robots (225-237)
+ Jiang, Ridong
+ Tan, Yeow Kee
+ Limbu, Dilip Kumar
+ Dung, Tran Anh
+ Li, Haizhou
== Spoken Dialog Systems Components ==
Visual Contribution to Word Prominence Detection in a Playful Interaction Setting (241-247)
+ Heckmann, Martin
Label Noise Robustness and Learning Speed in a Self-Learning Vocal User Interface (249-259)
+ Ons, Bart
+ Gemmeke, Jort F.
+ Van hamme, Hugo
Topic Classification of Spoken Inquiries Using Transductive Support Vector Machine (261-267)
+ Torres, Rafael
+ Kawanami, Hiromichi
+ Matsui, Tomoko
+ Saruwatari, Hiroshi
+ Shikano, Kiyohiro
Frame-Level Selective Decoding Using Native and Non-native Acoustic Models for Robust Speech Recognition to Native and Non-native Speech (269-274)
+ Oh, Yoo Rhee
+ Chung, Hoon
+ Kang, Jeom-ja
+ Lee, Yun Keun
Analysis of Speech Under Stress and Cognitive Load in USAR Operations (275-281)
+ Charfuelan, Marcela
+ Kruijff, Geert-Jan
== Dialog Management ==
Does Personality Matter? Expressive Generation for Dialogue Interaction (285-301)
+ Walker, Marilyn A.
+ Sawyer, Jennifer
+ Lin, Grace
+ Wing, Sam
Application and Evaluation of a Conditioned Hidden Markov Model for Estimating Interaction Quality of Spoken Dialogue Systems (303-312)
+ Ultes, Stefan
+ ElChab, Robert
+ Minker, Wolfgang
FLoReS: A Forward Looking, Reward Seeking, Dialogue Manager (313-325)
+ Morbini, Fabrizio
+ DeVault, David
+ Sagae, Kenji
+ Gerten, Jillian
+ Nazarian, Angela
+ Traum, David
A Clustering Approach to Assess Real User Profiles in Spoken Dialogue Systems (327-334)
+ Callejas, Zoraida
+ Griol, David
+ Engelbrecht, Klaus-Peter
+ López-Cózar, Ramón
What Are They Achieving Through the Conversation? Modeling Guide--Tourist Dialogues by Extended Grounding Networks (335-341)
+ Mizukami, Etsuo
+ Kashioka, Hideki
Co-adaptation in Spoken Dialogue Systems (343-353)
+ Chandramohan, Senthilkumar
+ Geist, Matthieu
+ Lefèvre, Fabrice
+ Pietquin, Olivier
Developing Non-goal Dialog System Based on Examples of Drama Television (355-361)
+ Nio, Lasguido
+ Sakti, Sakriani
+ Neubig, Graham
+ Toda, Tomoki
+ Adriani, Mirna
+ Nakamura, Satoshi
A User Model for Dialog System Evaluation Based on Activation of Subgoals (363-374)
+ Engelbrecht, Klaus-Peter
Real-Time Feedback System for Monitoring and Facilitating Discussions (375-387)
+ Sarda, Sanat
+ Constable, Martin
+ Dauwels, Justin
+ Dauwels (Okutsu), Shoko

+ Elgendi, Mohamed
+ Mengyu, Zhou
+ Rasheed, Umer
+ Tahir, Yasir
+ Thalmann, Daniel
+ Magnenat-Thalmann, Nadia
Evaluation of Invalid Input Discrimination Using Bag-of-Words for Speech-Oriented Guidance System (389-397)
+ Majima, Haruka
+ Torres, Rafael
+ Kawanami, Hiromichi
+ Hara, Sunao
+ Matsui, Tomoko
+ Saruwatari, Hiroshi
+ Shikano, Kiyohiro
[7]
Leveraging viewer comments for mood classification of music video clips
Short papers 1 -- multimedia IR
/
Yamamoto, Takehiro
/
Nakamura, Satoshi
Proceedings of the 2013 Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval
2013-07-28
p.797-800
© Copyright 2013 ACM
Summary: This short paper proposes a method to classify music video clips uploaded to
a video sharing service into music mood categories such as 'cheerful,'
'wistful,' and 'aggressive.' The method leverages viewer comments posted to the
music video clips for the music mood classification. It extracts specific
features from the comments: (1) adjectives in comments, (2) lengthened words in
comments, and (3) comments in chorus sections. Our experimental results
classifying 695 video clips into six mood categories showed that our method
outperformed the baseline in terms of macro and micro averaged F-measures. In
addition, our method outperformed the existing approaches that utilize lyrics
and audio signals of songs.
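Two of the three comment-feature groups above can be illustrated with a small extractor. Adjective extraction is omitted because it would require a part-of-speech tagger (e.g., MeCab for Japanese comments), and all feature names here are illustrative, not the paper's.

```python
import re
from collections import Counter

# a character repeated 3+ times, e.g. "sooo" — a simple cue for lengthened words
LENGTHENED = re.compile(r"(.)\1{2,}")

def comment_features(comments, chorus_spans=None):
    """Toy feature extractor over timestamped viewer comments.
    `comments` is a list of (timestamp_sec, text); `chorus_spans` is a list
    of (start_sec, end_sec) chorus sections of the song."""
    feats = Counter()
    for t, text in comments:
        for word in text.split():
            if LENGTHENED.search(word):
                # count the normalized form of the lengthened word
                feats["len:" + LENGTHENED.sub(r"\1", word)] += 1
            feats["w:" + word] += 1
        if chorus_spans and any(s <= t < e for s, e in chorus_spans):
            feats["chorus_comments"] += 1
    return feats
```

The resulting counts would then feed a standard classifier over the six mood categories.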
[8]
Design Considerations for Leveraging Over-familiar Items for Elderly Health
Monitors
Design and Evaluation of Smart and Intelligent Environments
/
Wang, Edward
/
Ipser, Samantha
/
Little, Patrick
/
Duncan, Noah
/
Liu, Benjamin
/
Nakamura, Shinsaku
DAPI 2013: 1st International Conference on Distributed, Ambient, and
Pervasive Interactions
2013-07-21
p.255-261
Keywords: User Interface; Health Monitoring; Gerontechnology
© Copyright 2013 Springer-Verlag
Summary: Japan is facing an aging population. Elderly individuals in Japan are
becoming increasingly isolated, with no one to look after them as their health
deteriorates. To counter this decline, the Japanese government has been
introducing various devices to monitor the health of elderly individuals. However, existing
products in Japan do not fully address customer needs because they focus solely
on functionality. As a result, elderly individuals that do not depend on
monitoring may find the system too inconvenient. However, it is still important
for elderly individuals in good health to be monitored to identify risks and
prevent a decline in health. Therefore, health monitor designers must reduce
the inconvenience to the user caused by systems that monitor elderly
individuals.
[9]
Facial design for humanoid robot
Robot and VR
/
Kanaya, Ichiroh
/
Doi, Shoichi
/
Nakamura, Shohei
/
Kawasaki, Kazuo
Proceedings of the 2012 Asia Pacific Conference on Computer Human
Interaction
2012-08-28
p.141-148
© Copyright 2012 Springer-Verlag
Summary: In this research, the authors created facial expressions with the minimum
elements necessary for recognizing a face: two eyes and a mouth drawn as precise
circles, which are transformed geometrically, through rotation and vertical
scaling, to make facial expressions. The facial expression patterns made by
these geometric elements and transformations were composed along three
dimensions of visual information suggested by many previous studies:
slantedness of the mouth, openness of the face, and slantedness of the eyes. In
addition, the relationships between the affective meanings of the visual
information corresponded to the results of the previous studies.
The authors found that facial expressions can be classified into 10
emotions: happy, angry, sad, disgust, fear, surprised, angry*, fear*, neutral
(pleasant) indicating positive emotion, and neutral (unpleasant) indicating
negative emotion, each portrayed by a different geometric transformation.
Furthermore, the authors discovered the "tetrahedral model," which expresses
most clearly the geometric relationships between facial expressions. In this
model, each edge of the tetrahedron is an axis that controls the rotational and
vertical-scaling transformations of the eyes and mouth.
[10]
Search intent estimation from user's eye movements for supporting
information seeking
User-system interaction
/
Umemoto, Kazutoshi
/
Yamamoto, Takehiro
/
Nakamura, Satoshi
/
Tanaka, Katsumi
Proceedings of the 2012 International Conference on Advanced Visual
Interfaces
2012-05-22
p.349-356
© Copyright 2012 ACM
Summary: In this paper, we propose a two-stage system that uses a user's eye movements
to meet the increasing demand to obtain information from the Web in an
efficient way. In the first stage the system estimates a user's search intent
as a set of weighted terms extracted based on the user's eye movements while
browsing Web pages. Then in the second stage, the system shows relevant
information to the user by using the estimated intent for re-ranking search
results, suggesting intent-based queries, and emphasizing relevant parts of Web
pages. The system aims to help users to efficiently obtain what they need by
repeating these steps throughout the information seeking process. We proposed
four types of search intent estimation methods (MLT, nMLT, DLT and nDLT)
considering the relationship among intents, term frequencies and eye movements.
As a result of an experiment designed for evaluating the accuracy of each
method with a prototype system, we confirmed that the nMLT method works best.
In addition, by analyzing the extracted intent terms for eight subjects in the
experiment, we found that the system could estimate the unique search intent of
each user even if they performed the same search tasks.
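A much-simplified version of gaze-weighted term extraction, weighting each term by total fixation duration, can be sketched as follows. The paper's MLT/nMLT/DLT/nDLT variants additionally model term frequency and normalization, which this sketch omits; the data layout is an assumption.

```python
from collections import defaultdict

def intent_terms(fixations, top_k=5):
    """Estimate intent terms from gaze: sum fixation durations over the words
    a user looked at while browsing, then return the heaviest terms.
    `fixations` is a list of (word, duration_ms) pairs."""
    weights = defaultdict(float)
    for word, duration_ms in fixations:
        weights[word.lower()] += duration_ms  # case-fold so "Kyoto" == "kyoto"
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

The weighted terms would then drive the second stage: re-ranking results, suggesting intent-based queries, and highlighting relevant page regions.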
[11]
Study of information clouding methods to prevent spoilers of sports match
Interactive posters
/
Nakamura, Satoshi
/
Komatsu, Takanori
Proceedings of the 2012 International Conference on Advanced Visual
Interfaces
2012-05-22
p.661-664
© Copyright 2012 ACM
Summary: Seeing the final score of a sports match on the Web often spoils the
pleasure of a user who is waiting to watch a recording of this match on TV.
This paper proposes four information clouding methods to block spoiling
information, and describes implementation of a system using these methods as a
browser extension. We then experimentally investigate the usefulness of the
methods, taking into account their differences, differences in the variety of
content, and differences in the user's interest in sports.
[12]
Personal photo browser that can classify photos by participants and
situations
System paper demos
/
Onishi, Tomoya
/
Tokuami, Ryosuke
/
Kono, Yasuyuki
/
Nakamura, Satoshi
Proceedings of the 2012 International Conference on Advanced Visual
Interfaces
2012-05-22
p.798-799
© Copyright 2012 ACM
Summary: This paper demonstrates a photo browser that rearranges photos according to
the persons who were near the photographer when each photo was taken, by
consulting Bluetooth device detection information. Most Bluetooth devices
accompany their owners, so each photo is tagged with the Bluetooth device IDs
detected around the moment it was taken. Using this tag information, the system
classifies the user's photo archive into a layered cluster tree by tag
similarity, and shows the user the photos of a selected cluster on either a map
or a timeline.
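The tag-similarity step can be illustrated with Jaccard similarity over each photo's set of detected device IDs. The greedy single-link grouping below is an illustrative stand-in for the paper's layered cluster tree; the threshold is arbitrary.

```python
def jaccard(a, b):
    """Similarity of two photos' Bluetooth device-ID tag sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_photos(photo_tags, threshold=0.5):
    """Greedily group photos whose tag sets are similar enough to any
    existing cluster member (single-link, flat version of the cluster tree).
    `photo_tags` maps photo id -> set of detected device IDs."""
    clusters = []
    for pid, tags in photo_tags.items():
        for cluster in clusters:
            if any(jaccard(tags, photo_tags[q]) >= threshold for q in cluster):
                cluster.append(pid)
                break
        else:
            clusters.append([pid])
    return clusters
```

Photos taken among the same companions share many device IDs and therefore land in the same cluster.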
[13]
RerankEverything: a reranking interface for exploring search results
Poster session: information retrieval
/
Yamamoto, Takehiro
/
Nakamura, Satoshi
/
Tanaka, Katsumi
Proceedings of the 2011 ACM Conference on Information and Knowledge
Management
2011-10-24
p.1913-1916
© Copyright 2011 ACM
Summary: This paper proposes a system called "RerankEverything", which enables users
to rerank search results in any search service, such as a Web search engine, an
e-commerce site, a hotel reservation site, and so on. This system helps users
explore diverse search results. In conventional search services, interactions
between users and systems are quite limited and complicated. By using
RerankEverything, users can interactively explore search results in accordance
with their interests by reranking search results from various viewpoints.
Experimental results show that our system can help users search more
proactively. When using our system, users were more likely to click search
results that were initially low ranked. Users also browsed through more diverse
search results by reranking search results after giving various types of
feedback with our system.
[14]
Extracting adjective facets from community Q&A corpus
Poster session: information retrieval
/
Yamamoto, Takehiro
/
Nakamura, Satoshi
/
Tanaka, Katsumi
Proceedings of the 2011 ACM Conference on Information and Knowledge
Management
2011-10-24
p.2021-2024
© Copyright 2011 ACM
Summary: In this paper, we propose a method for helping users explore information via
Web searches by using a question and answer (Q&A) corpus archived in a
community Q&A site. When users do not have clear information needs and have
little knowledge about the task domain, it is difficult for them to create
queries that adequately reflect their information needs. We focused on terms
like "famous temples," "historical townscapes," and "delicious sweets," which
we call "adjective facets", and developed a method of extracting these facets
from question and answer archives at a community Q&A site. We evaluated the
effectiveness of our adjective facets by comparing them with several baselines.
[15]
3-D Sound Reproduction System for Immersive Environments Based on the
Boundary Surface Control Principle
Virtual and Immersive Environments
/
Enomoto, Seigo
/
Ikeda, Yusuke
/
Ise, Shiro
/
Nakamura, Satoshi
VMR 2011: 4th International Conference on Virtual and Mixed Reality, Part I:
New Trends
2011-07-09
v.1
p.174-184
Keywords: Boundary surface control principle; Immersive environments; Virtual reality;
Stereophony; Surround sound
Copyright © 2011 Springer-Verlag
Summary: We constructed a 3-D sound reproduction system containing a 62-channel
loudspeaker array and 70-channel microphone array based on the boundary surface
control principle (BoSC). The microphone array can record the 3-D sound field
within a volume, and the loudspeaker array can accurately recreate it in other
locations. Using these systems, we realized immersive acoustic environments
similar to cinema or television sound spaces. We also recorded real 3-D
acoustic environments, such as an orchestra performance and forest sounds, by
using the microphone array. Recreated sound fields were evaluated by
demonstration experiments using the 3-D sound field. Subjective assessments of
390 subjects confirm that these systems can achieve high presence for 3-D sound
reproduction and provide the listener with deep immersion.
[16]
Providing Immersive Virtual Experience with First-Person Perspective
Omnidirectional Movies and Three Dimensional Sound Field
Virtual and Immersive Environments
/
Kondo, Kazuaki
/
Mukaigawa, Yasuhiro
/
Ikeda, Yusuke
/
Enomoto, Seigo
/
Ise, Shiro
/
Nakamura, Satoshi
/
Yagi, Yasushi
VMR 2011: 4th International Conference on Virtual and Mixed Reality, Part I:
New Trends
2011-07-09
v.1
p.204-213
Keywords: First-person Perspective; Omnidirectional Vision; Three Dimensional Sound
Reproduction; Boundary Surface Control Principle
Copyright © 2011 Springer-Verlag
Summary: Techniques for giving audiences a highly immersive feeling have advanced
along with video and audio media technology. We record and reproduce
omnidirectional movies captured from an actor's perspective, together with the
three-dimensional sound field around the actor, to reproduce a more impressive
experience. We propose a sequence of techniques to achieve this, including
recording equipment, video and acoustic processing, and a presentation system.
The effectiveness of our system, and the demand for it, have been demonstrated
through evaluation experiments with ordinary people.
[17]
Personalized Voice Assignment Techniques for Synchronized Scenario Speech
Output in Entertainment Systems
VR for Culture and Entertainment
/
Kawamoto, Shinichi
/
Yotsukura, Tatsuo
/
Nakamura, Satoshi
/
Morishima, Shigeo
VMR 2011: 4th International Conference on Virtual and Mixed Reality, Part
II: Systems and Applications
2011-07-09
v.2
p.177-186
Keywords: Instant casting movie system; post-recording; speaker similarity; voice
morphing; synchronized speech output
Copyright © 2011 Springer-Verlag
Summary: The paper describes voice assignment techniques for synchronized scenario
speech output in an instant casting movie system that enables anyone to be a
movie star using his or her own voice and face. Two prototype systems were
implemented, and both systems worked well for various participants, ranging
from children to the elderly.
[18]
Instant Movie Casting with Personality: Dive into the Movie System
VR for Culture and Entertainment
/
Morishima, Shigeo
/
Yagi, Yasushi
/
Nakamura, Satoshi
VMR 2011: 4th International Conference on Virtual and Mixed Reality, Part
II: Systems and Applications
2011-07-09
v.2
p.187-196
Keywords: Personality Modeling; Gait Motion; Entertainment; Face Capture
Copyright © 2011 Springer-Verlag
Summary: "Dive into the Movie (DIM)" is a project aiming to realize an innovative
entertainment system that provides an immersive experience of a story: every
audience member can participate in the story as a cast member and share the
impression with family or friends while watching the movie. To realize this
system, we are trying to model and capture personal characteristics of the
face, body, gait, hair, and voice instantly and precisely. All of the modeling,
character synthesis, rendering, and compositing processes must be performed in
real time without any manual operation. In this paper, a novel entertainment
system, the Future Cast System (FCS), is introduced as a prototype of DIM. The
first experimental trial of FCS was conducted at the 2005 World Exposition,
where 1,630,000 people experienced it over six months. Finally, an up-to-date
DIM system realizing a more realistic sensation is introduced.
[19]
RerankEverything: a reranking interface for browsing search results
WWW posters
/
Yamamoto, Takehiro
/
Nakamura, Satoshi
/
Tanaka, Katsumi
Proceedings of the 2010 International Conference on the World Wide Web
2010-04-26
v.1
p.1209-1210
Keywords: reranking, search user interfaces, wrapper generation
© Copyright 2010 ACM
Summary: This paper proposes a system called RerankEverything, which enables users to
rerank search results in any search service, such as a Web search engine, an
e-commerce site, a hotel reservation site and so on. In conventional search
services, interactions between users and services are quite limited and
complicated. In addition, search functions and interactions to refine search
results differ depending on the services. By using RerankEverything, users can
interactively explore search results in accordance with their interests by
reranking search results from various viewpoints.
[20]
Normalization on the modulation spectrum of the subband temporal envelopes
for automatic speech recognition in reverberant environments
Search and interface
/
Lu, X.
/
Unoki, M.
/
Nakamura, S.
Proceedings of the 3rd International Universal Communication Symposium
2009-12-03
p.247-254
Keywords: automatic speech recognition, dereverberation, subband temporal envelope,
temporal modulation
© Copyright 2009 ACM
Summary: In this study, we propose a feature extraction method based on subband
temporal envelopes (STEs) and their normalization for reverberated speech
recognition. The STEs are extracted using a series of constant-bandwidth
band-pass filters and the Hilbert transform, followed by low-pass filtering. In
the normalization, the modulation spectra (MS) of the subband temporal
envelopes of both clean and reverberated speech are normalized to a reference
MS calculated from a clean speech data set. Based on the normalized subband MS,
the inverse Fourier transform is used to restore the subband temporal
envelopes. We tested the proposed method on speech recognition in a reverberant
room with different speaker-to-microphone distances (SMDs). For comparison, the
recognition performance of the traditional mel-cepstral coefficients with mean
and variance normalization was used as the baseline. Experimental results
showed that, averaging over SMDs from 50 cm to 400 cm, there was a 44.96%
relative improvement from subband temporal envelope processing alone, and a
further 15.68% relative improvement from normalization of the subband
modulation spectrum. In total, there was about a 53.59% relative improvement,
which was better than that of other temporal filtering and normalization
methods.
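The envelope-extraction step described above (band-pass filtering plus the Hilbert transform) can be sketched with NumPy alone. The FFT-domain band-pass, the band edges, and the omission of the final low-pass smoothing are simplifications for illustration, not the paper's settings.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real sequence via the FFT (the standard
    frequency-domain construction of the Hilbert transform)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def subband_envelope(x, fs, lo, hi):
    """One subband temporal envelope: zero out spectrum outside [lo, hi] Hz,
    then take the magnitude of the analytic signal of the band."""
    n = len(x)
    X = np.fft.fft(x)
    f = np.abs(np.fft.fftfreq(n, d=1.0 / fs))
    X[(f < lo) | (f > hi)] = 0.0
    band = np.fft.ifft(X).real
    return np.abs(analytic_signal(band))
```

Repeating this over a filter bank of subbands yields the STEs whose modulation spectra the paper then normalizes to a clean-speech reference.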
[21]
Evaluation for WFST-based dialog management
Search and interface
/
Hori, Chiori
/
Ohtake, Kiyonori
/
Misu, Teruhisa
/
Kashioka, Hideki
/
Nakamura, Satoshi
Proceedings of the 3rd International Universal Communication Symposium
2009-12-03
p.255-260
Keywords: WFST optimization operation, interchange format (IF), spoken dialog,
statistical dialog management, weighted finite-state transducer (WFST)
© Copyright 2009 ACM
Summary: To construct an expandable and adaptable dialog system that handles
multiple tasks, we propose a dialog system using a weighted finite-state
transducer (WFST) in which user concept tags and system action tags are the
input and output of the transducer, respectively. To test the potential of the
WFST-based dialog management (DM) platform with statistical DM models, we
construct a dialog system from a human-to-human spoken dialog corpus for hotel
reservation, which is annotated with the Interchange Format (IF). Scenario,
Spoken Language Understanding (SLU), and Sentence Generation (SG) WFSTs are
obtained from the corpus, then composed together and optimized to generate a
Dialog Management (DM) WFST. We evaluate the detection accuracy of the system's
next actions using Mean Reciprocal Rank (MRR), examine how WFST optimization
operations contribute to dialog systems, and confirm that optimization improves
the accuracy of next-action detection.
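Mean reciprocal rank, the metric used above, is simple to compute. This sketch assumes each dialog turn yields a ranked candidate list of next-action tags plus one reference action; the data layout is illustrative.

```python
def mean_reciprocal_rank(ranked_predictions, gold_actions):
    """MRR over dialog turns: for each turn, add 1/rank of the reference
    next action within the system's ranked candidates (0 if absent)."""
    total = 0.0
    for candidates, gold in zip(ranked_predictions, gold_actions):
        try:
            total += 1.0 / (candidates.index(gold) + 1)  # ranks are 1-based
        except ValueError:
            pass  # reference action not in the candidate list
    return total / len(gold_actions)
```

MRR rewards systems that place the correct next action high in the ranking even when it is not the single top hypothesis.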
[22]
Dialogue act annotation for consulting dialogue corpus
Poster session
/
Ohtake, Kiyonori
/
Misu, Teruhisa
/
Hori, Chiori
/
Kashioka, Hideki
/
Nakamura, Satoshi
Proceedings of the 3rd International Universal Communication Symposium
2009-12-03
p.372-378
© Copyright 2009 ACM
Summary: This paper introduces a new corpus of consulting dialogues, which is
designed for training a dialogue manager that can handle consulting dialogues
through spontaneous interactions from the tagged dialogue corpus. We have
collected 130 h of consulting dialogues in the tourist guidance domain. This
paper outlines our taxonomy of dialogue act annotation that can describe two
aspects of an utterances: the communicative function (speech act), and the
semantic content of the utterance. We provide an overview of the Kyoto tour
guide dialogue corpus and a preliminary analysis using the dialogue act tags.
[23]
Hyperbolic structure of fundamental frequency contour
Poster session
/
Ni, Jinfu
/
Sakai, Shinsuke
/
Kawai, Hisashi
/
Nakamura, Satoshi
Proceedings of the 3rd International Universal Communication Symposium
2009-12-03
p.389-394
Keywords: F0 control, intonation, speech prosody, speech synthesis
© Copyright 2009 ACM
Summary: In this paper, we propose an approach to transformation of fundamental
frequency (F0) contours for conversational speech synthesis. The graph of
F0 as a function of the period of sound-wave cycles is one branch of a
rectangular hyperbola. Based on a few symmetry assumptions about this
hyperbolic property, we derive a generalized hyperbolic structure that allows
F0 contours to be manipulated aggressively. The model provides an equivalent
expression of the resonance mechanism capable of dealing with the interaction
of tone and intonation. It is also language-independent, since no
language-dependent hypothesis is required. This paper describes two
applications of the hyperbolic structure of F0 contours to prosodic
information processing. One modulates baseline F0 contours to fuse additional
makeup information onto them without altering the underlying linguistic
information. The other separates local rise/fall F0 movements and a global
scale component from observed F0 contours, both of which are useful for
estimating dynamic F0 variation. Our experimental results are very positive.
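The basic hyperbolic relation and the global/local decomposition can be illustrated with a small sketch (illustrative only; the paper's generalized structure rests on further symmetry assumptions not modeled here):

```python
import math

def f0_from_period(period_s):
    # F0 * T = 1: the point (T, F0) lies on one branch of the
    # rectangular hyperbola xy = 1.
    return 1.0 / period_s

def rescale_contour(f0_contour, scale):
    # A global scale change shifts the whole contour in the log-F0 domain,
    # leaving the local rise/fall shape (log-F0 differences) unchanged.
    return [f0 * scale for f0 in f0_contour]

# Periods of 10, 8, and 9 ms give F0 values of 100, 125, and ~111 Hz.
contour = [f0_from_period(t) for t in (0.010, 0.008, 0.009)]
scaled = rescale_contour(contour, 1.2)

# The local rise from frame 0 to frame 1 is identical in the log domain,
# so global scaling and local movement separate cleanly.
d_orig = math.log(contour[1]) - math.log(contour[0])
d_scaled = math.log(scaled[1]) - math.log(scaled[0])
print(abs(d_orig - d_scaled) < 1e-12)  # True
```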
[24]
Spoken document retrieval using topic models
Poster session
/
Hu, Xinhui
/
Isotani, Ryosuke
/
Nakamura, Satoshi
Proceedings of the 3rd International Universal Communication Symposium
2009-12-03
p.400-403
Keywords: NMF, document topic model, spoken document retrieval
© Copyright 2009 ACM
Summary: In this paper, we propose a document topic model (DTM) based on the
non-negative matrix factorization (NMF) approach to explore spontaneous spoken
document retrieval. The model uses latent semantic indexing to detect
underlying semantic relationships within documents. Each document is
interpreted as a generative model over multiple topics, and the relevance of a
document to a query is expressed as the probability of the query being
generated by that model. The term-document matrix used for NMF is built
stochastically from the speech recognition N-best results, so that multiple
recognition hypotheses can be exploited to compensate for word recognition
errors. Using this approach, experiments are conducted on a test collection
from the Corpus of Spontaneous Japanese (CSJ), with 39 queries over more than
600 hours of spontaneous Japanese speech. The retrieval performance of this
model proves superior to the conventional vector space model (VSM) when the
dimension (topic number) exceeds a certain threshold. Moreover, in terms of
both retrieval performance and topic expressiveness, the NMF-based topic model
surpasses another latent indexing method based on the singular value
decomposition (SVD). The extent to which this topic model can resist speech
recognition errors, a problem specific to spoken document retrieval, is also
investigated.
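The core idea of scoring a query against NMF-derived document models can be sketched as follows (a minimal toy sketch, not the paper's system: the term-document counts are invented, and simple Lee-Seung multiplicative updates stand in for whatever NMF variant the authors used):

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf(V, k, iters=200, eps=1e-9):
    """Factor V (terms x docs) ~= W (terms x topics) @ H (topics x docs)
    using Lee-Seung multiplicative updates for the Frobenius objective."""
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy term-document counts (rows: terms, cols: documents)
V = np.array([[3., 0., 1.],
              [2., 0., 0.],
              [0., 4., 1.],
              [0., 3., 2.]])
W, H = nmf(V, k=2)

# Normalize each reconstructed document column into a unigram distribution,
# then score a query by the probability of its terms under that distribution.
recon = W @ H
P = recon / recon.sum(axis=0, keepdims=True)
query_terms = [0, 1]                 # indices of the query's terms
scores = P[query_terms].prod(axis=0)
best_doc = scores.argmax()           # document 0 concentrates on terms 0 and 1
```

In the paper's setting the counts in `V` would come stochastically from ASR N-best lists rather than exact transcripts, so each recognition hypothesis contributes fractional mass to the matrix.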
[25]
Soft margin estimation on improving environment structures for ensemble
speaker and speaking environment modeling
Poster session
/
Tsao, Yu
/
Li, Jinyu
/
Lee, Chin-Hui
/
Nakamura, Satoshi
Proceedings of the 3rd International Universal Communication Symposium
2009-12-03
p.404-408
Keywords: ASR, ESSEM, SME, model adaptation, noise robustness
© Copyright 2009 ACM
Summary: Recently, we proposed an ensemble speaker and speaking environment modeling
(ESSEM) approach to enhance the robustness of automatic speech recognition
(ASR) under adverse conditions. The ESSEM framework comprises two phases:
offline and online. In the offline phase, we prepare an environment structure
formed by multiple sets of hidden Markov models (HMMs), each representing a
particular speaker and speaking environment. In the online phase, ESSEM
estimates a mapping function to transform the prepared environment structure
into a set of HMMs for the unknown testing condition. In this study, we
incorporate soft margin estimation (SME) to increase the discriminative power
of the environment structure in the offline stage and thereby enhance overall
ESSEM performance. We evaluated performance on the Aurora-2 connected-digit
database. With the SME-refined environment structure, ESSEM outperforms the
original framework. Using our best online mapping function, ESSEM achieves a
word error rate (WER) of 4.62%, a 14.60% relative reduction over the best
baseline WER of 5.41%.
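The relative-reduction figure quoted above follows directly from the two WER values:

```python
def relative_wer_reduction(baseline, improved):
    """Relative WER reduction: (baseline - improved) / baseline."""
    return (baseline - improved) / baseline

r = relative_wer_reduction(5.41, 4.62)
print(f"{100 * r:.2f}%")  # 14.60%
```

Relative reduction is the standard way such ASR gains are reported, since the same absolute drop (here 0.79 points) means more against a lower baseline.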