
History of Robotics Research and Development of Japan, 2013, Integration, Intelligence, etc.
Real-world Robust Microphone Array Processing for Robot Audition

Keisuke Nakamura, Honda Research Institute Japan Co., Ltd.
Hiroki Miura, Tokyo Institute of Technology
Takami Yoshida, Tokyo Institute of Technology
Yoshiaki Bando, Kyoto University
Takeshi Mizumoto, Honda Research Institute Japan Co., Ltd.
Kazuhiro Nakadai, Honda Research Institute Japan Co., Ltd.
Hiroshi G. Okuno, Kyoto University
Automatic Speech Recognition (ASR) by a robot is one of the most common forms of human-robot interaction and is essential for a robot working in a human environment. Unlike ASR through a close-talking microphone, as in mobile phone applications, ASR by a robot has to rely on the robot's own microphones, and the location, timing, and number of speech sources are unknown. Moreover, because a robot produces its own ego-noise (self-generated noise), the signal-to-noise ratio (SNR) of the speech is low in most cases. To achieve ASR in such a highly dynamic and noisy environment, robot audition [1] has been studied for a decade.

Figure 1 shows an overview of the robot audition system. In addition to the signal processing used for ASR with a close-talking microphone, the robot audition system employs microphone array processing, namely sound source localization (SSL) and sound source separation (SSS), to recognize an unknown number of speech sources uttered simultaneously. Because SSL estimates the location, timing, and number of speech sources, and SSS separates the speech from each source, improving the performance of SSL and SSS is essential.

Microphone array processing generally rests on two assumptions: the geometry of the microphone array, or time-consuming measurements of the transfer functions between the microphone array and a sound source, must be available, and the microphones must be fully synchronized. To relax these assumptions, we proposed in [2] an online framework for microphone array calibration that combines simultaneous localization and mapping (SLAM) with beamforming (see Figure 2). SLAM jointly calibrates the locations of the microphones and the sound source, as well as the clock differences between microphones, every time the microphone array observes a sound event such as a hand clap. In [3], we developed a self-calibration method for a microphone array that uses sounds emitted by the robot itself.

In [4], we examined the practicality of existing robot audition systems in highly dynamic and noisy environments and identified three problems: they are not sufficiently robust against ego-noise, their spatial resolution is not high enough, and they are difficult to run in real time. To solve these three issues, we proposed in [4] SSL based on spatial whitening of ego-noise using generalized singular value decomposition, super-resolution SSL and SSS based on hybrid transfer function interpolation, and optimal hierarchical SSL based on a coarse-to-fine search (see Figure 3). Applying these methods to real robots, we achieved simultaneous speech recognition by a robot and SSL with a flying quadrocopter in an outdoor environment.

2nd RSJ Advanced Robotics Best Paper Award in 2014. IROS 2011 Best Paper Award Finalist in 2011. IEEE Robotics and Automation Society Japan Chapter Young Award in 2013.
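To make the ego-noise whitening step concrete, the following is a minimal NumPy sketch of a MUSIC-style spatial spectrum computed after whitening with a pre-measured ego-noise correlation matrix. It only illustrates the general idea: the function name, the single-frequency-bin formulation, and the steering-vector input are assumptions for this sketch, whereas the actual system in [4] operates on measured transfer functions over many frequency bins inside HARK.

```python
import numpy as np

def whitened_music_spectrum(R, K, steering, n_sources=1):
    """Noise-whitened MUSIC pseudo-spectrum for one frequency bin.

    R        : (M, M) spatial correlation matrix of the observed signal
    K        : (M, M) correlation matrix of ego-noise (pre-measured)
    steering : (D, M) steering vectors for D candidate directions
    Returns a length-D array; peaks indicate likely source directions.
    """
    # Whitening matrix K^(-1/2) from the eigendecomposition of the noise matrix
    w, V = np.linalg.eigh(K)
    K_inv_sqrt = V @ np.diag(1.0 / np.sqrt(np.maximum(w, 1e-12))) @ V.conj().T

    # Whiten the observed correlation matrix so ego-noise becomes ~identity
    Rw = K_inv_sqrt @ R @ K_inv_sqrt.conj().T

    # Eigenvectors with the smallest eigenvalues span the noise subspace
    ew, Ew = np.linalg.eigh(Rw)            # eigenvalues in ascending order
    En = Ew[:, : R.shape[0] - n_sources]   # noise subspace

    spectrum = np.empty(steering.shape[0])
    for d, a in enumerate(steering):
        aw = K_inv_sqrt @ a                # whiten the steering vector too
        spectrum[d] = np.real(aw.conj() @ aw) / (
            np.linalg.norm(En.conj().T @ aw) ** 2 + 1e-12)
    return spectrum
```

In practice the spectrum would be averaged over frequency bins and over a short time window before peak picking, and thresholding the peaks yields the number and timing of active sources.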
Figure 1: Robot Audition System and its Issues
Figure 2: Online Calibration of an Asynchronous Microphone Array
Figure 3: Real-time Noise-robust Super-resolution Robot Audition System
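The hierarchical, coarse-to-fine SSL mentioned above reduces the number of candidate directions whose spatial spectrum must be evaluated. Below is a minimal sketch of that idea, assuming a spectrum_fn callback (for example, built on the whitened MUSIC sketch earlier) and illustrative grid steps and thresholding; it is not the exact hierarchy optimized in [4].

```python
import numpy as np

def coarse_to_fine_ssl(spectrum_fn, coarse_step=20.0, fine_step=2.0, threshold=None):
    """Hierarchical (coarse-to-fine) azimuth search.

    spectrum_fn : maps an array of azimuths [deg] to spatial spectrum values.
    Returns refined azimuth estimates of the detected peaks.
    """
    # 1) Coarse pass over the full circle
    coarse_az = np.arange(-180.0, 180.0, coarse_step)
    coarse_p = spectrum_fn(coarse_az)
    if threshold is None:
        threshold = coarse_p.mean() + coarse_p.std()

    peaks = []
    n = len(coarse_az)
    for i, az in enumerate(coarse_az):
        # Local maximum on the coarse grid that exceeds the threshold
        is_peak = (coarse_p[i] >= threshold and
                   coarse_p[i] >= coarse_p[(i - 1) % n] and
                   coarse_p[i] >= coarse_p[(i + 1) % n])
        if is_peak:
            # 2) Fine pass only in the neighborhood of the coarse peak
            fine_az = np.arange(az - coarse_step, az + coarse_step, fine_step)
            fine_p = spectrum_fn(fine_az)
            peaks.append(fine_az[np.argmax(fine_p)])
    return peaks
```

Because the fine grid is evaluated only around coarse peaks, the cost grows with the number of active sources rather than with the full angular resolution.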

Correspondence papers


Keisuke Nakamura, Kazuhiro Nakadai, Hiroshi G. Okuno: A real-time super-resolution robot audition system that improves the robustness of simultaneous speech recognition, Advanced Robotics, Vol. 27, No. 12, pp. 933-945, 2013.

Hiroki Miura, Takami Yoshida, Keisuke Nakamura, Kazuhiro Nakadai: SLAM-based Online Calibration of Asynchronous Microphone Array for Robot Audition, in Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011), pp. 524-529, 2011.

Yoshiaki Bando, Takeshi Mizumoto, Katsutoshi Itoyama, Kazuhiro Nakadai, Hiroshi G. Okuno: Posture estimation of hose-shaped robot using microphone array localization, in Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2013), pp. 3446-3451, 2013.

Related papers


[1] Kazuhiro Nakadai, Toru Takahashi, Hiroshi G. Okuno, Hirofumi Nakajima, Yuji Hasegawa, Hiroshi Tsujino, "Design and Implementation of Robot Audition System 'HARK' — Open Source Software for Listening to Three Simultaneous Speakers", Advanced Robotics, Vol. 24, No. 5-6, pp. 739-761, 2010.
[2] Hiroki Miura, Takami Yoshida, Keisuke Nakamura, Kazuhiro Nakadai, "SLAM-based Online Calibration for Asynchronous Microphone Array", Advanced Robotics, Vol. 26, No. 17, pp. 1941-1965, 2012.
[3] Yoshiaki Bando, Takuma Otsuka, Kazuhiro Nakadai, Satoshi Tadokoro, Masashi Konyo, Katsutoshi Itoyama, Hiroshi G. Okuno, "Posture Estimation of Hose-shaped Robot by using Active Microphone Array", Advanced Robotics, Vol. 29, No. 1, pp. 35-49, 2015.
[4] Keisuke Nakamura, Kazuhiro Nakadai, Hiroshi G. Okuno, "A real-time super-resolution robot audition system that improves the robustness of simultaneous speech recognition", Advanced Robotics, Vol. 27, No. 12, pp. 933-945, 2013.
