Date of Award
5-2017
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Engineering and Sciences
First Advisor
Veton Z. Kepuska
Second Advisor
Philip Chan
Third Advisor
Ivica Kostanic
Fourth Advisor
Georgios C. Anagnostopoulos
Abstract
Even though cutting-edge speaker-independent Automatic Speech Recognition (ASR) systems demand large amounts of training data, they barely handle time-varying speaking rates, tolerate variations in how words are uttered, or remain robust to noise. In contrast, our Wake-Up-Word (WUW) technique is tuned to these challenges in lightweight ASR systems that start from a minimal number of initial training samples. Users of ASR systems must be able to roll out new WUWs swiftly and modify the ASR vocabulary at any time, for example when adding a foreign WUW or adapting to phonetic change. We propose bidirectional Dynamic Time Warping (BDTW), a chronological contrast model, and a semi-supervised training procedure. We tested these methods on the acoustic WUW-II corpus [17], where they achieved recognition rates of roughly 89% (±0.5%) for both Out-Of-Vocabulary (OOV) and In-Vocabulary (INV) words. Not only can BDTW produce accurate time alignment of phonemic states, but it can also be used for WUW isolation: it precisely locates the boundaries of similar-sounding patterns so that WUWs can be segmented and retrieved autonomously from continuous speech streams. The proposed distance/similarity model derives the time warping from the contrast between the phones that make up the WUWs themselves, exploiting acoustic evidence up front. Hence, the proposed solutions need minimal prior knowledge about the language. Additionally, a single utterance is enough to bootstrap a speaker-independent ASR system when incremental learning is enabled after each test, so that the speaker's matching utterance becomes the most probable hypothesis, provided each session contains at least one similar utterance. This work also presents an overview and analysis of the fundamentals and implementations and, most importantly, the results of empirical tests and directions for future work.
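The abstract describes its techniques only at a high level. The following minimal sketch is a reference point for the underlying ideas and is not the thesis's BDTW, contrast model, or semi-supervised procedure. It contains three parts. The first is classic one-directional DTW with backtracking, which recovers a frame-to-frame alignment. The second is open-begin/open-end (subsequence) DTW, one standard way to locate the boundaries of a keyword template inside a longer stream. The third is a toy incremental template store that starts from a single utterance. The names `dtw`, `subsequence_dtw`, and `TemplateStore`, the Euclidean frame distance, the length normalisation, and the acceptance threshold are all hypothetical choices made for illustration. Frames stand in for acoustic feature vectors such as MFCCs.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one acoustic feature vector (e.g. MFCCs); assumption


def frame_dist(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames (assumed local cost)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw(x: List[Frame], y: List[Frame]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW: total alignment cost and the warping path between x and y."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = frame_dist(x[i - 1], y[j - 1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from (n, m) to (1, 1) to recover which frames were aligned.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    path.reverse()
    return D[n][m], path


def subsequence_dtw(query: List[Frame], stream: List[Frame]) -> Tuple[float, int, int]:
    """Best-matching stream segment for query: (cost, start frame, end frame).

    The query may start and end anywhere in the stream, so the returned
    start/end indices are the located keyword boundaries.
    """
    n, m = len(query), len(stream)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    S = [[-1] * (m + 1) for _ in range(n + 1)]  # start frame of the best path to each cell
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = frame_dist(query[i - 1], stream[j - 1])
            if i == 1:
                # First query frame: free start at stream[j - 1].
                D[i][j], S[i][j] = c, j - 1
            else:
                cost, start = min((D[i - 1][j - 1], S[i - 1][j - 1]),
                                  (D[i - 1][j], S[i - 1][j]),
                                  (D[i][j - 1], S[i][j - 1]))
                D[i][j], S[i][j] = c + cost, start
    end = min(range(1, m + 1), key=lambda j: D[n][j])  # free end
    return D[n][end], S[n][end], end - 1


class TemplateStore:
    """Hypothetical incremental learner: starts from one enrolled utterance and
    appends every test utterance accepted under a (hypothetical) threshold."""

    def __init__(self, first_utterance: List[Frame], threshold: float):
        self.templates = [first_utterance]
        self.threshold = threshold

    def score(self, utterance: List[Frame]) -> float:
        # Length-normalised DTW cost to the closest stored template.
        return min(dtw(t, utterance)[0] / (len(t) + len(utterance))
                   for t in self.templates)

    def test_and_learn(self, utterance: List[Frame]) -> bool:
        accepted = self.score(utterance) <= self.threshold
        if accepted:
            self.templates.append(utterance)  # grow the template set after each test
        return accepted
```

For example, `subsequence_dtw` called with a template of one-dimensional frames 1, 2, 3 that appears at positions 2 to 4 of the stream 0, 0, 1, 2, 3, 0 returns cost 0.0, start 2, and end 4. Subsequence DTW is used here only because it is a well-known way to obtain segment boundaries. The thesis's BDTW may locate boundaries differently.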
Recommended Citation
Hasanain, Ahmad Zuhair S., "Analysis, A Technique, and Incremental Learning of Wake-Up-Word Speech Recognition" (2017). Theses and Dissertations. 864.
https://repository.fit.edu/etd/864
Comments
Copyright held by author