Deep Learning Approach to Speech Recognition: A Signal Extractor & Producer for Artificial General Intelligence
Date of Award
Doctor of Philosophy (PhD)
Computer Engineering and Sciences
Veton Z. Këpuska
Marius C. Silaghi
Georgios C. Anagnostopoulos
Ivica N. Kostanic
Efficient use of communication bandwidth starts at the data source. The features of speech signals can be extracted and reconstructed to reduce the Internet traffic of acoustic artificial agents and to improve the quality of automatic speech recognition systems. This work introduces the Speech Quefrency Transform (SQT) to enrich the communication space between artificial agents and humankind. We describe the motivation, methodology, and deep learning approach in detail as we apply the SQT technology to several applications: sharp pitch track extraction, real-time speech communications, and emotion recognition. The results were excellent. The work proves that acceleration is the unit of quefrency and advocates adopting a geometric scale for the cepstrum domain. It also proposes spectral banking, which models the quefrency filters by controlling spectral leakage. This dissertation shows how to generate, combine, and apply the filters.
Hasanain, Ahmad Zuhair S., "Deep Learning Approach to Speech Recognition: A Signal Extractor & Producer for Artificial General Intelligence" (2022). Theses and Dissertations. 811.