Date of Award
12-2022
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computer Engineering and Sciences
First Advisor
Veton Z. Këpuska
Second Advisor
Marius C. Silaghi
Third Advisor
Georgios C. Anagnostopoulos
Fourth Advisor
Ivica N. Kostanic
Abstract
Efficient use of communication bandwidth starts at the data source. Features of speech signals can be extracted and reconstructed to lower the Internet traffic of acoustic artificial agents and to improve the quality of automatic speech recognition systems. This work introduces the Speech Quefrency Transform (SQT) to enrich the communication space between artificial agents and humankind. We describe the motivation, methodology, and deep learning approach in detail as we apply the SQT technology to several applications: sharp pitch track extraction, real-time speech communications, and emotion recognition. The results were excellent. The work proves that acceleration is the unit of quefrency and advocates adopting a geometric scale for the cepstrum domain. It also proposes spectral banking, which models quefrency filters by controlling spectral leakage. This dissertation shows how to generate, combine, and apply the filters.
Recommended Citation
Hasanain, Ahmad Zuhair S., "Deep Learning Approach to Speech Recognition: A Signal Extractor & Producer for Artificial General Intelligence" (2022). Theses and Dissertations. 811.
https://repository.fit.edu/etd/811