Date of Award

11-2014

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Engineering and Sciences

First Advisor

Veton Z. Këpuska

Second Advisor

Samuel P. Kozaitis

Third Advisor

Marius C. Silaghi

Abstract

The feature analysis component of an Automated Speaker Recognition (ASR) system plays a crucial role in the overall performance of the system. Many feature extraction techniques are available, but the ultimate aim is to maximize the performance of these systems. From this point of view, the algorithms used to compute the feature components are analyzed. Current state-of-the-art ASR systems perform quite well in controlled environments where the speech signal is noise free. This thesis investigates the results that can be obtained by combining Mel-Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC) as feature components for the front-end processing of an ASR system. The combined MFCC and GFCC features are proposed as a way to improve the reliability of a speaker recognition system. MFCCs are the de facto standard for speaker recognition systems because of their high accuracy and low complexity; however, they are not very robust in the presence of additive noise. In recent studies, GFCC features have shown very good robustness against noise and acoustic change. The main idea is to integrate MFCC and GFCC features to improve the overall ASR system performance in low signal-to-noise ratio (SNR) conditions. The experiments are conducted on the Texas Instruments and Massachusetts Institute of Technology (TIMIT) and the English Language Speech Database for Speaker Recognition (ELSDR) databases, where the test utterances are mixed with noise at various SNR levels to simulate channel change. The results provide an empirical comparison of the combined MFCC-GFCC features with their individual counterparts.
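The abstract does not say how the noise is scaled or how the two feature sets are integrated. The sketch below is illustrative only. It assumes the standard mean-power definition of SNR for mixing noise into a test utterance, and it assumes simple frame-level concatenation of MFCC and GFCC vectors as one possible way to combine them. The function names `mix_at_snr` and `combine_features` are hypothetical. The MFCC and GFCC extraction steps are not shown, and the matrices passed in are placeholders with shape (frames, coefficients).

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add noise to speech so that the mixture has the requested SNR in dB.

    Assumes SNR_dB = 10 * log10(P_speech / P_noise), with P the mean power.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Tile the noise if it is shorter than the utterance, then trim to length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        raise ValueError("noise signal has zero power")
    # Scale the noise so its power equals p_speech / 10^(snr_db / 10).
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise


def combine_features(mfcc: np.ndarray, gfcc: np.ndarray) -> np.ndarray:
    """Concatenate per-frame MFCC and GFCC vectors (hypothetical fusion scheme).

    Both inputs are (frames, coefficients) matrices computed on the same framing.
    """
    if mfcc.shape[0] != gfcc.shape[0]:
        raise ValueError("feature matrices must have the same number of frames")
    return np.hstack([mfcc, gfcc])
```

For example, calling `mix_at_snr(speech, noise, 5.0)` on a 1-second, 16 kHz utterance returns a 16000-sample signal whose added noise sits 5 dB below the speech power. Concatenation is only one option. Score-level fusion of separately trained MFCC and GFCC models is a common alternative, and the abstract does not say which approach the thesis uses.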
