Date of Award

12-2019

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Engineering and Sciences

First Advisor

Adrian Peter

Second Advisor

Anthony Smith

Third Advisor

Luis D. Otero

Fourth Advisor

Philip Bernhard

Abstract

In recent years, Natural Language Processing in the field of machine learning has seen major improvements. Data scientists have shown that neural networks are capable of breaking down the semantics of sentences, translating languages, and answering complex questions with fast recall. While impressive, these feats all hinge on access to a massive amount of clean text, i.e., data sets with nearly perfect grammar and spelling. Without this, neural networks will usually fail to converge on a meaningful result. To partially remove this dependency, Grapheme-to-Phoneme conversion can be employed: the conversion of words from their spellings to a form that more closely matches their pronunciations. Since most spelling errors preserve the phonetic pronunciation of the intended word, converting words to phonemes should improve network convergence on datasets that contain occasional spelling errors. Phoneme conversion is a well-researched topic, with state-of-the-art models achieving a 20% word error rate. This error rate stems from model training being stopped early to retain accuracy on out-of-vocabulary words. To alleviate this, this thesis employs Neural Arithmetic Logic Units. A recent study of these neurons shows that they have greatly increased generalization capabilities over standard neural network layers. When used in a recurrent attention mechanism, phoneme conversion models overfit at a much slower rate, allowing for a word error rate of less than 10%.
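For concreteness, below is a minimal NumPy sketch of a single Neural Arithmetic Logic Unit cell as described by Trask et al. (2018), the building block the abstract refers to. The layer sizes, initialization, and variable names here are illustrative assumptions, not the configuration used in this thesis.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class NALUCell:
    """Minimal Neural Arithmetic Logic Unit (Trask et al., 2018).

    Combines an additive path (a = W x) with a multiplicative path
    computed in log-space, gated per output by g = sigmoid(G x).
    Parameter names follow the paper; the random initialization is an
    illustrative choice, not the thesis's training setup.
    """

    def __init__(self, in_dim, out_dim, rng=None):
        rng = rng or np.random.default_rng(0)
        # W = tanh(W_hat) * sigmoid(M_hat) biases weights toward {-1, 0, 1},
        # which is what gives the NALU its extrapolation behavior.
        self.W_hat = rng.normal(0.0, 0.1, (out_dim, in_dim))
        self.M_hat = rng.normal(0.0, 0.1, (out_dim, in_dim))
        self.G = rng.normal(0.0, 0.1, (out_dim, in_dim))

    def forward(self, x, eps=1e-7):
        W = np.tanh(self.W_hat) * sigmoid(self.M_hat)
        a = W @ x                                 # additive path
        m = np.exp(W @ np.log(np.abs(x) + eps))   # multiplicative path (log-space)
        g = sigmoid(self.G @ x)                   # learned gate between the two paths
        return g * a + (1.0 - g) * m

# Example usage with arbitrary inputs.
cell = NALUCell(in_dim=4, out_dim=2)
print(cell.forward(np.array([1.0, 2.0, 3.0, 4.0])))
```

In the thesis's setting, cells like this would replace the standard dense layers inside a recurrent attention mechanism; the gating between additive and multiplicative paths is what the cited study credits for the improved generalization.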
