Date of Award

12-2022

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Engineering and Sciences

First Advisor

Debasis Mitra

Second Advisor

Alan C. Leonard

Third Advisor

Marius C. Silaghi

Fourth Advisor

Philip J. Bernhard

Abstract

One of the difficulties of genetic research is the asymmetry between data collection techniques and data analysis techniques. The goal of this research was to test a novel application of non-negative matrix factorization that would allow researchers to identify co-mutations more easily. Those co-mutations can then be further verified by frequency analysis. This pruning process allows researchers to identify more fruitful research opportunities, saving time, energy, and funding. Past research has used non-negative matrix factorization to extract factors that meaningfully express underlying data features. This study extends that body of work in five ways. First, a novel cost function was used to convert raw genetic data into numerical values appropriate for matrix operations. Second, this research used the alternating non-negative least squares variant of matrix factorization for its faster convergence compared to the more traditional multiplicative update approach. Third, whereas data sets have traditionally been factored at a single factor count, this study extends previously established methods by performing the analysis over multiple factor counts. Fourth, this study presents evidence, verified by statistical analysis, that factors produced by non-negative matrix factorization contain co-mutations. Fifth, this study demonstrated that non-negative matrix factorization can, without supervision, partition a data set into chronologically separated clusters. This research indicates that non-negative matrix factorization is a scalable algorithm for identifying genetic co-mutations within a practical computational time frame.
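As a rough illustration of the alternating non-negative least squares idea named in the abstract, the sketch below factors a non-negative matrix V into W and H by fixing one factor and solving a non-negative least squares problem for the other, then repeats this over several factor counts. It is not the thesis's implementation: the function names `anls_nmf` and `sweep_factor_counts`, the iteration count, and the random toy matrix are assumptions made for illustration, and the thesis's cost function for encoding genetic data is not reproduced here.

```python
import numpy as np
from scipy.optimize import nnls


def anls_nmf(V, k, n_iter=50, seed=0):
    """Approximate non-negative V (m x n) as W (m x k) @ H (k x n).

    Hypothetical helper: alternates between solving for H with W fixed
    and solving for W with H fixed, each as a set of NNLS problems.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))  # random non-negative starting point
    H = np.zeros((k, n))
    for _ in range(n_iter):
        # W fixed: each column of H solves min ||W h - v_j||, h >= 0
        for j in range(n):
            H[:, j], _ = nnls(W, V[:, j])
        # H fixed: each row of W solves min ||H^T w - v_i||, w >= 0
        for i in range(m):
            W[i, :], _ = nnls(H.T, V[i, :])
    return W, H


def sweep_factor_counts(V, ks):
    """Return the Frobenius reconstruction error for each factor count."""
    errors = {}
    for k in ks:
        W, H = anls_nmf(V, k)
        errors[k] = float(np.linalg.norm(V - W @ H))
    return errors


if __name__ == "__main__":
    # Toy non-negative matrix standing in for encoded data (assumption).
    rng = np.random.default_rng(1)
    V = rng.random((6, 3)) @ rng.random((3, 8))
    print(sweep_factor_counts(V, [1, 2, 3]))
```

Solving each subproblem exactly is what distinguishes this scheme from the multiplicative update approach, which takes a rescaled gradient step instead; the per-column loop here favors clarity over speed.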
