Date of Award
Master of Science (MS)
Computer Engineering and Sciences
Alan C. Leonard
Marius C. Silaghi
Philip J. Bernhard
One of the difficulties of genetic research is the asymmetrical relationship between data collection techniques and data analysis techniques. The goal of this research was to test a novel application of non-negative matrix factorization that would allow researchers to identify co-mutations more easily. Those co-mutations can then be further verified by frequency analysis. This pruning process allows researchers to identify more fruitful research opportunities, saving time, energy, and funding. Past research has used non-negative matrix factorization to extract factors that meaningfully express underlying data features. This study extends that body of knowledge in five ways. First, a novel cost function was used to convert raw genetic data into numerical values appropriate for matrix operations. Second, this research used the alternating non-negative least squares variant of matrix factorization for its faster convergence compared to the more traditional multiplicative update approach. Third, whereas data sets have traditionally not been factored at multiple factor counts, this study extends previously established methods by performing the analysis over multiple factor counts. Fourth, this study provides evidence that factors produced by non-negative matrix factorization contain co-mutations, which were verified by statistical analysis. Fifth, this study demonstrated that non-negative matrix factorization can partition a data set into chronologically separated clusters without supervision. This research indicates that non-negative matrix factorization is a scalable algorithm for identifying genetic co-mutations within a practical computational time frame.
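The alternating non-negative least squares idea mentioned in the abstract, repeated over several factor counts, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `nmf_anls` and `reconstruction_error`, the iteration count, and the synthetic matrix `V` are assumptions made for illustration, and the thesis's cost function for encoding raw genetic data is not reproduced here. The sketch relies only on `scipy.optimize.nnls` to solve each non-negative least squares subproblem.

```python
import numpy as np
from scipy.optimize import nnls


def nmf_anls(V, rank, n_iter=50, seed=0):
    """Approximate V (non-negative) as W @ H with W, H >= 0 via ANLS.

    Hypothetical helper: alternates between solving for H with W fixed
    and solving for W with H fixed, one NNLS problem per column or row.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank))
    H = np.zeros((rank, n_cols))
    for _ in range(n_iter):
        # Fix W: each column of H minimizes ||W h - V[:, j]|| subject to h >= 0.
        for j in range(n_cols):
            H[:, j], _ = nnls(W, V[:, j])
        # Fix H: each row of W minimizes ||H^T w - V[i, :]|| subject to w >= 0.
        for i in range(n_rows):
            W[i, :], _ = nnls(H.T, V[i, :])
    return W, H


def reconstruction_error(V, W, H):
    """Frobenius norm of the residual V - W @ H."""
    return np.linalg.norm(V - W @ H)


# Synthetic stand-in for an encoded data matrix (assumption, not real data).
rng = np.random.default_rng(1)
V = rng.random((12, 3)) @ rng.random((3, 20))

# Factor the same matrix at several factor counts and compare the fit.
errors = {k: reconstruction_error(V, *nmf_anls(V, k)) for k in (2, 3, 4)}
for k, err in errors.items():
    print(f"factor count {k}: residual norm {err:.4f}")
```

In a sketch like this, each row of `H` plays the role of a factor; inspecting which columns carry large weights in the same factor is one plausible way to shortlist candidate co-occurring features for later statistical verification.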
Kolar, Michael Robert, "Non-negative Matrix Factorization in the Identification of Co-mutations" (2022). Theses and Dissertations. 892.