Date of Award

12-2016

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Engineering and Sciences

First Advisor

Gregory Harrison

Second Advisor

John Barranti

Third Advisor

Ronaldo Menezes

Fourth Advisor

David Mutschler

Abstract

In data mining and machine learning systems, highly predictive feature attributes help researchers and data analysts classify the problem space effectively. This thesis presents a feature subset selection classifier system for data mining and machine learning. The system is tested on the Pima Indians Diabetes data set and the Breiman Waveform data set. It collects the input features and processes them through an evolutionary search to reduce the feature space to an optimal feature set. Removing irrelevant attributes can decrease the complexity and processing cost of machine learning systems and increase the classification accuracy of data mining systems. The design approach applies a genetic algorithm (GA), which generates an initial output, and then uses an artificial neural network (ANN) to process the feature input and find the optimal feature attributes. This architecture is called GBANN-V, for Genetically Based Artificial Neural Network with Validation. The validation step is used to speed up ANN training and reduce overfitting. In addition, step decay is added to mitigate the local minima problem associated with the gradient descent learning algorithm. Experimental results show that the GBANN-V system achieves equal or better accuracy in finding optimal feature subsets while reducing the input feature dimensionality by up to 40%. Feature set reduction has positive implications for data mining and machine learning applications.
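
The general idea of a genetic search over feature subsets scored on held-out validation data can be sketched as follows. This is only an illustrative sketch, not the GBANN-V implementation: the synthetic data, the nearest-centroid classifier used as a lightweight stand-in for the ANN, the per-feature penalty, and every function name and parameter value are assumptions made for the example.

```python
import random


def make_data(n, rng):
    """Synthetic two-class data (assumption): features 0-1 are informative, 2-4 are noise."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
        rows.append((informative + noise, label))
    return rows


def validation_accuracy(mask, train, valid):
    """Stand-in for the ANN: nearest-centroid classifier scored on held-out data."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for cls in (0, 1):
        members = [x for x, y in train if y == cls]
        centroids[cls] = [sum(x[i] for x in members) / len(members) for i in idx]
    correct = 0
    for x, y in valid:
        dist = {c: sum((x[i] - m) ** 2 for i, m in zip(idx, centroids[c]))
                for c in (0, 1)}
        correct += (min(dist, key=dist.get) == y)
    return correct / len(valid)


def select_features(train, valid, n_features, rng,
                    pop_size=12, generations=15, mutation_rate=0.1):
    """Genetic search over bit masks; 1 means the feature is kept."""
    def fitness(mask):
        # Hypothetical penalty of 0.01 per kept feature to favour smaller subsets.
        return validation_accuracy(mask, train, valid) - 0.01 * sum(mask)

    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]          # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)       # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]                 # bit-flip mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


rng = random.Random(0)
train, valid = make_data(200, rng), make_data(100, rng)
best = select_features(train, valid, n_features=5, rng=rng)
best_accuracy = validation_accuracy(best, train, valid)
print(best, best_accuracy)
```

Scoring each candidate subset on data held out from training is what lets the search penalize subsets that only fit the training set; in the thesis that role is played by the ANN with its validation step rather than the simple classifier assumed here.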
