Date of Award
12-2016
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Engineering and Sciences
First Advisor
Gregory Harrison
Second Advisor
John Barranti
Third Advisor
Ronaldo Menezes
Fourth Advisor
David Mutschler
Abstract
In data mining and machine learning systems, highly predictive feature attributes help researchers and data analysts classify the problem space effectively. This thesis presents a feature subset selection classifier system for data mining and machine learning. The system is tested on the PIMA Indian Diabetes Data and the Breiman Waveform Data Set. It collects the input features and processes them through an evolutionary search to reduce the feature space to an optimal feature set. Removing irrelevant attributes can decrease the complexity and processing cost of machine learning systems and increase the classification accuracy of data mining systems. The design approach is to apply a genetic algorithm (GA), which generates an initial output, and then use an artificial neural network (ANN) to process the feature input and find the optimal feature attributes. This architecture is called GBANN-V, for Genetically Based Artificial Neural Network with Validation. The validation step is used to speed up ANN training and reduce overfitting. In addition, step decay is added to mitigate the local minima problem associated with the gradient descent learning algorithm. Experimental results show that the GBANN-V system achieves equal or better accuracy when finding optimal feature subsets, while reducing the input feature dimensionality by up to 40%. Feature set reduction has positive implications for data mining and machine learning applications.
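The general idea described in the abstract, a genetic search over feature subsets scored by a gradient-descent learner with a validation check and step decay, can be illustrated with a minimal sketch. This is not the thesis implementation. A single logistic unit stands in for the ANN, validation-based early stopping is one assumed reading of the "validation step", and the decay schedule, the per-feature penalty, the GA settings, the toy data, and all function names are hypothetical choices made for illustration only.

```python
import math
import random

def step_decay(lr0, epoch, drop=0.5, every=10):
    """Assumed schedule: multiply the learning rate by `drop` every `every` epochs."""
    return lr0 * drop ** (epoch // every)

def predict(w, x, mask):
    """Logistic unit over the features selected by `mask`; w[-1] is the bias."""
    z = w[-1] + sum(wi * xi for wi, xi, m in zip(w, x, mask) if m)
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def accuracy(w, data, mask):
    return sum((predict(w, x, mask) >= 0.5) == (y == 1) for x, y in data) / len(data)

def train_with_validation(train, val, mask, lr0=0.5, epochs=60, patience=5):
    """SGD training that stops early once validation accuracy stops improving."""
    w = [0.0] * (len(mask) + 1)
    best_acc, stale = accuracy(w, val, mask), 0
    for epoch in range(epochs):
        lr = step_decay(lr0, epoch)
        for x, y in train:
            err = predict(w, x, mask) - y
            for i, m in enumerate(mask):
                if m:
                    w[i] -= lr * err * x[i]
            w[-1] -= lr * err
        acc = accuracy(w, val, mask)
        if acc > best_acc:
            best_acc, stale = acc, 0
        else:
            stale += 1
            if stale >= patience:
                break  # validation accuracy has stalled
    return best_acc

def fitness(mask, train, val):
    """Validation accuracy minus a small, assumed penalty per selected feature."""
    if not any(mask):
        return 0.0
    return train_with_validation(train, val, mask) - 0.01 * sum(mask)

def ga_select(train, val, n_features, pop_size=10, generations=6, p_mut=0.1, seed=0):
    """Bit-mask GA: truncation selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, train, val), reverse=True)
        parents = ranked[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]
            children.append([bit ^ (rng.random() < p_mut) for bit in child])
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, train, val))

def make_data(n, rng):
    """Toy data: the label depends on features 0 and 1; features 2 to 4 are noise."""
    rows = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(5)]
        rows.append((x, 1 if x[0] + x[1] > 0 else 0))
    return rows

rng = random.Random(42)
train, val = make_data(80, rng), make_data(40, rng)
best = ga_select(train, val, n_features=5)
print("selected feature mask:", best)
```

In this sketch the returned mask is a list of 0/1 flags, one per input feature. The per-feature penalty is what pushes the search toward smaller subsets when accuracy is otherwise similar; a real system would tune that trade-off and would evaluate the final subset on held-out test data that the search never saw.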
Recommended Citation
Worden, Eric W., "GBANN-V: Genetically Based Artificial Neural Network With Validation for Optimal Feature Selection and Data Mining" (2016). Theses and Dissertations. 735.
https://repository.fit.edu/etd/735