"Research is what I'm doing when I don't know what I'm doing." ~ Wernher von Braun

Optimal Feature Selection for Decision Trees Induction Using a Genetic Algorithm Wrapper - A Model Approach


The aim of this paper is to describe a model for classifying optimised subsets of data. Two algorithms, a genetic algorithm and a decision tree classifier, work together in a wrapper arrangement that selects features so as to decrease overfitting while maintaining accuracy. A wrapper method measures how useful a set of features is by the performance of the classifier trained on it. When large datasets are classified, the risk of overfitting is high. Thus, instead of classifying the whole dataset, a "smarter" approach classifies subsets of the data's attributes, called chromosomes, which are produced by a genetic algorithm. The genetic algorithm searches for the best chromosomes over a series of candidate populations called generations. It produces a large number of chromosomes, each containing a certain number of attributes, also called genes. The decision tree classifies each chromosome and assigns it a fitness number, which is the classification accuracy that the chromosome achieved. Only the strongest chromosomes pass on to the next generation. This method reduces the number of genes classified while reducing the risk of overfitting. At the end, the fittest chromosomes, that is, the fittest sets of genes or subsets of attributes, are presented. This method supports faster and more accurate decision making. The wrapper can be applied to digital marketing campaign metrics, analytics metrics, website ranking factors, content curation, keyword research, consumer/visitor behavior analysis and other areas of marketing and business interest.
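The loop described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Every name here (`build_tree`, `fitness`, `evolve`, `make_data`) is hypothetical, the small depth-limited tree only stands in for a full decision tree learner, and the choices of truncation selection, single-point crossover, bit-flip mutation and a held-out validation split are assumptions that the abstract does not specify.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, features, depth=2):
    """Tiny depth-limited binary tree that may split only on the selected features."""
    if depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            # Weighted Gini impurity of the two children (lower is better).
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority(labels)
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left], features, depth - 1),
            build_tree([rows[i] for i in right], [labels[i] for i in right], features, depth - 1))

def predict(tree, row):
    while isinstance(tree, tuple):  # internal nodes are tuples, leaves are labels
        f, t, low, high = tree
        tree = low if row[f] <= t else high
    return tree

def fitness(chromosome, train, valid):
    """Fitness = validation accuracy of a tree trained on the selected genes."""
    features = [i for i, bit in enumerate(chromosome) if bit]
    if not features:
        return 0.0
    (x_train, y_train), (x_valid, y_valid) = train, valid
    tree = build_tree(x_train, y_train, features)
    hits = sum(predict(tree, r) == y for r, y in zip(x_valid, y_valid))
    return hits / len(y_valid)

def evolve(train, valid, n_features, pop_size=12, generations=8, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda c: fitness(c, train, valid), reverse=True)
        survivors = ranked[:pop_size // 2]  # only the strongest chromosomes pass on
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = [bit ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = survivors + children
    best = max(pop, key=lambda c: fitness(c, train, valid))
    return best, fitness(best, train, valid)

def make_data(n, n_features, rng):
    """Synthetic data (an assumption for the demo): only feature 0 determines the label."""
    rows = [[rng.random() for _ in range(n_features)] for _ in range(n)]
    return rows, [int(r[0] > 0.5) for r in rows]

if __name__ == "__main__":
    rng = random.Random(1)
    train, valid = make_data(60, 5, rng), make_data(40, 5, rng)
    print(evolve(train, valid, n_features=5))
```

Each chromosome is a bit mask over the attributes, so a gene set to 1 means that attribute is offered to the tree. Measuring fitness on rows the tree did not train on is one way to keep the selection from rewarding overfitted feature subsets; cross-validation would be a reasonable alternative.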

Keywords: Decision trees, Genetic algorithm, Data classification, Data optimization, Overfitting, Classification accuracy, Chromosomes, Genes

Theodoridis P.K., Gkikas D.C. (2020). Optimal Feature Selection for Decision Trees Induction Using a Genetic Algorithm Wrapper - A Model Approach. In: Kavoura A., Kefallonitis E., Theodoridis P. (Eds.), Strategic Innovative Marketing and Tourism. Springer Proceedings in Business and Economics. 8th ICSIMAT Strategic Innovative Marketing and Tourism, Northern Aegean, Greece, 2019. Springer, Cham. https://doi.org/10.1007/978-3-030-36126-6_65

© 2024 Dimitris C. Gkikas. All Rights Reserved.