Publication: Supervised learning algorithms in the classification of plant populations with different degrees of kinship
Publisher
Soc Botanica Sao Paulo
Type
Article
Abstract
Population discrimination and the classification of individuals are of great importance for genetic improvement in population studies and genetic diversity conservation. Multivariate approaches are often used for these tasks, especially the Fisher and Anderson discriminant functions. New methodologies based on machine learning (ML) have shown promise for such procedures, but these methods still need further evaluation and comparison. The present study therefore evaluates the efficacy of supervised ML algorithms in classifying populations with different degrees of similarity, comparing them with the discriminant analysis techniques proposed by Anderson and by Fisher. The supervised ML methods tested were Naive Bayes, Decision Tree, k-Nearest Neighbors (kNN), Random Forest, Support Vector Machine (SVM), and Multi-layer Perceptron Neural Networks (MLP/ANN). To compare the classification methods, we used phenotypic data from populations with different degrees of genetic similarity. The data were obtained by simulating genotypic information for different populations subjected to a backcrossing scheme. The mean accuracies from 30 repetitions of each classification method were compared using the Friedman and Nemenyi tests at a 95% confidence level. Classification methods based on machine learning algorithms showed superior results to the Fisher and Anderson discriminant functions, achieving high accuracy even where similarity between populations was higher. The kNN, Random Forest, SVM, and Naive Bayes algorithms presented the highest accuracy, surpassing the Decision Tree algorithm and even MLP/ANN (which lost accuracy at the 96.88% similarity condition between populations). Thus, the present work confirms that ML techniques demonstrate greater accuracy in the discrimination and classification of populations, without the limitations of statistical techniques.
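The comparison step described in the abstract (accuracies from repeated runs, compared with the Friedman test and then the Nemenyi test) can be illustrated with a minimal sketch. This is not the authors' code. It assumes the accuracies are arranged as one row per repetition and one column per method, uses the basic Friedman chi-square statistic without tie correction, and leaves the Nemenyi critical value `q_alpha` to be supplied by the reader from a studentized range table. The function names and the toy numbers are hypothetical.

```python
import math


def average_ranks(row):
    """Rank one repetition's accuracies (rank 1 = highest); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of methods tied with the one at position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(acc):
    """acc[r][m] is the accuracy of method m in repetition r.

    Returns (chi2, mean_ranks) using the basic statistic (no tie correction).
    """
    n, k = len(acc), len(acc[0])
    rank_rows = [average_ranks(row) for row in acc]
    mean_ranks = [sum(r[m] for r in rank_rows) / n for m in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(R * R for R in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return chi2, mean_ranks


def nemenyi_critical_difference(k, n, q_alpha):
    """Two methods differ if their mean ranks differ by more than this value.

    q_alpha is an assumption supplied by the caller (from a table for k methods).
    """
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


# Toy data, invented for illustration only: 4 repetitions x 3 methods.
toy_acc = [[0.90, 0.80, 0.70]] * 4
chi2, mean_ranks = friedman_statistic(toy_acc)
```

In the study's setting the table would have 30 rows (repetitions) and one column per classification method. The Friedman statistic is compared against a chi-square distribution with k - 1 degrees of freedom, and the Nemenyi critical difference is applied to pairs of mean ranks only if the Friedman test rejects equality.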
Keywords
Classification methods, Genetic improvement, Machine learning, Similarity between populations
Language
English
How to cite
Brazilian Journal of Botany. Sao Paulo: Soc Botanica Sao Paulo, 9 p., 2021.