Supervised learning algorithms in the classification of plant populations with different degrees of kinship

Nenhuma Miniatura disponível

Data

2021-02-04

Orientador

Coorientador

Pós-graduação

Curso de graduação

Título da Revista

ISSN da Revista

Título de Volume

Editor

Soc Botanica Sao Paulo

Tipo

Artigo

Direito de acesso

Resumo

The population discrimination and the classification of individuals have great importance for genetic improvement in population studies and genetic diversity conservation. Furthermore, multivariate approaches are often used, especially the Fisher and Anderson discriminant functions. New methodologies based on machine learning (ML) have shown to be promising for such procedures, but there is nonetheless a need for further evaluation and comparison of these methods. Thus, the present study evaluates the efficacy of supervised ML algorithms in classifying populations with different degrees of similarity-comparing them with discriminant analysis techniques proposed by Anderson and by Fisher. The methods of supervised ML tested were as follows: Naive Bayes, Decision Tree, k-Nearest Neighbors (kNN), Random Forest, Support Vector Machine (SVM) and Multi-layer Perceptron Neural Networks (MLP/ANN). To compare classification methods, we used phenotypic data of populations with different degrees of genetic similarity. Data stemmed from the genotypic information simulation for different populations submitted to the backcrossing scheme. Accuracy here means 30 repetitions from each classification method were compared by the Friedman and Nemenyi tests with a 95% confidence level. Classification methods based on machine learning algorithms showed superior results to the Fisher and Anderson discriminant functions, obtaining high accuracy where there was a higher similarity between populations. The kNN, Random Forest, SVM and Naive Bayes algorithms presented the highest accuracy, surpassing the Decision Tree algorithm and even MLP/ANN (which lost accuracy at a 96.88% similarity condition between populations). Thus, the present work confirms that ML techniques demonstrate greater accuracy in the discrimination and classification of populations without the limitations of statistical techniques.

Descrição

Idioma

Inglês

Como citar

Brazilian Journal Of Botany. Sao Paulo: Soc Botanica Sao Paulo, 9 p., 2021.

Itens relacionados

Coleções