Publication: Supervised learning algorithms in the classification of plant populations with different degrees of kinship
dc.contributor.author | Skowronski, Leandro | |
dc.contributor.author | Moraes, Paula Martin de | |
dc.contributor.author | Teixeira de Moraes, Mario Luiz [UNESP] | |
dc.contributor.author | Goncalves, Wesley Nunes | |
dc.contributor.author | Constantino, Michel | |
dc.contributor.author | Costa, Celso Soares | |
dc.contributor.author | Fava, Wellington Santos | |
dc.contributor.author | Costa, Reginaldo B. | |
dc.contributor.institution | Univ Catolica Dom Bosco | |
dc.contributor.institution | Fed Univ Grande Dourados | |
dc.contributor.institution | Universidade Estadual Paulista (Unesp) | |
dc.contributor.institution | Universidade Federal de Mato Grosso do Sul (UFMS) | |
dc.contributor.institution | Fed Inst Educ Sci & Technol Mato Grosso do Sul | |
dc.date.accessioned | 2021-06-25T11:50:32Z | |
dc.date.available | 2021-06-25T11:50:32Z | |
dc.date.issued | 2021-02-04 | |
dc.description.abstract | Population discrimination and the classification of individuals are of great importance for genetic improvement in population studies and for genetic diversity conservation. Multivariate approaches are often used for these tasks, especially the Fisher and Anderson discriminant functions. New methodologies based on machine learning (ML) have shown promise for such procedures, but these methods still need further evaluation and comparison. Thus, the present study evaluates the efficacy of supervised ML algorithms in classifying populations with different degrees of similarity, comparing them with the discriminant analysis techniques proposed by Anderson and by Fisher. The supervised ML methods tested were: Naive Bayes, Decision Tree, k-Nearest Neighbors (kNN), Random Forest, Support Vector Machine (SVM) and Multi-layer Perceptron Neural Networks (MLP/ANN). To compare the classification methods, we used phenotypic data of populations with different degrees of genetic similarity. The data stemmed from the simulation of genotypic information for different populations submitted to a backcrossing scheme. Mean accuracies over 30 repetitions of each classification method were compared by the Friedman and Nemenyi tests at a 95% confidence level. Classification methods based on machine learning algorithms showed superior results to the Fisher and Anderson discriminant functions, obtaining high accuracy even where there was higher similarity between populations. The kNN, Random Forest, SVM and Naive Bayes algorithms presented the highest accuracy, surpassing the Decision Tree algorithm and even MLP/ANN (which lost accuracy at a 96.88% similarity condition between populations). Thus, the present work confirms that ML techniques demonstrate greater accuracy in the discrimination and classification of populations without the limitations of statistical techniques. | en |
dc.description.affiliation | Univ Catolica Dom Bosco, Campo Grande, MS, Brazil | |
dc.description.affiliation | Fed Univ Grande Dourados, Dourados, MS, Brazil | |
dc.description.affiliation | Paulista State Univ Julio de Mesquita Filho, Ilha Solteira, SP, Brazil | |
dc.description.affiliation | Univ Fed Mato Grosso do Sul, Campo Grande, MS, Brazil | |
dc.description.affiliation | Fed Inst Educ Sci & Technol Mato Grosso do Sul, Campo Grande, MS, Brazil | |
dc.description.affiliation | Univ Fed Mato Grosso do Sul, Inst Biosci, Lab Ecol & Evolutionary Biol, BR-79070900 Campo Grande, MS, Brazil | |
dc.description.affiliationUnesp | Paulista State Univ Julio de Mesquita Filho, Ilha Solteira, SP, Brazil | |
dc.description.sponsorship | Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) | |
dc.description.sponsorship | Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) | |
dc.description.sponsorshipId | CAPES: PNPD/CAPES 88882.315120/2019-01 | |
dc.description.sponsorshipId | CAPES: CNPq 301840/2016-4 | |
dc.format.extent | 9 | |
dc.identifier | http://dx.doi.org/10.1007/s40415-021-00703-1 | |
dc.identifier.citation | Brazilian Journal Of Botany. Sao Paulo: Soc Botanica Sao Paulo, 9 p., 2021. | |
dc.identifier.doi | 10.1007/s40415-021-00703-1 | |
dc.identifier.issn | 0100-8404 | |
dc.identifier.uri | http://hdl.handle.net/11449/209174 | |
dc.identifier.wos | WOS:000614671900001 | |
dc.language.iso | eng | |
dc.publisher | Soc Botanica Sao Paulo | |
dc.relation.ispartof | Brazilian Journal Of Botany | |
dc.source | Web of Science | |
dc.subject | Classification methods | |
dc.subject | Genetic improvement | |
dc.subject | Machine learning | |
dc.subject | Similarity between populations | |
dc.title | Supervised learning algorithms in the classification of plant populations with different degrees of kinship | en |
dc.type | Article | en |
dcterms.rightsHolder | Soc Botanica Sao Paulo | |
dspace.entity.type | Publication | |
unesp.author.orcid | 0000-0002-1076-9812[3] | |
unesp.author.orcid | 0000-0003-2570-0209[5] | |
unesp.author.orcid | 0000-0001-7040-7058[6] | |
unesp.author.orcid | 0000-0002-3608-0503[7] | |
unesp.campus | Universidade Estadual Paulista (UNESP), Faculdade de Engenharia, Ilha Solteira | pt |