Feature extraction approaches for biological sequences: A comparative study of mathematical features

dc.contributor.author: Bonidia, Robson P.
dc.contributor.author: Sampaio, Lucas D.H.
dc.contributor.author: Domingues, Douglas S. [UNESP]
dc.contributor.author: Paschoal, Alexandre R.
dc.contributor.author: Lopes, Fabrício M.
dc.contributor.author: de Carvalho, André C.P.L.F.
dc.contributor.author: Sanches, Danilo S.
dc.contributor.institution: Universidade de São Paulo (USP)
dc.contributor.institution: The Federal University of Technology - Paraná (UTFPR)
dc.contributor.institution: Universidade Estadual de Londrina (UEL)
dc.contributor.institution: Universidade Estadual Paulista (UNESP)
dc.contributor.institution: Universidade Federal do Paraná (UFPR)
dc.date.accessioned: 2022-04-28T19:45:14Z
dc.date.available: 2022-04-28T19:45:14Z
dc.date.issued: 2021-01-01
dc.description.abstract: As a consequence of the various genomic sequencing projects, an increasing volume of biological sequence data is being produced. Although machine learning algorithms have been successfully applied to a large number of genomic sequence-related problems, the results are largely affected by the type and number of features extracted. This effect has motivated new algorithms and pipeline proposals, mainly involving feature extraction problems, in which extracting significant discriminatory information from a biological set is challenging. Considering this, our work proposes a new study of feature extraction approaches based on mathematical features (numerical mapping with Fourier, entropy and complex networks). As a case study, we analyze long non-coding RNA (lncRNA) sequences. We separated this work into three studies. First, we assessed our proposal on the most addressed problem in our review, namely lncRNA versus mRNA; second, we validated the mathematical features in different classification problems, to predict the class of lncRNA, e.g. circular RNA sequences; third, we analyzed its robustness in scenarios with imbalanced data. The experimental results demonstrated three main contributions: first, an in-depth study of several mathematical features; second, a new feature extraction pipeline; and third, its high performance and robustness for distinct RNA sequence classification. (en)
dc.description.affiliation: Institute of Mathematics and Computer Sciences University of São Paulo - USP
dc.description.affiliation: The Federal University of Technology - Paraná (UTFPR)
dc.description.affiliation: The State University of Londrina
dc.description.affiliation: Polytechnic School The University of São Paulo
dc.description.affiliation: The São Paulo State University
dc.description.affiliation: The University of São Paulo
dc.description.affiliation: The Federal University of Paraná (UFPR)
dc.description.affiliation: The University of São Paulo (USP)
dc.description.affiliation: The Department of Computer Science University of São Paulo
dc.description.affiliationUnesp: The São Paulo State University
dc.description.sponsorship: Universidade Federal do Paraná
dc.description.sponsorship: Pró-Reitoria de Pesquisa, Universidade Federal do Rio Grande do Sul
dc.description.sponsorship: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
dc.description.sponsorshipId: Pró-Reitoria de Pesquisa, Universidade Federal do Rio Grande do Sul: 88887.144045/2017-00
dc.description.sponsorshipId: CAPES: PROEX-11919694/D
dc.identifier: http://dx.doi.org/10.1093/bib/bbab011
dc.identifier.citation: Briefings in Bioinformatics, v. 22, n. 5, 2021.
dc.identifier.doi: 10.1093/bib/bbab011
dc.identifier.issn: 1477-4054
dc.identifier.issn: 1467-5463
dc.identifier.scopus: 2-s2.0-85115965809
dc.identifier.uri: http://hdl.handle.net/11449/222520
dc.language.iso: eng
dc.relation.ispartof: Briefings in Bioinformatics
dc.source: Scopus
dc.subject: Biological sequences
dc.subject: Complex networks
dc.subject: Entropy
dc.subject: Feature extraction
dc.subject: Fourier
dc.subject: Numerical mapping
dc.title: Feature extraction approaches for biological sequences: A comparative study of mathematical features (en)
dc.type: Review
dspace.entity.type: Publication
