A machine learning based framework to identify and classify long terminal repeat retrotransposons

dc.contributor.author: Schietgat, Leander
dc.contributor.author: Vens, Celine
dc.contributor.author: Cerri, Ricardo
dc.contributor.author: Fischer, Carlos N. [UNESP]
dc.contributor.author: Costa, Eduardo
dc.contributor.author: Ramon, Jan
dc.contributor.author: Carareto, Claudia M. A. [UNESP]
dc.contributor.author: Blockeel, Hendrik
dc.contributor.institution: KU Leuven
dc.contributor.institution: KU Leuven Kulak
dc.contributor.institution: Ghent University and VIB Inflammation Research Center
dc.contributor.institution: Universidade Federal de São Carlos (UFSCar)
dc.contributor.institution: Universidade Estadual Paulista (Unesp)
dc.contributor.institution: Universidade de São Paulo (USP)
dc.contributor.institution: INRIA Lille Nord Europe
dc.date.accessioned: 2018-12-11T17:19:49Z
dc.date.available: 2018-12-11T17:19:49Z
dc.date.issued: 2018-04-01
dc.description.abstract: Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of TEs present in a genome is an important step towards understanding their effects on genes and their role in genome evolution. We introduce TE-Learner, a framework based on machine learning that automatically identifies TEs in a given genome and assigns a classification to them. We present an implementation of our framework towards LTR retrotransposons, a particular type of TE characterized by having long terminal repeats (LTRs) at their boundaries. We evaluate the predictive performance of our framework on the well-annotated genomes of Drosophila melanogaster and Arabidopsis thaliana and we compare our results for three LTR retrotransposon superfamilies with the results of three widely used methods for TE identification or classification: RepeatMasker, Censor and LtrDigest. In contrast to these methods, TE-Learner is the first to incorporate machine learning techniques, outperforming these methods in terms of predictive performance, while being able to learn models and make predictions efficiently. Moreover, we show that our method was able to identify TEs that none of the above methods could find, and we investigated TE-Learner’s predictions that did not correspond to an official annotation. It turns out that many of these predictions are in fact strongly homologous to a known TE. (en)
dc.description.affiliation: Department of Computer Science KU Leuven
dc.description.affiliation: Department of Public Health and Primary Care KU Leuven Kulak
dc.description.affiliation: Department of Respiratory Medicine Ghent University and VIB Inflammation Research Center
dc.description.affiliation: Department of Computer Science UFSCar Federal University of São Carlos
dc.description.affiliation: Department of Statistics Applied Mathematics and Computer Science UNESP São Paulo State University
dc.description.affiliation: Instituto de Ciências Matemáticas e de Computação Universidade de São Paulo
dc.description.affiliation: INRIA Lille Nord Europe, 40 avenue Halley
dc.description.affiliation: Department of Biology UNESP São Paulo State University São José do Rio Preto
dc.description.affiliationUnesp: Department of Statistics Applied Mathematics and Computer Science UNESP São Paulo State University
dc.description.affiliationUnesp: Department of Biology UNESP São Paulo State University São José do Rio Preto
dc.identifier: http://dx.doi.org/10.1371/journal.pcbi.1006097
dc.identifier.citation: PLoS Computational Biology, v. 14, n. 4, 2018.
dc.identifier.doi: 10.1371/journal.pcbi.1006097
dc.identifier.file: 2-s2.0-85046367727.pdf
dc.identifier.issn: 1553-7358
dc.identifier.issn: 1553-734X
dc.identifier.lattes: 3425772998319216
dc.identifier.orcid: 0000-0002-0298-1354
dc.identifier.scopus: 2-s2.0-85046367727
dc.identifier.uri: http://hdl.handle.net/11449/176256
dc.language.iso: eng
dc.relation.ispartof: PLoS Computational Biology
dc.relation.ispartofsjr: 3,097
dc.rights.accessRights: Open access
dc.source: Scopus
dc.title: A machine learning based framework to identify and classify long terminal repeat retrotransposons (en)
dc.type: Article
dspace.entity.type: Publication
unesp.author.lattes: 1858554355077119[4]
unesp.author.lattes: 3425772998319216[7]
unesp.author.orcid: 0000-0003-0983-256X[2]
unesp.author.orcid: 0000-0002-2582-1695[3]
unesp.author.orcid: 0000-0002-5598-6263[4]
unesp.author.orcid: 0000-0003-0378-3699[8]
unesp.author.orcid: 0000-0002-0298-1354[7]
unesp.campus: Universidade Estadual Paulista (UNESP), Instituto de Biociências Letras e Ciências Exatas, São José do Rio Preto (pt)
unesp.department: Biologia - IBILCE (pt)

Files

Original bundle

Name: 2-s2.0-85046367727.pdf
Size: 2.38 MB
Format: Adobe Portable Document Format