Optimization of algorithm to identification of duplicate tuples through similarity phonetic based on multithreading

dc.contributor.author: De Andrade, Tiago Luís [UNESP]
dc.contributor.author: De Souza, Rogéria Cristiane Gratão [UNESP]
dc.contributor.author: Babini, Maurizio [UNESP]
dc.contributor.author: Valêncio, Carlos Roberto [UNESP]
dc.contributor.institution: Universidade Estadual Paulista (Unesp)
dc.date.accessioned: 2014-05-27T11:26:14Z
dc.date.available: 2014-05-27T11:26:14Z
dc.date.issued: 2011-12-01
dc.description.abstract: To ensure greater reliability and consistency of the data stored in a database, the data cleaning stage is placed early in the Knowledge Discovery in Databases (KDD) process and is responsible for eliminating problems and adjusting the data for the later stages, especially the data mining stage. Such problems occur at both the instance and schema levels and include missing values, null values, duplicate tuples, and values outside the domain, among others. Several algorithms have been developed to perform the cleaning step in databases; some of them were designed specifically to work with the phonetics of words, since a word can be written in different ways. Within this perspective, this work presents as its original contribution an optimization of an algorithm for detecting duplicate tuples in databases through phonetic similarity, based on multithreading, without the need for training data, together with a language-independent environment to support it. © 2011 IEEE. (language: en)
dc.description.affiliation: Depto. de Ciências de Computação e Estatística Universidade Estadual Paulista - Unesp, São José do Rio Preto
dc.description.affiliation: Departamento de Letras Modernas Universidade Estadual Paulista - Unesp, São José do Rio Preto
dc.description.affiliationUnesp: Depto. de Ciências de Computação e Estatística Universidade Estadual Paulista - Unesp, São José do Rio Preto
dc.description.affiliationUnesp: Departamento de Letras Modernas Universidade Estadual Paulista - Unesp, São José do Rio Preto
dc.format.extent: 299-304
dc.identifier: http://dx.doi.org/10.1109/PDCAT.2011.58
dc.identifier.citation: Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings, p. 299-304.
dc.identifier.doi: 10.1109/PDCAT.2011.58
dc.identifier.lattes: 4644812253875832
dc.identifier.lattes: 4035066471503413
dc.identifier.lattes: 5914651754517864
dc.identifier.orcid: 0000-0002-9325-3159
dc.identifier.orcid: 0000-0002-7449-9022
dc.identifier.scopus: 2-s2.0-84856660893
dc.identifier.uri: http://hdl.handle.net/11449/72860
dc.language.iso: eng
dc.relation.ispartof: Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings
dc.rights.accessRights: Open access
dc.source: Scopus
dc.subject: Algorithm
dc.subject: Data cleansing
dc.subject: Duplicated tuples
dc.subject: Data cleaning
dc.subject: Knowledge discovery in database
dc.subject: Missing values
dc.subject: Multi-threading
dc.subject: Null value
dc.subject: Database systems
dc.subject: Linguistics
dc.subject: Optimization
dc.subject: Algorithms
dc.title: Optimization of algorithm to identification of duplicate tuples through similarity phonetic based on multithreading (language: en)
dc.type: Conference paper
dcterms.license: http://www.ieee.org/publications_standards/publications/rights/rights_policies.html
unesp.author.lattes: 4644812253875832[4]
unesp.author.lattes: 4035066471503413
unesp.author.lattes: 5914651754517864[2]
unesp.author.orcid: 0000-0002-9325-3159[4]
unesp.author.orcid: 0000-0002-7449-9022[2]
unesp.campus: Universidade Estadual Paulista (Unesp), Instituto de Biociências, Letras e Ciências Exatas, São José do Rio Preto (language: pt)
unesp.department: Ciências da Computação e Estatística - IBILCE (language: pt)
unesp.department: Letras Modernas - IBILCE (language: pt)
