Optimization of algorithm to identification of duplicate tuples through similarity phonetic based on multithreading

De Andrade, Tiago Luís [UNESP]; De Souza, Rogéria Cristiane Gratão [UNESP]; Babini, Maurizio [UNESP]; Valêncio, Carlos Roberto [UNESP]

Date

2011-12-01

Authors

De Andrade, Tiago Luís

De Souza, Rogéria Cristiane Gratão

Babini, Maurizio

Valêncio, Carlos Roberto

Type

Conference paper

Access rights

Open access

Abstract

To ensure greater reliability and consistency of the data stored in a database, the data cleaning stage is placed early in the Knowledge Discovery in Databases (KDD) process and is responsible for eliminating problems and adjusting the data for the later stages, especially data mining. Such problems occur at both the instance level and the schema level and include missing values, null values, duplicate tuples, and out-of-domain values, among others. Several algorithms have been developed to perform the cleaning step in databases; some of them were designed specifically to work with the phonetics of words, since a word can be written in different ways. Within this perspective, this work presents, as its original contribution, an optimization of an algorithm for detecting duplicate tuples in databases through phonetic similarity, based on multithreading and requiring no training data, together with a language-independent environment to support it. © 2011 IEEE.
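The abstract does not give the phonetic rules or the threading design, so the following is only a minimal sketch of the general idea, under stated assumptions: classic American Soundex stands in for the phonetic encoding (it is English-oriented and is not the paper's algorithm), accented letters are simply ignored, and the names `soundex`, `find_phonetic_duplicates`, `rows`, `field`, and `workers` are hypothetical. Tuples whose chosen field yields the same phonetic key are reported as duplicate candidates, and the keys are computed by a pool of threads.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Standard American Soundex digit groups. Assumption: Soundex is a stand-in
# phonetic code here, not the encoding used in the paper.
_CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def soundex(word: str) -> str:
    """Return the 4-character Soundex key of `word` ("" if it has no A-Z letters)."""
    # Simplification: characters outside A-Z (including accented letters) are dropped.
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            key += code
        if c not in "HW":  # H and W do not separate letters with equal codes
            prev = code
    return (key + "000")[:4]  # pad with zeros, then truncate to 4 characters


def find_phonetic_duplicates(rows, field, workers=4):
    """Group the indices of rows whose `field` values share a phonetic key."""

    def key_of(row):
        # Encode each whitespace-separated token of the field separately.
        return " ".join(soundex(tok) for tok in str(row[field]).split())

    # Compute the keys concurrently; pool.map preserves the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        keys = list(pool.map(key_of, rows))

    groups = defaultdict(list)
    for index, key in enumerate(keys):
        groups[key].append(index)
    # Only groups with more than one row are duplicate candidates.
    return [indices for indices in groups.values() if len(indices) > 1]


if __name__ == "__main__":
    sample = [
        {"name": "Robert Smith"},
        {"name": "Rupert Smyth"},
        {"name": "Maria Souza"},
        {"name": "Maria Sousa"},
        {"name": "Carlos Lima"},
    ]
    print(find_phonetic_duplicates(sample, "name"))
```

In CPython, threads do not run pure-Python code in parallel, so this sketch shows the structure of the approach rather than a speedup; a real gain would need the per-tuple work to release the interpreter lock (for example, database I/O) or a process pool. Matching on exact key equality needs no training data, which is consistent with the abstract's claim, but it is only one possible reading of "phonetic similarity".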

Keywords

Algorithm, Data cleansing, Duplicated tuples, Data cleaning, Knowledge discovery in database, Missing values, Multi-threading, Null value, Database systems, Linguistics, Optimization, Algorithms

Language

English

How to cite

Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings, p. 299-304.

URI

http://hdl.handle.net/11449/72860

Collections

São José do Rio Preto - IBILCE - Instituto de Biociências, Letras e Ciências Exatas
