Visualizing the document pre-processing effects in text mining process

Eler, Danilo Medeiros [UNESP]; Pola, Ives Renê Venturini [UNESP]; Garcia, Rogério Eduardo [UNESP]; Teixeira, Jaqueline Batista Martins [UNESP]

Date

2018-01-01

Authors

Eler, Danilo Medeiros

Pola, Ives Renê Venturini

Garcia, Rogério Eduardo

Teixeira, Jaqueline Batista Martins

Type

Conference paper

Access rights

Open access

Abstract

Text mining is an important step to categorize textual data by using data mining techniques. As most obtained textual data is unstructured, it needs to be processed before applying mining algorithms; that process is known as the pre-processing step of the overall text mining process. The pre-processing step has an important impact on mining. This paper aims at providing a detailed analysis of document pre-processing when employing multidimensional projection techniques to generate graphical representations of vector space models, which are computed from eight combinations of three steps: stemming, term weighting and term elimination based on a low-frequency cut. Experiments were conducted to show that the visual approach is useful for perceiving the effects of processing on document similarities and group formation (i.e., cohesion and separation). Additionally, quality measures were computed from the graphical representations and compared with the classification rates of k-Nearest Neighbor and Naive Bayes classifiers; the results highlight the importance of the pre-processing step in text mining.
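The eight combinations mentioned in the abstract can be read as switching each of the three steps on or off. The following is a minimal sketch of that idea, not the authors' implementation: the toy corpus, the `crude_stem` suffix stripper (a stand-in for a real stemmer), the TF-IDF formula, the `min_df` threshold and all function names are assumptions made for illustration. It builds one vector space model per configuration and reports the cosine similarity of two documents, so the effect of each configuration on document similarity can be observed.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy corpus; the collections used in the paper are not reproduced here.
DOCS = [
    "mining text documents",
    "text mining of documents and mined texts",
    "projection of points",
    "projecting points with projections",
]


def crude_stem(token):
    """Strip one common English suffix (an illustrative stand-in, not a real stemmer)."""
    for suffix in ("ions", "ing", "ion", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_vsm(docs, stem, weight, cut, min_df=2):
    """Return (vocabulary, document vectors) for one pre-processing configuration."""
    tokenized = [doc.lower().split() for doc in docs]
    if stem:
        tokenized = [[crude_stem(t) for t in tokens] for tokens in tokenized]
    counts = [Counter(tokens) for tokens in tokenized]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for c in counts for term in c)
    # Low-frequency cut (assumed form): drop terms occurring in fewer than min_df documents.
    vocab = sorted(t for t in df if not cut or df[t] >= min_df)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        if weight:
            # Assumed weighting: raw term frequency times log inverse document frequency.
            vectors.append([c[t] * math.log(n_docs / df[t]) for t in vocab])
        else:
            # No weighting: raw term frequency.
            vectors.append([float(c[t]) for t in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# All on/off settings of (stemming, term weighting, low-frequency cut): 2^3 = 8.
CONFIGS = list(product([False, True], repeat=3))

for stem, weight, cut in CONFIGS:
    vocab, vectors = build_vsm(DOCS, stem, weight, cut)
    sim = cosine(vectors[2], vectors[3])
    print(f"stem={stem!s:5} weight={weight!s:5} cut={cut!s:5} "
          f"terms={len(vocab):2d} sim(doc2, doc3)={sim:.3f}")
```

In a fuller pipeline, the vectors from each configuration would be passed to a multidimensional projection technique and to the classifiers; those stages are omitted here because the abstract does not specify them.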

Keywords

Document pre-processing, Document similarity, Multidimensional projection, Text mining, Visualization

Language

English

How to cite

Advances in Intelligent Systems and Computing, v. 558, p. 485-491.

URI

http://hdl.handle.net/11449/176206

Funders

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Collections

Presidente Prudente - FCT - Faculdade de Ciências e Tecnologia
