Visualizing the document pre-processing effects in text mining process

dc.contributor.authorEler, Danilo Medeiros [UNESP]
dc.contributor.authorPola, Ives Renê Venturini [UNESP]
dc.contributor.authorGarcia, Rogério Eduardo [UNESP]
dc.contributor.authorTeixeira, Jaqueline Batista Martins [UNESP]
dc.contributor.institutionUniversidade Estadual Paulista (Unesp)
dc.date.accessioned2018-12-11T17:19:35Z
dc.date.available2018-12-11T17:19:35Z
dc.date.issued2018-01-01
dc.description.abstractText mining is an important step to categorize textual data by using data mining techniques. As most obtained textual data is unstructured, it needs to be processed before applying mining algorithms; that process is known as the pre-processing step in the overall text mining process. The pre-processing step has an important impact on mining. This paper aims at providing a detailed analysis of document pre-processing when employing multidimensional projection techniques to generate graphical representations of vector space models, which are computed from eight combinations of three steps: stemming, term weighting and term elimination based on a low-frequency cut. Experiments were conducted to show that the visual approach is useful for perceiving the effects of pre-processing on document similarities and group formation (i.e., cohesion and separation). Additionally, quality measures were computed from the graphical representations and compared with the classification rates of k-Nearest Neighbor and Naive Bayes classifiers; the results highlight the importance of the pre-processing step in text mining.en
dc.description.affiliationFaculdade de Ciências e Tecnologia Departamento de Matemática e Computação UNESP – Universidade Estadual Paulista
dc.description.affiliationUnespFaculdade de Ciências e Tecnologia Departamento de Matemática e Computação UNESP – Universidade Estadual Paulista
dc.description.sponsorshipFundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
dc.description.sponsorshipIdFAPESP: # 2013/03452-0
dc.format.extent485-491
dc.identifierhttp://dx.doi.org/10.1007/978-3-319-54978-1_62
dc.identifier.citationAdvances in Intelligent Systems and Computing, v. 558, p. 485-491.
dc.identifier.doi10.1007/978-3-319-54978-1_62
dc.identifier.issn2194-5357
dc.identifier.lattes8031012573259361
dc.identifier.orcid0000-0003-1248-528X
dc.identifier.scopus2-s2.0-85045731840
dc.identifier.urihttp://hdl.handle.net/11449/176206
dc.language.isoeng
dc.relation.ispartofAdvances in Intelligent Systems and Computing
dc.rights.accessRightsOpen access
dc.sourceScopus
dc.subjectDocument pre-processing
dc.subjectDocument similarity
dc.subjectMultidimensional projection
dc.subjectText mining
dc.subjectVisualization
dc.titleVisualizing the document pre-processing effects in text mining processen
dc.typeConference paper
unesp.author.lattes8031012573259361[3]
unesp.author.orcid0000-0003-1248-528X[3]
unesp.departmentMatemática e Computação - FCTpt
