Analysis of document pre-processing effects in text and opinion mining

dc.contributor.author: Eler, Danilo Medeiros [UNESP]
dc.contributor.author: Grosa, Denilson [UNESP]
dc.contributor.author: Pola, Ives
dc.contributor.author: Garcia, Rogério [UNESP]
dc.contributor.author: Correia, Ronaldo [UNESP]
dc.contributor.author: Teixeira, Jaqueline [UNESP]
dc.contributor.institution: Universidade Estadual Paulista (Unesp)
dc.contributor.institution: University of Technology-UTFPR
dc.date.accessioned: 2018-12-11T17:36:46Z
dc.date.available: 2018-12-11T17:36:46Z
dc.date.issued: 2018-04-20
dc.description.abstract[en]: Typically, textual information is available as unstructured data, which must be processed before data mining algorithms can handle it; this processing is known as the pre-processing step of the overall text mining process. This paper analyzes the strong impact that the pre-processing step has on most mining tasks. To this end, we propose a methodology that varies distinct combinations of pre-processing steps and analyzes which combination yields high precision. To illustrate different combinations of pre-processing methods, we performed experiments comparing combinations of stemming, term weighting, term elimination based on a low-frequency cut, and stop-word elimination. These combinations were applied to text and opinion mining tasks, and correct classification rates were computed to highlight the strong impact of the pre-processing combinations. Additionally, we provide graphical representations of each pre-processing combination to show how visual approaches can reveal the effects of pre-processing on document similarities and group formation (i.e., cohesion and separation).
dc.description.affiliation: Departamento de Matemática e Computação, São Paulo State University-UNESP
dc.description.affiliation: Departamento de Informática, University of Technology-UTFPR
dc.description.affiliationUnesp: Departamento de Matemática e Computação, São Paulo State University-UNESP
dc.description.sponsorship: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
dc.description.sponsorshipId: FAPESP: 2013/03452-0
dc.identifier: http://dx.doi.org/10.3390/info9040100
dc.identifier.citation: Information (Switzerland), v. 9, n. 4, 2018.
dc.identifier.doi: 10.3390/info9040100
dc.identifier.file: 2-s2.0-85045734307.pdf
dc.identifier.issn: 2078-2489
dc.identifier.scopus: 2-s2.0-85045734307
dc.identifier.uri: http://hdl.handle.net/11449/179792
dc.language.iso: eng
dc.relation.ispartof: Information (Switzerland)
dc.relation.ispartofsjr: 0.222
dc.rights.accessRights: Open access
dc.source: Scopus
dc.subject: Document pre-processing
dc.subject: Document similarity
dc.subject: Multidimensional projection
dc.subject: Opinion mining
dc.subject: Sentiment analysis
dc.subject: Text mining
dc.subject: Visualization
dc.title[en]: Analysis of document pre-processing effects in text and opinion mining
dc.type: Article
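The abstract above describes varying combinations of pre-processing steps (stop-word elimination, stemming, a low-frequency cut, and term weighting) and comparing the resulting correct classification rates. The Python sketch below illustrates that kind of experiment grid under clearly stated assumptions: the four-document corpus, the stop-word list, and the crude `toy_stem` suffix stripper (a stand-in for a real stemmer such as Porter's) are all hypothetical; the low-frequency cut is assumed to be a minimum document frequency `min_df`; term weighting is assumed to be either raw term frequency or tf-idf; and leave-one-out 1-nearest-neighbour classification under cosine similarity is an assumed classifier. It is not the authors' implementation and does not reproduce their results.

```python
import math
import re
from collections import Counter
from itertools import product

# Hypothetical toy corpus of (text, label) pairs, invented for illustration only.
CORPUS = [
    ("The camera is great and the pictures are amazing", "pos"),
    ("Great battery, amazing screen, loved the camera", "pos"),
    ("Terrible battery and the screen stopped working", "neg"),
    ("The camera broke, terrible support, awful experience", "neg"),
]

# Hypothetical minimal stop-word list; real lists are much longer.
STOP_WORDS = {"the", "is", "and", "are", "a", "an", "of"}


def toy_stem(token):
    """Crude suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text, remove_stop, stem):
    """Lower-case, extract runs of letters, then optionally drop stop words and stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stop:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return tokens


def vectorize(docs, min_df, tfidf):
    """Sparse term vectors; terms occurring in fewer than min_df documents are cut."""
    df = Counter(term for doc in docs for term in set(doc))
    vocab = {t for t, n in df.items() if n >= min_df}
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(t for t in doc if t in vocab)
        if tfidf:
            # tf-idf with idf = log(N / df); terms present in every document get weight 0.
            vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
        else:
            vectors.append(dict(tf))  # raw term frequency
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def loo_1nn_accuracy(vectors, labels):
    """Leave-one-out 1-nearest-neighbour correct classification rate."""
    correct = 0
    for i, vi in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        nearest = max(others, key=lambda j: cosine(vi, vectors[j]))
        correct += labels[nearest] == labels[i]
    return correct / len(vectors)


def evaluate_all(corpus):
    """Score every combination of the four pre-processing switches."""
    texts, labels = zip(*corpus)
    results = {}
    grid = product((False, True), (False, True), (1, 2), (False, True))
    for remove_stop, stem, min_df, tfidf in grid:
        docs = [tokenize(t, remove_stop, stem) for t in texts]
        vectors = vectorize(docs, min_df, tfidf)
        results[(remove_stop, stem, min_df, tfidf)] = loo_1nn_accuracy(vectors, labels)
    return results


if __name__ == "__main__":
    ranked = sorted(evaluate_all(CORPUS).items(), key=lambda kv: -kv[1])
    for (stop, stem, min_df, tfidf), acc in ranked:
        print(f"stop={stop!s:5} stem={stem!s:5} min_df={min_df} tfidf={tfidf!s:5} acc={acc:.2f}")
```

Running the script prints all 16 configurations ranked by accuracy. On a corpus this small the numbers only demonstrate the mechanics of the combination grid; they say nothing about which pre-processing choice is better in practice.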

Files

Original bundle
Name: 2-s2.0-85045734307.pdf
Size: 1.38 MB
Format: Adobe Portable Document Format