Analysis of document pre-processing effects in text and opinion mining
Textual information is typically available as unstructured data, which must be processed before data mining algorithms can handle it; this processing is known as the pre-processing step of the overall text mining process. This paper analyzes the strong impact that the pre-processing step has on most mining tasks. We propose a methodology that varies distinct combinations of pre-processing steps and analyzes which combination yields high precision. Experiments compared combinations of methods such as stemming, term weighting, term elimination based on a low-frequency cut, and stop-word elimination. These combinations were applied to text and opinion mining tasks, and correct classification rates were computed to highlight the strong impact of the pre-processing combinations. Additionally, we provide graphical representations of each pre-processing combination to show how visual approaches help reveal the effects of pre-processing on document similarities and group formation (i.e., cohesion and separation).
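The idea of varying pre-processing combinations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the suffix-stripping `crude_stem` function, the `preprocess` signature, the term-weighting formula, and the four toy documents are all assumptions made for illustration. The sketch enumerates every on/off combination of stemming, stop-word elimination, low-frequency cut, and term weighting, and records the resulting vocabulary size for each.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy stop-word list; a real study would use a fuller one.
STOP_WORDS = {"the", "a", "is", "was", "and", "of", "this", "it"}


def crude_stem(token):
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(docs, stem, remove_stop, min_df, tfidf):
    """Build sparse term-weight dicts for one pre-processing combination."""
    tokenized = []
    for doc in docs:
        tokens = doc.lower().split()
        if remove_stop:
            tokens = [t for t in tokens if t not in STOP_WORDS]
        if stem:
            tokens = [crude_stem(t) for t in tokens]
        tokenized.append(tokens)

    # Low-frequency cut: keep terms found in at least min_df documents.
    df = Counter(t for tokens in tokenized for t in set(tokens))
    vocab = {t for t, n in df.items() if n >= min_df}

    vectors = []
    for tokens in tokenized:
        tf = Counter(t for t in tokens if t in vocab)
        if tfidf:
            # One common TF-IDF variant (assumed here): tf * (ln(N / df) + 1).
            vec = {t: c * (math.log(len(docs) / df[t]) + 1.0) for t, c in tf.items()}
        else:
            vec = dict(tf)  # raw term counts
        vectors.append(vec)
    return vectors, vocab


# Toy corpus, invented for this example.
docs = [
    "the movie was amazing",
    "amazing acting and a moving plot",
    "the plot was boring",
    "boring movies bored the audience",
]

# Vocabulary size for every (stem, remove_stop, min_df, tfidf) combination.
results = {}
for stem, remove_stop, min_df, tfidf in product(
    [False, True], [False, True], [1, 2], [False, True]
):
    _, vocab = preprocess(docs, stem, remove_stop, min_df, tfidf)
    results[(stem, remove_stop, min_df, tfidf)] = len(vocab)

for combo, size in sorted(results.items()):
    print(combo, "vocabulary size:", size)
```

In a full experiment, the vectors returned by `preprocess` would be passed to a classifier, and the correct classification rate would be compared across combinations; vocabulary size is used here only because it needs no labeled data.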