Latent association rule cluster based model to extract topics for classification and recommendation applications

Santos, Fabiano Fernandes dos; Domingues, Marcos Aurelio; Sundermann, Camila Vaccari; Carvalho, Veronica Oliveira de [UNESP]; Moura, Maria Fernanda; Rezende, Solange Oliveira

doi:10.1016/j.eswa.2018.06.021

dc.contributor.author	Santos, Fabiano Fernandes dos
dc.contributor.author	Domingues, Marcos Aurelio
dc.contributor.author	Sundermann, Camila Vaccari
dc.contributor.author	Carvalho, Veronica Oliveira de [UNESP]
dc.contributor.author	Moura, Maria Fernanda
dc.contributor.author	Rezende, Solange Oliveira
dc.contributor.institution	Universidade de São Paulo (USP)
dc.contributor.institution	Universidade Estadual de Maringá (UEM)
dc.contributor.institution	Universidade Estadual Paulista (Unesp)
dc.contributor.institution	Empresa Brasileira de Pesquisa Agropecuária (EMBRAPA)
dc.date.accessioned	2019-10-04T13:28:17Z
dc.date.available	2019-10-04T13:28:17Z
dc.date.issued	2018-12-01
dc.description.abstract	The quality of any text mining technique is highly dependent on the features that are used to represent the document collection. A classical form of document representation is the vector space model (VSM), according to which the documents are represented as vectors of weights that correspond to the features of the documents. The bag-of-words model is the most popular VSM approach due to its simplicity and general applicability, but this model does not include term dependency and has a high dimensionality. In the literature, several models for document representation have been proposed in order to capture the dependency of terms. Among them, the topic model representation is one of the most interesting approaches, since it describes the collection of documents in a way that reveals their internal structure and the interrelationships therein, and also provides a dimensionality reduction. However, even for topic models, the efficient extraction of information concerning the relations among terms for document representation is still a major research challenge. In order to address this issue, we proposed the latent association rule cluster based model (LARCM). The LARCM is a non-probabilistic topic model that makes use of association rule clustering to build a document representation with low dimensionality in such a way that each feature (i.e., topic) is comprised of information concerning relations among the terms. We evaluated the interpretability of the topics obtained by using our proposed model against the ones provided by the traditional latent Dirichlet allocation (LDA) model and the LDA model using a document representation that includes correlated terms (i.e., bag-of-related-words). The experimental results indicated that the LARCM provides topics with better interpretability than the LDA models. Additionally, we used the topics obtained by the LARCM in two different applications: text classification and page recommendation. With respect to text classification, the topics were used to improve document collection representation. Concerning page recommendation, topics were used as contextual information in context-aware recommender systems. Results have shown that the topics provided by the LARCM can be used to improve both applications. © 2018 Elsevier Ltd. All rights reserved.	en
dc.description.affiliation	Univ Sao Paulo, Inst Math & Comp Sci, Ave Trabalhador Sao Carlense 400, BR-13566590 Sao Carlos, SP, Brazil
dc.description.affiliation	Univ Estadual Maringa, Dept Informat, Ave Colombo, BR-87020900 Maringa, Parana, Brazil
dc.description.affiliation	State Univ Sao Paulo, Inst Geosci & Exact Sci, 24 A, BR-13506900 Rio Claro, SP, Brazil
dc.description.affiliation	Embrapa Agr Informat, Ave Dr Andre Tosello, BR-13083886 Campinas, SP, Brazil
dc.description.affiliationUnesp	State Univ Sao Paulo, Inst Geosci & Exact Sci, 24 A, BR-13506900 Rio Claro, SP, Brazil
dc.description.sponsorship	Araucaria Foundation (Parana/Brazil)
dc.description.sponsorship	Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
dc.description.sponsorship	Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
dc.format.extent	34-60
dc.identifier	http://dx.doi.org/10.1016/j.eswa.2018.06.021
dc.identifier.citation	Expert Systems With Applications. Oxford: Pergamon-elsevier Science Ltd, v. 112, p. 34-60, 2018.
dc.identifier.doi	10.1016/j.eswa.2018.06.021
dc.identifier.issn	0957-4174
dc.identifier.uri	http://hdl.handle.net/11449/186232
dc.identifier.wos	WOS:000442708600003
dc.language.iso	eng
dc.publisher	Elsevier B.V.
dc.relation.ispartof	Expert Systems With Applications
dc.rights.accessRights	Open access
dc.source	Web of Science
dc.subject	Document representation
dc.subject	Topic model
dc.subject	Association rules
dc.subject	Clustering
dc.subject	Text classification
dc.subject	Context-aware recommender systems
dc.title	Latent association rule cluster based model to extract topics for classification and recommendation applications	en
dc.type	Article
dcterms.license	http://www.elsevier.com/about/open-access/open-access-policies/article-posting-policy
dcterms.rightsHolder	Elsevier B.V.
dspace.entity.type	Publication
unesp.author.orcid	0000-0002-8552-6655[3]
unesp.campus	Universidade Estadual Paulista (UNESP), Instituto de Geociências e Ciências Exatas, Rio Claro	pt
unesp.department	Estatística, Matemática Aplicada e Computação - IGCE	pt

Collections

Rio Claro - IGCE - Instituto de Geociências e Ciências Exatas
