SMS Spam Detection Through Skip-gram Embeddings and Shallow Networks
dc.contributor.author | de Sousa, Gustavo José [UNESP] | |
dc.contributor.author | Pedronette, Daniel Carlos Guimarães [UNESP] | |
dc.contributor.author | Papa, João Paulo [UNESP] | |
dc.contributor.author | Guilherme, Ivan Rizzo [UNESP] | |
dc.contributor.institution | Universidade Estadual Paulista (UNESP) | |
dc.date.accessioned | 2022-05-01T13:41:25Z | |
dc.date.available | 2022-05-01T13:41:25Z | |
dc.date.issued | 2021-01-01 | |
dc.description.abstract | The drastic decrease in mobile SMS costs has left phone users more exposed to spam messages, which usually carry unwanted marketing or questionable content. As such, researchers have proposed different methods for detecting SMS spam messages. This paper presents a technique for embedding SMS messages into vector spaces that is suitable for spam detection. The proposed approach relies on mining patterns that are relevant for distinguishing spam from legitimate messages. A subset of those patterns is used to construct a function that maps text messages into a multidimensional vector space. The extracted patterns are represented as skip-grams of token attributes, where a skip-gram can be seen as a generalization of the n-gram model that allows a distance greater than one between matched tokens in the text. We evaluate the proposed approach by using the generated vectors for spam classification on the UCI Spam Collection dataset. The experiments showed that our method, combined with shallow networks, reached accuracy competitive with state-of-the-art approaches. | en |
dc.description.affiliation | São Paulo State University - UNESP | |
dc.description.affiliationUnesp | São Paulo State University - UNESP | |
dc.description.sponsorship | Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) | |
dc.description.sponsorship | Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) | |
dc.description.sponsorshipId | FAPESP: 2014/12236-1 | |
dc.description.sponsorshipId | FAPESP: 2018/15597-6 | |
dc.description.sponsorshipId | FAPESP: 2019/07665-4 | |
dc.description.sponsorshipId | CNPq: 307066/2017-7 | |
dc.description.sponsorshipId | CNPq: 309439/2020-5 | |
dc.format.extent | 4193-4201 | |
dc.identifier.citation | Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, p. 4193-4201. | |
dc.identifier.scopus | 2-s2.0-85123915540 | |
dc.identifier.uri | http://hdl.handle.net/11449/234090 | |
dc.language.iso | eng | |
dc.relation.ispartof | Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 | |
dc.source | Scopus | |
dc.title | SMS Spam Detection Through Skip-gram Embeddings and Shallow Networks | en |
dc.type | Conference paper (Trabalho apresentado em evento) | pt |
dspace.entity.type | Publication | |
relation.isOrgUnitOfPublication | aef1f5df-a00f-45f4-b366-6926b097829b | |
relation.isOrgUnitOfPublication.latestForDiscovery | aef1f5df-a00f-45f4-b366-6926b097829b | |
unesp.campus | Universidade Estadual Paulista (UNESP), Faculdade de Ciências, Bauru | pt |
unesp.campus | Universidade Estadual Paulista (UNESP), Instituto de Geociências e Ciências Exatas, Rio Claro | pt |
unesp.department | Computação - FC | pt |
unesp.department | Estatística, Matemática Aplicada e Computação - IGCE | pt |
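
The abstract describes a skip-gram as a generalization of the n-gram model that allows a distance greater than one between matched tokens. The following is a minimal sketch of that general idea only, operating on plain word tokens; it is not the pattern-mining or embedding procedure of the paper, which works on token attributes. The function name `skip_grams`, its parameters `n` and `max_skip`, and the sample message are assumptions made for illustration.

```python
from itertools import combinations


def skip_grams(tokens, n=2, max_skip=1):
    """Return n-token subsequences of `tokens`, in order, where at most
    `max_skip` tokens are skipped between consecutive chosen tokens.

    With max_skip=0 this reduces to ordinary contiguous n-grams.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    grams = []
    # Enumerate every increasing choice of n positions in the token list.
    for positions in combinations(range(len(tokens)), n):
        # Keep the choice only if each gap skips no more than max_skip tokens.
        gaps_ok = all(b - a - 1 <= max_skip
                      for a, b in zip(positions, positions[1:]))
        if gaps_ok:
            grams.append(tuple(tokens[i] for i in positions))
    return grams


# Hypothetical sample message, split on whitespace.
tokens = "win a free prize now".split()

bigrams = skip_grams(tokens, n=2, max_skip=0)       # contiguous pairs only
skip_bigrams = skip_grams(tokens, n=2, max_skip=1)  # also pairs one token apart

print(bigrams)
print(skip_bigrams)
```

With `max_skip=0` the helper yields the four ordinary bigrams of the sample message; raising `max_skip` to 1 additionally admits pairs such as `("win", "free")`, which is the extra flexibility the abstract attributes to skip-grams. Enumerating all position combinations is quadratic or worse in message length, which is tolerable for short SMS texts but would need a windowed loop for longer documents.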