SMS Spam Detection Through Skip-gram Embeddings and Shallow Networks

dc.contributor.author: Sousa, Gustavo Jose de [UNESP]
dc.contributor.author: Guimaraes Pedronette, Daniel Carlos [UNESP]
dc.contributor.author: Papa, Joao Paulo [UNESP]
dc.contributor.author: Guilherme, Ivan Rizzo [UNESP]
dc.contributor.author: Xia, F.
dc.contributor.author: Zong, C.
dc.contributor.author: Li, W.
dc.contributor.author: Navigli, R.
dc.contributor.institution: Universidade Estadual Paulista (UNESP)
dc.date.accessioned: 2025-04-29T20:10:00Z
dc.date.issued: 2021-01-01
dc.description.abstract (en): The drastic decrease in mobile SMS costs has made phone users more prone to receiving spam messages, usually containing unwanted marketing or questionable content. As such, researchers have proposed different methods for detecting SMS spam messages. This paper presents a technique for embedding SMS messages into vector spaces that is suitable for spam detection. The proposed approach relies on mining patterns that are relevant for distinguishing spam from legitimate messages. A subset of those patterns is used to construct a function that maps text messages into a multidimensional vector space. The extracted patterns are represented as skip-grams of token attributes, where a skip-gram can be seen as a generalization of the n-gram model that allows a distance greater than one between matched tokens in the text. We evaluate the proposed approach by using the generated vectors for spam classification on the UCI Spam Collection dataset. The experiments show that our method, combined with shallow networks, reaches accuracy that is competitive with state-of-the-art approaches.
dc.description.affiliation: Sao Paulo State Univ UNESP, Sao Paulo, Brazil
dc.description.affiliationUnesp: Sao Paulo State Univ UNESP, Sao Paulo, Brazil
dc.description.sponsorship: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
dc.description.sponsorship: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
dc.description.sponsorship: Petrobras
dc.description.sponsorshipId: FAPESP: 2014/122361
dc.description.sponsorshipId: FAPESP: 2018/15597-6
dc.description.sponsorshipId: FAPESP: 2019/07665-4
dc.description.sponsorshipId: CNPq: 307066/20177
dc.description.sponsorshipId: CNPq: 309439/2020-5
dc.description.sponsorshipId: Petrobras: 2019/00697-8
dc.format.extent: 4193-4201
dc.identifier.citation: Findings of the Association for Computational Linguistics, ACL-IJCNLP 2021. Stroudsburg: Assoc Computational Linguistics-ACL, p. 4193-4201, 2021.
dc.identifier.uri: https://hdl.handle.net/11449/307636
dc.identifier.wos: WOS:001181734703038
dc.language.iso: eng
dc.publisher: Assoc Computational Linguistics-ACL
dc.relation.ispartof: Findings of the Association for Computational Linguistics, ACL-IJCNLP 2021
dc.source: Web of Science
dc.title (en): SMS Spam Detection Through Skip-gram Embeddings and Shallow Networks
dc.type (pt): Conference paper
dcterms.rightsHolder: Assoc Computational Linguistics-ACL
dspace.entity.type: Publication
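
The abstract describes skip-grams as a generalization of n-grams in which matched tokens may be more than one position apart, and an embedding built from a chosen subset of such patterns. The Python sketch below illustrates only that general idea and is not the authors' implementation: it works on raw lowercase tokens rather than the token attributes used in the paper, the function names `skip_grams` and `embed` and the parameter `max_distance` are hypothetical, and the pattern list is hand-picked for illustration rather than mined from data.

```python
from itertools import combinations


def skip_grams(tokens, n=2, max_distance=2):
    """Return the set of n-token skip-grams found in `tokens`.

    Consecutive matched tokens may be up to `max_distance` positions
    apart. With max_distance=1 this reduces to ordinary n-grams.
    """
    grams = set()
    # Enumerate every increasing choice of n token positions.
    for positions in combinations(range(len(tokens)), n):
        # Keep the choice only if each step between neighbours is short enough.
        if all(b - a <= max_distance for a, b in zip(positions, positions[1:])):
            grams.add(tuple(tokens[i] for i in positions))
    return grams


def embed(tokens, patterns, n=2, max_distance=2):
    """Map a message to a binary vector with one component per pattern.

    `patterns` is an assumed, pre-selected list of skip-grams; a component
    is 1 when the corresponding pattern occurs in the message.
    """
    found = skip_grams(tokens, n, max_distance)
    return [1 if pattern in found else 0 for pattern in patterns]


if __name__ == "__main__":
    message = "win a free prize now".split()
    # Hypothetical patterns, chosen by hand for this example only.
    patterns = [("win", "free"), ("free", "now"), ("call", "now")]
    print(embed(message, patterns))
```

A vector produced this way could then be passed to any standard classifier. The exhaustive enumeration of positions is acceptable for short texts such as SMS messages but grows quickly with message length and `n`.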
