Hate Speech Detection in Portuguese Using BERTimbau

Frediani, João Otávio Rodrigues Ferreira [UNESP]; Garcia, Gabriel Lino [UNESP]; Paiola, Pedro Henrique [UNESP]; Passos, Leandro Aparecido [UNESP]; Papa, João Paulo [UNESP]; Marana, Aparecido Nilceu [UNESP]

doi:10.1007/978-3-031-76607-7_18

dc.contributor.author	Frediani, João Otávio Rodrigues Ferreira [UNESP]
dc.contributor.author	Garcia, Gabriel Lino [UNESP]
dc.contributor.author	Paiola, Pedro Henrique [UNESP]
dc.contributor.author	Passos, Leandro Aparecido [UNESP]
dc.contributor.author	Papa, João Paulo [UNESP]
dc.contributor.author	Marana, Aparecido Nilceu [UNESP]
dc.contributor.institution	Universidade Estadual Paulista (UNESP)
dc.date.accessioned	2025-04-29T18:40:48Z
dc.date.issued	2025-01-01
dc.description.abstract	Hate speech refers to language that attacks individuals or groups based on characteristics associated with their identities, causing lasting damage. Social networks have become a favorable environment for the proliferation of hate speech because they allow anonymity and keep a safe distance between aggressors and victims. Given the amount of data published every minute, automatic identification of hate speech using machine learning has attracted much attention from academic and industrial researchers. However, as with many natural language processing tasks, these efforts have focused mainly on English, and languages like Portuguese remain less explored. This paper therefore experiments with different techniques for dealing with the challenges of low-resource languages in automatic hate speech detection. It evaluates whether knowledge transferred from offensive speech detection as a source task is effective for hate speech detection, and whether unbalanced data poses an obstacle for BERTimbau, a pre-trained Portuguese BERT model. Experimental results show that transferring learning between tasks does not improve performance and that using balanced data leads to better F1 scores and Cohen’s Kappa.	en
dc.description.affiliation	School of Sciences, São Paulo State University (UNESP)
dc.description.affiliationUnesp	School of Sciences, São Paulo State University (UNESP)
dc.format.extent	244-255
dc.identifier	http://dx.doi.org/10.1007/978-3-031-76607-7_18
dc.identifier.citation	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v. 15368 LNCS, p. 244-255.
dc.identifier.doi	10.1007/978-3-031-76607-7_18
dc.identifier.issn	1611-3349
dc.identifier.issn	0302-9743
dc.identifier.scopus	2-s2.0-85210225301
dc.identifier.uri	https://hdl.handle.net/11449/298909
dc.language.iso	eng
dc.relation.ispartof	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
dc.source	Scopus
dc.subject	Hate Speech
dc.subject	Machine Learning
dc.subject	Natural Language Processing
dc.subject	Portuguese Language
dc.subject	Undersampling
dc.title	Hate Speech Detection in Portuguese Using BERTimbau	en
dc.type	Conference paper	en
dspace.entity.type	Publication
relation.isOrgUnitOfPublication	aef1f5df-a00f-45f4-b366-6926b097829b
relation.isOrgUnitOfPublication.latestForDiscovery	aef1f5df-a00f-45f4-b366-6926b097829b
unesp.author.orcid	0000-0002-6544-9066[1]
unesp.author.orcid	0000-0003-1236-7929[2]
unesp.author.orcid	0000-0001-9093-535X[3]
unesp.author.orcid	0000-0003-3529-3109[4]
unesp.author.orcid	0000-0002-6494-7514[5]
unesp.author.orcid	0000-0003-4861-7061[6]
unesp.campus	Universidade Estadual Paulista (UNESP), Faculdade de Ciências, Bauru	pt
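
The abstract and subject terms mention undersampling, F1 scores, and Cohen's Kappa, but this record does not describe how the paper implements them. As a rough illustration only, the sketch below shows one common reading of those terms: random undersampling down to the size of the rarest class, binary F1 for a chosen positive class, and Cohen's Kappa as agreement corrected for chance. The function names (`undersample`, `f1_score`, `cohen_kappa`) and the label encoding (1 = hate speech) are assumptions made for this example and are not taken from the paper, which may use a different sampling strategy or library.

```python
import random
from collections import Counter


def undersample(texts, labels, seed=0):
    """Randomly drop examples so every class keeps as many items as the rarest class.

    Hypothetical helper for illustration; not the authors' procedure.
    """
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    n_min = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        for text in rng.sample(items, n_min):
            balanced.append((text, label))
    rng.shuffle(balanced)
    return balanced


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    classes = set(true_counts) | set(pred_counts)
    expected = sum(true_counts[c] * pred_counts[c] for c in classes) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: both label sets use a single identical class
    return (observed - expected) / (1.0 - expected)
```

On unbalanced data, a classifier that always predicts the majority class can reach high accuracy while scoring zero on both F1 for the minority class and Cohen's Kappa, which is one reason these two measures are often reported for tasks like this.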