
Hate Speech Detection in Portuguese Using BERTimbau

dc.contributor.author: Frediani, João Otávio Rodrigues Ferreira [UNESP]
dc.contributor.author: Garcia, Gabriel Lino [UNESP]
dc.contributor.author: Paiola, Pedro Henrique [UNESP]
dc.contributor.author: Passos, Leandro Aparecido [UNESP]
dc.contributor.author: Papa, João Paulo [UNESP]
dc.contributor.author: Marana, Aparecido Nilceu [UNESP]
dc.contributor.institution: Universidade Estadual Paulista (UNESP)
dc.date.accessioned: 2025-04-29T18:40:48Z
dc.date.issued: 2025-01-01
dc.description.abstract: Hate speech refers to language that attacks individuals or groups based on specific characteristics associated with their identities, causing lasting damage. Social networks have become a fertile environment for the proliferation of hate speech, since they allow anonymity and keep aggressors at a safe distance from their victims. Given the amount of data published every minute, automatic identification of hate speech using machine learning has attracted much attention from academic and industrial researchers. However, as with many natural language processing tasks, efforts have mainly focused on English, and languages like Portuguese remain less explored. This paper therefore experiments with different techniques for dealing with the challenges that low-resource languages pose for automatic hate speech detection. It evaluates whether knowledge transferred from offensive speech detection as a source task is effective for hate speech detection, and whether unbalanced data poses an obstacle for BERTimbau, a pre-trained Portuguese BERT model. Experimental results show that transfer learning between tasks does not improve performance and that using balanced data leads to better F1 scores and Cohen’s Kappa. (en)
dc.description.affiliation: School of Sciences, São Paulo State University (UNESP)
dc.description.affiliationUnesp: School of Sciences, São Paulo State University (UNESP)
dc.format.extent: 244-255
dc.identifier: http://dx.doi.org/10.1007/978-3-031-76607-7_18
dc.identifier.citation: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v. 15368 LNCS, p. 244-255.
dc.identifier.doi: 10.1007/978-3-031-76607-7_18
dc.identifier.issn: 1611-3349
dc.identifier.issn: 0302-9743
dc.identifier.scopus: 2-s2.0-85210225301
dc.identifier.uri: https://hdl.handle.net/11449/298909
dc.language.iso: eng
dc.relation.ispartof: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
dc.source: Scopus
dc.subject: Hate Speech
dc.subject: Machine Learning
dc.subject: Natural Language Processing
dc.subject: Portuguese Language
dc.subject: Undersampling
dc.title: Hate Speech Detection in Portuguese Using BERTimbau (en)
dc.type: Conference paper ("Trabalho apresentado em evento") (pt)
dspace.entity.type: Publication
relation.isOrgUnitOfPublication: aef1f5df-a00f-45f4-b366-6926b097829b
relation.isOrgUnitOfPublication.latestForDiscovery: aef1f5df-a00f-45f4-b366-6926b097829b
unesp.author.orcid: 0000-0002-6544-9066 [1]
unesp.author.orcid: 0000-0003-1236-7929 [2]
unesp.author.orcid: 0000-0001-9093-535X [3]
unesp.author.orcid: 0000-0003-3529-3109 [4]
unesp.author.orcid: 0000-0002-6494-7514 [5]
unesp.author.orcid: 0000-0003-4861-7061 [6]
unesp.campus: Universidade Estadual Paulista (UNESP), Faculdade de Ciências, Bauru (pt)
