Handling imbalanced datasets through Optimum-Path Forest

dc.contributor.author: Passos, Leandro Aparecido [UNESP]
dc.contributor.author: Jodas, Danilo S. [UNESP]
dc.contributor.author: Ribeiro, Luiz C.F. [UNESP]
dc.contributor.author: Akio, Marco [UNESP]
dc.contributor.author: de Souza, Andre Nunes [UNESP]
dc.contributor.author: Papa, João Paulo [UNESP]
dc.contributor.institution: Universidade Estadual Paulista (UNESP)
dc.date.accessioned: 2022-05-01T13:57:34Z
dc.date.available: 2022-05-01T13:57:34Z
dc.date.issued: 2022-04-22
dc.description.abstract: In the last decade, machine learning-based approaches became capable of performing a wide range of complex tasks sometimes better than humans, demanding a fraction of the time. Such an advance is partially due to the exponential growth in the amount of data available, which makes it possible to extract trustworthy real-world information from them. However, such data is generally imbalanced since some phenomena are more likely than others. Such a behavior yields considerable influence on the machine learning model's performance since it becomes biased on the more frequent data it receives. Despite the considerable amount of machine learning methods, a graph-based approach has attracted considerable notoriety due to the outstanding performance over many applications, i.e., the Optimum-Path Forest (OPF). In this paper, we propose three OPF-based strategies to deal with the imbalance problem: the O2PF and the OPF-US, which are novel approaches for oversampling and undersampling, respectively, as well as a hybrid strategy combining both approaches. The paper also introduces a set of variants concerning the strategies mentioned above. Results compared against several state-of-the-art techniques over public and private datasets confirm the robustness of the proposed approaches. [en]
dc.description.affiliation: Department of Computing São Paulo State University, Av. Eng. Luiz Edmundo Carrijo Coube, 14-01
dc.description.affiliation: Department of Electrical Engineering São Paulo State University, Av. Eng. Luiz Edmundo Carrijo Coube, 14-01
dc.description.affiliationUnesp: Department of Computing São Paulo State University, Av. Eng. Luiz Edmundo Carrijo Coube, 14-01
dc.description.affiliationUnesp: Department of Electrical Engineering São Paulo State University, Av. Eng. Luiz Edmundo Carrijo Coube, 14-01
dc.description.sponsorship: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
dc.description.sponsorship: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
dc.description.sponsorshipId: FAPESP: #2013/07375-0
dc.description.sponsorshipId: FAPESP: #2014/12236-1
dc.description.sponsorshipId: FAPESP: #2017/02286-0
dc.description.sponsorshipId: FAPESP: #2018/21934-5
dc.description.sponsorshipId: FAPESP: #2019/07665-4
dc.description.sponsorshipId: FAPESP: #2019/18287-0
dc.description.sponsorshipId: FAPESP: #2020/12101-0
dc.description.sponsorshipId: CNPq: #307066/2017-7
dc.description.sponsorshipId: CNPq: #427968/2018-6
dc.identifier: http://dx.doi.org/10.1016/j.knosys.2022.108445
dc.identifier.citation: Knowledge-Based Systems, v. 242.
dc.identifier.doi: 10.1016/j.knosys.2022.108445
dc.identifier.issn: 0950-7051
dc.identifier.scopus: 2-s2.0-85125266467
dc.identifier.uri: http://hdl.handle.net/11449/234201
dc.language.iso: eng
dc.relation.ispartof: Knowledge-Based Systems
dc.source: Scopus
dc.subject: Imbalanced data
dc.subject: Optimum-Path Forest
dc.subject: Oversampling
dc.subject: Undersampling
dc.title: Handling imbalanced datasets through Optimum-Path Forest [en]
dc.type: Article
unesp.author.orcid: 0000-0003-3529-3109[1]
unesp.campus: Universidade Estadual Paulista (Unesp), Faculdade de Ciências, Bauru [pt]
unesp.department: Computação - FC [pt]
