Handling imbalanced datasets through Optimum-Path Forest
dc.contributor.author | Passos, Leandro Aparecido [UNESP] | |
dc.contributor.author | Jodas, Danilo S. [UNESP] | |
dc.contributor.author | Ribeiro, Luiz C.F. [UNESP] | |
dc.contributor.author | Akio, Marco [UNESP] | |
dc.contributor.author | de Souza, Andre Nunes [UNESP] | |
dc.contributor.author | Papa, João Paulo [UNESP] | |
dc.contributor.institution | Universidade Estadual Paulista (UNESP) | |
dc.date.accessioned | 2022-05-01T13:57:34Z | |
dc.date.available | 2022-05-01T13:57:34Z | |
dc.date.issued | 2022-04-22 | |
dc.description.abstract | In the last decade, machine learning-based approaches have become capable of performing a wide range of complex tasks, sometimes better than humans and in a fraction of the time. This advance is partially due to the exponential growth in the amount of data available, which makes it possible to extract trustworthy real-world information from it. However, such data is generally imbalanced, since some phenomena are more likely than others. This imbalance considerably affects a machine learning model's performance, since the model becomes biased toward the more frequent data it receives. Among the many machine learning methods available, a graph-based approach, the Optimum-Path Forest (OPF), has attracted considerable attention due to its outstanding performance in many applications. In this paper, we propose three OPF-based strategies to deal with the imbalance problem: O2PF and OPF-US, which are novel approaches for oversampling and undersampling, respectively, as well as a hybrid strategy combining both. The paper also introduces a set of variants of these strategies. Results compared against several state-of-the-art techniques over public and private datasets confirm the robustness of the proposed approaches. | en |
dc.description.affiliation | Department of Computing, São Paulo State University, Av. Eng. Luiz Edmundo Carrijo Coube, 14-01 | |
dc.description.affiliation | Department of Electrical Engineering, São Paulo State University, Av. Eng. Luiz Edmundo Carrijo Coube, 14-01 | |
dc.description.affiliationUnesp | Department of Computing, São Paulo State University, Av. Eng. Luiz Edmundo Carrijo Coube, 14-01 | |
dc.description.affiliationUnesp | Department of Electrical Engineering, São Paulo State University, Av. Eng. Luiz Edmundo Carrijo Coube, 14-01 | |
dc.description.sponsorship | Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) | |
dc.description.sponsorship | Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) | |
dc.description.sponsorshipId | FAPESP: #2013/07375-0 | |
dc.description.sponsorshipId | FAPESP: #2014/12236-1 | |
dc.description.sponsorshipId | FAPESP: #2017/02286-0 | |
dc.description.sponsorshipId | FAPESP: #2018/21934-5 | |
dc.description.sponsorshipId | FAPESP: #2019/07665-4 | |
dc.description.sponsorshipId | FAPESP: #2019/18287-0 | |
dc.description.sponsorshipId | FAPESP: #2020/12101-0 | |
dc.description.sponsorshipId | CNPq: #307066/2017-7 | |
dc.description.sponsorshipId | CNPq: #427968/2018-6 | |
dc.identifier | http://dx.doi.org/10.1016/j.knosys.2022.108445 | |
dc.identifier.citation | Knowledge-Based Systems, v. 242. | |
dc.identifier.doi | 10.1016/j.knosys.2022.108445 | |
dc.identifier.issn | 0950-7051 | |
dc.identifier.scopus | 2-s2.0-85125266467 | |
dc.identifier.uri | http://hdl.handle.net/11449/234201 | |
dc.language.iso | eng | |
dc.relation.ispartof | Knowledge-Based Systems | |
dc.source | Scopus | |
dc.subject | Imbalanced data | |
dc.subject | Optimum-Path Forest | |
dc.subject | Oversampling | |
dc.subject | Undersampling | |
dc.title | Handling imbalanced datasets through Optimum-Path Forest | en |
dc.type | Article | |
unesp.author.orcid | 0000-0003-3529-3109[1] | |
unesp.campus | Universidade Estadual Paulista (Unesp), Faculdade de Ciências, Bauru | pt |
unesp.department | Computação - FC | pt |