Handling imbalanced datasets through Optimum-Path Forest
Carregando...
Arquivos
Fontes externas
Fontes externas
Data
Orientador
Coorientador
Pós-graduação
Curso de graduação
Título da Revista
ISSN da Revista
Título de Volume
Editor
Tipo
Artigo
Direito de acesso
Arquivos
Fontes externas
Fontes externas
Resumo
In the last decade, machine learning-based approaches became capable of performing a wide range of complex tasks sometimes better than humans, demanding a fraction of the time. Such an advance is partially due to the exponential growth in the amount of data available, which makes it possible to extract trustworthy real-world information from them. However, such data is generally imbalanced since some phenomena are more likely than others. Such a behavior yields considerable influence on the machine learning model's performance since it becomes biased on the more frequent data it receives. Despite the considerable amount of machine learning methods, a graph-based approach has attracted considerable notoriety due to the outstanding performance over many applications, i.e., the Optimum-Path Forest (OPF). In this paper, we propose three OPF-based strategies to deal with the imbalance problem: the O2PF and the OPF-US, which are novel approaches for oversampling and undersampling, respectively, as well as a hybrid strategy combining both approaches. The paper also introduces a set of variants concerning the strategies mentioned above. Results compared against several state-of-the-art techniques over public and private datasets confirm the robustness of the proposed approaches.
Descrição
Palavras-chave
Imbalanced data, Optimum-Path Forest, Oversampling, Undersampling
Idioma
Inglês
Citação
Knowledge-Based Systems, v. 242.




