Parallelization of the DIANA algorithm in OpenMP

Authors: Ribeiro, Hethini [UNESP]; Spolon, Roberta [UNESP]; Manacero, Aleardo [UNESP]; Lobato, Renata S. [UNESP]
Editors: Jong Hyuk Park, Hong Shen, Yunsick Sung, Hui Tian
Affiliation: My University
Publisher: Springer Nature
Year: 2019
Dates: 2019-10-06, 2019-10-06, 2019-01-01
Language: eng
Type: Conference paper
Source: Communications in Computer and Information Science, v. 931, p. 171-176.
Pages: 171-176
ISBN: 978-981-13-5906-4, 978-981-13-5907-1
ISSN: 1865-0929, 1865-0937
Handle: http://hdl.handle.net/11449/190153
DOI: 10.1007/978-981-13-5907-1_18
Scopus: 2-s2.0-85062294546
Other identifiers: 5568681374094860, 0000-0001-8248-0826, 0000-0003-3164-2658, 0000-0002-4581-7482, pub.1111982283
Access: Open access
Keywords: DIANA, Machine learning, OpenMP, Parallelization

Abstract: Global data production has been increasing by approximately 40% per year since the beginning of the last decade. These large datasets, also called Big Data, pose great challenges in many areas, in particular in the Machine Learning (ML) field. Although ML algorithms can extract useful information from these large data repositories, some of them are computationally expensive, such as AGNES and DIANA, which have O(n) and O(2^n) complexity, respectively. The main challenge is therefore to process large amounts of data in a realistic time frame. In this context, this paper proposes a parallelization of the DIANA algorithm using OpenMP. Initial tests with a database of 5000 elements showed a speedup of 5.2521. Following Gustafson's law, the authors expect larger speedups for larger databases.
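
This record does not include the authors' implementation, so the following is only a minimal sketch of how one step of DIANA might be parallelized with OpenMP. It covers the step that usually starts a split in divisive clustering: finding the object with the largest average dissimilarity to all others, which seeds the splinter group. The function names (`manhattan`, `find_splinter_seed`), the flat row-major point layout, and the choice of Manhattan distance as the dissimilarity are assumptions made for illustration and are not taken from the paper.

```c
#include <stddef.h>

/* Manhattan distance between two points with `dim` coordinates.
   Assumed dissimilarity measure; the paper's choice is not stated here. */
static double manhattan(const double *a, const double *b, size_t dim) {
    double sum = 0.0;
    for (size_t k = 0; k < dim; ++k) {
        double d = a[k] - b[k];
        sum += (d < 0.0) ? -d : d;
    }
    return sum;
}

/* Returns the index of the point whose average dissimilarity to all other
   points is largest, or -1 if fewer than two points are given.
   `points` holds n rows of `dim` coordinates each, stored row by row. */
long find_splinter_seed(const double *points, long n, size_t dim) {
    long best = -1;
    double best_avg = -1.0;

    if (n < 2) {
        return -1;
    }

    #pragma omp parallel
    {
        /* Each thread tracks its own candidate to avoid contention. */
        long local_best = -1;
        double local_avg = -1.0;

        #pragma omp for schedule(static)
        for (long i = 0; i < n; ++i) {
            const double *pi = points + (size_t)i * dim;
            double total = 0.0;
            for (long j = 0; j < n; ++j) {
                if (j != i) {
                    total += manhattan(pi, points + (size_t)j * dim, dim);
                }
            }
            double avg = total / (double)(n - 1);
            if (avg > local_avg) {
                local_avg = avg;
                local_best = i;
            }
        }

        /* Merge per-thread candidates; ties go to the smaller index so the
           result does not depend on thread scheduling. */
        #pragma omp critical
        {
            if (local_best >= 0 &&
                (local_avg > best_avg ||
                 (local_avg == best_avg && local_best < best))) {
                best_avg = local_avg;
                best = local_best;
            }
        }
    }
    return best;
}
```

The outer loop over objects is independent per iteration, which is why it is a natural candidate for `#pragma omp parallel for`. Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result.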