Publication:
Natural language processing and machine learning in the categorization of scientific papers: a study around "cultural heritage"

dc.contributor.author: Jesus, Ananda Fernanda de [UNESP]
dc.contributor.author: Triques, Maria Ligia
dc.contributor.author: Segundo, Jose Eduardo Santarem [UNESP]
dc.contributor.author: Albuquerque, Ana Cristina de
dc.contributor.institution: Universidade Estadual Paulista (UNESP)
dc.contributor.institution: Universidade Estadual de Londrina (UEL)
dc.date.accessioned: 2023-07-29T12:00:51Z
dc.date.available: 2023-07-29T12:00:51Z
dc.date.issued: 2023-01-01
dc.description.abstract: This study aims to verify the potential of applying Natural Language Processing (NLP) and Machine Learning (ML) techniques to the thematic categorization of scientific articles on cultural heritage, considering two situations: one in which the categories are established a priori and one in which they are established a posteriori. It is applied research with quantitative and qualitative results, based on two corpora. The first consists of scientific articles in Portuguese from a thematic database in Information Science, manually selected and categorized. The second consists of scientific articles in English retrieved from the Web of Science and automatically categorized through search strategies using Boolean operators. Both corpora were submitted to two categorization test procedures, one with a supervised algorithm and one with an unsupervised algorithm. The results show that in both cases the researcher's participation is essential in defining the representativeness of the chosen sample, which in turn affects the precision and accuracy of the applied algorithms. The study highlights the importance of detail and rigor in data pre-processing and of sample size. It also emphasizes that, in this study, a larger volume of data alone did not guarantee results that were representative of the domain studied, which points to the need for ongoing multidisciplinary discussion and analysis to verify and readjust the sample parameters. (en)
dc.description.affiliation: Univ Estadual Paulista, Programa Posgrad Ciencia Informacao, Marilia, SP, Brazil
dc.description.affiliation: Univ Estadual Londrina, Programa Posgrad Ciencia Informacao, Londrina, PR, Brazil
dc.description.affiliationUnesp: Univ Estadual Paulista, Programa Posgrad Ciencia Informacao, Marilia, SP, Brazil
dc.format.extent: 167-184
dc.identifier: http://dx.doi.org/10.26512/rici.v16.n1.2023.47537
dc.identifier.citation: Revista Ibero-americana de Ciencia da Informacao. Brasilia: Univ Brasilia, Dept Ciencia Informacao, v. 16, n. 1, p. 167-184, 2023.
dc.identifier.doi: 10.26512/rici.v16.n1.2023.47537
dc.identifier.issn: 1983-5213
dc.identifier.uri: http://hdl.handle.net/11449/245641
dc.identifier.wos: WOS:000992663100009
dc.language.iso: eng
dc.publisher: Univ Brasilia, Dept Ciencia Informacao
dc.relation.ispartof: Revista Ibero-americana De Ciencia Da Informacao
dc.source: Web of Science
dc.subject: Machine learning
dc.subject: Natural language processing
dc.subject: Neural network algorithm
dc.subject: Cultural heritage
dc.subject: Hierarchical clustering algorithm
dc.title: Natural language processing and machine learning in the categorization of scientific papers: a study around "cultural heritage" (en)
dc.type: Article
dcterms.rightsHolder: Univ Brasilia, Dept Ciencia Informacao
dspace.entity.type: Publication
unesp.campus: Universidade Estadual Paulista (Unesp), Faculdade de Filosofia e Ciências, Marília (pt)
unesp.department: Ciência da Informação - FFC (pt)

Files