Corpus-based Methodology for an Online Multilingual Collocations Dictionary: First Steps

dc.contributor.author: Orenha-Ottaiano, Adriane [UNESP]
dc.contributor.author: Garcia, Marcos
dc.contributor.author: de Oliveira Silva, Maria Eugênia Olímpio
dc.contributor.author: L'Homme, Marie-Claude
dc.contributor.author: Ramos, Margarita Alonso
dc.contributor.author: Valêncio, Carlos Roberto [UNESP]
dc.contributor.author: Tenório, William [UNESP]
dc.contributor.institution: Universidade Estadual Paulista (UNESP)
dc.contributor.institution: Universidade de Santiago de Compostela
dc.contributor.institution: University of Alcalá
dc.contributor.institution: Université de Montréal
dc.contributor.institution: Universidade da Coruña
dc.date.accessioned: 2023-03-02T12:09:18Z
dc.date.available: 2023-03-02T12:09:18Z
dc.date.issued: 2021-01-01
dc.description.abstract: This paper describes the first steps of a corpus-based methodology for the development of an online Platform for Multilingual Collocations Dictionaries (PLATCOL). The platform is designed to be customized for different target audiences according to their needs. It covers various syntactic structures of collocations that fit into the following taxonomy: verbal, adjectival, nominal, and adverbial. Part of its design, layout and methodological procedures is based on the Bilingual Online Collocations Dictionary Platform (Orenha-Ottaiano, 2017). The methodology also combines automatic methods for extracting candidate collocations (Garcia et al., 2019a) with careful post-editing performed by lexicographers. The automatic approaches use NLP tools to annotate large corpora with lemmas, PoS-tags and dependency relations in five languages (English, French, Portuguese, Spanish and Chinese). Using these data, we apply statistical measures (Evert et al., 2017; Garcia et al., 2019b) and distributional semantics strategies to select the candidates (Garcia et al., 2019c) and retrieve corpus-based examples (Kilgarriff et al., 2008). We also rely on automatic definition extraction (Bond & Foster, 2013) so that collocations can be more effectively organized according to their specific senses. (en)
dc.description.affiliation: São Paulo State University (UNESP)
dc.description.affiliation: Universidade de Santiago de Compostela
dc.description.affiliation: University of Alcalá
dc.description.affiliation: OLST Université de Montréal
dc.description.affiliation: Universidade da Coruña
dc.description.affiliationUnesp: São Paulo State University (UNESP)
dc.format.extent: 1-28
dc.identifier.citation: Proceedings of Electronic Lexicography in the 21st Century Conference, v. 2021-July, p. 1-28.
dc.identifier.issn: 2533-5626
dc.identifier.scopus: 2-s2.0-85137087660
dc.identifier.uri: http://hdl.handle.net/11449/242228
dc.language.iso: eng
dc.relation.ispartof: Proceedings of Electronic Lexicography in the 21st Century Conference
dc.source: Scopus
dc.subject: automatic extraction
dc.subject: collocations
dc.subject: collocations dictionary
dc.subject: lexicography
dc.subject: online platform
dc.title: Corpus-based Methodology for an Online Multilingual Collocations Dictionary: First Steps (en)
dc.type: Conference paper
