A framework for speaker retrieval and identification through unsupervised learning

dc.contributor.author: Campos, Victor de Abreu [UNESP]
dc.contributor.author: Pedronette, Daniel Carlos Guimarães [UNESP]
dc.contributor.institution: Universidade Estadual Paulista (Unesp)
dc.date.accessioned: 2019-10-06T15:41:55Z
dc.date.available: 2019-10-06T15:41:55Z
dc.date.issued: 2019-11-01
dc.description.abstract [en]: Speaker recognition is a task of remarkable relevance, with applications in diverse domains. Recently, mainly because audio-visual content has become easy to acquire, the capacity to analyze growing datasets without relying on labeled data has become a crucial advantage. This paper presents a speaker recognition approach based on recent unsupervised learning methods, which do not require any labeled data or user intervention. The approach is organized as a framework that exploits a rank-based formulation. The similarity information defined by speaker modeling techniques is encoded in ranked lists, which are used as input by the unsupervised learning algorithms. Vector quantization, Gaussian mixture models and i-vectors are employed as modeling techniques, while the algorithms RL-Sim and ReckNN are used for unsupervised learning tasks. The framework was experimentally evaluated on query-by-example speaker retrieval and speaker identification tasks, on both clean and noisy speech recordings. The evaluation was conducted on three public datasets covering different languages and recording conditions. Effectiveness gains of up to 56% on retrieval measures were obtained by using unsupervised learning algorithms over traditional speaker recognition techniques.
dc.description.affiliation: Department of Statistics, Applied Mathematics and Computing, State University of São Paulo (UNESP)
dc.description.affiliationUnesp: Department of Statistics, Applied Mathematics and Computing, State University of São Paulo (UNESP)
dc.description.sponsorship: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
dc.description.sponsorship: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
dc.description.sponsorshipId: FAPESP: #2015/07934-4
dc.description.sponsorshipId: FAPESP: #2017/25908-6
dc.description.sponsorshipId: FAPESP: #2018/15597-6
dc.description.sponsorshipId: CNPq: #308194/2017-9
dc.format.extent: 153-174
dc.identifier: http://dx.doi.org/10.1016/j.csl.2019.04.004
dc.identifier.citation: Computer Speech and Language, v. 58, p. 153-174.
dc.identifier.doi: 10.1016/j.csl.2019.04.004
dc.identifier.issn: 1095-8363
dc.identifier.issn: 0885-2308
dc.identifier.scopus: 2-s2.0-85065105944
dc.identifier.uri: http://hdl.handle.net/11449/187617
dc.language.iso: eng
dc.relation.ispartof: Computer Speech and Language
dc.rights.accessRights: Open access
dc.source: Scopus
dc.subject: Gaussian mixture model
dc.subject: i-vector
dc.subject: Speaker recognition
dc.subject: Speaker retrieval
dc.subject: Unsupervised learning
dc.subject: Vector quantization
dc.title [en]: A framework for speaker retrieval and identification through unsupervised learning
dc.type: Article
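
The abstract describes a rank-based formulation in which similarity information is encoded in ranked lists that are then refined without labels. The sketch below is only an illustration of that general idea, not the RL-Sim or ReckNN algorithms and not the authors' implementation. It assumes each recording is already represented by a fixed-length vector (for instance an i-vector), uses cosine similarity as a stand-in similarity measure, and re-ranks by the Jaccard overlap of top-k neighborhoods. The function names `ranked_lists` and `rerank_by_overlap` and the parameter `k` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def ranked_lists(vectors):
    """For each item, list all item indices by decreasing similarity to it."""
    n = len(vectors)
    return [
        sorted(range(n), key=lambda j: -cosine(vectors[i], vectors[j]))
        for i in range(n)
    ]


def rerank_by_overlap(lists, k):
    """Re-rank each list by the Jaccard overlap of top-k neighborhoods.

    Assumption: this is a generic rank-based re-ranking step chosen for
    illustration; it is not RL-Sim or ReckNN.
    """
    tops = [set(r[:k]) for r in lists]
    reranked = []
    for i, r in enumerate(lists):
        def score(j, i=i):
            return len(tops[i] & tops[j]) / len(tops[i] | tops[j])
        # sorted() is stable, so ties keep their original similarity order.
        reranked.append(sorted(r, key=lambda j: -score(j)))
    return reranked


if __name__ == "__main__":
    # Hypothetical 2-D "speaker vectors": two loose groups of three recordings.
    vectors = [[1, 0], [0.9, 0.1], [0.8, 0.3], [0, 1], [0.1, 0.9], [0.2, 0.8]]
    initial = ranked_lists(vectors)
    refined = rerank_by_overlap(initial, k=3)
    print(refined[0])
```

In a query-by-example setting, the refined list for a query recording would be read from the top down. No labels are used at any point, which is the property the abstract emphasizes.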