Learning Streamed Attention Network from Descriptor Images for Cross-Resolution 3D Face Recognition

Neto, João Baptista Cardia; Ferrari, Claudio; Marana, Aparecido Nilceu [UNESP]; Berretti, Stefano; Del Bimbo, Alberto

dc.contributor.author	Neto, João Baptista Cardia
dc.contributor.author	Ferrari, Claudio
dc.contributor.author	Marana, Aparecido Nilceu [UNESP]
dc.contributor.author	Berretti, Stefano
dc.contributor.author	Del Bimbo, Alberto
dc.contributor.institution	São Paulo State Technological College (FATEC)
dc.contributor.institution	University of Parma
dc.contributor.institution	Universidade Estadual Paulista (UNESP)
dc.contributor.institution	University of Florence
dc.date.accessioned	2023-07-29T16:05:41Z
dc.date.available	2023-07-29T16:05:41Z
dc.date.issued	2023-01-23
dc.description.abstract	In this article, we propose a hybrid framework for cross-resolution 3D face recognition that uses a Streamed Attention Network (SAN) combining handcrafted features with Convolutional Neural Networks (CNNs). It consists of two main stages. First, we process the depth images to extract low-level surface descriptors and derive the corresponding Descriptor Images (DIs), represented as four-channel images. To build the DIs, we propose a variation of the 3D Local Binary Pattern (3DLBP) operator that encodes depth differences using a sigmoid function. Then, we design a CNN that learns from these DIs. The peculiarity of our solution consists in processing each channel of the input image separately and fusing the contributions of the channels by means of both self- and cross-attention mechanisms. This strategy showed two main advantages over the direct application of deep CNNs to depth images of the face. On the one hand, the DIs can reduce the diversity between high- and low-resolution data by encoding surface properties that are robust to resolution differences. On the other hand, the strategy allows a better exploitation of the richer information provided by low-level features, resulting in improved recognition. We evaluated the proposed architecture in a challenging cross-dataset, cross-resolution scenario. To this aim, we first train the network on scanner-resolution 3D data. Next, we use the pre-trained network as a feature extractor on low-resolution data, where the output of the last fully connected layer is used as the face descriptor. In addition to standard benchmarks, we also perform experiments on a newly collected dataset of paired high- and low-resolution 3D faces. We use the high-resolution data as the gallery, while low-resolution faces are used as probes, allowing us to assess the real gap existing between these two types of data. Extensive experiments on low-resolution 3D face benchmarks show promising results with respect to state-of-the-art methods.	en
dc.description.affiliation	São Paulo State Technological College (FATEC), Rua Maranhão, 898, Catanduva
dc.description.affiliation	Department of Architecture and Engineering University of Parma, Parco Area delle Scienze, 181/A
dc.description.affiliation	Recogna Laboratory São Paulo State University (UNESP), Av. Eng. Luís Edmundo Carrijo Coube
dc.description.affiliation	Micc University of Florence, Viale Giovanni Battista Morgagni, 65
dc.description.affiliationUnesp	Recogna Laboratory São Paulo State University (UNESP), Av. Eng. Luís Edmundo Carrijo Coube
dc.identifier	http://dx.doi.org/10.1145/3527158
dc.identifier.citation	ACM Transactions on Multimedia Computing, Communications and Applications, v. 19, n. 1, 2023.
dc.identifier.doi	10.1145/3527158
dc.identifier.issn	1551-6865
dc.identifier.issn	1551-6857
dc.identifier.scopus	2-s2.0-85148038627
dc.identifier.uri	http://hdl.handle.net/11449/249658
dc.language.iso	eng
dc.relation.ispartof	ACM Transactions on Multimedia Computing, Communications and Applications
dc.source	Scopus
dc.subject	3D face recognition
dc.subject	convolutional neural networks
dc.subject	feature descriptors
dc.subject	self- and cross-attention
dc.title	Learning Streamed Attention Network from Descriptor Images for Cross-Resolution 3D Face Recognition	en
dc.type	Article
dspace.entity.type	Publication
unesp.campus	Universidade Estadual Paulista (UNESP), Faculdade de Ciências, Bauru	pt
unesp.department	Computação - FC	pt

Collections

Bauru - FC - Faculdade de Ciências
