Unsupervised Dual-Layer Aggregation for Feature Fusion on Image Retrieval Tasks
Abstract
Revolutionary advances in image representation have led to impressive progress in many image understanding tasks, primarily supported by Convolutional Neural Networks (CNNs) and, more recently, by Transformer models. Despite such advances, assessing the similarity among images for retrieval in unsupervised scenarios remains challenging, as it is mostly grounded in traditional pairwise measures such as the Euclidean distance. The scenario is even more challenging when different visual features are available, requiring the selection and fusion of features without any label information. In this paper, we propose an Unsupervised Dual-Layer Aggregation (UDLA) method, based on contextual similarity approaches, for selecting and fusing CNN- and Transformer-based visual features trained through transfer learning. In the first layer, the selected features are fused in pairs with a focus on precision. A subset of the fused pairs is then selected for a second-layer aggregation focused on recall. An experimental evaluation conducted on different public datasets showed the effectiveness of the proposed approach, which achieved results significantly superior to the best isolated feature and also superior to a recent fusion approach taken as a baseline.
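The dual-layer pipeline described above can be illustrated with a minimal sketch. This is not the UDLA method itself: the pairwise fusion (an element-wise product of cosine-similarity matrices) and the unsupervised pair-selection score (reciprocal top-k neighborhood agreement) are simplified stand-ins for the paper's contextual similarity measures, and all function names here are hypothetical.

```python
import numpy as np
from itertools import combinations

def cosine_sim(X):
    # Cosine similarity matrix over L2-normalized feature rows.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def fuse_pair(S_a, S_b):
    # Layer 1 (sketch): fuse two similarity matrices by element-wise
    # product, which rewards agreement between the two features.
    # The actual UDLA fusion is contextual and more elaborate.
    return S_a * S_b

def reciprocal_score(S, k=5):
    # Unsupervised effectiveness estimate (an assumption, not the
    # paper's measure): fraction of top-k neighbors that are mutual.
    n = S.shape[0]
    topk = np.argsort(-S, axis=1)[:, :k]
    hits = sum(1.0 for i in range(n) for j in topk[i] if i in topk[j])
    return hits / (n * k)

def dual_layer_sketch(features, n_select=2, k=5):
    # features: list of (n_items, dim) arrays from different extractors.
    sims = [cosine_sim(X) for X in features]
    # Layer 1: fuse every pair of features.
    fused = [fuse_pair(sims[a], sims[b])
             for a, b in combinations(range(len(sims)), 2)]
    # Select the most promising pairs by the unsupervised score.
    selected = sorted(fused, key=lambda S: -reciprocal_score(S, k))[:n_select]
    # Layer 2: aggregate the selected pairs into a final similarity.
    return sum(selected)
```

A retrieval ranking for a query item `q` is then obtained by sorting the `q`-th row of the returned matrix in descending order.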
Language
English
Citation
Brazilian Symposium on Computer Graphics and Image Processing.