UNIVERSIDADE ESTADUAL PAULISTA "JÚLIO DE MESQUITA FILHO"
Instituto de Geociências e Ciências Exatas
Câmpus de Rio Claro
Programa de Pós-Graduação em Ciência da Computação

Lucas Pascotti Valem

Contextual Similarity Learning for Image Retrieval and Classification: Applications in Person Re-Identification

Doctoral dissertation presented to the Instituto de Geociências e Ciências Exatas, Câmpus de Rio Claro, Universidade Estadual Paulista "Júlio de Mesquita Filho", in partial fulfillment of the requirements for the degree of Doctor in Computer Science.

Advisor: Prof. Dr. Daniel Carlos Guimarães Pedronette

Rio Claro - SP
2024

Examination Committee
• Prof. Dr. Daniel Carlos Guimarães Pedronette (Advisor), Instituto de Geociências e Ciências Exatas (IGCE), Universidade Estadual Paulista - UNESP
• Profa. Dra. Agma Juci Machado Traina, Instituto de Ciências Matemáticas e de Computação (ICMC), Universidade de São Paulo - USP
• Prof. Dr. Hélio Pedrini, Instituto de Computação (IC), Universidade Estadual de Campinas - UNICAMP
• Prof. Dr. João Paulo Papa, Faculdade de Ciências (FC), Universidade Estadual Paulista - UNESP
• Prof. Dr. Wallace Correa de Oliveira Casaca, Instituto de Biociências, Letras e Ciências Exatas (IBILCE), Universidade Estadual Paulista - UNESP

Result: Approved.
Rio Claro (SP), June 28, 2024.

Acknowledgements

First and foremost, I thank God for the gift of life and health. My parents and family, for their love and unconditional support. My advisor, for the guidance, support, and trust. All the professors, both national and international, who contributed to this research. The university and all the faculty members of the graduate program. Friends and colleagues, for the encouragement and support. The São Paulo Research Foundation (FAPESP), Fulbright, and Petrobras for the financial support.
Abstract

The exponential growth of image collections has driven a significant increase in the use of machine learning and image retrieval applications across various scenarios. Despite relevant advances, many methods still rely heavily on large volumes of labeled data for training, which poses an important obstacle, since producing labeled data is generally expensive and time-consuming. To address this challenge, numerous techniques have been developed recently. A critical aspect of these approaches is effectively defining image similarity, which remains a central challenge in retrieval and machine learning applications, such as classification. The core of this issue is intrinsically linked to how information is represented and to the methods used to compare these representations. A major limitation is that most of them still rely on pairwise measures, ignoring other meaningful information present in the neighborhood that could be used to further improve the results. This work focuses on improving the effectiveness of content-based image retrieval and classification tasks using contextual similarity, moving beyond traditional pairwise measures to exploit relationships among elements.
Contextual similarity learning is employed to capture underlying relationships among elements, using techniques such as rank-based models, contextual measures, graphs, and hypergraphs to model contextual information effectively. This dissertation proposes seven novel methods, applied to both general-purpose and person re-identification (Re-ID) scenarios, each addressing different contributions. Three main tasks were considered: query performance prediction, image retrieval, and image classification. A wide experimental evaluation was conducted, totaling 17 datasets and more than 50 visual image descriptors. When compared with state-of-the-art and recent baselines, the proposed methods achieve results that are comparable to or surpass those of existing approaches in most cases.

Keywords: Contextual Similarity Information; Image Retrieval; Image Classification; Query Performance Prediction; Person Re-ID; Representation Learning.

List of Figures

Figure 1.1 – Overview of goals and contributions and how contextual similarity is exploited.
Figure 1.2 – Dissertation structure: organization, main concepts, proposed approaches, and publications.
Figure 2.1 – Typical architecture of a CBIR system. Figure adapted from [316].
Figure 2.2 – Overview of unsupervised similarity learning workflow for image retrieval.
Figure 2.3 – Overview of unsupervised similarity learning applied for rank-aggregation in image retrieval.
Figure 2.4 – General diagram of a Re-ID system. Figure adapted from [25].
Figure 2.5 – Categorization of graph learning approaches. Figure adapted from [369].
Figure 2.6 – Hypergraph illustration.
Figure 2.7 – Incidence matrix H example.
Figure 4.1 – Example of precision × recall curve.
Figure 5.1 – Illustration of a confusion matrix of probabilities between classes.
Figure 5.2 – Diagram illustrating the main stages of the DRNE.
Figure 5.3 – Illustration that exemplifies the calculation of a contextual image constructed from a hypothetical ranked list.
Figure 5.4 – Examples of images generated for synthetic ranked lists with different degrees of effectiveness.
Figure 5.5 – Proposed CNN model for effectiveness prediction.
Figure 5.6 – Diagram of the proposed approach (RQPPF) for self-supervised query performance prediction.
Figure 5.7 – Losses along training epochs for train and validation sets.
Figure 5.8 – Correlation of MAP and effectiveness estimation measures on DukeMTMC.
Figure 5.9 – Two examples of ranked lists (good and bad queries) for Duke dataset and OSNET-AIN descriptor.
Figure 5.10 – Impact of parameters on Pearson correlation between MAP and our approach on MPEG-7 dataset.
Figure 5.11 – Proposed approach against MAP on MPEG-7 dataset (all descriptors). Pearson Correlation = 0.8977.
Figure 5.12 – Two examples of RQPPF results on ranked lists of Market dataset (CNN-HACNN descriptor).
Figure 6.1 – Illustrative example of original Jaccard index limitation.
Figure 6.2 – Query on Holidays with results for RBO and JacMax.
Figure 6.3 – Visual example of fusion result on Market dataset.
Figure 7.1 – Overview of the HRSF proposed approach.
Figure 7.2 – Evaluation of the impact of parameter k on MAP and R1 for Market1501 dataset.
Figure 7.3 – Evaluation of the HQPP measure compared to the MAP on DukeMTMC dataset.
Figure 7.4 – Average MAP of top pairs on CUHK03 dataset.
Figure 7.5 – Average MAP of top pairs on the Market dataset.
Figure 7.6 – Average MAP of top pairs on Duke dataset.
Figure 7.7 – Selected Combination (among top-5) on Market considering MAP.
Figure 7.8 – Selected Combination (among top-5) on Duke considering MAP.
Figure 7.9 – Distance distribution for two query images on DukeMTMC dataset.
Figure 7.10 – Examples to illustrate the impact of HRSF selection and fusion on the CUHK03 dataset.
Figure 7.11 – Examples to illustrate the impact of HRSF selection and fusion on the DukeMTMC dataset.
Figure 8.1 – Overall organization of Rank Flow Embedding (RFE).
Figure 8.2 – Impact of parameter α in function σ (Equation 8.4) as the rank position varies.
Figure 8.3 – Impact of parameters α and T (number of iterations) on MAP for two datasets.
Figure 8.4 – Ablation study for RFE on 6 datasets considering two descriptors each.
Figure 8.5 – Feature space illustrations for RFE embeddings computed by t-SNE on Flowers dataset with CNN-ResNet descriptor.
Figure 8.6 – Examples of ranked lists before and after RFE was applied on 3 different datasets.
Figure 9.1 – Workflow of our proposed Manifold-GCN framework for image classification. The steps of the approach are numbered.
Figure 9.2 – Impact of manifold learning approaches on F-measure results considering GCN-SGC on different datasets and features.
Figure 9.3 – t-SNE visualizations showing improved feature space using manifold learning and reciprocal graph on the Flowers dataset.
Figure 10.1 – MiniImageNet images used as references for the bidimensional space plots.
Figure 10.2 – Bidimensional space for similar and dissimilar images on the MiniImageNet dataset.
Figure 10.3 – Workflow of the steps of the proposed approach.
Figure 10.4 – Accuracy (%) on the test set for different batch sizes.
Figure 10.5 – Accuracy (%) on the test set across epochs comparing SupCon to CCL.
Figure 10.6 – t-SNE visualization for 9 classes comparing the features of the original method to CCL on the Food101 dataset with 20% of training data.
Figure 11.1 – RFE and JaccardMax relative gains (%) over MAP of descriptors.
Figure 11.2 – Relative gains (%) obtained by CCL in comparison to SupCon for different train/test splits.
Figure 11.3 – Published and submitted collaborations and their connection to terms and concepts related to this dissertation.

List of Tables

Table 2.1 – Examples of traditional distance measures.
Table 2.2 – Summary of concepts and terminologies discussed for Re-ID.
Table 3.1 – State-of-the-art methods in Re-ID with results of MAP (%) and R1 (%).
Table 4.1 – General-purpose datasets used in the experimental evaluation.
Table 4.2 – Descriptors used for general-purpose datasets.
Table 4.3 – Re-ID datasets used in the experimental evaluation.
Table 4.4 – Values of MAP and R1 for each Re-ID descriptor on each dataset.
Table 4.5 – Datasets used to evaluate each of the proposed methods, categorized by task and type of supervision.
Table 5.1 – Pearson correlation between MAP and effectiveness estimation measures on Flowers dataset.
Table 5.2 – Pearson correlation between estimation measures for all descriptors of MPEG-7 dataset.
Table 5.3 – Pearson correlation between MAP and effectiveness estimation measures on datasets considering train with k = 20.
Table 5.4 – Pearson correlation between our proposed RQPPF and MAP considering different regression models and measures.
Table 5.5 – Relative gains obtained by RQPPF using the Authority estimation measure for modeling the features.
Table 5.6 – Relative gains obtained by RQPPF using the Reciprocal estimation measure for modeling the features.
Table 5.7 – Comparing RQPPF and DRNE to baselines. Pearson correlation between MAP and effectiveness estimations is reported.
Table 5.8 – Pearson correlation between MAP and combinations of methods on Re-ID datasets.
Table 6.1 – Re-ranking results considering MAP (%).
Table 6.2 – Rank-aggregation results for different measures.
Table 6.3 – State-of-the-art on Holidays dataset (MAP).
Table 6.4 – State-of-the-art on UKBench dataset (N-S Score).
Table 6.5 – Comparison with person Re-ID baselines.
Table 7.1 – Table of symbols used in the definition of HRSF [331].
Table 7.2 – The best selected combination of each size (among the top-5) is reported on each dataset.
Table 7.3 – Proposed approach compared to early and late fusion baselines.
Table 7.4 – State-of-the-art comparison considering MAP (%) and R-01 (%).
Table 7.5 – State-of-the-art methods ranked by their results.
Table 8.1 – Retrieval results of RFE on general-purpose image datasets (Flowers, Corel5k, and ALOI) considering MAP (%).
Table 8.2 – Retrieval results of RFE on the Holidays dataset considering MAP (%).
Table 8.3 – Retrieval results of RFE on the UKBench dataset for both NS-Score and MAP.
Table 8.4 – Retrieval results of RFE on 3 Re-ID datasets (CUHK03, Market, and Duke) considering both R1 and MAP.
Table 8.5 – Semi-supervised classification (accuracy) on Flowers dataset using RFE embeddings for different input features.
Table 8.6 – Semi-supervised classification (accuracy) on Corel5k dataset using RFE embeddings for different input features.
Table 8.7 – Evaluation of RFE on unseen queries considering MAP (%).
Table 8.8 – State-of-the-art (SOTA) comparison with other variants of diffusion processes on the ORL (R@15) and the MPEG-7 (R@40) datasets.
Table 8.9 – State-of-the-art comparison on Flowers, Corel5k, and ALOI datasets (MAP %).
Table 8.10 – State-of-the-art comparison on Holidays dataset (MAP).
Table 8.11 – State-of-the-art comparison on UKBench dataset (NS-Score).
Table 8.12 – State-of-the-art (SOTA) comparison on person Re-ID datasets considering MAP (%) and R-01 (%).
Table 8.13 – Accuracy comparison (%) for baselines on Flowers and Corel5k datasets. The RFE is compared with semi-supervised classification baselines.
Table 9.1 – Impact of manifold learning approaches and Reciprocal Graph on the classification accuracy of 5 different GCN models on Flowers dataset.
Table 9.2 – Impact of manifold learning approaches and Reciprocal Graph on the classification accuracy of 5 different GCN models on Corel5k dataset.
Table 9.3 – Impact of manifold learning approaches and Reciprocal Graph on the classification accuracy of 5 different GCN models on CUB200 dataset.
Table 9.4 – Results (%) for GCN-SGC on CUHK03 dataset.
Table 9.5 – Results (%) on Market1501 dataset.
Table 9.6 – Results (%) for GCN-SGC on DukeMTMC dataset.
Table 9.7 – Manifold-GCN compared to baseline approaches on Flowers, Corel5k, and CUB200 datasets.
Table 9.8 – Execution time (in seconds) for manifold learning methods and GCN approaches for both training and testing.
Table 10.1 – Neural network architecture and default hyperparameters utilized in the evaluation.
Table 10.2 – Impact of batch size on accuracy (%) on Food101 dataset, considering a split of 20% for training.
Table 10.3 – Impact of parameter k (neighborhood size) on accuracy (%). Results highlighted in gray deviate less than 0.20 from the best value in bold.
Table 10.4 – Accuracies (%) achieved for 100 epochs of training, comparing the proposed CCL with other contrastive losses on three datasets.
Table 10.5 – Accuracies (%) achieved on the Food101 dataset when comparing the proposed CCL against SupCon [143], for different training epochs.
Table 11.1 – Relative gains of DRNE and RQPPF when compared to Authority and Reciprocal Density.
Table 11.2 – Comparison between the proposed approaches on person Re-ID considering MAP (%) and R-01 (%).
Table 11.3 – Proposed approaches on person Re-ID ranked according to their effectiveness (R1 and MAP).
Table 11.4 – Proposed approaches compared to state-of-the-art on person Re-ID considering MAP (%) and R-01 (%).
Table 11.5 – State-of-the-art (SOTA) comparison on Holidays dataset (MAP).
Table 11.6 – State-of-the-art (SOTA) comparison on UKBench dataset (NS-Score).
Table 11.7 – Accuracy comparison (%) for baselines on Flowers and Corel5k datasets. The RFE and Manifold-GCN are compared with classification baselines.
Table 11.8 – Research questions addressed by each of the proposed approaches.
Table 11.9 – Future work related to each of the proposed approaches.

List of Abbreviations and Acronyms

ACC – Color Autocorrelogram Descriptor
ACF – Aggregated Channel Features Detector
AF – Attention Features
AIN – Adaptive Instance Normalization
AIR – Articulation-Invariant Representation Descriptor
ALOI – Amsterdam Library of Object Images
ANML – Adaptive Neighborhood Metric Learning
AP – Average Precision
APPNP – Approximate Personalized Propagation of Neural Predictions
ARMA – Auto-Regressive Moving Average Filter Convolution
ARN – Adaptation and Re-Identification Network
ASC – Aspect Shape Context Descriptor
ATNET – Adaptive Transfer Network
BAS – Beam Angle Statistics Descriptor
BFS – Breadth-First Search
BFSTREE – Breadth-First Search Tree
BIC – Border/Interior Pixel Classification Descriptor
BOVW – Bag of Visual Words
BOW – Bag of Words
CAMEL – Cross-view Asymmetric Metric LEarning
CAP – Camera-aware Proxies
CBIR – Content-Based Image Retrieval
CC – Connected Components
CCL – Contextual Contrastive Loss
CCOM – Color Co-Occurrence Matrix Descriptor
CEDD – Color and Edge Directivity Descriptor
CFD – Contour Features Descriptor
CG – Correlation Graph
CIFAR – Canadian Institute For Advanced Research
CLD – Color Layout Descriptor
CMC – Cumulative Matching Characteristics
CNN – Convolutional Neural Networks
COMO – Compact Composite Moment-Based Descriptor
CPRR – Cartesian Product of Ranking References
CPU – Central Processing Unit
CSGLP – Camera Style Generation and Label Propagation
CSRT – Discriminative Correlation Filter with Channel and Spatial Reliability
CUB200 – Caltech-UCSD Birds Dataset
CUHK – Dataset from the Chinese University of Hong Kong
DAAM – Domain Adaptive Attention Model
DCNN – Deep Convolutional Neural Networks
DIDAL – Discriminative Identity-Feature Exploring and Differential Aware Learning
DPM – Deformable Parts Model
DPNET – Dual Path Network
DRNE – Deep Rank Noise Estimator
DUKEMTMC – Duke Multi-Tracking Multi-Camera Dataset
EANET – Enhancing Alignment Network
ECN – Exemplar Memory Convolutional Network
EHD – Edge Histogram Descriptor
ELF – Ensemble of Localized Features
EMTL – Enhanced Multi-Dataset Transfer Learning
FBRESNET – Facebook Residual Neural Network
FCTH – Fuzzy Color and Texture Histogram
FN – False Negatives
FOH – Fuzzy Opponent Histogram
FP – False Positives
GAN – Generative Adversarial Network
GAT – Graph Attention Networks
GB – Gigabytes
GBICOV – Covariance Descriptor Based on Bio-inspired Features
GCN – Graph Convolutional Network
GCN-APPNP – Approximate Personalized Propagation of Neural Predictions GCN
GCN-ARMA – Auto-Regressive Moving Average Filter Convolution GCN
GCN-GAT – Graph Attention Networks GCN
GCN-SGC – Simple Graph Convolution GCN
GDP – Graph Diffusion Process
GIST – Global Image Descriptor for Low-dimensional Features
GNN – Graph Neural Network
GNN-KNN-LDS – KNN variation of GNN-LDS
GNN-LDS – Learning Discrete Structures for Graph Neural Networks
GOG – Gaussian Of Gaussians Descriptor
GPU – Graphics Processing Unit
GRAD-NET – Graph Diffusion Network
GRID – UnderGround Re-IDentification (GRID) Dataset
GS – Graph Sampling
GSP – Graph Signal Processing
GSSL – Graph-based Semi-Supervised Learning
HACNN – Harmonious Attention Network
HCT – Hierarchical Clustering with Hard-batch Triplet Loss
HHL – Hetero and Homogeneously Learning
HLBP – Histogram of Local Binary Patterns
HQPP – Hypergraph Query Performance Prediction
HRSF – Hypergraph Rank Selection and Fusion
HSV – Color Space: Hue, Saturation, Value
IBN – Instance-Batch Normalization
ICE – Inter-Instance Contrastive Encoding
ICS – Intra-Camera Supervise
ID – Identifier
IDSC – Inner Distance Shape Context
IICS – Intra-inter Camera Similarity
IR – Information Retrieval
ISSDA – Iterative Self-Supervised Domain Adaptation
JCD – Joint Composite Descriptor
JVCT – Joint Generative and Contrastive Learning
KCF – Kernelized Correlation Filters
KISSME – Keep-it-simple-and-straightforward Distance Learning
KNN – K Nearest Neighbors
LAS – Local Activity Spectrum
LBP – Local Binary Patterns
LCDP – Locally Constrained Diffusion Process
LDA – Linear Discriminant Analysis
LDFV – Local Descriptors Encoded by Fisher Vector
LDS-GNN – Learning Discrete Structures for Graph Neural Networks
LGBM – Light Gradient Boosting Machine
LHRR – Log-based Hypergraph of Ranking Reference
LMNN – Large Margin Nearest Neighbor Learning
LOMO – Local Maximal Occurrence Descriptor
LS – Label Spreading
LSTM – Long Short-Term Memory
MAM – Memory Access Method
MAP – Mean Average Precision
MAR – MultilAbel Reference Learning
MATE – Multi-Task Multi-Label
MCFS – Multi-cluster Feature Selection
MCRN – Multi-Centroid Representation Network
MGCE-HCL – Multi-Granularity Clustering Ensemble-based Hybrid Contrastive Learning
MGH – Metadata Guided Hypergraph
ML – Machine Learning
MLFN – Multi-Level Factorisation Network
MMCL – Memory-based Multi-label Classification Loss
MOSSE – Minimum Output Sum of Squared Error
MPEG – Moving Picture Experts Group
MR – Manifold Ranking
MSE – Mean Squared Error
MSMT – Multi-Scene Multi-Time Re-ID Dataset
NASNET – Neural Architecture Search Network
NDFS – Non-negative Discriminative Feature Selection
NET – Network
NMF – Non-negative Matrix Factorization
NNCLR – Nearest-Neighbor Contrastive Learning of Visual Representations
NP-HARD – Nondeterministic Polynomial-time Hard
NS – Abbreviation of NS-Score
NS-SCORE – Score for UKBench Dataset
O2CAP – Offline-Online Associated Camera-Aware Proxies
OLDFP – Object Level Deep Feature Pooling
OOD – Out-of-Domain
OPF – Optimum-Path Forest
ORL – Our Database of Faces
OSNET – Omni-Scale Feature Learning Neural Network
OSNET-AIN – OSNET with Adaptive Instance Normalization
OSNET-IBN – OSNET with Instance-Batch Normalization
PAF – Part Association Field
PAUL – Patch-Based Unsupervised Learning Framework
PCA – Principal Component Analysis
PHOG – Pyramidal Histogram of Oriented Gradients
PIF – Part Intensity Field
PK-SAMPLER – Random Sampling Method in Re-ID
QPP – Query Performance Prediction
RAM – Random Access Memory
RBF – Radial Basis Function Kernel
RBO – Rank-Biased Overlap
RDNN – Residual Dense Neural Network
RDP – Regularized Diffusion Process
RDPAC – Rank Diffusion Process with Assured Convergence
RE-ID – Person Re-Identification
RESNET – Residual Neural Network
RFE – Rank Flow Embedding
RGB – Red, Green, Blue
RL-SIM – Ranked Lists Similarity Approach
RLCC – Refining Pseudo Labels with Clustering Consensus
RQPPF – Regression for Query Performance Prediction Framework
SCC – Strongly Connected Components
SCD – Scalable Color Descriptor
SCH – Simple Color Histogram
SD – Self-diffusion for Image Segmentation and Clustering
SDC – Scale-invariant Feature Transform Dense Color
SENET – Squeeze-and-Excitation Network
SGC – Simple Graph Convolution
SGD – Stochastic Gradient Descent
SIFT – Scale-Invariant Feature Transform
SORT – Simple Online and Realtime Tracking
SOTA – State-of-the-art
SP – Spatial Pyramid
SPACC – Spatial Pyramid Color Autocorrelogram Descriptor
SPCEDD – Spatial Pyramid Color and Edge Directivity Descriptor
SPEC – Spectral Regression
SPFCTH – Spatial Pyramid Fuzzy Color and Texture Histogram
SPGAN – Similarity Preserving Generative Adversarial Network
SPJCD – Spatial Pyramid Joint Composite Descriptor
SPLBP – Spatial Pyramid Local Binary Patterns
SS – Segment Saliences
SSL – Softened Similarity Learning Approach
STF – Swin-Transformer
SURF – Speeded-Up Robust Features
SVD – Singular Value Decomposition
SVM – Support Vector Machines
SVR – Support Vector Regression
SWIN-TF – Swin-Transformers
T-SNE – t-Distributed Stochastic Neighbor Embedding
TAUDL – Tracklet Association Unsupervised Deep Learning
TN – True Negatives
TP – True Positives
TPG – Tensor Product Graph
UDA – Unsupervised Data Augmentation
UDLF – Unsupervised Distance Learning Framework
UGAF-RSF – Unsupervised Genetic Algorithm Framework for Rank Selection and Fusion
UKBENCH – University of Kentucky Dataset
USRF – Unsupervised Selective Rank Fusion Method
UTAL – Unsupervised Tracklet Association Learning
VAL-PAT – Framework for Transferable Representations of Pedestrians
VGGNET – Visual Geometry Group Network
VIT – Vision Transformer
VOC – Vocabulary Tree
VRAM – Video Random Access Memory
WHOS – Weighted Histograms of Overlapping Stripes
WSEF – Weakly Supervised Experiments Framework
YOLO – You Only Look Once Object Detection Network

List of Symbols

A(i) – Set of all elements in a batch, except the image of index i.
C – Number of virtual classes in the synthetic scenario.
Cn – Set of combinations where each combination is of size n.
E – Set of edges of a graph.
Eh – Set of hyperedges of a hypergraph.
G – A graph.
H – A hypergraph model or the number of feature maps (or hidden units) in the hidden layer of a GCN.
I – The set of indices for all augmented samples in a batch.
L – Size of ranked lists.
M – Confusion matrix of probabilities between classes.
Mc – Confusion matrix of probabilities between elements of the same class.
Mf – Sparse matrix used by RFE to accumulate normalized scores from different rankers.
N – The size of the collection C, i.e., dataset size.
NNk(i) – The set of k nearest neighbors of image i.
NNYk(i) – A subset of NNk(i) containing only images from the same class of image i.
Nb – Number of image pairs in a training batch.
P(i) – The set of indices of all positive samples in the batch distinct from image i.
Ri – Ranker of index i.
S – Selection set of all possible combinations of rankers.
Sp – Selection set of pairs of rankers.
T, t – Number of iterations.
V – Set of graph vertices.
VL – Set of labeled nodes in the graph.
VU – Unlabeled subset of the node set.
α – Constant for the normalization equation of RFE.
β – Weight or relevance of correlation in the selection measure.
zi – The embedding of the data sample i generated by the metric learning model.
◦ – Hadamard (element-wise) product.
δ – A function that computes the distance between two feature vectors.
ϵ – A function that extracts a feature vector from an image.
ηf – Fused affinity measure used for rank aggregation.
ηr(i, x) – Function that assigns a weight to image x according to its position in τi.
γ – An effectiveness estimation measure.
γA – Authority effectiveness estimation measure.
γR – Reciprocal Density effectiveness estimation measure.
Â – Normalized adjacency matrix of a graph.
λ – A correlation measure (e.g., RBO).
R – The set of real numbers.
A – Affinity matrix (RFE) or adjacency matrix (GCNs).
C – Similarity measure matrix based on Cartesian product.
D – Distance matrix.
HG – HRSF hypergraph model.
H – Incidence matrix for HRSF or matrix encoding the similarity information of h-embeddings for RFE.
I – Identity matrix.
S – Similarity matrix.
W – Affinity matrix (HRSF) or weight matrix in the definition of GCNs.
X – Feature vectors provided as input to the GCN.
Z – Matrix of embeddings learned by the GCN model.
b – Reciprocal neighborhood binary vector used in the computation of RQPPF meta-features.
bi – Reciprocal neighborhood binary vector for image i.
ci – Connected component of index i.
cq – CC-embedding of a connected component q computed by RFE.
ei – Representation vector (embedding) of the element of index i from the dataset.
f – Contextual rank-based feature (meta-feature) vector.
fs – Set of synthetic features used for training the regression model.
ft – Set of test features used for testing the regression model.
hi – Row i of matrix H, named h-embedding.
p – Reciprocal rank position vector used in the computation of RQPPF meta-features.
pi – Reciprocal rank position vector for image i.
q – Effectiveness estimation vector used in the computation of RQPPF meta-features.
qi – Effectiveness estimation vector for image i.
xi – Feature vector representing the image oi, a row of matrix X.
zi – Row i of matrix Z, embedding representation for the node vi.
C – Image dataset.
CL – Set containing the L most similar images to image oq in the collection C.
Ec – Set of candidate edges defined by RFE.
Lccl – Proposed contextual contrastive loss.
Lsup – Supervised contrastive loss.
N – Neighborhood set.
N(oq, k) – Neighborhood set containing the k most similar elements to oq.
Nr(oq, k) – Reciprocal neighborhood set for image oq.
Nr – Reciprocal neighborhood set.
R – Set of rankers.
S – Set of connected components.
T – Set of ranked lists for all the images in the dataset.
Ti – Set of ranked lists produced by ranker Ri.
Tj – The set of ranked lists produced by the ranker Rj.
X – A subset of C.
Y – A set of labels (classes).
R – Set of rankers provided as input to the method.
X∗ – Selected combination among all sizes.
X∗n – Selected combination composed of n rankers.
Xn – Candidate combination composed of n rankers.
µ – Constant used in RBO correlation measure.
ϕ – Regression model for query performance prediction.
ψ – The contrastive loss temperature parameter.
ρ – A similarity measure.
σ – Normalization function used in RFE.
τRn – Ordered list of combinations of size n. Also referred to as the selection list.
τRn(Xin) – Position of the combination Xin in the selection list τRn.
τq – Ranked list of image q.
τq(i) – The position of image oi in the ranked list τq.
τi,q – Ranked list of image of index q calculated by ranker i.
τi – Ranked list of image of index i.
τq,f(i) – Position of image oi in the ranked list of oq according to feature f.
τq(i) – Position of the image i in the ranked list of query image q.
xℓ – The ℓ-th image in the batch.
x̃ℓ – The ℓ-th augmented image in the batch.
yℓ – The label corresponding to the ℓ-th image.
ỹℓ – The label corresponding to the ℓ-th augmented image.
Ã – Adjusted adjacency matrix, A + I.
D̃ – Degree matrix of Ã.
× – Multiplication operator.
ξ – The current epoch number.
ξtotal – The total number of epochs.
aij – Entry in the adjacency matrix indicating the presence (1) or absence (0) of an edge between vertices oi and oj.
c – Number of classes (or categories).
c(i, j) – Element of matrix C.
cp – Pairwise similarity relationship based on Cartesian product.
d – Number of vector dimensions.
de – The dimensionality of the RFE embedding space in which each object is represented.
ei – A hyperedge of index i.
fg – Function that, given a hypergraph and an incidence matrix, calculates a graph (RFE).
fh – Function that, given ranked lists, calculates a hypergraph and an incidence matrix by re-ranking through hypergraph embeddings (RFE).
fm – Manifold learning function that processes a set of ranked lists T.
fp(oq, i) – Function returning the i-th neighbor of image q.
fr – Function representing unsupervised similarity learning.
fs – Function for ranker selection.
fgcn – Function representing the graph convolutional network model.
h(ei, vj) – Reliance of vertex vj to belong to a hyperedge ei.
hp(eq) – Weight of hyperedge eq.
hij – An element of H representing the similarity of object oj in the context of the hyperedge ei.
k – Size of the neighborhood set.
kd – A variable representing a specific depth for computing a correlation measure.
kstart – The initial value of k for the first epoch.
kv – Size of virtual classes for synthetic data.
m – Number of features, i.e., size of the set R.
n – Size of a combination.
nk – Number of candidate edges for RFE graph.
oi – Indicates any object (element) belonging to the dataset, whose index is i.
obji – Object of index i, often abbreviated as oi.
p – Pairwise relationship function defined by RFE.
px – Pixel of position x in a grayscale image.
sc – RFE similarity measure attributed to pairs based on the similarity between h-embeddings and confidence of the hyperedge.
tc – Threshold for edge computation in the connected components stage of the RFE.
thend – Final threshold of the Correlation Graph.
thinc – Correlation Graph threshold increment.
thstart – Initial threshold of the Correlation Graph.
vi – A node in the node set V representing an image oi.
vl – A labeled node.
w – Selection measure for combinations of rankers.
w(ei) – A positive weight assigned to a hyperedge ei.
wp(i, x) – A weight function that assigns relevance to a vertex ox based on its position in a ranked list.
wp – Selection measure for pairs of rankers proposed by HRSF.
Th(T) – Set of ranked lists after T iterations of RFE.
yi – Label (class) of object oi, i.e., i-th row of Y.

Contents

1 INTRODUCTION
1.1 Motivation
1.2 Research Challenges
1.3 Dissertation Statement
1.4 Goals and Contributions
1.5 Organization
2 BACKGROUND
2.1 Machine Learning and Categories of Supervision
2.2 Content-Based Image Retrieval (CBIR)
2.2.1 Feature Extraction and Ranking
2.2.2 Unsupervised Similarity Learning for Re-Ranking
2.2.3 Formal Definitions and Notations
2.3 Feature Selection and Fusion
2.3.1 Query Performance Prediction
2.3.2 Rank Correlation Measures
2.3.3 Rank-Aggregation
2.4 Person Re-Identification
2.4.1 Concepts and Terminologies
2.5 Graph-Based Semi-Supervised Classification
2.5.1 Graph Convolutional Networks (GCNs)
2.5.2 Formal Definitions and Notations
2.6 Hypergraph Model
2.6.1 Formal Definitions and Notations
3 RELATED WORK
3.1 Similarity Learning in Image Retrieval
3.2 Person Re-Identification
3.2.1 Feature Extraction
3.2.2 Metric Learning
3.2.3 Evolution of the State-of-the-Art
3.3 Query Performance Prediction
3.4 Semi-Supervised Classification and Graph Convolutional Networks
3.5 Contrastive Learning
4 EXPERIMENTAL PROTOCOL
4.1 Effectiveness Measures
4.1.1 Retrieval
4.1.2 Query Performance Prediction
4.1.3 Classification
4.2 Datasets and Descriptors
4.2.1 General-Purpose
4.2.2 Person Re-Identification
4.2.3 Summary and Discussion
5 SELF-SUPERVISED CONTEXTUAL EFFECTIVENESS ESTIMATION MEASURES
5.1 Synthetic Data Generation
5.2 Deep Rank Noise Estimator (DRNE)
5.2.1 Computing Contextual Images from Ranked Lists
5.2.2 Denoising Convolutional Neural Network for Effectiveness Estimation
5.3 Regression for Query Performance Prediction Framework (RQPPF)
5.3.1 Background Formulation
5.3.2 Contextual Rank-based Features
5.3.3 Regression Models
5.4 Experimental Evaluation
5.4.1 Experimental Protocol
5.4.2 DRNE Parameter Analysis
5.4.3 DRNE Results
5.4.4 RQPPF Parameter Analysis
5.4.5 RQPPF Results
5.4.6 Joint Comparison and Discussion
6 RANK CORRELATION MEASURES FOR MANIFOLD LEARNING ON IMAGE RETRIEVAL
6.1 Proposed Method
6.1.1 Jaccard Max Definition
6.1.2 Application on Manifold Learning
6.2 Experimental Evaluation
6.2.1 Experimental Protocol
6.2.2 Results
6.2.3 Visual Analysis
7 HYPERGRAPH RANK SELECTION AND FUSION (HRSF)
7.1 Proposed Method
7.1.1 Unsupervised Ranker Selection
7.1.2 Hypergraph Query Performance Prediction
7.1.3 Hypergraph Manifold Rank Aggregation
7.2 Experimental Evaluation
7.2.1 Experimental Analysis
7.2.2 Comparison with Fusion Baselines
7.2.3 State-of-the-Art
7.2.4 Visual Results
8 RANK FLOW EMBEDDING (RFE)
8.1 Proposed Method
8.1.1 Formal Definition for Rank-based Manifold and Representation Learning
8.1.2 Rank Normalization by Reciprocal Sigmoid
8.1.3 Re-Ranking by Hypergraph Embeddings
8.1.4 Re-Ranking by Cartesian Product
8.1.5 Graph over Hypergraph and Connected Components
8.1.6 Embeddings for Classification
8.1.7 Unseen Queries
8.1.8 Rank Aggregation
8.2 Experimental Evaluation
8.2.1 Experimental Protocol
8.2.2 Parametric Space Analysis
8.2.3 Ablation Study
8.2.4 Retrieval Results
8.2.5 Classification Results
8.2.6 Unseen Queries
8.2.7 Comparison with State-of-the-art for Unsupervised Image Retrieval
8.2.8 Comparison with State-of-the-art for Semi-Supervised Image Classification
8.2.9 Visual Analysis
9 CONTEXTUAL MANIFOLD LEARNING ON GRAPH CONVOLUTIONAL NETWORKS (MANIFOLD-GCN)
9.1 Proposed Method
9.1.1 Similarity Measurement and Ranking Model
9.1.2 Unsupervised Manifold Learning
9.1.3 Graph Building
9.1.4 Graph Convolutional Networks
9.1.5 GCN Models
9.1.6 Manifold Learning Methods
9.2 Experimental Evaluation
9.2.1 Experimental Protocol
9.2.2 Classification Results
9.2.3 Person Re-ID Results
9.2.4 Visualization Results
9.2.5 Comparison with Other Approaches
9.2.6 Efficiency Results
10 CONTEXTUAL CONTRASTIVE LOSS (CCL)
10.1 Supervised Contrastive Loss
10.2 Proposed Method
10.2.1 Pairwise Similarity and Contextual Information
10.2.2 Neighborhood Definition
10.2.3 Contextual Similarity and Symmetry Discussion
10.2.4 Neighborhood Size and Logarithmic Decay
10.2.5 Proposed Contextual Contrastive Loss (CCL)
10.2.6 Proposed Training Workflow
10.3 Experimental Evaluation
11 CONCLUSIONS
11.1 Discussion of Results
11.1.1 Query Performance Prediction
11.1.2 Image Retrieval
11.1.3 Image Classification
11.2 Contributions and Research Questions
11.3 Publications and International Fellowship
11.4 Code Availability
11.5 Future Work
Bibliography

1 Introduction

Effectively defining the similarity between images is a central challenge in retrieval and machine learning applications. This issue is deeply connected to (i) how information is represented and (ii) the measures used to compare these representations [321, 294, 371, 250]. This work presents contributions in both directions, proposing seven novel approaches.
This dissertation discusses and presents contributions aimed at improving the effectiveness of content-based image retrieval and classification tasks using contextual similarity learning. This introductory chapter provides an overview of this work and is organized as follows: Section 1.1 discusses the motivations of the conducted research. Section 1.2 presents the challenges and research questions addressed. Section 1.3 states the main hypothesis validated in this dissertation. Section 1.4 discusses the objectives and contributions of the study. Section 1.5 describes the overall structure of this document, including a summary of each chapter's content and an illustration of how concepts and terms relate to the contributions.

1.1 Motivation

In recent years, there has been an exponential increase in the volume of image data, primarily due to advancements in technologies for generating, storing, and sharing visual information [320, 80]. Additionally, numerous applications (e.g., surveillance cameras [390, 137, 426], medical imaging [341, 6, 1], remote sensing systems [160], social media [140]) generate vast amounts of visual data. In this scenario, image retrieval and machine learning tasks such as image classification are increasingly utilized in many applications [255]. Remarkable progress has been made in these methods, particularly due to the consistent evolution of deep learning [109, 31]. However, most of these methods are supervised and depend on large volumes of labeled data for training. Producing labeled data, in contrast, is challenging, since it is often expensive and time-consuming [91]; it may also require a specialist for labeling, depending on the specificity of the domain. Aiming to fill this gap, many unsupervised, semi-supervised, and even self-supervised approaches have been proposed to deal with this challenge [107]. In most of these methods, effectively modeling data is crucial for exploiting the information available in the unlabeled data.

For most approaches, the essence of learning hinges on the ability to model data accurately, which involves different concepts, in particular, representation approaches and distance or similarity measures [321]. This is especially important for Content-Based Image Retrieval (CBIR) systems, which retrieve images based on visual content rather than metadata [316]. These systems usually employ feature extraction and representation methods, which have evolved considerably [294], transitioning from traditional hand-crafted features [254] to more advanced deep learning approaches [294, 438], including Vision Transformers [77, 202]. However, most comparison tasks still rely on pairwise measures [294, 80], which do not exploit contextual information [245].

In general, the term context can be broadly understood as all the relevant information pertinent to an application and its users. This work considers the idea of contextual similarity, which consists of exploiting relationships beyond pairwise analysis, involving other elements, such as the neighborhood, or additional related information [246, 245]. The term contextual similarity learning denotes the learning process that employs contextual similarity to more effectively capture the underlying relationships among elements.
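Before turning to how contextual information can be modeled, it is useful to make the purely pairwise baseline concrete. The sketch below is an illustrative toy, not the implementation used in this dissertation, and all names are assumptions: it performs the standard CBIR ranking step by comparing feature vectors with a pointwise distance and sorting the results per query.

```python
import numpy as np

def ranked_lists(features: np.ndarray, top_l: int) -> np.ndarray:
    """Builds one ranked list per query using plain pairwise Euclidean distances.

    features: (n, d) matrix with one feature vector per image.
    Returns an (n, top_l) matrix: row q holds the indices of the top_l
    images most similar to query q (the query itself comes first).
    """
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 - 2ab + ||b||^2.
    sq = (features ** 2).sum(axis=1)
    dists = sq[:, None] - 2.0 * (features @ features.T) + sq[None, :]
    # Each decision depends only on the pair (query, candidate);
    # no neighborhood information is taken into account.
    return np.argsort(dists, axis=1)[:, :top_l]

# Toy usage: 100 random "images" described by 64-dimensional vectors.
feats = np.random.default_rng(0).normal(size=(100, 64))
tau = ranked_lists(feats, top_l=10)
print(tau[0])  # ranked list of query 0
```

The contextual strategies discussed next operate precisely on the output of this step, redefining similarities from the ranked lists themselves.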
An essential aspect is that contextual similarity information can be modeled in many different forms, using different representations and structures [235, 416], among them:

(i) Graphs [86, 366]: they can be used to exploit the relationships between neighbors, which is a key aspect for understanding the local context and the influence among interconnected entities;

(ii) Ranked lists [246, 245]: in image retrieval, each ranked list contains the most similar elements for a given query. The similarities between elements can be redefined according to the analysis of the neighborhoods available in these lists. The position of each element in each list also carries valuable information;

(iii) Clustering [404, 373]: identifying and grouping data points that are similar according to predefined criteria. These groups allow the discovery of inherent patterns or relationships that may not be apparent upon initial observation.

Besides these approaches, a wide range of methodologies can still be proposed to exploit contextual information in numerous scenarios.

Similarity learning applied to retrieval is generally explored through re-ranking tasks. Despite the growing popularity of these methods, more robust structures have not yet been extensively employed in most cases. Structures that represent higher-order similarities, such as relationships among neighbors of neighbors, can be particularly advantageous. Hypergraphs, for example, allow edges to connect multiple vertices, offering a sophisticated technique for capturing these relationships [403, 251]. Additionally, most unsupervised re-ranking approaches [18, 282, 108] provide a new ranked list representation as output but do not produce new features in return, which could otherwise be used to encode contextual information for classifiers, for example.

Another application that could greatly benefit from the enrichment of contextual information is feature selection and fusion [260, 389, 424, 329, 327]. Among different strategies, the selection of features can be done through effectiveness estimation and correlation measures [329]. They build on the idea that fusion benefits from elements that have high effectiveness and are also complementary. The creation and usage of contextual structures for estimating effectiveness and measuring correlations is still an area that requires further research.

This work exploits contextual similarity learning for general-purpose image retrieval and person re-identification, usually abbreviated as person Re-ID. Person Re-ID is a surveillance application that has been gaining considerable attention and is nowadays of fundamental importance in many camera surveillance systems. The task consists of identifying individuals across multiple cameras that have no overlapping views [137]. A Re-ID system broadly consists of three main steps [426]: person detection, feature extraction, and person retrieval or matching. This work focuses on the final step, which can be viewed as a specific image retrieval application [137].
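As a minimal illustration of the ranked-list modeling in item (ii) above, and of the reciprocal-neighborhood analysis that reappears in the Re-ID discussion below, the following sketch replaces the pairwise score with a neighborhood-overlap score. It is a generic, hypothetical example, not one of the methods proposed in this dissertation.

```python
import numpy as np

def neighborhood_similarity(tau: np.ndarray, i: int, j: int, k: int) -> float:
    """Jaccard-style overlap between the top-k neighbor sets of images i and j.

    tau: (n, L) matrix of ranked lists (row q lists the neighbors of q by rank).
    Unlike a pairwise distance, this score depends on the neighborhoods of
    i and j, i.e., on other elements of the collection (their context).
    """
    ni, nj = set(tau[i, :k]), set(tau[j, :k])
    return len(ni & nj) / len(ni | nj)

def is_reciprocal(tau: np.ndarray, i: int, j: int, k: int) -> bool:
    """k-reciprocal test: i and j each appear in the other's top-k list."""
    return (j in tau[i, :k]) and (i in tau[j, :k])
```

Hypergraphs generalize this idea further: instead of scoring one pair at a time, each top-k list can define a hyperedge connecting k vertices at once, encoding higher-order relations such as neighbors of neighbors.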
Person Re-ID is a complex task that presents numerous difficulties [390, 137, 426], including: (i) varying angles of view between cameras; (ii) low-resolution images; (iii) changes in lighting conditions; (iv) occlusions blocking part of the view; (v) the difficulty of manually labeling images for use in training algorithms; (vi) unbalanced classes or classes with very few elements; (vii) the complexity of modeling data; and (viii) the extensive volume of data that needs to be processed.

To address these challenges, many approaches have introduced more robust deep learning models [390], such as Vision Transformers [111, 163, 221], metric learning [185, 156, 396], and Siamese networks [340, 339]. Other strategies include dataset expansion with augmentations [132, 291] or artificial data, considering appearance attributes, body parts, temporal information, and different types of multimodal information. Additionally, metric learning is often employed for Re-ID due to its capacity to remain effective when dealing with unseen data [185], since it focuses on learning distances or similarities rather than features specific to the training data. This allows the model to generalize better to new examples that were not present in the training set.

Beyond all these advancements, post-processing methods that exploit contextual information have gained significant attention due to their ability to improve the results provided by the latent features of different deep learning models. Various unsupervised post-processing strategies are based on the idea of exploiting the information of reciprocal neighborhoods and measuring the co-occurrence of elements in ranked lists [429, 174, 225, 165, 211, 96, 388, 95, 108], demonstrating substantial improvements. Although these approaches are becoming increasingly common in Re-ID, methods for selection and fusion remain relatively scarce. This scenario highlights the importance of investigating methods capable of effectively exploiting contextual information.

In addition to retrieval, it is also imperative to address scenarios with limited labeled data for classification [135]. Graph Convolutional Networks (GCNs) offer a promising solution for semi-supervised classification by learning from both labeled and unlabeled data considering graph structures [412]. Moreover, GCNs can learn node and graph embeddings that capture complex dependencies and structural relationships [141]. However, GCNs are not widely used for image classification, since graphs are typically not available in image domains [274, 343, 307]. Therefore, effectively modeling these graphs, which can be utilized to exploit contextual information, is a crucial research topic.

Another approach that has recently demonstrated continuous advances in improving classification results is contrastive learning [143, 47]. Unlike the commonly used cross-entropy loss, which aims to minimize the difference between the predicted and true class probabilities, contrastive losses focus on learning similarities and dissimilarities between data points rather than merely categorizing them [47]. Despite this, most contrastive losses consider only pairwise measures [143, 47, 49], with only a few incorporating some type of neighborhood information [441, 82, 183]. Moreover, these approaches often require huge volumes of data (labeled or unlabeled) for training [47, 49], even in self-supervised scenarios, which is a challenge in circumstances where data is scarce.
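For reference, the supervised contrastive (SupCon) loss [143] mentioned above can be written, using the notation adopted in this dissertation (see the List of Symbols), as

$$ \mathcal{L}_{sup} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \psi)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \psi)}, $$

where P(i) is the set of positives for sample i, A(i) contains all other samples in the batch, and ψ is the temperature. Every term depends only on dot products between pairs of embeddings, which is exactly the pairwise limitation discussed above and the point where neighborhood information can be injected.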
In light of the presented discussion and all challenges, the focus of this dissertation is to exploit the use of contextual similarity information with the objective of improving the effectiveness of image retrieval and classification, particularly in cases where labeled data is limited or non-existent. This dissertation primarily concentrates on unsupervised learning, while also proposing semi-supervised and supervised approaches.

1.2 Research Challenges

Contextual similarity information can be applied in a variety of fields. However, appropriately representing and exploiting contextual information in each scenario poses significant difficulties. There are many research challenges related to various applications that can be used to improve the effectiveness of image retrieval and classification tasks. In the following, several topics are discussed and corresponding research questions are presented for each:

• Selection and fusion in person Re-ID: Selection and fusion involve choosing the most relevant features from the data and combining them to enhance retrieval effectiveness [260]. There are various feature extractors and many possible combinations between them. Selecting the right features is crucial because manually evaluating all combinations becomes impractical: while the number of features increases linearly, the number of combinations increases exponentially. The concept is based on the idea that fusion is most effective when it involves elements that are both highly effective and complementary. For person Re-ID, the complexity of accurately matching individuals across different camera views becomes significantly more challenging in unsupervised applications due to the absence of labeled data [137]. Effectively modeling and exploiting patterns in the data is crucial in this scenario. Research question:
– How can contextual similarity information be used for selection and fusion in unsupervised person Re-ID?

• Query performance prediction: Also known as effectiveness estimation, query performance prediction (QPP) encompasses techniques for assessing the quality of ranked lists in scenarios where no labels are provided. In this context, the ability to assess the effectiveness of the retrieval process provides a significant advantage for different tasks, including enabling the selection of more effective ranked lists. However, QPP is very challenging, especially in unsupervised tasks. One of the main difficulties is elaborating an approach that effectively generalizes across diverse scenarios [262]. Bridging this gap represents a major challenge that can be mitigated by incorporating contextual similarity information. Research question:
– How can data be modeled using contextual similarity information for query performance prediction?

• Synthetic data: Recently, self-supervised approaches have been proposed to address scenarios where labeled data is scarce. Among the different means of self-supervision, one of them is the use of synthetic data. There are many advantages and benefits to using synthetic data, primarily due to its flexibility and control in generating large volumes of annotated data. In domains where safety and privacy are relevant, using real data can raise privacy concerns and legal issues; synthetic data does not carry these risks. However, creating representative synthetic data presents many difficulties. One of the primary challenges is to accurately reflect the complexity and variability of real-world data [72].
The generated synthetic data is expected to encompass a wide range of scenarios, including rare events and edge cases, to ensure comprehensive learning. Research questions:
– How can contextual similarity information be used to generate synthetic data?
– How can contextual similarity learning be employed on synthetically generated data?

• Unsupervised similarity learning methods: Despite the potential of unsupervised similarity learning methods to improve retrieval results, effectively representing and encoding the maximum amount of contextual information remains a challenge. This difficulty is amplified because these methods operate without labels and cannot utilize relevance feedback [312] as supervised algorithms do. These methods usually exploit the relationships among images through ranked lists and the similarity among elements [95, 108, 250]. The primary challenge lies in modeling and leveraging this similarity information, which can be approached through various strategies such as graph structures [381], contextual measures [128], and others [382, 384]. Utilizing more complex structures to represent second-order similarity (i.e., relationships such as neighbors of neighbors) can be particularly relevant, for example. Research question:
– How can more complex structures, which encode contextual information more effectively, be applied to unsupervised similarity learning?

• Representation learning and embeddings: Feature learning is of fundamental importance in many retrieval and classification applications [321]. However, encoding the information of an image into an embedding is very challenging. When converting an image into an embedding, some information is inevitably lost. This loss must be minimized to ensure that the most critical features of the image are retained. There is also the semantic gap [24, 115] between the raw pixel data of an image and the human interpretation of the image's content. Unsupervised similarity learning approaches usually post-process ranked lists to enhance image retrieval results but do not provide any form of embeddings that can be used for other tasks, such as classification. Research question:
– How can contextual information from similarity learning approaches be encoded to generate embeddings that are useful for tasks beyond retrieval, such as classification?

• Contextual similarity and Graph Convolutional Networks (GCNs): GCNs effectively capture relationships and interactions within complex networks, enhancing results in tasks involving structured data. However, graphs are not inherently available for most image datasets, and GCNs heavily rely on these structures to deliver significant results [141, 307]. The main challenge involves accurately modeling the graph for effective use by the GCN. Research question:
– How can contextual similarity information be incorporated into the input graph utilized by Graph Convolutional Networks (GCNs) to improve their classification results?

• Correlation measures and manifold learning: Manifold learning is a technique for uncovering simpler, underlying structures in complex high-dimensional data [133]. Correlation measures quantify the similarity between data points, which is very useful for modeling relationships in the data. However, this is challenging since data can be complex and heterogeneous, involving multiple variables with nonlinear relationships that are difficult to capture [16].
Also, outliers may present significant challenges in data analysis. Research questions:
– Can rank-based information be utilized to measure the correlation between images more effectively?
– Can a correlation measure be proposed and applied to enhance image retrieval with manifold learning?

• Contrastive learning: It has been extensively used in self-supervised and supervised learning due to its effectiveness in learning representations that distinguish between similar and dissimilar images. It offers an alternative to cross-entropy by yielding more semantically meaningful image embeddings. However, most contrastive losses rely on pairwise measures to assess the similarity between elements [143, 47], ignoring more general neighborhood information that can be leveraged to enhance model robustness and generalization [441]. Research question:
– How can contextual similarity information be incorporated into metric learning, including its direct integration into losses such as contrastive loss?

The contributions presented and discussed in this work address these important research challenges.

1.3 Dissertation Statement

Driven by the challenges identified in the literature, primarily the difficulty of obtaining a large amount of labeled data and the increasing need for methods that exploit contextual information, we explore the application of contextual similarity learning in different scenarios. The main hypothesis of this work is briefly stated as follows:

Contextual similarity learning can improve the effectiveness of image retrieval and classification tasks across general-purpose and person re-identification (Re-ID) applications. This concept is applicable to unsupervised, semi-supervised, and supervised approaches, particularly in contexts where labeled data is limited.

The hypothesis is validated by the proposed approaches and the comprehensive experimental evaluation presented in this dissertation.

11 Conclusions

This chapter concludes this dissertation by discussing the contributions and other relevant aspects. Section 11.1 reviews the main results obtained for each task: query performance prediction, image retrieval, and image classification. Additionally, it provides a comparative analysis of the proposed methods alongside other approaches from the literature. Section 11.2 discusses how the contributions address the research questions of this study. Section 11.3 lists the publications and submissions obtained, along with the international Fulbright fellowship. Section 11.4 mentions the code made available for the proposed approaches. Finally, Section 11.5 presents potential extensions and future work, describing their connections to the contributions achieved in this research.

11.1 Discussion of Results

Given the notable outcomes achieved by contextual similarity learning across all the scenarios considered, this section discusses the results for each task. For query performance prediction, the results of RQPPF and DRNE are jointly compared and discussed in Section 11.1.1. Section 11.1.2 compares the approaches evaluated for image retrieval on person Re-ID and general-purpose datasets, including comparisons with the state-of-the-art. Section 11.1.3 overviews the Manifold-GCN and RFE semi-supervised classification results obtained and a comparison with the state-of-the-art. Moreover, the gains achieved by CCL are briefly discussed.
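Since the QPP discussion below repeatedly refers to rank-based effectiveness estimators, a brief sketch may help ground it: baselines such as Authority [243] and Reciprocal Density [248] both score a query by how internally consistent its top-k neighborhood is. The code below is a simplified illustration of this general idea, not the exact formulation of either cited measure, and assumes ranked lists stored as NumPy rows.

import numpy as np

def neighborhood_consistency(ranks: np.ndarray, query: int, k: int) -> float:
    """Density of the query's top-k neighborhood, in [0, 1].

    Counts how many ordered pairs (i, j) of the query's top-k neighbors
    are such that j also appears in the top-k ranked list of i.
    """
    neighborhood = ranks[query, :k]
    edges = sum(
        1
        for i in neighborhood
        for j in neighborhood
        if j in ranks[i, :k]
    )
    return edges / float(k * k)  # 1.0 means a fully self-consistent neighborhood

# Usage: higher scores flag ranked lists that are expected to be more effective,
# supporting unsupervised selection among descriptors or queries.
ranks = np.array([[0, 1, 2, 3],
                  [1, 0, 2, 3],
                  [2, 3, 1, 0],
                  [3, 2, 0, 1]])
print(neighborhood_consistency(ranks, query=0, k=2))  # -> 1.0 (0 and 1 agree fully)

Scores of this kind are intended to correlate with the actual retrieval effectiveness of each ranked list, which is precisely what the experiments summarized next evaluate.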
11.1.1 Query Performance Prediction

A great variety of experiments was conducted to evaluate DRNE and RQPPF, showing their capacity to effectively perform QPP on various datasets. Table 11.1 presents a summary of the relative gains of both approaches in comparison to Authority [243] and Reciprocal Density [248], which are used as baselines. Notice that RQPPF provided gains in all the evaluated scenarios, while DRNE showed inferior performance in some cases, especially when compared to Reciprocal Density on the MPEG-7 dataset. However, for the most part, DRNE provided higher and more consistent gains when compared to Reciprocal Density. Specifically for the AIR descriptor, DRNE revealed superior results in all cases. In general, the results showed that the proposed methods are better than the baselines in most cases. Additionally, the choice of the best method depends not only on the dataset but also on the descriptor. RQPPF is more flexible and also uses Authority and Reciprocal Density as part of its formulation, while DRNE does not; however, DRNE seems more robust to outlier descriptors, whereas RQPPF is less so. Among potential extensions, combining DRNE and RQPPF is one of the possibilities for future work.

Table 11.1 – Relative gains of DRNE and RQPPF when compared to Authority (Auth.) [243] and Reciprocal Density (Rec.) [248]. Average gains are reported for each dataset.

Dataset   Descriptor     Original MAP   DRNE vs Auth.   RQPPF vs Auth.   DRNE vs Rec.   RQPPF vs Rec.
MPEG-7    AIR [103]      89.39%         +14.81%         +3.50%           +16.84%        +12.99%
          ASC [191]      85.28%         -2.50%          +3.76%           -8.29%         +1.84%
          IDSC [190]     81.70%         -3.93%          +3.69%           -7.58%         +1.88%
          CFD [244]      80.71%         +3.47%          +3.99%           -1.24%         +2.71%
          BAS [13]       71.52%         +0.85%          +3.69%           -5.13%         +1.09%
          SS [317]       37.67%         +7.08%          +6.01%           +3.32%         +3.52%
          Average Gain                  +3.30%          +4.11%           -0.35%         +4.01%
Brodatz   LAS [308]      75.15%         +7.50%          +9.01%           +9.59%         +11.15%
          CCOM [148]     57.57%         +3.30%          +7.33%           +8.60%         +11.18%
          LBP [231]      48.40%         +0.75%          +5.10%           +18.42%        +15.36%
          Average Gain                  +3.85%          +7.15%           +12.20%        +12.56%
Market    OSNET [436]    43.30%         -2.63%          +1.19%           +5.40%         +5.45%
          ResNet [110]   22.82%         -0.89%          +0.13%           +7.95%         +5.29%
          Average Gain                  -1.76%          +0.66%           +6.68%         +5.37%
Duke      OSNET [436]    52.69%         +0.71%          +2.35%           +1.00%         +3.37%
          ResNet [110]   32.00%         -2.14%          +0.52%           -0.12%         +2.46%
          Average Gain                  -0.72%          +1.44%           +0.44%         +2.92%

11.1.2 Image Retrieval

Four of the seven proposed methods were evaluated in image retrieval: HRSF, JaccardMax, RFE, and Manifold-GCN. The results obtained are reviewed for both person Re-ID and general-purpose datasets, including comparisons against each other and with the state-of-the-art. A brief discussion about the gains is also presented.

• Person Re-ID

Considering the wide variety of descriptors employed, and to provide a fair comparison, Table 11.2 presents the best results obtained by each method using only the OSNET model and its variants (i.e., OSNET, OSNET-IBN, and OSNET-AIN) on the Market, DukeMTMC, and CUHK03 datasets. For the Market and CUHK03 datasets, HRSF leads with the best results for both R1 and MAP. HRSF is the only method that performs selection, which is an advantage over the others since it can select the best combination of descriptors among the OSNET variants. For the DukeMTMC dataset, RFE and JaccardMax compete for the best results. The worst results in this table are the ones obtained by Manifold-GCN. Considering that Manifold-GCN is semi-supervised, while all the other approaches are unsupervised, this result highlights the importance of future research on this method.
Since it was mainly proposed for classification, its retrieval results are significantly behind the others, probably due to the features not being properly distributed in the latent space, which requires further investigation in future work. Table 11.3 presents the methods ranked according to their results for each measure and dataset. The average rank reveals that, while HRSF shows the best results in most cases, JaccardMax and RFE follow closely, with average ranks of 2.0 and 2.2, respectively. As previously discussed, Manifold-GCN is behind, with an average rank of 3.7.

Table 11.2 – Comparison between the proposed approaches on person Re-ID considering R1 (%) and MAP (%). The best results obtained with the OSNET descriptor and its variants are reported.

Method                                  Year   Market1501      DukeMTMC        CUHK03
                                               R1     MAP      R1     MAP      R1     MAP
HRSF (X∗, best result) [331]            2022   75.71  62.94    77.24  68.88    39.04  39.69
Correlation Graph + Jaccard Max [324]   2022   73.25  59.84    76.21  69.27    —      —
RFE [334]                               2023   72.42  59.51    77.69  69.21    36.89  39.24
Manifold-GCN [333]                      2023   70.30  57.48    74.22  65.83    35.19  35.99

Table 11.3 – Proposed approaches on person Re-ID ranked according to their effectiveness (R1 and MAP). The best results obtained with the OSNET descriptor and its variants were considered.

Method                                  Year   Market1501   DukeMTMC   CUHK03    Average
                                               R1   MAP     R1   MAP   R1   MAP  Rank
HRSF (X∗, best result) [331]            2022   1    1       2    3     1    1    1.5
Correlation Graph + Jaccard Max [324]   2022   2    2       3    1     —    —    2.0
RFE [334]                               2023   3    3       1    2     2    2    2.2
Manifold-GCN [333]                      2023   4    4       4    4     3    3    3.7

To compare the proposed methods with the state-of-the-art in person Re-ID, presented in Table 11.4, the best results for each approach were considered. For HRSF, RFE, and Manifold-GCN, the best results used OSNET and its variants. Unlike the others, the JaccardMax evaluation employed the TransReID descriptor, which provided better results for this method. All the baseline results are the ones reported in the literature, following the same protocol as ours. In general, it can be observed that the proposed approaches provide better results for MAP than for R1 when compared to other methods. This evinces that they can significantly improve the top positions of ranked lists, but not necessarily achieve the best result when considering only the first position. In this case, Market1501 was revealed to be the most challenging dataset, where the proposed methods are better than or comparable to the baselines up to 2020; after that, the baselines show a considerable improvement. In contrast, for the DukeMTMC dataset, the MAP of 73.96% obtained by the proposed JaccardMax (2022) is the second-best result achieved, surpassed only by VAL-PAT, a very recent approach from 2023. For the CUHK03 dataset, many of the methods have no results reported in the literature, since this dataset is not as commonly evaluated as the others. However, all the proposed methods provided a better MAP than the baselines, being behind only UTAL.

Table 11.4 – Proposed approaches compared to the state-of-the-art on person Re-ID considering R1 (%) and MAP (%).
Method                                  Year   Market1501      DukeMTMC        CUHK03
                                               R1     MAP      R1     MAP      R1     MAP
Unsupervised Methods
ARN [181]                               2018   70.3   39.4     60.2   33.4     —      —
EANet [118]                             2018   66.4   40.6     45.0   26.4     51.4   31.7
TAUDL [170]                             2018   63.7   41.2     61.7   43.5     44.7   31.2
ECN [431]                               2019   75.1   43.0     63.3   40.4     —      —
MAR [397]                               2019   67.7   40.0     87.1   48.0     —      —
UTAL [171]                              2019   69.2   46.2     62.3   44.6     56.3   42.3
SSL [189]                               2020   71.7   37.8     52.5   28.6     —      —
HCT [402]                               2020   80.0   56.4     69.6   50.7     —      —
CAP [353]                               2021   91.4   79.2     81.1   67.3     —      —
IICS [376]                              2021   89.5   72.9     80.0   64.4     —      —
RLCC [415]                              2021   90.8   77.7     83.2   69.2     —      —
ICE [43]                                2021   93.8   82.3     83.3   69.9     —      —
MGH [368]                               2021   93.2   81.7     83.7   70.2     —      —
MGCE-HCL [297]                          2022   92.1   79.6     82.5   67.5     —      —
MCRN [367]                              2022   92.5   80.8     83.5   69.9     —      —
O2CAP [354]                             2022   92.5   82.7     83.9   71.2     —      —
DIDAL [201]                             2023   94.2   84.8     —      —        —      —
VAL-PAT [23]                            2023   —      —        86.1   74.9     —      —
Domain Adaptive Methods
HHL (D,M) [430]                         2018   62.2   31.4     46.9   27.2     —      —
HHL (C03) [430]                         2018   56.8   29.8     42.7   23.4     —      —
ATNet (D,M) [197]                       2019   55.7   25.6     45.1   24.9     —      —
CSGLP (D,M) [273]                       2019   63.7   33.9     56.1   36.0     —      —
ISSDA (D,M) [306]                       2019   81.3   63.1     72.8   54.1     —      —
ECN++ (D,M) [432]                       2020   84.1   63.8     74.0   54.4     —      —
MMCL (D,M) [348]                        2020   84.4   60.4     72.4   51.4     —      —
JVCT+ (D,M) [44]                        2021   90.5   75.4     81.9   67.6     —      —
MCRN (D,M) [367]                        2022   93.8   83.8     84.5   71.5     —      —
Cross-Domain Methods (single-source)
EANet (C03) [118]                       2018   59.4   33.3     39.3   22.0     —      —
EANet (D,M) [118]                       2018   61.7   32.9     51.4   31.7     —      —
SPGAN (D,M) [71]                        2018   43.1   17.0     33.1   16.7     —      —
DAAM (D,M) [121]                        2019   42.3   17.5     29.3   14.5     —      —
AF3 (D,M) [195]                         2019   67.2   36.3     56.8   37.4     —      —
AF3 (MT) [195]                          2019   68.0   37.7     66.3   46.2     —      —
PAUL (MT) [380]                         2019   68.5   40.1     72.0   53.2     —      —
Cross-Domain Methods (multi-source)
CAMEL [396]                             2017   54.5   26.3     —      —        31.9   —
EMTL [370]                              2018   52.8   25.1     39.7   22.3     —      —
Baseline by [153]                       2019   80.5   56.8     67.4   46.9     29.4   27.4
Proposed Methods (contributions)
HRSF (X∗, best result) [331]            2022   75.71  62.94    77.24  68.88    39.04  39.69
Correlation Graph + Jaccard Max [324]   2022   75.42  63.53    78.59  73.96    —      —
RFE [334]                               2023   72.42  59.51    77.69  69.21    36.89  39.24
Manifold-GCN [333]                      2023   70.30  57.48    74.22  65.83    35.19  35.99

The presented comparisons raise two topics for discussion: (i) Why are the obtained results significantly better on DukeMTMC, and why does Market1501 appear to be considerably more difficult? (ii) RFE and CG [249] + JaccardMax exhibit close results when using the same descriptors for Re-ID; is this also true in other scenarios? The first topic is challenging to answer, especially because the Market1501 and DukeMTMC datasets have very similar characteristics (e.g., dataset size, number of individuals, images per person, size of the train and evaluation sets, and number of cameras). However, one particular difference might explain it: the Market1501 dataset was annotated using an automated detector, the Deformable Part Model (DPM), which is known to be prone to noise and potential misalignment. Conversely, DukeMTMC was manually annotated by humans, providing cleaner data with well-aligned bounding boxes. Further investigation of this aspect can be conducted as future work. Regarding the close results of RFE and CG [249] + JaccardMax for Re-ID, these methods are compared on general-purpose datasets to evaluate whether they exhibit similar behavior.
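For reference, the R1 and MAP measures reported throughout these comparisons can be computed from ranked lists as sketched below. This is a generic illustration assuming binary relevance sets, not the dataset-specific evaluation protocols (which, for instance, handle camera identifiers and junk images in Re-ID).

import numpy as np

def rank1(ranked: list, relevant: set) -> float:
    """R1: 1.0 if the first retrieved element is relevant, else 0.0."""
    return 1.0 if ranked[0] in relevant else 0.0

def average_precision(ranked: list, relevant: set) -> float:
    """Mean of the precision values at the positions of relevant elements."""
    hits, precisions = 0, []
    for pos, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / pos)
    return float(np.mean(precisions)) if precisions else 0.0

# MAP averages the per-query average precision; relative gains are then
# computed as (new_map - baseline_map) / baseline_map * 100.
queries = [([3, 1, 2, 0], {3, 2}), ([0, 2, 1, 3], {1})]
print(np.mean([rank1(r, rel) for r, rel in queries]))             # mean R1
print(np.mean([average_precision(r, rel) for r, rel in queries])) # MAP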
• General-Purpose Datasets

Tables 11.5 and 11.6 compare RFE and CG [249] + JaccardMax with the state-of-the-art in image retrieval tasks on the Holidays and UKBench datasets, respectively. On both datasets, RFE outperformed all the baselines. For Holidays, CG [249] + JaccardMax is behind RFE with 91.12%, but still surpasses most of the other methods. For UKBench, both achieved the same result of 3.97, which is very close to the maximum score (i.e., 4).

Table 11.5 – State-of-the-art (SOTA) comparison on the Holidays dataset (MAP).

Jégou et al. [127]         75.07%    Tolias et al. [315]        82.20%
Paulin et al. [238]        82.90%    Qin et al. [268]           84.40%
Zheng et al. [425]         85.20%    Sun et al. [299]           85.50%
Zheng et al. [423]         85.80%    Pedronette et al. [241]    86.16%
Arandjelovic et al. [12]   87.50%    Li et al. [178]            89.20%
Razavian et al. [271]      89.60%    Pedronette et al. [253]    90.02%
Gordo et al. [104]         90.30%    Valem et al. [329]         90.51%
Valem et al. [328]         90.51%    Liu et al. [203]           90.89%
Pedronette et al. [251]    90.94%    Pedronette et al. [252]    91.25%
Yu et al. [398]            91.40%    Berman et al. [26]         91.80%
Proposed: CG + JacMax      91.12%    Proposed: RFE              91.97%

Table 11.6 – State-of-the-art (SOTA) comparison on the UKBench dataset (N-S score).

Qin et al. [267]           3.67      Zhang et al. [413]         3.83
Zheng et al. [424]         3.84      Bai et al. [16]            3.86
Xie et al. [371]           3.89      Lv et al. [210]            3.91
Liu et al. [203]           3.92      Pedronette et al. [241]    3.93
Bai et al. [20]            3.93      Liu et al. [159]           3.93
Valem et al. [328]         3.93      Bai et al. [17]            3.94
Valem et al. [329]         3.94      Valem et al. [327]         3.95
Chen et al. [50]           3.96
Proposed: CG + JacMax      3.97      Proposed: RFE              3.97

• Discussion about Gains

From the observed results, we can notice that the proposed approaches are comparable to or better than state-of-the-art approaches in most cases. The best method in each scenario varies, since each dataset and descriptor presents different aspects. An important attribute of the proposed approaches is their capacity to improve the input data by employing contextual similarity learning. Figure 11.1 presents the relative gains of RFE and JaccardMax for different datasets and descriptors, demonstrating the capacity of contextual similarity learning to improve the results across multiple scenarios. The Holidays and UKBench datasets exhibited smaller gains because their descriptors already achieved higher results, making further enhancements more challenging compared to the other datasets. Even so, it is remarkable that, despite the advancements in feature extraction across different deep learning models, from CNNs to Vision Transformers, improved results were obtained in all cases.

Figure 11.1 – RFE and JaccardMax relative gains (%) over the MAP of the original descriptors, for the Corel5k, Holidays, UKBench, Market, and Duke datasets with different descriptors.

11.1.3 Image Classification

Since both Manifold-GCN and RFE were employed for semi-supervised classification, Table 11.7 compares them to baselines, both traditional and recent, on the Flowers and Corel5k datasets.
The values achieved by Manifold-GCN are the highest in all cases, closely followed by RFE. These results reveal the high effectiveness of the proposed approaches, which, besides providing significant gains, are also comparable or superior to various methods in the literature.

Table 11.7 – Accuracy comparison (%) on the Flowers and Corel5k datasets. The proposed RFE and Manifold-GCN are compared with semi-supervised classification baselines using different input features.

Method                     Input              Flowers   Corel5k
CoMatch [169]              Images             82.55     85.70
kNN                        ResNet features    63.67     76.80
SVM [54]                   ResNet features    80.54     88.73
OPF [8]                    ResNet features    71.77     83.56
SL-Perceptron              ResNet features    75.44     83.56
ML-Perceptron              ResNet features    78.88     87.10
PseudoLabel+SGD [162]      ResNet features    82.69     89.76
LS+kNN [433]               ResNet features    73.49     83.98
LS+SVM [433, 54]           ResNet features    73.53     83.26
LS+OPF [433, 8]            ResNet features    72.66     82.32
LS+SL-Perceptron [433]     ResNet features    72.34     82.38
LS+ML-Perceptron [433]     ResNet features    73.03     82.53
GNN-LDS [90]               ResNet features    54.98     62.69
GNN-KNN-LDS [90]           ResNet features    79.32     88.94
WSEF [264]                 ResNet features    85.12     91.68
RFE                        ResNet features    84.95     91.54
Manifold-GCN               ResNet features    85.88     93.08
kNN                        SENet features     48.71     58.78
SVM [54]                   SENet features     73.30     85.89
OPF [8]                    SENet features     64.00     81.33
SL-Perceptron              SENet features     71.84     82.28
ML-Perceptron              SENet features     72.62     86.90
PseudoLabel+SGD [162]      SENet features     76.87     89.85
LS+kNN [433]               SENet features     58.05     72.16
LS+SVM [433, 54]           SENet features     59.84     72.79
LS+OPF [433, 8]            SENet features     59.25     72.20
LS+SL-Perceptron [433]     SENet features     59.27     72.19
LS+ML-Perceptron [433]     SENet features     59.39     72.24
GNN-LDS [90]               SENet features     52.24     65.80
GNN-KNN-LDS [90]           SENet features     73.69     89.95
WSEF [264]                 SENet features     76.16     89.74
RFE                        SENet features     77.56     92.20
Manifold-GCN               SENet features     78.82     92.79

CCL was also proposed and evaluated for classification, but in supervised scenarios. Since it is the only method proposed in this category and its protocol uses different datasets, a direct comparison with the other proposed methods is not feasible; therefore, a discussion of its gains is presented instead. The experimental evaluation in Chapter 10 showed that its results are consistently better than those of SupCon, on which CCL is based, and SimCLR, another method commonly used as a baseline in this task. Figure 11.2 evinces the capacity of CCL to provide gains when compared to SupCon on three datasets, with higher values as the training set size decreases. The integration of contextual information within the contrastive loss significantly improved the results, as initially hypothesized, with gains of up to 10.759%.

Figure 11.2 – Relative gains (%) obtained by CCL in comparison to SupCon on Food101, MiniImageNet, and CIFAR-100 for different train/test splits, considering 100 training epochs.

Bibliography

[1] Agarwal, M. and Mostafa, J. (2011). Content-based image retrieval for alzheimer's disease detection. In 2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI), pages 13–18.
[2] Albawi, S.; Mohammed, T. A.; and Al-Zawi, S. (2017). Understanding of a convolutional neural network. In 2017 International Conference on Engineering and Technology (ICET), pages 1–6.
[3] Ali, N.; Zafar, B.; Iqbal, M. K.; Sajid, M.; Younis, M. Y.; Dar, S. H.; Mahmood, M. T.; and Lee, I. H. (2019). Modeling global geometric spatial information for rotation invariant classification of satellite images. PLoS One, 14(7):e0219833.
[4] Alnissany, A. and Dayoub, Y. (2023). Modified centroid triplet loss for person re-identification. Journal of Big Data, 10(1):74.
[5] Alqasemi, F. A.; Alabbasi, H. Q.; Sabeha, F. G.; Alawadhi, A.; Kahlid, S.; and Zahary, A. (2019). Feature selection approach using knn supervised learning for content-based image retrieval. In 2019 First International Conference of Intelligent Computing and Engineering (ICOICE), pages 1–5.
[6] Alves, C. and Traina, A. J. M. (2022). Variational autoencoders for medical image retrieval. In 2022 International Conference on INnovations in Intelligent SysTems and Applications (INISTA), pages 1–6.
[7] Alvin, Y. H. Y. and Chakraborty, D. (2023). Approximate maximum rank aggregation: Beyond the worst-case. In Bouyer, P. and Srinivasan, S., editors, 43rd IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2023), volume 284 of Leibniz International Proceedings in Informatics (LIPIcs), pages 12:1–12:21, Dagstuhl, Germany. Schloss Dagstuhl – Leibniz-Zentrum für Informatik.
[8] Amorim, W. P.; Falcão, A. X.; and de Carvalho, M. H. (2014). Semi-supervised pattern classification using optimum-path forest. In 2014 27th SIBGRAPI Conference on Graphics, Patterns and Images, pages 111–118.
[9] An, L.; Chen, X.; Yang, S.; and Li, X. (2017). Person re-identification by multi-hypergraph fusion. IEEE Transactions on Neural Networks and Learning Systems, 28(11):2763–2774.
[10] Anand, A.; Leonhardt, J.; Rudra, K.; and Anand, A. (2022). Supervised contrastive learning approach for contextual ranking. In Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, ICTIR '22, pages 61–71, New York, NY, USA. Association for Computing Machinery.
[11] Antelmi, A.; Cordasco, G.; Polato, M.; Scarano, V.; Spagnuolo, C.; and Yang, D. (2023). A survey on hypergraph representation learning. ACM Computing Surveys, 56(1).
[12] Arandjelovic, R.; Gronat, P.; Torii, A.; Pajdla, T.; and Sivic, J. (2016). NetVLAD: CNN architecture for weakly supervised place recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5297–5307.
[13] Arica, N. and Vural, F. T. Y. (2003). BAS: a perceptual shape descriptor based on the beam angle statistics. Pattern Recognition Letters, 24(9-10):1627–1639.
[14] Awad, M. and Khanna, R. (2015). Machine Learning, pages 1–18. Apress, Berkeley, CA.
[15] Baeza-Yates, R. and Ribeiro-Neto, B. (2013). Recuperação de Informação: Conceitos e Tecnologia das Máquinas de Busca. Editora Bookman.
[16] Bai, S. and Bai, X. (2016). Sparse contextual activation for efficient visual re-ranking. IEEE Transactions on Image Processing (TIP), 25(3):1056–1069.
[17] Bai, S.; Bai, X.; Tian, Q.; and Latecki, L. J. (2017). Regularized diffusion process for visual retrieval. In Conference on Artificial Intelligence (AAAI), pages 3967–3973.
[18] Bai, S.; Bai, X.; Tian, Q.; and Latecki, L. J. (2019). Regularized diffusion process on bidirectional context for object retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(5):1213–1226.
[19] Bai, S.; Zhang, F.; and Torr, P. H. (2021a). Hypergraph convolution and hypergraph attention. Pattern Recognition, 110:107637.
[20] Bai, S.; Zhou, Z.; Wang, J.; Bai, X.; Latecki, L. J.; and Tian, Q. (2017). Ensemble diffusion for retrieval. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 774–783.
[21] Bai, X.; Bai, S.; and Wang, X. (2015). Beyond diffusion process: Neighbor set similarity for fast re-ranking. Information Sciences, 325:342–354.
[22] Bai, Z.; Wang, Z.; Wang, J.; Hu, D.; and Ding, E. (2021b). Unsupervised multi-source domain adaptation for person re-identification. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12909–12918.
[23] Bao, L.-N.; Wei, L.; Qiu, X.; Zhou, W.-g.; Li, H.; and Tian, Q. (2023). Learning transferable pedestrian representation from multimodal information supervision. ArXiv, abs/2304.05554.
[24] Barz, B. and Denzler, J. (2021). Content-based image retrieval and the semantic gap in the deep learning era. In Del Bimbo, A.; Cucchiara, R.; Sclaroff, S.; Farinella, G. M.; Mei, T.; Bertini, M.; Escalante, H. J.; and Vezzani, R., editors, Pattern Recognition. ICPR International Workshops and Challenges, pages 245–260, Cham. Springer International Publishing.
[25] Bedagkar-Gala, A. and Shah, S. K. (2014). A survey of approaches and trends in person re-identification. Image and Vision Computing, 32(4):270–286.
[26] Berman, M.; Jégou, H.; Vedaldi, A.; Kokkinos, I.; and Douze, M. (2019). MultiGrain: a unified image embedding for classes and instances. arXiv e-prints.
[27] Berthelot, D.; Carlini, N.; Cubuk, E. D.; Kurakin, A.; Sohn, K.; Zhang, H.; and Raffel, C. (2020). ReMixMatch: Semi-supervised learning with distribution matching and augmentation anchoring. In International Conference on Learning Representations.
[28] Berthelot, D.; Carlini, N.; Goodfellow, I. J.; Papernot, N.; Oliver, A.; and Raffel, C. (2019). MixMatch: A holistic approach to semi-supervised learning. CoRR, abs/1905.02249.
[29] Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; and Upcroft, B. (2016). Simple online and realtime tracking. In 2016 IEEE International Conference on Image Processing (ICIP), pages 3464–3468.
[30] Bianchi, F. M.; Grattarola, D.; Livi, L.; and Alippi, C. (2021). Graph neural networks with convolutional ARMA filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–1.
[31] Black, E. and Fredrikson, M. (2021). Leave-one-out unfairness. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT '21, pages 285–295, New York, NY, USA. Association for Computing Machinery.
[32] Bolme, D. S.; Beveridge, J. R.; Draper, B. A.; and Lui, Y. M. (2010). Visual object tracking using adaptive correlation filters. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 2544–2550.
[33] Bossard, L.; Guillaumin, M.; and Van Gool, L. (2014). Food-101 – mining discriminative components with random forests. In Fleet, D.; Pajdla, T.; Schiele, B.; and Tuytelaars, T., editors, Computer Vision – ECCV 2014, pages 446–461, Cham. Springer International Publishing.
[34] Bretto, A. (2013). Hypergraph Theory: An Introduction. Springer International Publishing.
[35] Brodatz, P. (1966). Textures: A Photographic Album for Artists and Designers. Dover.
[36] Cai, D.; Zhang, C.; and He, X. (2010). Unsupervised feature selection for multi-cluster data. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '10, pages 333–342, New York, NY, USA. Association for Computing Machinery.
[37] Camps, O.; Gou, M.; Hebble, T.; Karanam, S.; Lehmann, O.; Li, Y.; Radke, R. J.; Wu, Z.; and Xiong, F. (2017). From the lab to the real world: Re-identification in an airport camera network. IEEE Transactions on Circuits and Systems for Video Technology, 27(3):540–553.
[38] Chakraborty, D.; Das, S.; Khan, A.; and Subramanian, A. (2022). Fair rank aggregation. In Advances in Neural Information Processing Systems 35 (NeurIPS 2022).
[39] Chang, X.; Hospedales, T. M.; and Xiang, T. (2018). Multi-level factorisation net for person re-identification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Chatzichristofis, S. A. and Boutalis, Y. S. (2008a). CEDD: color and edge directivity descriptor: a compact descriptor for image indexing and retrieval. In Proceedings of the 6th International Conference on Computer Vision Systems, ICVS'08, pages 312–322.
[41] Chatzichristofis, S. A. and Boutalis, Y. S. (2008b). FCTH: Fuzzy color and texture histogram - a low level feature for accurate image retrieval. In WIAMIS, pages 191–196.
[42] Chaudhuri, U.; Banerjee, B.; and Bhattacharya, A. (2019). Siamese graph convolutional network for content based remote sensing image retrieval. Computer Vision and Image Understanding, 184:22–30.
[43] Chen, H.; Lagadec, B.; and Bremond, F. (2021a). ICE: Inter-instance contrastive encoding for unsupervised person re-identification. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 14940–14949.
[44] Chen, H.; Wang, Y.; Lagadec, B.; Dantcheva, A.; and Bremond, F. (2021b). Joint generative and contrastive learning for unsupervised person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2004–2013.
[45] Chen, S.-B.; Tian, X.-Z.; Ding, C. H. Q.; Luo, B.; Liu, Y.; Huang, H.; and Li, Q. (2020a). Graph convolutional network based on manifold similarity learning. Cognitive Computation, 12(6):1144.
[46] Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, pages 785–794, New York, NY, USA. Association for Computing Machinery.
[47] Chen, T.; Kornblith, S.; Norouzi, M.; and Hinton, G. (2020b). A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, ICML'20. JMLR.org.
[48] Chen, W.; Liu, Y.; Wang, W.; Bakker, E. M.; Georgiou, T.; Fieguth, P. W.; Liu, L.; and Lew, M. S. (2021c). Deep image retrieval: A survey. CoRR, abs/2101.11282.
[49] Chen, X. and He, K. (2021). Exploring simple siamese representation learning. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15745–15753.
[50] Chen, X. and Li, Y. (2020). Deep feature learning with manifold embedding for robust image retrieval. Algorithms, 13(12).
[51] Chen, Y.; Li, J.; Xiao, H.; Jin, X.; Yan, S.; and Feng, J. (2017). Dual path networks. In Guyon, I.; Luxburg, U. V.; Bengio, S.; Wallach, H.; Fergus, R.; Vishwanathan, S.; and Garnett, R., editors, Advances in Neural Information Processing Systems 30, pages 4467–4475. Curran Associates, Inc.
[52] Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1800–1807.
[53] Cieplinski, L. (2001). MPEG-7 color descriptors and their applications. In Skarbek, W., editor, Computer Analysis of Images and Patterns, pages 11–20, Berlin, Heidelberg. Springer Berlin Heidelberg.
[54] Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3):273–297.
[55] Cristani, M. and Murino, V. (2018). Chapter 10 - person re-identification. In Chellappa, R. and Theodoridis, S., editors, Academic Press Library in Signal Processing, Volume 6, pages 365–394. Academic Press.
[56] Cronen-Townsend, S.; Zhou, Y.; and Croft, W. B. (2002). Predicting query performance. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '02, pages 299–306.
[57] Dabov, K.; Foi, A.; Katkovnik, V.; and Egiazarian, K. (2007). Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080–2095.
[58] Dai, J.; Zhang, P.; Lu, H.; and Wang, H. (2018). Video person re-identification by temporal residual learning. IEEE Transactions on Image Processing, PP.
[59] Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), volume 1, pages 886–893.
[60] Datta, R.; Joshi, D.; Li, J.; and Wang, J. Z. (2008). Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys, 40(2):5:1–5:60.
[61] Datta, S.; Ganguly, D.; Greene, D.; and Mitra, M. (2022). Deep-QPP: A pairwise interaction-based deep learning model for supervised query performance prediction. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, WSDM '22, pages 201–209, New York, NY, USA. Association for Computing Machinery.
[62] De Almeida, L. B.; Pereira-Ferrero, V. H.; Valem, L. P.; Almeida, J.; and Pedronette, D. C. G. (2021). Representation learning for image retrieval through 3D CNN and manifold ranking. In 2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), pages 417–424.
[63] De Almeida, L. B.; Valem, L. P.; and Pedronette, D. C. G. (2022). Graph convolutional networks and manifold ranking for multimodal video retrieval. In 2022 IEEE International Conference on Image Processing (ICIP), pages 2811–2815.
[64] De Fernando, F. A.; Pedronette, D. C. G.; de Sousa, G. J.; Valem, L. P.; and Guilherme, I. R.