UNIVERSIDADE ESTADUAL PAULISTA “JÚLIO DE MESQUITA FILHO”
Câmpus de Rio Claro
Programa de Pós-Graduação em Ciência da Computação

Bionda Rozin

Machine Learning and Information Retrieval Techniques for Time Series Analysis

Rio Claro - SP
2024

UNIVERSIDADE ESTADUAL PAULISTA “Júlio de Mesquita Filho”
Instituto de Geociências e Ciências Exatas
Câmpus de Rio Claro

Bionda Rozin

Machine Learning and Information Retrieval Techniques for Time Series Analysis

Dissertação de Mestrado apresentada ao Instituto de Geociências e Ciências Exatas do Câmpus de Rio Claro, da Universidade Estadual Paulista “Júlio de Mesquita Filho”, como parte dos requisitos para obtenção do título de Mestre em Ciência da Computação.

Orientador: Prof. Dr. Daniel Carlos Guimarães Pedronette

Rio Claro - SP
2024

R893m Rozin, Bionda
Machine learning and information retrieval techniques for time series analysis / Bionda Rozin. -- Rio Claro, 2024
134 p. : il., tabs.
Dissertação (mestrado) - Universidade Estadual Paulista (UNESP), Instituto de Geociências e Ciências Exatas, Rio Claro
Orientador: Daniel Carlos Guimarães Pedronette
1. Aprendizado de Máquina. 2. Recuperação da Informação. 3. Extração de Características. 4. Ranqueamento. 5. Análise de Séries Temporais. I. Título.
Sistema de geração automática de fichas catalográficas da Unesp. Dados fornecidos pelo autor(a).

UNIVERSIDADE ESTADUAL PAULISTA “Júlio de Mesquita Filho”
Instituto de Geociências e Ciências Exatas
Câmpus de Rio Claro

Bionda Rozin

Machine Learning and Information Retrieval Techniques for Time Series Analysis

Dissertação de Mestrado apresentada ao Instituto de Geociências e Ciências Exatas do Câmpus de Rio Claro, da Universidade Estadual Paulista “Júlio de Mesquita Filho”, como parte dos requisitos para obtenção do título de Mestre em Ciência da Computação.

Comissão Examinadora

• Prof. Dr. Daniel Carlos Guimarães Pedronette (Orientador)
Instituto de Geociências e Ciências Exatas (IGCE)
Universidade Estadual Paulista - UNESP

• Prof. Dr. Diego Furtado Silva
Instituto de Ciências Matemáticas e de Computação (ICMC)
Universidade de São Paulo - USP

• Prof. Dr. Lucas C. Ribas
Departamento de Ciências da Computação e Estatística (DCCE)
Universidade Estadual Paulista - UNESP

Conceito: Aprovada.

Rio Claro (SP), 04 de setembro de 2024.

Acknowledgements

I am grateful for my life, health, and resilience. I thank my parents and those who stand close to me, always encouraging me to strive for more. I am extremely grateful to my advisor, Professor Daniel Pedronette, for all the guidance and support. I also extend my gratitude to Professor Ricardo da Silva Torres for agreeing to be my advisor during my exchange abroad. Thanks to Emilio Bergamin Junior, Professor Fabricio Aparecido Breve, and Professor Felipe Arruda Moura for the collaborative work. I appreciate the support from São Paulo State University (UNESP) and from the professors and colleagues of the research group. I am also thankful to Wageningen University & Research for welcoming me as an exchange student.

Grants #2022/01359-1 and #2023/08087-0, São Paulo Research Foundation (FAPESP)

Resumo

Considerando o vasto domínio de aplicações de dados temporais, como os setores médico, agrícola, financeiro e científico, por exemplo, exige-se cada vez mais a análise e o processamento desse tipo de dado. Tarefas como recuperação da informação, classificação e agrupamento são cruciais para analisar séries temporais em diferentes contextos e com diferentes objetivos.
Técnicas de recuperação da informação aplicadas a conjuntos de séries temporais permitem a identificação de padrões e o ranqueamento dos dados conforme a sua semelhança, enquanto a classificação rotula séries temporais com base em um conjunto de treinamento e tarefas de agrupamento (clustering) agrupam séries temporais com base em suas similaridades. Há, ainda, a classificação semi-supervisionada, que considera tanto dados rotulados quanto não rotulados para classificar os dados. No geral, as tarefas de aprendizado de máquina e recuperação da informação são extremamente dependentes de uma boa representação computacional dos dados, que gera resultados mais eficazes e conclusões mais assertivas em relação à tarefa executada. Neste cenário, um dos desafios é obter uma boa representação computacional das séries temporais. Além disso, medidas de similaridade geralmente consideram apenas a similaridade par a par, desconsiderando informações importantes presentes na vizinhança dos itens analisados no conjunto de dados. O objetivo desta pesquisa é aplicar técnicas de aprendizado de máquina e recuperação da informação para obter resultados efetivos em análises de séries temporais. Quatro diferentes métodos são empregados e diferentes extratores de características são avaliados em todas as tarefas. Primeiro, um estudo comparativo de representação e ranqueamento de séries temporais univariadas por meio de aprendizado contextual de distância baseado em ranqueamento é conduzido em 10 conjuntos de dados diferentes, levando a ganhos de mAP de até 31,78%. Dando sequência a esta linha de pesquisa, propomos a análise de séries temporais multivariadas processando cada dimensão da série individualmente e utilizando métodos de agregação contextual de ranques para mesclar resultados e obter uma representação de similaridade utilizada para recuperação e classificação, obtendo resultados competitivos com dois métodos do estado da arte. Também é proposto um arcabouço baseado em agrupamento para análise de dados baseada na codificação de grafos temporais, no qual os dados são divididos usando critérios de segmentação temporal; resultados altamente interpretáveis são alcançados quando este arcabouço é aplicado à análise de posse de bola em partidas de futebol. Por último, é proposta uma classificação semi-supervisionada de séries temporais univariadas utilizando métodos de representação por imagem e propagação de rótulos, alcançando resultados semelhantes aos da classificação supervisionada.

Palavras-chave: Classificação; Recuperação da Informação; Agrupamento; Aprendizado Semi-Supervisionado; Aprendizado de Máquina; Extração de Características; Ranqueamento; Análise de Séries Temporais.

Abstract

Due to the great applicability of time series in diverse scenarios, such as medicine, agriculture, economics, and science, the analysis and processing of this kind of data are increasingly in demand. Tools such as information retrieval, classification, and clustering are crucial for analyzing time series in different contexts and with different objectives. Information retrieval tasks on time series data can identify patterns and rank data by similarity, while classification can label time series based on a training set and clustering can group time series based on their similarities. Also, semi-supervised classification considers both labeled and unlabeled data to perform classification.
In general, machine learning and information retrieval tasks are extremely dependent on a good computational representation of the data, which leads to more effective results and more assertive conclusions about the performed task. In this scenario, one of the main challenges is to obtain good features from time series. Also, similarity measures usually consider only pairwise relations, disregarding important information in the neighborhood of the analyzed items in the dataset. The objective of this research is to apply machine learning and information retrieval techniques to obtain effective results in time series analysis. Four different methods are employed, and different feature extractors are evaluated in all tasks. First, a comparative study of univariate time series representation and ranking through contextual rank-based distance learning is conducted on 10 different datasets, leading to mAP gains of up to 31.78%. Continuing this line of research, we propose multivariate time series analysis by processing each dimension of the series individually and using contextual rank aggregation methods to merge results and obtain a similarity representation used for retrieval and classification, achieving results competitive with two state-of-the-art (SOTA) methods. A clustering-based framework for data analysis based on temporal graph encoding is also proposed, where data is split using time segmentation criteria; highly interpretable results are reached when this framework is applied to ball possession analysis in football matches. Lastly, semi-supervised classification of univariate time series using imaging methods and label propagation is proposed, reaching results similar to those of supervised classification.

Keywords: Classification; Information Retrieval; Clustering; Semi-Supervised Learning; Machine Learning; Feature Extraction; Ranking; Time Series Analysis.

List of Figures

Figure 1.1 – Overview of goals and contributions of this work.
Figure 1.2 – Organization of the work.
Figure 2.1 – Univariate and Multivariate Time Series.
Figure 2.2 – Comparison between the Euclidean Distance alignment and DTW alignment. Extracted from [36].
Figure 2.3 – Typical architecture of a content-based time series retrieval system. Inspired by the content-based image retrieval system presented in [191].
Figure 2.4 – Feature extraction and distance calculation between two time series.
Figure 2.5 – Lines xs, ys and angle α for the point pi (i = 100 and 0 < kb < 50).
Figure 2.6 – Time series represented by DFT. Extracted from [151].
Figure 2.7 – Time series and its DWT representation. Extracted from [123].
Figure 2.8 – Weights of the most representative kernels. Extracted from [152].
Figure 2.9 – Time series and its respective image representations.
Figure 2.10 – Comparison between Euclidean Distance and Reciprocal kNN Graph+CCs results in the dataset TwoMoons.
Figure 4.1 – Pipeline employed in the comparative study.
Figure 4.2 – Visual ranking results on Rock [14] dataset for DWT-Detail representations.
Figure 4.3 – Visual ranking results on Rock [14] dataset for BAS representations.
Figure 4.4 – Visual ranking results on Rock [14] dataset when the time series raw values are used.
Figure 4.5 – Visual ranking results on Rock [14] dataset for RP-ViT representations.
Figure 4.6 – Measurements for the Beef dataset, considering the ROCKET feature extractor.
Figure 4.7 – Measurements for the GunPoint dataset, considering the RP+ResNet feature extractor.
Figure 4.8 – Comparison between datasets for the GAF+ViT feature extractor.
Figure 5.1 – Proposal for Multivariate Time Series retrieval and classification.
Figure 5.2 – Accuracy varying over the value of k.
Figure 6.1 – Synthesis of our clustering-based framework for data analysis.
Figure 6.2 – Generic pipeline for data characterization based on graph visual rhythm representations and transfer learning procedures.
Figure 6.3 – Graph changes over time. Red and blue graphs correspond to different teams, while the yellow point represents the ball.
Figure 6.4 – Example of Graph Visual Rhythm representation.
Figure 6.5 – Statistics for clusters created based on the configuration including Betweenness Centrality, DINO V2, UMAP, and HDBSCAN.
Figure 6.6 – Statistics for clusters created based on the configuration including Local Efficiency, ResNet-152, UMAP, and HDBSCAN.
Figure 6.7 – Statistics for clusters created based on the configuration including Average Path Length, ResNet-152, UMAP, and Affinity Propagation.
Figure 7.1 – Proposed methodology for time series semi-supervised classification.
Figure 7.2 – Results for the ElectricDevices dataset as a function of rl. Supervised baselines were obtained from 8926 labeled points, corresponding to rl ≈ 0.54.
Figure 7.3 – Results for the CBF dataset as functions of the number of nearest neighbors k (top) and the rate of labeled data rl (bottom). Supervised baselines were obtained with initially 30 labeled points, corresponding to rl ≈ 0.03.
Figure 7.4 – Results for the ECG5000 dataset as a function of rl. Supervised baselines were obtained with initially 500 labeled points, corresponding to rl = 0.1.
Figure 7.5 – Results for the Yoga dataset as a function of rl. Supervised baselines were obtained with initially 300 labeled points, corresponding to rl ≈ 0.09.
Figure 7.6 – Classification results as a function of k for the Yoga (top) and ElectricDevices (bottom) datasets.
Figure 7.7 – Kernel density estimation of pairwise distances (left) and coefficient of variation of pairwise similarities as a function of k (right) for CBF, ECG5000, and Yoga.

List of Tables

Table 4.1 – mAP values.
Table 4.2 – P@5 values.
Table 4.3 – P@10 values.
Table 4.4 – R@5 values.
Table 4.5 – R@10 values.
Table 5.1 – Retrieval results: P@10.
Table 5.2 – Retrieval results: mAP.
Table 5.3 – Accuracy comparison results.
Table 6.1 – Results for the Average Path Length graph measurement.
Table 6.2 – Results for the Betweenness Centrality graph measurement.
Table 6.3 – Results for the Eccentricity graph measurement.
Table 6.4 – Results for the Local Efficiency graph measurement.
Table 6.5 – Results for the Vulnerability graph measurement.

List of Abbreviations and Acronyms

ACC Color Autocorrelogram
AF AtrialFibrillation
AFFNet Adaptive Feature Fusion Network
AI Artificial Intelligence
AMI Adjusted Mutual Information
AP Affinity Propagation
APCA Adaptive Piecewise Constant Approximation
ARIMA Autoregressive integrated moving average
AWR ArticularyWordRecognition
BAS Beam Angle Statistics
BFS-Tree Breadth-First Search Tree of Ranking References
BIC Border/Interior Classification
BM BasicMotions
BOSS Bag-of-SFA-Symbols
BOVW Bag of Visual Words
BoW Bag-of-Words
CBR Content-Based Retrieval
CBTSR Content-based time series retrieval
CFD Contour Features Descriptor
CHEB Chebyshev Polynomials
CHI Calinski-Harabasz Index
CLS Classification
CNN Convolutional Neural Network
conDetSEC constrained Deep embedding time SEries Clustering
CPU Central Processing Unit
CVA Change vector analysis
DB DBSCAN
DBA DTW Barycenter Averaging
DBI Davies-Bouldin Index
DBSCAN Density-Based Spatial Clustering of Applications with Noise
DCT Discrete Cosine Transform
Deep r-RSJBE Deep r-th root of Rank Supervised Joint Binary Embedding
DFT Discrete Fourier Transform
DHN-RAD Deep Hashing Network for Retrieval-based Anomaly Detection
Dim. Red. Dimensionality Reduction Method
DINO Self-DIstillation with NO labels
DISSIM Dissimilarity metric
DNN Deep neural network
DTW Dynamic Time Warping
DUBCNs Deep Unsupervised Binary Coding Networks
DV2 DINO V2
DWT Discrete Wavelet Transform
ECG Electrocardiogram
EDR Edit Distance on Real sequence
EEG Electroencephalogram
EffNV2 EfficientNet V2
EO-BoW Entropy-optimized BoW
ERP Edit Distance with Real Penalty
ESN Echo state network
FCN Fully Convolutional Network
FF Fusion Features
FRF Functional Relation Field
FrOKShape Fractional Order Shape-based k-cluster
GAN Generative Adversarial Networks
GAF Gramian Angular Field
GRU Gated Recurrent Units
GRF Gaussian Random Fields
GVR Graph Visual Rhythm
HB Heartbeat
HC Hierarchical clustering
HDB HDBSCAN
HDBSCAN Hierarchical Density-Based Spatial Clustering of Applications with Noise
HP Hodrick-Prescott
IPLA Indexable Piecewise Linear Approximation
IR Information Retrieval
KL Kullback-Leibler
kNN k-Nearest Neighbors
LAS Local Activity Spectrum
LBP Local Binary Pattern
LCSS Longest Common Subsequence
LHRR Log-based Hypergraph of Ranking References
LSH Locality-sensitive hashing
LSTM Long Short-Term Memory
MALSTM-FCN Multivariate Attention Long Short-Term Memory Fully Convolutional Network
mAP Mean Average Precision
MCFGNN Multichannel fusion graph neural network
MDCNet Mode Decomposition and 2D Convolutional Network
MESAMN Multiscale Echo Self-Attention Memory Network
MF-Net Multi-feature-based network
MHAP Multivariate Highly Activated Period
MI MotorImagery
MLP Multilayer Perceptron
MTF Markov Transition Field
MTRL Deep multi-task representation learning method
MTS Multivariate Time Series
MTSC Multivariate Time Series Classification
N NATOPS
Net Network
NILM Non-Intrusive Load Monitoring
NN Neural Network
NSNP Nonlinear spiking neural P systems
NSST Non-subsampled shearlet transform
NTM Neural Turing Machines
P@k Precision@k
PAA Piecewise Aggregate Approximation
PCA Principal Component Analysis
PCF Polynomial curve fitting
PD PenDigits
PFD_DTW Polynomial function derivative features-based dynamic time warping
PSEUDo Interactive Pattern Search in Multivariate Time Series with Locality-Sensitive Hashing and Relevance Feedback
R@k Recall@k
RDPAC Rank Diffusion Process with Assured Convergence
RegEx Regular Expression
ResNet Residual Network
RFE Rank Flow Embedding
RMSE Root mean squared error
RN152 CNN ResNet-152
RobDecomp Robust series decomposition block
RobTF Robust trend forecasting block
ROCKET RandOm Convolutional KErnel Transform
RP Recurrence Plot
SAB Seasonal component adjustment block
SampEn Sample entropy
SAX Symbolic Aggregate approXimation
SCARNet Stacked Convolution sequence AutoRegressive encoding Network
SCV Spectral change vector
Sil Silhouette Score
SOTA State-of-the-art
SpADe Spatial Assembling Distance
SSL Semi-supervised learning
Swale Sequence Weighted Alignment model
SWJ StandWalkJump
TF-IDF Term Frequency - Inverse Document Frequency
TQuEST Threshold Queries
T-encoder Transformer-based encoder
T-decoder Transformer-based decoder
TS Time Series
TCCT Tightly-coupled convolutional Transformer
t-SNE t-distributed Stochastic Neighbor Embedding
UDLF Unsupervised Distance Learning Framework
UMAP Uniform Manifold Approximation and Projection
UTBCN Unsupervised Transformer-based Binary Coding Networks
VGG Visual Geometry Group
ViT Vision Transformers

List of Symbols

D(x,y) Pairwise distance measure between two vectors x and y
i Generic counter
j Generic counter
k Generic neighborhood parameter
kc Number of clusters
cc Cluster center
T Univariate time series
M Multivariate time series
R^{m×n} Distance matrix of two time series of sizes m and n, respectively
d Number of dimensions
kr Number of convolutional kernels in the ROCKET algorithm
I Unit line vector
K Neighborhood parameter
P Set of points in a contour
α Angle
kb Beam angle statistics distance parameter
n Number of points or dimensions
xs Line segment
ys Line segment
T̃ Polar coordinate representation of time series
W Markov Transition Matrix
qbj Quantile bin
wi,j Represents the frequency at which a point in quantile qbj is followed by a point in quantile qbi
T⃗ Trajectory of time series
drp Temporal delay
mrp Trajectory dimension
corr_preds Correct predictions in a labeled dataset
preds Total number of predictions
U Cluster
V Cluster
H(U) Entropy of cluster U
MI(U, V) Mutual Information between clusters U and V
E(MI(U, V)) Expected mutual information
TP True positives, i.e., number of relevant retrieved items in a ranked list
ql Query l
ri Represents the relevance (irrelevant = 0 or relevant = 1) of the i-th item in a ranked list
Nr Number of relevant items in a collection for a query q
z Mean standard deviation of a dataset
n_coefs Number of Fourier coefficients to keep in DFT [58]
norm_mean If True, center each time series before scaling in DFT [58]
norm_std If True, scale each time series to unit variance in DFT [58]
wavelet Wavelet family, containing important information such as the wavelet filter coefficients, which are used in DWT
mode Signal extrapolation mode
L Ranked list size
LMULT Multiplier of ranked list size in the RDPAC algorithm
ti Timestamp
G Graph
V Set of nodes in a graph
s Node in a graph
t Node in a graph
v Node in a graph
d(s, t) Shortest path from node s to node t
a Average path length
cb(v) Betweenness centrality of node v
cbn(v) Normalized betweenness centrality
σ(s, t) Count of shortest paths from s to t
σ(s, t|v) Count of shortest paths from s to t passing through v
e(v) Eccentricity of node v
E(v) Global Efficiency of node v
deg(v) Degree of node v
Eloc(v) Local Efficiency of node v
V(v) Vulnerability of node v
G Temporal Graph
F Graph Visual Rhythm
mic Mean intra-cluster distance
mnc Mean nearest-cluster distance for each sample
Si Average dissimilarity of cluster i to all other clusters
Sj Average dissimilarity of the neighboring cluster of cluster i to all other clusters
d(Ci, Cj) Similarity between centroids of clusters i and j
DW Dispersion within clusters
DB Dispersion between clusters
Lt Lower triangular matrix of a symmetric distance matrix
||Lt(tk)||F Frobenius norm of Lt for a timestamp tk
lij Point in the Lt matrix
m Embedding dimension
r Tolerance
Tm(i) Time series segments
B Number of cases where D[Tm(i), Tm(j)] (i ≠ j) < r
C Number of cases where D[Tm+1(i), Tm+1(j)] (i ≠ j) < r
p Probability of a result being equal or superior to an observed result
C Collection of time series
Dt Time series descriptor
ϵ Function that extracts a feature vector
vTi Feature vector of a time series Ti
ρ(i, j) Function that computes the distance between two time series Ti and Tj based on their corresponding feature vectors
A Matrix of dimension n × n, where Aij = ρ(i, j)
τq Ranked list of time series Tq
R Set of ranked lists for each time series in C
ρr(i, j, R) More effective distance function used in unsupervised distance learning algorithms
Aa Set of distance matrices
Ra Set of ranked list sets
fa(Aa, Ra) Contextual rank aggregation function
Âc Resulting distance matrix of fa(Aa, Ra)
Rc Resulting ranked list based on Âc
fa(Ra) Contextual rank aggregation function that does not use the set Aa
σ Parameter of Gaussian Similarity
wgi,j Gaussian Similarity
Hf Cost function
Le Number of labeled elements
ψi,s Represents the probability of the i-th node of a graph belonging to the s-th class
t Iteration marker
e Upper bound for the difference between consecutive iterations
ŷi Unknown label of the i-th node of a graph
D Dataset
rl Percentage of labeled elements

Contents

1 INTRODUCTION
1.1 Motivation
1.2 Research Challenges
1.3 Goals and Contributions
1.4 Organization
2 FOUNDATION
2.1 Time Series
2.2 Information Retrieval
2.3 Information Representation
2.4 Machine Learning
2.5 Time Series Representation
2.5.1 1D-Signal-based Feature Extractors
2.5.2 Image-Based Time Series Description
2.5.3 Deep Learning Methods as Image Feature Extractors
2.6 Unsupervised Distance Learning
2.6.1 Log-based Hypergraph of Ranking References (LHRR)
2.6.2 Rank Diffusion Process with Assured Convergence (RDPAC)
2.6.3 Breadth-First Search (BFS) Tree of Ranking References
2.6.4 Rank Flow Embedding (RFE)
2.7 Univariate Time Series Datasets
2.8 Evaluation Measures
2.9 Software and Frameworks
3 RELATED WORK
3.1 Univariate Time Series
3.1.1 Information Retrieval
3.1.2 Clustering
3.1.3 Classification
3.1.4 Univariate Time Series Prediction
3.2 Multivariate Time Series
3.2.1 Information Retrieval
3.2.2 Clustering
3.2.3 Classification
3.2.4 Multivariate Time Series Prediction
3.3 Image Based Approaches
3.3.1 Images to Time Series
3.3.2 Time Series to Images
3.4 Literature GAPs
4 RE-RANKING AND REPRESENTATIONS FOR TIME SERIES RETRIEVAL: A COMPARATIVE STUDY
4.1 Proposed Methodology
4.2 Experimental Evaluation
4.2.1 Implementation Details and Parameters Definition
4.2.2 Baselines
4.3 Results
4.3.1 Retrieval Results – Quantitative Analysis
4.3.2 Retrieval Results – Qualitative Analysis
4.3.3 Retrieval Results – Analysis of Cases of Success and Failure
4.4 Final Remarks
5 A RANKED-BASED FRAMEWORK BASED ON MANIFOLD LEARNING FOR MULTIVARIATE TIME SERIES RETRIEVAL AND CLASSIFICATION
5.1 Proposed Method
5.2 Experimental Evaluation
5.2.1 Datasets
5.2.2 Experimental Protocol
5.2.3 Impact of Parameters
5.2.4 Retrieval Results
5.2.5 Classification and Comparison with Other Approaches
5.3 Final Remarks
6 FRAMEWORK FOR DATA ANALYSIS BY TEMPORAL GRAPH ENCODING
6.1 Materials and Methods
6.1.1 Temporal Graph Creation
6.1.2 Graph Measurements
6.1.3 Graph Visual Rhythms (GVR)
6.1.4 Image Feature Extraction
6.1.5 Dimensionality Reduction
6.1.6 Clustering
6.2 Experimental Protocol
6.2.1 Clustering Quality Measurements
6.2.2 Domain Specific Indicators
6.2.3 Dataset
6.3 Results
6.3.1 Quantitative Results
6.3.2 Qualitative Results
6.4 Final Remarks
7 SEMI-SUPERVISED TIME SERIES CLASSIFICATION THROUGH IMAGE REPRESENTATIONS
7.1 Overview
7.2 Semi-Supervised Classification
7.3 Evaluation
7.3.1 Experimental Protocol
7.4 Results and discussion
7.4.1 Classification results
7.4.2 Statistical analysis of pairwise distances and similarities
7.5 Final Remarks
8 CONCLUSIONS
8.1 Contributions and Research Questions
8.2 Publications and International Fellowship
8.3 Future Works
Bibliography

1 Introduction

Analyzing time series is not only of paramount importance nowadays, but also a challenging task. It depends on available labels, time series representation, similarity measures, and results' interpretability. This work proposes four different approaches in these directions, discussing and presenting contributions to the field of time series analysis through machine learning and information retrieval techniques. This introductory chapter gives an overview of this work and is organized as follows: Section 1.1 discusses the motivations of the research. Section 1.2 presents the challenges and research questions approached during the development of the work. Section 1.3 discusses the goals and contributions of the study. Section 1.4 outlines the structure of this work, including a brief description of each chapter's content.

1.1 Motivation

Time series data finds extensive applications in our daily lives, spanning diverse domains such as medicine, economics, and science. ECGs [200] and EEGs [169], yearly sales [24], sensors [180], weather forecasts [73], bitcoin charts [130], and general finances [223] are just a few examples of how time series data is employed. Formally, a time series refers to a set of numerical values arranged chronologically, with each point directly correlated to preceding points [149]. Considering the broad application in different domains, the development of tools for studying and analyzing time series is of central relevance. Additionally, various facets can be taken into account based on the desired objectives. From a computational perspective, supervised and unsupervised learning algorithms process time series data, utilizing either labeled or unlabeled information, while semi-supervised learning tasks combine both approaches [197]. Semi-supervised learning algorithms stand out when labeled data is scarce, considering that obtaining labels can be an expensive and time-consuming task. Classification and clustering algorithms, for instance, enable the labeling or grouping, respectively, of time series samples, where series with the same label (or group) exhibit similarity, while those with different labels (groups) are distantly related [83, 13]. Such techniques have proven to be effective in detecting cancer in early stages [228], identifying mobility patterns in populations [225], and enhancing poultry welfare through the analysis of chicken behaviors captured by 3-axis sensors [4]. Forecasting tasks involve predicting future values of time series based on historical data, encompassing applications like stock market variations [223] and even ECG predictions [157].
Despite the large number of successful applications, the effectiveness of machine learning tasks is directly dependent on the features extracted from the data. The features are intrinsically associated with the data representation in the space, impacting similarity measures. Time series features consider various aspects for data representation, including shape [10], temporal correlation [204], as well as bag-of-words based models [111] and kernel-based models [45]. Data representation and similarity measures also profoundly impact the effectiveness of another critical area: Information Retrieval (IR). IR applications generally involve managing, indexing, and retrieving data based on a given query, which represents an information need [120]. The retrieval process revolves around ranking the most similar elements relative to a given query. Therefore, data representation and similarity measures directly influence the effectiveness of information retrieval tasks, as different features can generate varying similarities for the same query, and different similarity measures can result in distinct ranked lists. Moreover, traditional measures such as the Euclidean distance consider only pairwise comparisons, disregarding neighborhood interactions, which could be leveraged to compute improved similarity measures [193].

Information Retrieval has diverse applications in the realm of time series analysis. For instance, it can be employed in plant phenology studies to identify the same species in different locations based on time series obtained from satellite images [7]. Similar techniques can be utilized to study the dispersion of volcanic plume height, which poses risks to airplanes [135]. Additionally, selecting a slice of a time series as a query enables the identification of similar slices, as exemplified in the case of distinguishing open and closed eyes based on EEG signals [227].

Building information retrieval methods capable of reaching high-quality results is a challenging task. One option for considering the existing relationships within a dataset beyond traditional pairwise similarities is to apply unsupervised distance learning algorithms after the initial retrieval process. These algorithms can consider the relationships within a neighborhood and compute more effective distances based on such information, enabling the generation of improved ranked lists [193]. Machine Learning and Information Retrieval algorithms are crucial tools for time series analysis, especially when strategies from both areas are combined. Our objectives are to explore the applicability of these methodologies in diverse contexts of time series analysis. Additionally, we aim to assess the impact of different time series representations and contextual similarity measures on these tasks.

1.2 Research Challenges

While time series have a wide range of applicability, analyzing them is very challenging. There are diverse research challenges related to exploiting the information encoded in time series and how to analyze such data properly. In the following, some topics related to this work are discussed:

• Information encoded in the time series: There are diverse characteristics intrinsic to time series, such as shapes, frequency, trend, and seasonality, among others. How can we extract meaningful features for a required task? Which features are more suitable for a machine learning or information retrieval task?
• Impact of distance measurements: Diverse tasks depend on a similarity measure that allows different time series to be compared. How do we find distance measurements that effectively translate the relationships between the data points?

• Explore 2D representations of time series: There are diverse methods to represent time series as images [204, 53]. How can the information contained in the images be exploited?

• Explore contextual information for effective time series information retrieval: How can we explore more global relationships between time series to perform information retrieval?

• Exploit information of each dimension of a multivariate time series: Multivariate time series are composed of n-dimensional data points, and each time series dimension carries its own information. How can we effectively exploit the information encoded in the dimensions of a multivariate series?

• Temporal data analysis with lack of ground truth: Some time series datasets do not have labels, and clustering analysis emerges in these scenarios. How do we properly interpret the clustering results?

• Effective use of labeled and non-labeled time series data: How can we properly exploit supervised and unsupervised information in time series data?

1.3 Goals and Contributions

Our hypothesis is that information retrieval and machine learning techniques can be explored for effective time series analysis, especially under conditions of scarcity or absence of labeled data. Given the direct time series applications in real-world scenarios, our main goal is to obtain effective results on time series analysis by applying unsupervised, supervised, and semi-supervised learning approaches for time series clustering, classification, and similarity search. Our secondary goal is to study the impact of different time series feature extractors in the performed tasks.

Efficiency is also important in machine learning and information retrieval tasks, but the experimental evaluation performed in this work focused only on effectiveness. Works such as [146, 69, 145, 195] discuss algorithmic complexity and efficiency in re-ranking tasks. Figure 1.1 presents the goals and contributions and how they relate to the proposed approaches, considering different supervision strategies.
Figure 1.1 – Overview of goals and contributions of this work.

The proposed approaches directly relate to the research challenges discussed before. In the following, an overview of each goal and contribution is given:

• Investigate time series feature extractors: Diverse feature extractors were applied to time series for different tasks, i.e., classification, clustering, and information retrieval. Both 1D and imaging methods were evaluated. Features were extracted from the images using deep learning approaches.

• Employ contextual rank-based distance learning methods for univariate time series retrieval: Pairwise distance measurements often ignore contextual information available in the dataset. Four unsupervised distance learning methods were applied to time series datasets to compute more global distance measurements between points in a dataset, by considering neighborhood relationships, and to perform re-ranking.

• Employ contextual rank aggregation methods for multivariate time series analysis: To fully explore the information present in the dimensions of multivariate time series, we process each dimension individually and use rank aggregation methods to obtain a final result for multivariate time series retrieval.

• Use unsupervised distance learning methods as a pre-processing step for classification: This topic is directly correlated to the previous one. The resulting similarity measure defined by the rank aggregation step is considered for performing a kNN classification in a multivariate time series dataset.

• Proposal of an unsupervised framework for data analysis using temporal graph encoding: We propose a novel clustering-based framework based on time segmentation criteria, encoding information as multivariate time series and using domain-specific variables to evaluate and enhance the interpretability of results. This framework was validated for ball possession analysis in football matches.

• Investigate the application of semi-supervised approaches for time series classification: To explore the information available in both labeled and unlabeled time series datasets, we employ the label propagation algorithm for time series semi-supervised classification.

1.4 Organization

This work is organized according to the main contributions obtained in our research, which were published or submitted to conferences and journals. Below, the contents of each chapter are briefly described, indicating the associated publication or submission:

• Chapter 2 - Foundation: defines the theoretical foundation of the work, presenting the main concepts and formal definitions;

• Chapter 3 - Related Work: presents the related work relevant to the topics approached in this work;

• Chapter 4 - Re-Ranking and Representations for Time Series Retrieval: A Comparative Study: presents the comparative study of univariate time series retrieval approaches, considering different data representations and contextual re-ranking methods.
The content of this chapter was submitted to a journal;

• Chapter 5 - A Ranked-Based Framework based on Manifold Learning for Multivariate Time Series Retrieval and Classification: discusses a framework for multivariate time series retrieval and classification. The content of this chapter was submitted to a journal;

• Chapter 6 - Framework for Data Analysis by Temporal Graph Encoding: presents a framework for temporal data analysis by considering time segmentation criteria. Experiments were conducted considering ball possessions from football matches. The content of this chapter is being finalized and will be submitted to a journal;

• Chapter 7 - Semi-Supervised Time Series Classification through Image Representations: presents the work developed on semi-supervised time series classification using image representations. The content of this chapter is available in a paper published in the proceedings of the International Conference on Computational Science and Its Applications (ICCSA 2023) [163];

• Chapter 8 - Conclusions: presents the relationships between the research questions and contributions and discusses future works.

Figure 1.2 presents the organization of the work.

Figure 1.2 – Organization of the work.

2 Foundation

In this chapter, the main theoretical aspects of this work are defined. Section 2.1 presents an overview of time series. Section 2.2 presents the principal concepts related to information retrieval. Section 2.3 describes the information representation concepts, while Section 2.4 presents the main concepts about machine learning. Section 2.5 presents examples of time series descriptors. In Section 2.6, the core concepts of Unsupervised Distance Learning are introduced. Section 2.7 describes some univariate time series datasets, Section 2.8 defines the primary evaluation measures for information retrieval and classification tasks, and Section 2.9 describes the leading software and frameworks utilized in this work.
2.1 Time Series

Time series are data that describe the evolution of one or more variables over time, considering equidistant periods [55]. Formally, a time series of size n can be defined as:

$T_s = (t_1, t_2, \ldots, t_n), \quad t_i \in \mathbb{R}^d, \; d \in \mathbb{N}^*$ (2.1)

Time series composed of a single observed variable over time, i.e., d = 1, are called univariate [55]; an example to consider is a patient's heartbeat. Figure 2.1(a) illustrates a univariate time series. Similarly, multivariate time series are composed of variables that exhibit temporal dependence [26], i.e., d > 1, such as data on gas concentrations in the atmosphere measured every hour. Figure 2.1(b) illustrates a multivariate time series. Henceforth, we denote a univariate time series as T and a multivariate time series as M.

Figure 2.1 – Univariate and Multivariate Time Series. (a) Univariate Time Series; (b) Multivariate Time Series, extracted from [178].

Considering the definitions, time series are complex structures capable of encoding a wealth of information. As a result, time series have a wide range of applicability, including the medical field [200], industrial sector [57], and financial domain [130], among others. Therefore, analyzing and representing these structures are tasks of paramount importance and quite challenging.

Also, aligning two time series is a difficult task. The alignment provided by traditional distance measures, such as the Euclidean Distance, usually is not the optimum alignment. In this context, the Dynamic Time Warping (DTW) measure calculates the optimum non-linear alignment between two time series of any size [175]. The value returned by the DTW function represents the distance between two time series under this optimal alignment. Formally, let $T_x = (t_{x_1}, t_{x_2}, \ldots, t_{x_m})$ and $T_y = (t_{y_1}, t_{y_2}, \ldots, t_{y_n})$ be two time series and $R \in \mathbb{R}^{m \times n}$ a matrix containing the distances $D(T_x, T_y)_{i,j}$ between the sequences $T_x$ and $T_y$, where $D$ is usually the Euclidean distance. The objective is to find two index sequences $a$ and $b$, both of size $l \geq \max(m, n)$, in which $a(i)$ indexes the time series $T_x$ and $b(i)$ indexes the time series $T_y$, such that the total cost of the combinations $\sum_{i=1}^{l} D(T_x, T_y)_{a(i), b(i)}$ is minimized. The alignment $(a, b)$ must satisfy the following conditions:

1. $a(1) = b(1) = 1$;
2. $a(l) = m$, $b(l) = n$;
3. $(a(i+1), b(i+1)) - (a(i), b(i)) \in \{(1, 0), (1, 1), (0, 1)\}$.

In Figure 2.2, it is possible to observe the comparison between the Euclidean Distance's linear alignment and the DTW algorithm's non-linear alignment.

Figure 2.2 – Comparison between the Euclidean Distance alignment and DTW alignment. Extracted from [36].
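To make the recurrence concrete, the following is a minimal, illustrative Python sketch of the classic DTW dynamic-programming formulation (the function name dtw_distance and the toy series are hypothetical, not code from this work; optimized implementations are available in libraries such as tslearn):

```python
import numpy as np

def dtw_distance(tx, ty):
    """Cost of the optimal non-linear alignment between two univariate series."""
    m, n = len(tx), len(ty)
    # Accumulated-cost matrix, padded with infinity for the boundary conditions.
    cost = np.full((m + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = abs(tx[i - 1] - ty[j - 1])  # local (Euclidean) distance
            # Allowed steps (1,0), (1,1), (0,1) from the alignment conditions.
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i - 1, j - 1],
                                 cost[i, j - 1])
    return cost[m, n]

# Toy example: a small temporal shift still yields a low DTW cost.
print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]))
```

Note that this O(mn) sketch returns only the alignment cost; recovering the warping path itself requires backtracking through the cost matrix.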
2.2 Information Retrieval

Information retrieval is an area of computer science that deals with the representation, storage, organization, and access of items from diverse datasets, such as documents and web pages, for example [158]. An information retrieval problem involves a search task defined according to pre-selected criteria, which should return relevant results, where the organization and representation of the selected items facilitate user access.

The significant growth in data generation, such as audio, images, 3D models, and other objects, makes searching a challenging problem [201]. Unlike text-based search, which has mature results [158, 119], conducting textual searches for multimedia data can be unsatisfactory, as textual descriptions may not capture all the characteristics of the given data [201] and may potentially introduce ambiguities.

In this context, Content-Based Retrieval (CBR) systems enable comparisons between similar data based solely on their content [201]. The data is computationally represented as feature vectors, considering approaches such as texture, color, and shape for images [107], and Zero Crossing Rate and Short-Time Energy, for example, for audio [124], among other features for different types of media. In this way, extracting feature vectors for an entire dataset and enabling comparisons with any given query is possible. For each query, a result list is calculated containing the distances, in ascending order, between the elements of the dataset and the query [107].

Figure 2.3 illustrates a typical architecture of a general CBR model for time series applications, adapted from [191]. This architecture consists of (i) an interface, (ii) a query-processing module, and (iii) a dataset. In the interface, user interaction occurs, where data is inputted and query results are returned. Data characteristics are extracted in the query-processing module, and ranking tasks are performed based on the obtained features. Lastly, the data and their corresponding feature vectors are stored in the dataset to avoid recalculations.

Figure 2.3 – Typical architecture of a content-based time series retrieval system. Inspired by the content-based image retrieval system presented in [191].

The distances between the obtained feature vectors can be calculated in different ways. Let $\mathbf{x} = (x_1, x_2, \ldots, x_d)$, $x_i \in \mathbb{R}$, and $\mathbf{y} = (y_1, y_2, \ldots, y_d)$, $y_i \in \mathbb{R}$, where $d$ represents the number of dimensions, be two feature vectors that represent time series. The following distance measures can be used:

• Euclidean: $D(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$;
• Chebyshev: $D(\mathbf{x}, \mathbf{y}) = \max_{i=1}^{d} |x_i - y_i|$;
• Manhattan: $D(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{d} |x_i - y_i|$;
• Minkowski: $D(\mathbf{x}, \mathbf{y}) = \left[ \sum_{i=1}^{d} |x_i - y_i|^p \right]^{1/p}$;
• Canberra: $D(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{d} \frac{|x_i - y_i|}{|x_i| + |y_i|}$;
• Cosine: $D(\mathbf{x}, \mathbf{y}) = 1 - \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \, \|\mathbf{y}\|}$.

In general, the Euclidean distance is widely used to calculate distances between elements [139]. However, despite the importance of distance measures, the quality of the retrieved information is primarily influenced by the choice of the used representations (features).
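For illustration, the sketch below is a direct NumPy transcription of these measures (the function names and example vectors are hypothetical; in practice, implementations such as those in scipy.spatial.distance can be used instead):

```python
import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def chebyshev(x, y):
    return np.max(np.abs(x - y))

def manhattan(x, y):
    return np.sum(np.abs(x - y))

def minkowski(x, y, p=3.0):
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def canberra(x, y):
    # Terms where |x_i| + |y_i| == 0 are undefined and must be handled in practice.
    return np.sum(np.abs(x - y) / (np.abs(x) + np.abs(y)))

def cosine(x, y):
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.0, 1.0])
print(euclidean(x, y), manhattan(x, y), cosine(x, y))
```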
2.3 Information Representation

Describing a specific type of data is a crucial step in machine learning tasks, as the representation of the dataset directly influences the effectiveness of the performed tasks [132]. D-dimensional feature vectors represent each element in a dataset. Methods used for data description depend on the type of information encoded in the data.

Various algorithms can be considered for feature extraction in the context of images. Shape extractors, such as Beam Angle Statistics [9], Contour Features Descriptor (CFD) [141], and Aspect Shape Context [113], allow for the study of different contours present in the image. Color-based methods, such as Color Autocorrelogram (ACC) [76] and Border/Interior Classification (BIC) [182], take into account color histograms of the images. Texture-based methods consider elements such as bricks or velvet, which produce variations in surface appearance. Examples include Local Binary Patterns (LBP) [133] and Local Activity Spectrum (LAS) [189]. Other methods, such as Bag of Visual Words (BOVW) [47] and deep learning-based models [89], can be mentioned in the scope of image feature extraction. In the case of text, representations such as Bag-of-Words, TF-IDF [158, 119], and, again, deep learning-based approaches [110] are some examples found in the literature. Time series representation will be approached in Section 2.5.

Given the presented overview, it is evident that various characteristics can be explored in multimedia datasets. On the other hand, different feature extractors can be used, leading to diverse results in machine learning tasks. Figure 2.4 illustrates extracting features from multimedia content and calculating the distance measure between the obtained features, resulting in a ranking score. The chosen representation for a given data directly influences the calculation of distance measures and, consequently, the ranking of similarities among items in the dataset. Therefore, it is crucial to select features carefully, considering the dataset's characteristics and the executed tasks.

Figure 2.4 – Feature extraction and distance calculation between two time series.

2.4 Machine Learning

Machine learning is an area of artificial intelligence (AI) that involves learning algorithms capable of performing tasks or predictions without being explicitly programmed, considering only the characteristics of a provided dataset and finding patterns and information relevant to a specific task [98], e.g., pattern recognition [224], sales prediction [211], and simultaneous machine translation [77], among others. There are diverse categories of machine learning algorithms, and their choice depends on the problem to be solved and data availability. This work explored Supervised, Unsupervised, and Semi-Supervised Learning, which are explained below.

• Supervised Learning

Supervised Learning algorithms are trained on labeled data, i.e., data with class information, and make decisions on unlabeled, unknown data [98]. The patterns in labeled data are considered to build a model capable of making predictions about the unknown data. The main supervised learning tasks are classification, in which the main goal is predicting the class of the elements, such as in image recognition [72], and regression, where the objective is to predict continuous values, such as in sales forecasting [54]. Examples of supervised algorithms are Linear Regression [62], Support Vector Machines [20], Convolutional Neural Networks [72], and Decision Trees [185], among others. Despite their popularity and highly accurate results, supervised tasks require large amounts of quality labeled data and long training times, making their execution impracticable in many situations.

• Unsupervised Learning

Unlike supervised algorithms, unsupervised learning tasks do not require labels, information about the data, or user intervention, using only intrinsic data characteristics, such as similarities between the objects [97], to perform the task. Examples of unsupervised learning tasks include dimensionality reduction, which simplifies data while keeping its characteristics, and clustering, where data is grouped based on the similarities between points [97], among others. Examples of unsupervised learning algorithms are Principal Component Analysis (PCA) [90], for dimensionality reduction, and K-Means [116], for clustering, as shown in the sketch below. In information retrieval tasks, there are unsupervised distance learning algorithms that explore intrinsic characteristics and contextual information of datasets to find more global similarity measures, in contrast to traditional pairwise measurements [214]. Unsupervised learning is an excellent tool for unlabeled data but lacks a ground truth for performance evaluation.
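As a hedged illustration of such an unsupervised pipeline, the sketch below assumes scikit-learn and a hypothetical random feature matrix: feature vectors are first reduced with PCA and then grouped with K-Means.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.random((100, 32))  # hypothetical: 100 series described by 32-d feature vectors

X_red = PCA(n_components=2).fit_transform(X)                 # dimensionality reduction
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X_red)  # clustering
print(labels[:10])  # cluster assignment of the first 10 elements
```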
2.4 Machine Learning

Machine learning is an area of artificial intelligence (AI) that involves learning algorithms capable of performing tasks or making predictions without being explicitly programmed, considering only the characteristics of a provided dataset and finding patterns and relevant information for a specific task [98], e.g., pattern recognition [224], sales prediction [211], and simultaneous machine translation [77], among others. There are diverse categories of machine learning algorithms, and their choice depends on the problem to be solved and on data availability. This work explored Supervised, Unsupervised, and Semi-Supervised Learning, which are explained below.

• Supervised Learning

Supervised Learning algorithms are trained on labeled data, i.e., data with class information, and make decisions about unlabeled data, i.e., unknown data [98]. The patterns in the labeled data are considered to build a model capable of making predictions about the unknown data. The main supervised learning tasks are classification, in which the main goal is predicting the class of the elements, such as in image recognition [72], and regression, where the objective is to predict continuous values, such as in sales forecasting [54]. Examples of supervised algorithms are Linear Regression [62], Support Vector Machines [20], Convolutional Neural Networks [72], and Decision Trees [185], among others. Despite their popularity and highly accurate results, supervised tasks require large amounts of quality labeled data and long training times, making their execution impracticable in many situations.

• Unsupervised Learning

Unlike supervised algorithms, unsupervised learning tasks do not require labels, prior information about the data, or user intervention, using only intrinsic data characteristics, such as the similarities between objects [97], to perform the task. Examples of unsupervised learning tasks are dimensionality reduction, which simplifies data while keeping its main characteristics, and clustering, where data is grouped based on the similarities between points [97], among others. Examples of unsupervised learning algorithms are Principal Component Analysis (PCA) [90], for dimensionality reduction, and K-Means [116], for clustering. There are also unsupervised distance learning algorithms for information retrieval tasks, which explore intrinsic characteristics and contextual information of datasets to find more global similarity measures, in contrast to traditional pairwise measurements [214]. Unsupervised learning is an excellent tool for unlabeled data but lacks a ground truth for performance evaluation.

• Semi-supervised Learning

Semi-supervised learning combines concepts of both Supervised and Unsupervised Learning [29, 229], utilizing the information present in labeled and unlabeled data. An example is semi-supervised classification, where the classifier is trained on a small sample of labeled data, and the information contained in the unlabeled data is also considered for performing the task. The main advantage of semi-supervised methods is the capability to perform well under a scarcity of labeled data. Semi-supervised methods are divided into transductive and inductive methods. Transductive algorithms do not produce a model, considering only the samples found during training [198] for labeling data. When new data is added, it is necessary to perform the training again to consider the new samples. In contrast, like traditional supervised methods, inductive algorithms build a model during training, which is utilized for classifying new elements [198]. As examples of semi-supervised algorithms, we can cite Label Propagation [230], illustrated in the sketch after the list below, and the Semi-Supervised Support Vector Machine (S3VM) [88].

Diverse assumptions are made about the data distribution [29], considered individually or in groups, depending on the semi-supervised method [198]. The assumptions are:

• Clustering Assumption: Data points belonging to the same cluster must belong to the same class;

• Smoothness Assumption: Samples close to each other in the feature space tend to belong to the same class, and the decision boundary must be placed in regions with low data density;

• Manifold Assumption: Data points must belong to the same class if placed close to each other in a lower-dimensional representation.
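As an illustration of these concepts, the sketch below hides 90% of the labels of a synthetic two-moons dataset and recovers them with scikit-learn's Label Propagation implementation; the kernel and parameter values are illustrative choices of ours:

    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.semi_supervised import LabelPropagation

    # Two-moons dataset with 90% of the labels hidden (marked as -1,
    # the scikit-learn convention for unlabeled points).
    X, y = make_moons(n_samples=200, noise=0.05, random_state=0)
    rng = np.random.RandomState(0)
    y_train = np.copy(y)
    unlabeled = rng.rand(len(y)) < 0.9
    y_train[unlabeled] = -1

    model = LabelPropagation(kernel="knn", n_neighbors=7)
    model.fit(X, y_train)
    # transduction_ holds the labels inferred for every point.
    acc = (model.transduction_[unlabeled] == y[unlabeled]).mean()
    print("accuracy on the unlabeled points:", acc)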
2.5 Time Series Representation

A high-quality data representation is essential for achieving effective results in machine learning tasks. Obtaining suitable feature vectors from time series is also a crucial step in content-based information retrieval tasks, as the arrangement of the ranked lists depends directly on this information. Data representation can consider shapes, frequencies, pattern repetitions, temporal correlations between points, and other characteristics. Several methods for feature extraction in time series are briefly described below. These methods were selected based on the variability of feature extraction approaches for time series.

2.5.1 1D-Signal-based Feature Extractors

Effective machine learning relies on robust data representation, particularly for time series. Feature extraction from time series is pivotal for accurate information retrieval, encompassing various properties, such as shapes, frequencies, pattern repetitions, and temporal correlations.

• Beam Angle Statistics

The Beam Angle Statistics (BAS) algorithm [10, 164] is a shape descriptor that identifies concavities and convexities in contours and extracts related features. The method has demonstrated promise in prior research [215, 192]. Given a parameter $k_b$, a contour described by points $P = (p_1, p_2, \ldots, p_n)$ is traversed. Features are generated based on the angle $\alpha_i$ formed by the intersection of the line segments $x_s$ (connecting points $(i - k_b, p_{i-k_b})$ and $(i, p_i)$) and $y_s$ (connecting points $(i, p_i)$ and $(i + k_b, p_{i+k_b})$), where $i = k_b, k_b + 1, k_b + 2, \ldots, n - k_b$. This yields the representation $BAS = (\alpha_1, \alpha_2, \ldots, \alpha_{n-2k_b})$ for any contour. Figure 2.5 presents an example of the BAS execution.

Figure 2.5 – Lines $x_s$, $y_s$ and angle $\alpha$ for the point $p_i$ ($i = 100$ and $0 < k_b < 50$).

• Discrete Fourier Transform (DFT)

The Discrete Fourier Transform (DFT) transforms a signal while preserving its number of points. It decomposes the time series into basis functions, where the early functions capture trends and the later ones mostly account for noise. Each basis function possesses a complex Fourier coefficient, and the initial coefficients approximate the time series [171]. Figure 2.6 shows an example of a DFT representation.

Figure 2.6 – Time series represented by DFT. Extracted from [151]

• Discrete Wavelet Transform (DWT)

The Discrete Wavelet Transform (DWT) decomposes a time series into diverse frequencies across scales, reducing dimensionality and noise [28]. It yields coarse-grained approximation and fine-grained detail coefficients through specific wavelet functions involving convolution and downsampling. Figure 2.7 shows a DWT representation.

Figure 2.7 – Time series and its DWT representation. Extracted from [123]

• RandOm Convolutional KErnel Transform (ROCKET)

ROCKET [45] is a time series classifier that can also be used for feature extraction. The method has reached impressive results on many time series datasets and is computationally inexpensive. It is based on randomly generated convolutional kernels: $k_r$ kernels are generated, and two features are extracted per kernel, the global max pooling and the proportion of positive values. Then, a linear classifier is trained on the obtained features for the classification task. ROCKET excels in efficiency, with lower computational complexity than alternatives and requiring the evaluation of a single hyperparameter, the number of kernels $k_r$. Figure 2.8 presents the weights of the most representative kernels, selected based on mutual information, for an execution of the algorithm.

Figure 2.8 – Weights of the most representative kernels. Extracted from [152]

The descriptors were selected based on several criteria. DWT and DFT were chosen as representatives of traditional approaches, i.e., well-established methods in the field [27]. ROCKET is a recent algorithm that can be used for time series classification and feature extraction; still, to the best of our knowledge, this work was the first to apply it to information retrieval tasks. The use of the BAS algorithm is based on [192]; BAS represents the family of methods that rely on shape-based characterization.
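To make the DFT-based description concrete, the sketch below keeps only the first coefficients of the transform as a feature vector; the function name and the number of coefficients are illustrative choices of ours:

    import numpy as np

    def dft_features(series, n_coeffs=16):
        """Describe a time series by its first DFT coefficients: the early
        coefficients capture trends, the later ones mostly noise."""
        coeffs = np.fft.rfft(series)[:n_coeffs]
        # Concatenate real and imaginary parts into a real-valued feature vector.
        return np.concatenate([coeffs.real, coeffs.imag])

    t = np.linspace(0, 4 * np.pi, 128)
    series = np.sin(t) + 0.1 * np.random.randn(128)
    print(dft_features(series, n_coeffs=8).shape)   # (16,)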
2.5.2 Image-Based Time Series Description

The representation of time series through images, which are subsequently exploited for feature extraction, is a promising approach [204, 53], as demonstrated in several applications, ranging from phenology [60] to land cover analysis studies [122, 49, 48]. In those methods, time series patterns are encoded in bi-dimensional representations, and pre-trained models (e.g., based on transfer learning procedures relying on Convolutional Neural Networks and Transformer-based models) are used for feature extraction. In the following, we discuss the approaches used for obtaining the bi-dimensional representations and the deep learning approaches employed for feature extraction. Figure 2.9 presents a time series and its respective two-dimensional representations.

Figure 2.9 – Time series (a) and its respective image representations: GAF (b), MTF (c), and RP (d)

• Gramian Angular Fields (GAF) [204]

This method generates an image based on the polar coordinates of a time series $T$. It transforms the normalized series $T$ into $\tilde{T}$, using polar coordinates in $[-1, 1]$. Using $\tilde{T}$, a Gramian Angular Field Matrix ($N \times N$) is formed, as depicted in Equation 2.2:

$GAF = \tilde{T}'\tilde{T} - \sqrt{I - \tilde{T}^2}\,'\sqrt{I - \tilde{T}^2}$ (2.2)

Here, $I$ is a unit row vector. Each point $GAF_{i,j}$ of the Gramian Angular Field Matrix corresponds to the trigonometric sum of the angles between the corresponding points of $\tilde{T}$, representing the temporal correlation between different time intervals in the series.

• Markov Transition Fields (MTF) [204]

This method creates time series images based on Markov transition probabilities. Given a time series $T$, $Q$ quantile bins are identified, and each $t_i$ is assigned to the corresponding bin $qb_j$ ($j \in [1, Q]$); the bins divide the sorted series into equal subsets. A $Q \times Q$ weighted adjacency matrix $W$ is constructed by counting transitions among quantile bins, employing a first-order Markov chain methodology along the temporal axis: $w_{i,j}$ represents the frequency at which a point in quantile $qb_j$ is followed by a point in quantile $qb_i$. After normalization by $\sum_j w_{i,j} = 1$, we have the Markov Transition Matrix, defined as in Equation 2.3:

$W = \begin{bmatrix} w_{11} \,|\, P(t_x \in qb_1 \,|\, t_{x-1} \in qb_1) & \cdots & w_{1Q} \,|\, P(t_x \in qb_1 \,|\, t_{x-1} \in qb_Q) \\ w_{21} \,|\, P(t_x \in qb_2 \,|\, t_{x-1} \in qb_1) & \cdots & w_{2Q} \,|\, P(t_x \in qb_2 \,|\, t_{x-1} \in qb_Q) \\ \vdots & \ddots & \vdots \\ w_{Q1} \,|\, P(t_x \in qb_Q \,|\, t_{x-1} \in qb_1) & \cdots & w_{QQ} \,|\, P(t_x \in qb_Q \,|\, t_{x-1} \in qb_Q) \end{bmatrix}$ (2.3)

$W$ is insensitive to the distribution of the time series $T$ and to the temporal dependency on the time steps, which results in information loss. Next, a Markov Transition Field (MTF) of dimensions $N \times N$ is built as per Equation 2.4:

$MTF = \begin{bmatrix} w_{ij \,|\, t_1 \in qb_i, t_1 \in qb_j} & \cdots & w_{ij \,|\, t_1 \in qb_i, t_n \in qb_j} \\ w_{ij \,|\, t_2 \in qb_i, t_1 \in qb_j} & \cdots & w_{ij \,|\, t_2 \in qb_i, t_n \in qb_j} \\ \vdots & \ddots & \vdots \\ w_{ij \,|\, t_n \in qb_i, t_1 \in qb_j} & \cdots & w_{ij \,|\, t_n \in qb_i, t_n \in qb_j} \end{bmatrix}$ (2.4)

Here, $MTF_{i,j}$ is the probability of transitioning from quantile bin $qb_i$ to $qb_j$. Specifically, $MTF_{i,j}$ with $|i - j| = k$ denotes the probability of transitioning between points separated by a time interval of $k$.

• Recurrence Plots (RP) [53]

A Recurrence Plot (RP) is a binary depiction of a time series, revealing temporal correlations. Given a time series $T$, its trajectory $\vec{T} = (\vec{t}_i, \vec{t}_{i+d_{rp}}, \ldots, \vec{t}_{i+(m_{rp}-1)d_{rp}})$ is obtained for $i \in \{1, 2, \ldots, n - (m_{rp} - 1)d_{rp}\}$, where $d_{rp}$ is the temporal delay and $m_{rp}$ is the trajectory dimension. $R_{i,j}$ becomes 1 if the Euclidean distance between $\vec{t}(i)$ and $\vec{t}(j)$ is at most $k$. For $X = n - (m_{rp} - 1)d_{rp}$, the Recurrence Plot matrix, of dimensions $X \times X$, is computed according to Equation 2.5:

$R_{i,j} = \begin{cases} 1, & \text{if } \|\vec{t}(i) - \vec{t}(j)\| \leq k \\ 0, & \text{otherwise}, \end{cases} \quad \forall i, j \in \{1, 2, \ldots, X\}$ (2.5)
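The three encodings are available in the pyts library (presented in Section 2.9). A minimal sketch, with illustrative parameter values:

    import numpy as np
    from pyts.image import GramianAngularField, MarkovTransitionField, RecurrencePlot

    X = np.random.randn(1, 128)  # one univariate time series of length 128

    gaf = GramianAngularField(image_size=64).fit_transform(X)
    mtf = MarkovTransitionField(image_size=64, n_bins=8).fit_transform(X)
    rp = RecurrencePlot(threshold="point", percentage=20).fit_transform(X)

    # gaf and mtf have shape (1, 64, 64); rp has shape (1, 128, 128).
    print(gaf.shape, mtf.shape, rp.shape)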
2.5.3 Deep Learning Methods as Image Feature Extractors

Images hold diverse information, such as patterns, colors, textures, and shapes. Leveraging pre-trained neural networks via transfer learning has exhibited substantial potential in diverse domains due to robust generalization and feature learning [115]. In this research, we utilized five neural networks for feature extraction, pre-trained on ImageNet [46].

• CNN ResNet-152

ResNet [72] is a series of Convolutional Neural Networks explicitly built for image recognition tasks. The ResNet-152 variant, characterized by a depth of up to 152 layers, effectively addresses the gradient vanishing challenge of deep architectures by leveraging skip connections. These skip connections play a fundamental role in augmenting the flow of gradients during the backpropagation phase, promoting the efficient training of deep networks. The feature extraction output is sourced from the terminal fully connected layer of the network architecture and has 2048 dimensions.

• Vision Transformers

Vision Transformers (ViT) [96] demonstrate particular efficacy in object detection and image classification tasks. Thanks to the self-attention mechanism, ViTs can simultaneously focus on the essential parts of an image. The input image is split into patches, which are subsequently processed by an encoder to produce hidden states. A decoder component generates output tokens through layers of self-attention and feedforward networks; masked self-attention within the decoder selectively focuses on preceding tokens. In the context of feature extraction, we utilize the CLS token, as delineated in the work by Kim et al. [95]. The obtained features contain 1280 dimensions.

• VGG-19

VGG-19 [179] is a Convolutional Neural Network developed by the Visual Geometry Group (VGG). Composed of 19 layers, it features 16 convolutional layers organized into sequential blocks, each followed by a max-pooling layer for dimensionality reduction; the network closes with three fully connected layers. For feature extraction, the last fully connected layer before the classification layer was utilized. The resulting features are 4096-dimensional.

• EfficientNet V2

EfficientNetV2 [187] is a family of convolutional neural networks based on the EfficientNet architecture. Like its predecessor, EfficientNetV2 employs the compound scaling method to balance the network's depth, width, and resolution. By adjusting these aspects, EfficientNetV2 can be resized efficiently to work well on different devices, even those with limited resources. While retaining core components such as depthwise separable convolutions, inverted residual blocks, and squeeze-and-excitation modules, EfficientNetV2 introduces optimizations and improved normalization layers to stabilize and accelerate the training process. Advanced training techniques are also employed, including learning rate schedules, data augmentation strategies, and regularization methods. The Large model variant was selected for feature extraction, utilizing the last average pooling layer as output; the obtained features contain 1280 dimensions.

• DINO (self-DIstillation with NO labels) V2

DINO V2 [134] is a self-supervised vision transformer-based model developed by Meta AI. The model is pre-trained through instance discrimination with no annotations, meaning no labeled data is required, only a large dataset of unlabeled images. Richer representations are obtained, as the model is trained directly on image data. DINO V2 has two main components: an encoder and a discriminative self-supervised learning scheme. The first is responsible for extracting visual features from images. It consists of multiple layers of Transformer blocks, each composed of a self-attention layer and a feed-forward network. This step generates a representation that captures crucial visual features of the image. The second is a Transformer-based discriminator, also consisting of multiple layers of Transformer blocks. The discriminator receives global representations of different augmented versions of the same image, and its task is to differentiate these versions. The discriminator's performance directly impacts the encoder's training: if the discriminator confuses original and augmented versions of an image, the encoder needs to improve its feature extraction, as the features are not fully capturing the image's identity; likewise, if the discriminator struggles to differentiate unrelated images, there may be issues with the training, or the encoder needs to refine its visual understanding. Through this feedback loop, the encoder learns to produce robust features that capture the essence of an image and differentiate between different images and their variations. The raw model was used for feature extraction [74], generating features with 384 dimensions.
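As an illustration of how such features can be obtained, the sketch below extracts 2048-dimensional descriptors with a pre-trained ResNet-152 in PyTorch, removing the classification layer and using the standard ImageNet preprocessing statistics. This is a common extraction recipe (assuming the torchvision 0.13+ weights API), not necessarily the exact extraction point used in this work:

    import torch
    from torchvision import models, transforms

    # ResNet-152 pre-trained on ImageNet, with the final classification layer
    # removed so that the 2048-d output of the global average pooling is exposed.
    resnet = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
    extractor = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                             std=[0.229, 0.224, 0.225]),
    ])

    # img is a PIL image, e.g., a GAF, MTF, or RP encoding of a time series:
    # with torch.no_grad():
    #     features = extractor(preprocess(img).unsqueeze(0)).flatten(1)  # (1, 2048)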
2.6 Unsupervised Distance Learning

The feature vectors associated with dataset samples can vary in spatial distribution within the feature space, potentially leading to severe impacts on machine learning tasks. On the other hand, unsupervised distance learning algorithms can improve retrieval and machine learning effectiveness [146, 193] by enhancing sample representations based on their rearrangement in the feature or distance space. We follow the retrieval models proposed for image search [37] and re-ranking [142, 150].

Formally, let $\mathcal{C} = \{T_1, T_2, \ldots, T_n\}$ be a collection composed of $n$ time series. Let $D_t$ be a time series descriptor, defined by the tuple $(\epsilon, \rho)$, where $\epsilon: T_i \longrightarrow \mathbb{R}^d$ is a function that extracts a feature vector $v_{T_i}$ from a time series $T_i$, and $\rho: \mathbb{R}^d \times \mathbb{R}^d \longrightarrow \mathbb{R}^+$ is a function that computes the distance between two time series $T_i$ and $T_j$ based on their corresponding feature vectors, formally defined as $\rho(\epsilon(T_i), \epsilon(T_j))$, or simply $\rho(i, j)$. The distance $\rho(i, j)$ for each pair $(i, j)$ can be computed to form a square matrix $A$ of dimension $n \times n$, where $A_{ij} = \rho(i, j)$.

Based on the distance $\rho$, a ranked list $\tau_q$ is computed for each time series $T_q$ present in the collection. The ranked list $\tau_q = (T_1, T_2, \ldots, T_n)$ is a permutation of the collection $\mathcal{C}$, where $\tau_q(i)$ is the rank of the time series $T_i$ in $\tau_q$. The rank is defined according to the distance measure, such that if $\tau_q(i) < \tau_q(j)$, then $\rho(q, i) < \rho(q, j)$. The set of ranked lists for all time series in $\mathcal{C}$ is defined as $\mathcal{R} = \{\tau_1, \tau_2, \ldots, \tau_n\}$.

The re-ranking task can be seen as recalculating the distance $\rho$ using a more effective distance function $\rho_r$. The function $\rho_r(i, j, \mathcal{R})$ aims to explore the contextual information present in the set $\mathcal{R}$, analyzing the relationships existing in a $k$-neighborhood of the ranked lists and improving the effectiveness of the distances between time series. Formally, $\rho_r: \mathcal{C} \times \mathcal{C} \times \mathcal{R} \longrightarrow \mathbb{R}^+$ is a distance function between time series $T_i$ and $T_j$ that takes into account the contextual similarity information contained in the set $\mathcal{R}$ [195].

Rank aggregation tasks can also be performed. Let $A_a = \{A_1, A_2, \ldots, A_n\}$ be a set of distance matrices from a dataset and $R_a = \{\mathcal{R}_1, \mathcal{R}_2, \ldots, \mathcal{R}_n\}$ the set of corresponding ranked lists. The objective is to utilize the sets $A_a$ and $R_a$ to obtain a new distance matrix $\hat{A}_c = f_a(A_a, R_a)$. With the new distance matrix $\hat{A}_c$, it is possible to calculate a new set of ranked lists $\mathcal{R}_c$. Some rank aggregation methods do not utilize the set $A_a$, calculating the new ranking $\mathcal{R}_c$ directly, being characterized by $\mathcal{R}_c = f_a(R_a)$.
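In code, the matrix $A$ and the set of ranked lists $\mathcal{R}$ of this formulation can be computed directly; a minimal sketch, with the Euclidean distance playing the role of $\rho$ and a function name of our choosing:

    import numpy as np

    def distance_matrix_and_ranked_lists(features):
        """Compute A (A[i, j] = rho(i, j), Euclidean here) and the set of
        ranked lists R, where row q of R is the ranked list tau_q."""
        A = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
        R = np.argsort(A, axis=1)
        return A, R

    features = np.random.rand(50, 8)   # 50 time series described by 8-d vectors
    A, R = distance_matrix_and_ranked_lists(features)
    print(R[0][:5])                    # top-5 neighbors of T_0 (T_0 itself first)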
Figure 2.10 illustrates the ability of unsupervised distance learning methods to consider the dataset's geometry. Query points, highlighted with triangles, are defined in each moon of the dataset, and the remaining points are assigned colors based on their closest query point. In (a), the Euclidean Distance is used for the distance calculation; it can be observed that the metric could not classify the data correctly, as it does not consider the dataset's structure. In (b), the Reciprocal kNN Graph+CCs method [143] is used to recalculate the distances; it can consider the dataset's geometry and correctly classify the elements of the two moons.

Figure 2.10 – Dataset TwoMoons: the points are colored considering the lowest distance to the query points (triangles), utilizing (a) the Euclidean Distance and (b) an unsupervised distance learning method. Figure taken from [143]

The ability to incorporate the complete dataset structure significantly influences machine learning tasks, particularly in retrieval scenarios. Numerous methods, initially designed for image retrieval, have extended their application to various domains, including audio [41], video [42], and time series [7, 170]. Below, the four recent unsupervised distance learning methods used in this work are described; a simplified sketch of the idea they share follows the last of them.

2.6.1 Log-based Hypergraph of Ranking References (LHRR)

Hypergraphs generalize graphs with hyperedges, which connect multiple vertices and can represent higher-order similarity relationships. The LHRR method builds a hypergraph model from ranking data, assigning hyperedge weights via a log-based function. The hyperedges provide a contextual representation of the dataset, enhancing the effectiveness of similarity measurement through the product of hyperedge similarities [146].

2.6.2 Rank Diffusion Process with Assured Convergence (RDPAC)

RDPAC [69] is an unsupervised distance learning algorithm with low computational complexity that exploits a diffusion process considering only the top-k positions of the ranked lists. The algorithm starts with a similarity measure based on ranking information, and then a normalization is performed to improve the ranking symmetry. The third step is a rank diffusion process with assured convergence that requires a small number of iterations. Finally, a post-diffusion step is performed to explore the reciprocal rank information.

2.6.3 Breadth-First Search (BFS) Tree of Ranking References

This algorithm constructs a tree using breadth-first search on a graph created from the dataset's ranking references [148]. Starting from a query object as the root, the first level includes the top-k most similar objects according to the rankings. Subsequent levels stem from the top-k objects in the ranking references of the first level. The BFS-Tree can represent the similarity information between the query and its k neighbors in a dataset. Objects occurring at different levels of the structure likely exhibit higher similarity, while objects occurring infrequently may indicate non-relevant elements or noise [148].

2.6.4 Rank Flow Embedding (RFE)

RFE is an algorithm proposed for both retrieval and classification tasks, considering contextual information to obtain better results [193]. RFE comprises five steps. The first is a ranked list normalization considering a sigmoid score based on reciprocal ranked list positions. Then, a re-ranking step is performed utilizing a hypergraph structure. The obtained hypergraph is also utilized for computing embeddings. Then, another two re-ranking steps are performed, utilizing the Cartesian Product and Connected Components, which are defined based on the hypergraph structures. For semi-supervised classification tasks, more effective embeddings are computed based on the similarity of each element and its identified Connected Components. Enhanced unsupervised retrieval and semi-supervised classification are achieved through the stepwise improvement of rank-based similarity information.
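These four algorithms do not reduce to a few lines, but their shared principle, replacing pairwise distances with a measure derived from the neighborhood sets of the ranked lists, can be illustrated with a simple reciprocal-kNN similarity. The sketch below is our illustrative simplification, not an implementation of any of the methods above:

    import numpy as np

    def reciprocal_knn_distances(R, k=10):
        """Toy contextual measure: the more top-k neighbors two elements share,
        and the more reciprocal their neighborhood relation is, the smaller
        their new distance. Not one of the algorithms described above."""
        n = R.shape[0]
        topk = [set(R[i, :k]) for i in range(n)]
        new_A = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                shared = len(topk[i] & topk[j])
                reciprocal = (j in topk[i]) + (i in topk[j])
                new_A[i, j] = 1.0 / (1.0 + shared + reciprocal)
        return new_A   # re-sorting its rows yields the re-ranked lists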
2.7 Univariate Time Series Datasets

This Section introduces seven univariate time series datasets sourced from the UCR Archive [39], with various sizes, numbers of classes, time series lengths, and domains. The UCR Time Series Archive is the primary source of univariate and multivariate time series datasets, covering a wide range of domains and sizes. Not all datasets from the UCR Time Series Archive were considered, due to restricted computational resources and time.

• Cylinder-Bell-Funnel (CBF) [167]: A simulated dataset with 930 elements of length 128, divided into 3 classes. Each class consists of standard normal noise plus an offset term that varies for each class.

• ECG5000 [31]: Derived from a 20-hour-long ECG record called the BIDMC Congestive Heart Failure Database (chfdb). It contains 5000 pre-processed heartbeats of length 140, randomly selected from the record "chf07", with five classes.

• Yoga [207]: Composed of 3300 time series of length 426, generated from videos of actors transitioning between yoga poses. Each image was converted into a time series considering the distances between the actor's contour and the image center. The task is to distinguish between male and female actors.

• Electric Devices [12]: Contains 16637 time series of length 96, divided into 7 classes. The data represents daily power consumption measurements of devices from 187 households, including washing machines, ovens, dishwashers, kettles, immersion heaters, cold groups (fridge, freezer), and screen groups (computer, television). The goal is to classify each device based on its daily measurements.

• GunPoint [156]: This dataset is based on the movement of drawing a gun from a holster, aiming at a target, and then returning the gun to the holster. The dataset contains two classes: "Gun-Draw", where actors use a gun replica to perform the movement, and "Point", where they use their index finger to point at the target. The time series are captured by mapping the X-axis coordinates of the centroid of the actor's right hand. It consists of 200 time series, each with a length of 150.

• Beef [5]: 60 samples of silverside beef, equally divided into five classes: one of pure beef, while the other four classes are samples contaminated with 20% w/w of tripe, liver, kidney, and heart, respectively. The meat samples were cooked in a microwave oven for a limited time at different power settings. From both cooked and uncooked samples, spectrograms were obtained under the same conditions, and a chemometric analysis was conducted considering 470 data points. The problem consists of distinguishing each class.

• Rock [14]: Contains 70 rock samples from the ASTER spectral library. Each series corresponds to the spectral reflectance as a function of wavelength in microns, not time. There are 4 types of rocks, with time series of length 2844, and the problem consists of distinguishing them.

This Section described diverse univariate time series datasets, used in two different experiments; such datasets can be loaded as shown in the sketch below. Chapter 5 presents the multivariate time series datasets used in the experiments.
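For instance, GunPoint ships with the pyts library (Section 2.9); assuming a recent pyts version, it can be loaded as follows, and the remaining UCR datasets can be fetched in a similar way:

    from pyts.datasets import load_gunpoint

    # GunPoint is bundled with pyts; other UCR datasets can be retrieved with
    # pyts.datasets.fetch_ucr_dataset, which downloads them from the archive.
    X_train, X_test, y_train, y_test = load_gunpoint(return_X_y=True)
    print(X_train.shape, X_test.shape)   # (50, 150) and (150, 150)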
2.8 Evaluation Measures

The evaluation of the effectiveness of retrieval and machine learning tasks relies on evaluation measures [158, 163], which are discussed in this Section.

• Accuracy

This metric, widely used to evaluate classification tasks, calculates the percentage of correctly labeled elements within a labeled dataset [168]. The multi-class accuracy is defined by Equation 2.6:

$accuracy = \frac{corr\_preds}{preds}$, (2.6)

where $corr\_preds$ stands for the number of correct predictions in a labeled dataset, and $preds$ is the total number of predictions.

• Adjusted Mutual Information (AMI)

AMI is a statistical measure, utilized to evaluate clustering tasks, that quantifies the similarity between two clusterings, considering the Mutual Information (MI) score and correcting for chance agreement [199]. The AMI between two clusterings is given by Equation 2.7:

$AMI(U, V) = \frac{MI(U, V) - E(MI(U, V))}{avg(H(U), H(V)) - E(MI(U, V))}$, (2.7)

where $E(MI(U, V))$ is the expected mutual information and $H(U)$ is the entropy of a clustering $U$.

• Precision@k

Precision is widely used to evaluate retrieval tasks and aims to evaluate the proportion of relevant items among all the retrieved items at a depth $k$. It is defined in Equation 2.8:

$P = \frac{TP@k}{k}$, (2.8)

where $TP@k$ represents the number of relevant retrieved items at depth $k$ of the ranked list.

• Recall@k

Recall is also utilized to evaluate retrieval tasks and measures the proportion of relevant instances retrieved. Recall is defined in Equation 2.9:

$R = \frac{TP@k}{TP}$, (2.9)

where $TP@k$ stands for the number of relevant retrieved items at depth $k$ of the ranked list, and $TP$ is the total number of relevant items in the ranked list.

• Mean Average Precision (mAP)

Mean Average Precision (mAP) is a common metric for evaluating the effectiveness of the ranked lists obtained in information retrieval tasks. This metric calculates the average of the Average Precision (AP) over all queries [231]. The mAP is defined in Equation 2.10:

$MAP = \frac{\sum_{l=1}^{Q} AP(q_l)}{Q}$, (2.10)

where $Q$ is the number of queries and $q_l$ represents a query $l$. The Average Precision (AP), as defined in Equation 2.11, is the average precision for a recall varying from 0 to 100% [231]:

$AP = \frac{1}{N_r} \sum_{i=1}^{d} \left( r_i \, \frac{1}{i} \sum_{j=1}^{i} r_j \right)$, (2.11)

where $N_r$ is the number of relevant items in the collection for a query $q$, and $r_i$ represents the relevance (irrelevant = 0 or relevant = 1) of the $i$-th item in a ranked list of size $d$.
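The retrieval measures above follow directly from their definitions; a minimal sketch, with function names of our choosing:

    import numpy as np

    def precision_at_k(rel, k):
        """Equation 2.8: fraction of relevant items among the first k positions
        (rel is a 0/1 relevance array ordered as the ranked list)."""
        return np.sum(rel[:k]) / k

    def recall_at_k(rel, n_relevant, k):
        """Equation 2.9: fraction of the relevant items retrieved up to depth k."""
        return np.sum(rel[:k]) / n_relevant

    def average_precision(rel, n_relevant):
        """Equation 2.11; averaging it over all queries yields the mAP (2.10)."""
        precisions = np.cumsum(rel) / np.arange(1, len(rel) + 1)
        return np.sum(precisions * rel) / n_relevant

    rel = np.array([1, 0, 1, 1, 0])   # relevance of one ranked list
    print(precision_at_k(rel, 3), recall_at_k(rel, 3, 3), average_precision(rel, 3))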
2.9 Software and Frameworks

This Section briefly presents the main frameworks and software utilized in this work, along with their main uses.

• UDLF [194]: A framework for unsupervised distance learning, implemented in C++ and containing diverse algorithms, including the ones explained in Section 2.6. It performs re-ranking and contextual rank aggregation starting from distance, similarity, or ranking matrices;

• pyts [59]: A Python library designed with tools for time series processing. It contains machine learning algorithms for time series, datasets, preprocessing, and utility tools;

• Matplotlib [79]: A Python library utilized for the generation and manipulation of images and graphics. Besides graphics generation, we also utilize Matplotlib to generate the images described in Section 2.5.2;

• Scikit Learn [140]: A complete machine learning library for Python. It contains machine learning algorithms, evaluation measures, and pairwise distance measures, among other tools;

• NumPy [70]: A fundamental tool in Python for manipulating arrays and matrices;

• TensorFlow [3] and PyTorch [138]: Two Python libraries for deep learning algorithms, mainly utilized to execute the neural networks described in Section 2.5.3.

3 Related Work

Time series data contains diverse encoded information, and effectively analyzing it poses significant challenges. Additionally, the increasing production of time series datasets demands efficient management strategies. One way to work with time series datasets is through information retrieval tasks. In information retrieval problems applied to time series, there are two evaluation approaches, depending on the available data. In problems involving a dataset with multiple time series, a single series is selected as a query, and the other series are ranked in ascending order based on a distance measure, representing their similarity to the query [7]. For instance, in a database containing time series derived from satellite images for studying plant phenology, one might want to retrieve data similar to a given query series to identify the same plant species in different regions [7]. On the other hand, problems with a dataset containing a single time series involve selecting a segment of defined size and ranking the segments closest to the query based on a distance measure [227]. For example, in a time series containing an electroencephalogram associated with eye states (open or closed), one might select a segment where the eyes are in a specific state and then retrieve more segments with the same eye state [227].

Over the years, information retrieval in time series has been approached in various ways and contexts. Numerous models have been proposed, ranging from general approaches for diverse data scopes to specific methods for applications such as analyzing time series obtained from satellite images captured over time [135], or techniques tailored for searching multivariate time series [227], for instance.

Other tasks provide alternative perspectives on time series analysis. Tasks like clustering and classification allow grouping sets of time series considering data characteristics, revealing patterns within a dataset. This can be used to analyze changes in population mobility patterns caused by external factors [225], detect early-stage cancer [228], or predict disease outbreaks [71]. On the other hand, forecasting tasks aim to predict future values of a time series. Relevant applications include financial market predictions [223] and echocardiogram forecasting [157], which can identify abnormal heartbeat patterns and enable early medical intervention.

The literature presents a variety of methods for time series analysis. Although information retrieval is a crucial task in data processing, it is a relatively unexplored area in the context of time series, compared to labeling and forecasting tasks. Section 3.1 discusses models applied to univariate time series, while Section 3.2 presents methods for multivariate time series. Section 3.3 discusses techniques for processing time series generated from images and methods that encode time series as images. Finally, Section 3.4 briefly discusses the literature gaps filled by this work.

3.1 Univariate Time Series

3.1.1 Information Retrieval

Analyzing information in time series efficiently and effectively is a challenging task.
Over the years, various methods have been proposed to address and optimize this issue. Univariate time series are more common, and much of the existing research focuses on handling this data type. Comparative studies, such as the one proposed by Ding et al. [50], allow for evaluating and comparing the various available time series retrieval methods. In that work, comparisons are made among 8 different time series representation methods, including DFT, DCT, DWT, PAA, APCA, SAX, CHEB, and IPLA, as well as 9 similarity metrics, including SpADe, TQuEST, ERP, EDR, LCSS, DTW, DISSIM, Swale, and Euclidean, across 38 univariate time series datasets. This study provides a wide range of results for various datasets.

Furthermore, another challenging aspect of the field is dealing with extremely large datasets. Retrieving information from large datasets, not only in the context of time series but for any multimedia data, is a computationally demanding task, since comparing a query with all elements in the dataset is highly costly. In this regard, various algorithmic optimizations have been proposed to reduce the search time. In [155], several optimizations for distance measures (such as the Euclidean distance and DTW) are presented, enabling retrieval tasks to be performed much faster than in other works developed during the same period. The optimized DTW, which accomplishes retrieval quickly and delivers results comparable to the state of the art of the period, was demonstrated to be consistent.

Recent approaches propose tackling this issue differently, as seen in [172]. In this work, an adaptive model for time series information retrieval is proposed, based on relevance feedback, to retrieve a small number of series per query, considering large datasets. As searches are performed, the algorithm builds a collection of the patterns existing in the time series, helping the algorithm adapt to perform more efficient searches.

In addition to relevance feedback, other machine learning methods can be used to improve the efficiency and effectiveness of information retrieval results. For instance, in [137], a supervised dictionary learning method, called EO-BoW, is introduced for optimizing the Bag-of-Words (BoW) representation in information retrieval. The method utilizes an entropy-based optimization criterion, suitable for retrieval tasks rather than classification, following the cluster hypothesis. Experiments conducted on time series databases, as well as on data from other domains, demonstrate that EO-BoW improves retrieval performance and reduces encoding time and database storage requirements. Moreover, EO-BoW maintains its representation ability when used for retrieving objects from classes unseen during training.

Chen et al. [30] propose the deep multi-task representation learning method (MTRL) for time series classification and retrieval tasks, leveraging supervised and unsupervised information. The supervised information, for classification tasks, aims to reduce intra-class differences and increase inter-class differences. The unsupervised information, for retrieval tasks, aims to preserve the DTW distances between pairs. These tasks share networks, so the information obtained from one task can benefit the other. The results presented in the experiments outperformed the other state-of-the-art methods at the time.

Often, labeled information is not available, making supervised learning tasks infeasible.
In [21], the problem of content-based time series retrieval (CBTSR) is addressed, which bears similarities to the proposal of this work. Using satellite sensor images, the goal is to retrieve data related to abrupt changes. For this purpose, features of the "change vector analysis" (CVA) type are extracted, and an approach based on clustering and spectral change vectors (SCVs) is used for retrieval: the elements to be retrieved have SCVs in the cluster to which the query belongs. This membership is verified using the Euclidean distance, and other distance measures can also be utilized. The results, considering the top 20 retrieved elements, achieve precision rates between 89% and 99%.

3.1.2 Clustering

Clustering tasks on time series data are also fundamental, and several approaches have been proposed. Many current clustering methods consider traditional distance measures, which do not take shape characteristics into account. In this context, algorithms that consider this property have become popular for time series data. The "Fractional Order Shape-based k-cluster" (FrOKShape) algorithm [109] utilizes a distance measure based on multivariate shape, computed by normalized fractional-order cross-correlation, and the "DTW Barycenter Averaging" (DBA) measure for centroid calculation. The obtained results are comparable to those of the "KShape" algorithm, which is widely used for time series data.

The centroid approach is similar to the popular "K-Means" algorithm in the literature, which performs well in various domains. However, the technique used to calculate centroids is not satisfactory for handling time series data. In this regard, Krishnan et al. [83] propose adaptations to the clustering algorithm, initializing the centroids at the farthest neighbors, contrary to the traditional approach that initializes centroids randomly. Additionally, the distance calculations are performed using the DTW algorithm. Finally, the clustering is done using Kohonen maps. This approach showed better performance and scalability than traditional methods.

Other approaches, such as hierarchical clustering, can also be applied to time series data. Using a hierarchical clustering method based on tail dependence coefficients, the authors of [43] propose a linkage method based on the tail dependence between the clusters merged at each iteration. The proposed linkage method showed superior results compared to traditional approaches. Another relevant approach is presented in [153], where a framework for clustering is proposed, involving obtaining geometric representations of time series data and then performing clustering tasks; the best results were achieved using the UMAP and HDBSCAN methods. The work proposed in [8] evaluates traditional clustering algorithms with three proposed time series similarity measures, using meteorological and synthetic data; it is noteworthy that the results between the measures are similar and that the clustering results are strongly correlated with the parameters.

Kang et al. [91] propose the PFD_DTW_HC method for clustering time series. First, Hodrick-Prescott (HP) filtering is applied to reduce noise and redundancy in the raw time series; then, polynomial curve fitting (PCF) is applied to derive a globally continuous function fitting the time series. The time series is thus transformed into a multi-order derivative feature sequence.
The designed polynomial function derivative features-based dynamic time warping (PFD_DTW) algorithm is then performed to determine the distance between two time series of equal or unequal granular lengths, and hierarchical clustering is applied. The proposed method has advantages in capturing trend information and in interpretability, and it is more precise than other clustering alternatives, but it is sensitive to outliers and has high computational complexity.

The clustering of time series data has various applications. For instance, the method proposed in [78] enables the identification of the condition of an aluminum reduction cell using clustering. The method utilizes fuzzy c-means in combination with the DTW distance for clustering the current states of the anodes, along with the pseudo-resistance of an anode-cathode path. Finally, a clustering validation index is used to find the ideal number of clusters. Another notable work is the proposal of the "KDetect" algorithm [176], designed for unsupervised anomaly detection in cloud systems. Time series representing the CPU consumption of unknown services are used in a clustering algorithm, enabling anomaly detection; in the experiments, the precision and recall values exceeded 98%. Moreover, the clustering of time series data enabled the analysis of changes caused by the COVID-19 pandemic. Motivated by the analysis of pattern changes in mobility due to COVID-19, Zhang et al. [225] propose the use of the "K-Means" method with the DTW distance measure, achieving more satisfactory results, although with a high execution time.

3.1.3 Classification

Unlike clustering, which is an unsupervised learning method, classification tasks allow data labeling based on a pre-identified set. Among the challenges in classification problems is the selection of good features. Lee et al. [103] propose a representation method for open-ended univariate time series that is real-time, normalization-free, and parameter-tuning-free. Each data point of the time series is converted, on the fly, into a root mean squared error (RMSE) value based on a Long Short-Term Memory (LSTM) network and a Look-Back and Predict-Forward strategy. Experiments suggest that the approach is suitable and can be easily deployed on commodity computers and used as a pre-processing tool for real-time time series analysis, such as clustering and classification.

Specifically in the classification field, a model for selecting temporal features using a random forest is proposed in [85], with the classification performed by a Fully Convolutional Network (FCN); several experiments demonstrate the effectiveness of this method. Building on the issue of features in time series data, another method, called the Adaptive Feature Fusion Network (AFFNet), was created. It combines temporal and data features extracted using a dynamic multiscale CNN model, while distance features are extracted from a prototype distance network model. Finally, a feature fusion model is utilized for classification, and the proposed method achieved results superior to the state of the art on many of the tested datasets [203]. While the utilization