UNIVERSIDADE ESTADUAL PAULISTA “JÚLIO DE MESQUITA FILHO”
Câmpus de Rio Claro
Programa de Pós-Graduação em Ciência da Computação

Bionda Rozin

Machine Learning and Information Retrieval Techniques for Time Series Analysis

Rio Claro - SP
2024

UNIVERSIDADE ESTADUAL PAULISTA “Júlio de Mesquita Filho”
Instituto de Geociências e Ciências Exatas
Câmpus de Rio Claro

Bionda Rozin

Machine Learning and Information Retrieval Techniques for Time Series Analysis

Dissertação de Mestrado apresentada ao Instituto de Geociências e Ciências Exatas do Câmpus de Rio Claro, da Universidade Estadual Paulista “Júlio de Mesquita Filho”, como parte dos requisitos para obtenção do título de Mestre em Ciência da Computação.

Orientador: Prof. Dr. Daniel Carlos Guimarães Pedronette

Rio Claro - SP
2024

R893m Rozin, Bionda
Machine learning and information retrieval techniques for time series analysis / Bionda Rozin. -- Rio Claro, 2024
134 p. : il., tabs.
Dissertação (mestrado) - Universidade Estadual Paulista (UNESP), Instituto de Geociências e Ciências Exatas, Rio Claro
Orientador: Daniel Carlos Guimarães Pedronette
1. Aprendizado de Máquina. 2. Recuperação da Informação. 3. Extração de Características. 4. Ranqueamento. 5. Análise de Séries Temporais. I. Título.
Sistema de geração automática de fichas catalográficas da Unesp. Dados fornecidos pelo autor(a).

UNIVERSIDADE ESTADUAL PAULISTA “Júlio de Mesquita Filho”
Instituto de Geociências e Ciências Exatas
Câmpus de Rio Claro

Bionda Rozin

Machine Learning and Information Retrieval Techniques for Time Series Analysis

Dissertação de Mestrado apresentada ao Instituto de Geociências e Ciências Exatas do Câmpus de Rio Claro, da Universidade Estadual Paulista “Júlio de Mesquita Filho”, como parte dos requisitos para obtenção do título de Mestre em Ciência da Computação.

Comissão Examinadora

• Prof. Dr. Daniel Carlos Guimarães Pedronette (Orientador)
Instituto de Geociências e Ciências Exatas (IGCE)
Universidade Estadual Paulista - UNESP

• Prof. Dr. Diego Furtado Silva
Instituto de Ciências Matemáticas e de Computação (ICMC)
Universidade de São Paulo - USP

• Prof. Dr. Lucas C. Ribas
Departamento de Ciências da Computação e Estatística (DCCE)
Universidade Estadual Paulista - UNESP

Conceito: Aprovada.

Rio Claro (SP), 04 de setembro de 2024.

Acknowledgements

I am grateful for my life, health, and resilience. I thank my parents and those who stand close to me, always encouraging me to strive for more. I am extremely grateful to my advisor, Professor Daniel Pedronette, for all the guidance and support. I also extend my gratitude to Professor Ricardo da Silva Torres for agreeing to be my advisor during my exchange abroad. Thanks to Emilio Bergamin Junior, Professor Fabricio Aparecido Breve, and Professor Felipe Arruda Moura for the collaborative work. I appreciate the support from São Paulo State University (UNESP) and from the professors and colleagues of the research group. I am also thankful to Wageningen University & Research for welcoming me as an exchange student.

Grants #2022/01359-1 and #2023/08087-0, São Paulo Research Foundation (FAPESP)

Resumo

Considerando o vasto domínio de aplicações de dados temporais, como os setores médico, agrícola, financeiro e científico, por exemplo, exige-se cada vez mais a análise e o processamento desse tipo de dado. Tarefas como recuperação da informação, classificação e agrupamento são cruciais para analisar séries temporais em diferentes contextos e com diferentes objetivos.
Técnicas de recuperação da informação aplicadas a conjuntos de séries temporais permitem a identificação de padrões e o ranqueamento dos dados conforme a sua semelhança, enquanto a classificação rotula séries temporais com base em um conjunto de treinamento e tarefas de agrupamento (clustering) agrupam séries temporais com base em suas similaridades. Há, ainda, a classificação semi-supervisionada, que considera tanto dados rotulados quanto não rotulados para classificar os dados. No geral, as tarefas de aprendizado de máquina e recuperação da informação são extremamente dependentes de uma boa representação computacional dos dados, que gera resultados mais eficazes e conclusões mais assertivas em relação à tarefa executada. Neste cenário, um dos desafios é obter uma boa representação computacional das séries temporais. Além disso, medidas de similaridade geralmente consideram apenas a similaridade par a par, desconsiderando informações importantes presentes na vizinhança dos itens analisados no conjunto de dados. O objetivo desta pesquisa é aplicar técnicas de aprendizado de máquina e recuperação da informação para obter resultados efetivos em análises de séries temporais. Quatro diferentes métodos são empregados e diferentes extratores de características são avaliados em todas as tarefas. Primeiro, um estudo comparativo de representação e ranqueamento de séries temporais univariadas por meio de aprendizado contextual de distância baseado em ranqueamento é conduzido em 10 conjuntos de dados diferentes, levando a ganhos de mAP de até 31,78%. Dando sequência a esta linha de pesquisa, propomos a análise de séries temporais multivariadas processando cada dimensão da série individualmente e utilizando métodos de agregação contextual de ranques para mesclar resultados e obter uma representação de similaridade utilizada para recuperação e classificação, obtendo resultados competitivos com dois métodos do estado da arte. Também é proposto um arcabouço baseado em agrupamento para análise de dados baseada na codificação de grafos temporais, no qual os dados são divididos usando critérios de segmentação temporal; resultados altamente interpretáveis são alcançados quando este arcabouço é aplicado à análise de posse de bola em partidas de futebol. Por último, é proposta uma classificação semi-supervisionada de séries temporais univariadas utilizando métodos de representação por imagem e propagação de rótulos, alcançando resultados semelhantes aos da classificação supervisionada.

Palavras-chave: Classificação; Recuperação da Informação; Agrupamento; Aprendizado Semi-Supervisionado; Aprendizado de Máquina; Extração de Características; Ranqueamento; Análise de Séries Temporais.

Abstract

Due to the great applicability of time series in diverse scenarios, such as medicine, agriculture, economics, and science, the analysis and processing of this kind of data are increasingly in demand. Tools such as information retrieval, classification, and clustering are crucial for analyzing time series in different contexts and with different objectives. Information retrieval tasks on time series data can identify patterns and rank data by similarity, while classification can label time series based on a training set and clustering can group time series based on their similarities. Also, semi-supervised classification considers both labeled and unlabeled data to perform classification.
In general, machine learning and information retrieval tasks are extremely dependent on a good computational representation of the data, which leads to more effective results and more assertive conclusions about the performed task. In this scenario, one of the main challenges is to obtain good features from time series. Also, similarity measures usually consider only pairwise relations, disregarding important information in the neighborhood of the analyzed items in the dataset. The objective of this research is to apply machine learning and information retrieval techniques to obtain effective results in time series analysis. Four different methods are employed, and different feature extractors are evaluated in all tasks. First, a comparative study of univariate time series representation and ranking through contextual rank-based distance learning is conducted on 10 different datasets, leading to mAP gains of up to 31.78%. Continuing this line of research, we propose multivariate time series analysis by processing each dimension of the series individually and using contextual rank aggregation methods to merge results and obtain a similarity representation used for retrieval and classification, achieving results competitive with two state-of-the-art (SOTA) methods. A clustering-based framework for data analysis based on temporal graph encoding is also proposed, where data is split using time segmentation criteria; highly interpretable results are reached when this framework is applied to ball possession analysis in football matches. Lastly, semi-supervised classification of univariate time series using imaging methods and label propagation is proposed, reaching results similar to those of supervised classification.

Keywords: Classification; Information Retrieval; Clustering; Semi-Supervised Learning; Machine Learning; Feature Extraction; Ranking; Time Series Analysis.

List of Figures

Figure 1.1 – Overview of goals and contributions of this work.
Figure 1.2 – Organization of the work.
Figure 2.1 – Univariate and Multivariate Time Series.
Figure 2.2 – Comparison between the Euclidean Distance alignment and DTW alignment. Extracted from [36].
Figure 2.3 – Typical architecture of a content-based time series retrieval system. Inspired by the content-based image retrieval system presented in [191].
Figure 2.4 – Feature extraction and distance calculation between two time series.
Figure 2.5 – Lines xs, ys and angle α for the point pi (i = 100 and 0 < kb < 50).
Figure 2.6 – Time series represented by DFT. Extracted from [151].
Figure 2.7 – Time series and its DWT representation. Extracted from [123].
Figure 2.8 – Weights of the most representative kernels. Extracted from [152].
Figure 2.9 – Time series and its respective image representations.
Figure 2.10 – Comparison between Euclidean Distance and Reciprocal kNN Graph+CCs results in the dataset TwoMoons.
Figure 4.1 – Pipeline employed in the comparative study.
Figure 4.2 – Visual ranking results on Rock [14] dataset for DWT-Detail representations.
Figure 4.3 – Visual ranking results on Rock [14] dataset for BAS representations.
Figure 4.4 – Visual ranking results on Rock [14] dataset when the time series raw values are used.
Figure 4.5 – Visual ranking results on Rock [14] dataset for RP-ViT representations.
Figure 4.6 – Measurements for the Beef dataset, considering the ROCKET feature extractor.
Figure 4.7 – Measurements for the GunPoint dataset, considering the RP+ResNet feature extractor.
Figure 4.8 – Comparison between datasets for the GAF+ViT feature extractor.
Figure 5.1 – Proposal for Multivariate Time Series retrieval and classification.
Figure 5.2 – Accuracy varying over the value of k.
Figure 6.1 – Synthesis of our clustering-based framework for data analysis.
Figure 6.2 – Generic pipeline for data characterization based on graph visual rhythm representations and transfer learning procedures.
Figure 6.3 – Graph changes over time. Red and blue graphs correspond to different teams, while the yellow point represents the ball.
Figure 6.4 – Example of Graph Visual Rhythm representation.
Figure 6.5 – Statistics for clusters created based on the configuration including Betweenness Centrality, DINO V2, UMAP, and HDBSCAN.
Figure 6.6 – Statistics for clusters created based on the configuration including Local Efficiency, ResNet-152, UMAP, and HDBSCAN.
Figure 6.7 – Statistics for clusters created based on the configuration including Average Path Length, ResNet-152, UMAP, and Affinity Propagation.
Figure 7.1 – Proposed methodology for time series semi-supervised classification.
Figure 7.2 – Results for the ElectricDevices dataset as a function of rl. Supervised baselines were obtained from 8926 labeled points, corresponding to rl ≈ 0.54.
Figure 7.3 – Results for the CBF dataset as functions of the number of nearest neighbors k (top) and the rate of labeled data rl (bottom). Supervised baselines were obtained with initially 30 labeled points, corresponding to rl ≈ 0.03.
Figure 7.4 – Results for the ECG5000 dataset as a function of rl. Supervised baselines were obtained with initially 500 labeled points, corresponding to rl = 0.1.
Figure 7.5 – Results for the Yoga dataset as a function of rl. Supervised baselines were obtained with initially 300 labeled points, corresponding to rl ≈ 0.09.
Figure 7.6 – Classification results as a function of k for the Yoga (top) and ElectricDevices (bottom) datasets.
Figure 7.7 – Kernel density estimation of pairwise distances (left) and coefficient of variation of pairwise similarities as a function of k (right) for CBF, ECG5000, and Yoga.

List of Tables

Table 4.1 – mAP values.
Table 4.2 – P@5 values.
Table 4.3 – P@10 values.
Table 4.4 – R@5 values.
Table 4.5 – R@10 values.
Table 5.1 – Retrieval results: P@10.
Table 5.2 – Retrieval results: mAP.
Table 5.3 – Accuracy comparison results.
Table 6.1 – Results for the Average Path Length graph measurement.
Table 6.2 – Results for the Betweenness Centrality graph measurement.
Table 6.3 – Results for the Eccentricity graph measurement.
Table 6.4 – Results for the Local Efficiency graph measurement.
Table 6.5 – Results for the Vulnerability graph measurement.

List of Abbreviations and Acronyms

ACC Color Autocorrelogram
AF AtrialFibrillation
AFFNet Adaptive Feature Fusion Network
AI Artificial Intelligence
AMI Adjusted Mutual Information
AP Affinity Propagation
APCA Adaptive Piecewise Constant Approximation
ARIMA Autoregressive integrated moving average
AWR ArticularyWordRecognition
BAS Beam Angle Statistics
BFS-Tree Breadth-First Search Tree of Ranking References
BIC Border/Interior Classification
BM BasicMotions
BOSS Bag-of-SFA-Symbols
BOVW Bag of Visual Words
BoW Bag-of-Words
CBR Content-Based Retrieval
CBTSR Content-based time series retrieval
CFD Contour Features Descriptor
CHEB Chebyshev Polynomials
CHI Calinski-Harabasz Index
CLS Classification
CNN Convolutional Neural Network
conDetSEC constrained Deep embedding time SEries Clustering
CPU Central Processing Unit
CVA Change vector analysis
DB DBSCAN
DBA DTW Barycenter Averaging
DBI Davies-Bouldin Index
DBSCAN Density-Based Spatial Clustering of Applications with Noise
DCT Discrete Cosine Transform
Deep r-RSJBE Deep r-th root of Rank Supervised Joint Binary Embedding
DFT Discrete Fourier Transform
DHN-RAD Deep Hashing Network for Retrieval-based Anomaly Detection
Dim. Red. Dimensionality Reduction Method
DINO Self-DIstillation with NO labels
DISSIM Dissimilarity metric
DNN Deep neural network
DTW Dynamic Time Warping
DUBCNs Deep Unsupervised Binary Coding Networks
DV2 DINO V2
DWT Discrete Wavelet Transform
ECG Electrocardiogram
EDR Edit Distance on Real sequence
EEG Electroencephalogram
EffNV2 EfficientNet V2
EO-BoW Entropy-optimized BoW
ERP Edit Distance with Real Penalty
ESN Echo state network
FCN Fully Convolutional Network
FF Fusion Features
FRF Functional Relation Field
FrOKShape Fractional Order Shape-based k-cluster
GAN Generative Adversarial Networks
GAF Gramian Angular Field
GRU Gated Recurrent Units
GRF Gaussian Random Fields
GVR Graph Visual Rhythm
HB Heartbeat
HC Hierarchical clustering
HDB HDBSCAN
HDBSCAN Hierarchical Density-Based Spatial Clustering of Applications with Noise
HP Hodrick-Prescott
IPLA Indexable Piecewise Linear Approximation
IR Information Retrieval
KL Kullback-Leibler
kNN k-Nearest Neighbors
LAS Local Activity Spectrum
LBP Local Binary Pattern
LCSS Longest Common Subsequence
LHRR Log-based Hypergraph of Ranking References
LSH Locality-sensitive hashing
LSTM Long Short-Term Memory
MALSTM-FCN Multivariate Attention Long Short-Term Memory Fully Convolutional Network
mAP Mean Average Precision
MCFGNN Multichannel fusion graph neural network
MDCNet Mode Decomposition and 2D Convolutional Network
MESAMN Multiscale Echo Self-Attention Memory Network
MF-Net Multi-feature-based network
MHAP Multivariate Highly Activated Period
MI MotorImagery
MLP Multilayer Perceptron
MTF Markov Transition Field
MTRL Deep multi-task representation learning method
MTS Multivariate Time Series
MTSC Multivariate Time Series Classification
N NATOPS
Net Network
NILM Non-Intrusive Load Monitoring
NN Neural Network
NSNP Nonlinear spiking neural P systems
NSST Non-subsampled shearlet transform
NTM Neural Turing Machines
P@k Precision@k
PAA Piecewise Aggregate Approximation
PCA Principal Component Analysis
PCF Polynomial curve fitting
PD PenDigits
PFD_DTW Polynomial function derivative features-based dynamic time warping
PSEUDo Interactive Pattern Search in Multivariate Time Series with Locality-Sensitive Hashing and Relevance Feedback
R@k Recall@k
RDPAC Rank Diffusion Process with Assured Convergence
RegEx Regular Expression
ResNet Residual Network
RFE Rank Flow Embedding
RMSE Root mean squared error
RN152 CNN ResNet-152
RobDecomp Robust series decomposition block
RobTF Robust trend forecasting block
ROCKET RandOm Convolutional KErnel Transform
RP Recurrence Plot
SAB Seasonal component adjustment block
SampEn Sample entropy
SAX Symbolic Aggregate approXimation
SCARNet Stacked Convolution sequence AutoRegressive encoding Network
SCV Spectral change vector
Sil Silhouette Score
SOTA State-of-the-art
SpADe Spatial Assembling Distance
SSL Semi-supervised learning
Swale Sequence Weighted Alignment model
SWJ StandWalkJump
TF-IDF Term Frequency - Inverse Document Frequency
TQuEST Threshold Queries
T-encoder Transformer-based encoder
T-decoder Transformer-based decoder
TS Time Series
TCCT Tightly-coupled convolutional Transformer
t-SNE t-distributed Stochastic Neighbor Embedding
UDLF Unsupervised Distance Learning Framework
UMAP Uniform Manifold Approximation and Projection
UTBCN Unsupervised Transformer-based Binary Coding Networks
VGG Visual Geometry Group
ViT Vision Transformers

List of Symbols

D(x,y) Pairwise distance measure between two vectors x and y
i Generic counter
j Generic counter
k Generic neighborhood parameter
kc Number of clusters
cc Cluster center
T Univariate time series
M Multivariate time series
R^{m×n} Distance matrix of two time series of sizes m and n, respectively
d Number of dimensions
kr Number of convolutional kernels in the ROCKET algorithm
I Unit line vector
K Neighborhood parameter
P Set of points in a contour
α Angle
kb Beam angle statistics distance parameter
n Number of points or dimensions
xs Line segment
ys Line segment
T̃ Polar coordinate representation of time series
W Markov Transition Matrix
qbj Quantile bin
wi,j Represents the frequency at which a point in quantile qbj is followed by a point in quantile qbi
T⃗ Trajectory of time series
drp Temporal delay
mrp Trajectory dimension
corr_preds Correct predictions in a labeled dataset
preds Total number of predictions
U Cluster
V Cluster
H(U) Entropy of cluster U
MI(U, V) Mutual Information between clusters U and V
E(MI(U, V)) Expected mutual information
TP True positives, i.e., number of relevant retrieved items in a ranked list
ql Query l
ri Represents the relevance (irrelevant = 0 or relevant = 1) of the i-th item in a ranked list
Nr Number of relevant items in a collection for a query q
z Mean standard deviation of a dataset
n_coefs Number of Fourier coefficients to keep in DFT [58]
norm_mean If True, center each time series before scaling in DFT [58]
norm_std If True, scale each time series to unit variance in DFT [58]
wavelet Wavelet family, containing important information such as the wavelet filter coefficients, which are used in DWT
mode Signal extrapolation mode
L Ranked list size
LMULT Multiplier of ranked list size in the RDPAC algorithm
ti Timestamp
G Graph
V Set of nodes in a graph
s Node in a graph
t Node in a graph
v Node in a graph
d(s, t) Shortest path from node s to node t
a Average path length
cb(v) Betweenness centrality of node v
cbn(v) Normalized betweenness centrality
σ(s, t) Count of shortest paths from s to t
σ(s, t|v) Count of shortest paths from s to t passing through v
e(v) Eccentricity of node v
E(v) Global Efficiency of node v
deg(v) Degree of node v
Eloc(v) Local Efficiency of node v
V(v) Vulnerability of node v
G Temporal Graph
F Graph Visual Rhythm
mic Mean intra-cluster distance
mnc Mean nearest-cluster distance for each sample
Si Average dissimilarity of cluster i to all other clusters
Sj Average dissimilarity of the neighboring cluster of cluster i to all other clusters
d(Ci, Cj) Similarity between centroids of clusters i and j
DW Dispersion within clusters
DB Dispersion between clusters
Lt Lower triangular matrix of a symmetric distance matrix
||Lt(tk)||F Frobenius norm of Lt for a timestamp tk
lij Point in the Lt matrix
m Embedding dimension
r Tolerance
Tm(i) Time series segments
B Number of cases where D[Tm(i), Tm(j)] (i ≠ j) < r
C Number of cases where D[Tm+1(i), Tm+1(j)] (i ≠ j) < r
p Probability of a result being equal or superior to an observed result
C Collection of time series
Dt Time series descriptor
ϵ Function that extracts a feature vector
vTi Feature vector of a time series Ti
ρ(i, j) Function that computes the distance between two time series Ti and Tj based on their corresponding feature vectors
A Matrix of dimension n × n, where Aij = ρ(i, j)
τq Ranked list of time series Tq
R Set of ranked lists for each time series in C
ρr(i, j, R) More effective distance function used in unsupervised distance learning algorithms
Aa Set of distance matrices
Ra Set of ranked list sets
fa(Aa, Ra) Contextual rank aggregation function
Âc Resulting distance matrix of fa(Aa, Ra)
Rc Resulting ranked list based on Âc
fa(Ra) Contextual rank aggregation function that does not use the set Aa
σ Parameter of Gaussian Similarity
wgi,j Gaussian Similarity
Hf Cost function
Le Number of labeled elements
ψi,s Represents the probability of the i-th node of a graph belonging to the s-th class
t Iteration marker
e Upper bound for the difference between consecutive iterations
ŷi Unknown label of the i-th node of a graph
D Dataset
rl Percentage of labeled elements

Contents

1 INTRODUCTION
1.1 Motivation
1.2 Research Challenges
1.3 Goals and Contributions
1.4 Organization
2 FOUNDATION
2.1 Time Series
2.2 Information Retrieval
2.3 Information Representation
2.4 Machine Learning
2.5 Time Series Representation
2.5.1 1D-Signal-based Feature Extractors
2.5.2 Image-Based Time Series Description
2.5.3 Deep Learning Methods as Image Feature Extractors
2.6 Unsupervised Distance Learning
2.6.1 Log-based Hypergraph of Ranking References (LHRR)
2.6.2 Rank Diffusion Process with Assured Convergence (RDPAC)
2.6.3 Breadth-First Search (BFS) Tree of Ranking References
2.6.4 Rank Flow Embedding (RFE)
2.7 Univariate Time Series Datasets
2.8 Evaluation Measures
2.9 Software and Frameworks
3 RELATED WORK
3.1 Univariate Time Series
3.1.1 Information Retrieval
3.1.2 Clustering
3.1.3 Classification
3.1.4 Univariate Time Series Prediction
3.2 Multivariate Time Series
3.2.1 Information Retrieval
3.2.2 Clustering
3.2.3 Classification
3.2.4 Multivariate Time Series Prediction
3.3 Image Based Approaches
3.3.1 Images to Time Series
3.3.2 Time Series to Images
3.4 Literature GAPs
4 RE-RANKING AND REPRESENTATIONS FOR TIME SERIES RETRIEVAL: A COMPARATIVE STUDY
4.1 Proposed Methodology
4.2 Experimental Evaluation
4.2.1 Implementation Details and Parameters Definition
4.2.2 Baselines
4.3 Results
4.3.1 Retrieval Results – Quantitative Analysis
4.3.2 Retrieval Results – Qualitative Analysis
4.3.3 Retrieval Results – Analysis of Cases of Success and Failure
4.4 Final Remarks
5 A RANKED-BASED FRAMEWORK BASED ON MANIFOLD LEARNING FOR MULTIVARIATE TIME SERIES RETRIEVAL AND CLASSIFICATION
5.1 Proposed Method
5.2 Experimental Evaluation
5.2.1 Datasets
5.2.2 Experimental Protocol
5.2.3 Impact of Parameters
5.2.4 Retrieval Results
5.2.5 Classification and Comparison with Other Approaches
5.3 Final Remarks
6 FRAMEWORK FOR DATA ANALYSIS BY TEMPORAL GRAPH ENCODING
6.1 Materials and Methods
6.1.1 Temporal Graph Creation
6.1.2 Graph Measurements
6.1.3 Graph Visual Rhythms (GVR)
6.1.4 Image Feature Extraction
6.1.5 Dimensionality Reduction
6.1.6 Clustering
6.2 Experimental Protocol
6.2.1 Clustering Quality Measurements
6.2.2 Domain Specific Indicators
6.2.3 Dataset
6.3 Results
6.3.1 Quantitative Results
6.3.2 Qualitative Results
6.4 Final Remarks
7 SEMI-SUPERVISED TIME SERIES CLASSIFICATION THROUGH IMAGE REPRESENTATIONS
7.1 Overview
7.2 Semi-Supervised Classification
7.3 Evaluation
7.3.1 Experimental Protocol
7.4 Results and discussion
7.4.1 Classification results
7.4.2 Statistical analysis of pairwise distances and similarities
7.5 Final Remarks
8 CONCLUSIONS
8.1 Contributions and Research Questions
8.2 Publications and International Fellowship
8.3 Future Works
Bibliography

1 Introduction

Analyzing time series is not only of paramount importance nowadays, but also a challenging task. It depends on available labels, time series representation, similarity measures, and results' interpretability. This work proposes four different approaches in these directions, discussing and presenting contributions to the field of time series analysis through machine learning and information retrieval techniques. This introductory chapter gives an overview of this work and is organized as follows: Section 1.1 discusses the motivations of the research. Section 1.2 presents the challenges and research questions approached during the development of the work. Section 1.3 discusses the goals and contributions of the study. Section 1.4 outlines the structure of this work, including a brief description of each chapter's content.

1.1 Motivation

Time series data finds extensive applications in our daily lives, spanning diverse domains such as medicine, economics, and science. ECGs [200] and EEGs [169], yearly sales [24], sensors [180], weather forecasts [73], bitcoin charts [130], and general finances [223] are just a few examples of how time series data is employed. Formally, a time series refers to a set of numerical values arranged chronologically, with each point directly correlated to preceding points [149]. Considering the broad application in different domains, the development of tools for studying and analyzing time series is of central relevance. Additionally, various facets can be taken into account based on the desired objectives. From a computational perspective, supervised and unsupervised learning algorithms process time series data, utilizing either labeled or unlabeled information, while semi-supervised learning tasks combine both approaches [197]. Semi-supervised learning algorithms stand out when labeled data is scarce, considering that obtaining labels can be an expensive and time-consuming task. Classification and clustering algorithms, for instance, enable the labeling or grouping, respectively, of time series samples, where series with the same label (or group) exhibit similarity, while those with different labels (groups) are distantly related [83, 13]. Such techniques have proven to be effective in detecting cancer in early stages [228], identifying mobility patterns in populations [225], and enhancing poultry welfare through the analysis of chicken behaviors captured by 3-axis sensors [4]. Forecasting tasks involve predicting future values of time series based on historical data, encompassing applications like stock market variations [223] and even ECG predictions [157].
Despite the large number of successful applications, the effectiveness of machine learning tasks is directly dependent on the features extracted from the data. The features are intrinsically associated with the data representation in the space, impacting similarity measures. Time series features consider various aspects for data representation, including shape [10], temporal correlation [204], as well as bag-of-words based models [111] and kernel-based models [45]. Data representation and similarity measures also profoundly impact the effectiveness of another critical area: Information Retrieval (IR). IR applications generally involve managing, indexing, and retrieving data based on a given query, which represents an information need [120]. The retrieval process revolves around ranking the most similar elements relative to a given query. Therefore, data representation and similarity measures directly influence the effectiveness of information retrieval tasks, as different features can generate varying similarities for the same query, and different similarity measures can result in distinct ranked lists. Moreover, traditional measures such as the Euclidean distance consider only pairwise comparisons, disregarding neighborhood interactions, which could be leveraged to compute improved similarity measures [193].

Information Retrieval has diverse applications in the realm of time series analysis. For instance, it can be employed in plant phenology studies to identify the same species in different locations based on time series obtained from satellite images [7]. Similar techniques can be utilized to study the dispersion of volcanic plume height, which poses risks to airplanes [135]. Additionally, selecting a slice of a time series as a query enables the identification of similar slices, as exemplified in the case of distinguishing open and closed eyes based on EEG signals [227].

Building information retrieval methods capable of reaching high-quality results is a challenging task. One option for considering the existing relationships within a dataset beyond traditional pairwise similarities is to apply unsupervised distance learning algorithms after the initial retrieval process. These algorithms can consider the relationships within a neighborhood and compute more effective distances based on such information, enabling the generation of improved ranked lists [193]. Machine Learning and Information Retrieval algorithms are crucial tools for time series analysis, especially when strategies from both areas are combined. Our objectives are to explore the applicability of these methodologies in diverse contexts of time series analysis. Additionally, we aim to assess the impact of different time series representations and contextual similarity measures on these tasks.

1.2 Research Challenges

While time series have a wide range of applicability, analyzing them is very challenging. There are diverse research challenges related to exploiting the information encoded in time series and how to analyze such data properly. In the following, some topics related to this work are discussed:

• Information encoded in the time series: There are diverse characteristics intrinsic to time series, such as shapes, frequency, trend, and seasonality, among others. How can we extract meaningful features for a required task? Which features are more suitable for a machine learning or information retrieval task?
• Impact of distance measurements: Diverse tasks depend on a similarity measure that allows different time series to be compared. How do we find distance measurements that effectively translate the relationships between the data points?

• Explore 2D representations of time series: There are diverse methods to represent time series as images [204, 53]. How can the information contained in the images be exploited?

• Explore contextual information for effective time series information retrieval: How can we explore more global relationships between time series to perform information retrieval?

• Exploit information of each dimension of a multivariate time series: Multivariate time series are composed of n-dimensional data points, and each time series dimension carries its own information. How can we effectively exploit the information encoded in the dimensions of a multivariate series?

• Temporal data analysis with lack of ground truth: Some time series datasets do not have labels, and clustering analysis emerges in these scenarios. How do we properly interpret the clustering results?

• Effective use of labeled and non-labeled time series data: How can we properly exploit supervised and unsupervised information in time series data?

1.3 Goals and Contributions

Our hypothesis is that information retrieval and machine learning techniques can be explored for effective time series analysis, especially under conditions of scarcity or absence of labeled data. Given the direct time series applications in real-world scenarios, our main goal is to obtain effective results on time series analysis by applying unsupervised, supervised, and semi-supervised learning approaches for time series clustering, classification, and similarity search. Our secondary goal is to study the impact of different time series feature extractors in the performed tasks.

Efficiency is also important in machine learning and information retrieval tasks, but the experimental evaluation performed in this work focused only on effectiveness. Works such as [146, 69, 145, 195] discuss algorithmic complexity and efficiency in re-ranking tasks. Figure 1.1 presents the goals and contributions and how they relate to the proposed approaches, considering different supervision strategies.
Figure 1.1 – Overview of goals and contributions of this work.

The proposed approaches directly relate to the research challenges discussed before. In the following, an overview of each goal and contribution is given:

• Investigate time series feature extractors: Diverse feature extractors were applied to time series for different tasks, i.e., classification, clustering, and information retrieval. Both 1D and imaging methods were evaluated. Features were extracted from the images using deep learning approaches.

• Employ contextual rank-based distance learning methods for univariate time series retrieval: Pairwise distance measurements often ignore contextual information available in the dataset. Four unsupervised distance learning methods were applied to time series datasets to compute more global distance measurements between points in a dataset, by considering neighborhood relationships, and to perform re-ranking.

• Employ contextual rank aggregation methods for multivariate time series analysis: To fully explore the information present in the dimensions of multivariate time series, we process each dimension individually and use rank aggregation methods to obtain a final result for multivariate time series retrieval.

• Use unsupervised distance learning methods as a pre-processing step for classification: This topic is directly correlated to the previous one. The resulting similarity measure defined by the rank aggregation step is considered for performing a kNN classification in a multivariate time series dataset.

• Proposal of an unsupervised framework for data analysis using temporal graph encoding: We propose a novel clustering-based framework based on time segmentation criteria, encoding information as multivariate time series and using domain-specific variables to evaluate and enhance the interpretability of results. This framework was validated for ball possession analysis in football matches.

• Investigate the application of semi-supervised approaches for time series classification: To explore the information available in both labeled and unlabeled time series datasets, we employ the label propagation algorithm for time series semi-supervised classification.

1.4 Organization

This work is organized according to the main contributions obtained in our research, which were published or submitted to conferences and journals. Below, the contents of each chapter are briefly described, indicating the associated publication or submission:

• Chapter 2 - Foundation: defines the theoretical foundation of the work, presenting the main concepts and formal definitions;

• Chapter 3 - Related Work: presents the related work relevant to the topics approached in this work;

• Chapter 4 - Re-Ranking and Representations for Time Series Retrieval: A Comparative Study: presents the comparative study of univariate time series retrieval approaches, considering different data representations and contextual re-ranking methods.
The content of this chapter was submitted to a journal;

• Chapter 5 - A Ranked-Based Framework based on Manifold Learning for Multivariate Time Series Retrieval and Classification: discusses a framework for multivariate time series retrieval and classification. The content of this chapter was submitted to a journal;

• Chapter 6 - Framework for Data Analysis by Temporal Graph Encoding: presents a framework for temporal data analysis by considering time segmentation criteria. Experiments were conducted considering ball possessions from football matches. The content of this chapter is being finalized and will be submitted to a journal;

• Chapter 7 - Semi-Supervised Time Series Classification through Image Representations: presents the work developed on semi-supervised time series classification using image representations. The content of this chapter is available in a paper published in the proceedings of the International Conference on Computational Science and Its Applications (ICCSA 2023) [163];

• Chapter 8 - Conclusions: presents the relationships between the research questions and contributions and discusses future works.

Figure 1.2 presents the organization of the work.

Figure 1.2 – Organization of the work.

2 Foundation

In this chapter, the main theoretical aspects of this work are defined. Section 2.1 presents an overview of time series. Section 2.2 presents the principal concepts related to information retrieval. Section 2.3 describes the information representation concepts, while Section 2.4 presents the main concepts about machine learning. Section 2.5 presents examples of time series descriptors. In Section 2.6, the core concepts of Unsupervised Distance Learning are introduced. Section 2.7 describes some univariate time series datasets, Section 2.8 defines the primary evaluation measures for information retrieval and classification tasks, and Section 2.9 describes the leading software and frameworks utilized in this work.
2.1 Time Series

Time series are data that describe the evolution of one or more variables over time, considering equidistant periods [55]. Formally, a time series of size n can be defined as:

$T_s = (t_1, t_2, \ldots, t_n), \quad t_i \in \mathbb{R}^d, \; d \in \mathbb{N}^*$ (2.1)

Time series composed of a single observed variable over time, i.e., d = 1, are called univariate [55]; an example to consider is a patient's heartbeat. Figure 2.1(a) illustrates a univariate time series. Similarly, multivariate time series are composed of variables that exhibit temporal dependence [26], i.e., d > 1, such as data on gas concentrations in the atmosphere measured every hour. Figure 2.1(b) illustrates a multivariate time series. Henceforth, we denote a univariate time series as T and a multivariate time series as M.

Figure 2.1 – Univariate and Multivariate Time Series. (a) Univariate Time Series; (b) Multivariate Time Series, extracted from [178].

Considering the definitions, time series are complex structures capable of encoding a wealth of information. As a result, time series have a wide range of applicability, including the medical field [200], industrial sector [57], and financial domain [130], among others. Therefore, analyzing and representing these structures are tasks of paramount importance and quite challenging.

Also, aligning two time series is a difficult task. The alignment provided by traditional distance measures, such as the Euclidean Distance, usually is not the optimum alignment. In this context, the Dynamic Time Warping (DTW) measure calculates the optimum non-linear alignment between two time series of any size [175]. The value returned by the DTW function represents the distance between two time series under this optimal alignment. Formally, let $T_x = (t_{x_1}, t_{x_2}, \ldots, t_{x_m})$ and $T_y = (t_{y_1}, t_{y_2}, \ldots, t_{y_n})$ be two time series and $R \in \mathbb{R}^{m \times n}$ a matrix containing the distances $D(T_x, T_y)_{i,j}$ between the sequences $T_x$ and $T_y$, where $D$ is usually the Euclidean distance. The objective is to find two index sequences $a$ and $b$, both of size $l \geq \max(m, n)$, in which $a(i)$ indexes the time series $T_x$ and $b(i)$ indexes the time series $T_y$, such that the total cost of the combinations $\sum_{i=1}^{l} D(T_x, T_y)_{a(i), b(i)}$ is minimized. The alignment $(a, b)$ must satisfy the following conditions:

1. $a(1) = b(1) = 1$;
2. $a(l) = m$, $b(l) = n$;
3. $(a(i+1), b(i+1)) - (a(i), b(i)) \in \{(1, 0), (1, 1), (0, 1)\}$.

In Figure 2.2, it is possible to observe the comparison between the Euclidean Distance's linear alignment and the DTW algorithm's non-linear alignment.

Figure 2.2 – Comparison between the Euclidean Distance alignment and DTW alignment. Extracted from [36].
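To make the recurrence concrete, the following is a minimal, illustrative Python sketch of the classic DTW dynamic-programming formulation (the function name dtw_distance and the toy series are hypothetical, not code from this work; optimized implementations are available in libraries such as tslearn):

```python
import numpy as np

def dtw_distance(tx, ty):
    """Cost of the optimal non-linear alignment between two univariate series."""
    m, n = len(tx), len(ty)
    # Accumulated-cost matrix, padded with infinity for the boundary conditions.
    cost = np.full((m + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = abs(tx[i - 1] - ty[j - 1])  # local (Euclidean) distance
            # Allowed steps (1,0), (1,1), (0,1) from the alignment conditions.
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i - 1, j - 1],
                                 cost[i, j - 1])
    return cost[m, n]

# Toy example: a small temporal shift still yields a low DTW cost.
print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]))
```

Note that this O(mn) sketch returns only the alignment cost; recovering the warping path itself requires backtracking through the cost matrix.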
2.2 Information Retrieval

Information retrieval is an area of computer science that deals with the representation, storage, organization, and access of items from diverse datasets, such as documents and web pages, for example [158]. An information retrieval problem involves a search task defined according to pre-selected criteria, which should return relevant results, where the organization and representation of the selected items facilitate user access.

The significant growth in data generation, such as audio, images, 3D models, and other objects, makes searching a challenging problem [201]. Unlike text-based search, which has mature results [158, 119], conducting textual searches for multimedia data can be unsatisfactory, as textual descriptions may not capture all the characteristics of the given data [201] and may potentially introduce ambiguities.

In this context, Content-Based Retrieval (CBR) systems enable comparisons between similar data based solely on their content [201]. The data is computationally represented as feature vectors, considering approaches such as texture, color, and shape for images [107], and Zero Crossing Rate and Short-Time Energy, for example, for audio [124], among other features for different types of media. In this way, extracting feature vectors for an entire dataset and enabling comparisons with any given query is possible. For each query, a result list is calculated containing the distances, in ascending order, between the elements of the dataset and the query [107].

Figure 2.3 illustrates a typical architecture of a general CBR model for time series applications, adapted from [191]. This architecture consists of (i) an interface, (ii) a query-processing module, and (iii) a dataset. In the interface, user interaction occurs, where data is inputted and query results are returned. Data characteristics are extracted in the query-processing module, and ranking tasks are performed based on the obtained features. Lastly, the data and their corresponding feature vectors are stored in the dataset to avoid recalculations.

Figure 2.3 – Typical architecture of a content-based time series retrieval system. Inspired by the content-based image retrieval system presented in [191].

The distances between the obtained feature vectors can be calculated in different ways. Let $\mathbf{x} = (x_1, x_2, \ldots, x_d)$, $x_i \in \mathbb{R}$, and $\mathbf{y} = (y_1, y_2, \ldots, y_d)$, $y_i \in \mathbb{R}$, where $d$ represents the number of dimensions, be two feature vectors that represent time series. The following distance measures can be used:

• Euclidean: $D(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$;
• Chebyshev: $D(\mathbf{x}, \mathbf{y}) = \max_{i=1}^{d} |x_i - y_i|$;
• Manhattan: $D(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{d} |x_i - y_i|$;
• Minkowski: $D(\mathbf{x}, \mathbf{y}) = \left[ \sum_{i=1}^{d} |x_i - y_i|^p \right]^{1/p}$;
• Canberra: $D(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{d} \frac{|x_i - y_i|}{|x_i| + |y_i|}$;
• Cosine: $D(\mathbf{x}, \mathbf{y}) = 1 - \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \, \|\mathbf{y}\|}$.

In general, the Euclidean distance is widely used to calculate distances between elements [139]. However, despite the importance of distance measures, the quality of the retrieved information is primarily influenced by the choice of the used representations (features).
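For illustration, the sketch below is a direct NumPy transcription of these measures (the function names and example vectors are hypothetical; in practice, implementations such as those in scipy.spatial.distance can be used instead):

```python
import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def chebyshev(x, y):
    return np.max(np.abs(x - y))

def manhattan(x, y):
    return np.sum(np.abs(x - y))

def minkowski(x, y, p=3.0):
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def canberra(x, y):
    # Terms where |x_i| + |y_i| == 0 are undefined and must be handled in practice.
    return np.sum(np.abs(x - y) / (np.abs(x) + np.abs(y)))

def cosine(x, y):
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.0, 1.0])
print(euclidean(x, y), manhattan(x, y), cosine(x, y))
```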
2.3 Information Representation

Describing a specific type of data is a crucial step in machine learning tasks, as the representation of the dataset directly influences the effectiveness of the performed tasks [132]. D-dimensional feature vectors represent each element in a dataset. Methods used for data description depend on the type of information encoded in the data.

Various algorithms can be considered for feature extraction in the context of images. Shape extractors, such as Beam Angle Statistics [9], Contour Features Descriptor (CFD) [141], and Aspect Shape Context [113], allow for the study of different contours present in the image. Color-based methods, such as Color Autocorrelogram (ACC) [76] and Border/Interior Classification (BIC) [182], take into account color histograms of the images. Texture-based methods consider elements such as bricks or velvet, which produce variations in surface appearance. Examples include Local Binary Patterns (LBP) [133] and Local Activity Spectrum (LAS) [189]. Other methods, such as Bag of Visual Words (BOVW) [47] and deep learning-based models [89], can be mentioned in the scope of image feature extraction. In the case of text, representations such as Bag-of-Words, TF-IDF [158, 119], and, again, deep learning-based approaches [110] are some examples found in the literature. Time series representation will be approached in Section 2.5.

Given the presented overview, it is evident that various characteristics can be explored in multimedia datasets. On the other hand, different feature extractors can be used, leading to diverse results in machine learning tasks. Figure 2.4 illustrates extracting features from multimedia content and calculating the distance measure between the obtained features, resulting in a ranking score. The chosen representation for a given data directly influences the calculation of distance measures and, consequently, the ranking of similarities among items in the dataset. Therefore, it is crucial to select features carefully, considering the dataset's characteristics and the executed tasks.

Figure 2.4 – Feature extraction and distance calculation between two time series.

2.4 Machine Learning

Machine learning is an area of artificial intelligence (AI) that involves learning algorithms capable of performing tasks or predictions without being explicitly programmed, considering only the characteristics of a provided dataset and finding patterns and information relevant to a specific task [98], e.g., pattern recognition [224], sales prediction [211], and simultaneous machine translation [77], among others. There are diverse categories of machine learning algorithms, and their choice depends on the problem to be solved and data availability. This work explored Supervised, Unsupervised, and Semi-Supervised Learning, which are explained below.

• Supervised Learning

Supervised Learning algorithms are trained on labeled data, i.e., data with class information, and make decisions on unlabeled, unknown data [98]. The patterns in labeled data are considered to build a model capable of making predictions about the unknown data. The main supervised learning tasks are classification, in which the main goal is predicting the class of the elements, such as in image recognition [72], and regression, where the objective is to predict continuous values, such as in sales forecasting [54]. Examples of supervised algorithms are Linear Regression [62], Support Vector Machines [20], Convolutional Neural Networks [72], and Decision Trees [185], among others. Despite their popularity and highly accurate results, supervised tasks require large amounts of quality labeled data and long training times, making their execution impracticable in many situations.

• Unsupervised Learning

Unlike supervised algorithms, unsupervised learning tasks do not require labels, information about the data, or user intervention, using only intrinsic data characteristics, such as similarities between the objects [97], to perform the task. Examples of unsupervised learning tasks include dimensionality reduction, which simplifies data while keeping its characteristics, and clustering, where data is grouped based on the similarities between points [97], among others. Examples of unsupervised learning algorithms are Principal Component Analysis (PCA) [90], for dimensionality reduction, and K-Means [116], for clustering, as shown in the sketch below. In information retrieval tasks, there are unsupervised distance learning algorithms that explore intrinsic characteristics and contextual information of datasets to find more global similarity measures, in contrast to traditional pairwise measurements [214]. Unsupervised learning is an excellent tool for unlabeled data but lacks a ground truth for performance evaluation.
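As a hedged illustration of such an unsupervised pipeline, the sketch below assumes scikit-learn and a hypothetical random feature matrix: feature vectors are first reduced with PCA and then grouped with K-Means.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.random((100, 32))  # hypothetical: 100 series described by 32-d feature vectors

X_red = PCA(n_components=2).fit_transform(X)                 # dimensionality reduction
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X_red)  # clustering
print(labels[:10])  # cluster assignment of the first 10 elements
```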
2.4 Machine Learning

Machine learning is an area of artificial intelligence (AI) that involves learning algorithms capable of performing tasks or making predictions without being explicitly programmed, considering only the characteristics of a provided dataset and finding patterns and relevant information for a specific task [98], e.g., pattern recognition [224], sales prediction [211], and simultaneous machine translation [77], among others. There are diverse categories of machine learning algorithms, and their choice depends on the problem to be solved and on data availability. This work explored Supervised, Unsupervised, and Semi-Supervised Learning, which are explained below.

• Supervised Learning

Supervised Learning algorithms are trained on labeled data, i.e., data with class information, and make decisions about unlabeled data, i.e., unknown data [98]. The patterns in the labeled data are considered to build a model capable of making predictions about the unknown data. The main supervised learning tasks are classification, in which the main goal is predicting the class of the elements, such as in image recognition [72], and regression, where the objective is to predict continuous values, such as in sales forecasting [54]. Examples of supervised algorithms are Linear Regression [62], Support Vector Machines [20], Convolutional Neural Networks [72], and Decision Trees [185], among others. Despite their popularity and highly accurate results, supervised tasks require large amounts of quality labeled data and long training times, making their execution impracticable in many situations.

• Unsupervised Learning

Unlike supervised algorithms, unsupervised learning tasks do not require labels, prior information about the data, or user intervention, using only intrinsic data characteristics, such as the similarities between objects [97], to perform the task. Examples of unsupervised learning tasks are dimensionality reduction, which simplifies data while keeping its main characteristics, and clustering, where data is grouped based on the similarities between points [97], among others. Examples of unsupervised learning algorithms are Principal Component Analysis (PCA) [90], for dimensionality reduction, and K-Means [116], for clustering. There are also unsupervised distance learning algorithms for information retrieval tasks, which explore intrinsic characteristics and contextual information of datasets to find more global similarity measures, in contrast to traditional pairwise measurements [214]. Unsupervised learning is an excellent tool for unlabeled data but lacks a ground truth for performance evaluation.

• Semi-supervised Learning

Semi-supervised learning combines concepts of both Supervised and Unsupervised Learning [29, 229], utilizing the information present in labeled and unlabeled data. An example is semi-supervised classification, where the classifier is trained on a small sample of labeled data, and the information contained in the unlabeled data is also considered for performing the task. The main advantage of semi-supervised methods is the capability to perform well under a scarcity of labeled data. Semi-supervised methods are divided into transductive and inductive methods. Transductive algorithms do not produce a model, considering only the samples found during training [198] for labeling data. When new data is added, it is necessary to perform the training again to consider the new samples. In contrast, like traditional supervised methods, inductive algorithms build a model during training, which is utilized for classifying new elements [198]. As examples of semi-supervised algorithms, we can cite Label Propagation [230], illustrated in the sketch after the list below, and the Semi-Supervised Support Vector Machine (S3VM) [88].

Diverse assumptions are made about the data distribution [29], considered individually or in groups, depending on the semi-supervised method [198]. The assumptions are:

• Clustering Assumption: Data points belonging to the same cluster must belong to the same class;

• Smoothness Assumption: Samples close to each other in the feature space tend to belong to the same class, and the decision boundary must be placed in regions with low data density;

• Manifold Assumption: Data points must belong to the same class if placed close to each other in a lower-dimensional representation.
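As an illustration of these concepts, the sketch below hides 90% of the labels of a synthetic two-moons dataset and recovers them with scikit-learn's Label Propagation implementation; the kernel and parameter values are illustrative choices of ours:

    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.semi_supervised import LabelPropagation

    # Two-moons dataset with 90% of the labels hidden (marked as -1,
    # the scikit-learn convention for unlabeled points).
    X, y = make_moons(n_samples=200, noise=0.05, random_state=0)
    rng = np.random.RandomState(0)
    y_train = np.copy(y)
    unlabeled = rng.rand(len(y)) < 0.9
    y_train[unlabeled] = -1

    model = LabelPropagation(kernel="knn", n_neighbors=7)
    model.fit(X, y_train)
    # transduction_ holds the labels inferred for every point.
    acc = (model.transduction_[unlabeled] == y[unlabeled]).mean()
    print("accuracy on the unlabeled points:", acc)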
2.5 Time Series Representation

A high-quality data representation is essential for achieving effective results in machine learning tasks. Obtaining suitable feature vectors from time series is also a crucial step in content-based information retrieval tasks, as the arrangement of the ranked lists depends directly on this information. Data representation can consider shapes, frequencies, pattern repetitions, temporal correlations between points, and other characteristics. Several methods for feature extraction in time series are briefly described below. These methods were selected based on the variability of feature extraction approaches for time series.

2.5.1 1D-Signal-based Feature Extractors

Effective machine learning relies on robust data representation, particularly for time series. Feature extraction from time series is pivotal for accurate information retrieval, encompassing various properties, such as shapes, frequencies, pattern repetitions, and temporal correlations.

• Beam Angle Statistics

The Beam Angle Statistics (BAS) algorithm [10, 164] is a shape descriptor that identifies concavities and convexities in contours and extracts related features. The method has demonstrated promise in prior research [215, 192]. Given a parameter $k_b$, a contour described by points $P = (p_1, p_2, \ldots, p_n)$ is traversed. Features are generated based on the angle $\alpha_i$ formed by the intersection of the line segments $x_s$ (connecting points $(i - k_b, p_{i-k_b})$ and $(i, p_i)$) and $y_s$ (connecting points $(i, p_i)$ and $(i + k_b, p_{i+k_b})$), where $i = k_b, k_b + 1, k_b + 2, \ldots, n - k_b$. This yields the representation $BAS = (\alpha_1, \alpha_2, \ldots, \alpha_{n-2k_b})$ for any contour. Figure 2.5 presents an example of the BAS execution.

Figure 2.5 – Lines $x_s$, $y_s$ and angle $\alpha$ for the point $p_i$ ($i = 100$ and $0 < k_b < 50$).

• Discrete Fourier Transform (DFT)

The Discrete Fourier Transform (DFT) transforms a signal while preserving its number of points. It decomposes the time series into basis functions, where the early functions capture trends and the later ones mostly account for noise. Each basis function possesses a complex Fourier coefficient, and the initial coefficients approximate the time series [171]. Figure 2.6 shows an example of a DFT representation.

Figure 2.6 – Time series represented by DFT. Extracted from [151]

• Discrete Wavelet Transform (DWT)

The Discrete Wavelet Transform (DWT) decomposes a time series into diverse frequencies across scales, reducing dimensionality and noise [28]. It yields coarse-grained approximation and fine-grained detail coefficients through specific wavelet functions involving convolution and downsampling. Figure 2.7 shows a DWT representation.

Figure 2.7 – Time series and its DWT representation. Extracted from [123]

• RandOm Convolutional KErnel Transform (ROCKET)

ROCKET [45] is a time series classifier that can also be used for feature extraction. The method has reached impressive results on many time series datasets and is computationally inexpensive. It is based on randomly generated convolutional kernels: $k_r$ kernels are generated, and two features are extracted per kernel, the global max pooling and the proportion of positive values. Then, a linear classifier is trained on the obtained features for the classification task. ROCKET excels in efficiency, with lower computational complexity than alternatives and requiring the evaluation of a single hyperparameter, the number of kernels $k_r$. Figure 2.8 presents the weights of the most representative kernels, selected based on mutual information, for an execution of the algorithm.

Figure 2.8 – Weights of the most representative kernels. Extracted from [152]

The descriptors were selected based on several criteria. DWT and DFT were chosen as representatives of traditional approaches, i.e., well-established methods in the field [27]. ROCKET is a recent algorithm that can be used for time series classification and feature extraction; still, to the best of our knowledge, this work was the first to apply it to information retrieval tasks. The use of the BAS algorithm is based on [192]; BAS represents the family of methods that rely on shape-based characterization.
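To make the DFT-based description concrete, the sketch below keeps only the first coefficients of the transform as a feature vector; the function name and the number of coefficients are illustrative choices of ours:

    import numpy as np

    def dft_features(series, n_coeffs=16):
        """Describe a time series by its first DFT coefficients: the early
        coefficients capture trends, the later ones mostly noise."""
        coeffs = np.fft.rfft(series)[:n_coeffs]
        # Concatenate real and imaginary parts into a real-valued feature vector.
        return np.concatenate([coeffs.real, coeffs.imag])

    t = np.linspace(0, 4 * np.pi, 128)
    series = np.sin(t) + 0.1 * np.random.randn(128)
    print(dft_features(series, n_coeffs=8).shape)   # (16,)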
2.5.2 Image-Based Time Series Description

The representation of time series through images, which are subsequently exploited for feature extraction, is a promising approach [204, 53], as demonstrated in several applications, ranging from phenology [60] to land cover analysis studies [122, 49, 48]. In those methods, time series patterns are encoded in bi-dimensional representations, and pre-trained models (e.g., based on transfer learning procedures relying on Convolutional Neural Networks and Transformer-based models) are used for feature extraction. In the following, we discuss the approaches used for obtaining the bi-dimensional representations and the deep learning approaches employed for feature extraction. Figure 2.9 presents a time series and its respective two-dimensional representations.

Figure 2.9 – Time series (a) and its respective image representations: GAF (b), MTF (c), and RP (d)

• Gramian Angular Fields (GAF) [204]

This method generates an image based on the polar coordinates of a time series $T$. It transforms the normalized series $T$ into $\tilde{T}$, using polar coordinates in $[-1, 1]$. Using $\tilde{T}$, a Gramian Angular Field Matrix ($N \times N$) is formed, as depicted in Equation 2.2:

$GAF = \tilde{T}'\tilde{T} - \sqrt{I - \tilde{T}^2}\,'\sqrt{I - \tilde{T}^2}$ (2.2)

Here, $I$ is a unit row vector. Each point $GAF_{i,j}$ of the Gramian Angular Field Matrix corresponds to the trigonometric sum of the angles between the corresponding points of $\tilde{T}$, representing the temporal correlation between different time intervals in the series.

• Markov Transition Fields (MTF) [204]

This method creates time series images based on Markov transition probabilities. Given a time series $T$, $Q$ quantile bins are identified, and each $t_i$ is assigned to the corresponding bin $qb_j$ ($j \in [1, Q]$); the bins divide the sorted series into equal subsets. A $Q \times Q$ weighted adjacency matrix $W$ is constructed by counting transitions among quantile bins, employing a first-order Markov chain methodology along the temporal axis: $w_{i,j}$ represents the frequency at which a point in quantile $qb_j$ is followed by a point in quantile $qb_i$. After normalization by $\sum_j w_{i,j} = 1$, we have the Markov Transition Matrix, defined as in Equation 2.3:

$W = \begin{bmatrix} w_{11} \,|\, P(t_x \in qb_1 \,|\, t_{x-1} \in qb_1) & \cdots & w_{1Q} \,|\, P(t_x \in qb_1 \,|\, t_{x-1} \in qb_Q) \\ w_{21} \,|\, P(t_x \in qb_2 \,|\, t_{x-1} \in qb_1) & \cdots & w_{2Q} \,|\, P(t_x \in qb_2 \,|\, t_{x-1} \in qb_Q) \\ \vdots & \ddots & \vdots \\ w_{Q1} \,|\, P(t_x \in qb_Q \,|\, t_{x-1} \in qb_1) & \cdots & w_{QQ} \,|\, P(t_x \in qb_Q \,|\, t_{x-1} \in qb_Q) \end{bmatrix}$ (2.3)

$W$ is insensitive to the distribution of the time series $T$ and to the temporal dependency on the time steps, which results in information loss. Next, a Markov Transition Field (MTF) of dimensions $N \times N$ is built as per Equation 2.4:

$MTF = \begin{bmatrix} w_{ij \,|\, t_1 \in qb_i, t_1 \in qb_j} & \cdots & w_{ij \,|\, t_1 \in qb_i, t_n \in qb_j} \\ w_{ij \,|\, t_2 \in qb_i, t_1 \in qb_j} & \cdots & w_{ij \,|\, t_2 \in qb_i, t_n \in qb_j} \\ \vdots & \ddots & \vdots \\ w_{ij \,|\, t_n \in qb_i, t_1 \in qb_j} & \cdots & w_{ij \,|\, t_n \in qb_i, t_n \in qb_j} \end{bmatrix}$ (2.4)

Here, $MTF_{i,j}$ is the probability of transitioning from quantile bin $qb_i$ to $qb_j$. Specifically, $MTF_{i,j}$ with $|i - j| = k$ denotes the probability of transitioning between points separated by a time interval of $k$.

• Recurrence Plots (RP) [53]

A Recurrence Plot (RP) is a binary depiction of a time series, revealing temporal correlations. Given a time series $T$, its trajectory $\vec{T} = (\vec{t}_i, \vec{t}_{i+d_{rp}}, \ldots, \vec{t}_{i+(m_{rp}-1)d_{rp}})$ is obtained for $i \in \{1, 2, \ldots, n - (m_{rp} - 1)d_{rp}\}$, where $d_{rp}$ is the temporal delay and $m_{rp}$ is the trajectory dimension. $R_{i,j}$ becomes 1 if the Euclidean distance between $\vec{t}(i)$ and $\vec{t}(j)$ is at most $k$. For $X = n - (m_{rp} - 1)d_{rp}$, the Recurrence Plot matrix, of dimensions $X \times X$, is computed according to Equation 2.5:

$R_{i,j} = \begin{cases} 1, & \text{if } \|\vec{t}(i) - \vec{t}(j)\| \leq k \\ 0, & \text{otherwise}, \end{cases} \quad \forall i, j \in \{1, 2, \ldots, X\}$ (2.5)
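The three encodings are available in the pyts library (presented in Section 2.9). A minimal sketch, with illustrative parameter values:

    import numpy as np
    from pyts.image import GramianAngularField, MarkovTransitionField, RecurrencePlot

    X = np.random.randn(1, 128)  # one univariate time series of length 128

    gaf = GramianAngularField(image_size=64).fit_transform(X)
    mtf = MarkovTransitionField(image_size=64, n_bins=8).fit_transform(X)
    rp = RecurrencePlot(threshold="point", percentage=20).fit_transform(X)

    # gaf and mtf have shape (1, 64, 64); rp has shape (1, 128, 128).
    print(gaf.shape, mtf.shape, rp.shape)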
2.5.3 Deep Learning Methods as Image Feature Extractors

Images hold diverse information, such as patterns, colors, textures, and shapes. Leveraging pre-trained neural networks via transfer learning has exhibited substantial potential in diverse domains due to robust generalization and feature learning [115]. In this research, we utilized five neural networks for feature extraction, pre-trained on ImageNet [46].

• CNN ResNet-152

ResNet [72] is a series of Convolutional Neural Networks explicitly built for image recognition tasks. The ResNet-152 variant, characterized by a depth of up to 152 layers, effectively addresses the gradient vanishing challenge of deep architectures by leveraging skip connections. These skip connections play a fundamental role in augmenting the flow of gradients during the backpropagation phase, promoting the efficient training of deep networks. The feature extraction output is sourced from the terminal fully connected layer of the network architecture and has 2048 dimensions.

• Vision Transformers

Vision Transformers (ViT) [96] demonstrate particular efficacy in object detection and image classification tasks. Thanks to the self-attention mechanism, ViTs can simultaneously focus on the essential parts of an image. The input image is split into patches, which are subsequently processed by an encoder to produce hidden states. A decoder component generates output tokens through layers of self-attention and feedforward networks; masked self-attention within the decoder selectively focuses on preceding tokens. In the context of feature extraction, we utilize the CLS token, as delineated in the work by Kim et al. [95]. The obtained features contain 1280 dimensions.

• VGG-19

VGG-19 [179] is a Convolutional Neural Network developed by the Visual Geometry Group (VGG). Composed of 19 layers, it features 16 convolutional layers organized into sequential blocks, each followed by a max-pooling layer for dimensionality reduction; the network closes with three fully connected layers. For feature extraction, the last fully connected layer before the classification layer was utilized. The resulting features are 4096-dimensional.

• EfficientNet V2

EfficientNetV2 [187] is a family of convolutional neural networks based on the EfficientNet architecture. Like its predecessor, EfficientNetV2 employs the compound scaling method to balance the network's depth, width, and resolution. By adjusting these aspects, EfficientNetV2 can be resized efficiently to work well on different devices, even those with limited resources. While retaining core components such as depthwise separable convolutions, inverted residual blocks, and squeeze-and-excitation modules, EfficientNetV2 introduces optimizations and improved normalization layers to stabilize and accelerate the training process. Advanced training techniques are also employed, including learning rate schedules, data augmentation strategies, and regularization methods. The Large model variant was selected for feature extraction, utilizing the last average pooling layer as output; the obtained features contain 1280 dimensions.

• DINO (self-DIstillation with NO labels) V2

DINO V2 [134] is a self-supervised vision transformer-based model developed by Meta AI. The model is pre-trained through instance discrimination with no annotations, meaning no labeled data is required, only a large dataset of unlabeled images. Richer representations are obtained, as the model is trained directly on image data. DINO V2 has two main components: an encoder and a discriminative self-supervised learning scheme. The first is responsible for extracting visual features from images. It consists of multiple layers of Transformer blocks, each composed of a self-attention layer and a feed-forward network. This step generates a representation that captures crucial visual features of the image. The second is a Transformer-based discriminator, also consisting of multiple layers of Transformer blocks. The discriminator receives global representations of different augmented versions of the same image, and its task is to differentiate these versions. The discriminator's performance directly impacts the encoder's training: if the discriminator confuses original and augmented versions of an image, the encoder needs to improve its feature extraction, as the features are not fully capturing the image's identity; likewise, if the discriminator struggles to differentiate unrelated images, there may be issues with the training, or the encoder needs to refine its visual understanding. Through this feedback loop, the encoder learns to produce robust features that capture the essence of an image and differentiate between different images and their variations. The raw model was used for feature extraction [74], generating features with 384 dimensions.
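As an illustration of how such features can be obtained, the sketch below extracts 2048-dimensional descriptors with a pre-trained ResNet-152 in PyTorch, removing the classification layer and using the standard ImageNet preprocessing statistics. This is a common extraction recipe (assuming the torchvision 0.13+ weights API), not necessarily the exact extraction point used in this work:

    import torch
    from torchvision import models, transforms

    # ResNet-152 pre-trained on ImageNet, with the final classification layer
    # removed so that the 2048-d output of the global average pooling is exposed.
    resnet = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
    extractor = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                             std=[0.229, 0.224, 0.225]),
    ])

    # img is a PIL image, e.g., a GAF, MTF, or RP encoding of a time series:
    # with torch.no_grad():
    #     features = extractor(preprocess(img).unsqueeze(0)).flatten(1)  # (1, 2048)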
2.6 Unsupervised Distance Learning

The feature vectors associated with dataset samples can vary in spatial distribution within the feature space, potentially leading to severe impacts on machine learning tasks. On the other hand, unsupervised distance learning algorithms can improve retrieval and machine learning effectiveness [146, 193] by enhancing sample representations based on their rearrangement in the feature or distance space. We follow the retrieval models proposed for image search [37] and re-ranking [142, 150].

Formally, let $\mathcal{C} = \{T_1, T_2, \ldots, T_n\}$ be a collection composed of $n$ time series. Let $D_t$ be a time series descriptor, defined by the tuple $(\epsilon, \rho)$, where $\epsilon: T_i \longrightarrow \mathbb{R}^d$ is a function that extracts a feature vector $v_{T_i}$ from a time series $T_i$, and $\rho: \mathbb{R}^d \times \mathbb{R}^d \longrightarrow \mathbb{R}^+$ is a function that computes the distance between two time series $T_i$ and $T_j$ based on their corresponding feature vectors, formally defined as $\rho(\epsilon(T_i), \epsilon(T_j))$, or simply $\rho(i, j)$. The distance $\rho(i, j)$ for each pair $(i, j)$ can be computed to form a square matrix $A$ of dimension $n \times n$, where $A_{ij} = \rho(i, j)$.

Based on the distance $\rho$, a ranked list $\tau_q$ is computed for each time series $T_q$ present in the collection. The ranked list $\tau_q = (T_1, T_2, \ldots, T_n)$ is a permutation of the collection $\mathcal{C}$, where $\tau_q(i)$ is the rank of the time series $T_i$ in $\tau_q$. The rank is defined according to the distance measure, such that if $\tau_q(i) < \tau_q(j)$, then $\rho(q, i) < \rho(q, j)$. The set of ranked lists for all time series in $\mathcal{C}$ is defined as $\mathcal{R} = \{\tau_1, \tau_2, \ldots, \tau_n\}$.

The re-ranking task can be seen as recalculating the distance $\rho$ using a more effective distance function $\rho_r$. The function $\rho_r(i, j, \mathcal{R})$ aims to explore the contextual information present in the set $\mathcal{R}$, analyzing the relationships existing in a $k$-neighborhood of the ranked lists and improving the effectiveness of the distances between time series. Formally, $\rho_r: \mathcal{C} \times \mathcal{C} \times \mathcal{R} \longrightarrow \mathbb{R}^+$ is a distance function between time series $T_i$ and $T_j$ that takes into account the contextual similarity information contained in the set $\mathcal{R}$ [195].

Rank aggregation tasks can also be performed. Let $A_a = \{A_1, A_2, \ldots, A_n\}$ be a set of distance matrices from a dataset and $R_a = \{\mathcal{R}_1, \mathcal{R}_2, \ldots, \mathcal{R}_n\}$ the set of corresponding ranked lists. The objective is to utilize the sets $A_a$ and $R_a$ to obtain a new distance matrix $\hat{A}_c = f_a(A_a, R_a)$. With the new distance matrix $\hat{A}_c$, it is possible to calculate a new set of ranked lists $\mathcal{R}_c$. Some rank aggregation methods do not utilize the set $A_a$, calculating the new ranking $\mathcal{R}_c$ directly, being characterized by $\mathcal{R}_c = f_a(R_a)$.
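In code, the matrix $A$ and the set of ranked lists $\mathcal{R}$ of this formulation can be computed directly; a minimal sketch, with the Euclidean distance playing the role of $\rho$ and a function name of our choosing:

    import numpy as np

    def distance_matrix_and_ranked_lists(features):
        """Compute A (A[i, j] = rho(i, j), Euclidean here) and the set of
        ranked lists R, where row q of R is the ranked list tau_q."""
        A = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
        R = np.argsort(A, axis=1)
        return A, R

    features = np.random.rand(50, 8)   # 50 time series described by 8-d vectors
    A, R = distance_matrix_and_ranked_lists(features)
    print(R[0][:5])                    # top-5 neighbors of T_0 (T_0 itself first)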
Figure 2.10 illustrates the ability of unsupervised distance learning methods to consider the dataset's geometry. Query points, highlighted with triangles, are defined in each moon of the dataset, and the remaining points are assigned colors based on their closest query point. In (a), the Euclidean Distance is used for the distance calculation; it can be observed that the metric could not classify the data correctly, as it does not consider the dataset's structure. In (b), the Reciprocal kNN Graph+CCs method [143] is used to recalculate the distances; it can consider the dataset's geometry and correctly classify the elements of the two moons.

Figure 2.10 – Dataset TwoMoons: the points are colored considering the lowest distance to the query points (triangles), utilizing (a) the Euclidean Distance and (b) an unsupervised distance learning method. Figure taken from [143]

The ability to incorporate the complete dataset structure significantly influences machine learning tasks, particularly in retrieval scenarios. Numerous methods, initially designed for image retrieval, have extended their application to various domains, including audio [41], video [42], and time series [7, 170]. Below, the four recent unsupervised distance learning methods used in this work are described; a simplified sketch of the idea they share follows the last of them.

2.6.1 Log-based Hypergraph of Ranking References (LHRR)

Hypergraphs generalize graphs with hyperedges, which connect multiple vertices and can represent higher-order similarity relationships. The LHRR method builds a hypergraph model from ranking data, assigning hyperedge weights via a log-based function. The hyperedges provide a contextual representation of the dataset, enhancing the effectiveness of similarity measurement through the product of hyperedge similarities [146].

2.6.2 Rank Diffusion Process with Assured Convergence (RDPAC)

RDPAC [69] is an unsupervised distance learning algorithm with low computational complexity that exploits a diffusion process considering only the top-k positions of the ranked lists. The algorithm starts with a similarity measure based on ranking information, and then a normalization is performed to improve the ranking symmetry. The third step is a rank diffusion process with assured convergence that requires a small number of iterations. Finally, a post-diffusion step is performed to explore the reciprocal rank information.

2.6.3 Breadth-First Search (BFS) Tree of Ranking References

This algorithm constructs a tree using breadth-first search on a graph created from the dataset's ranking references [148]. Starting from a query object as the root, the first level includes the top-k most similar objects according to the rankings. Subsequent levels stem from the top-k objects in the ranking references of the first level. The BFS-Tree can represent the similarity information between the query and its k neighbors in a dataset. Objects occurring at different levels of the structure likely exhibit higher similarity, while objects occurring infrequently may indicate non-relevant elements or noise [148].

2.6.4 Rank Flow Embedding (RFE)

RFE is an algorithm proposed for both retrieval and classification tasks, considering contextual information to obtain better results [193]. RFE comprises five steps. The first is a ranked list normalization considering a sigmoid score based on reciprocal ranked list positions. Then, a re-ranking step is performed utilizing a hypergraph structure. The obtained hypergraph is also utilized for computing embeddings. Then, another two re-ranking steps are performed, utilizing the Cartesian Product and Connected Components, which are defined based on the hypergraph structures. For semi-supervised classification tasks, more effective embeddings are computed based on the similarity of each element and its identified Connected Components. Enhanced unsupervised retrieval and semi-supervised classification are achieved through the stepwise improvement of rank-based similarity information.
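These four algorithms do not reduce to a few lines, but their shared principle, replacing pairwise distances with a measure derived from the neighborhood sets of the ranked lists, can be illustrated with a simple reciprocal-kNN similarity. The sketch below is our illustrative simplification, not an implementation of any of the methods above:

    import numpy as np

    def reciprocal_knn_distances(R, k=10):
        """Toy contextual measure: the more top-k neighbors two elements share,
        and the more reciprocal their neighborhood relation is, the smaller
        their new distance. Not one of the algorithms described above."""
        n = R.shape[0]
        topk = [set(R[i, :k]) for i in range(n)]
        new_A = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                shared = len(topk[i] & topk[j])
                reciprocal = (j in topk[i]) + (i in topk[j])
                new_A[i, j] = 1.0 / (1.0 + shared + reciprocal)
        return new_A   # re-sorting its rows yields the re-ranked lists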
2.7 Univariate Time Series Datasets

This Section introduces seven univariate time series datasets sourced from the UCR Archive [39], with various sizes, numbers of classes, time series lengths, and domains. The UCR Time Series Archive is the primary source of univariate and multivariate time series datasets, covering a wide range of domains and sizes. Not all datasets from the UCR Time Series Archive were considered, due to restricted computational resources and time.

• Cylinder-Bell-Funnel (CBF) [167]: A simulated dataset with 930 elements of length 128, divided into 3 classes. Each class consists of standard normal noise plus an offset term that varies for each class.

• ECG5000 [31]: Derived from a 20-hour-long ECG record called the BIDMC Congestive Heart Failure Database (chfdb). It contains 5000 pre-processed heartbeats of length 140, randomly selected from the record "chf07", with five classes.

• Yoga [207]: Composed of 3300 time series of length 426, generated from videos of actors transitioning between yoga poses. Each image was converted into a time series considering the distances between the actor's contour and the image center. The task is to distinguish between male and female actors.

• Electric Devices [12]: Contains 16637 time series of length 96, divided into 7 classes. The data represents daily power consumption measurements of devices from 187 households, including washing machines, ovens, dishwashers, kettles, immersion heaters, cold groups (fridge, freezer), and screen groups (computer, television). The goal is to classify each device based on its daily measurements.

• GunPoint [156]: This dataset is based on the movement of drawing a gun from a holster, aiming at a target, and then returning the gun to the holster. The dataset contains two classes: "Gun-Draw", where actors use a gun replica to perform the movement, and "Point", where they use their index finger to point at the target. The time series are captured by mapping the X-axis coordinates of the centroid of the actor's right hand. It consists of 200 time series, each with a length of 150.

• Beef [5]: 60 samples of silverside beef, equally divided into five classes: one of pure beef, while the other four classes are samples contaminated with 20% w/w of tripe, liver, kidney, and heart, respectively. The meat samples were cooked in a microwave oven for a limited time at different power settings. From both cooked and uncooked samples, spectrograms were obtained under the same conditions, and a chemometric analysis was conducted considering 470 data points. The problem consists of distinguishing each class.

• Rock [14]: Contains 70 rock samples from the ASTER spectral library. Each series corresponds to the spectral reflectance as a function of wavelength in microns, not time. There are 4 types of rocks, with time series of length 2844, and the problem consists of distinguishing them.

This Section described diverse univariate time series datasets, used in two different experiments; such datasets can be loaded as shown in the sketch below. Chapter 5 presents the multivariate time series datasets used in the experiments.
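For instance, GunPoint ships with the pyts library (Section 2.9); assuming a recent pyts version, it can be loaded as follows, and the remaining UCR datasets can be fetched in a similar way:

    from pyts.datasets import load_gunpoint

    # GunPoint is bundled with pyts; other UCR datasets can be retrieved with
    # pyts.datasets.fetch_ucr_dataset, which downloads them from the archive.
    X_train, X_test, y_train, y_test = load_gunpoint(return_X_y=True)
    print(X_train.shape, X_test.shape)   # (50, 150) and (150, 150)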
2.8 Evaluation Measures

The evaluation of the effectiveness of retrieval and machine learning tasks relies on evaluation measures [158, 163], which are discussed in this Section.

• Accuracy

This metric, widely used to evaluate classification tasks, calculates the percentage of correctly labeled elements within a labeled dataset [168]. The multi-class accuracy is defined by Equation 2.6:

$accuracy = \frac{corr\_preds}{preds}$, (2.6)

where $corr\_preds$ stands for the number of correct predictions in a labeled dataset, and $preds$ is the total number of predictions.

• Adjusted Mutual Information (AMI)

AMI is a statistical measure, utilized to evaluate clustering tasks, that quantifies the similarity between two clusterings, considering the Mutual Information (MI) score and correcting for chance agreement [199]. The AMI between two clusterings is given by Equation 2.7:

$AMI(U, V) = \frac{MI(U, V) - E(MI(U, V))}{avg(H(U), H(V)) - E(MI(U, V))}$, (2.7)

where $E(MI(U, V))$ is the expected mutual information and $H(U)$ is the entropy of a clustering $U$.

• Precision@k

Precision is widely used to evaluate retrieval tasks and aims to evaluate the proportion of relevant items among all the retrieved items at a depth $k$. It is defined in Equation 2.8:

$P = \frac{TP@k}{k}$, (2.8)

where $TP@k$ represents the number of relevant retrieved items at depth $k$ of the ranked list.

• Recall@k

Recall is also utilized to evaluate retrieval tasks and measures the proportion of relevant instances retrieved. Recall is defined in Equation 2.9:

$R = \frac{TP@k}{TP}$, (2.9)

where $TP@k$ stands for the number of relevant retrieved items at depth $k$ of the ranked list, and $TP$ is the total number of relevant items in the ranked list.

• Mean Average Precision (mAP)

Mean Average Precision (mAP) is a common metric for evaluating the effectiveness of the ranked lists obtained in information retrieval tasks. This metric calculates the average of the Average Precision (AP) over all queries [231]. The mAP is defined in Equation 2.10:

$MAP = \frac{\sum_{l=1}^{Q} AP(q_l)}{Q}$, (2.10)

where $Q$ is the number of queries and $q_l$ represents a query $l$. The Average Precision (AP), as defined in Equation 2.11, is the average precision for a recall varying from 0 to 100% [231]:

$AP = \frac{1}{N_r} \sum_{i=1}^{d} \left( r_i \, \frac{1}{i} \sum_{j=1}^{i} r_j \right)$, (2.11)

where $N_r$ is the number of relevant items in the collection for a query $q$, and $r_i$ represents the relevance (irrelevant = 0 or relevant = 1) of the $i$-th item in a ranked list of size $d$.
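The retrieval measures above follow directly from their definitions; a minimal sketch, with function names of our choosing:

    import numpy as np

    def precision_at_k(rel, k):
        """Equation 2.8: fraction of relevant items among the first k positions
        (rel is a 0/1 relevance array ordered as the ranked list)."""
        return np.sum(rel[:k]) / k

    def recall_at_k(rel, n_relevant, k):
        """Equation 2.9: fraction of the relevant items retrieved up to depth k."""
        return np.sum(rel[:k]) / n_relevant

    def average_precision(rel, n_relevant):
        """Equation 2.11; averaging it over all queries yields the mAP (2.10)."""
        precisions = np.cumsum(rel) / np.arange(1, len(rel) + 1)
        return np.sum(precisions * rel) / n_relevant

    rel = np.array([1, 0, 1, 1, 0])   # relevance of one ranked list
    print(precision_at_k(rel, 3), recall_at_k(rel, 3, 3), average_precision(rel, 3))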
2.9 Software and Frameworks

This Section briefly presents the main frameworks and software utilized in this work, along with their main uses.

• UDLF [194]: A framework for unsupervised distance learning, implemented in C++ and containing diverse algorithms, including the ones explained in Section 2.6. It performs re-ranking and contextual rank aggregation starting from distance, similarity, or ranking matrices;

• pyts [59]: A Python library designed with tools for time series processing. It contains machine learning algorithms for time series, datasets, preprocessing, and utility tools;

• Matplotlib [79]: A Python library utilized for the generation and manipulation of images and graphics. Besides graphics generation, we also utilize Matplotlib to generate the images described in Section 2.5.2;

• Scikit Learn [140]: A complete machine learning library for Python. It contains machine learning algorithms, evaluation measures, and pairwise distance measures, among other tools;

• NumPy [70]: A fundamental tool in Python for manipulating arrays and matrices;

• TensorFlow [3] and PyTorch [138]: Two Python libraries for deep learning algorithms, mainly utilized to execute the neural networks described in Section 2.5.3.

3 Related Work

Time series data contains diverse encoded information, and effectively analyzing it poses significant challenges. Additionally, the increasing production of time series datasets demands efficient management strategies. One way to work with time series datasets is through information retrieval tasks. In information retrieval problems applied to time series, there are two evaluation approaches, depending on the available data. In problems involving a dataset with multiple time series, a single series is selected as a query, and the other series are ranked in ascending order based on a distance measure, representing their similarity to the query [7]. For instance, in a database containing time series derived from satellite images for studying plant phenology, one might want to retrieve data similar to a given query series to identify the same plant species in different regions [7]. On the other hand, problems with a dataset containing a single time series involve selecting a segment of defined size and ranking the segments closest to the query based on a distance measure [227]. For example, in a time series containing an electroencephalogram associated with eye states (open or closed), one might select a segment where the eyes are in a specific state and then retrieve more segments with the same eye state [227].

Over the years, information retrieval in time series has been approached in various ways and contexts. Numerous models have been proposed, ranging from general approaches for diverse data scopes to specific methods for applications such as analyzing time series obtained from satellite images captured over time [135], or techniques tailored for searching multivariate time series [227], for instance.

Other tasks provide alternative perspectives on time series analysis. Tasks like clustering and classification allow grouping sets of time series considering data characteristics, revealing patterns within a dataset. This can be used to analyze changes in population mobility patterns caused by external factors [225], detect early-stage cancer [228], or predict disease outbreaks [71]. On the other hand, forecasting tasks aim to predict future values of a time series. Relevant applications include financial market predictions [223] and echocardiogram forecasting [157], which can identify abnormal heartbeat patterns and enable early medical intervention.

The literature presents a variety of methods for time series analysis. Although information retrieval is a crucial task in data processing, it is a relatively unexplored area in the context of time series, compared to labeling and forecasting tasks. Section 3.1 discusses models applied to univariate time series, while Section 3.2 presents methods for multivariate time series. Section 3.3 discusses techniques for processing time series generated from images and methods that encode time series as images. Finally, Section 3.4 briefly discusses the literature gaps filled by this work.

3.1 Univariate Time Series

3.1.1 Information Retrieval

Analyzing information in time series efficiently and effectively is a challenging task.
Over the years, various methods have been proposed to address and optimize this issue. Univariate time series are more common, and much of the existing research focuses on handling this data type. Comparative studies, such as the one proposed by Ding et al. [50], allow for evaluating and comparing the various available time series retrieval methods. In that work, comparisons are made among 8 different time series representation methods, including DFT, DCT, DWT, PAA, APCA, SAX, CHEB, and IPLA, as well as 9 similarity metrics, including SpADe, TQuEST, ERP, EDR, LCSS, DTW, DISSIM, Swale, and Euclidean, across 38 univariate time series datasets. This study provides a wide range of results for various datasets.

Furthermore, another challenging aspect of the field is dealing with extremely large datasets. Retrieving information from large datasets, not only in the context of time series but for any multimedia data, is a computationally demanding task, since comparing a query with all elements in the dataset is highly costly. In this regard, various algorithmic optimizations have been proposed to reduce the search time. In [155], several optimizations for distance measures (such as the Euclidean distance and DTW) are presented, enabling retrieval tasks to be performed much faster than in other works developed during the same period. The optimized DTW, which accomplishes retrieval quickly and delivers results comparable to the state of the art of the period, was demonstrated to be consistent.

Recent approaches propose tackling this issue differently, as seen in [172]. In this work, an adaptive model for time series information retrieval is proposed, based on relevance feedback, to retrieve a small number of series per query, considering large datasets. As searches are performed, the algorithm builds a collection of the patterns existing in the time series, helping the algorithm adapt to perform more efficient searches.

In addition to relevance feedback, other machine learning methods can be used to improve the efficiency and effectiveness of information retrieval results. For instance, in [137], a supervised dictionary learning method, called EO-BoW, is introduced for optimizing the Bag-of-Words (BoW) representation in information retrieval. The method utilizes an entropy-based optimization criterion, suitable for retrieval tasks rather than classification, following the cluster hypothesis. Experiments conducted on time series databases, as well as on data from other domains, demonstrate that EO-BoW improves retrieval performance and reduces encoding time and database storage requirements. Moreover, EO-BoW maintains its representation ability when used for retrieving objects from classes unseen during training.

Chen et al. [30] propose the deep multi-task representation learning method (MTRL) for time series classification and retrieval tasks, leveraging supervised and unsupervised information. The supervised information, for classification tasks, aims to reduce intra-class differences and increase inter-class differences. The unsupervised information, for retrieval tasks, aims to preserve the DTW distances between pairs. These tasks share networks, so the information obtained from one task can benefit the other. The results presented in the experiments outperformed the other state-of-the-art methods at the time.

Often, labeled information is not available, making supervised learning tasks infeasible.
In [21], the problem of content-based time series retrieval (CBTSR) is addressed, which bears similarities to the proposal of this work. Using satellite sensor images, the goal is to retrieve data related to abrupt changes. For this purpose, features of the "change vector analysis" (CVA) type are extracted, and an approach based on clustering and spectral change vectors (SCVs) is used for retrieval: the elements to be retrieved have SCVs in the cluster to which the query belongs. This membership is verified using the Euclidean distance, and other distance measures can also be utilized. The results, considering the top 20 retrieved elements, achieve precision rates between 89% and 99%.

3.1.2 Clustering

Clustering tasks on time series data are also fundamental, and several approaches have been proposed. Many current clustering methods consider traditional distance measures, which do not take shape characteristics into account. In this context, algorithms that consider this property have become popular for time series data. The "Fractional Order Shape-based k-cluster" (FrOKShape) algorithm [109] utilizes a distance measure based on multivariate shape, computed by normalized fractional-order cross-correlation, and the "DTW Barycenter Averaging" (DBA) measure for centroid calculation. The obtained results are comparable to those of the "KShape" algorithm, which is widely used for time series data.

The centroid approach is similar to the popular "K-Means" algorithm in the literature, which performs well in various domains. However, the technique used to calculate centroids is not satisfactory for handling time series data. In this regard, Krishnan et al. [83] propose adaptations to the clustering algorithm, initializing the centroids at the farthest neighbors, contrary to the traditional approach that initializes centroids randomly. Additionally, the distance calculations are performed using the DTW algorithm. Finally, the clustering is done using Kohonen maps. This approach showed better performance and scalability than traditional methods.

Other approaches, such as hierarchical clustering, can also be applied to time series data. Using a hierarchical clustering method based on tail dependence coefficients, the authors of [43] propose a linkage method based on the tail dependence between the clusters merged at each iteration. The proposed linkage method showed superior results compared to traditional approaches. Another relevant approach is presented in [153], where a framework for clustering is proposed, involving obtaining geometric representations of time series data and then performing clustering tasks; the best results were achieved using the UMAP and HDBSCAN methods. The work proposed in [8] evaluates traditional clustering algorithms with three proposed time series similarity measures, using meteorological and synthetic data; it is noteworthy that the results between the measures are similar and that the clustering results are strongly correlated with the parameters.

Kang et al. [91] propose the PFD_DTW_HC method for clustering time series. First, Hodrick-Prescott (HP) filtering is applied to reduce noise and redundancy in the raw time series; then, polynomial curve fitting (PCF) is applied to derive a globally continuous function fitting the time series. The time series is thus transformed into a multi-order derivative feature sequence.
The designed polynomial function derivative features-based dynamic time warping (PFD_DTW) algorithm is then performed to determine the distance between two time series of equal or unequal granular lengths, and hierarchical clustering is applied. The proposed method has advantages in capturing trend information and in interpretability, and it is more precise than other clustering alternatives, but it is sensitive to outliers and has high computational complexity.

The clustering of time series data has various applications. For instance, the method proposed in [78] enables the identification of the condition of an aluminum reduction cell using clustering. The method utilizes fuzzy c-means in combination with the DTW distance for clustering the current states of the anodes, along with the pseudo-resistance of an anode-cathode path. Finally, a clustering validation index is used to find the ideal number of clusters. Another notable work is the proposal of the "KDetect" algorithm [176], designed for unsupervised anomaly detection in cloud systems. Time series representing the CPU consumption of unknown services are used in a clustering algorithm, enabling anomaly detection; in the experiments, the precision and recall values exceeded 98%. Moreover, the clustering of time series data enabled the analysis of changes caused by the COVID-19 pandemic. Motivated by the analysis of pattern changes in mobility due to COVID-19, Zhang et al. [225] propose the use of the "K-Means" method with the DTW distance measure, achieving more satisfactory results, although with a high execution time.

3.1.3 Classification

Unlike clustering, which is an unsupervised learning method, classification tasks allow data labeling based on a pre-identified set. Among the challenges in classification problems is the selection of good features. Lee et al. [103] propose a representation method for open-ended univariate time series that is real-time, normalization-free, and parameter-tuning-free. Each data point of the time series is converted, on the fly, into a root mean squared error (RMSE) value based on a Long Short-Term Memory (LSTM) network and a Look-Back and Predict-Forward strategy. Experiments suggest that the approach is suitable and can be easily deployed on commodity computers and used as a pre-processing tool for real-time time series analysis, such as clustering and classification.

Specifically in the classification field, a model for selecting temporal features using a random forest is proposed in [85], with the classification performed by a Fully Convolutional Network (FCN); several experiments demonstrate the effectiveness of this method. Building on the issue of features in time series data, another method, called the Adaptive Feature Fusion Network (AFFNet), was created. It combines temporal and data features extracted using a dynamic multiscale CNN model, while distance features are extracted from a prototype distance network model. Finally, a feature fusion model is utilized for classification, and the proposed method achieved results superior to the state of the art on many of the tested datasets [203]. While the utilization