Classification of soybean genotypes by their physiological attributes using machine learning and different spectral sensors

Santos, Regimar Garcia dos [UNESP]

Files

santos_rg_dr_ilha.pdf (2.09 MB)

Date

2026-01-06

Authors

Santos, Regimar Garcia dos

Advisor

Teodoro, Paulo Eduardo

Co-advisor

Teodoro, Larissa Pereira Ribeiro

Graduate program

Agronomia - FEIS

Publisher

Universidade Estadual Paulista (Unesp)

Type

Doctoral thesis

Access rights

Open access

Abstract (English)

High-precision phenotyping combined with machine learning algorithms enables more efficient exploration of soybean genetic variability. By reducing the time and subjectivity of evaluations, this approach accelerates the selection of superior genotypes and strengthens breeding programs. Chapter 1 presents a critical review of the literature, situating the research topic within the current state of the art and establishing the theoretical foundations of this thesis.

In Chapter 2, 32 soybean genotypes were classified based on physiological variables using a VIS–NIR sensor (400 to 825 nm). The reflectance data were grouped into 20 representative bands, and the physiological measurements included net photosynthesis, internal CO₂ concentration, stomatal conductance, and transpiration. Genotypes were clustered by k-means into two groups, which were then used as the output variable in the machine-learning models. The algorithms were tested with two input types, continuous wavelengths and band averages, and evaluated by the percentage of correct classifications and the F-score. The 32 genotypes split into one cluster of 20 genotypes and another of 12; cluster 2 exhibited higher mean stomatal conductance, CO₂ concentration, and transpiration, whereas cluster 1 showed slightly higher photosynthesis. In classification, support vector machine (SVM) and logistic regression achieved higher accuracy when the full spectrum was used, whereas the J48 algorithm performed best with the band averages. The superior performance of J48 with aggregated data indicates that the choice of input type influences algorithm performance.

Chapter 3 evaluated 32 F₃ soybean populations with a hyperspectral spectroradiometer (350 to 2500 nm). Physiological traits were measured 60 days after emergence, and the same leaves were evaluated spectrally. The data were analyzed both as continuous spectra and as aggregated bands. Cluster 1 showed higher photosynthesis and water-use efficiency, whereas cluster 2 displayed higher stomatal conductance and transpiration. In classification, the continuous spectrum outperformed the aggregated bands. J48 and REPTree achieved the highest accuracies and F-scores, followed by SVM and neural networks; Random Forest and logistic regression exhibited lower performance.
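The processing steps named in the abstract (averaging a spectrum into bands, k-means grouping into two clusters, and scoring by percentage of correct classifications and F-score) can be illustrated with a minimal sketch. This is not the author's code: the function names, the equal-width band edges, the unstandardized traits, and the synthetic data are all assumptions made for illustration, and the classifiers themselves (SVM, J48, and the others) are not reproduced here.

```python
import random
from statistics import mean


def band_averages(spectrum, n_bands):
    """Average one reflectance spectrum into n_bands contiguous bands.

    Assumption: bands are contiguous and roughly equal in width, and
    len(spectrum) >= n_bands. The thesis may define its bands differently.
    """
    n = len(spectrum)
    edges = [i * n // n_bands for i in range(n_bands + 1)]
    return [mean(spectrum[a:b]) for a, b in zip(edges, edges[1:])]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_two_groups(rows, n_iter=50, seed=0):
    """Plain k-means with k=2; returns a 0/1 label for each row."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, 2)]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assign each row to the nearest centroid.
        labels = [
            0 if sq_dist(r, centroids[0]) <= sq_dist(r, centroids[1]) else 1
            for r in rows
        ]
        # Move each centroid to the mean of its members.
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centroids[k] = [mean(col) for col in zip(*members)]
    return labels


def accuracy_and_f_score(y_true, y_pred, positive=1):
    """Return (percentage of correct classifications, F-score for `positive`)."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return 100.0 * correct / len(pairs), f_score


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-ins: 32 genotypes x 4 physiological traits.
    traits = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(32)]
    groups = kmeans_two_groups(traits)  # cluster labels used as the class target
    # One synthetic spectrum reduced to 20 band averages.
    bands = band_averages([rng.random() for _ in range(426)], 20)
    # Placeholder predictions: the true labels with one deliberate error.
    predicted = groups[:]
    predicted[0] = 1 - predicted[0]
    print(len(bands), accuracy_and_f_score(groups, predicted))
```

In practice the cluster labels would be the target for a classifier trained on either the full spectra or the band averages, which is the comparison the abstract reports.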

Keywords

Physiological phenotyping, Machine learning, Hyperspectral sensors, Genotype classification

Language

Portuguese

Citation

SANTOS, Regimar Garcia dos. Classificação de genótipos de soja quanto os seus atributos fisiológicos usando aprendizagem de máquina e diferentes sensores espectrais. 2025. Tese (Doutorado) - Universidade Estadual Paulista (UNESP), Faculdade de Engenharia, Ilha Solteira, 2025.