SÃO PAULO STATE UNIVERSITY
JABOTICABAL CAMPUS

COFFEE CLASSIFICATION ACCORDING TO ITS DETACHMENT FORCE: A DECISION TREE-BASED APPROACH

Mariana Dias Meneses
Agricultural Engineer

Advisor: Prof. Dr. Rouverson Pereira da Silva
Co-advisors: Prof. Dr. Welington Gonzaga do Vale, Prof. Dr. Glauco de Souza Rolim

Dissertation submitted to the College of Agricultural and Veterinary Sciences - UNESP, Jaboticabal Campus, as part of the requirements for obtaining the title of MSc. in Agronomy (Soil Sciences).

2023

M543c Meneses, Mariana Dias. Coffee classification according to its detachment force: a decision tree-based approach / Mariana Dias Meneses. -- Jaboticabal, 2023. 33 f. Dissertation (master's) - Universidade Estadual Paulista (Unesp), Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal. Advisor: Rouverson Pereira da Silva. 1. Machine learning. 2. Coffee harvesting. 3. Maturation. I. Title.
Catalog card generated automatically by the Unesp catalog card system. Library of the Faculdade de Ciências Agrárias e Veterinárias, Jaboticabal. Data provided by the author. This card may not be modified.

AUTHOR’S CURRICULUM INFORMATION

MARIANA DIAS MENESES, daughter of Maria da Conceição Dias Meneses and Aguinaldo José de Meneses, was born on December 12th, 1995, in Aracaju, Sergipe, Brazil. She graduated from the Federal University of Sergipe in agricultural engineering in 2021. During her undergraduate studies she participated for two years in the study group on mechanization, automation, and agricultural instrumentation (GEMAIA). In 2021/2 she started her master’s degree in Agronomy (Soil Sciences), focusing on machine learning techniques to improve mechanized harvesting.

“It’s a dangerous business, Frodo, going out your door.
You step onto the road, and if you don’t keep your feet, there’s no knowing where you might be swept off to.” J.R.R. Tolkien

ACKNOWLEDGEMENTS

Firstly, I would like to thank my grandmother for all the years she spent encouraging me to be an independent and successful woman. Your absence is felt deeply in my heart, and I will miss you forever. Thank you for your unconditional love.

I would like to thank my family, first of all my parents, Conceição and Aguinaldo, for supporting my dreams and for helping me to get as far as possible. Every day you sacrificed many things so that I could achieve my dreams. You gave me strength and believed in me when I needed it the most. Also, to my “sister” Marília: thank you for your sweet smile and your soft heart; you are the purest and kindest human in this world.

My advisors were indispensable to this professional victory. To Dr. Rouverson Pereira da Silva, for trusting in my work and for the precious talks that helped me become a better researcher and human. To my co-advisor Dr. Glauco Rolim, who helped me with my data analysis. Achieving the master’s degree would not have been possible without my co-advisor and friend, Dr. Welington Gonzaga do Vale, who showed me the pleasure of being a researcher and supported my decision to pursue this career.

Finally, I would like to thank my friends Laura, Jamile, Eduarda, Vanessa, João, and Malvs for the good talks and laughs that make me feel at home.

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

SUMMARY

Chapter 1 – General introduction
1.1 Introduction
1.2 Literature review
1.2.1 Coffee: global trends
1.2.2 Coffee plant uneven maturation
1.2.3 Mechanized harvesting
1.2.4 Machine Learning: a subfield of Artificial Intelligence
1.2.5 Decision Tree Classifier
1.2.6 Classification performance evaluation
References
Chapter 2 – COFFEE CLASSIFICATION ACCORDING TO ITS DETACHMENT FORCE: A DECISION TREE-BASED APPROACH
1. Introduction
2. Materials and Methods
2.1 Experimental Field
2.2 Detachment Force
2.3 Descriptive Analysis
2.4 Cluster Analysis
2.5 Creation of the conditional for Machine Learning
2.6 Decision Tree Classifier and Confusion Matrix
3. Results and Discussion
3.1 Descriptive Analysis
3.2 Grouping the cultivars
3.3 Decision Tree Classifier and Confusion Matrix
4. Conclusion
References

COFFEE CLASSIFICATION ACCORDING TO ITS DETACHMENT FORCE: A DECISION TREE-BASED APPROACH

ABSTRACT – World coffee consumption demands high-efficiency crop systems. Consumers appreciate the flavor and aroma of this beverage, characteristics that are decisive for coffee value. Although it plays a key role in improving this production chain, mechanized harvesting fails to provide selectivity of coffee fruits, which means that the industry receives fruits with astringent or fermented flavor. Because the coffee plant has uneven maturation, i.e., green, cherry, and dry fruits occur together, and harvester settings are generalist, fruits are detached regardless of their maturation stage. Machine Learning techniques help move traditional agriculture toward a digital one, and their use in mechanized harvesting can enhance the selectivity of coffee fruits. The present study aimed to classify the coffee fruit detachment force using a Decision Tree Classifier. The experiment was conducted in two fields in the Brazilian state of Minas Gerais. A dynamometer was used to measure the detachment force of 23 coffee cultivars. The cultivars were grouped using a clustering algorithm, and a Decision Tree classified each group according to the detachment force. The Decision Tree obtained a mean Matthews Correlation Coefficient of 0.81, demonstrating its efficiency in classifying the detachment force. Therefore, we showed that the Decision Tree can support mechanized harvesting as a tool for more accurate decisions on harvester settings.
Keywords: Machine Learning, Selective Mechanized Harvesting, Uneven Maturation

CHAPTER 1 – General introduction

1.1 INTRODUCTION

Coffee is one of the most popular beverages in the world. Brewed from roasted beans, it is consumed worldwide, and Brazil is the largest producer and exporter of the commodity. Harvest is one of the most decisive moments in crop production, and success in this operation helps ensure the quality of the beverage. Currently, mechanized harvesting detaches fruits regardless of their maturity stage, because the tree matures unevenly. However, it is imperative to detach mainly the cherry fruits to maintain the high quality of the beverage, which characterizes a selective harvest.

Because machine learning provides ways to overcome challenges through an in-depth association of multi-source data, coffee harvesting could be rethought through a robust analysis of the interaction between the plant's uneven maturation and the harvester's mechanical dynamics. This would be the first step towards an effective yet applicable selective mechanized harvesting. Machine Learning methods can make it possible to understand the physiological characteristics of the coffee tree, the influence of the environment, and the exact force needed for the green fruits to stay on the tree while the cherry fruits are detached. This type of study enables the integration of automation devices to adapt the coffee harvester and make it capable of performing a selective harvest.

Therefore, our purpose was to use a Machine Learning algorithm to classify the detachment force of coffee fruits. In Chapter 1, we present a brief literature review on the global utilization of coffee, the concept of uneven coffee maturation, mechanized harvesting, Machine Learning, the decision tree algorithm, and measures to evaluate a classification algorithm.
In Chapter 2, we describe the classification of the detachment force using a decision tree algorithm and its classification quality applied to 23 cultivars.

1.2 LITERATURE REVIEW

1.2.1 Coffee: global trends

World coffee production (Arabica and Robusta) for 2022/23 is estimated at 10.368 million tons, an increase of 0.396 million tons over the previous year, and consumption is expected to reach 10.074 million tons, driven by demand from the European Union, the United States, and Brazil (USDA, 2022). Brazil is forecast to remain the leader in production and export, with estimates of 3.756 million tons and 2.202 million tons, respectively, with Minas Gerais and Espírito Santo being the largest producing and exporting states (USDA, 2022).

Coffee consumers describe emotional experiences when drinking it, e.g., happiness, pleasure, comfort, and calm (BHUMIRATANA; ADHIKARI; CHAMBERS, 2014), yet its most prominent use is to enhance energy and focus. Coffee intake behavior varies by country and culture, and consumption is also associated with health care, e.g., weight control and prevention of cancer, Parkinson’s disease, and Alzheimer’s disease (PHAN; CHAMBERS, 2016; RAMALHO; SOARES, 2018). Beyond the beverage, the coffee residues generated in post-harvest processing have multiple uses, which include but are not limited to bioethanol production, livestock feed, antioxidants, vitamin E, and organic fertilizer (ANDRADE et al., 2022).

1.2.2 Coffee plant uneven maturation

The environmental conditions for ideal growth of Arabica coffee involve temperatures between 18 and 22 °C, rainfall of 1200 to 1800 mm, and altitudes between 1600 and 2800 m. Colombia and some Brazilian regions have these environmental characteristics (ADHIKARI et al., 2020). In these regions, the reproductive period is stimulated by short days, and flowering and fruiting occur irregularly over several months (RAKOCEVIC et al., 2020).
In non-equatorial regions, such as the Brazilian state of Minas Gerais, the coffee plant has more than one blossoming period, with different physiological development of the floral buds within each branch. As a consequence, fruits on the same tree emerge and develop at different moments, resulting in uneven fruit maturation (DAMATTA et al., 2007).

During fruit maturation, several changes occur in the geometrical, physiological, and chemical characteristics of the coffee plant. Because the elasticity modulus of the fruit-peduncle system changes, green fruits have a different resistance than cherry and dry ones (TINOCO et al., 2014). As maturation progresses, the increase in the enzymatic activity responsible for cell wall degradation reduces the elasticity modulus of the peduncle structure and its resistance. In contrast, the peduncle of green fruit has a higher elasticity modulus due to the amount of nutrients flowing through the structure to stimulate fruit development (COELHO et al., 2015). This uneven maturation can be intensified by sunlight. Light incidence acts as an activator of flowering, stimulating the transition from the vegetative to the reproductive stage (KAMI et al., 2010).

1.2.3 Mechanized harvesting

Mechanized harvesting is a solution for growers who seek cost reduction, time savings, and yield increases (HOSHYARMANESH et al., 2017). Mechanized harvesting by vibration is applied to several agricultural crops, e.g., olives, apricot, almond, coffee, and citrus (HOSHYARMANESH et al., 2017; LI; LEE; HSU, 2011). Vibration harvesting works by applying a mechanical movement at a given frequency to the plant. This movement must be adjusted close to the peduncle’s natural frequency to overcome the resistance of the structure and detach the fruit from the plant (ZHOU et al., 2022). For the coffee crop, harvesters have two cylinders with rods along their length to promote the detachment of the coffee fruits (SILVA et al., 2020).
As the harvester passes over the coffee plants, the rods oscillate and vibrate the plants, causing fruit detachment (FURRIEL et al., 2022). In an attempt to achieve selective mechanized harvesting, growers adjust the harvester according to the crop and area characteristics (DE SOUZA et al., 2020). The most common changes in harvester settings are ground speed, vibration frequency, and rod amplitude (FERREIRA JÚNIOR et al., 2020). Other occasional changes depend on specific plant features, e.g., for the first harvest or for smaller plants, the machine height must be decreased to enable harvesting in the lower parts of the coffee plant (SANTINATO et al., 2015). However, due to the dynamic properties of the coffee plant, these adjustments do not ensure selectivity (FERREIRA JÚNIOR et al., 2020).

Methods using image acquisition and machine learning techniques seek to improve mechanized coffee harvesting. Rosas et al. (2022) used a modified MAPIR Survey 3W camera on a Phantom 4 Pro drone to capture images of five coffee fields with four different cultivars. The authors identified green and cherry fruits and the fields suitable for harvesting using nine vegetation indices extracted from the images. The indices most relevant to the harvest were selected using Principal Component Analysis. As a result, the CRI, MCARI1, and MTVI1 indices were considered suitable for discriminating plants with green fruits from plants with cherry ones. However, canopy volume and crop yield influenced the results.

With a smartphone, Bazame et al. (2022) captured images of coffee plants to identify green, cherry, and dry fruits. The images were saved at a resolution of 72 dpi. The authors used YOLO algorithms with six different network sizes to detect and classify the fruits. The best accuracy, more than 70% correct classification, was obtained with the largest network size (800 × 800 pixels).
All models detected cherry fruits more easily than green ones, since coffee leaves interfered with the detection of the latter.

Video cameras are also a way to describe coffee characteristics. Villibor et al. (2016) used a Casio Exilim EX-FH20 camera and an electromagnetic shaker to identify the modal parameters of coffee: damped oscillation period, damping ratio, undamped and damped natural frequencies, damping coefficient, and stiffness of the system. Using a coordinate system (s,t) in mathematical equations, the authors recorded the movement of green and cherry fruits and the behavior of the fruit-peduncle system at different frequencies. Confusion matrices were used to classify the operation performance. It was possible to identify which frequency causes the peduncle to break and, consequently, the natural frequency of green and cherry fruits.

Bazame et al. (2021) developed and implemented an algorithm to detect and classify coffee fruits during harvest. The authors used a video camera with 1920 × 1080 resolution at the spout of the coffee harvester to record the harvested fruits. A YOLO neural network and a k-means clustering algorithm were used to detect and classify the fruits into three classes: green, cherry, and dry. The classification performance was higher than 80%. Despite the accuracy of the method, the tool is better suited to post-harvest analysis than to pre-harvest decision-making. The other image capture methods, in turn, make it possible to create pre-harvest maps that identify the variability of coffee maturation in the field. However, the use of these studies to improve harvester settings remains theoretical and is still not suitable for providing selective harvesting.
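As a side illustration of the modal parameters listed in this section (damping ratio, undamped and damped natural frequencies, damped oscillation period), the sketch below relates them for an idealized single-degree-of-freedom mass-spring-damper model. Treating the fruit-peduncle system this way is an assumption made here for illustration, not the method of the cited studies, and the mass, stiffness, and damping values are hypothetical.

```python
import math


def modal_parameters(mass_kg, stiffness_n_per_m, damping_n_s_per_m):
    """Modal parameters of an idealized single-degree-of-freedom system.

    Assumption (not from the source): the fruit-peduncle system is
    approximated as a lumped mass on a linear spring with viscous damping.
    """
    # Undamped natural angular frequency: omega_n = sqrt(k / m)
    omega_n = math.sqrt(stiffness_n_per_m / mass_kg)
    # Damping ratio: zeta = c / (2 * sqrt(k * m))
    zeta = damping_n_s_per_m / (2.0 * math.sqrt(stiffness_n_per_m * mass_kg))
    if zeta >= 1.0:
        raise ValueError("system is not underdamped; no oscillation period")
    # Damped natural angular frequency: omega_d = omega_n * sqrt(1 - zeta^2)
    omega_d = omega_n * math.sqrt(1.0 - zeta ** 2)
    return {
        "omega_n_rad_s": omega_n,
        "omega_d_rad_s": omega_d,
        "f_n_hz": omega_n / (2.0 * math.pi),
        "f_d_hz": omega_d / (2.0 * math.pi),
        "damped_period_s": 2.0 * math.pi / omega_d,
        "damping_ratio": zeta,
    }


# Hypothetical values, chosen only to make the arithmetic easy to follow.
params = modal_parameters(mass_kg=0.001, stiffness_n_per_m=40.0,
                          damping_n_s_per_m=0.01)
for name, value in params.items():
    print(f"{name}: {value:.4f}")
```

Under this simplification, a stiffer peduncle (higher stiffness for the same mass) gives a higher natural frequency, which is consistent with the idea that green and cherry fruits respond to different excitation frequencies.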
1.2.4 Machine Learning: a subfield of Artificial Intelligence

Conceived as the science and engineering of making intelligent machines, Artificial Intelligence (AI) is a tool that gives computers the ability to solve complex tasks faster and more accurately than any human being (HAMET; TREMBLAY, 2017). As a subfield of AI, Machine Learning (ML) uses mathematical models to solve multidisciplinary problems and to give a system the ability to learn and perform a task by analyzing the behavior of a dataset (SHARMA; KUMAR, 2017). Overall, the algorithm interacts with the dataset to establish complex relations between the variables (JANIESCH; ZSCHECH; HEINRICH, 2021). In ML, the data are used as a set of attributes that describe a phenomenon, whether the features are categorical, numerical, or binary (LIAKOS et al., 2018).

The learning process can be classified into two types: supervised and unsupervised. In supervised learning, input variables are used as a training dataset to predict an output for testing data. In contrast, in unsupervised models the algorithm must learn from data without any previous training input (SHARMA; KUMAR, 2017). Across these two types of learning, algorithms are grouped into classification, regression, and clustering. Classification models are supervised: they assign a label to data whose labels are unknown. The process happens in two phases: training the algorithm using a labeled dataset, then testing it using an unlabeled dataset (NIKAM, 2015). Regression models are also supervised and predict an output from an input (LIAKOS et al., 2018). Cluster analysis, in contrast, is an unsupervised method that groups the data into categories according to their differences or similarities (MCINTOSH; SHARPE; LAWRIE, 2010).

Farmers need to collect data from sensors, cameras, and online services to progress from traditional agriculture to smart agriculture.
For this purpose, ML is used to process this amount of data and help achieve the four pillars of smart farming: optimal resource management, conservation of the ecosystem, adequate service, and utilization of modern technologies (BENOS et al., 2021). The use of ML in smart farming has benefited farmers along the production chain, making agriculture more sustainable, reducing costs, and optimizing the use of natural resources and inputs. There are several applications of ML in agriculture, e.g., crop disease and yield prediction, reduction of water waste, soil classification, crop monitoring and tracking, and phenotyping (PALLATHADKA et al., 2021).

1.2.5 Decision Tree Classifier

Widely used because its results are easy to understand, the Decision Tree Classifier (DT) relies on a series of tests that compare numeric attributes with categorical ones to label classes. The DT is a supervised ML model that splits the data domain (node) into two sub-nodes. These sub-nodes carry more information than the parent node, with different weights for making the next decision. The ideal architecture of these models is based on splitting the tree into subsets to gain information (SUTHAHARAN, 2016). The hyperparameters are chosen and the DT starts the classification process (Figure 1). Thereafter, the performance of the classification is measured by several metrics.

Figure 1. Behavior of the DT in splitting the data into training and testing datasets.

The DT architecture is structured in nodes and branches. The tree starts with the first test made by the algorithm, which splits the root node into two further nodes, the decision nodes. These represent two more tests that split the dataset, after which the final decision generates the leaf nodes (Figure 2). The nodes are connected by branches: the left branches represent the true decision, while the right branches represent the false decision (SONG; LU, 2015). Each node has the specificity of the category to be classified (CHARBUTY; ABDULAZEEZ, 2021).

Figure 2.
Decision Tree Classifier architecture. The decisions are made in the nodes, first in the root node and then in the decision nodes, which are created until the final classification is reached in the leaf nodes.

Despite its ease of understanding and high performance, overfitting constrains the applicability of the DT (GARCÍA LEIVA et al., 2019). Overfitting appears as a gap between the training and testing performance of supervised algorithms: the model fits the training data closely but generalizes poorly, resulting in high error on new data (YEOM et al., 2018). To avoid it, adjusting the DT hyperparameters, e.g., index, max depth, and number of components, can be a solution, since it limits tree growth (NIE; ZHU; LI, 2020; YUVARAJ et al., 2021). In agriculture, DT algorithms can assist in coffee leaf rust identification, climate modelling for greenhouse optimization, and machinery classification (CAI et al., 2022; MARIN et al., 2021; RAJESWARI; SUTHENDRAN, 2019; REHMAN et al., 2019).

1.2.6 Classification performance evaluation

The ability to correctly classify the class label of unknown data is assessed with accuracy measures. The usual way is the Confusion Matrix (CM), a two-dimensional matrix composed of the true class of the object and the predicted one (DENG et al., 2016). The rows represent the true class labels and the columns the predicted ones (Figure 3). A testing dataset is used, and the matrix reports the number of True Positives (TP), False Negatives (FN), False Positives (FP), and True Negatives (TN) (CAELEN, 2017). A measure of classification performance is obtained by comparing the true labels with those predicted by the algorithm (TANGIRALA, 2020).

Figure 3. Confusion Matrix. The rows represent the actual label of the data and the columns the label predicted by the algorithm.

For performance evaluation, several metrics can be used, e.g., F1, Matthews Correlation Coefficient (MCC), and accuracy (MARKOULIDAKIS et al., 2021).
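The node split and the evaluation step described in Sections 1.2.5 and 1.2.6 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used in Chapter 2: the tree is reduced to a single root node (a stump), the Gini impurity is assumed as the split index, the class coding (1 = high detachment force, 0 = low) is arbitrary, and all force values are hypothetical.

```python
import math


def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(forces, labels):
    """Return the force threshold with the lowest weighted Gini impurity."""
    best_t, best_score = None, float("inf")
    values = sorted(set(forces))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(values, values[1:]):
        t = (low + high) / 2.0
        left = [y for x, y in zip(forces, labels) if x <= t]
        right = [y for x, y in zip(forces, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t


def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN with class 1 taken as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def scores(tp, fn, fp, tn):
    """Accuracy, F1, and Matthews Correlation Coefficient from the counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return accuracy, f1, mcc


# Hypothetical training data: detachment force (N) and class label.
train_forces = [2.1, 2.8, 3.5, 4.0, 6.2, 7.1, 7.9, 8.4]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
threshold = best_threshold(train_forces, train_labels)

# Hypothetical testing data, classified with the learned root-node test.
test_forces = [2.5, 3.9, 4.6, 5.5, 6.8, 8.0]
test_labels = [0, 0, 1, 0, 1, 1]
predicted = [1 if x > threshold else 0 for x in test_forces]

tp, fn, fp, tn = confusion_counts(test_labels, predicted)
accuracy, f1, mcc = scores(tp, fn, fp, tn)
print(f"threshold={threshold:.2f} TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"accuracy={accuracy:.3f} F1={f1:.3f} MCC={mcc:.3f}")
```

In this toy case the stump separates the training data perfectly but misclassifies two of the six testing samples, a small-scale picture of the gap between training and testing performance discussed above. A full tree would repeat the same split search recursively in each sub-node, with growth limited by hyperparameters such as the maximum depth.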
F1 and accuracy are popular metrics. However, for an imbalanced dataset, with more positive or more negative samples, these two metrics fail to consider the relation between positive and negative cases. MCC, in contrast, retains the capability to correctly identify the classes (CHICCO; TÖTSCH; JURMAN, 2021). The MCC evaluates the predicted labels against the true ones. It ranges from -1 to +1, i.e., from perfect misclassification to perfect classification, respectively (CHICCO; JURMAN, 2020).

REFERENCES

ADHIKARI, M. et al. A Review of Potential Impacts of Climate Change on Coffee Cultivation and Mycotoxigenic Fungi. Microorganisms, v. 8, n. 10, p. 1625, Oct. 2020.
ANDRADE, M. C. et al. Crop residues: applications of lignocellulosic biomass in the context of a biorefinery. Frontiers in Energy, v. 16, n. 2, p. 224–245, 1 Apr. 2022.
BARROS, M. M. DE et al. Use of classifier to determine coffee harvest time by detachment force. Revista Brasileira de Engenharia Agrícola e Ambiental, v. 22, p. 366–370, May 2018.
BAZAME, H. C. et al. Detection, classification, and mapping of coffee fruits during harvest with computer vision. Computers and Electronics in Agriculture, v. 183, p. 106066, 1 Apr. 2021.
BAZAME, H. C. et al. Detection of coffee fruits on tree branches using computer vision. Scientia Agricola, v. 80, 12 Sep. 2022.
BENOS, L. et al. Machine Learning in Agriculture: A Comprehensive Updated Review. Sensors, v. 21, n. 11, p. 3758, Jan. 2021.
BHUMIRATANA, N.; ADHIKARI, K.; CHAMBERS, E. The development of an emotion lexicon for the coffee drinking experience. Food Research International, Coffee – Science, Technology and Impacts on Human Health. v. 61, p. 83–92, 1 Jul. 2014.
BRANDÃO, I. R. et al. Physiological and ultrastructural analysis reveal the absence of a defined abscission zone in coffee fruits. Bragantia, v. 75, p. 386–395, 6 Oct. 2016.
Brazil: Coffee Semi-annual. Available at: . Accessed: 17 Jan. 2023.
CAELEN, O.
A Bayesian interpretation of the confusion matrix. Annals of Mathematics and Artificial Intelligence, v. 81, n. 3, p. 429–450, 1 Dec. 2017.
CAI, W. et al. A method for modelling greenhouse temperature using gradient boost decision tree. Information Processing in Agriculture, v. 9, n. 3, p. 343–354, 1 Sep. 2022.
CHARBUTY, B.; ABDULAZEEZ, A. Classification Based on Decision Tree Algorithm for Machine Learning. Journal of Applied Science and Technology Trends, v. 2, n. 01, p. 20–28, 24 Mar. 2021.
CHICCO, D.; JURMAN, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, v. 21, n. 1, p. 6, 2 Jan. 2020.
CHICCO, D.; TÖTSCH, N.; JURMAN, G. The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Mining, v. 14, n. 1, p. 13, 4 Feb. 2021.
COELHO, A. L. DE F. et al. Determinação das propriedades geométricas, físicas e mecânicas do sistema fruto-pedúnculo-ramo do cafeeiro. Revista Brasileira de Engenharia Agrícola e Ambiental, v. 19, p. 286–292, Mar. 2015.
Coffee: World Markets and Trade. Available at: . Accessed: 5 Dec. 2022.
DALMAIJER, E. S.; NORD, C. L.; ASTLE, D. E. Statistical power for cluster analysis. BMC Bioinformatics, v. 23, n. 1, p. 205, 31 May 2022.
DAMATTA, F. M. et al. Ecophysiology of coffee growth and production. Brazilian Journal of Plant Physiology, v. 19, p. 485–510, Dec. 2007.
DE SOUZA, G. S. et al. Mechanized harvesting of “Conilon” coffee clones. Pesquisa Agropecuaria Brasileira, v. 55, 2020.
DENG, X. et al. An improved method to construct basic probability assignment based on the confusion matrix for classification problem. Information Sciences, v. 340–341, p. 250–261, 1 May 2016.
FERREIRA JÚNIOR, L. DE G. et al. Characterization of the coffee fruit detachment force in crop subjected to mechanized harvesting. Jan. 2018.
FERREIRA JÚNIOR, L. DE G.
et al. Dynamic behavior of coffee tree branches during mechanical harvest. Computers and Electronics in Agriculture, v. 173, p. 105415, 1 Jun. 2020.
FURRIEL, G. P. et al. Acoustics applied in the development of equipment for precision agriculture: Coffee handling and harvesting. Computers and Electronics in Agriculture, v. 198, p. 106981, 1 Jul. 2022.
GARCÍA LEIVA, R. et al. A Novel Hyperparameter-Free Approach to Decision Tree Construction That Avoids Overfitting by Design. IEEE Access, v. 7, p. 99978–99987, 2019.
GODINHO, J. DE D. et al. The best moment to carry out the selective harvest of coffee fruits. Agronomy Journal, v. 114, n. 6, p. 3297–3305, 2022.
HAMET, P.; TREMBLAY, J. Artificial intelligence in medicine. Metabolism, Insights Into the Future of Medicine: Technologies, Concepts, and Integration. v. 69, p. S36–S40, 1 Apr. 2017.
HOSHYARMANESH, H. et al. Numerical and experimental vibration analysis of olive tree for optimal mechanized harvesting efficiency and productivity. Computers and Electronics in Agriculture, v. 132, p. 34–48, 1 Jan. 2017.
JANIESCH, C.; ZSCHECH, P.; HEINRICH, K. Machine learning and deep learning. Electronic Markets, v. 31, n. 3, p. 685–695, 1 Sep. 2021.
KAMI, C. et al. Chapter Two - Light-Regulated Plant Growth and Development. In: TIMMERMANS, M. C. P. (Ed.). Current Topics in Developmental Biology. Plant Development. [s.l.] Academic Press, 2010. v. 91, p. 29–66.
KAZAMA, E. H. et al. Methodology for selective coffee harvesting in management zones of yield and maturation. Precision Agriculture, v. 22, n. 3, p. 711–733, 2021.
KOTSIANTIS, S. B. Decision trees: a recent overview. Artificial Intelligence Review, v. 39, n. 4, p. 261–283, 1 Apr. 2013.
LANGE, L.; HEDDERICH, M. A.; KLAKOW, D. Feature-Dependent Confusion Matrices for Low-Resource NER Labeling with Noisy Labels. arXiv, 4 Nov. 2019. Available at: . Accessed: 23 Nov. 2022.
LI, P.; LEE, S.; HSU, H.-Y.
Review on fruit harvesting method for potential use of automatic fruit harvesting systems. Procedia Engineering, PEEA 2011. v. 23, p. 351–366, 1 Jan. 2011.
LIAKOS, K. G. et al. Machine Learning in Agriculture: A Review. Sensors, v. 18, n. 8, p. 2674, Aug. 2018.
MARIN, D. B. et al. Detecting coffee leaf rust with UAV-based vegetation indices and decision tree machine learning models. Computers and Electronics in Agriculture, v. 190, p. 106476, 1 Nov. 2021.
MARKOULIDAKIS, I. et al. Multiclass Confusion Matrix Reduction Method and Its Application on Net Promoter Score Classification Problem. Technologies, v. 9, n. 4, p. 81, Dec. 2021.
MCINTOSH, A. M.; SHARPE, M.; LAWRIE, S. M. 9 - Research methods, statistics and evidence-based practice. In: JOHNSTONE, E. C. et al. (Eds.). Companion to Psychiatric Studies (Eighth Edition). St. Louis: Churchill Livingstone, 2010. p. 157–198.
NIE, F.; ZHU, W.; LI, X. Decision Tree SVM: An extension of linear SVM for non-linear classification. Neurocomputing, v. 401, p. 153–159, 11 Aug. 2020.
NIKAM, S. S. A Comparative Study of Classification Techniques in Data Mining Algorithms. Oriental Journal of Computer Science and Technology, v. 8, n. 1, p. 13–19, 30 Apr. 2015.
OLIVEIRA, B. R. DE [UNESP]. A luz solar e a agressividade da colheita de café afetam a qualidade da operação? 26 Mar. 2021.
PALLATHADKA, H. et al. Impact of machine learning on management, healthcare and agriculture. Materials Today: Proceedings, 22 Jul. 2021.
PANHALKAR, A. R.; DOYE, D. D. Optimization of decision trees using modified African buffalo algorithm. Journal of King Saud University - Computer and Information Sciences, v. 34, n. 8, Part A, p. 4763–4772, 1 Sep. 2022.
PÉREZ-ORTIZ, M. et al. A Review of Classification Problems and Algorithms in Renewable Energy Applications. Energies, v. 9, n. 8, p. 607, Aug. 2016.
PHAN, U. T. X.; CHAMBERS, E. Motivations for choosing various food groups based on individual foods. Appetite, v. 105, p. 204–211, 1 Oct. 2016.
PRIYA, R.; RAMESH, D. ML based sustainable precision agriculture: A future generation perspective. Sustainable Computing: Informatics and Systems, v. 28, p. 100439, 1 Dec. 2020.
RAJESWARI, S.; SUTHENDRAN, K. C5.0: Advanced Decision Tree (ADT) classification model for agricultural data analysis on cloud. Computers and Electronics in Agriculture, v. 156, p. 530–539, 1 Jan. 2019.
RAKOCEVIC, M. et al. The vegetative growth assists to reproductive responses of Arabic coffee trees in a long-term FACE experiment. Plant Growth Regulation, v. 91, n. 2, p. 305–316, 1 Jun. 2020.
RAMALHO, M. E. O.; SOARES, N. M. CAFÉ E SEUS BENEFÍCIOS. Revista Interface Tecnológica, v. 15, n. 1, p. 285–292, 30 Jun. 2018.
REAY, D. Climate-Smart Coffee. In: REAY, D. (Ed.). Climate-Smart Food. Cham: Springer International Publishing, 2019. p. 93–104.
REHMAN, T. U. et al. Current and future applications of statistical machine learning algorithms for agricultural machine vision systems. Computers and Electronics in Agriculture, v. 156, p. 585–605, 1 Jan. 2019.
ROSAS, J. T. F. et al. Coffee ripeness monitoring using a UAV-mounted low-cost multispectral camera. Precision Agriculture, v. 23, n. 1, p. 300–318, 2022.
SÁGIO, S. A. et al. Identification and expression analysis of ethylene biosynthesis and signaling genes provides insights into the early and late coffee cultivars ripening pathway. Planta, v. 239, n. 5, p. 951–963, 1 May 2014.
SANTIN, M. R. et al. CICLO DE MATURAÇÃO E FORÇA DE DESPRENDIMENTO DOS FRUTOS DE CAFÉ CONILON EM CULTIVO IRRIGADO NO CERRADO. p. 5, 2015.
SANTINATO, F. et al. Colheita mecanizada do café em lavouras de primeira safra. Revista Brasileira de Engenharia Agrícola e Ambiental, v. 19, p. 1215–1219, Dec. 2015.
SENINDE, D. R.; CHAMBERS, E. Coffee Flavor: A Review. Beverages, v. 6, n. 3, p. 44, Sep. 2020.
SHARMA, D.; KUMAR, N. A Review on Machine Learning Algorithms, Tasks and Applications. v. 6, p. 2278–1323, 1 Oct. 2017.
SILVA, C. A. DA et al.
Análise experimental em um cilindro de varetas de uma colhedora de café para diagnóstico de falha. ForScience, v. 8, n. 2, p. e00632– e00632, 6 out. 2020. SOARES, L. DOS S. et al. Interaction between climate, flowering and production of dry coffee (Coffea arabica L.) in Minas Gerais. Coffee Science - ISSN 1984-3909, v. 16, p. e161786–e161786, 15 jun. 2021. SONG, Y.; LU, Y. Decision tree methods: applications for classification and prediction. Shanghai Archives of Psychiatry, v. 27, n. 2, p. 130–135, 25 abr. 2015. SOUZA, G. S. D. et al. FORÇA DE DESPRENDIMENTO DE FRUTOS DE CAFÉ CONILON. Pensar Acadêmico, v. 16, n. 1, p. 6, 2018. SUTHAHARAN, S. Science of Information. Em: SUTHAHARAN, S. (Ed.). Machine Learning Models and Algorithms for Big Data Classification: Thinking with Examples for Effective Learning. Integrated Series in Information Systems. Boston, MA: Springer US, 2016. p. 1–13. TANGIRALA, S. Evaluating the Impact of GINI Index and Information Gain on Classification using Decision Tree Classifier Algorithm*. International Journal of Advanced Computer Science and Applications, v. 11, n. 2, 2020. TINOCO, H. A. et al. Finite element modal analysis of the fruit-peduncle of Coffea arabica L. var. Colombia estimating its geometrical and mechanical properties. Computers and Electronics in Agriculture, v. 108, p. 17–27, 1 out. 2014. TINOCO, H. A.; PEÑA, F. M. Finite Element Analysis of Coffea arabica L. var. Colombia Fruits for Selective Detachment Using Forced Vibrations. Vibration, v. 1, n. 1, p. 207–219, set. 2018. TKACZYNSKI, A. Segmentation Using Two-Step Cluster Analysis. Em: DIETRICH, T.; RUNDLE-THIELE, S.; KUBACKI, K. (Eds.). Segmentation in Social Marketing: Process, Methods and Application. Singapore: Springer, 2017. p. 109–125. VILLIBOR, G. P. et al. Determination of modal properties of the coffee fruit-stem system using high speed digital video and digital image processing. Acta Scientiarum. Technology, v. 38, n. 1, p. 41–48, 1 jan. 2016. VILLIBOR, G. P. 
et al. Dynamic behavior of coffee fruit-stem system using modeling of flexible bodies. Computers and Electronics in Agriculture, v. 166, p. 105009, 1 Nov. 2019.
YEOH, L.; NG, K. S. Future Prospects of Spent Coffee Ground Valorisation Using a Biorefinery Approach. Resources, Conservation and Recycling, v. 179, p. 106123, 1 Apr. 2022.
YEOM, S. et al. Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting. 2018 IEEE 31st Computer Security Foundations Symposium (CSF). Proceedings... In: 2018 IEEE 31ST COMPUTER SECURITY FOUNDATIONS SYMPOSIUM (CSF). Jul. 2018.
YUVARAJ, N. et al. Automatic detection of cyberbullying using multi-feature based artificial intelligence with deep decision tree classification. Computers & Electrical Engineering, v. 92, p. 107186, 1 Jun. 2021.
ZHOU, J. et al. Finite element explicit dynamics simulation of motion and shedding of jujube fruits under forced vibration. Computers and Electronics in Agriculture, v. 198, p. 107009, 1 Jul. 2022.

CHAPTER 2 – COFFEE CLASSIFICATION ACCORDING TO ITS DETACHMENT FORCE: A DECISION TREE-BASED APPROACH

Abstract: Coffee aroma and flavor depend strongly on the fruit maturity stage. Because of its climacteric nature, the coffee tree carries fruits at multiple maturity stages at the same time. Although highly efficient, mechanized systems harvest all fruits at once, taking green, cherry, and dry fruits together, which lowers beverage quality by mixing fruits of different maturity. Therefore, it is imperative to develop methods and tools for selective mechanized harvesting. The aim of the present study was to classify coffee cultivars according to fruit detachment force using a decision tree-based approach. Such a classification would reduce the subjectivity of harvester settings and make it possible to prioritize cherry fruits during harvesting. Detachment force data were collected from 23 coffee cultivars. Three maturity stages were analyzed (green, cherry, and dry), considering the West Side (WSP) and East Side (ESP) of the plant.
Algorithms and data analysis were developed in the Python programming language. A cluster analysis was used to reduce dimensionality and group the cultivars according to detachment force. Then, a decision tree was fitted to classify the input data into four force classes. IPR 100 was the cultivar with the highest detachment force in three of the four groups. The Matthews Correlation Coefficient of the decision tree classification averaged 0.86 and 0.81 for the training and test datasets, respectively. The method proved useful for classifying the fruit detachment force of several cultivars and contributes to future systems embedded in mechanized harvesters.

Keywords: Machine Learning, Selective Mechanized Harvesting, Uneven Maturation.

1. Introduction

Coffee is one of the most consumed beverages in the world, and its consumption tends to increase (YEOH; NG, 2022). Brazil leads production among the strategic producing countries, most of which lie between the Tropics of Cancer and Capricorn (REAY, 2019). Although strongly promoted as an energy-enhancing drink, coffee also has health benefits, e.g., the prevention of cardiovascular, Parkinson's, and Alzheimer's diseases (RAMALHO; SOARES, 2018).

The coffee beverage is highly appreciated for its aroma and flavor. However, the crop matures unevenly because of its climacteric nature (SÁGIO et al., 2014). At harvest time, the coffee tree carries fruits at different stages of maturity. Such multi-stage fruits impair beverage quality, because green fruits give an astringent flavor and senescent fruits are susceptible to fermentation (SENINDE; CHAMBERS, 2020). Moreover, the maturation dynamics markedly change the fruit-peduncle system and, therefore, its detachment resistance. Overall, the more immature the fruit, the more force is needed to harvest it (COELHO et al., 2015). This complex, multi-factor condition challenges mechanized harvesting.
A common coffee harvester detaches the fruit from the peduncle with a set of continuously vibrating rods mounted on a roller. The vibration transmitted through the system must overcome the fruit-peduncle resistance (KAZAMA et al., 2021). However, the system harvests all fruits. Because the physical-chemical properties of the fruit-peduncle system vary with maturity stage, mechanized harvesting uses maximum vibration. Therefore, every fruit is harvested regardless of its maturity stage, which conflicts with the quality standards described above.

The physical characteristics of coffee can be analyzed with mathematical models and simulations; however, the results do not reach field practice and are not aimed at harvester use (TINOCO et al., 2014; TINOCO; PEÑA, 2018). Moreover, it is unknown whether a large group of cultivars has the same fruit-peduncle response. Human decision-making aggravates the problem when the amount of force is wrongly adjusted in selective mechanized harvesting. A machine learning-based model would open pathways towards selective mechanized harvesting and fill this gap in the literature.

Supervised machine learning models for classification have an enormous number of applications. These algorithms establish complex relationships between observed and predicted data to produce a robust model able to classify an unknown input (PÉREZ-ORTIZ et al., 2016). A decision tree is an example of a supervised machine learning method, with the advantages of being easy to understand and evaluate and of performing well with small training datasets when compared with other algorithms (PANHALKAR; DOYE, 2022). In agriculture, decision trees have several applications, e.g., identifying crop diseases, calculating crop yield, evaluating markets, and assessing crop performance in different environments (PRIYA; RAMESH, 2020).
Therefore, the aim of this study was to classify the detachment force of a large group of coffee cultivars using a decision tree. A framework to classify detachment force would provide timely support for decision-making in harvester adjustments. Combined with the machine's embedded systems, it would make it possible to choose the speed and vibration settings that achieve the force needed to harvest the cherry fruits of the chosen cultivar.

2. Material and Methods

2.1 Experimental Field

The experiment was conducted from the last week of May to the first week of June 2022 in the municipalities of Araxá (Field 1) and Carmo do Paranaíba (Field 2), in the southwest region of Minas Gerais state (Figure 1), both important coffee-producing areas in Brazil. The areas had elevations of 1,001 m and 1,106 m, respectively. In both locations the cultivars were planted in January 2020. Experimental plots were distributed across the areas. The plants analyzed were arranged in a single row in the north-south direction, with an inter-row spacing of 4.0 m and an intra-row spacing of 0.5 m in both fields. Field 1 had drip irrigation with two application levels during the cycle, 100% and 50% irrigation. Field 2, in contrast, was not irrigated (Table 1).

Figure 1. Geographical location of the two fields. Araxá (Field 1) is approximately 130 km away from Carmo do Paranaíba (Field 2). Monthly precipitation and maximum and minimum temperature for: a. Field 1 (Cwa) and b. Field 2 (Aw) during 2021. Climate zones described according to Köppen-Geiger.

Table 1. Coffee cultivars present in the experimental fields, with their irrigation system and maturity period.
Coffee cultivar    Field*   Maturity period
Acauãma            X        Medium – Late
Acauã JCG          X        Medium – Late
Arara              X X      Late
Araçari            X        Medium
Asa Branca         X X      Medium
Azulão             X        Early – Medium
Beija Flor         X        Early
Catuaí Vermelho    X X      Medium – Late
Catiguá            X        Medium – Late
IAC 125 RN         X        Early – Medium
IAC 3439-4         X        Early – Medium
IAC 4520           X        Early – Medium
IAC 4932           X        Early – Medium
IAC SH3            X        Early – Medium
IPR 100            X X      Late
IPR 103            X X      Late
IPR 105            X X      Late
IPR 106            X X      Late
IPR 107            X X      Early
IPR 108            X X      Late
Palma 2            X        Late
Palma 3            X        Late
Siriema AS 1       X        Early
* Field columns: 1 – Irrigation (Fully irrigated, Partially irrigated); 2 – Rainfed. An X marks a condition in which the cultivar was present.

2.2 Detachment Force

The detachment force (F) of the fruit-peduncle system was measured with a dynamometer (DD-500, Instrutherm). The device supports a maximum of 49.03 N, with an accuracy of 0.4% (+/- 1 digit). With the hook positioned on the peduncle and pulled, the force is measured according to Hooke's law (Figure 2).

Figure 2. Method to measure the detachment force: a. The plants and the fruits at the different maturation stages were randomly chosen; b. The dynamometer hook was positioned on the fruit and the fruit was pulled; and c. The detachment force, force class, maturity stage, and coffee cultivar were used as inputs to the decision tree algorithm.

For each cultivar, four random plants were chosen. Two fruits at each maturity stage (green, cherry, and dry) were randomly selected from each side of the plant, giving eight replicates and a total of 48 repetitions per cultivar. The sides were classified as West Side (WSP) and East Side (ESP) of the plant (GODINHO et al., 2022). The data were collected in the morning, when the ESP was exposed to sunlight and the WSP was in the shade.
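To make the sampling count explicit, the short sketch below enumerates one measurement per combination of plant, side, maturity stage, and fruit. It is one reading of the design that reproduces the stated totals; the variable names are illustrative assumptions and are not taken from the study's scripts.

```python
from itertools import product

# Sampling design per cultivar, as described in the text.
plants = range(1, 5)                  # four randomly chosen plants
sides = ("WSP", "ESP")                # west (shaded) and east (sunlit) sides
stages = ("green", "cherry", "dry")   # maturity stages
fruits = range(1, 3)                  # two fruits per stage and side

# One record per detachment-force measurement.
design = list(product(plants, sides, stages, fruits))

# Replicates available for each stage-by-side combination
# (assumption: a replicate is one fruit on one plant).
replicates = len(plants) * len(fruits)

print(len(design))   # measurements per cultivar
print(replicates)    # replicates per stage and side
```

Under this reading, each stage-by-side combination has eight replicates, and the three stages and two sides together give the 48 repetitions per cultivar.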
An equation was used to determine the difference between the detachment force of cherry fruits and that of green fruits, the latter used as the reference in each plant:

\[
\Delta F\,(\%) = \left[ \frac{F_g - F_c}{F_g} \right] \times 100
\]

where ΔF is the difference in detachment force (%), F_g is the green fruit detachment force (N), and F_c is the cherry fruit detachment force (N).

2.3 Descriptive Analysis

A descriptive analysis of F was performed in Python (version 3.9.7) to obtain the mean, the first, second, and third quartiles (Q1, Q2, and Q3), the minimum and maximum, the difference between quartiles, and the difference between the means for the cherry and green maturity stages.

2.4 Cluster Analysis

Cluster analysis was performed for dimensionality reduction, clustering of cultivars, and group evaluation (DALMAIJER; NORD; ASTLE, 2022). Dimensionality reduction and clustering used the mean force of each coffee cultivar as the parameter, regardless of maturity stage or sunlight exposure side. The groups were selected from the hierarchical indentation of the cluster in which the cultivars were most similar (TKACZYNSKI, 2017). The analyses were performed with the scikit-learn library (version 1.1) in the JupyterLab interface, using the Python programming language (version 3.9.7).

2.5 Creation of the conditional for Machine Learning

With all force data, a conditional based on the three quartile values was created to classify the coffee cultivars by force class. Four force classes were created, and the cultivars could be labeled as:

If F < Q1: Force class I;
If Q1 <= F < Q2: Force class II;
If Q2 <= F < Q3: Force class III;
If F > Q3: Force class IV.

2.6 Decision Tree Classifier and Confusion Matrix

A Decision Tree (DT) was used to classify the cultivars into the previously defined classes (PÉREZ-ORTIZ et al., 2016; SHARMA; KUMAR, 2017).
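The classes mentioned here come from the quartile rule of Section 2.5. As a minimal sketch of that rule, assuming hypothetical force values and the inclusive quartile method of Python's `statistics` module (the text does not state which quartile method was used), the labelling could look like this:

```python
from statistics import quantiles


def force_class(force, q1, q2, q3):
    """Return the force class (I to IV) for one detachment force in newtons."""
    if force < q1:
        return "I"
    if force < q2:
        return "II"
    if force < q3:
        return "III"
    # Assumption: F == Q3 is placed in class IV, because the written rule
    # only covers F > Q3 and leaves the boundary value unassigned.
    return "IV"


# Hypothetical detachment forces (N), for illustration only.
forces = [0.9, 1.4, 2.2, 2.8, 3.1, 3.9, 4.6, 5.2, 6.0, 7.5, 9.8]

# Quartiles Q1, Q2, Q3 of the whole sample (inclusive method is an assumption).
q1, q2, q3 = quantiles(forces, n=4, method="inclusive")

labels = [force_class(f, q1, q2, q3) for f in forces]
print(q1, q2, q3)
print(labels)
```

The function name `force_class` and the sample values are placeholders. In the study the quartiles were taken from the measured forces, and the resulting class served as the target for the decision tree.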
The DT classifies the data of each group by choosing the best interaction among the attributes analyzed by the algorithm; the nodes in the tree structure confirm or deny the condition (KOTSIANTIS, 2013). In addition to the force classes, the factors maturity and cultivar were analyzed as categorical variables to classify the force. Because the DT works with this type of variable, the classification was possible (TANGIRALA, 2020).

The scikit-learn library (version 1.1) in the Python programming language (version 3.9.7) was used to build the DT algorithm. The dataset of each group was randomly divided into training and testing sets with 70% and 30% of the data, with sample sizes of 46, 16, 8, and 29, respectively. The criterion, the number of decision layers (depth), and the components were selected with the GridSearchCV function, which generated the appropriate hyperparameters for the algorithm in each case.

To evaluate the classification quality on the training and testing sets, the Matthews Correlation Coefficient (MCC) was used. From the counts of true positives, false positives, false negatives, and true negatives, the coefficient takes values from -1 to 1, indicating perfect misclassification and perfect classification, respectively (CHICCO; JURMAN, 2020). A classification is a true positive if the observed label is positive and the predicted label is also positive. It is a false positive if the observed label is negative and the predicted label is positive. Moreover, confusion matrices were used to illustrate the algorithm's performance in classifying each group.

3. Results and Discussion

3.1 Descriptive Analysis

The average detachment force of green fruits was 25% higher than that of cherry fruits (Table 2). Therefore, the fruit detachment force decreases as maturation progresses, regardless of cultivar (SANTIN et al., 2015; SOUZA et al., 2018).
The progression of fruit maturation reduces the resistance of the fruit-peduncle system, which changes the detachment force (BRANDÃO et al., 2016). In green fruits, the nutrients that pass through the peduncle to stimulate fruit development give the structure a higher elastic modulus and increase its resistance (COELHO et al., 2015). As maturation progresses, increased enzymatic activity degrades the cell wall of the peduncle and decreases the elastic modulus, i.e., the resistance of the structure to elastic deformation, which reduces its strength (VILLIBOR et al., 2019).

Table 2. Descriptive analysis of the detachment forces (N) for the three periods of coffee maturity.

Maturity Stage   Mean   Min    Q1     Q2     Q3     Max     Q3-Q1   ∆F
Green            4.67   0.76   2.81   3.76   5.70   15.37   2.89    25%
Cherry           3.74   0.19   2.21   3.04   4.80   16.33   2.59    -
Dry              1.18   0.03   0.62   0.95   1.49   12.96   0.87    -
Min – Minimum force; Max – Maximum force; Q1 – First quartile; Q2 – Second quartile; Q3 – Third quartile; Q3-Q1 – Difference between the third and first quartile; ∆F – Difference between the average detachment forces of green and cherry fruits.

Solar irradiance enhances photosynthetic activity and the progression from the vegetative stage to the flowering stage by acting as an activator of regulators (KAMI et al., 2010). The incidence of sunlight on different sides of the plant can intensify the irregularity in flowering, which results in different periods of fruit maturity on the coffee plant (SOARES et al., 2021). Therefore, the detachment force may vary with maturity period and between the WSP and ESP. In cherry fruits, the detachment force was lower for the ESP than for the WSP (Table 3). While cherry fruit required 28% less force than green fruit in the ESP, the difference decreased to 21% in the WSP, which confirms previous results that described this fruit-exposure dynamic (KAZAMA et al., 2021).

Table 3. Descriptive analysis of the detachment forces (N) for the green and cherry maturity stages on the shaded (WSP) and sunlight-exposed (ESP) sides of the plant.

Maturity Stage   Mean   Min    Q1     Q2     Q3     Max     Q3-Q1   ∆F
Green (WSP)      4.61   0.76   2.81   3.76   5.70   15.37   2.89    21%
Cherry (WSP)     3.79   0.40   2.14   3.10   4.65   16.33   2.51    -
Green (ESP)      4.73   0.76   2.58   3.75   6.09   14.90   3.51    28%
Cherry (ESP)     3.68   0.19   2.25   3.00   4.80   12.33   2.56    -
Min – Minimum force; Max – Maximum force; Q1 – First quartile; Q2 – Second quartile; Q3 – Third quartile; Q3-Q1 – Difference between the third and first quartile; ∆F – Difference between the average detachment forces of green and cherry fruits; ESP – East Side of the Plant; WSP – West Side of the Plant.

The distinction between the groups benefits mechanized harvesting. Because selective mechanized harvesting relies on picking only cherry fruits, this distinction opens pathways to adapt the harvester according to the detachment force. One way is to adjust the position of the harvesting rods (FERREIRA JÚNIOR et al., 2018) or to change the plant exposure time, i.e., the harvester ground speed. The characterization should evaluate the difference between green and cherry fruits, because the more prominent this difference, the greater the contribution to selective mechanized harvesting (OLIVEIRA, 2021). Therefore, the ESP condition would be more suitable for this activity.

3.2 Grouping the cultivars

The cluster analysis resulted in two indentations, A and B, and four groups of cultivars (Figure 3). The groups were named 1 (n = 22), 2 (n = 8), 3 (n = 4), and 4 (n = 14). Group 2 had the lowest variation of F for each cultivar, while Group 1 had the highest variation (Figure 4). The highest mean F per group was found for IPR 100 (Groups 1, 2, and 4) and IAC 4520 (Group 3). The lowest was found for Siriema AS 1 (Group 1), IPR 107 (Group 2), IPR 106 (Group 3), and IAC 3439-4 (Group 4) (Figure 4). In all groups the green fruits had the highest F, with Group 4 showing the highest mean F and Group 2 the lowest (Figure 5).

Figure 3. Classification of the groups according to the Cluster Analysis.

Figure 4. Boxplot of the distribution of F for all cultivars in a. Group 1, b. Group 2, c. Group 3, and d. Group 4.

Figure 5. Boxplot of the distribution of F for a. Group 1, b. Group 2, c. Group 3, and d. Group 4 determined by the cluster analysis.

The geographical location strongly conditioned the separation. Indentation A included the coffee cultivars from Field 1, except for IAC 3439-4 and IAC 4520. The other cultivars, from Field 2, were placed in indentation B. Although present in both areas and under the same irrigation management, cultivars such as IPR 103, IPR 107, IPR 108, Asa Branca, Arara, and Catuaí Vermelho were not included in the same group, reaffirming the strong influence of the geographical location.

Clustering was also conditioned by variables intrinsic to plant development, e.g., maturation stage and common ancestry, although these were not used as input data. Overall, cultivars that shared one or more agronomic or management characteristics were likely to be grouped together. An important characteristic of each variety is the maturity period. Indentations A and B comprised mostly medium- to late-maturing plants, except for IPR 107 and Siriema, which are early-maturing.

The family of origin of the cultivars may also have contributed to the increased homogeneity within the groups. Most cultivars analyzed share a common ancestry, originating from the crosses Catuaí Vermelho-Icatu IAC or Catuaí Vermelho-Mundo Novo. The exceptions are Beija Flor and Azulão (originated from Catucaí) and Arara (originated from Obatã IAC 1669-20). In Group 3, the cultivars IAC 3439-4 and IAC 4520 originated from similar crosses, and the cultivars IPR 106 and IPR 103 are likewise similar to each other. In Group 4, the cultivars are similar in their crosses of origin, which are Catuaí Vermelho-Icatu IAC and Catuaí Vermelho-Mundo Novo, excluding only the variety Arara, which originated from Obatã IAC 1669-20.
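The grouping step discussed in this section can be illustrated with a small agglomerative clustering of mean forces. The sketch below is a plain-Python stand-in and not the scikit-learn call used in the study; the average-linkage rule, the function name `average_linkage`, and the cultivar labels and values are all assumptions made for illustration.

```python
def average_linkage(means, n_groups):
    """Merge the two closest clusters until n_groups remain (1-D values)."""
    clusters = [[name] for name in means]
    while len(clusters) > n_groups:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                total = sum(abs(means[a] - means[b])
                            for a in clusters[i] for b in clusters[j])
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical mean detachment forces (N) per cultivar, for illustration only.
mean_force = {
    "cultivar_A": 1.0, "cultivar_B": 1.2,
    "cultivar_C": 3.0, "cultivar_D": 3.3,
    "cultivar_E": 6.0, "cultivar_F": 6.1,
    "cultivar_G": 9.0, "cultivar_H": 9.4,
}

groups = average_linkage(mean_force, n_groups=4)
for number, members in enumerate(groups, start=1):
    print(number, members)
```

Stopping at four clusters mirrors the four groups reported above. The linkage criterion behind the published grouping is not stated in the text, so a different choice could produce different groups on real data.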
3.3 Decision Tree Classifier and Confusion Matrix

The DT algorithm had a different maximum depth for each group (Figure 6). The training dataset of Group 1 obtained the best MCC, while Group 3 had the lowest coefficients, particularly for the ESP, although still with a positive correlation of 0.65. The best coefficients were achieved in the WSP, except for Group 4, where the ESP had the best coefficient (Table 4).

Figure 6. Decision Tree for the training samples for the WSP and ESP for: a. Group 1, b. Group 2, c. Group 3, and d. Group 4.

Table 4. Matthews Correlation Coefficient for the training and test sets of each group for the shaded (WSP) and sunlight-exposed (ESP) sides.

Group   Training WSP   Training ESP   Test WSP   Test ESP
1       1              1              0.91       0.93
2       0.85           0.78           0.86       0.72
3       0.80           0.65           0.67       0.65
4       0.88           0.92           0.87       0.88
ESP – East Side of the Plant; WSP – West Side of the Plant.

For the test sets the behavior was similar, with the highest coefficients in Group 1 and the lowest in Group 3. The small number of cultivars in this group was a limiting factor that prevented the MCC from rising above the acceptable level (LANGE; HEDDERICH; KLAKOW, 2019). In general, the test sets had lower coefficients than the training sets, because some samples were classified into force classes different from the true ones (Figure 7).

Figure 7. Confusion matrix for the test samples for the WSP and ESP for: a. Group 1, b. Group 2, c. Group 3, and d. Group 4.

The classifier algorithm allowed classification according to the force required to detach the fruits. The quality of the classification can vary with the discrepancy between the detachment forces of green and cherry fruits: the greater this difference, the lower the possibility of classification errors (BARROS et al., 2018). Among other applications of DT in coffee cultivation, it is possible to identify leaf rust disease.
Using several vegetation indices as variables and leaf samples with four levels of disease severity, the DT identified plants with the disease in the early and later stages (MARIN et al., 2021).

The DT for F classification can be used with other methods to improve selective mechanized harvesting. With the introduction of methods that allow the harvester to apply different vibrations, the DT can provide different levels of force to be applied by the detaching rods, considering the difference in required force caused by sunlight exposure. However, the present study needs to be improved with more numerical variables, a larger number of samples per group, and more maturity data, e.g., the percentage of green and cherry fruits and the level of maturation of each fruit.

4. Conclusion

The machine learning approach successfully classified the detachment force class of coffee fruits. The lower classification quality in some groups does not invalidate the method. New numerical variables for the factors, such as maturity stage and the percentage of fruits at the different stages, can be used as new inputs to improve the algorithm.

References

BARROS, M. M. DE et al. Use of classifier to determine coffee harvest time by detachment force. Revista Brasileira de Engenharia Agrícola e Ambiental, v. 22, p. 366–370, May 2018.
BRANDÃO, I. R. et al. Physiological and ultrastructural analysis reveal the absence of a defined abscission zone in coffee fruits. Bragantia, v. 75, p. 386–395, 6 Oct. 2016.
CHICCO, D.; JURMAN, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, v. 21, n. 1, p. 6, 2 Jan. 2020.
COELHO, A. L. DE F. et al. Determinação das propriedades geométricas, físicas e mecânicas do sistema fruto-pedúnculo-ramo do cafeeiro. Revista Brasileira de Engenharia Agrícola e Ambiental, v. 19, p. 286–292, Mar. 2015.
DALMAIJER, E. S.; NORD, C. L.; ASTLE, D. E. Statistical power for cluster analysis.
BMC Bioinformatics, v. 23, n. 1, p. 205, 31 May 2022.
FERREIRA JÚNIOR, L. DE G. et al. Characterization of the coffee fruit detachment force in crop subjected to mechanized harvesting. Jan. 2018.
GODINHO, J. DE D. et al. The best moment to carry out the selective harvest of coffee fruits. Agronomy Journal, v. 114, n. 6, p. 3297–3305, 2022.
KAMI, C. et al. Chapter Two - Light-Regulated Plant Growth and Development. In: TIMMERMANS, M. C. P. (Ed.). Current Topics in Developmental Biology. Plant Development. [s.l.] Academic Press, 2010. v. 91, p. 29–66.
KAZAMA, E. H. et al. Methodology for selective coffee harvesting in management zones of yield and maturation. Precision Agriculture, v. 22, n. 3, p. 711–733, 2021.
KOTSIANTIS, S. B. Decision trees: a recent overview. Artificial Intelligence Review, v. 39, n. 4, p. 261–283, 1 Apr. 2013.
LANGE, L.; HEDDERICH, M. A.; KLAKOW, D. Feature-Dependent Confusion Matrices for Low-Resource NER Labeling with Noisy Labels. arXiv, 4 Nov. 2019. Available at: . Accessed on: 23 Nov. 2022.
MARIN, D. B. et al. Detecting coffee leaf rust with UAV-based vegetation indices and decision tree machine learning models. Computers and Electronics in Agriculture, v. 190, p. 106476, 1 Nov. 2021.
OLIVEIRA, B. R. DE [UNESP]. A luz solar e a agressividade da colheita de café afetam a qualidade da operação? 26 Mar. 2021.
PANHALKAR, A. R.; DOYE, D. D. Optimization of decision trees using modified African buffalo algorithm. Journal of King Saud University - Computer and Information Sciences, v. 34, n. 8, Part A, p. 4763–4772, 1 Sep. 2022.
PÉREZ-ORTIZ, M. et al. A Review of Classification Problems and Algorithms in Renewable Energy Applications. Energies, v. 9, n. 8, p. 607, Aug. 2016.
PRIYA, R.; RAMESH, D. ML based sustainable precision agriculture: A future generation perspective. Sustainable Computing: Informatics and Systems, v. 28, p. 100439, 1 Dec. 2020.
RAMALHO, M. E. O.; SOARES, N. M. CAFÉ E SEUS BENEFÍCIOS. Revista Interface Tecnológica, v. 15, n. 1, p. 285–292, 30 Jun. 2018.
REAY, D. Climate-Smart Coffee. In: REAY, D. (Ed.). Climate-Smart Food. Cham: Springer International Publishing, 2019. p. 93–104.
SÁGIO, S. A. et al. Identification and expression analysis of ethylene biosynthesis and signaling genes provides insights into the early and late coffee cultivars ripening pathway. Planta, v. 239, n. 5, p. 951–963, 1 May 2014.
SANTIN, M. R. et al. CICLO DE MATURAÇÃO E FORÇA DE DESPRENDIMENTO DOS FRUTOS DE CAFÉ CONILON EM CULTIVO IRRIGADO NO CERRADO. p. 5, 2015.
SENINDE, D. R.; CHAMBERS, E. Coffee Flavor: A Review. Beverages, v. 6, n. 3, p. 44, Sep. 2020.
SHARMA, D.; KUMAR, N. A Review on Machine Learning Algorithms, Tasks and Applications. v. 6, p. 2278–1323, 1 Oct. 2017.
SOARES, L. DOS S. et al. Interaction between climate, flowering and production of dry coffee (Coffea arabica L.) in Minas Gerais. Coffee Science - ISSN 1984-3909, v. 16, p. e161786–e161786, 15 Jun. 2021.
SOUZA, G. S. D. et al. FORÇA DE DESPRENDIMENTO DE FRUTOS DE CAFÉ CONILON. Pensar Acadêmico, v. 16, n. 1, p. 6, 2018.
TANGIRALA, S. Evaluating the Impact of GINI Index and Information Gain on Classification using Decision Tree Classifier Algorithm*. International Journal of Advanced Computer Science and Applications, v. 11, n. 2, 2020.
TINOCO, H. A. et al. Finite element modal analysis of the fruit-peduncle of Coffea arabica L. var. Colombia estimating its geometrical and mechanical properties. Computers and Electronics in Agriculture, v. 108, p. 17–27, 1 Oct. 2014.
TINOCO, H. A.; PEÑA, F. M. Finite Element Analysis of Coffea arabica L. var. Colombia Fruits for Selective Detachment Using Forced Vibrations. Vibration, v. 1, n. 1, p. 207–219, Sep. 2018.
TKACZYNSKI, A. Segmentation Using Two-Step Cluster Analysis. In: DIETRICH, T.; RUNDLE-THIELE, S.; KUBACKI, K. (Eds.). Segmentation in Social Marketing: Process, Methods and Application. Singapore: Springer, 2017. p. 109–125.
VILLIBOR, G. P. et al. Dynamic behavior of coffee fruit-stem system using modeling of flexible bodies. Computers and Electronics in Agriculture, v. 166, p. 105009, 1 Nov. 2019.
YEOH, L.; NG, K. S. Future Prospects of Spent Coffee Ground Valorisation Using a Biorefinery Approach. Resources, Conservation and Recycling, v. 179, p. 106123, 1 Apr. 2022.