A new computer vision-based approach to aid the diagnosis of Parkinson’s disease

Clayton R. Pereira a, Danilo R. Pereira b, Francisco A. Silva b, João P. Masieiro b, Silke A.T. Weber c, Christian Hook d, João P. Papa e,*

a Department of Computing, Federal University of São Carlos, Brazil
b University of Western São Paulo, Brazil
c Department of Ophthalmology and Otorhinolaryngology, São Paulo State University, Brazil
d Ostbayerische Technische Hochschule, Germany
e Department of Computing, São Paulo State University, Brazil

ARTICLE INFO
Article history: Received 2 March 2016; received in revised form 9 June 2016; accepted 11 August 2016

ABSTRACT
Background and Objective: Even today, identifying an exam that can diagnose Parkinson’s disease (PD) accurately enough is not an easy task. Although a number of techniques have been used in the search for a more precise method, detecting the illness and measuring its level of severity early enough to postpone its side effects is not straightforward. After reviewing a considerable number of works, we conclude that only a few techniques address the problem of PD recognition by means of micrography using computer vision techniques. Therefore, we consider the problem of aiding automatic PD diagnosis by means of spirals and meanders filled out in forms, which are then compared with the template for feature extraction.
Methods: In our work, both the template and the drawings are identified and separated automatically using image processing techniques, thus needing no user intervention. Since the images are not registered, the idea is to obtain a suitable representation of both template and drawings using the very same approach for all images, in a fast and accurate manner.
Results: The results have shown that we can obtain reasonable recognition rates (approximately 67%), with the most accurate class being the one represented by the patients, which outnumbered the control individuals in the proposed dataset.
Conclusions: The proposed approach seemed to be suitable for aiding automatic PD diagnosis by means of computer vision and machine learning techniques. Also, meander images play an important role, leading to higher accuracies than spiral images. We also observed that the main problem in detecting PD lies in patients in the early stages, who can draw near-perfect objects that are very similar to the ones made by control individuals.
© 2016 Elsevier Ireland Ltd. All rights reserved.

Keywords: Parkinson’s disease; Pattern recognition; Micrography

* Corresponding author. Department of Computing, São Paulo State University, Brazil. Fax: +55-14-3103-6079. E-mail address: papa@fc.unesp.br (J.P. Papa).
http://dx.doi.org/10.1016/j.cmpb.2016.08.005
0169-2607/© 2016 Elsevier Ireland Ltd. All rights reserved.
Computer Methods and Programs in Biomedicine 136 (2016) 79–88
Journal homepage: www.intl.elsevierhealth.com/journals/cmpb

1. Introduction

Parkinson’s disease (PD) is a degenerative, chronic, and progressive illness that may cause tremors, slowness of movement, muscle stiffness, and changes in speech and writing skills due to the neurological disorder [1]. PD was first described by the English physician James Parkinson [2], and its symptoms are well known in the scientific community. However, diagnosing Parkinson’s disease with a reliable recognition rate in its early stages is still not possible. Moreover, it is not straightforward to establish the PD level soon after its diagnosis.

Parkinson’s disease occurs when nerve cells that produce dopamine are destroyed, a process that occurs slowly, thus characterizing the progression of this disease. In the absence of this substance, the nerve cells can no longer send messages properly, causing many other symptoms such as depression, sleep disturbances, memory impairment and autonomic nervous system disorders. In some cases, Parkinson’s disease may be triggered by hereditary causes [1].

In the last decades, some works have attempted to design solutions to aid PD diagnosis. Expert systems based on machine learning techniques have been employed for this purpose, showing promising results [3]. Generally, these works are signal analysis-oriented, which means one can use the patient’s voice to assess the level of the illness [4,5], since voice capability is gradually compromised by PD. Little et al. [4], for instance, presented a dataset composed of biomedical voice measurements from 31 male and female subjects, of which 23 were patients diagnosed with PD and 8 were healthy subjects. The authors introduced a new measure of dysphonia called Pitch Period Entropy, which seems to be more robust in identifying changes in speech, since approximately 90% of PD patients exhibit some form of vocal impairment [6,7].

In the work conducted by Zhao et al. [8], five patients and seven healthy individuals were used to recognize Parkinson’s disease by means of voice analysis. For this purpose, the voices of the patients were recorded using an Isomax EarSet E60P5L microphone; the recording sessions lasted around 25 minutes each, and the authors used a total of 50 pre-recorded prompts consisting of emotional sentences spoken by a professional actress. Tsanas et al. [9] evaluated different algorithms based on dysphonia measures aiming at PD recognition.
A total of 132 acoustic features were initially used for further feature selection, and the authors concluded that the dysphonia information and the existing features end up helping PD recognition. Harel et al. [10] claimed that PD symptoms are detectable up to five years prior to clinical diagnosis, and that the symptoms present in speech include reduced loudness, increased vocal tremor, and breathiness. In their work, the authors used a dataset of the National Center for Voice and Speech, which comprises 263 phonations from 43 subjects (17 females and 26 males, of which 10 were healthy controls and 33 were diagnosed with PD).

Since one of the first manifestations of Parkinson’s disease is the deterioration of handwriting, micrography (a writing exam) is another approach widely used for the diagnosis of Parkinson’s disease [11]. This technique is considered an objective measure, since a PD patient may exhibit a reduction in handwriting size, as well as hand tremors. Nowadays, this procedure is often conducted by filling out specific forms. Rosenblum et al. [12] suggested that writing exams can be used to distinguish PD patients from healthy individuals. The authors employed the following methodology to support their assumption: 20 PD patients and 20 control individuals were asked to write their names and addresses on a sheet of paper attached to a digitizing tablet. Further, for each stroke, the mean pressure and velocity were measured in order to compute spatial and temporal information. The authors presented very good recognition rates, with 97.5% of the participants classified correctly (100% of the control individuals, and 95% of PD patients). Later on, Drotár et al. [13] claimed that movement during the handwriting of a text consists not only of the on-surface movements of the hand, but also of the in-air trajectories performed when the hand moves in the air from one stroke to the next.
The authors demonstrated that the assessment of in-air hand movements during sentence handwriting has a higher impact than the pure evaluation of on-surface movements, leading to classification accuracies of 84% and 78%, respectively.

Machine learning-based techniques have also been applied to help automatic PD recognition. Spadotto et al. [14], for instance, introduced the Optimum-Path Forest (OPF) [15,16] classifier to the aforementioned context. Later on, Spadotto et al. [17] proposed an evolutionary-based approach to select the most discriminative set of features in order to improve PD recognition rates. Gharehchopogh and Mohammadi [18] used Artificial Neural Networks with Multi-Layer Perceptron to diagnose the effects caused by Parkinson’s disease. Pan et al. [19] analyzed the performance of Support Vector Machines with Radial Basis Function in order to compare the onset of tremor in patients with Parkinson’s disease. Hariharan et al. [20] developed a new feature weighting method using model-based clustering (Gaussian mixture model) in order to enrich the discriminative ability of the dysphonia-based features, thus achieving 100% classification accuracy. Recently, Peker et al. [21] used sound-based features and complex-valued neural networks to aid PD diagnosis as well.

However, although many works deal with voice- and speech-driven information, a large number of writing exams exist that can give us valuable information about the development of Parkinson’s disease, since this sort of exam is cheaper and easier to acquire. Moreover, most hospitals and clinics have writing exams on paper only, which means they need to be digitized prior to information extraction. Usually, the patients are asked to draw spirals and meanders, which are then compared against the templates. Very recently, Pereira et al. [22] proposed to extract features from writing exams using image processing techniques, achieving recognition rates of around 79%, which is considered very reasonable. The authors also designed and made available a dataset called “HandPD” with all images and extracted features.¹ However, they employed “spiral” drawings only.

¹ http://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd/

In this paper, we extend the work of Pereira et al. [22] with the following contributions: (i) a deeper analysis and explanation of the feature extraction process, including the analysis of a tremor-based feature; (ii) the use of both spirals and meanders in the classification process; and (iii) an extension of the “HandPD” dataset with images and features from meanders. Since we are committed to science, we also made this new dataset available to the readers, and we believe it can serve as a basis for future research regarding Parkinson’s disease diagnosis. The proposed approach is innovative in the sense that we can extract both the template and the drawings of each patient automatically, thus requiring no user intervention.

The remainder of this paper is organized as follows. Section 2 presents the methodology employed to design the dataset and the approach proposed to extract visual features from the handwriting exams. Section 3 presents the experimental results and discussion, and Section 4 states conclusions and future works.

2. Materials and methods

In this section, we present the dataset built in this work, as well as the proposed approach to extract visual features from the exams.

2.1. HandPD dataset

The HandPD dataset was collected at the Faculty of Medicine of Botucatu, São Paulo State University, Brazil.
It is composed of images extracted from handwriting exams of 92 individuals, divided into two groups: (i) the first one contains 18 exams of healthy people, named control group, with 6 male and 12 female subjects; (ii) the second group contains 74 exams of people affected by Parkinson’s disease, named patient group, with 59 male and 15 female subjects. Therefore, 80.43% of the dataset (74 of 92) is composed of patients, and 19.57% (18 of 92) is composed of control individuals. Although the dataset is unbalanced, the proportions can be evened out more easily by adding control individuals than by adding patients.

The control group is composed of 16 right-handed and 2 left-handed individuals, with an average age of 44.22 ± 16.53 years. In regard to the patient group, we have 69 right-handed and 5 left-handed individuals with an average age of 58.75 ± 7.51 years. Therefore, one can observe that the dataset is not age-biased, which provides an interesting scenario for learning purposes. In fact, most patients are considerably older than 60 years, since Parkinson’s disease usually gets worse within this age group. On the other hand, the dataset is heterogeneous enough to contain a 38-year-old male patient as well.

In order to compose the dataset, each subject is asked to fill out a form that comprises a number of tasks, such as drawing circles, spirals and meanders. Fig. 1a displays an exam of a 56-year-old male patient, in which we can observe the tremor inherent to Parkinson’s disease. Note that the patient is required to perform 6 distinct activities (labeled a–f on the form), which consist of repeating several operations in accordance with certain drawings. However, the analysis of the images focuses on tasks c and d only, which are related to drawing 4 spirals and 4 meanders according to the template. Fig. 1b depicts an empty form, in which one can observe the templates regarding spirals and meanders.

After being filled out, the forms are digitized for the further extraction of spirals and meanders. This step is performed by hand, with each drawing cropped to its minimum bounding box (or close to it). Soon after, the cropped spiral and meander images are numbered as follows: 1, 2, 3, 4 for the spirals from left to right, and 5, 6, 7, 8 for the meanders from left to right. Therefore, the entire dataset is composed of 736 images, i.e., 368 images for each drawing type (spirals and meanders). Within each drawing type, the images are labeled in two groups: patients (296) and control (72). The reader can refer to the HandPD home page for more technical details about the organization of the dataset.

2.2. Feature extraction from visual description

In this section, we describe the methodology used to extract the features and keypoints from spiral and meander forms. We split the proposed methodology into two stages: (i) image preprocessing and (ii) feature extraction. In the first stage (Section 2.2.1), we design an approach to automatically separate the handwritten trace (HT) from the exam template (ET), considering both spirals and meanders, since the images are not registered to each other. In the second stage (Section 2.2.2), we use the HT and ET extracted from the images to compute the visual features.

2.2.1. Handwritten trace and exam template

In order to extract both HT and ET, we combined some classical image processing techniques, such as blurring filters and mathematical morphology, with the extraction of the HT and ET contours being performed separately. Since the images were digitized, we applied a preprocessing step to reduce noise and undesirable artifacts by means of a 5 × 5 mean filter.² Later on, we extracted the exam template by thresholding the smoothed image, aiming to obtain a binary mask $M_{ET}(I_i)$.
This step is performed as follows:

$$M_{ET}(I_i) = \begin{cases} 0 & \text{if } R_i(I) < 100 \wedge G_i(I) < 100 \wedge B_i(I) < 100, \\ 1 & \text{otherwise,} \end{cases} \tag{1}$$

where $R_i(I)$, $G_i(I)$ and $B_i(I)$ stand for the value of pixel $i$ of the input image $I$ considering the channels “Red”, “Green” and “Blue”, respectively. If the condition in Equation 1 is satisfied, the pixel is treated as foreground (ET) and set to 0 (“black” color); otherwise, it is treated as background and set to 1 (“white” color), as displayed in Fig. 3. Since the ET in the original image is supposed to be black or near-black (the original, empty form is colorless), it is reasonable to assume low brightness values for such pixels when looking for the form itself. Finally, we applied an opening operation (erosion followed by a dilation) to guarantee a fully connected ET. Fig. 2 shows the proposed pipeline for the ET extraction.

² Notice that the size of this convolutional kernel was set up empirically.

Fig. 1 – Handwriting exams (a) filled out by a 56-year-old PD patient, and (b) an empty exam with the templates.

Fig. 2 – Image processing steps concerning ET extraction.

In regard to the HT extraction step, we employed a methodology similar to the one used to extract the ET, but now with some additional steps and a different thresholding method, since both HT and the background are blue-colored due to the digitization process. First, we applied a 5 × 5 mean filter followed by a 5 × 5 median filter to smooth the image in order to reduce noise and small artifacts, mainly those around the HT’s borders (once again, both filter sizes were determined empirically). Further, the filtered image $F$ is thresholded using the following equation:

$$M_{HT}(F_i) = \begin{cases} 255 & \text{if } |R_i(F) - G_i(F)| < 40 \wedge |R_i(F) - B_i(F)| < 40 \wedge |G_i(F) - B_i(F)| < 40, \\ F_i & \text{otherwise,} \end{cases} \tag{2}$$
where $F_i$ represents the brightness of pixel $i$. The intuitive idea behind this step is to remove pixels with quasi-similar values for the three channels (i.e., background pixels), and to maintain pixels with considerable differences between the channels (foreground, i.e., HT pixels). Fig. 3 shows the proposed pipeline for the HT extraction.³ As a matter of fact, the fixed thresholds employed in this work obtained better results than some automatic approaches. Although they may not work well with images acquired through a procedure different from the one used here, they seemed to be very suitable for the protocol adopted in this work.

³ Notice that the value 255 in Equation 2 stands for the triplet (255, 255, 255), since we have an RGB image as the result of the thresholding operation.

Fig. 3 – Image processing steps concerning HT extraction.

Fig. 4 illustrates a spiral and a meander image and their corresponding ET and HT extracted using the proposed methodology. One can observe the quality of both template and trace extracted from the images.

Fig. 4 – Spiral and meander images and their corresponding HT and ET extracted using the proposed methodology for a (a) spiral and a (b) meander.

2.2.2. Feature extraction

The feature extraction step aims at describing both HT and ET, and then comparing them in order to evaluate the “amount of difference” between the two images. This difference is computed over points sampled at the very same positions in the HT and ET images. At each point, we extract a set of features that will represent the whole template or handwritten trace. First, we need a concise and compact representation of both HT and ET, which is accomplished here by means of the skeleton of the thresholded images.
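Looking back at Section 2.2.1, the two fixed-threshold rules (Equations 1 and 2) can be made concrete with a small sketch. This is only an illustration under stated assumptions: the image is stored as rows of (R, G, B) tuples, the function names are hypothetical, and the smoothing filters and the morphological opening described in the text are deliberately left out.

```python
# Sketch of the fixed-threshold masks of Equations 1 and 2 (pure Python).
# Assumption: an image is a list of rows, each row a list of (R, G, B) tuples.

WHITE = (255, 255, 255)


def et_mask_pixel(pixel, limit=100):
    """Equation 1: 0 (template, black) if all channels are below `limit`, else 1."""
    r, g, b = pixel
    return 0 if (r < limit and g < limit and b < limit) else 1


def ht_mask_pixel(pixel, limit=40):
    """Equation 2: near-gray pixels become white; colored (ink) pixels are kept."""
    r, g, b = pixel
    near_gray = (abs(r - g) < limit and abs(r - b) < limit
                 and abs(g - b) < limit)
    return WHITE if near_gray else pixel


def apply_rule(image, rule):
    """Apply a per-pixel rule to every pixel of the image."""
    return [[rule(p) for p in row] for row in image]


# Hypothetical 1x3 image: dark template pixel, light paper, blue ink.
image = [[(30, 30, 35), (240, 238, 235), (40, 60, 200)]]
et = apply_rule(image, et_mask_pixel)
ht = apply_rule(image, ht_mask_pixel)
print(et)
print(ht)
```

In this toy image only the dark pixel is marked as template by the first rule, and only the blue pixel survives the second rule, which mirrors the intuition given for Equation 2: pixels whose three channels are close to each other are discarded as background.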
The skeletons of the HT and ET images are extracted with the Zhang–Suen thinning algorithm [23], which consists of two parallel routines: (i) to remove the south-east boundary points and the north-west corner points, and (ii) to remove the north-west boundary points and the south-east corner points. Fig. 5 depicts the thinning result of the spiral and meander templates, as well as the handwritten trace.

Fig. 5 – Thinning of HT and ET using Zhang–Suen algorithm.

Even after the pre-processing step, the template and handwritten images may contain small discontinuities (blue lines in Fig. 6(a)). Therefore, we need to select the sample points from the template and handwritten spiral/meander very carefully, and points in regions that contain discontinuities should be discarded. This phase is crucial, since it has a considerable influence on the feature extraction step, which may affect the learning process as well.

In regard to the selection of sampled points, we trace 360 rays⁴ from the center of the spiral/meander to the image borders. For this task, we create two empty lists: (i) template points and (ii) handwritten points. The ray tracing process begins from the outermost point of the spiral or the meander. For each ray, we capture its intersections with the template and the handwritten trace. If the ray intersects only one of the images, the point is discarded; otherwise, the pair of points is inserted in their respective lists (template or handwritten). With this procedure, we can guarantee a fair sampling by considering only points present in both images. Fig. 6(b) shows a thinned meander with overlapped traces (template and handwritten), as well as the highlighted sampling points obtained by means of the proposed fair sampling process.

⁴ Notice that the value 360 was obtained empirically, since this amount of sampling points has shown a good trade-off between efficiency and accuracy.

Fig. 6 – Sampling process: (a) a certain region with discontinuities, and (b) the proposed fair sampling process. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

After the sampling process, we extract nine numeric features from the skeletons (i.e., HT and ET) by measuring the statistical differences between them. However, prior to the feature description, we introduce the definition of the “radius” of a spiral or meander point, which is the length of the straight line that connects this point to the center of the spiral or meander, as displayed in Fig. 7. The red point marks the center of the spiral or meander, and a few random (white) points on the thinned spiral (skeleton) are connected to it by straight arrows. A brief description of each feature is given below:

• f1: Root Mean Square (RMS) of the difference between HT and ET radius. The RMS is computed as follows:

$$\mathrm{RMS} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( r_{HT}^{i} - r_{ET}^{i} \right)^{2}}, \tag{3}$$

where $n$ is the number of sample points drawn from each HT and ET skeleton, and $r_{HT}^{i}$ and $r_{ET}^{i}$ denote the HT and ET radius considering the $i$-th sampled point, respectively.

• f2: the maximum difference between HT and ET radius, i.e.:

$$\Delta_{max} = \operatorname*{argmax}_{i} \left\{ r_{HT}^{i} - r_{ET}^{i} \right\}; \tag{4}$$

• f3: the minimum difference between HT and ET radius, i.e.:

$$\Delta_{min} = \operatorname*{argmin}_{i} \left\{ r_{HT}^{i} - r_{ET}^{i} \right\}; \tag{5}$$

• f4: the standard deviation of the differences between HT and ET radius;

• f5: Mean Relative Tremor (MRT): Pereira et al. [22] proposed this quantitative evaluation to measure the “amount of tremor” of a given individual’s HT, defined as the mean difference between the radius of a given sample and its d left-nearest neighbors.
The MRT is computed as follows: MRT = − − − + = ∑1 1 n d r rET i ET i d i d n , (6) where d is the displacement of the sample points used to compute the radius difference.5 The following three fea- tures are computed based on the relative tremor r rET i ET i d− − +1 ; • f6: the maximum ET; • f7: the minimum ET; • f8: the standard deviation of ET values; • f9: the number of times the difference between HT and ET radius changes from negative to positive, or vice-versa. Finally, the features were normalized as follows: f f i i i i ′ = − μ σ , (7) where fi′ denotes the normalized version of feature fi, and µi and σi stand for the average and standard deviation of feature fi, i = 1, 2, …, 9. 3. Experiments and results In this section, we present the experimental results to access the robustness of the proposed dataset and feature extrac- tion approach.6 Also, we evaluate three pattern recognition techniques: Naïve Bayes (NB), Optimum-Path Forest (OPF), and Support Vector Machines with Radial Basis Function (SVM- RBF). Note that the kernel parameters concerning SVM are optimized through cross-validation. In regard to OPF, we used LibOPF [24], and with respect to NB and SVM, we used scikit- learn [25]. In order to evaluate the proposed approach, we performed three different rounds of experiments. The first one (Section 3.1) uses 75% of the dataset for training purposes and the re- maining 25% for the classification phase. However, instead of partitioning the dataset randomly, we created four subsets in order to guarantee that each individual will be represented in the dataset with its 3 spirals/meanders, with the remaining one being used for classification purposes. In this experi- ment, the spiral- and meander-based datasets are used individually. In the second experiment (Section 3.2), we decided to conduct a cross-validation procedure with 20 runnings. Now, we no longer guarantee each individual will be represented in both training and test sets. 
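The constrained partition used in the first round can be sketched as follows. The text does not state how the four subsets are indexed, so this sketch assumes that subset k simply holds out the k-th drawing of every individual; the tuple layout and the function name are hypothetical and do not reflect the HandPD file format.

```python
# Sketch of a constrained hold-out: each split trains on 3 drawings of every
# individual and tests on the remaining one.
# Assumption: a sample is (individual_id, drawing_index, label), with
# drawing_index in 1..n_drawings.


def constrained_holdout_splits(samples, n_drawings=4):
    """Yield (train, test) lists; split k holds out drawing k of everyone."""
    for held_out in range(1, n_drawings + 1):
        train = [s for s in samples if s[1] != held_out]
        test = [s for s in samples if s[1] == held_out]
        yield train, test


# Toy data: three hypothetical individuals with four drawings each.
people = [("c01", "control"), ("p01", "patient"), ("p02", "patient")]
samples = [(pid, k, label) for pid, label in people for k in range(1, 5)]

splits = list(constrained_holdout_splits(samples))
for train, test in splits:
    # 75% of the drawings go to training, 25% to testing.
    print(len(train), len(test))
```

With this scheme every individual contributes to both sides of each split, which is the property a purely random 75/25 partition would not guarantee.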
In the third round (Section 3.3), we conducted some experiments in order to check whether we can benefit from learning over spirals and meanders in one single approach, i.e., by using them together. Finally, we present a discussion about the experiments, as well as some insights about this research.

⁵ In this work, we used d = {1, 3, 5, 7, 10, 15, 20}, with d = 10 being the one that maximized the PD recognition rate.

⁶ The proposed dataset and the extracted features are available at http://wwwp.fc.unesp.br/~papa/pub/datasets/Handpd.

Fig. 7 – Some random points and the straight lines representing their connections with the spiral’s and meander’s center point. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3.1. Experiment 1

Since each individual contributes four spirals/meanders to the datasets, we employed a constrained hold-out approach to guarantee that each individual is represented in both training and testing sets for both the spiral- and meander-based datasets. Tables 1 and 2 display the mean recognition rates considering all four configurations of training and test sets for the spiral- and meander-based datasets, respectively. One can observe that NB obtained the best global results on the spiral dataset, while SVM achieved the best results on the meander dataset. Notice that the global accuracy is the one proposed by Papa et al. [15], which takes unbalanced datasets into account, while the recognition rates per class (i.e., control and patient groups) are computed using the standard approach (the ratio between correct classifications and the total number of samples of that specific class). The values in bold stand for the most accurate ones considering the standard deviation only.
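The two kinds of rates described above can be illustrated with a short sketch. The exact formulation of the global measure is given in [15]; the sketch assumes that, for two classes, it reduces to the mean of the per-class recognition rates, which is consistent with the values reported in Tables 1 and 2 (for instance, (31.94 + 76.35)/2 ≈ 54.15 for OPF on spirals). Function names and the toy labels are hypothetical.

```python
# Sketch: per-class recognition rates and a global accuracy that accounts
# for class imbalance (assumed here to be the mean of the per-class rates).


def per_class_rates(y_true, y_pred):
    """Correct classifications divided by the number of samples, per class."""
    rates = {}
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        correct = sum(1 for t, p in zip(y_true, y_pred)
                      if t == cls and p == cls)
        rates[cls] = correct / total
    return rates


def balanced_global_accuracy(y_true, y_pred):
    """Mean of the per-class rates, so each class weighs the same."""
    rates = per_class_rates(y_true, y_pred)
    return sum(rates.values()) / len(rates)


# Toy test set: 2 controls and 8 patients; the classifier always answers
# "patient", as a strongly biased model would.
y_true = ["control"] * 2 + ["patient"] * 8
y_pred = ["patient"] * 10

rates = per_class_rates(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
global_acc = balanced_global_accuracy(y_true, y_pred)
print(rates, plain, global_acc)
```

The plain ratio of correct answers rewards the biased classifier, whereas the balanced measure stays at chance level, which is the behavior one wants on a dataset dominated by patients.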
As mentioned above, in this round of experiments we used an approach similar to a 4-fold cross-validation, but we ensured that three drawings of the same individual are available for training purposes. Therefore, we can guarantee that all individuals are represented in both training and test sets. However, as we have only four accuracy values to compute the mean recognition rates and their standard deviation, we did not employ any robust statistical evaluation in this experiment.

Curiously, each classifier behaves differently on the two datasets. Note that OPF obtained better results on the meander dataset than on the spiral dataset, while the opposite holds for NB. This motivated us to consider a bag-of-classifiers in order to check whether a combination of all classifiers would improve the results or not (Section 3.3).

3.2. Experiment 2

In this section, we consider a cross-validation procedure with 20 runs to assess the robustness of the proposed approach under a different scenario. We can no longer guarantee that each individual will be represented in both training and test sets, but we can obtain more conclusive results by means of the Wilcoxon signed-rank statistical test [26]. In this work, we used a significance level of 0.05. Tables 3 and 4 present the mean recognition rates considering the spiral- and meander-based datasets, respectively. Once again, we can observe results very similar to the ones obtained in the previous section. The values in bold stand for the most accurate techniques considering the aforementioned statistical evaluation.

Since the dataset is dominated by patients, all classifiers achieved better recognition rates for that class, except for NB on the spiral and meander datasets. In fact, with respect to this classifier, the accuracy rates per class were similar to each other on the spiral dataset, but considerably distinct on the meander dataset.
NB seemed to handle control individuals better, but we can also observe the highest standard deviations for this classifier.

3.3. Experiment 3

In this section, we conducted an experiment to check whether we can benefit from information learned from both drawings. We used a standard majority voting for each classifier, and in case of ties, we opted to use the classification given by the meander dataset, since it has been the most accurate (Section 3.2). Table 5 presents the mean accuracy rates for each class, as well as the global accuracy. Notice that we used the very same sets employed in the first experiment (Section 3.1), since we can guarantee that both the spiral and the meander analyzed at a given time step of the classification algorithm come from the same individual.

The results showed that one may not benefit from the combined information of spirals and meanders, since the results are now worse than the ones obtained with meanders only. The main problem is related to the inconsistency among samples from the control and patient groups. That means we cannot observe that different drawings help each other, since we have inconsistencies within the very same exam for different patients. In the next section, we discuss these statements in more detail.

Table 1 – Experimental results considering the spiral-based dataset.

                 OPF               NB                SVM
Control group    31.94% ± 5.32     62.50% ± 5.32     2.78% ± 5.56
Patient group    76.35% ± 3.22     69.26% ± 7.18     99.66% ± 0.68
Global           54.15% ± 3.58     65.88% ± 4.57     51.22% ± 2.91

Table 2 – Experimental results considering the meander-based dataset.

                 OPF               NB                SVM
Control group    34.72% ± 8.33     20.83% ± 41.67    36.11% ± 9.62
Patient group    85.81% ± 4.20     79.73% ± 33.34    96.62% ± 2.59
Global           60.27% ± 4.02     50.28% ± 4.18     66.37% ± 4.01

Table 3 – Average results considering the spiral-based dataset and a cross-validation with 20 runs.

                 OPF               NB                SVM
Control group    26.39% ± 9.17     65.56% ± 11.48    1.67% ± 4.07
Patient group    78.58% ± 5.02     62.91% ± 12.65    98.65% ± 4.34
Global           52.48% ± 5.32     64.23% ± 7.11     50.16% ± 1.71

Table 4 – Average results considering the meander-based dataset and a cross-validation with 20 runs.

                 OPF               NB                SVM
Control group    32.78% ± 12.08    80.83% ± 16.37    36.94% ± 10.71
Patient group    82.30% ± 3.72     37.57% ± 22.83    96.49% ± 2.50
Global           57.54% ± 6.35     59.20% ± 4.78     66.72% ± 5.33

Table 5 – Average results considering the combination process between spirals and meanders using the constrained 4-fold approach.

                 OPF               NB                SVM
Control group    64.96% ± 16.29    27.30% ± 37.36    12.50% ± 25.00
Patient group    60.23% ± 4.73     70.36% ± 39.08    96.49% ± 2.50
Global           55.86% ± 3.63     45.79% ± 4.15     58.61% ± 2.84

3.4. Discussion

The experiments conducted in this paper lead us to three main conclusions: (i) first, ensuring that the very same patient appears in both training and testing sets does not seem to benefit the final classification rates, since the results obtained in Sections 3.1 and 3.2 were very similar to each other; (ii) second, meanders can provide more reliable recognition rates; and (iii) finally, it seems that combining the information provided by spirals and meanders does not benefit the final classification rates.

The main problem related to automatic PD recognition lies in patients in the initial stage of the disease, since they often do not present any symptoms related to tremors. Fig. 8 depicts some examples of spirals from both control and patient groups. If we consider Fig. 8b and 8c, for instance, the former belongs to a control individual, and the latter belongs to a patient. Clearly, the patient exam looks like one from someone who is not affected by the disease, i.e., it is very similar to Fig. 8a. The high variability of the dataset may lead the classifiers to errors as well.
The main idea in designing such a dataset, however, is precisely to capture this sort of problem, which is not straightforward to solve. Obviously, Fig. 8d is easier to label as a patient than Fig. 8c, but the converse does not hold. Actually, the main problem arises when Fig. 8c is in the training set rather than in the test set. In the former situation, this exam has a high probability of being an outlier, thus leading the learning process to mistakes in the classification phase. The latter situation usually affects that sample only, i.e., it will probably be labeled as control. Although this may decrease the overall classification rate, the major problem is that such an exam will be a false negative, thus postponing the treatment of the disease.

Fig. 9 displays some meanders from both control and patient groups. A situation similar to the one faced with spirals can also be observed with meanders. The high variability of the dataset makes the classifiers more prone to errors, which makes identifying PD in the early stages quite complicated. However, the proposed approach obtained a recognition rate of ≈67% using meanders, which we consider a very suitable result. As mentioned earlier, we are not aware of any image-based dataset of this kind available on the internet, nor of any work using the feature extraction pipeline adopted here.

4. Conclusion

In this paper, we dealt with the problem of Parkinson's disease recognition by combining machine learning and computer vision techniques. The main contributions are the design of a new dataset that contains images of both spirals and meanders, cropped from digitized handwritten exams, and a pipeline that can deal with the problem of learning from non-registered images. The proposed approach can automatically extract both the template and the handwritten trace from each exam for further feature extraction and classification.
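As a reading aid for the recognition rates discussed above, the sketch below computes per-class accuracies, a global rate, and the number of false negatives (patients labeled as control). Taking the global rate as the mean of the two class accuracies is an assumption: it agrees with the entries of Tables 1–4 (for instance, (31.94 + 76.35)/2 ≈ 54.15 for OPF in Table 1), but the evaluation code is not given in this section. The function names and the toy labels are hypothetical.

```python
def class_accuracy(y_true, y_pred, label):
    """Fraction of the samples whose true class is `label` that were
    also predicted as `label`."""
    preds = [p for t, p in zip(y_true, y_pred) if t == label]
    if not preds:
        raise ValueError(f"no samples of class {label!r}")
    return sum(p == label for p in preds) / len(preds)


def summarize(y_true, y_pred, negative="control", positive="patient"):
    """Per-class accuracies, their mean (assumed global rate), and the
    count of false negatives, i.e., patients not labeled as patient."""
    acc_neg = class_accuracy(y_true, y_pred, negative)
    acc_pos = class_accuracy(y_true, y_pred, positive)
    false_negatives = sum(
        t == positive and p != positive for t, p in zip(y_true, y_pred)
    )
    return {
        negative: acc_neg,
        positive: acc_pos,
        "global": (acc_neg + acc_pos) / 2,
        "false_negatives": false_negatives,
    }


# Toy, made-up labels: 4 control and 8 patient exams.
y_true = ["control"] * 4 + ["patient"] * 8
y_pred = ["control", "patient", "patient", "patient"] + ["patient"] * 7 + ["control"]
report = summarize(y_true, y_pred)
print(report)
```

In this toy case the plain fraction of correct labels is 8/12, while the class-averaged rate is lower because the smaller control class is mostly misclassified; this is why a class-averaged rate is the more cautious figure when patients outnumber controls.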
The experimental results lead us to conclude that meanders are more informative than spirals, since the latter pose a greater challenge due to the contours inherent to their shape. Also, the combination of both approaches did not seem to improve the results. The main problem is related to the high variability of the dataset, which comprises patients at the very early stages of the disease, who are thus very difficult to diagnose. As future work, we intend to enlarge the dataset with more samples from the control group, as well as to design new features that can better distinguish between control individuals and patients.

Fig. 8 – Spirals from the control group (a, b) and from the patient group (c, d).

Fig. 9 – Meanders from the control group (a, b) and from the patient group (c, d).

Acknowledgment

The authors are grateful for the CAPES PROCAD 2966/2014 grant, FAPESP grants #2009/16206-1, #2013/20387-7 and #2014/2014/16250-9, as well as CNPq grants #303182/2011-3, #70571/2013-6 and #306166/2014-3.