ORIGINAL ARTICLE

Embedded real-time speed limit sign recognition using image processing and machine learning techniques

Samuel L. Gomes1 • Elizângela de S. Rebouças1 • Edson Cavalcanti Neto1 • João P. Papa2 • Victor H. C. de Albuquerque3 • Pedro P. Rebouças Filho1 • João Manuel R. S. Tavares4

Received: 1 September 2015 / Accepted: 21 May 2016 / Published online: 3 June 2016
© The Natural Computing Applications Forum 2016

Abstract The number of traffic accidents in Brazil has reached alarming levels and is currently one of the leading causes of death in the country. With the number of vehicles on the roads increasing rapidly, these problems will tend to worsen. Consequently, huge investments in resources to increase road safety will be required. The vertical R-19 system for optical character recognition of regulatory traffic signs (maximum speed limits) according to Brazilian Standards developed in this work uses a camera positioned at the front of the vehicle, facing forward. This is so that images of traffic signs can be captured, enabling the use of image processing and analysis techniques for sign detection. This paper proposes the detection and recognition of speed limit signs based on a cascade of boosted classifiers working with Haar-like features. The recognition of the sign detected is achieved based on the optimum-path forest classifier (OPF), support vector machines (SVM), multilayer perceptron, k-nearest neighbor (kNN), extreme learning machine, least mean squares, and least squares machine learning techniques. The SVM, OPF and kNN classifiers had average accuracies higher than 99.5 %; the OPF classifier with a linear kernel took an average time of 87 μs to recognize a sign, while kNN took 11,721 μs and SVM 12,595 μs. This sign detection approach found and recognized successfully 11,320 road signs from a set of 12,520 images, leading to an overall accuracy of 90.41 %.
Analyzing the system globally, the recognition accuracy was 89.19 %, as 11,167 road signs from a database with 12,520 signs were correctly recognized. The processing speed of the embedded system varied between 20 and 30 frames per second. Therefore, based on these results, the proposed system can be considered a promising tool with high commercial potential.

Keywords Cascade Haar-like features • Pattern recognition • Computer vision • Automotive applications

Pedro P. Rebouças Filho, pedrosarf@ifce.edu.br
Samuel L. Gomes, samuelluz@lapisco.ifce.edu.br
Elizângela de S. Rebouças, elizangelareboucas@lapisco.ifce.edu.br
Edson Cavalcanti Neto, edsoncavalcantineto@gmail.com
João P. Papa, papa@fc.unesp.br
Victor H. C. de Albuquerque, victor.albuquerque@unifor.br
João Manuel R. S. Tavares, tavares@fe.up.pt

1 Laboratório de Processamento Digital de Imagens e Simulação Computacional, Instituto Federal de Educação, Ciência e Tecnologia do Ceará (IFCE), Ceará, Brazil
2 Departamento de Ciência da Computação, Universidade Estadual Paulista, Bauru, São Paulo, Brazil
3 Programa de Pós-Graduação em Informática Aplicada, Laboratório de Bioinformática, Universidade de Fortaleza, Fortaleza, CE, Brazil
4 Instituto de Ciência e Inovação em Engenharia Mecânica e Engenharia Industrial, Departamento de Engenharia Mecânica, Faculdade de Engenharia, Universidade do Porto, Porto, Portugal

Neural Comput & Applic (2017) 28 (Suppl 1):S573–S584
DOI 10.1007/s00521-016-2388-3
http://orcid.org/0000-0002-1878-5489

1 Introduction

Car accidents are one of the major causes of death worldwide. Estimates made by the Secretary of Politics of Social Security show that, in Brazil, the number of people permanently disabled as a consequence of traffic accidents increased from 33,000 to 352,000 between 2002 and 2012.
In addition, the number of deaths increased from 46,000 to 60,000 in the same time period. Consequently, close to one million benefits paid nowadays by the National Social Security Institute (INSS) are for victims of car accidents. This cost represents more than 12 billion Brazilian Reais from INSS funds. The data from the Lider Insurance Company also indicate that most victims are between the ages of 18 and 40. The benefit that generates the greatest expense to the INSS is retirement due to disability, because it is a benefit that is paid for a long period of time and in most cases to young people [21].

Given this scenario, manufacturers such as Volvo, Toyota and Ford are investing in technologies like advanced driver assistance systems (ADAS) in their vehicles to ensure the safety of occupants by helping to avoid potential accidents. The United States Department of Transportation estimates that an investment of US$ 1.2 billion in Intelligent Transportation Systems (ITS) technology would generate a return of US$ 30.2 billion in approximately 20 years. Likewise, Japan has been investing US$ 700 million annually in these technologies since 2004, while South Korea plans to invest US$ 3.2 billion between the years of 2008 and 2020 [36].

The goal of ADAS is to assist drivers and consequently to significantly decrease the number of accidents. These systems use technologies such as global positioning, radar, image sensors and techniques of computer vision. A study by the US Insurance Institute for Highway Safety estimated that the use of intelligent assistance systems that are already available on the market, such as lane departure warning (LDW), forward collision warning (FCW), blind spot detection and adaptable headlights, can prevent or ameliorate one in three fatal collisions and one in five collisions that result in moderate or severe injuries [36].
Observing this trend, this work explores the use of computer vision as a solution for another major traffic safety problem: the driver's lack of attention to road signs, which causes a large number of accidents.

Many ADAS use computer vision techniques in their operation. For example, the lane departure warning system (LDWS) alerts the driver when he or she is veering out of or changing lanes by sending visual, audio and/or vibrational signals. This system is designed to minimize accidents by addressing the main causes, such as distraction and sleepiness [54, 60]. Adaptive cruise control systems read the speed limit signs through computer vision and alert the driver if he/she is not obeying the limit [61]. Assistance for vehicle parking uses ultrasonic sensors and/or computer vision to indicate the proximity of objects. Current systems actively control the steering wheel, leaving the driver only with the control of moving the vehicle [12].

Other intelligent assistance systems are the blind spot warning system (BSWS) and the sleepiness detector. The BSWS monitors the blind spots on the passenger side and alerts the driver if there is any risk of collision while he/she is maneuvering [59]. The sleepiness detector is an intelligent system that monitors the driver's facial expressions to perceive the state of the driver's attention, alerting the driver that he or she should rest if signs of sleepiness or tiredness are detected [19].

Some car manufacturers have developed technologies such as autonomous braking, which can stop a car when other vehicles or obstacles are very close, and lane keeping support, which applies a corrective force on the steering when the driver veers from the correct lane.
A cruise control adapter can also be used to automatically maintain a safe speed and a safe distance relative to other vehicles, which in its most active form can prevent a driver from exceeding the speed limit.

Barthès and Bonnifait state that the development of advanced driver assistance systems that assist the driver and inform him/her of the road conditions can contribute significantly to reducing the number of accidents [8]. These systems should respond in real time to be useful, and some of the major technologies used to meet this requirement are global positioning systems, image sensors and computer vision.

Cavalcanti Neto et al. [36] proposed a system to detect and recognize Brazilian vehicle license plates, in which registered users have permission to enter a specific area. These authors used digital image processing techniques, such as the Hough transform, morphology, thresholding and the Canny edge detector, to extract the characters, as well as least squares, least mean squares, extreme learning machine and multilayer perceptron neural network classifiers to identify the numbers and letters. Neto et al. [36] used motion detection to accelerate the embedded application previously developed, because only the moving regions in the image were analyzed; this is not possible here because everything is moving in the input images.

The main aim of this work was to develop an Android application that can detect and recognize speed limit signs in real time. The system displays the detected sign in the car, as shown in Fig. 1. In order to develop the system, techniques of digital image processing (DIP) are used to extract, i.e., segment, the characters, along with techniques of machine learning (ML) to recognize the symbols obtained during the DIP stage.

This paper proposes detection of speed limit signs based on a cascade of boosted classifiers working with Haar-like features in the DIP step.
It also proposes the normalization of attributes for standardization of characters in the DIP stage, which is usually done in the pattern recognition step. Among the contributions of this work for the pattern recognition step is the evaluation of seven classification methods and some of their variations in terms of their suitability for a real-time embedded application. The recognition algorithms used were the k-nearest neighbors (kNN), optimum-path forest classifier (OPF) configured with seven distance functions, least squares (LS), least mean squares (LMS), extreme learning machine (ELM), artificial neural network multilayer perceptron (MLP) and support vector machines (SVM) configured with four kernels. As far as the authors know, this is the first time that the OPF classifier has been analyzed in an embedded system.

2 Speed limit signs detection

The methodology proposed for speed limit sign detection is based on a cascade of boosted classifiers working with Haar-like features [30, 56]. This proposal will be compared to the methodology suggested by Neto et al. [36].

2.1 Based on a cascade of boosted classifiers working with Haar-like features

Viola and Jones [56] proposed a rapid object detection algorithm using a boosted cascade of simple features, and Lienhart and Maydt [30] proposed an extended set of Haar-like features for rapid object detection. This approach has been applied in various applications to detect objects in real time, such as face detection [17, 33, 47], pedestrian detection [43, 63], license plate detection [64] and object classification in microscopy [3].

However, as far as the authors know, this methodology has never been used for applications such as road sign detection, nor implemented in an embedded solution.

The classifier used to detect speed limit signs is trained with sample images, called positive and negative examples [30, 56].
The positive examples included hundreds of images with speed signs, and the negative examples included arbitrary images without valid signs. After a classifier is trained, it can be applied to a region of interest in an input image. The classifier outputs a "1" if the region is likely to show a sign, and "0" otherwise. To search for the object in the whole image, the search window moves across the image and checks every location using the classifier. The classifier is designed so that it can easily be rescaled to find objects of interest of different sizes, which is more efficient than resizing the image itself. So, to find an object of an unknown size in the image, the scan procedure is done several times at different scales.

Fig. 1 Illustration of the system developed in this work

2.2 Based on the Hough transform and Canny edge detector

This approach was proposed by Neto et al. [36] for the detection of license plates according to the Brazilian Standards. It uses the Canny operator combined with the Hough transform to detect objects.

The Canny edge detector performs two tasks: filtering noise and highlighting the pixels that define the border of an object in a digital image [16, 46]. To develop this algorithm, primary studies were focused on optimal borders that can be represented by functions in one dimension (1-D) [11, 53]. The authors showed that the best filter to start their algorithm was a smoothing filter, the Gaussian operator, followed by a Gaussian derivative, which in one dimension can be given by [31]:

$$ d = -\frac{x}{\sigma^2}\, e^{-\frac{x^2}{2\sigma^2}}, \tag{1} $$

where $\sigma^2$ is the data variance and $x$ is the input data. The Canny algorithm was designed to have three main properties: minimum error detection, good border localization and minimal response (a single response per edge).
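As an illustration of Eq. 1, the following minimal Python sketch samples the Gaussian derivative at integer offsets and slides it over a 1-D step edge. The truncation of the kernel at about $3\sigma$, the function names and the toy signal are assumptions made here for illustration only; they are not taken from the implementation of [36].

```python
import math


def gaussian_derivative_kernel(sigma, radius=None):
    """Sample d(x) = -(x / sigma^2) * exp(-x^2 / (2 sigma^2)) at integer x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if radius is None:
        # Assumption: truncate the kernel at about three standard deviations.
        radius = int(math.ceil(3 * sigma))
    return [-(x / sigma ** 2) * math.exp(-(x * x) / (2 * sigma ** 2))
            for x in range(-radius, radius + 1)]


def correlate_1d(signal, kernel):
    """Slide the kernel over the signal (fully overlapping positions only)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


# Toy 1-D "border": a step from dark (0) to bright (1) between samples 7 and 8.
signal = [0.0] * 8 + [1.0] * 8
kernel = gaussian_derivative_kernel(sigma=1.0)
response = correlate_1d(signal, kernel)

# The strongest response (in absolute value) marks the position of the edge.
edge_index = max(range(len(response)), key=lambda i: abs(response[i]))
print(edge_index, response[edge_index])
```

The response is zero over flat regions, because the kernel is antisymmetric and sums to zero, and it peaks in magnitude where the window is centered on the step.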
Edge detection has been used in many applications for object segmentation [2, 35, 44, 45]; the Canny detector is commonly used to find objects, and the Hough transform to recognize the right objects.

The Hough transform is a feature extraction technique used in digital image processing [15]. The aim of the technique is to find imperfect instances of objects of the desired class of shapes through a voting process, in which the algorithm elects the objects that have similarities to the desired shape. The classic Hough transform was designed to identify lines in images, and it has been extended to identify other shapes, like circles or ellipses [15, 62]. Neto et al. [36, 37] applied it to detect lines, but in this work we will use it to detect circles.

3 Speed limit sign recognition

In this section, the seven machine learning techniques under comparison are introduced.

3.1 Support vector machines

One of the fundamental goals of learning theory can be stated as: given two classes of known objects, assign one of them to a new unknown object. Thus, the objective in two-class pattern recognition is to infer a function [52]:

$$ f : X \rightarrow \{\pm 1\}, \tag{2} $$

from the input-output pairs of the training data.

Based on the principle of structural risk minimization [55], the SVM optimization process is aimed at establishing a separating function while accomplishing a trade-off between generalization and over-fitting. Vapnik [55] considered a class of hyperplanes in some dot product space $H$:

$$ \langle w, x \rangle + b = 0, \tag{3} $$

where $w, x \in H$ and $b \in \mathbb{R}$, corresponding to the decision function:

$$ f(x) = \operatorname{sgn}(\langle w, x \rangle + b), \tag{4} $$

and, based on the following two arguments, the author proposed the Generalized Portrait learning algorithm for problems that are separable by hyperplanes:

1. Among all hyperplanes separating the data, there exists a unique optimal hyperplane distinguished by the maximum margin of separation between any training point and the hyperplane;
2. The over-fitting of the separating hyperplanes decreases with increasing margin.

Thus, to construct the optimal hyperplane, it is necessary to solve:

$$ \underset{w \in H,\; b \in \mathbb{R}}{\text{minimize}} \quad \tau(w) = \frac{1}{2} \|w\|^2, \tag{5} $$

subject to:

$$ y_i(\langle w, x_i \rangle + b) \geq 1 \quad \text{for all } i = 1, \ldots, m, \tag{6} $$

with the constraint (6) ensuring that $f(x_i)$ will be $+1$ for $y_i = +1$ and $-1$ for $y_i = -1$, and also fixing the scale of $w$. A detailed discussion of these arguments is provided in [52].

The function $\tau$ in (5) is called the objective function, while in Eq. 6 the functions are the inequality constraints. Together, they form a so-called constrained optimization problem. The separating function is then a weighted combination of elements of the training set. These elements are called Support Vectors and characterize the boundary between the two classes.

The replacement of the dot product by a kernel function, referred to as the kernel trick [52], is used to extend the concept of hyperplane classifiers to nonlinear support vector machines. However, even with the advantage of "kernelizing" the problem, the separating hyperplane may still not exist.

In order to allow some examples to violate Eq. 6, the slack variables $\xi_i \geq 0$ are introduced [52], which leads to the constraints:

$$ y_i(\langle w, x_i \rangle + b) \geq 1 - \xi_i \quad \text{for all } i = 1, \ldots, m. \tag{7} $$

A classifier that generalizes well is then found by controlling both the margin (through $\|w\|$) and the sum of the slack variables $\sum_i \xi_i$. As a result, a possible realization of such a soft margin classifier is obtained by minimizing the objective function:

$$ \tau(w, \xi) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \xi_i, \tag{8} $$

subject to the constraint in Eq. 7, where the constant $C > 0$ determines the balance between over-fitting and generalization. Due to the tuning variable $C$, these kinds of SVM-based classifiers are normally referred to as C-support vector classifiers (C-SVC) [14].

The implementation used here for the SVM is the one suggested in [10] and [13].
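To make Eqs. 4, 7 and 8 concrete, the minimal Python sketch below evaluates the linear decision function and the soft margin objective for a fixed hyperplane. It does not solve the optimization problem, which is the task of the libraries cited in [10] and [13]. The hyperplane, the toy samples, the function names and the convention of mapping a decision value of zero to $+1$ are illustrative assumptions. For a fixed $(w, b)$, the smallest slack that satisfies Eq. 7 is $\xi_i = \max(0, 1 - y_i(\langle w, x_i \rangle + b))$, which is what the code computes.

```python
def decision_value(w, b, x):
    """Return <w, x> + b for a linear hyperplane classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Eq. 4: sign of the decision value (zero is mapped to +1 by assumption)."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def soft_margin_objective(w, b, samples, labels, C):
    """Eq. 8 evaluated with the smallest slacks that satisfy Eq. 7."""
    slacks = [max(0.0, 1.0 - y * decision_value(w, b, x))
              for x, y in zip(samples, labels)]
    objective = 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)
    return objective, slacks


# Hypothetical hyperplane and toy data, not taken from the paper.
w, b, C = (1.0, 0.0), 0.0, 1.0
samples = [(2.0, 0.0), (0.5, 0.0), (-2.0, 0.0), (0.5, 1.0)]
labels = [1, 1, -1, -1]

objective, slacks = soft_margin_objective(w, b, samples, labels, C)
predictions = [predict(w, b, x) for x in samples]
print(objective, slacks, predictions)
```

In this toy case the second sample lies inside the margin (slack 0.5) and the fourth is misclassified (slack 1.5), so a larger $C$ would penalize this hyperplane more heavily.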
3.2 Optimum-path forest classifier

The optimum-path forest (OPF) is a framework for the design of pattern classifiers based on optimal graph partitions [39, 41], in which each sample is represented as a node of a complete graph, and the arcs between them are weighted by the distance between their corresponding feature vectors. The idea behind OPF is to rule a competition process between some key samples (prototypes) in order to partition the graph into optimum-path trees (OPTs), each rooted at a prototype. It is assumed that samples that belong to the same OPT are more strongly connected to their root (prototype) than to any other one in the optimum-path forest. Prototypes offer their costs to each node, and the prototype that offers the optimum-path cost conquers that node, which is then marked with the label of the corresponding prototype.

Let $Z = Z_1 \cup Z_2$ be a dataset labeled with a function $\lambda$, in which $Z_1$ and $Z_2$ are, respectively, the training and test sets, such that $Z_1$ is used to train a given classifier and $Z_2$ is used to assess its accuracy. Let $S \subset Z_1$ be a set of prototype samples. Essentially, the OPF classifier creates a discrete optimal partition of the feature space such that any sample $s \in Z_2$ can be classified according to this partition. This partition is an optimum-path forest (OPF) computed in $\mathbb{R}^n$ by the image foresting transform (IFT) algorithm [18].

The OPF algorithm may be used with any smooth path cost function which can group samples with similar properties [18]. This work used the path cost function $f_{max}$, which is computed as follows:

$$ f_{max}(\langle s \rangle) = \begin{cases} 0 & \text{if } s \in S, \\ +\infty & \text{otherwise}, \end{cases} \qquad f_{max}(\pi \cdot \langle s, t \rangle) = \max\{ f_{max}(\pi),\, d(s, t) \}, \tag{9} $$

in which $d(s, t)$ means the distance between samples $s$ and $t$, and a path $\pi$ is defined as a sequence of adjacent samples. Thus, $f_{max}(\pi)$ computes the maximum distance between adjacent samples in $\pi$, when $\pi$ is not a trivial path.

The implementation used here for the OPF is the one suggested in [1, 40].
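The competition process governed by Eq. 9 can be sketched in a few lines of Python. The sketch below is a simplified illustration and not the implementation of [1, 40]: the prototype set $S$ is passed in as a given (how prototypes are selected is outside this sketch), the distance is Euclidean, and the function names and the 1-D toy data are assumptions. Training propagates the cost $\max\{f_{max}(\pi), d(s, t)\}$ from the prototypes with a priority queue; classification assigns to a new sample the label of the training sample that offers it the lowest cost.

```python
import heapq
import math


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def opf_train(samples, labels, prototypes, dist=euclidean):
    """Propagate f_max costs from the prototypes over the complete graph."""
    n = len(samples)
    cost = [math.inf] * n          # f_max of a trivial path outside S
    out_label = [None] * n
    heap = []
    for p in prototypes:           # f_max of a trivial path inside S is 0
        cost[p] = 0.0
        out_label[p] = labels[p]
        heapq.heappush(heap, (0.0, p))
    done = [False] * n
    while heap:
        c, s = heapq.heappop(heap)
        if done[s] or c > cost[s]:
            continue               # outdated queue entry
        done[s] = True
        for t in range(n):
            if done[t]:
                continue
            offered = max(cost[s], dist(samples[s], samples[t]))
            if offered < cost[t]:  # s conquers t
                cost[t] = offered
                out_label[t] = out_label[s]
                heapq.heappush(heap, (offered, t))
    return cost, out_label


def opf_classify(x, samples, cost, out_label, dist=euclidean):
    """Label of the training sample offering the minimum f_max cost to x."""
    best = min(range(len(samples)),
               key=lambda s: max(cost[s], dist(samples[s], x)))
    return out_label[best]


# Toy 1-D training set with two classes; prototypes chosen by hand.
samples = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
labels = [0, 0, 0, 1, 1, 1]
cost, out_label = opf_train(samples, labels, prototypes=[2, 3])
print(cost, out_label)
print(opf_classify((3.0,), samples, cost, out_label),
      opf_classify((9.0,), samples, cost, out_label))
```

Classification only requires one pass over the training samples with a max and a distance evaluation per sample, which gives an idea of why the prediction cost depends mainly on the distance function chosen.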
3.3 Least Squares

Least Squares (LS) was used first by [42] and is a very popular technique for fitting a model to a varied dataset:

$$ y_k(i) = \varphi_k\left( \sum_{j=1}^{m} x_j \cdot w_{kj} \right). \tag{10} $$

From Eq. 10, the output value of the network can be obtained through:

$$ Y = W \cdot X, \tag{11} $$

where $Y$ corresponds to the output matrix stimulated by the input vector $X$. Therefore, the $W$ matrix has a dimension of $m \times (k + 1)$ because of the bias on the input system, that is: $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, k$ in Eq. 10 [9].

Among the proposed LS models, this paper adopted the model proposed by [29]. The input attributes of the classifier are in each column of the $X$ matrix, and the vectors of the classifier outputs are in each column of the $Y$ matrix. The goal is to isolate the $W$ matrix in Eq. 11. Removing the $X$ matrix from the right side of the equation would require multiplying by its inverse; however, only a square matrix can be inverted, so both sides are first multiplied by the transpose of $X$, which yields the square matrix $XX^T$:

$$ W = Y X^T (X X^T)^{-1}. \tag{12} $$

The optimal linear auto associative memory (OLAM) algorithm is used for both regression functions and classification. This classifier can be used either as a batch or iteratively depending on its application [23].

The implementation used here for the LS-based classifier is the one suggested in [21, 36].

3.4 Least mean squares

According to [22], a least mean square (LMS) network is based on the use of instantaneous values and the current values of the input network for the activation function [57]. The topology of the simple perceptron network is similar to the LS algorithm; however, they differ in their form of training. The output values are achieved as follows:

$$ Y(i) = W \cdot X(i), \tag{13} $$

where $Y(i)$ corresponds to the output of the simple perceptron stimulated by the input vector $X(i)$ and $i$ corresponds to the current iteration. Therefore, the $W$ matrix has dimensions $m \times (k + 1)$ due to the bias of the input system, where $i = 1, 2, \ldots, m$ [9, 58].

The neuron activation function $y(t)$ uses the sign (signum) function, and the error value is calculated at each iteration. The $W$ matrix is the weight matrix, which is obtained iteratively, using the derivative of the cost function:

$$ w(t + 1) = w(t) - \alpha \frac{\partial \xi(w)}{\partial w}, \tag{14} $$

where $w(t)$ is the value of the weights from the previous iteration and $\alpha$ is the learning rate [57]. Thus, the final result of $W$ is obtained iteratively through the LMS rule:

$$ w(t + 1) = w(t) + \alpha\, e(t)\, x(t), \tag{15} $$

in which $e(t)$ is the error calculated at iteration $t$ and $x(t)$ is the corresponding input.

The implementation used here for the LMS classifier is the one suggested in [36].

3.5 Extreme learning machine

The extreme learning machine (ELM) is a neural network with the topology of a single-hidden layer feedforward neural network (SLFN), i.e., a network that has a single hidden layer [7, 24].

The ELM trains its layers as follows: the weights of the hidden layer are randomly generated, and the output layer weights are computed after the activation function of the hidden layer has been applied. The output of the hidden layer is used as input to the output layer, and then the OLAM algorithm is used to obtain the values of the output layer weights [26].

The ELM algorithm, unlike other traditional algorithms, tends to reach a smaller training error and also the smallest norm of weights [26]. The disadvantage of the ELM is the need to use a high number of neurons in the hidden layer to reach higher hit rates, which makes the implementation of the algorithm in real-time embedded systems difficult, due to its high complexity and processing time [7, 27].

The weight matrix of the hidden layer, $w$, is generated randomly. After obtaining the weights, the network performs the activation of the neurons in the hidden layer from the input $x(t)$ of the system, thus obtaining the activated output of the hidden layer.
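Both the LS classifier of Sect. 3.3 and the output layer of the ELM rely on the closed-form solution of Eq. 12. The minimal Python sketch below computes $W = YX^T(XX^T)^{-1}$ with small hand-written matrix helpers; the helper names, the Gauss-Jordan inversion and the toy data (one attribute plus a bias row, with targets generated by $y = 1 + 2x$) are illustrative assumptions and do not reproduce the implementation of [21, 36]. For the ELM, $X$ would be replaced by the activated outputs of the hidden layer.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def least_squares_weights(X, Y):
    """Eq. 12: W = Y X^T (X X^T)^-1, with one sample per column of X and Y."""
    Xt = transpose(X)
    return matmul(matmul(Y, Xt), inverse(matmul(X, Xt)))


# Toy data: three samples (columns); first row is the bias input.
X = [[1.0, 1.0, 1.0],
     [0.0, 1.0, 2.0]]
Y = [[1.0, 3.0, 5.0]]   # targets generated by y = 1 + 2x

W = least_squares_weights(X, Y)
print(W)  # approximately [[1.0, 2.0]]
```

Note that $XX^T$ must be invertible, which requires at least as many linearly independent samples as rows in $X$.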
The output of the hidden layer becomes the input of the output layer, thus transforming the network into a linear network [25].

The implementation used here for the ELM is the one proposed in [36].

3.6 Multilayer perceptron

The multilayer perceptron (MLP) network is composed of single-layer neural networks (SLNN) organized in a cascade and subdivided into an input layer, one or more hidden layers and an output layer [22, 38, 49].

According to [34], an SLNN cannot represent functions that are not linearly separable. This problem is solved by the use of two or more layers of neurons with adaptive weights. However, it is necessary to use a training algorithm to adjust the weights in these layers [5], such as those that perform error backpropagation to compute the errors of the hidden layers [22, 32].

An MLP network is composed of one output layer with nonlinear neurons and one or more intermediate layers composed of neurons that represent the network activation function [4, 6, 21, 50]. The signal is always propagated forward, layer by layer.

The data for training were defined as follows: the input vector had 1225 elements, which is the result of the vectorization of a 35 × 35 pixel image, and the class labels were 0–9 for the digits, as the number of neurons in the output layer is equal to the number of possible outputs, 10 digits. In this work, we used a three-layer MLP with an input layer, a hidden layer and an output layer.

According to [28], the classification of numbers on traffic signs can be made using an MLP network. Thus, we developed an MLP network for the database used. The training of the MLP was based on the error backpropagation algorithm [22]. A network with a number of hidden neurons equal to three times the number of classes (30 neurons) was used. The activation function used in the hidden layer was the logistic sigmoid function. The initial weights of the network were randomly generated between -0.0001 and 0.0001 [51].
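The topology and initialization described above (1225 inputs, 30 hidden neurons, 10 outputs, logistic sigmoid, initial weights in $[-0.0001, 0.0001]$) can be sketched as a forward pass in Python. The sketch is illustrative only: the use of a bias input in each layer, the logistic function in the output layer, the uniform distribution of the initial weights, the random seed and the dummy input image are assumptions, and no training is performed.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 35 * 35, 30, 10


def init_layer(n_out, n_in, rng, limit=1e-4):
    """One weight row per neuron; the extra column is an assumed bias weight."""
    return [[rng.uniform(-limit, limit) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


def layer_forward(weights, inputs):
    extended = list(inputs) + [1.0]           # append the bias input
    return [logistic(sum(w * x for w, x in zip(row, extended)))
            for row in weights]


def mlp_forward(hidden_w, output_w, pixels):
    """Propagate the signal forward, layer by layer."""
    return layer_forward(output_w, layer_forward(hidden_w, pixels))


rng = random.Random(0)                        # fixed seed for repeatability
hidden_w = init_layer(N_HIDDEN, N_INPUT, rng)
output_w = init_layer(N_OUTPUT, N_HIDDEN, rng)

# Dummy 35 x 35 image (white vertical bar on black), vectorized row by row.
pixels = [1.0 if 15 <= c < 20 else 0.0 for _ in range(35) for c in range(35)]
scores = mlp_forward(hidden_w, output_w, pixels)
predicted_digit = max(range(N_OUTPUT), key=lambda k: scores[k])
print(len(scores), predicted_digit)
```

With such small initial weights every output starts very close to 0.5, so the predicted digit is meaningless until the weights are adjusted by backpropagation.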
In the training of the network, a decreasing learning rate was used with an initial value of 0.5. As stopping criteria, training was halted when the network spent 10 cycles without decreasing the mean square error or when this error was higher than that of the previous epoch. The problem with these stopping criteria is that training may stop at a local minimum. For this reason, several starting solutions were defined for the training, to make it more likely that the network stops at the global minimum.

The implementation used here for the MLP was the one described in [48] and [36].

4 Proposed framework

The framework developed in this work was built using the C language for the Android operating system (OS). The integrated algorithm for the automatic analysis of speed limit signs is composed of two main steps, detection and recognition: the first step is to find the desired sign in an image with several objects, and the second step is to interpret the information on the sign, i.e., the maximum speed limit (Fig. 2).

The first step in the computational pipeline developed is the speed limit sign detection via a cascade of boosted classifiers working with Haar-like features [30, 56]. The classifier used to detect speed limit signs is trained with sample images, called positive and negative examples [30, 56]. The positive examples included about 1,000 images with speed signs, and the negative examples were arbitrary images without valid speed limit signs.

After a classifier is trained, it can be applied to a region of interest in an input image. The classifier outputs a "1" if the region is likely to show a speed limit sign and "0" otherwise. To search for the object in the whole image, the search window moves across the image and checks every location using the classifier.
The classifier is designed so that it can easily be rescaled to find objects of interest of different sizes, which is more efficient than resizing the input image itself. So, to find an object of an unknown size in the image, the scan procedure should be done several times at different scales.

The second step of the pipeline is the segmentation and identification of the digits. Before the segmentation, it is necessary to threshold the sign so that the contours can be identified easily. These are identified through the application of an adaptive thresholding algorithm [20]. The next step is to filter the contours found in order to find the digits of the speed sign. Before being sent to the next step, the digits go through filtering processes: by height, by width and by the spacing between digits. The height filter computes the average height of all objects and discards those that are above or below the average by more than a tolerance of 15 %. After the filtering process by height, the next step is to filter by width. The width filtering algorithm works the same as the algorithm for height, but based on the width.

After identifying the digits in possible circles, the region where they are is segmented to separate the digits correctly. Figure 2 shows the possible circles found in green and the region where the digits are in blue. After validation of the sign and of the position of the digits, the digits are separated and standardized before the last step, which is the recognition of the digits. This pattern recognition process assumes white digits with dimensions of 35 × 35 pixels on a black background. The digits are resized to make the algorithm invariant to distance. The scaling of the digits occurs primarily by resizing the height, which should be 33 pixels. Then, the width is defined in proportion to the original. After resizing, each digit is placed centrally in relation to the width.
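The height filter and the size standardization described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the C implementation of the framework: candidate objects are represented as hypothetical `(x, y, width, height)` tuples, binary images are lists of rows with 1 for white and 0 for black, nearest-neighbour sampling is used for resizing, the 33-pixel-high digit is also centered vertically (one black row above and below), and the scaled width is clamped to 35 pixels.

```python
def filter_by_height(boxes, tolerance=0.15):
    """Keep boxes whose height is within +/- tolerance of the mean height."""
    mean_h = sum(box[3] for box in boxes) / len(boxes)
    return [box for box in boxes if abs(box[3] - mean_h) <= tolerance * mean_h]


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of a binary image stored as a list of rows."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def normalize_digit(img, size=35, target_h=33):
    """Scale to a height of target_h, keep the aspect ratio, center on black."""
    h, w = len(img), len(img[0])
    new_w = max(1, min(size, round(w * target_h / h)))
    scaled = resize_nearest(img, target_h, new_w)
    canvas = [[0] * size for _ in range(size)]
    top = (size - target_h) // 2      # assumed vertical centering
    left = (size - new_w) // 2        # horizontal centering
    for r in range(target_h):
        for c in range(new_w):
            canvas[top + r][left + c] = scaled[r][c]
    return canvas


# Hypothetical candidates: four digit-like boxes and one taller outlier.
boxes = [(0, 0, 18, 30), (20, 0, 19, 32), (40, 0, 18, 31),
         (60, 0, 18, 30), (80, 0, 25, 45)]
kept = filter_by_height(boxes)

# A solid white 66 x 20 block standing in for a segmented digit.
digit = [[1] * 20 for _ in range(66)]
canvas = normalize_digit(digit)
print(len(kept), len(canvas), len(canvas[0]), sum(map(sum, canvas)))
```

The same filter applied to `box[2]` instead of `box[3]` would give the width filter, since the text states that both work in the same way.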
The standardized digits were then submitted to the LS, LMS, ELM, MLP, kNN, SVM and OPF classifiers. The recognition performance of these classifiers was evaluated by accuracy and processing time.

5 Results and discussion

This work proposes an efficient and powerful embedded system to recognize speed limit signs, in which the stages must have high accuracy and low processing time. This section presents the results of the digital image processing and pattern recognition steps in order to find the best method to use in each step.

5.1 Speed limit digits detection

The method proposed in the digital image processing step for the detection of speed limit signs is based on a cascade of boosted classifiers working with Haar-like features. Figure 3 shows examples of speed limit sign digits segmented by the developed approach, showing the stages involved from the input image to the size-standardized digits.

The detection step starts with an image acquired by a smartphone camera, and the images obtained are satisfactory, because the image sensor used presents low noise, good focus and good brightness adjustments, as can be seen in Fig. 3a–d. Figure 3e–h shows the results of the conversion of the acquired color images to grayscale images. After applying the cascade of boosted classifiers working with Haar-like features, these images have several possible signs. After applying the adaptive thresholding (Fig. 3m–p), the sign digits are obtained and presented in Fig. 3u–x.

Fig. 2 Pipeline of the developed framework

The test of the embedded system was performed using the speed limit signs of 20, 30, 40, 60 and 80 (km/h), as these are the most commonly used in urban environments. A total of 12,520 images were acquired with different inclinations and distances in streets and avenues of the cities of Fortaleza and Maracanaú in the state of Ceará, Brazil.
Of the 12,520 images of speed limit signs that were used as the test set, 11,320 signs were properly located by the segmentation step, giving a success rate of 90.41 % in the detection of speed limit signs. On the other hand, the approach proposed by Neto et al. [36], based on the Canny operator combined with the Hough transform, obtained only 45.3 % correct results.

5.2 Speed limit digits recognition

In this work, we evaluated several classifiers to integrate into an efficient and powerful embedded system to recognize speed limit signs. The recognition step was carried out using the k-nearest neighbors (kNN), optimum-path forest (OPF), least squares (LS), least mean squares (LMS), extreme learning machine (ELM), artificial neural network multilayer perceptron (MLP) and support vector machines (SVM) based classifiers. The results are presented and discussed in the following.

From the results obtained in the digit detection step presented in Sect. 5.1, a database with the segmented digits was built (Table 1). The feature extraction approaches and classifier algorithms were combined to yield an intelligent system with high accuracy and low computational cost. For the training and test set sample sizes, a holdout procedure with 50 % for training and 50 % for testing, repeated 10 times, was employed.

Fig. 3 Examples of results obtained for the segmentation of speed limit digits

Each classifier was configured in various ways, and the best results obtained are the ones shown in Table 2. The kNN was configured with 1, 3 and 5 nearest neighbors. The SVM was configured using the linear, polynomial, RBF and sigmoid kernels, but only the linear and polynomial kernels had accuracy rates above 90 %. The OPF was set with seven distances, but only the Euclidean and Chi-squared distances obtained accuracy rates above 99 % for all samples.
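The holdout protocol mentioned above (half of the samples for training, half for testing, repeated 10 times with shuffling) can be sketched in Python as follows. The sketch is illustrative and not the evaluation code of the paper: the function names, the fixed random seed, the use of the sample standard deviation and the toy 1-D data are assumptions, and a 1-nearest-neighbor rule stands in for any of the classifiers under evaluation.

```python
import random
import statistics


def holdout_split(n_samples, train_fraction, rng):
    """Shuffle the sample indices and cut them into training and test parts."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = int(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def one_nn_accuracy(samples, labels, train_idx, test_idx):
    """Placeholder classifier: 1-nearest neighbor on scalar attributes."""
    hits = 0
    for t in test_idx:
        nearest = min(train_idx, key=lambda s: abs(samples[s] - samples[t]))
        hits += labels[nearest] == labels[t]
    return hits / len(test_idx)


def repeated_holdout(samples, labels, score, repetitions=10,
                     train_fraction=0.5, seed=0):
    """Return minimum, maximum, mean and standard deviation of the accuracy."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repetitions):
        train_idx, test_idx = holdout_split(len(samples), train_fraction, rng)
        accuracies.append(score(samples, labels, train_idx, test_idx))
    return (min(accuracies), max(accuracies),
            statistics.mean(accuracies), statistics.stdev(accuracies))


# Toy data: two well-separated classes of scalar attributes.
samples = [float(v) for v in range(10)] + [float(v) for v in range(100, 110)]
labels = [0] * 10 + [1] * 10
lo, hi, mean, std = repeated_holdout(samples, labels, one_nn_accuracy)
print(lo, hi, mean, std)
```

The four values returned correspond to the minimum, maximum, mean and standard deviation columns reported for each classifier.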
The neural networks trained by the ELM and MLP approaches used 1,225 neurons in the input layer, 30 in the hidden layer and 10 in the output layer. Each classifier was tested ten times, always shuffling the training and testing samples, on an Android mobile device with a 2.5 GHz quad-core processor and 2 GB of RAM.

Table 2 summarizes the results obtained by each classifier. The values presented show that some classifiers were distinguished in terms of training speed, mainly the ELM, kNN and SVM. Other classifiers stand out in terms of the speed to process a sample, such as the LS, LMS, MLP and SVM with linear kernel. In terms of accuracy, the kNN, SVM and OPF classifiers were superior to the others, especially the kNN with 5 nearest neighbors, the SVM with linear kernel and both OPF configurations. These classifiers stand out for their high average accuracies, always greater than 99 %, and for their classification stability, with low standard deviations.

Table 2 shows the accuracy rates and the prediction times per sample, which are important criteria for embedded applications; the OPF and SVM classifiers present the best results. To evaluate these classifiers further, Table 3 presents the accuracy (Acc), sensitivity (Se), specificity (Sp) and harmonic mean (HM) metrics for each class under study for the worst case obtained.

The values presented in Table 2 show that the SVM with linear kernel stands out, as it has the highest mean accuracy (99.82 %) with a low standard deviation (0.04). Among the classifiers with mean accuracy above 99 %, the OPF with Euclidean distance had the lowest testing time (87 µs per sample); its training time was 16 times lower and its testing time 64 times lower than those of the SVM with linear kernel. The OPF with Euclidean distance had an average accuracy of 99.54 ± 0.10 %. These findings indicate that the OPF classifier with Euclidean distance is suitable to be integrated into an Android application for speed limit sign recognition with high efficiency.
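The speed comparison between the OPF with Euclidean distance and the SVM with linear kernel can be checked directly from the Table 2 figures. The short sketch below does that arithmetic; the dictionary layout and the "mean accuracy of at least 99.5 %" selection rule are illustrative assumptions, not a procedure described by the authors.

```python
# Values copied from Table 2:
# (mean accuracy in %, training time in s, testing time per sample in microseconds).
table2 = {
    "kNN (K=5)": (99.76, 0.029, 11721),
    "OPF (Euclidean)": (99.54, 2.5, 87),
    "OPF (Chi-squared)": (99.47, 2.5, 748),
    "SVM (Polynomial)": (99.43, 70, 9875),
    "SVM (Linear)": (99.82, 40, 5595),
}

svm = table2["SVM (Linear)"]
opf = table2["OPF (Euclidean)"]
training_ratio = svm[1] / opf[1]  # 40 / 2.5
testing_ratio = svm[2] / opf[2]   # 5595 / 87

# Hypothetical selection rule: fastest per-sample prediction among the
# classifiers whose mean accuracy is at least 99.5 %.
candidates = {name: row for name, row in table2.items() if row[0] >= 99.5}
fastest = min(candidates, key=lambda name: candidates[name][2])

print(f"training ratio: {training_ratio:.1f}, testing ratio: {testing_ratio:.1f}")
print("fastest high-accuracy classifier:", fastest)
```

Under this rule the trade-off is visible at a glance: the SVM with linear kernel is slightly more accurate, while the OPF with Euclidean distance answers far faster per sample, which matters most on an embedded device.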
The standardized digit sizes obtained in the DIP step and used in the classifiers evaluation for the pattern recognition step are available at website

Table 1 Number of elements in each class of the speed limit sign digit database

| Digit | Class | No. of elements |
|---|---|---|
| 0 | 1 | 1428 |
| 1 | 2 | 1841 |
| 2 | 3 | 1879 |
| 3 | 4 | 1688 |
| 4 | 5 | 1824 |
| 5 | 6 | 1569 |
| 6 | 7 | 1725 |
| 7 | 8 | 1414 |
| 8 | 9 | 1650 |
| 9 | 10 | 1952 |

Table 2 Results obtained in the evaluation of each classifier used for the recognition of the segmented speed limit digits

| Classifier | Maximum accuracy rate (%) | Minimum accuracy rate (%) | Mean accuracy rate (%) | Standard deviation | Average training time (s) | Average testing time for a sample (µs) |
|---|---|---|---|---|---|---|
| LS | 91.3 | 89.43 | 90.81 | 0.4 | 8.5 | 1.9 |
| LMS | 90.80 | 45.00 | 76.45 | 17.52 | 50.56 | 4.1 |
| ELM | 97.76 | 96.46 | 96.88 | 0.40 | 24.6 | 12.7 |
| MLP | 96.83 | 87.12 | 91.93 | 3.08 | 158 | 14 |
| kNN (K=1) | 99.83 | 98.51 | 99.14 | 0.61 | 0.027 | 10,651 |
| kNN (K=3) | 99.89 | 99.7 | 99.78 | 0.06 | 0.028 | 10,639 |
| kNN (K=5) | 99.79 | 99.71 | 99.76 | 0.03 | 0.029 | 11,721 |
| OPF (Euclidean) | 99.65 | 99.36 | 99.54 | 0.10 | 2.5 | 87 |
| OPF (Chi-squared) | 99.7 | 99.23 | 99.47 | 0.13 | 2.5 | 748 |
| SVM (Polynomial) | 99.88 | 97.04 | 99.43 | 0.94 | 70 | 9,875 |
| SVM (Linear) | 99.87 | 99.76 | 99.82 | 0.04 | 40 | 5,595 |

5.3 Overall results and main contributions

Many methods have been proposed to detect speed signs, often using some reference object in the input images. The first contribution of the proposed framework is the detection of speed signs based on a cascade of boosted classifiers combined with haar-like features. This approach detects speed signs independently of the image acquisition distance, which is an important feature since signs appear smaller the further they are from the camera. This is possible because samples of signs with various sizes were used in the training of the cascade classifier.

Another important contribution of the proposed framework is not having to use additional attributes, since the digits are resized to a standard size of 35 × 35 pixels.
By doing this, the digits become invariant in terms of size, but here this is attained by processing the image and not in the recognition step, as is normally done. This increases the recognition speed and robustness. Another contribution is also related to the processing speed: seven types of classifiers were evaluated to determine which one offered the best recognition performance with low processing time. The top recognition rate obtained was higher than 99.7 %, with the SVM and OPF classifiers performing in real time in the embedded system.

Analyzing the framework in an optimal configuration, we obtained a detection and recognition rate of 89.19 %, which corresponds to 11,167 signs correctly detected and recognized from a database with 12,520 signs. The speed of the embedded system varied between 20 and 30 frames per second, depending on the number of signs found in the input image. All the solutions developed here are fast and can be embedded in commercial systems. The drawback of the methodology developed is the error generated by large rotations, but this can be mitigated with the correct configuration of the camera.
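The overall figures quoted above follow from simple ratios of the reported counts. The sketch below reproduces that arithmetic; the variable names are illustrative, and the per-frame time budget is only what the reported 20 to 30 frames per second imply, not a measurement from the paper.

```python
total_images = 12520
detected = 11320    # signs correctly located by the detection step
recognized = 11167  # signs both detected and correctly recognized

detection_rate = 100 * detected / total_images
recognition_rate = 100 * recognized / detected  # among detected signs only
global_rate = 100 * recognized / total_images

# The global rate is the product of the two stage rates.
assert abs(detection_rate * recognition_rate / 100 - global_rate) < 1e-9

# Per-frame time budget implied by 20 and 30 frames per second.
budget_ms = {fps: 1000 / fps for fps in (20, 30)}

print(f"detection: {detection_rate:.3f} %")
print(f"recognition (given detection): {recognition_rate:.3f} %")
print(f"global: {global_rate:.3f} %")
print({fps: round(ms, 1) for fps, ms in budget_ms.items()})
```

The computed values agree with the reported 90.41 % and 89.19 % to within 0.01 percentage points; the small differences in the last digit suggest that the reported figures were truncated rather than rounded. The decomposition also makes clear that the detection stage, not the digit classifier, accounts for most of the overall error.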
Table 3 Acc, Se, Sp, FS for the worst case of the SVM, with linear and polynomial kernel, and OPF, with Euclidean and Chi-squared distances

SVM

| Class | Linear kernel: Sp (%) | Se (%) | HM (%) | Acc (%) | Polynomial kernel: Sp (%) | Se (%) | HM (%) | Acc (%) |
|---|---|---|---|---|---|---|---|---|
| 0 | 99.94 | 100.0 | 99.95 | 99.72 | 99.97 | 100.0 | 99.97 | 99.86 |
| 1 | 99.96 | 100.0 | 99.96 | 99.83 | 100.0 | 74.04 | 97.18 | 85.09 |
| 2 | 99.97 | 100.0 | 99.97 | 99.89 | 100.0 | 100.0 | 100.0 | 100.0 |
| 3 | 100.0 | 99.17 | 99.91 | 99.58 | 96.87 | 99.88 | 97.17 | 87.53 |
| 4 | 99.94 | 99.78 | 99.92 | 99.67 | 100.0 | 99.78 | 99.97 | 99.89 |
| 5 | 99.98 | 100.0 | 99.98 | 99.93 | 99.96 | 100.0 | 99.96 | 99.80 |
| 6 | 99.98 | 99.76 | 99.96 | 99.82 | 100.0 | 99.53 | 99.95 | 99.76 |
| 7 | 99.97 | 99.85 | 99.96 | 99.78 | 99.98 | 99.85 | 99.97 | 99.85 |
| 8 | 99.97 | 99.51 | 99.92 | 99.63 | 99.97 | 99.63 | 99.94 | 99.69 |
| 9 | 99.98 | 99.59 | 99.94 | 99.74 | 99.94 | 99.89 | 99.94 | 99.74 |
| Total | 99.97 | 99.76 | 99.95 | 99.76 | 99.67 | 97.04 | 99.40 | 97.04 |

OPF

| Class | Euclidean distance: Sp (%) | Se (%) | HM (%) | Acc (%) | Chi-squared distance: Sp (%) | Se (%) | HM (%) | Acc (%) |
|---|---|---|---|---|---|---|---|---|
| 0 | 99.93 | 100.0 | 99.94 | 99.71 | 99.97 | 99.81 | 99.96 | 99.81 |
| 1 | 99.95 | 100.0 | 99.96 | 99.80 | 99.97 | 99.81 | 99.96 | 99.81 |
| 2 | 99.61 | 100.0 | 99.65 | 98.37 | 99.95 | 99.81 | 99.94 | 99.72 |
| 3 | 99.95 | 96.48 | 99.61 | 98.02 | 99.97 | 98.97 | 99.87 | 99.39 |
| 4 | 100.0 | 99.64 | 99.96 | 99.82 | 99.93 | 99.81 | 99.92 | 99.62 |
| 5 | 100.0 | 99.81 | 99.98 | 99.90 | 99.91 | 99.62 | 99.89 | 99.44 |
| 6 | 99.95 | 98.91 | 99.85 | 99.27 | 99.65 | 99.08 | 99.60 | 98.00 |
| 7 | 99.97 | 99.82 | 99.96 | 99.82 | 99.97 | 100.0 | 99.98 | 99.91 |
| 8 | 99.91 | 99.07 | 99.83 | 99.16 | 99.79 | 96.51 | 99.47 | 97.31 |
| 9 | 99.95 | 99.82 | 99.94 | 99.73 | 99.95 | 98.89 | 99.85 | 99.25 |
| Total | 99.92 | 99.36 | 99.87 | 99.36 | 99.91 | 99.23 | 99.84 | 99.23 |

6 Conclusion

The problem addressed is very challenging and, with the growth of cities and the rise in the number of cars on the streets, it is extremely important to use systems able to identify speed limit signs. The objectives defined for this work were fully met, since the system developed is able to detect and recognize speed limit signs satisfactorily.
The developed system successfully segmented the speed signs and recognized their values with high accuracy. The system obtained a global detection and recognition rate of 89.19 %, with 90.41 % in the detection step and 98.64 % in the recognition step. From the images tested, it can be concluded that the system implemented is quite tolerant relative to the size of the image to be evaluated.

Even with good results, this work has some limitations. First, the system implemented is limited in terms of the distance between the image device and the speed limit sign, and the rotation involved, since when these are high, the identification can be erroneous.

Acknowledgments

Pedro Pedrosa Rebouças Filho acknowledges the sponsorship from the Instituto Federal do Ceará (IFCE) via grants PROINFRA/2013, PROAPP/2014 and PROINFRA/2015. Victor Hugo C. Albuquerque acknowledges the sponsorship from the Brazilian National Council for Research and Development (CNPq) through grants 470501/2013-8 and 301928/2014-2. Authors gratefully acknowledge the funding of Project NORTE-01-0145-FEDER-000022, SciTech - Science and Technology for Competitive and Sustainable Industries, co-financed by Programa "Operacional Regional do Norte (NORTE2020)" through "Fundo Europeu de Desenvolvimento Regional (FEDER)."

References

1. Albuquerque VHC, Barbosa CV, Silva CC, Moura EP, Rebouças Filho PP, Papa JP, Tavares JMRS (2015) Ultrasonic sensor signals and optimum-path forest classifier for the microstructural characterization of thermally-aged inconel 625 alloy. Sensors 15(6):12,474
2. Albuquerque VHC, Rebouças Filho PP, da Silveira Cavalcanti T, Tavares JMRS (2010) New computational solution to quantify synthetic material porosity from optical microscopic images. J Microsc 240(1):50–59
3. Amat F, Keller P (2013) 3D Haar-like elliptical features for object classification in microscopy. In: 10th international symposium on biomedical imaging (ISBI), pp 1194–1197
4. Arbib MA (2003) The handbook of brain theory and neural networks. MIT Press, Cambridge
5. de Azevedo FM, Brasil LM, de Oliveira RCL (2000) Neural networks with applications control and expert systems. Visual Books
6. Barreto G, Frota R (2013) A unifying methodology for the evaluation of neural network models on novelty detection tasks. Pattern Anal Appl 16(1):83–97
7. Barros ALBP, Barreto GA (2012) Extreme learning machine robusta para reconhecimento de faces. In: Brazilian conference on intelligent systems. Curitiba, PR, Brasil
8. Barthès JPA, Bonnifait P (2015) Chapter 9 - Multi-Agent active collaboration between drivers and assistance systems. In: Advances in artificial transportation systems and simulation, pp 163–180. Academic Press, Boston
9. Bittencourt G (2006) Artificial Intelligence - Tools and Theories, 3 edn. Federal University of Santa Catarina
10. Burges C (1998) A tutorial on support vector machines for pattern recognition. Data Mining Knowl Discov 2(2):121–167
11. Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 6:679–698
12. Carrese S, Mantovani S, Nigro M (2014) A security plan procedure for heavy goods vehicles parking areas: an application to the lazio region (Italy). Transp Res E Logist Transp Rev 65:35–49
13. Chang CC, Lin CJ (2011) Libsvm: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27:1–27:27
14. Cortes C, Vapnik V (1995) Support vector networks. Mach Learn 20(3):273–297
15. Duda RO, Hart PE (1972) Use of the hough transformation to detect lines and curves in pictures. Commun ACM 15(1):11–15
16. da Silva Felix JH, Cortez PC, Rebouças Filho PP, de Alexandria AR, Costa RCS, Holanda MA (2008) Identification and quantification of pulmonary emphysema through pseudocolors. In: MICAI 2008: Advances in Artificial Intelligence, pp 957–964. Springer
17. Elmer P, Lupp A, Sprenger S, Thaler R, Uhl A (2015) Exploring compression impact on face detection using haar-like features. In: Paulsen RR, Pedersen KS (eds) Image analysis, lecture notes in computer science, vol 9127, pp 53–64. Springer International Publishing
18. Falcão AX, Stolfi J, Lotufo RA (2004) The image foresting transform theory, algorithms, and applications. IEEE Trans Pattern Anal Mach Intell 26(1):19–29
19. Garcia I, Bronte S, Bergasa L, Almazan J, Yebes J (2012) Vision-based drowsiness detector for real driving conditions. In: Intelligent vehicles symposium (IV), pp 618–623
20. Glasbey CA (1993) Analysis of histogram-based thresholding algorithms. CVGIP Graph Models Image Process 55:532–537
21. Gomes SL, Rebouças ES, Rebouças Filho PP (2014) Reconhecimento Óptico de caracteres para reconhecimento das sinalizações verticais das vias de trânsito. Rev SODEBRAS 9:9–12
22. Haykin SO (2008) Neural networks and learning machines. Pearson Prentice Hall, Upper Saddle River
23. Helene O (2006) Method of least squares. Livraria da Física
24. Horata P, Chiewchanwattana S, Sunat K (2013) Robust extreme learning machine. Neurocomputing 102:31–44
25. Huang GB, Chen L, Siew CK (2006) Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans Neural Netw 17:879–892
26. Huang GB, Wang DH, Lan Y (2011) Extreme learning machines: a survey. Int J Mach Learn Cybern 2:107–122
27. Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70:489–501
28. Kocer HE, Cevik KK (2011) Artificial neural networks based vehicle license plate recognition. Proc Comput Sci 3:1033–1037
29. Kohonen T (1989) Self-organization and associative memory, 3rd edn. Springer-Verlag New York Inc, New York, NY
30. Lienhart R, Maydt J (2002) An extended set of haar-like features for rapid object detection. In: International conference on image processing, vol 1, pp I–900–I–903
31. McAndrew A (2004) Introduction to digital image processing with matlab. Thomson Learning
32. Medeiros C, Barreto G (2013) A novel weight pruning method for mlp classifiers based on the maxcore principle. Neural Comput Appl 22(1):71–84
33. Mena AP, Bachiller Mayoral M, Díaz-Lópe E (2015) Comparative study of the features used by algorithms based on viola and jones face detection algorithm. In: Bioinspired computation in artificial systems, lecture notes in computer science, vol 9108, pp 175–183. Springer International Publishing
34. Minsky M, Papert S (1969) Perceptrons. MIT Press, Cambridge
35. Moreira FDL, Kleinberg MN, Arruda HF, Freitas FNC, Parente MMV, de Albuquerque VHC, Rebouças Filho PP (2016) A novel vickers hardness measurement technique based on adaptive balloon active contour method. Expert Syst Appl 45:294–306
36. Neto EC, Gomes SL, Filho PPR, de Albuquerque VHC (2015) Brazilian vehicle identification using a new embedded plate recognition system. Measurement 70:36–46
37. Neto EC, Rebouças ES, Moraes JL, Gomes SL, Rebouças Filho PP (2015) Development control parking access using techniques digital image processing and applied computational intelligence. IEEE Trans Latin America 13:272–276
38. Nissen S (2003) Implementation of a fast artificial neural network library (FANN). Department of Computer Science University of Copenhagen (DIKU)
39. Papa JP, Falcão AX, de Albuquerque VHC, Tavares JMRS (2012) Efficient supervised optimum-path forest classification for large datasets. Pattern Recognit 45(1):512–520
40. Papa JP, Falcão AX, Suzuki CT (2009) Supervised pattern classification based on optimum-path forest. Int J Imaging Syst Technol 19(2):120–131
41. Papa JP, Falcão AX, Suzuki CTN (2009) Supervised pattern classification based on optimum-path forest. Int J Imaging Syst Technol 19(2):120–131
42. Plucker JA, Esping A (2016) Human intelligence: historical influences, current controversies, teaching resources. http://www.intelltheory.com
43. Rakate G, Borhade S, Jadhav P, Shah M (2012) Advanced pedestrian detection system using combination of haar-like features, adaboost algorithm and edgelet-shapelet. In: IEEE international conference on computational intelligence computing research (ICCIC), pp 1–5
44. Rebouças Filho PP, Cortez PC, da Silva Barros AC, Albuquerque VHC (2014) Novel adaptive balloon active contour method based on internal force for image segmentation - a systematic evaluation on synthetic and real images. Expert Syst Appl 41(17):7707–7721
45. Rebouças Filho PP, Moreira FDL, de Lima Xavier FG, Gomes SL, Santos JC, Freitas FNC, Freitas RG (2015) New analysis method application in metallographic images through the construction of mosaics via speeded up robust features and scale invariant feature transform. Materials 8(7):3864
46. Rebouças Filho PP, Cortez PC, Félix JHDS, Cavalcante TdS, Holanda MA (2013) Adaptive 2d crisp active contour model applied to lung segmentation in ct images of the thorax of healthy volunteers and patients with pulmonary emphysema. Revista Brasileira de Engenharia Biomédica 29(4):363–376
47. Rezaei M, Ziaei Nafchi H, Morales S (2014) Global haar-like features: a new extension of classic haar features for efficient face detection in noisy images. Image and Video Technology, Lecture Notes in Computer Science, vol 8333, pp 302–313. Springer Berlin Heidelberg
48. Riedmiller M, Braun H (1993) A direct adaptive method for faster backpropagation learning: the RPROP algorithm. IEEE Int Conf Neural Netw 1:586–591
49. Ruck DW, Rogers SK, Kabrisky M, Oxley ME, Suter BW (1990) The multilayer perceptron as an approximation to a bayes optimal discriminant function. IEEE Trans Neural Netw 1(4):296–298
50. Russell SJ, Norvig P (2009) Artificial intelligence: a modern approach, 3rd edn. Prentice Hall, Upper Saddle River
51. Schimidt W (1993) Initialization, backpropagation and generalization of feed-forward classifiers. IEEE Int Conf Neural Netw 1:598–604
52. Schölkopf B, Smola AJ (2002) Learning with kernels. MIT press, Cambridge
53. Tavares JMR, Rebouças Filho PP, Cavalcante TDS, de Albuquerque VHC (2009) Brinell and vickers hardness measurement using image processing and analysis techniques. J Test Eval 38(1):1–7
54. Tu C, van Wyk B, Hamam Y, Djouani K, Du S (2013) Vehicle position monitoring using hough transform. Int Conf Electron Eng Comput Sci (EECS 2013) 4:316–322
55. Vapnik VN (1999) An overview of statistical learning theory. IEEE Trans Neural Netw 10(5):988–999
56. Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. IEEE Comput Soc Conf Comput Vision Pattern Recognit 1:511–518
57. Widrow B (1990) 30 years of adaptive neural networks: perceptron, madaline, and backpropagation. Proc IEEE 78:1415–1442
58. Widrow B, Winter R (1988) Neural nets for adaptative filtering and adaptative pattern recognition. IEEE Comput 21:25–39
59. Wu BF, Huang HY, Chen CJ, Chen YH, Chang CW, Chen YL (2013) A vision-based blind spot warning system for daytime and nighttime driver assistance. Comput Electr Eng 39(3):846–862
60. Yi SC, Chen YC, Chang CH (2015) A lane detection approach based on intelligent vision. Comput Electr Eng 42:23–29
61. Yu S, Shi Z (2015) The effects of vehicular gap changes with memory on traffic flow in cooperative adaptive cruise control strategy. Phys A Stat Mech Appl 428:206–223
62. Yuen HK, Illingworth J, Kittler J (1989) Detecting partially occluded ellipses using the hough transform. Image Vis Comput 7(1):31–37
63. Zhang S, Bauckhage C, Cremers A (2014) Informed haar-like features improve pedestrian detection. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 947–954
64. Zheng K, Zhao Y, Gu J, Hu Q (2012) License plate detection using haar-like features and histogram of oriented gradients. In: IEEE international symposium on industrial electronics (ISIE), pp 1502–1505