A Proof of Concept Study for the Parameters of Corn
Grains Using Digital Images and a Multivariate Regression Model

Vanessa Rodrigues de Camargo1 & Lucas Janoni dos Santos1 & Fabíola Manhas Verbi Pereira1

Received: 7 June 2017 / Accepted: 20 August 2017 / Published online: 31 August 2017

Abstract In this method, a numerical matrix comprised of ten
color scales (RGB, HSV, L, and rgb) as independent variables
from digitalized images was used as a proof of concept for the
prediction of the mass, apparent volume, and bulk density
parameters of grains for quality control considering post-
harvest purposes. The goal was to develop a high throughput
multivariate regression model using partial least squares
(PLS) combined with the information from color images to
assess the raw product. The data set of external samples was
successfully evaluated with standard error of cross-validation
(SECV) values of 1.23 g (16.4–28.9), 2.03 cm3 (20.5–40.5),
and 0.018 g cm−3 (0.68–0.85) for the mass, apparent volume,
and bulk density, respectively.

Keywords Corn grains . Digital images . Direct analysis . Quality control . Chemometrics

Introduction

Image analysis, in which images are converted into mathematical
arrays, can be performed with the aid of several chemometric
tools (Russ, 1992; Wojnar, 1999; Liu & MacGregor, 2007;
Pereira & Bueno, 2007). This study focuses on the partial
least squares (PLS) regression method (Geladi et al., 2004).

Walker and Panozzo (Walker & Panozzo, 2012) evaluated
three different mathematical models for measuring the volume
and bulk density of a barley grain using an ellipsoid
approximation from a two-dimensional digital image. They
achieved correlations to grain features of 0.97 and 0.63 for
the volume and bulk density, respectively. However, the
performance of PLS in predicting these properties with a
single model was not reported.

By means of the red, green, and blue (RGB) scale and
chemometrics, Borin et al. (Borin et al., 2007) accomplished
the quantification of lactobacilli colonies in commercial
fermented milk using digital images provided by a household
scanner. The authors applied one-dimensional vector data
based on the frequency distributions (histograms) of the
values for the three colors (RGB). Two models were
developed: (i) a nonlinear model using least-squares support
vector machines (LS-SVM) and (ii) a linear model based on
PLS. Both models were used to calculate the number of
lactobacilli colonies. The relative error for the
quantification was approximately 10%, which indicated that
the proposed method can be used for automated counting of
this type of colony.

The application of the RGB scale to images of food samples
acquired using a conventional digital camera with a
charge-coupled device (CCD) sensor was investigated by
Antonelli et al. (Antonelli et al., 2004). The authors used
these images to obtain colorgrams for the three colors (red,
green, and blue). A recognition model was obtained using the
colorgrams from the pesto sauce samples and an algorithm
based on the wavelet transformation. The authors reported
having achieved an accuracy of 100% for their classifications
with this method.

Santos et al. (Santos et al., 2012) and Santos and Pereira-
Filho (Santos & Pereira-Filho, 2013) have reported two

Electronic supplementary material The online version of this article
(https://doi.org/10.1007/s12161-017-1028-6) contains supplementary
material, which is available to authorized users.

* Fabíola Manhas Verbi Pereira
fabiola@iq.unesp.br

1 Instituto de Química de Araraquara, Universidade Estadual Paulista
(UNESP), Rua Professor Francisco Degni, 55,
Araraquara, SP 14800-060, Brazil

Food Anal. Methods (2018) 11:1852–1856
DOI 10.1007/s12161-017-1028-6

Springer Science+Business Media, LLC 2017



studies that employed digital images obtained with a
conventional scanner for the identification of adulterated cow
milk samples. The authors used the following ten color
parameters: R, G, B, H, S, V (value), r, g, b, and L. The r, g,
and b color parameters are relative values: each is a primary
color (R, G, or B) divided by the sum of the three colors (L).
The authors investigated the presence of many types of
adulterants that are commonly added to cow milk, including
water, synthetic milk, cow urine, and hydrogen peroxide, among
others. For the regression models, they used the PLS
technique, and the prediction errors were approximately 6%.
For the classification models, hit rates in the range of 90 to
100% were obtained.

The goal of this study was the development of a method for
determining the mass, apparent volume, and bulk density of
corn grains for the assessment of the raw product of the most
widely grown crop in the Americas.

This method was developed with the use of images acquired
through a conventional household scanner, thus enabling
important information to be obtained without subjectivity.
Information from the color histograms of the RGB (red, green,
and blue), HSV (hue, saturation, and value), L (luminosity),
and rgb (relative RGB) scales of digital images was organized
as a numerical matrix. This matrix was evaluated with
chemometric tools, which helped to interpret the generated
data.

With the proof of concept described in this study, it can
already be inferred that the multivariate method has potential
for predicting the mass, apparent volume, and bulk density in
corn grains.

Materials and Methods

Samples and Instrument

Two types of corn grain samples were purchased locally.
The identifying characteristic of these grains was their size:
the largest ones, measuring approximately 10 by 10 mm, were
the corn employed to feed animals (type 1), and the smallest
ones, measuring approximately 5 by 5 mm, were those specific
to popcorn (type 2). For sampling, entire grains, all of
regular size and with no defects, were distributed in 10 cm by
6 cm transparent plastic packages so that each package
contained a total of 100 entire grains.

The images were obtained by placing the plastic packaging
directly on the glass of a conventional scanner (HP Color
LaserJet Pro MFP 200 M276nw, Brazil). For the acquisition
of images, the automatic settings of the scanner, such as the
brightness and contrast adjustments, were disabled, and the
resolution was 300 dpi (dots per inch). The cover of the
scanner was closed to avoid any influence of external light.

Physical Trait Measurements

The mass (m) was measured using an analytical electronic
balance (FA-2104N, EQUIPAR, Curitiba/PR, Brazil) with a
precision of ± 0.0001 g. The apparent volume (V) was
measured with a 50-mL graduated cylinder (± 0.5 mL).
The ratio between the mass and the apparent volume was used
to estimate the bulk density (g cm−3), where
ρ = m/V.
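
As a quick numerical check of this relationship, the rounded mean mass and apparent volume reported in Table 1 for the bags with 100 larger grains can be inserted into the expression. This is only an illustrative calculation; the tabulated bulk density was presumably averaged over the individual bags rather than computed from the means.

```latex
\rho = \frac{m}{V} = \frac{25.4\ \mathrm{g}}{36.1\ \mathrm{cm^{3}}} \approx 0.70\ \mathrm{g\,cm^{-3}}
```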

Image Evaluation

The data were treated using computer programs such as
Matlab R2015b (The MathWorks, Natick, USA) and
Pirouette 4.5 rev. 2 (Infometrix, Bothell, USA). The original
images were converted to the color scales RGB [red (R),
green (G), and blue (B)], HSV [hue (H), saturation (S), and
value (V)], the relative value scale for RGB colors (rgb), and L
(luminosity, light) by means of a computer routine available as
Supplementary content, which can be run in Matlab.
The size of all images was standardized to 500 × 850 pixels
(width × height). Samples were assigned to training and
validation sets by means of the Kennard-Stone algorithm, with
80 and 20% of the samples, respectively (Daszykowski
et al., 2002).
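
The splitting step can be illustrated with a minimal sketch of the classic Kennard-Stone rule: start from the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbor is farthest away. The function name `kennard_stone`, the Euclidean distance, and the toy data are assumptions made for illustration; the text does not state which variables or software settings were used for the split.

```python
import math

def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by the Kennard-Stone rule."""
    n = len(X)
    dist = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Start with the two most distant samples.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: dist[ij[0]][ij[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # Pick the candidate whose nearest selected sample is farthest away.
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected

# Hypothetical one-variable data set (values invented for illustration only).
X = [[0.0], [1.0], [2.5], [4.0], [10.0]]
train_idx = kennard_stone(X, n_select=4)      # 80% of the 5 samples
valid_idx = [k for k in range(len(X)) if k not in train_idx]
```

With 50 samples, as in this study, the same call with `n_select=40` would give the 80% training set and leave the remaining 10 samples for validation.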

Results and Discussion

Corn type 1 represents the suitable target for grain production,
considering that its size, mass, and geometry are more
homogeneous, and corn type 2 typifies features that are not
proper for the raw product. Figure 1 shows examples of images
of the samples packaged in plastic that were used to develop
the proposed method.

The images, such as those shown in Fig. 1, were first
converted into averages of color histograms. In each
histogram, the variables were the color values of the ten
scales (RGB, HSV, rgb, and L), and the response was how often
each value occurred. As depicted in Fig. 2, the advantage of
the average of color histograms is the size of the matrix,
with 10 columns and 1 numerical value for each color scale.
In other words, each sample was represented by 10 averaged
values, one per color scale (50 samples × 10 variables). In
this case, the data can be computed faster, since each
original colorgram has 256 variables for each color scale (a
total of 2560 independent variables).

By visual inspection of the images in Fig. 1, it is possible to
differentiate the size of the grains for the homogeneous
compositions with 100 larger grains (Fig. 1a) or 100 smaller
grains (Fig. 1b), but for the image in Fig. 1c, it is very hard
to tell that it represents a composition of 55 larger grains
and 45 smaller ones. To overcome this difficulty, the image
can be converted into a numerical matrix using the color
scales as parameters, as previously described.

The computational routine processes the image on scales with
different magnitudes, as follows: (i) the RGB scale is
represented by values between 0 and 255; (ii) for HSV and the
relative colors of RGB (rgb), the magnitude range is between
0 and 1; and (iii) the luminosity is the sum of the RGB colors.
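
The three conventions listed above can be sketched in a few lines of Python. This is not the Matlab routine from the Supplementary content; the function name `color_features`, the per-pixel averaging, and the use of the standard-library `colorsys` conversion are assumptions made for illustration.

```python
import colorsys

def color_features(pixels):
    """Average ten color parameters over a list of (R, G, B) pixels (0-255)."""
    sums = dict.fromkeys(["R", "G", "B", "H", "S", "V", "r", "g", "b", "L"], 0.0)
    for R, G, B in pixels:
        L = R + G + B                      # luminosity: sum of the RGB colors
        H, S, V = colorsys.rgb_to_hsv(R / 255, G / 255, B / 255)  # each in 0-1
        # Relative colors: each channel divided by the sum (0 for a black pixel).
        r, g, b = (R / L, G / L, B / L) if L else (0.0, 0.0, 0.0)
        for name, value in zip(sums, (R, G, B, H, S, V, r, g, b, L)):
            sums[name] += value
    return {name: total / len(pixels) for name, total in sums.items()}

# A single pixel built from the R, G, and B values shown for sample a in Fig. 2.
features = color_features([(214, 139, 92)])
```

For a real scan, `pixels` would hold every pixel of the standardized 500 × 850 image, and the returned dictionary would form one row of the 50 × 10 data matrix.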

For example, the differences among the scales were on the
order of hundreds, and the minimum and maximum values were
as follows: R (196–217), G (122–145), B (75–100),
H (0.06–0.07), S (0.55–0.64), V (0.77–0.85), r (0.49–0.53),
g (0.30–0.31), b (0.17–0.20), and L (397–461).

The luminosity (L) has the highest values and the largest
range of variation. The other scales showed only slight
differences, and HSV and rgb had ranges with lower
magnitudes. Another relevant piece of information from the
color images is that the blue scale showed correlation with the
parameters being investigated. This is strong evidence that the
colors from the images can detect variations related to the
shape and can reveal interferences such as leaves or soil.
Yellow is the predominant hue of corn grains seen with the
naked eye, and in support of this observation, blue is the
complementary color of yellow in visible radiation
(Boynton, 1960).

The high variation of the luminosity corroborates the
information extracted from the images, which is in fact related
to the size of the grains. This makes it possible to detect
variations between big grains mixed with small ones, or even
the presence of undesirable components, such as leaves,
soil, or other residues from the harvest.

A multivariate model is of interest because the mass and
apparent volume are extensive properties of the materials, and
they vary with the composition, as shown in Table 1. PLS is a
robust and widely used chemometric tool for predicting such
parameters, and it allows more than one variable to be used
for the prediction, which was necessary because the data of
this study did not show high correlation with any single
independent variable (Geladi et al., 2004; Santos et al., 2012;
Santos & Pereira-Filho, 2013).

For analysis with PLS, the matrix was organized with the
samples in the rows and the independent variables in columns

Fig. 1 Examples of scanned
images of representative samples.
a With larger grains. b With
smaller grains. c With the
proportion 55:45 for larger and
smaller grains, respectively

Fig. 2 Picture of the arrangement of the matrix color scale from the
corn grain images shown in Fig. 1 (independent variables in columns)

Sample                                R    G    B    H    S    V    r    g    b    L
a (100 larger grains)                 214  139  92   0.1  0.6  0.8  0.5  0.3  0.2  445
b (100 smaller grains)                205  129  86   0.1  0.6  0.8  0.5  0.3  0.2  419
c (55 larger and 45 smaller grains)   207  135  92   0.1  0.6  0.8  0.5  0.3  0.2  434



corresponding to the color scales. To better analyze these data,
the independent variables were auto-scaled (Geladi et al.,
2004) so that every variable had a zero mean and a standard
deviation equal to one. The goal is for all measured variables
to contribute with the same weight, even if the magnitudes of
the scales are very different. The dependent variables were the
mass, apparent volume, and bulk density of the grains
measured earlier.
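
Auto-scaling can be illustrated with the R and L columns of the three example samples in Fig. 2, whose raw magnitudes differ by a factor of about two. The helper name `autoscale` and the use of the sample standard deviation (divisor n − 1) are assumptions; the text does not state which divisor was applied.

```python
import statistics

def autoscale(column):
    """Center a variable to zero mean and scale it to unit standard deviation."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)          # sample standard deviation (n - 1)
    return [(value - mean) / sd for value in column]

# R and L values of the three example samples shown in Fig. 2.
R = [214, 205, 207]
L = [445, 419, 434]
R_scaled = autoscale(R)
L_scaled = autoscale(L)
```

After scaling, both columns have the same spread, so neither dominates the regression merely because of its numerical range.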

Table 1 Physical traits estimated for corn grain samples (mean ± sd). Type 1: larger grains; type 2: smaller grains. The square symbols represent the type of sampling on the graphs from the developed model

Samples (bags)   Type 1 per bag   Type 2 per bag   Mass (g)      Apparent volume (cm3)   Bulk density (g cm−3)
10               100              0                25.4 ± 0.5    36.1 ± 0.7              0.70 ± 0.01
10 ■             0                100              17.0 ± 0.3    21.1 ± 0.5              0.80 ± 0.01
3 □              5                95               18.6 ± 0.8    23.2 ± 1.2              0.80 ± 0.02
3 □              15               85               19.6 ± 1.5    25.2 ± 1.5              0.78 ± 0.02
3 □              25               75               20.4 ± 1.2    26.2 ± 1.5              0.78 ± 0.02
3 □              35               65               21.6 ± 1.8    27.8 ± 2.5              0.78 ± 0.01
3 □              45               55               23.4 ± 2.2    29.5 ± 1.0              0.79 ± 0.05
3                55               45               23.5 ± 1.6    32.5 ± 2.0              0.72 ± 0.01
3                65               35               24.6 ± 1.6    33.8 ± 3.2              0.73 ± 0.03
3                75               25               24.8 ± 2.6    34.2 ± 3.1              0.73 ± 0.01
3                85               15               25.7 ± 2.9    35.2 ± 3.8              0.73 ± 0.01
3                95               5                26.4 ± 2.2    36.8 ± 3.2              0.72 ± 0.01

[Figure 3: scatter plots of the mass predicted value (g) versus the mass reference value (g), both axes from 15 to 30. Legend: Type 1; Type 2; Types 1 (5-45) and 2 (95-55); Types 1 (55-95) and 2 (45-5). Panel a: y = 0.90x + 2.22, r = 0.94. Panel b: y = 0.94x + 0.81, r = 0.90.]

Fig. 3 Predicted values for the PLS model for the mass of grains using
the data of the image colors for the a training and b validation sets

[Figure 4: scatter plots of the volume predicted value (cm3) versus the volume reference value (cm3), both axes from 18 to 42. Legend: Type 1; Type 2; Types 1 (5-45) and 2 (95-55); Types 1 (55-95) and 2 (45-5). Panel a: y = 0.90x + 2.83, r = 0.94. Panel b: y = 0.91x + 2.29, r = 0.90.]

Fig. 4 Predicted values obtained by the PLS model for the apparent
volume of grains using the data of the image colors for the a training
and b validation sets



The standard error of cross-validation (SECV) can assess
the predictive ability of the models. In this case, using five
latent variables (LV) with a total explained variance of
approximately 100%, the SECV values were 1.23 g, 2.03 cm3,
and 0.018 g cm−3 for the mass, apparent volume, and bulk
density, respectively.
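
To make the cross-validation figure of merit concrete, the sketch below fits a single-response PLS model with the NIPALS algorithm and computes a root mean squared error of leave-one-out cross-validation. It is not the Pirouette implementation used in this study: the function names, the leave-one-out scheme, the root-mean-square form of the error, and the synthetic data are all assumptions, and `X` is taken to be already auto-scaled.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # centered copies
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)               # weight vector
        t = E @ w                            # scores
        tt = t @ t
        p = E.T @ t / tt                     # X loadings
        c = f @ t / tt                       # y loading
        E = E - np.outer(t, p)               # deflate X
        f = f - c * t                        # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # regression vector
    return x_mean, y_mean, beta

def pls1_predict(model, X):
    """Predict the response for the rows of X with a fitted model."""
    x_mean, y_mean, beta = model
    return (X - x_mean) @ beta + y_mean

def rmsecv_loo(X, y, n_components):
    """Root mean squared error of leave-one-out cross-validation."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i        # leave sample i out
        model = pls1_fit(X[keep], y[keep], n_components)
        errors.append(y[i] - pls1_predict(model, X[i:i + 1])[0])
    return float(np.sqrt(np.mean(np.square(errors))))

# Synthetic data (invented for illustration): 20 samples, 3 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0     # noise-free linear response
error = rmsecv_loo(X, y, n_components=3)
```

Because the synthetic response is an exact linear function of the three variables and all three components are used, the cross-validated error is essentially zero; with real color data the value would reflect both noise and model mismatch. Each response (mass, apparent volume, bulk density) would be passed as its own `y`.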

Figures 3, 4, and 5 show the good linear correlation between
the reference values for the parameters measured for the
corn grains and those predicted by the PLS model. The numbers
in parentheses in the legends of these figures denote the
range of the proportions of larger and smaller grains, as
shown in Table 1. For all plots, the data are spread along the
x-coordinate, which confirms that the sampling has the
necessary variation and that the color scales from the images
detected the differences among the samples. In this case, the
mass and apparent volume varied according to the level of the
undesirable corn (type 2).

Conclusions

This study shows that fast and accurate methods can be
developed for predicting the mass, apparent volume, and bulk
density of grains as an alternative to the existing gravimetric
and geometric methods. The color scales from digital images
provide information that proved to be useful for detecting
variations between grains for post-harvest purposes, with
advantages such as practicality, economy, safety, efficiency,
speed, and accuracy.

Acknowledgments The authors are grateful to Fundunesp (process
number 0268/001/14), the National Council for Scientific and
Technological Development (CNPq, process number 445729/2014-7),
the São Paulo Research Foundation (FAPESP, 2016/00779-6), and
PROPe (process number 39229, L.J.S. grant fellowship).

Compliance with Ethical Standards

Conflict of Interest Fabíola Manhas Verbi Pereira declares that she has
no conflict of interest. Vanessa Rodrigues de Camargo declares that she
has no conflict of interest. Lucas Janoni dos Santos declares that he has no
conflict of interest.

Ethics Approval This article does not contain any studies with human
or animal subjects performed by any of the authors.

Consent for Publication Publication has been approved by all individ-
ual participants.

References

Russ JC (1992) Image processing handbook. CRC Press, New York

Wojnar L (1999) Image analysis: applications in materials engineering. CRC Press, Boca Raton

Liu JJ, MacGregor JF (2007) On the extraction of spectral and spatial information from images. Chemom Intell Lab Syst 85:119–130

Pereira FMV, Bueno MIMS (2007) Image evaluation with chemometric strategies for quality control of paints. Anal Chim Acta 588:184–191

Geladi P, Sethson B, Nyström J, Lillhonga T, Torbjörn L, Burger J (2004) Chemometrics in spectroscopy. Part 2. Examples. Spectrochim Acta B 59:1347–1357

Walker CK, Panozzo JF (2012) Measuring volume and density of a barley grain using ellipsoid approximation from a 2-D digital image. J Cereal Sci 55:61–68

Borin A, Ferrão MF, Mello C, Cordi L, Pataca LCM, Durán N, Poppi RJ (2007) Quantification of Lactobacillus in fermented milk by multivariate image analysis with least-squares support-vector machines. Anal Bioanal Chem 387:1105–1112

Antonelli A, Cocchi M, Fava P, Foca G, Franchini GC, Manzini D, Ulrici A (2004) Automated evaluation of food colour by means of multivariate image analysis coupled to a wavelet-based classification algorithm. Anal Chim Acta 515:3–13

Santos PM, Wentzell PD, Pereira-Filho ER (2012) Scanner digital images combined with color parameters: a case study to detect adulterations in liquid cow’s milk. Food Anal Methods 5:89–95

Santos PM, Pereira-Filho ER (2013) Digital image analysis – an alternative tool for monitoring milk authenticity. Anal Methods 5:3669–3674

Daszykowski M, Walczak B, Massart DL (2002) Representative subset selection. Anal Chim Acta 468:91–103

Boynton RM (1960) J Opt Soc Am 50:929–944

[Figure 5: scatter plots of the bulk density predicted value (g cm−3) versus the bulk density reference value (g cm−3), both axes from 0.68 to 0.84. Legend: Type 1; Type 2; Types 1 (5-45) and 2 (95-55); Types 1 (55-95) and 2 (45-5). Panel a: y = 0.83x + 0.13, r = 0.90. Panel b: y = 0.50x + 0.36, r = 0.79.]

Fig. 5 Predicted values obtained by the PLS model for the bulk density
of grains using the data of the image colors for the a training and b
validation sets
