962 IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 15, NO. 6, JUNE 2018

Inducing Contextual Classifications With Kernel
Functions Into Support Vector Machines

Rogério Galante Negri, Erivaldo Antônio da Silva, and Wallace Casaca

Abstract— Kernel functions have revolutionized theory and
practice in the field of pattern recognition, especially to perform
image classification. Besides giving rise to nonlinear variants of
the well-known support vector machine (SVM), these functions
have also been successfully used to classify nonvectorial data (e.g.,
graphs and collection of sets), in which customized metrics are
created to precisely measure the similarity among such contextual
data entities. This letter introduces two context-inspired kernel
functions as new SVM-driven methods for remote sensing image
classification. In contrast to the existing SVM-based approaches
that assume only multiattribute vectors as representative features
in a high-dimensional space, the proposed models formally
establish comparisons between the entire sets of context-given
data, thus employing these contextual measurements to drive
the classification. More precisely, stochastic distances as well as
hypothesis tests are conveniently handled and “kernelized” to
build our models. A complete battery of experiments involving
both remote sensing and real-world images is conducted to
validate the performance of the proposed kernels against various
well-established SVM-based methods.

Index Terms— Context, image classification, Kernel functions.

I. INTRODUCTION

KERNEL functions play an important role in pattern
recognition, especially when one intends to accomplish
data classification. In fact, this class of functions forms the
basis of the nonlinear extensions for several popular lin-
ear models, in particular the well-established support vector
machine (SVM). In addition to being effective and mathe-
matically well posed, kernel functions also allow the use of
their prior linear counterparts on a variety of data classifi-
cation problems, including such cases whereby no vectorial
representations are available (e.g., textual content, collections
of sets, and graphs). In summary, kernel operators are seen as
a flexible and powerful tool for tuning a certain classification
method to fit the input data and not the inverse, as they may
lead to alternatives specially designed to obtain more refined
results and customizations on the addressed problem [1].

Considering the data classification context, investigations
toward achieving more accurate and robust methods remain
challenging. Thus, the inclusion of contextual information
into the classification process may lead to new results in
terms of accuracy and data readability. There are a few works
that exploit this concept by introducing kernels that are able
to embed contextual clues into the classification task. For
example, Camps-Valls et al. [2] and Gurram and Kwon [3]
first paved the way for kernel operators as contextual models
to properly address the specific case of hyperspectral image
classification.

Manuscript received May 26, 2017; revised September 29, 2017 and
January 13, 2018; accepted March 6, 2018. Date of publication April 3, 2018; date
of current version May 21, 2018. This work was supported in part by FAPESP
under Grant 2014/14830-8, Grant 2013/07375-0, Grant 2014/08822-2, and
Grant 2017/03595-6, and in part by UNESP-PROPe under Grant
PROINTER-2017/1654. (Corresponding author: Rogério Galante Negri.)

R. G. Negri is with the Instituto de Ciência e Tecnologia, UNESP, São José
dos Campos 12247-004, Brazil (e-mail: rogerio.negri@ict.unesp.br).

E. A. da Silva is with the Faculdade de Ciência e Tecnologia, UNESP,
Presidente Prudente 19060-900, Brazil (e-mail: erivaldo@fct.unesp.br).

W. Casaca is with the Campus Experimental de Rosana, UNESP, São Paulo
19274-000, Brazil (e-mail: wallace.coc@gmail.com).

Color versions of one or more of the figures in this letter are available
online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/LGRS.2018.2816460

This letter presents two novel kernels for contextual
remotely sensed image classification. The kernels are designed
in terms of stochastic distances and statistical hypothesis
tests, which take advantage of the robustness and versatility
provided by probability theory. An extensive battery of
experiments with remote sensing and non-remote sensing images is
conducted to numerically assess and validate the proposed
kernel functions. These experiments include a case study
with a synthetic aperture radar (SAR) image, a full data set
of optical remote sensing scenes, and a standard real-world
image. As a baseline, the SVM method is adopted as the kernel
machine in our analysis. The kernels are plugged into the
SVM and then compared against two contextual SVM-derived
methods: Markov random fields and smoothing-based models.
Finally, comparisons with the classical SVM are also provided.

II. IMAGE CLASSIFICATION, SVM, AND KERNELS

Formally, a classifier is a function F : X → Y that assigns
an element x from the attribute space X to a specific class
ωi listed in Ω = {ω1, ω2, . . . , ωc}, c ∈ ℕ∗, with class labels
varying in Y = {1, 2, . . . , c}. It means that x corresponds to
a certain class ωy , y ∈ Y , i.e., y = F(x).

Focusing on the image classification problem, the classifier
F is evaluated on the attribute vector x associated with a
given pixel s from the target image I, which is defined on
a support set S ⊂ ℕ². While the expression I(s) = x states
that pixel s ∈ S has its attributes characterized by x ∈ X ,
the neighborhood of s can be mathematically represented as

Vρ(s) = {t ∈ S : 0 ≤ md(s, t) ≤ ρ} (1)

where ρ accounts for the neighborhood influence radius, and
md(s, t) = max{|s1 − t1|, |s2 − t2|}, with s = (s1, s2),
t = (t1, t2) being the spatial positions of pixels s and t .
Note that the set Vρ(s) allows for incorporating contextual
information into a given classification model, as the resulting
classification will be induced by the attribute vectors xi so
that I(t) = xi , t ∈ Vρ(s). In practice, the context of s
will be conveyed by Vρ(s). Therefore, methods considered as
“contextual” are those that formally embed the context
of the image pixels into the classification pipeline, i.e., the
neighborhood structures, which are represented as a full set of
pixels rather than as a single vector.
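As an illustration of (1), the neighborhood Vρ(s) can be enumerated as in the sketch below. It assumes a rectangular support set given by a hypothetical `shape = (rows, cols)` argument and clips the window at the image border; the letter does not state how borders are handled, so that choice is an assumption.

```python
def chebyshev(s, t):
    """Chessboard distance md(s, t) = max(|s1 - t1|, |s2 - t2|)."""
    return max(abs(s[0] - t[0]), abs(s[1] - t[1]))


def neighborhood(s, rho, shape):
    """Return V_rho(s): every pixel t of the support set with md(s, t) <= rho.

    `shape` = (rows, cols) is an assumed rectangular support set S.
    Positions outside the image are dropped (an assumption, see lead-in).
    """
    rows, cols = shape
    return [
        (r, c)
        for r in range(max(0, s[0] - rho), min(rows, s[0] + rho + 1))
        for c in range(max(0, s[1] - rho), min(cols, s[1] + rho + 1))
    ]


interior = neighborhood((5, 5), 1, (10, 10))  # full 3x3 window, includes s itself
corner = neighborhood((0, 0), 1, (10, 10))    # window clipped at the border
```

With ρ = 1, 2, and 3, an interior pixel yields windows of 9, 25, and 49 pixels, respectively.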

Image classification techniques differ both in how they
define F : X → Y and in how they
evaluate F on I. For example, methods that rely on
supervised learning take information from a training set
D = {(xi , yi ) ∈ X × Y : i = 1, . . . , c}. The mapping created
by F between X and Y represents the knowledge acquired
from D. A good representative of this kind of approach is
the well-known SVM, which has become popular within the
remote sensing community [4]. This method accomplishes
data classification by computing the hyperplane with the largest
separating margin. Such a hyperplane is the locus of points
at which the following function equals zero:

f(x) = ⟨w, x⟩ + bias (2)

where w represents an orthogonal vector to the hyperplane and
the quotient |bias|/‖w‖ accounts for the distance between the
origin of the attribute space and the hyperplane. The tuning
parameters w and bias are obtained by solving an optimization
problem built from the training set D. For a detailed discussion
concerning the formality and training issues of SVM, see [5].
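A minimal sketch of (2) follows, showing how f(x) and the origin-to-hyperplane distance |bias|/‖w‖ are evaluated. The values of `w` and `bias` are illustrative numbers chosen by hand, not the result of SVM training.

```python
import math


def decision_value(w, x, bias):
    """f(x) = <w, x> + bias; the hyperplane is the set where this is zero."""
    return sum(wi * xi for wi, xi in zip(w, x)) + bias


def origin_distance(w, bias):
    """Distance |bias| / ||w|| between the origin and the hyperplane."""
    return abs(bias) / math.sqrt(sum(wi * wi for wi in w))


# Illustrative parameters (assumed, not trained).
w, bias = [3.0, 4.0], -10.0

on_plane = decision_value(w, [2.0, 1.0], bias)       # point lying on the hyperplane
positive_side = decision_value(w, [4.0, 2.0], bias)  # point on the positive side
dist = origin_distance(w, bias)
```

The sign of f(x) indicates on which side of the hyperplane x falls, which is what a two-class SVM uses to assign the label.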

An attractive manner to increase the accuracy of SVM-derived
classifications is to embed the input patterns (i.e.,
feature vectors) into a more appropriate space with better
separability. Indeed, this can be implicitly done by applying
kernel functions. Kernels are used to modify the inner
product between input patterns in (2), acting directly on the
corresponding optimization problem to train the SVM [5].
Furthermore, they also enable the SVM and other existing
kernel methods to address problems in which the input data
are not necessarily modeled as a vectorial representation (e.g.,
textual content, arbitrary sets, and graphs).

In more mathematical terms, K : X² → ℝ is called a kernel
function if K is symmetric with respect to the input elements
and satisfies the conditions of Mercer’s theorem [1]. Despite the
versatility and well-posedness of kernels, defining new ones while
satisfying Mercer’s conditions is not a straightforward task
in practice. An alternative for successfully creating a kernel
is to take general models of kernels, for example, the radial
basis function (RBF) [1]

K(xu, xv) = g(d(xu, xv)) (3)

where d : X² → ℝ is a metric and g : ℝ → ℝ is a strictly
positive function.
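The general form (3) can be sketched as below. The Euclidean metric and the Gaussian choice g(r) = exp(−γ r²) are assumptions made for illustration only; any metric d and strictly positive g fit the same template, and the metric is passed as an argument so that it can be swapped.

```python
import math


def euclidean(xu, xv):
    """Euclidean metric between two attribute vectors (assumed choice of d)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xu, xv)))


def rbf_kernel(xu, xv, d=euclidean, gamma=0.5):
    """K(xu, xv) = g(d(xu, xv)) with the assumed g(r) = exp(-gamma * r**2)."""
    return math.exp(-gamma * d(xu, xv) ** 2)


k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0])  # identical inputs
k_diff = rbf_kernel([0.0, 0.0], [1.0, 1.0])  # squared distance 2, gamma 0.5
```

Because d is an argument, a distance between sets of pixels can replace the Euclidean metric without changing the kernel template.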

The study and development of more appropriate kernels
for remotely sensed image classification have attracted con-
siderable attention over the past two decades. A pioneering
study concerning contextual classifications driven by kernel
functions is presented in [2], and later improved upon in [3].
Both methodologies integrate the spatial context, delimited by
an influence radius, with an average-guided kernel operator.
As a result, their outputs tend to produce excessive blurring,
similar to the use of smoothing filters. Also concerning kernel models,
Kondor and Jebara [6] combine the concepts of stochastic
distances and kernels; their approach was later extended in [7]
to properly cope with region-based classifications on remote
sensing images.

This letter proposes two novel kernel operators which
conceptually embed the notion of measuring contextual data
into the image classification procedure. The kernels bear many
attractive features such as the ability to promote context-
guided classifications, solid mathematical foundation, and
good adaptability to fit the SVM onto the available data.

In contrast to the existing SVM-based methods that only
assume the input data as the set of multiattribute vectors
in a vector space, the designed kernels formally establish
comparisons between sets of abstract data instances in the
sense of a metric space. In other words, the neighborhood
structures of the pixels are interpreted as statistical models, not
necessarily as a well-structured feature vector. These models
are provided as input to the SVM, enabling the classifier to
perform contextual classifications while still exploiting the
spatial variability of the pixel neighborhoods in a deeper and
more effective way. As a result, the classification tends to be
more precise and discriminative, since the data to be labeled
are represented in terms of their local neighborhood patterns.

III. INDUCING CONTEXTUAL ANALYSIS INTO KERNELS

A. Jeffries-Matusita Kernel

A simple and effective manner to describe neighborhood
content as valid contextual information is to treat such
content as data instances. Consequently, the acquired data
no longer behave as usual, because a pixel neighborhood
(i.e., the collection of vectors Vρ(s)) is not represented by a
single feature vector in the attribute space X. To deal with
this issue, one may set d(·, ·)
as a stochastic distance in (3) so that comparisons between
neighborhoods Vρ(si ) and Vρ(s j ) make sense.

Stochastic distances are viewed in probability theory as a
powerful tool to quantify the separability between different
sets of data, therein measuring how far apart their corre-
sponding probability distributions are from each other [8].
Assuming that Vρ(si ) and Vρ(s j ) satisfy multivariate Gaussian
distributions, the so-called Jeffries-Matusita measure (JM) [9]
was taken as a stochastic distance in (3). More precisely,
the similarity between the neighbors of si and s j is computed
as follows:

$$\mathrm{JM}(\mathcal{V}_\rho(s_i), \mathcal{V}_\rho(s_j)) = 2\left(1 - e^{-B(\mathcal{V}_\rho(s_i), \mathcal{V}_\rho(s_j))}\right) \qquad (4)$$

where B(·, ·) is the Bhattacharyya distance between two
multivariate Gaussian models [5].
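To make (4) concrete, the sketch below evaluates the JM measure for the simplified case of single-band neighborhoods, where each neighborhood is modeled by a univariate Gaussian. This restriction, the function names, and the use of the population variance as the estimator are assumptions for illustration; the letter works with multivariate Gaussian models. Both variances must be strictly positive.

```python
import math
from statistics import fmean, pvariance


def bhattacharyya_1d(mu1, var1, mu2, var2):
    """Bhattacharyya distance between N(mu1, var1) and N(mu2, var2)."""
    mean_term = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    var_term = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + var_term


def jm_distance(values_i, values_j):
    """JM of (4) for two single-band neighborhoods given as lists of pixel values."""
    b = bhattacharyya_1d(
        fmean(values_i), pvariance(values_i),
        fmean(values_j), pvariance(values_j),
    )
    return 2.0 * (1.0 - math.exp(-b))


# Toy neighborhoods (made-up pixel values).
a = [10.0, 12.0, 11.0, 9.0, 13.0]
far = [v + 100.0 for v in a]

jm_same = jm_distance(a, a)    # identical distributions
jm_far = jm_distance(a, far)   # well-separated distributions
```

The measure is zero for identical distributions and saturates toward 2 as the distributions separate.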

Notice that the direct substitution of the JM function
into (3) does not lead to a genuine kernel because the triangle
inequality may not hold, which prevents
(4) from being an authentic metric. To satisfy this requirement,
a sufficiently large constant is conventionally added to the JM
expression when the inputs Vρ(si ) and Vρ(s j ) are
different. Since the values of JM lie in [0, 2],
the constant 2 is a suitable choice: after the shift, every
nonzero distance lies in [2, 4], so the sum of any two nonzero
distances is never smaller than a third one. The definitive JM
kernel function is then established
as follows:

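$$K_{\mathrm{JM}}(\mathcal{V}_\rho(s_i), \mathcal{V}_\rho(s_j)) = e^{-\gamma\, \overline{\mathrm{JM}}(\mathcal{V}_\rho(s_i), \mathcal{V}_\rho(s_j))} \qquad (5)$$

where γ ∈ ℝ+ is a regularization factor and $\overline{\mathrm{JM}}$ is the shifted JM measure given as

$$\overline{\mathrm{JM}}(\mathcal{V}_\rho(s_i), \mathcal{V}_\rho(s_j)) = \begin{cases} 0, & \text{if } \mathcal{V}_\rho(s_i) = \mathcal{V}_\rho(s_j), \\ \mathrm{JM}(\mathcal{V}_\rho(s_i), \mathcal{V}_\rho(s_j)) + 2, & \text{otherwise.} \end{cases} \qquad (6)$$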
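The construction in (5) and (6) can be sketched as follows. The function names and the `identical` flag, which signals whether the two neighborhoods coincide, are hypothetical; the JM value is assumed to have been computed beforehand from (4).

```python
import math


def shifted_jm(jm_value, identical):
    """Shifted measure of (6): 0 for identical neighborhoods, JM + 2 otherwise."""
    return 0.0 if identical else jm_value + 2.0


def jm_kernel(jm_value, identical, gamma=1.0):
    """Kernel of (5): exp(-gamma * shifted JM)."""
    return math.exp(-gamma * shifted_jm(jm_value, identical))


k_identical = jm_kernel(0.0, identical=True)     # a neighborhood against itself
k_close = jm_kernel(0.1, identical=False)        # distinct but similar
k_distant = jm_kernel(1.9, identical=False)      # distinct and dissimilar
```

Since JM lies in [0, 2], the kernel equals 1 for identical neighborhoods and falls within [exp(−4γ), exp(−2γ)] for distinct ones.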



B. Hypothesis Testing Kernel

As in Section III-A, neighborhoods are taken as
inputs and (3) is used to build a kernel function.
Here, the content of Vρ(si ) is compared with the content
of Vρ(s j ) by computing a p-value from a hypothesis test.
Among the various well-known hypothesis tests, a convenient
choice is Student’s t-test, because it compares the mean values
of two data samples with distinct variances.

Let pst(Vρ(si ),Vρ(s j ); k) be a function that returns the
p-value of a Student’s t-test used
to compare the neighborhood structures Vρ(si ) and Vρ(s j )
on the kth band of the image I, and let
b be the total number of bands in I. Then, the following
similarity measure can be formulated:

$$P(\mathcal{V}_\rho(s_i), \mathcal{V}_\rho(s_j)) = \frac{1}{b} \sum_{k=1}^{b} \left(1 - p_{st}(\mathcal{V}_\rho(s_i), \mathcal{V}_\rho(s_j); k)\right). \qquad (7)$$

Notice that p-values of the Student’s t-test
closer to 1 indicate greater similarity between neighborhoods.
Accordingly, to measure dissimilarity with
p-values, the quantity (1 − pst(·, ·; ·)) is intentionally
used in (7). This guarantees that the identity property
holds, that is, identical inputs return a null distance
when measured. Since P(·, ·) is symmetric and satisfies the
identity property, function (7) is indeed a distance. Therefore,
as in Section III-A, P(·, ·) is redesigned as a metric by
adding the constant 1 when Vρ(si ) ≠ Vρ(s j ).
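The band-averaged measure (7) can be sketched as below. The p-value routine is injected as a hypothetical callable `pvalue_fn`, so the block stays independent of any statistics library; in practice a routine such as `scipy.stats.ttest_ind(a, c, equal_var=False).pvalue` could play that role. The `toy_pvalue` placeholder is not a t-test and only serves to exercise the averaging.

```python
def p_distance(nbhd_i, nbhd_j, pvalue_fn):
    """P of (7): mean of (1 - p-value) over the b image bands.

    nbhd_i, nbhd_j: lists holding one sequence of pixel values per band.
    pvalue_fn: callable(sample_a, sample_b) -> p-value in [0, 1]; a stand-in
    for p_st, to be supplied by the user (assumption).
    """
    b = len(nbhd_i)
    if b == 0 or b != len(nbhd_j):
        raise ValueError("both neighborhoods need the same, nonzero number of bands")
    return sum(1.0 - pvalue_fn(a, c) for a, c in zip(nbhd_i, nbhd_j)) / b


def toy_pvalue(a, c):
    """Hypothetical placeholder, NOT a t-test: 1.0 for equal means, else 0.0."""
    return 1.0 if sum(a) / len(a) == sum(c) / len(c) else 0.0


v = [[1, 2, 3], [4, 5, 6]]   # two bands, three pixels each (made-up values)
w = [[1, 2, 3], [7, 8, 9]]   # same first band, different second band

d_same = p_distance(v, v, toy_pvalue)
d_half = p_distance(v, w, toy_pvalue)
```

Each band contributes equally to the average, so a disagreement in one of two bands yields half of the maximum value.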

Finally, taking the radial basis equation (3) as premise, our
Student’s t-test-based kernel operator is defined as

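$$K_{ST}(\mathcal{V}_\rho(s_i), \mathcal{V}_\rho(s_j)) = e^{-\gamma\, \overline{P}(\mathcal{V}_\rho(s_i), \mathcal{V}_\rho(s_j))} \qquad (8)$$

where γ ∈ ℝ+ is a tuning constant and $\overline{P}$ is the metric given as
follows:

$$\overline{P}(\mathcal{V}_\rho(s_i), \mathcal{V}_\rho(s_j)) = \begin{cases} 0, & \text{if } \mathcal{V}_\rho(s_i) = \mathcal{V}_\rho(s_j), \\ 2 - P(\mathcal{V}_\rho(s_i), \mathcal{V}_\rho(s_j)), & \text{otherwise.} \end{cases} \qquad (9)$$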

IV. EXPERIMENTS AND RESULTS

The performance of our context-inspired kernels is assessed
through an extensive set of comparisons involving three
well-established SVM classification methods on three application
scenarios. The first one aims at classifying a full data set
of very high-resolution optical remote sensing images. Then,
a realistic case study for land use/cover classification from a
radar image is discussed as part of our comparative analysis.
Finally, experiments run on a typical real-world image are also
provided to conclude this section. The numbers of training
and test (ground-truth) samples, as well as the
classes used in our experiments, are shown in Table I.

TABLE I
NUMBER OF PIXELS FOR TRAINING AND TEST SAMPLES WITH RESPECT TO LULC CLASSES FOR THE SIRI-WHU DATABASE, TAPAJÓS AREA, AND LENA

The outputs computed by the SVM equipped with the
JM and Student’s t-test kernels are named here
SVM+KJM and SVM+KST, while the acronyms SVM+ICM
and SVM+Mode denote the context-based algorithms that combine SVM
with Markov random fields (with the iterated conditional
modes algorithm) [10] and with the statistical mode filter, respectively. SVM in
a basic pixel-based version is also included as a baseline in
our tests. The ρ parameter in (1) is empirically set to 1, 2, and 3,
resulting in spatial windows of sizes 3×3, 5×5, and 7×7 for
the context-based methods. The “one-against-all” multiclass
strategy [5] is adopted in all evaluated methods, while the
RBF is applied to “kernelize” the pixel-based SVM and the
variants SVM+Mode and SVM+ICM.
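For reference, a statistical mode filter of the kind used in SVM+Mode can be sketched as follows, assuming it is applied as a post-processing step to the label map produced by the pixel-based SVM. That pipeline, the border clipping, and the tie-breaking rule are assumptions, since the letter does not detail them.

```python
from collections import Counter


def mode_filter(labels, rho=1):
    """Replace each label by the most frequent label in its (2*rho+1)^2 window.

    `labels` is a list of rows of class labels. Windows are clipped at the
    image border; ties go to the label met first in row-major order.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            window = [
                labels[i][j]
                for i in range(max(0, r - rho), min(rows, r + rho + 1))
                for j in range(max(0, c - rho), min(cols, c + rho + 1))
            ]
            out[r][c] = Counter(window).most_common(1)[0][0]
    return out


# Toy label map with one isolated, presumably misclassified pixel.
noisy = [
    [1, 1, 1],
    [1, 2, 1],
    [1, 1, 1],
]
smoothed = mode_filter(noisy, rho=1)
```

The isolated label is absorbed by its surroundings, which also shows why such filters tend to oversmooth small structures.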

To numerically validate the results, the kappa coefficient is
used as a quality measure. This metric gauges the agreement
level between the resulting classification and the expected
labels with respect to a set of test samples with ground-truth-
labeled areas [11]. The kappa variances are also tabulated,
in order to properly conduct hypothesis testing. The statistical
analysis compares the kappa values using a two-sided
confidence level of 95%.
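The kappa coefficient and a two-sided comparison between two kappa values can be sketched as below. The z-test on the difference of two independent kappas is the usual procedure in accuracy assessment and is assumed here; the variance estimator itself is omitted, and the variances passed in the example are made-up numbers.

```python
import math


def kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: reference, cols: predicted)."""
    size = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(size)) / n
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / (n * n)
    return (observed - expected) / (1.0 - expected)


def kappas_differ(k1, var1, k2, var2, z_crit=1.96):
    """Two-sided z-test at the 95% level: True if the two kappas differ."""
    z = (k1 - k2) / math.sqrt(var1 + var2)
    return abs(z) > z_crit


# Toy two-class confusion matrix (made-up counts).
k = kappa([[45, 5], [10, 40]])

# Illustrative kappa/variance pairs (assumed values).
clearly_different = kappas_differ(0.80, 0.0004, 0.70, 0.0004)
equivalent = kappas_differ(0.80, 0.0004, 0.79, 0.0004)
```

When the test returns False, the two classifications are regarded as statistically equivalent at the chosen level.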

Finally, to train the methods and maximize their
classification accuracies, optimized values are determined
for the SVM penalty C and the kernel parameter γ . The selection is
performed by an exhaustive grid search with
10-fold cross validation, choosing the parameters that yield the most
accurate results on the training samples, as in [12].
Motivated by [12], C ranges over {1, 10, 100, 1000, 10000}, while
the kernel parameter γ ∈ {0.25, 0.5, 0.75, 1.0, 1.25, 1.5}. The
number of iterations and minimum percentage change are
estimated in a similar manner.
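The exhaustive search over C and γ with 10-fold cross validation can be sketched as follows. The callback `cv_score`, which would train an SVM on the training indices and return an accuracy on the validation indices, is hypothetical; `toy_score` merely stands in for it so that the loop can be run. The interleaved fold assignment is also an assumption.

```python
from itertools import product

C_GRID = [1, 10, 100, 1_000, 10_000]
GAMMA_GRID = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5]


def kfold_indices(n_samples, k=10):
    """Split range(n_samples) into k disjoint, interleaved validation folds."""
    return [list(range(f, n_samples, k)) for f in range(k)]


def grid_search(cv_score, n_samples, k=10):
    """Return the (C, gamma) pair with the highest mean validation score.

    cv_score(C, gamma, train_idx, val_idx) -> float is a hypothetical
    callback that trains on train_idx and scores on val_idx.
    """
    folds = kfold_indices(n_samples, k)
    best_params, best_score = None, float("-inf")
    for C, gamma in product(C_GRID, GAMMA_GRID):
        scores = []
        for val_idx in folds:
            held_out = set(val_idx)
            train_idx = [i for i in range(n_samples) if i not in held_out]
            scores.append(cv_score(C, gamma, train_idx, val_idx))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = (C, gamma), mean_score
    return best_params, best_score


def toy_score(C, gamma, train_idx, val_idx):
    """Placeholder score peaking at C = 100, gamma = 0.75 (not a real SVM)."""
    return -abs(C - 100) - abs(gamma - 0.75)


best_params, best_score = grid_search(toy_score, n_samples=50)
```

The grid contains 5 × 6 = 30 parameter pairs, each evaluated on 10 folds, so the callback is invoked 300 times per search.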

A. Classifying Full Collections of Aerial Images

We first run all the evaluated techniques on a comprehensive
data set of remote sensing images. This collection of images,
called SIRI-WHU data set, contains more than 2400 aerial
scenes made publicly available by Zhao et al. [13]. The full
data set offers a dozen land classes covering different urban
areas in China. Therefore, we take this data set as a valid
benchmark to verify the accuracy of the methods for
two representative land classes: industrial and coastal zones
(see the frames illustrating both zones in Fig. 1).

Table II summarizes the kappa values and their variances
for SVM, SVM+ICM, SVM+Mode, and our techniques,
SVM+KJM and SVM+KST, for the above-described land
classes of aerial scenes. One can observe from the tabulated
scores that the SVM+KJM variant outperforms the existing
context-based methods in all the measurements, attesting to its
accuracy in addressing the entire sets of aerial images such as
the ones found in [13]. Next, SVM+KST achieves the second
best scores for both the kappa and variance measurements,
followed by SVM+Mode and SVM+ICM.

One can note that the quality gain is most pronounced
when the neighborhood size (#Neigh.) grows from 3 × 3
to 5 × 5. Enlarging the neighborhood generally
implies better classifications, but this does not always
hold, as reported in the bottommost rows of Table II.



Fig. 1. (a) and (b) Two illustrative aerial scenes taken from the SIRI-WHU
database. (c) Examined areas and LULC samples (AG, BS, PF,
and PS) used for training and accuracy evaluations in our case study with
a SAR image.

TABLE II
QUANTITATIVE COMPARISON OF CONTEXTUAL SVM-BASED METHODS WHEN ASSESSED ON THE REMOTE SENSING DATA SET [13]. BOLD AND GRAY VALUES INDICATE THE BEST AND SECOND BEST SCORES

B. Case Study Description and Validation

A case study covering a multiclass classification using
a SAR image captured by the ALOS-PALSAR sensor is
given next. This SAR image, acquired on April 23, 2007,
at 600×600 pixels and with a 20-m resolution after applying a
3×3 multilook process in HH and HV amplitude polarization,
corresponds to an area of the Amazon situated south
of Santarém, state of Pará, Brazil, more precisely around
the Tapajós National Forest. The geographic coordinates of the area
are 3°8′20″ S and 54°55′33″ W. Field work conducted in
September 2009 identified the following land use and land
cover (LULC) types: Primary Forest (PF), Pasture (PS), Bare
Soil (BS), and Agriculture (AG) [see Fig. 1(c), where the
polygonal regions indicate the test samples and the colored points
indicate the training sets].

1) SAR Image Classification (Quantitative Assessments):
From the kappa and variance values listed in Table III,
the SVM+KJM technique delivers the best scores, followed by
SVM+KST. Indeed, the SVM+KJM and SVM+KST variants
differ substantially from the others for both quality measures.
For instance, the kappa achieved by SVM+KST for
the minimum neighborhood size of 3 × 3 matches the kappa
obtained by SVM+ICM and SVM+Mode with a 5 × 5 spatial
window. Concerning hypothesis testing, the SVM+Mode
results for the sizes 5 × 5 and 7 × 7 are statistically
equivalent. Similar conclusions can be drawn from the bottommost
rows of the SVM+KST variant. Finally, one can check that
the accuracies of SVM+KJM, SVM+KST, and SVM+Mode
rise as the neighborhood dimensions increase.

TABLE III
QUANTITATIVE COMPARISON OF SVM-BASED METHODS WHEN APPLIED TO THE SAR IMAGE OF TAPAJÓS AREA AND LENA’S PICTURE. BOLD AND GRAY VALUES INDICATE THE BEST AND SECOND BEST SCORES

2) SAR Image Classification (Qualitative Assessments):
Fig. 2 shows, for the investigated area, the visual results
obtained by the techniques when the neighborhood is fixed
to 5 × 5. Although the context-guided methods produce better
(less noisy) partitions when compared with the pixelwise SVM
approach, the use of the proposed JM kernel leads to a clearer
characterization of the PS and AG segments, as its output
is more similar to the ground-truth classifications outlined
in Fig. 1(c). Note also that, similar to the SVM+KJM method,
SVM+KST produces satisfactory classifications, even
for the PS and AG patterns.

C. Contextual Classifications on Real-World Images

Fig. 3 illustrates the capability of the SVM+KJM and
SVM+KST methods to classify images that commonly appear
in everyday life, even images that have been contaminated
by noise. To establish a valid benchmark to assess the
classification results, the segments identified and annotated as
classes in the Lena picture [see Fig. 3(a)] are: SK (skin), HT
(hat), FE (feather), and HA (hair).

Visually inspecting the results, one can see that both the
KJM and KST kernels achieve more accurate and consistent
outcomes, mainly regarding the accuracy for the FE
and SK classes, as indicated by the large areas demarcated
by blue and red pixels in Fig. 3(e) and (f). This
is also confirmed by the quality measures
listed in Table III, where the kappas for SVM+KJM and
SVM+KST are greater than those obtained by the other evaluated
methods in almost all the comparative scenarios with different
neighborhoods.

D. Computational Aspects and Timings

To perform our experiments, an Intel Core i7 processor with
16 GB of RAM running a Linux operating system was
used. The implementations were coded in the IDL (Interactive
Data Language) programming language.

The computational timings of the examined algorithms are
reported in Fig. 4. Although the SVM+KJM and SVM+KST
methods produce more accurate classifications, they also take
longer than the others. To address this, more efficient
implementations (e.g., GPU and parallel architectures) can be used
to alleviate the computational burden of the algorithms.



Fig. 2. Classification results obtained from the evaluated methods when applied to the Tapajós study area. The classes are AG, BS, PF, and PS.
(a) SVM. (b) SVM+ICM. (c) SVM+Mode (5×5). (d) SVM+KJM (5×5). (e) SVM+KST (5×5).

Fig. 3. Classification results obtained from the evaluated methods on a real-world picture (Lena’s picture). The classes are SK, HT, FE, and HA.
(a) Training/test samples. (b) SVM. (c) SVM+ICM. (d) SVM+Mode (5×5). (e) SVM+KJM (5×5). (f) SVM+KST (5×5).

Fig. 4. Computational timings for all the evaluated methods.

V. CONCLUSION

This letter proposes two context-inspired kernel functions
as new SVM methods for remote sensing image
classification. In contrast to the existing SVM-based approaches
that exploit the pixel context as an attribute vector, the
proposed JM and Student’s t-test kernels allow for
computing the similarity between the entire sets of neighborhoods,
thus leading to the use of such contextual patterns to
drive the classification task. Besides implicitly embedding
contextual data into kernel functions, the SVM+KJM and
SVM+KST techniques are found to be robust when dealing
with complex aerial images. This behavior can be verified
from the experiments involving full collections of aerial
scenes and the case study with an Amazon area, where
both kernels performed better than the other methods, thus achieving
high scores while still making the regions classified in the
images more noticeable. In summary, flexibility and high
accuracy render the proposed methods two very attractive
image classification approaches in the context of remote
sensing.

As future work, parallel architectures will be used to
reduce the cost of the methods. Another goal is to exploit
other statistical tests to address different applications in this
context.

ACKNOWLEDGMENT

The authors would like to thank FAPESP [grants 2014/
14830-8, 2013/07375-0, 2014/08822-2 and 2017/03595-6] and
UNESP-PROPe [grant PROINTER-2017/1654] for funding
this research.

REFERENCES

[1] B. Schölkopf and A. J. Smola, Learning With Kernels: Support Vector
Machines, Regularization, Optimization, and Beyond (Adaptive Com-
putation and Machine Learning). Cambridge, MA, USA: MIT Press,
2002.

[2] G. Camps-Valls, L. Gomez-Chova, J. Munoz-Mari, J. Vila-Frances,
and J. Calpe-Maravilla, “Composite kernels for hyperspectral image
classification,” IEEE Geosci. Remote Sens. Lett., vol. 3, no. 1, pp. 93–97,
Jan. 2006.

[3] P. Gurram and H. Kwon, “Contextual SVM using Hilbert space embed-
ding for hyperspectral classification,” IEEE Geosci. Remote Sens. Lett.,
vol. 10, no. 5, pp. 1031–1035, Sep. 2013.

[4] G. Mountrakis, J. Im, and C. Ogole, “Support vector machines in remote
sensing: A review,” ISPRS J. Photogramm. Remote Sens., vol. 66, no. 3,
pp. 247–259, 2011.

[5] A. R. Webb and K. D. Copsey, Statistical Pattern Recognition, 3rd ed.
New York, NY, USA: Wiley, 2011.

[6] R. Kondor and T. Jebara, “A kernel between sets of vectors,” in Proc.
Int. Conf. Mach. Learn., 2003, pp. 1–8.

[7] R. G. Negri, L. V. Dutra, S. J. S. Sant’Anna, and D. Lu, “Examining
region-based methods for land cover classification using stochastic
distances,” Int. J. Remote Sens., vol. 37, no. 8, pp. 1902–1921, 2016.

[8] L. Castañeda, V. Arunachalam, and S. Dharmaraja, Introduction to
Probability and Stochastic Processes With Applications. New York, NY,
USA: Wiley, 2012.

[9] J. A. Richards, Remote Sensing Digital Image Analysis, 5th ed. Berlin,
Germany: Springer, 2013.

[10] F. Bovolo and L. Bruzzone, “A context-sensitive technique based on
support vector machines for image classification,” in Pattern Recogni-
tion and Machine Intelligence (Lecture Notes in Computer Science),
vol. 3776. Berlin, Germany: Springer, 2005, pp. 260–265.

[11] R. G. Congalton and K. Green, Assessing the Accuracy of Remotely
Sensed Data. Boca Raton, FL, USA: CRC Press, 2009.

[12] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, “A practical guide to support
vector classification,” Dept. Comput. Sci., Nat. Taiwan Univ., Taipei,
Taiwan, Res. Paper, 2016. [Online]. Available:
http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf

[13] B. Zhao, Y. Zhong, L. Zhang, and B. Huang, “The Fisher kernel
coding framework for high spatial resolution scene classification,”
Remote Sens., vol. 8, no. 2, pp. 157–177, 2016. [Online]. Available:
http://www.mdpi.com/2072-4292/8/2/157