Chapter 11

Decision Fusion of Remote-Sensing Data for Land Cover Classification

Arnaud Le Bris, Nesrine Chehata, Walid Ouerghemmi, Cyril Wendl, Tristan Postadjian, Anne Puissant§, Clément Mallet
Univ. Paris-Est, LASTIG STRUDEL, IGN, ENSG, Saint-Mandé, France
EA G&E Bordeaux INP, Université Bordeaux Montaigne, Pessac, France
Student at Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
§CNRS UMR 7362 LIVE-Université de Strasbourg, Strasbourg, France
Aix-Marseille Université, CNRS ESPACE UMR 7300, Aix-en-Provence, France

Abstract

Very high spatial resolution (VHR) multispectral imagery enables a fine delineation of objects and a possible use of texture information. Other sensors provide a lower spatial resolution but enhanced spectral or temporal information, permitting one to consider richer land cover semantics. So as to benefit from the complementary characteristics of these multimodal sources, a late decision fusion scheme is proposed. It makes it possible to benefit from the full capacities of each sensor, while dealing with both semantic and spatial uncertainties. The different remote-sensing modalities are first classified independently. Separate class membership maps are calculated and then merged at the pixel level, using decision fusion rules. A final label map is obtained from a global regularization scheme in order to deal with spatial uncertainties while preserving the contrasts of the initial images. This scheme relies on a probabilistic graphical model involving a fit-to-data term related to the merged class membership measures and an image-based contrast-sensitive regularization term. Conflict between sources can also be integrated into this scheme.

Two experimental cases are presented. The first one considers the fusion of VHR multispectral imagery with lower spatial resolution hyperspectral imagery for a fine-grained land cover classification problem in dense urban areas. The second one uses SPOT 6/7 satellite imagery and Sentinel-2 time series to extract urban area footprints through a two-step process: the classifications are first merged in order to detect building objects, from which an urban area prior probability is derived and finally merged with the Sentinel-2 classification output for urban footprint detection.

Keywords

Late fusion; Decision fusion; Multimodal remote sensing; Classification; Land cover; Very high spatial resolution; Hyperspectral; Time series; Urban area; Urban footprint

11.1 Introduction

Recent years have witnessed the emergence of a large variety of new sensors with various characteristics. The possibility to collect different kinds of observations over the same area has considerably increased: remote sensing can now be considered generically multimodal [1]. These sensors can use different modalities (radar, Lidar or optical) and can be airborne or spaceborne. Even for the same modality, they can exhibit very distinct characteristics: for instance, optical sensors show a large range of spectral configurations (number, position and width of the spectral bands), spatial resolutions, coverages and, for spaceborne sensors, revisit times (i.e., the minimum delay between two possible consecutive acquisitions over the same area, which conditions the possibility to capture genuine time series). As a consequence, combining remote-sensing data with different characteristics is a standard remote-sensing problem that has been extensively investigated in the literature [2]. The overall aim consists in fusing multisensor information as a means of combining the respective advantages of each sensor. Complementary observations can thus be exploited for land cover mapping purposes, which is a core remote-sensing application and the necessary input for a large number of public policies and environmental models. Combining existing sensors can mitigate the limitations of any particular sensor for various land cover issues [3,4].

This chapter specifically focuses on the fusion of one data type, exhibiting a very high spatial resolution, with another one, exhibiting a lower spatial resolution but enhanced complementary characteristics. Indeed, very high spatial resolution (VHR) multispectral imagery enables an accurate spatial delineation of objects and a possible use of texture information for enhanced class discrimination [5]. On the other hand, sensors with lower spatial resolutions offer enhanced spectral or temporal information, making it possible to consider richer land cover semantics. To illustrate this problem, two use cases will be considered, accompanied by two methodological contributions:

  • •  The first one considers the fusion of very high spatial resolution multispectral imagery with lower spatial resolution hyperspectral imagery for detailed urban land cover classification. Hyperspectral imagery provides accurate spectral information but generally with a low geometric precision. Hence, it can provide classification results with finer land cover semantics, while a VHR image helps to retrieve the geometric contours of these classes. Combining both data sources helps to reach better accuracy scores at the highest spatial resolution of the two datasets.
  • •  The second one integrates very high spatial resolution multispectral/monodate SPOT 6/7 satellite imagery (classified with a convolutional neural network, CNN), and Sentinel-2 time series (classified with a random forest). The final aim is urban footprint detection. In this case, VHR imagery provides texture information and fine object delineation, while the Sentinel-2 time series gives access to better contextual information.

In both cases, the land cover fusion scheme targets benefiting from the complementary characteristics of these multimodal sources.

Existing data fusion approaches will be analyzed in Sect. 11.1.1. From this review, existing methods will be discussed, and a fusion strategy elaborated (Sect. 11.1.2). This proposed framework will then be presented in detail (Sect. 11.2), before being applied to the two above-mentioned use cases (Sects. 11.3 and 11.4, respectively).

11.1.1 Review of the Main Data Fusion Methods

The fusion of heterogeneous data sources has been widely investigated in the remote-sensing literature (e.g., [6–9]). Fusion can be carried out at three different levels [10]:

  • •  The observation level: early fusion.
  • •  The attribute/feature level: intermediate fusion.
  • •  The decision level: late fusion.

11.1.1.1 Early fusion – fusion at the observation level

Fusion can be achieved at the observation level, i.e., through the direct joint analysis of the pixel values (with or without calibration procedures). For that purpose, pan-sharpening is a well-known technique that combines the geometric details of a high resolution panchromatic image and the color information of a lower spatial resolution multispectral (or hyperspectral) image to produce a high spatial resolution multispectral (or hyperspectral) image. Pan-sharpening methods usually use the panchromatic image to replace the high frequency part of the low resolution image [11]. Other fusion algorithms have been proposed to merge multispectral (or hyperspectral) and panchromatic (but also multispectral and hyperspectral) images in order to combine complementary characteristics in terms of spatial and spectral resolutions [12–14]. A review of such methods can be found in [14]. Finally, super-resolution is another approach relying on the early fusion of several sensors [15].

11.1.1.2 Intermediate fusion – fusion at the attribute/feature level

Data sources can be merged at the feature level. Features (spectral indices, texture-based features, etc.) are computed for each source separately or for both of them and fed into the same classifier through a unique feature set [16]. Examples of remote-sensing pipelines involving fusion at the attribute level can be found in [17–22]. For instance, [18] proposed a conditional random field (CRF) model for building detection using InSAR and orthoimage features. Reference [20] merged Lidar and optical aerial image features for forest stand extraction: the proposed approach involves several steps (segmentation, classification, and regularization), fusion is performed at each of them, and an improvement is noticed for each step. More recently, deep convolutional neural networks have been used to perform data fusion at the attribute level [23]. Reference [22], for instance, applied deep forests to Lidar and hyperspectral image features. A detailed review can be found in [24]. Several datasets and challenges have been released over the last decades, under the aegis of the IEEE GRSS society [25–27].

11.1.1.3 Late fusion – fusion at the decision level

Late decision fusion happens after the classification process: the outputs of multiple independent classifiers are combined in order to provide a more reliable decision. Such classification results can be either label maps or class membership probability maps. Various late decision fusion methods have been proposed. Most of them can be divided into different categories: consensus rules (majority voting), probabilistic approaches (Bayesian fusion), credibilist or evidential ones, and possibilist ones.

The probabilistic, evidential, and credibilist decision fusion approaches are generic and can be applied to different fusion problems. They only require class “membership” measures (probabilities or belief masses depending on the approach) for each source and for each class, or at least a confidence measure for each source.

Possibilist methods use fuzzy logic-based fusion rules [28–30]. They require one to define weights [31] in order to better deal with the uncertainty of the different sources. Such generic approaches have been applied to remote-sensing data [32].

Evidential approaches are a generalization of probabilistic ones. They include the well-known Dempster–Shafer fusion rule [33]. In remote sensing, this rule has often been used to merge classification or alarm detection results. For instance, [34–36] applied the Dempster–Shafer rule to combine several supervised building detectors based on different remote-sensing modalities (optical, Lidar, radar). This rule was also used by [37] for the fusion of different road obstacle detectors in the context of intelligent vehicle development. Reference [38] used the Dempster–Shafer rule for the fusion of urban footprints detected at different dates out of satellite archive images. References [39,40] applied this rule to change detection in an unsupervised classification context.

Another evidential fusion rule is the Yager rule. It was applied by [41] to combine different road obstacle detectors for vehicle navigation. Other evidential rules have been proposed more recently, such as the Dezert–Smarandache rule [42]. Further rules have also been proposed by [43,44]: they extend the Dempster–Shafer rules in order to achieve a better management of the conflict between sources. However efficient in some cases, Dempster–Shafer remains a theoretically complex framework that does not easily apply when dealing with heterogeneous and multiple data.

Another important issue consists in defining the input of the different fusion rules. Different situations can be considered, depending on whether a global confidence measure is assigned to each source or, conversely, whether class posterior probabilities are directly available for each pixel and each source (directly provided by the initial classifier). In the former situation, the global confidence measures assigned to each source are calculated from the confusion matrices (indeed, a confusion matrix provides the probability that an object labeled as class A by one source in fact belongs to class B). Validation data are thus necessary to calculate these weights. Furthermore, for evidential methods and especially Dempster–Shafer methods, uncertainty classes (i.e., unions of original classes) must necessarily be defined, and the belief masses associated with these classes must be computed. If a global confidence is associated with a source, some solutions have been proposed, for instance Appriou's method [45] or, more recently, the method presented in [46,47]. If a class membership measure is assigned to each pixel for each source and each class, it is also possible to derive belief masses for these uncertainty (union) classes. Finally, [38] proposes an alternative way to integrate these two kinds of information.
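As an illustration of how such per-class confidence weights can be derived from a confusion matrix, the short Python/NumPy sketch below (a minimal example, not taken from the cited works; names are illustrative) row-normalizes a confusion matrix computed on validation data, so that entry (A, B) estimates the probability that a pixel labeled A by the source actually belongs to class B.

    import numpy as np

    def source_confidence_from_confusion(conf_mat):
        """Row-normalize a confusion matrix (rows = labels predicted by the source,
        columns = reference labels) computed on validation data.
        Entry [A, B] then estimates P(true class = B | source predicted A);
        the diagonal can serve as a per-class global confidence of the source."""
        conf_mat = np.asarray(conf_mat, dtype=float)
        row_sums = conf_mat.sum(axis=1, keepdims=True)
        row_sums[row_sums == 0] = 1.0          # avoid division by zero for empty rows
        posteriors = conf_mat / row_sums
        return posteriors, np.diag(posteriors)

    # Toy 3-class example (values are purely illustrative)
    cm = np.array([[80, 15,  5],
                   [10, 70, 20],
                   [ 5, 10, 85]])
    posteriors, per_class_confidence = source_confidence_from_confusion(cm)
    print(per_class_confidence)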

Last but not least, the last category of late fusion approaches relies on supervised learning: the best way to merge the input sources is automatically learned from training examples. Per-source posterior class probabilities are concatenated and considered as a feature vector, which is then provided as input to a classifier trained to perform the best possible fusion. Such strategies can thus be considered to be at the interplay between late and intermediate fusion (the classifiers previously applied to each source can then be seen as feature generators, and the approach is then referred to as auto-context classification [48]). They have been used with different classifiers: random forests [49,50], Adaboost [50], and support vector machines [51,52]. These supervised learning-based methods achieve good results but require a sufficient amount of training data to model the classes and avoid over-fitting (especially for deep learning).

In addition, it should be noted that most fusion methods mentioned in this section provide measures to assess the conflict between two sources.

Late fusion approaches have often been applied to remote-sensing fusion problems [53–56,32,51,57–61]. Fusion methods operating at the decision level can be applied in two situations, depending on whether they merge multiple classifiers applied to the same data source or classifiers applied to multiple sources. For instance, [53] combined neural network and statistical maximum likelihood classifiers using several consensus theory rules (i.e., majority voting, complete agreement) to classify multispectral and hyperspectral images. References [57,58] merged posterior probabilities from maximum likelihood classifications of optical images with prior information about classes derived, respectively, from digital terrain models or digital surface models, as well as information from existing land cover databases. Reference [55] investigated the fusion of multitemporal Thematic Mapper images, using decision fusion-based methods (i.e., joint likelihood and weighted majority fusion). A characterization of the spatial organization of SAR image elements is investigated in [54] by merging the responses of multiple low-level detectors applied to the same image within a Dempster–Shafer scheme. Reference [32] investigated the use of fuzzy decision rules to combine the classification results of a conjugate gradient neural network and a fuzzy classifier over an IKONOS image. Reference [61] combined convolutional neural network and random forest classifiers using a multiplicative Bayesian scheme.

11.1.2 Discussion and Proposal of a Strategy

Fusion can be performed at different levels. Fusion at the observation level, e.g., pan-sharpening or multisharpening, is limited to specific situations where it has a real physical meaning (e.g., hyperspectral and multispectral images acquired simultaneously). It is not generic enough.

Fusion at the feature level or at the decision level is more generic and also applicable to the present fusion problem. Two main issues remain: (i) for both, the spatial scale of analysis and, subsequently, the interpolation process; (ii) for feature-based approaches, the ability to correctly handle the various data sources in the decision process. In the case of imbalanced feature sets, supervised techniques such as random forests or support vector machines, even with feature selection strategies, may favor the data source generating the larger number of attributes. The process will thus not fully benefit from the advantages of all datasets.

As a consequence, a late fusion strategy is adopted in this chapter. Indeed, contrary to intermediate level fusion methods, it makes it possible to initially process each input data source independently with specific, optimal methods. Moreover, it even enables one to use already existing results from available operational land cover classification services, as long as they provide class membership confidence measures. Besides, especially in this last situation, it can be used without any ground truth (training) information, contrary to intermediate level fusion methods.

Most existing decision fusion methods do not explicitly take into account the fact that the input data sources have different spatial resolutions, and thus do not explicitly deal with both semantic and spatial uncertainties. Spatial uncertainty handling here consists in removing classification noise and enforcing that the classification result follows as closely as possible the natural borders in the original images. Such a task can be cast as a smoothing problem. Local smoothing methods exist: majority voting, Gaussian and bilateral filtering [62], as well as probabilistic relaxation [63]. The majority vote can be used in particular when a segmentation of the area is available: the majority class is then assigned to the segment, and the vote can also be weighted by the class probabilities of the different pixels. Probabilistic relaxation is another local smoothing method that aims at homogenizing the probabilities of a pixel according to its neighbors: it is an iterative algorithm in which the class probabilities at each pixel are updated at each iteration in order to bring them closer to the probabilities of its neighbors.

However, these local smoothing methods are generally outperformed by global regularization strategies [64,20]. Global regularization methods consider the whole image by connecting each pixel to its neighbors. They traditionally adopt Markov random fields (MRFs): the labels at different locations are not considered to be independent, and the global solution can be retrieved with the simple knowledge of the close neighborhood of each pixel. The optimal configuration of labels is retrieved by finding the maximum a posteriori over the entire field [65,64]. The problem is therefore considered as the minimization of a global energy over the whole image. Despite a simple neighborhood encoding (pairwise relations are often preferred), the optimization procedure propagates over large distances. Global regularization is often considered as a post-processing step within a classification process. It has been associated with late fusion in recent works, as for instance in [66,67].

As a consequence, so as to benefit from the complementarity of a very high spatial resolution sensor with another one exhibiting a lower spatial resolution but enhanced complementary characteristics, the proposed fusion framework involves (i) fusion at the decision level, (ii) followed by a global regularization. It mostly relies on existing state-of-the-art methods, but combines them in order to cope with both semantic and spatial uncertainties. Besides, it is flexible enough to integrate several fusion rules and be applied to various use cases.

11.2 Proposed Framework

A late fusion framework is proposed in order to benefit both from low spatial resolution data (but spectrally or temporally enhanced) and very high spatial resolution multispectral monodate data. It aims at dealing with both semantic and spatial uncertainties. It consists of three main steps, presented in Fig. 11.1.

  (A)  Classification of each original source and generation of the posterior class probabilities: the two input data sources are first automatically labeled, independently, by specific adapted processes. At the end, a posterior class probability map is generated for each of them. Here, these predictions are assumed to have already been computed and are considered as given.
  (B)  Per-pixel fusion of the posterior probabilities at the decision level: a per-pixel decision fusion is applied to these maps so as to combine them into a more accurate decision at the highest spatial resolution. Different fusion rules are considered. This step aims at dealing with semantic uncertainties between sources. A conflict measure between sources can also be generated at this step (Sect. 11.2.1).
  (C)  Global regularization: the final classification map is retrieved as the result of a global regularization of the merged class membership probability map obtained at the previous step. It allows one to deal with spatial uncertainties between both sources, to reduce the remaining noise, and to take into account the contrast information of the original image so as to follow as closely as possible the natural borders in the images (Sect. 11.2.2).

Image
Figure 11.1 Proposed generic framework.

11.2.1 Fusion Rules

11.2.1.1 Fuzzy rules

The first tested fusion approach is based on fuzzy rules [30]. Fuzzy rule theory states that a fuzzy set $A$ in a reference set of classes $L$ is a set of ordered pairs:

$A = \left\{ \left(c, P_A^{(c)}(x)\right) \mid c \in L \right\}$,   (11.1)

where the membership probability of $A$ is given by $P_A : L \to [0, 1]$. The measure of conflict ($1 - K$) between two sources is given as [68]

$K = \sup_{c \in L} \min\left(P_A^{(c)}(x), P_B^{(c)}(x)\right)$.   (11.2)

In order to account for the fact that fuzzy sets with a strong fuzziness possibly hold unreliable information, each fuzzy set $i$ is weighted according to a pointwise confidence measure $w_i$ [32]:

$w_i = \dfrac{\sum_{k=0,\, k \neq i}^{n} H_{\alpha QE}(P_k)}{(n-1) \sum_{k=0}^{n} H_{\alpha QE}(P_k)}$.   (11.3)

$n$ is the number of sources and $H_{\alpha QE}$ is a fuzziness measure called α-quadratic entropy (QE) [31]. Each fuzzy set $i$ is weighted by the fuzziness degree of all other fuzzy sets (i.e., classifications): if the fuzziness degree of the other sets is high, the weight of a given source $i$ will be high too. For all the following fusion rules, the fuzzy sets have been weighted as $\tilde{P}_A^{(c)}(x) = w_A P_A^{(c)}(x)$ and $\tilde{P}_B^{(c)}(x) = w_B P_B^{(c)}(x)$, where $P_A^{(c)}(x)$ and $P_B^{(c)}(x)$ are the original membership probabilities and $w_A$, $w_B$ their corresponding pointwise measures.

In the following experiments, all fuzzy rules were applied using, as input, the probabilities weighted by this pointwise measure.
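For illustration, the following sketch computes the pointwise weights of Eq. (11.3) with NumPy, assuming that the per-source fuzziness maps $H_{\alpha QE}(P_k)$ have already been computed (the α-quadratic entropy itself is not reimplemented here); array names and shapes are assumptions of this sketch.

    import numpy as np

    def pointwise_weights(fuzziness_maps):
        """Per-pixel weights w_i of Eq. (11.3).

        fuzziness_maps: array of shape (n_sources, H, W), where fuzziness_maps[k]
        is the alpha-quadratic entropy map of source k.
        Returns an array of the same shape containing w_i for each source."""
        H = np.asarray(fuzziness_maps, dtype=float)
        n = H.shape[0]
        total = H.sum(axis=0)                  # sum of fuzziness over all sources
        total[total == 0] = 1e-12              # numerical safeguard
        # numerator: fuzziness of all the *other* sources, per pixel
        return (total[None, ...] - H) / ((n - 1) * total[None, ...])

    # Example with two random fuzziness maps of size 4 x 5
    rng = np.random.default_rng(0)
    w = pointwise_weights(rng.random((2, 4, 5)))
    print(w.sum(axis=0))                       # per pixel, the weights sum to 1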

The following fusion rules based on fuzzy logic were considered:

  1.  Minimum rule (Min): the intersection of two fuzzy sets $P_A$ and $P_B$, given by the minimum of their membership probabilities (conjunctive behavior):

$\forall c \in L, \quad (P_A \cap P_B)^{(c)}(x) = P_{fusion}^{(c)}(x) = \min\left(P_A^{(c)}(x), P_B^{(c)}(x)\right)$.   (11.4)

  2.  Maximum rule (Max): the union of the two fuzzy sets $P_A$ and $P_B$, given by the maximum of their membership probabilities (disjunctive behavior):

$\forall c \in L, \quad (P_A \cup P_B)^{(c)}(x) = P_{fusion}^{(c)}(x) = \max\left(P_A^{(c)}(x), P_B^{(c)}(x)\right)$.   (11.5)

  3.  Compromise operator:

$P_{fusion}^{(c)}(x) = \begin{cases} \max\left(T_1, \min(T_2, 1-K)\right) & \text{if } (1-K) \neq 1, \\ \max\left(P_A^{(c)}(x), P_B^{(c)}(x)\right) & \text{if } (1-K) = 1, \end{cases}$   (11.6)

  where $T_1 = \min\left(P_A^{(c)}(x), P_B^{(c)}(x)\right) / K$ and $T_2 = \max\left(P_A^{(c)}(x), P_B^{(c)}(x)\right)$.
  It can be noticed that the operator behavior is conjunctive when the conflict between A and B is low ($1-K \approx 0$) and disjunctive when the conflict is high ($1-K \approx 1$). When the conflict is partial, the operator behaves in a compromise way [68].
  4.  Compromise modified: since a pure compromise fusion rule would favor $T_1$, [69] proposed to measure the intra-class conflict as $f_c = \left|P_S^{(C_{best1})}(x) - P_S^{(C_{best2})}(x)\right|$, with $C_{best1} = \arg\max_{c \in L} P_S^{(c)}(x)$ and $C_{best2} = \arg\max_{c \in L \setminus \{C_{best1}\}} P_S^{(c)}(x)$. They set a conflict threshold $t_c$ (e.g., $t_c = 0.25$ for the experiments in Sect. 11.3) to be used as follows:
Image
Algorithm 1 Compromise rule according to [69]
  5.  Prioritized operators (referred to as Prior 1 for Eq. (11.7) and Prior 2 for Eq. (11.8)):

$P_{fusion}^{(c)}(x) = \max\left(P_A^{(c)}(x), \min\left(P_B^{(c)}(x), K\right)\right)$,   (11.7)

$P_{fusion}^{(c)}(x) = \min\left(P_A^{(c)}(x), \max\left(P_B^{(c)}(x), 1-K\right)\right)$.   (11.8)

  If the conflict between A and B is high (i.e., $K \approx 0$), only $P_A^{(c)}(x)$ is taken into account (prioritized) and $P_B^{(c)}(x)$ is considered as a specific piece of information. Thus, the fusion result depends on the order of the sources A and B.
  6.  An accuracy dependent (AD) operator [32], integrating both local and global confidence measurements:

$P_{fusion}^{(c)}(x) = \max_{i \in [1,n]} \left( \min\left(w_i \cdot P_i^{(c)}(x),\, f_i^{(c)}(x)\right) \right)$,   (11.9)

  where $f_i^{(c)}$ is the global confidence of source $i$ regarding class $c$, $P_i$ is the class membership of source $i$, and $w_i$ is a normalization factor (see Eq. (11.3)). This operator ensures that only reliable sources are taken into consideration for each class, via the predefined coefficients $f_i^{(c)}$. The idea seems interesting; nevertheless, the final result depends on the reliability of the classifier and also on the availability of ground truth data, which is mandatory to generate the $f_i^{(c)}$ term.
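To make the behavior of these operators concrete, the sketch below (a minimal NumPy illustration, assuming two already weighted membership maps of shape (H, W, n_classes)) implements the Min and Max rules (Eqs. (11.4)-(11.5)) and the compromise operator (Eq. (11.6)), using the conflict measure of Eq. (11.2).

    import numpy as np

    def fuzzy_fusion(p_a, p_b, rule="min", eps=1e-12):
        """Fuse two membership maps of shape (H, W, n_classes).

        rule: 'min' (Eq. 11.4), 'max' (Eq. 11.5) or 'compromise' (Eq. 11.6).
        K (Eq. 11.2) is the per-pixel sup over classes of min(P_A, P_B);
        the conflict between the sources is (1 - K)."""
        if rule == "min":
            return np.minimum(p_a, p_b)
        if rule == "max":
            return np.maximum(p_a, p_b)
        # Compromise operator (Eq. 11.6)
        K = np.minimum(p_a, p_b).max(axis=-1, keepdims=True)      # Eq. (11.2)
        t1 = np.minimum(p_a, p_b) / np.maximum(K, eps)
        t2 = np.maximum(p_a, p_b)
        fused = np.maximum(t1, np.minimum(t2, 1.0 - K))
        # where the conflict is total (K == 0, i.e. 1 - K == 1), fall back to max
        return np.where(K <= eps, t2, fused)

    # Toy example: two sources, one pixel, three classes
    p_a = np.array([[[0.7, 0.2, 0.1]]])
    p_b = np.array([[[0.6, 0.3, 0.1]]])
    print(fuzzy_fusion(p_a, p_b, rule="compromise"))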

11.2.1.2 Bayesian combination and majority vote

A straightforward approach is to sum or multiply the input class membership probabilities as a Bayesian sum (∼ majority vote) or product [70]:

$P_{fusion\_sum}^{(c)}(x) = P_A^{(c)}(x) + P_B^{(c)}(x)$,   (11.10)

$P_{fusion\_product}^{(c)}(x) = P_A^{(c)}(x) \times P_B^{(c)}(x)$.   (11.11)

Those rules will be referred to as Bayesian sum and Bayesian product, respectively.

11.2.1.3 Margin-based rules

The aim of these rules is to take into account the confidence, measured by the classification margin, of each source. The classification margin is defined for each pixel x and each source s as the difference between the two highest class probabilities:

$\mathrm{margin}^{(s)}(x) = P_s^{(C_{best1})}(x) - P_s^{(C_{best2})}(x)$,   (11.12)

with $C_{best1}^{(s)}(x) = \arg\max_{c \in L} P_s^{(c)}(x)$ and $C_{best2}^{(s)}(x) = \arg\max_{c \in L \setminus \{C_{best1}^{(s)}(x)\}} P_s^{(c)}(x)$.

Fusion can then be carried out by preferring, for each pixel, the most confident source, i.e. the one with the highest margin. This fusion rule (referred to as margin-Max) selects, for each pixel, the source for which the margin between the two highest probabilities is the highest, with sources $S = \{A, B\}$ and classes $L = \{c_i\}_{1 \leq i \leq n}$:

$\forall x,\ \forall c \in L, \quad P_{fusion}^{(c)}(x) = P_{S_{best}}^{(c)}(x)$,   (11.13)

where $S_{best} = \arg\max_{s \in S} \mathrm{margin}^{(s)}(x)$.

The classifier confidence information provided by the margin can also be used to weight the class probabilities of each source in the Bayesian sum and product (respectively, margin Bayesian sum weighted, margin Bayesian product weighted):

$P_{fusion\_sum}^{(c)}(x) = \dfrac{P_A^{(c)}(x)\,\mathrm{margin}^{(A)}(x) + P_B^{(c)}(x)\,\mathrm{margin}^{(B)}(x)}{\mathrm{margin}^{(A)}(x) + \mathrm{margin}^{(B)}(x)}$,   (11.14)

$P_{fusion\_product}^{(c)}(x) = \left(P_A^{(c)}(x)\right)^{\frac{\mathrm{margin}^{(A)}(x)}{\mathrm{margin}^{(A)}(x) + \mathrm{margin}^{(B)}(x)}} \times \left(P_B^{(c)}(x)\right)^{\frac{\mathrm{margin}^{(B)}(x)}{\mathrm{margin}^{(A)}(x) + \mathrm{margin}^{(B)}(x)}}$.   (11.15)
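The margin-based rules lend themselves to the same kind of sketch; the NumPy illustration below (same (H, W, n_classes) convention and illustrative names) computes the per-source margin of Eq. (11.12) and applies the margin-Max selection of Eq. (11.13) and the margin-weighted Bayesian sum of Eq. (11.14).

    import numpy as np

    def margin(p):
        """Per-pixel margin of a membership map p of shape (H, W, n_classes):
        difference between the two highest class probabilities (Eq. (11.12))."""
        top2 = np.sort(p, axis=-1)[..., -2:]
        return top2[..., 1] - top2[..., 0]

    def margin_max_fusion(p_a, p_b):
        """Keep, for each pixel, the probabilities of the most confident source,
        i.e. the one with the largest margin (Eq. (11.13))."""
        use_a = margin(p_a) >= margin(p_b)
        return np.where(use_a[..., None], p_a, p_b)

    def margin_weighted_sum(p_a, p_b, eps=1e-12):
        """Margin-weighted Bayesian sum (Eq. (11.14))."""
        m_a, m_b = margin(p_a)[..., None], margin(p_b)[..., None]
        return (p_a * m_a + p_b * m_b) / np.maximum(m_a + m_b, eps)

    # Toy example: a confident source A and an ambiguous source B
    p_a = np.array([[[0.8, 0.1, 0.1]]])
    p_b = np.array([[[0.4, 0.35, 0.25]]])
    print(margin_max_fusion(p_a, p_b))
    print(margin_weighted_sum(p_a, p_b))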

11.2.1.4 Dempster–Shafer evidence theory

According to the Dempster–Shafer (DS) formalism, information from a source $s$ for a class $c$ can be given as a mass function $m^{(c)}$, with $m^{(c)} \in [0,1]$ [33]. Dempster–Shafer's evidence theory considers simple classes $c \in L$ as well as composed classes (unions of simple classes), which were here limited to unions of at most two simple classes [69].

Masses associated to each simple class are directly the class membership probabilities:

$m_s^{(c)}(x) = P_s^{(c)}(x)$.   (11.16)

For mixed classes $c_1, c_2 \in L$, a pixel $x$ and a source $s \in S$, two versions were tested, denoted $DS_{V1}$ and $DS_{V2}$, respectively:

$m_s^{(c_1 \cup c_2)}(x) = \left(P_s^{(c_1)}(x) + P_s^{(c_2)}(x)\right) \times \left(1 - \max\left(P_s^{(c_1)}(x), P_s^{(c_2)}(x)\right)\right) + \min\left(P_s^{(c_1)}(x), P_s^{(c_2)}(x)\right)$,   (11.17)

$m_s^{(c_1 \cup c_2)}(x) = \frac{1}{2}\left(P_s^{(c_1)}(x) + P_s^{(c_2)}(x)\right) \times \left(1 - \max\left(P_s^{(c_1)}(x), P_s^{(c_2)}(x)\right)\right) + \min\left(P_s^{(c_1)}(x), P_s^{(c_2)}(x)\right)$.   (11.18)

This leads to a mass $m_s^{(c_1 \cup c_2)}(x) \in [0,1]$, being 1 if $P_s^{(c_1)} = 0$ and $P_s^{(c_2)} = 1$, or $P_s^{(c_2)} = 0$ and $P_s^{(c_1)} = 1$. In both versions, all masses are normalized such that $\forall s,\ \sum_{c} m_s^{(c)}(x) = 1$.

The fusion rule is based on the following conflict measure between two sources A and B:

$K(x) = \sum_{c, d \in L,\ c \cap d = \emptyset} m_A^{(c)}(x)\, m_B^{(d)}(x)$,   (11.19)

$c$ and $d$ being (possibly mixed) classes with $c \cap d = \emptyset$.

The fusion is performed by

$m_{fusion}^{(c)}(x) = \dfrac{1}{1 - K(x)} \sum_{c_1, c_2 \in L,\ c_1 \cap c_2 = c} m_A^{(c_1)}(x)\, m_B^{(c_2)}(x)$.   (11.20)
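The sketch below illustrates this combination for two sources at a single pixel. It is a simplified, stand-alone Python rendering of the $DS_{V1}$ variant (masses on singletons per Eq. (11.16), masses on two-class unions per Eq. (11.17), per-source normalization, then Eqs. (11.19)-(11.20)); it shows the mechanics only and does not claim to reproduce the authors' exact implementation.

    from itertools import combinations

    def ds_masses_v1(probs):
        """Normalized masses from class probabilities (dict class -> probability).
        Singletons follow Eq. (11.16), two-class unions follow Eq. (11.17)."""
        masses = {frozenset([c]): p for c, p in probs.items()}
        for c1, c2 in combinations(probs, 2):
            p1, p2 = probs[c1], probs[c2]
            masses[frozenset([c1, c2])] = (p1 + p2) * (1 - max(p1, p2)) + min(p1, p2)
        total = sum(masses.values())
        return {k: v / total for k, v in masses.items()}

    def ds_fusion(m_a, m_b):
        """Dempster-Shafer combination of two mass functions (Eqs. (11.19)-(11.20))."""
        conflict, combined = 0.0, {}
        for set_a, va in m_a.items():
            for set_b, vb in m_b.items():
                inter = set_a & set_b
                if not inter:
                    conflict += va * vb                       # Eq. (11.19)
                else:
                    combined[inter] = combined.get(inter, 0.0) + va * vb
        return {k: v / (1.0 - conflict) for k, v in combined.items()}, conflict

    # Toy example with three classes at one pixel
    p_a = {"road": 0.6, "building": 0.3, "vegetation": 0.1}
    p_b = {"road": 0.2, "building": 0.7, "vegetation": 0.1}
    fused, k = ds_fusion(ds_masses_v1(p_a), ds_masses_v1(p_b))
    print(k)                                  # conflict between the two sources
    print(max(fused, key=fused.get))          # most supported (possibly composed) class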

11.2.1.5 Supervised fusion rules: learning based approaches

In addition to the previous standard fusion rules, learning-based supervised methods were also tested [51,52,23]. Such methods consist in learning, from a ground truth, how to best merge both sources: a classifier is trained to label feature vectors corresponding to the concatenation of the class membership measures from both sources. Such a strategy can thus be considered to be at the interplay between late and intermediate fusion (the classifiers applied independently to each source can then be seen as feature generators), and is similar to auto-context approaches. A drawback stems from the fact that they require a significant amount of reference data.

In the following experiments, two classifiers were considered for supervised fusion: random forests (RFs) [71] and support vector machines (SVMs) [72] with a linear or a radial basis function (rbf) kernel.
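A minimal scikit-learn sketch of this supervised fusion strategy is given below: the per-source class probabilities are concatenated into one feature vector per pixel and a classifier (here a random forest; an SVM could be substituted) is trained on the pixels for which reference labels exist. Array shapes and names are assumptions made for the illustration.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def stack_probabilities(p_a, p_b):
        """Concatenate two membership maps of shape (H, W, n_classes) into
        per-pixel feature vectors of length 2 * n_classes."""
        h, w, _ = p_a.shape
        return np.concatenate([p_a, p_b], axis=-1).reshape(h * w, -1)

    def train_supervised_fusion(p_a, p_b, labels, mask):
        """Train the fusion classifier on labeled pixels.
        labels: (H, W) integer map; mask: (H, W) boolean map of labeled pixels."""
        features = stack_probabilities(p_a, p_b)
        model = RandomForestClassifier(n_estimators=100, n_jobs=-1)
        model.fit(features[mask.ravel()], labels[mask])
        return model

    def apply_supervised_fusion(model, p_a, p_b):
        """Predict fused class probabilities for every pixel."""
        h, w, _ = p_a.shape
        return model.predict_proba(stack_probabilities(p_a, p_b)).reshape(h, w, -1)

    # Toy example: 8 x 8 image, 3 classes, sparse reference labels
    rng = np.random.default_rng(0)
    pa = rng.dirichlet(np.ones(3), (8, 8))
    pb = rng.dirichlet(np.ones(3), (8, 8))
    gt, known = rng.integers(0, 3, (8, 8)), rng.random((8, 8)) < 0.3
    model = train_supervised_fusion(pa, pb, gt, known)
    print(apply_supervised_fusion(model, pa, pb).shape)       # (8, 8, n_classes)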

11.2.2 Global Regularization

After the fusion rules have been applied at the pixel level, a spatial regularization of the obtained classification map is performed. This regularization aims at dealing with spatial uncertainties between both sources: the fusion result still contains noisy patches, especially in transition areas between neighboring classes. Besides, by considering the original image information, such a regularization also enables one to preserve real-world contours more accurately.

A global regularization strategy is adopted [66]. The problem is expressed using an energetic graphical model and solved as a min-cut problem. Indeed, such a formulation has been used successfully for many purposes related to image processing in the last years [73].

11.2.2.1 Model formulation(s)

The problem is formulated in terms of an energy E that has to be minimized over the whole image I in order to retrieve a labeling C of the entire image which corresponds to a minimum of E. As commonly adopted in the literature, E consists of two terms, one related to the data fidelity, and one to prior spatial knowledge, setting constraints on class transitions between the different pairs of neighboring pixels N. Several options were considered for the different energy terms. We have

$E(P_{fusion}, C_{fusion}, C) = \sum_{x \in I} E_{data}(C(x)) + \lambda \sum_{\{x, y\} \in N,\ x \neq y} E_{reg}(C(x), C(y))$,   (11.21)

where

$E_{data}(C(x)) = f\left(P_{fusion}(C(x))\right)$,
$E_{reg}(C(x) = C(y)) = g\left(P_{fusion}(C(x)), P_{fusion}(C(y)), C_{fusion}(x), C_{fusion}(y), I(x), I(y)\right)$,
$E_{reg}(C(x) \neq C(y)) = h\left(P_{fusion}(C(x)), P_{fusion}(C(y)), C_{fusion}(x), C_{fusion}(y), I(x), I(y)\right)$.

The final label map corresponds to the configuration C which minimizes E over I.

The data term is a fit-to-data attachment term. It relies on the probability distribution $P_{fusion}$ through a function $f(\cdot)$. The function $f$ ensures that if the probability for a pixel $x$ to belong to class $C(x)$ is close to 1, $E_{data}$ will be small and will not impact the total energy $E$. Conversely, if the probability for a pixel $x$ to belong to class $C(x)$ is low, $E_{data}$ will be near its maximum and will penalize such a configuration. The following options were tested:

Option 1: $f(t) = -\log(t)$,   (11.22)

Option 2: $f(t) = 1 - t$.   (11.23)

Earlier experiments [20] verified that Option 2 tends to smooth the classification map more than Option 1. Thus, the data term will be selected among these options depending on the targeted application and on the input data: Option 1 will be selected to keep small regions as long as they are relevant according to class probabilities, while Option 2 will be used to obtain smoother maps with wider flat areas [20].

The regularization term $E_{reg}$ defines the interactions between a pixel $x$ and its eight neighbors, setting a constraint to smooth the initial classification map. Several options were also considered. They were all based on an enhanced Potts model [64] in order to guarantee smoother label changes.

  • •  A simple Potts model is defined by

$E_{reg}(C(x) = C(y)) = 0, \quad E_{reg}(C(x) \neq C(y)) = 1$.   (11.24)

  • •  This model can be modified to integrate an image contrast constraint, accounting for the fact that label changes should be less strongly penalized in high-contrast areas of the image:

$E_{reg}(C(x) = C(y)) = 0, \quad E_{reg}(C(x) \neq C(y)) = (1 - \gamma) + \gamma\, V(x, y, \varepsilon)$.   (11.25)

  • $\gamma = 0$ yields a pure Potts model, while $\gamma = 1$ puts all the weight on the contrast term. The contrast component accounts for the fact that label changes should preferably occur in high-frequency areas. $V(x, y, \varepsilon)$ is the term that integrates the contrast of the image (as defined below).
  • •  Another model was proposed:

$E_{reg}(C(x) = C(y)) = 0, \quad E_{reg}(C(x) \neq C(y)) = (1 - \gamma)\left(1 - P_{fusion}(C_{fusion}(x))^{\beta}\right) + \gamma\, V(x, y, \varepsilon)$.   (11.26)

  • This rewriting of the regularization term handles the smoothing procedure more efficiently. Indeed, when $C(x) \neq C(y)$, $E_{reg}(C(x) \neq C(y))$ becomes a function of $P_{fusion}$ and $V$. If $P_{fusion}(C_{fusion}(x))$ is close to 1, the decision fusion gives a high confidence for $x$ to belong to the class $C_{fusion}(x)$; $E_{reg}$ then depends only on $V$, which decides whether the configuration $C_{fusion}$ is favored or not. Conversely, if $P_{fusion}(C_{fusion}(x))$ is close to 0, $E_{reg}$ is high and the configuration $C_{fusion}$ is prone to be rejected. A Potts model is obtained when $\gamma = 0$ and $\beta \to +\infty$.

The contrast term $V(x, y, \varepsilon)$ is the same for all regularization term formulations. It is based on [66,74] and defined by

$V(x, y, \varepsilon) = \left(\dfrac{1}{\dim} \sum_{i=1}^{\dim} V_i(x, y)\right)^{\varepsilon}$,   (11.27)

with $V_i(x, y) = \exp\left(-\dfrac{\left(I_i(x) - I_i(y)\right)^2}{2\,\mathrm{MeanGrad}(I)}\right)$ and $\mathrm{MeanGrad}(I) = \dfrac{1}{\mathrm{Card}(N)} \sum_{\{x, y\} \in N,\ x \neq y} \left(I_i(x) - I_i(y)\right)^2$. $\dim$ is the dimension (number of bands) of the image $I$, $I_i(x)$ is the intensity of pixel $x$ in band $i$ of the multispectral image $I$, and $\varepsilon \in [0, +\infty[$.
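For illustration, the sketch below evaluates the contrast term of Eq. (11.27) for the horizontal pairs of the neighborhood (the other directions follow the same pattern); it is a simplified NumPy rendering in which MeanGrad is taken, per band, as the mean squared difference over the considered pairs, which is an assumption on the exact normalization used by the authors.

    import numpy as np

    def contrast_term_horizontal(image, epsilon=1.0):
        """Contrast term V(x, y, eps) of Eq. (11.27) for horizontally adjacent pixels.

        image: array of shape (H, W, n_bands). Returns an array of shape (H, W-1)
        giving V for each pair (x, y) of horizontal neighbors; the same pattern
        can be repeated for the other directions of the 8-neighborhood."""
        img = np.asarray(image, dtype=float)
        diff2 = (img[:, 1:, :] - img[:, :-1, :]) ** 2           # (H, W-1, n_bands)
        mean_grad = np.maximum(diff2.mean(axis=(0, 1)), 1e-12)  # per-band MeanGrad(I)
        v_i = np.exp(-diff2 / (2.0 * mean_grad))                # V_i(x, y)
        return v_i.mean(axis=-1) ** epsilon                     # band average, power eps

    # Toy example on a random 3-band image
    rng = np.random.default_rng(0)
    print(contrast_term_horizontal(rng.random((10, 12, 3)), epsilon=1.0).shape)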

11.2.2.2 Optimization

Once the energy E has been defined, it has to be minimized in order to get a labeling configuration C (i.e., a classification map) of the entire image, which corresponds to a minimum of the energy E. This model can be expressed as a graphical model and solved as a min-cut problem [73,75].

The graph-cut algorithm employed here is the quadratic pseudo-Boolean optimization (QPBO) [75,76]. QPBO is a classical graph-cut method that builds a graph in which each pixel is a node; the minimization is performed by finding the minimal cut. Contrary to several standard graph-cut methods, for which the pairwise term $E_{reg}$ can only distinguish the two configurations $(C(x) = C(y))$ and $(C(x) \neq C(y))$, QPBO enables one to integrate more constraints by defining the pairwise term $E_{reg}$ differently for the four configurations $(C(x)=0, C(y)=0)$, $(C(x)=0, C(y)=1)$, $(C(x)=1, C(y)=0)$ and $(C(x)=1, C(y)=1)$.

QPBO performs binary classification. Extension to the multiclass problem is performed using an α-expansion routine [73]. Each label α is visited in turn and a binary labeling is solved between that label and all others, thus flipping the labels of some pixels to α. These expansion steps are iterated until convergence and at the end the algorithm returns a labeling C of the entire image which corresponds to a minimum of the energy E.
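The control flow of the α-expansion routine can be sketched as follows. For brevity, the binary sub-problem is solved here with a greedy per-pixel move instead of the exact QPBO/graph-cut solution, and only a simple Potts pairwise term is used; the outer loop, which visits each label in turn until no pixel changes, is the part that reflects the procedure described above.

    import numpy as np

    def local_energy(label, x, y, labels, unary, lam):
        """Unary cost plus Potts pairwise cost of assigning `label` to pixel (x, y),
        given the current labels of its 4 nearest neighbors (the contrast-sensitive
        term and the 8-neighborhood are omitted to keep the sketch short)."""
        e = unary[x, y, label]
        h, w = labels.shape
        for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < h and 0 <= ny < w:
                e += lam * (label != labels[nx, ny])
        return e

    def alpha_expansion(unary, lam=1.0, max_iters=5):
        """Approximate alpha-expansion: each label alpha is visited in turn and any
        pixel may switch to alpha if that lowers its local energy (a greedy stand-in
        for the exact binary expansion move). Iterated until convergence.

        unary: (H, W, n_labels) data costs, e.g. -log of the fused probabilities."""
        labels = unary.argmin(axis=-1)            # initialize with the best data term
        h, w, n_labels = unary.shape
        for _ in range(max_iters):
            changed = False
            for alpha in range(n_labels):         # visit each label in turn
                for x in range(h):
                    for y in range(w):
                        if labels[x, y] == alpha:
                            continue
                        if (local_energy(alpha, x, y, labels, unary, lam)
                                < local_energy(labels[x, y], x, y, labels, unary, lam)):
                            labels[x, y] = alpha
                            changed = True
            if not changed:                       # no expansion move accepted: stop
                break
        return labels

    # Toy example: 20 x 20 image, 3 labels, random unary costs
    rng = np.random.default_rng(0)
    print(alpha_expansion(rng.random((20, 20, 3)), lam=0.5).shape)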

11.2.2.3 Parameter tuning

The energy $E$ (Eq. (11.21)) is controlled by up to four parameters, depending on the retained formulation for $E_{reg}$: $\lambda$, $\gamma$, $\beta$ and $\varepsilon$. Each of them is attached to a particular sub-term of $E$.

  • •  $\lambda \in [0, +\infty[$ is a trade-off parameter between the terms $E_{data}$ and $E_{reg}$. The larger $\lambda$, the stronger the regularization effect. The choice of this parameter depends on the distribution of the decision fusion map to be optimized.
  • •  $\gamma \in [0, 1]$ is a trade-off parameter between the basic energy model and the rectified model integrating the contrast measure.
  • •  $\varepsilon \in [0, +\infty[$ is a parameter controlling the influence of the contrast measure in the energy term.
  • •  $\beta \in [0, +\infty[$ is a trade-off parameter between the smoothing criterion and the importance of $C_{fusion}$ in the model. If $\beta$ is high, the smoothing criterion is predominant and the model approximates a Potts model. Conversely, if $\beta$ is low, the model tends to follow the classification given by $C_{fusion}$.

A simple Potts model can be obtained using the following parameterizations:

  • •  $\gamma = 0$ and $\beta \to +\infty$,
  • •  $\gamma = 1$, $\beta \to +\infty$ and $\varepsilon = 0$.

A greedy way of optimizing the parameters was presented in [66]. The value $\lambda_{opt}$ maximizing the classification result for a Potts model is assumed to be the same value as the one maximizing the classification result of the fusion model. Hence, $\lambda_{opt}$ is first computed using a simple Potts model. Then, with $\lambda_{opt}$ and $\gamma = 0$, $\beta_{opt}$ is found. Similarly, using $\lambda_{opt}$ and $\gamma = 1$, the value $\varepsilon_{opt}$ is computed. Lastly, the trade-off parameter $\gamma_{opt}$ maximizing the results of the model is chosen in the $[0, 1]$ interval. The process is iterated, optimizing the parameters in the same order at each iteration, starting from the current parameter set.
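This greedy search can be written as a small coordinate-descent loop, as sketched below. Here `evaluate` stands for any quality measure of the regularized map (e.g., an accuracy computed by cross-validation, or a score assigned after visual inspection) and is left as a user-provided function, which is an assumption of this sketch; candidate values are explored over discrete grids.

    def greedy_parameter_search(evaluate, lambdas, betas, epsilons, gammas, n_rounds=2):
        """Greedy (coordinate-descent) tuning of (lambda, beta, epsilon, gamma),
        following the order described in [66]: lambda on a Potts model first,
        then beta (gamma = 0), then epsilon (gamma = 1), then gamma, iterated.

        evaluate(lam, beta, eps, gamma) -> float, higher is better."""
        # Step 1: lambda tuned on a (quasi) Potts model: gamma = 0 and a large beta
        lam = max(lambdas, key=lambda l: evaluate(l, max(betas), 0.0, 0.0))
        beta, eps, gamma = max(betas), min(epsilons), 0.0
        for _ in range(n_rounds):
            beta = max(betas, key=lambda b: evaluate(lam, b, eps, 0.0))     # gamma = 0
            eps = max(epsilons, key=lambda e: evaluate(lam, beta, e, 1.0))  # gamma = 1
            gamma = max(gammas, key=lambda g: evaluate(lam, beta, eps, g))
            lam = max(lambdas, key=lambda l: evaluate(l, beta, eps, gamma))
        return lam, beta, eps, gamma

    # Dummy usage with a fake score, for illustration only
    def score(lam, beta, eps, gamma):
        return -(lam - 1.0) ** 2 - (gamma - 0.5) ** 2
    print(greedy_parameter_search(score, [0.1, 1, 10], [0, 1, 5], [0, 1, 2], [0, 0.25, 0.5, 1]))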

Such a strategy can be performed by quantitative cross-validation when sufficient reference data is available. Otherwise, it can be empirically performed by qualitative (visual) evaluation of the results. The set of parameters yielding the nicest and smoothest possible result while following the real object contours can thus be identified. This solution is relevant when regularization also targets improving the visual quality and the interpretability of classification results in operational contexts.

In practice, a set of parameters defined for a classification problem and a decision fusion rule is stable enough to be used in other, similar situations.

11.3 Use Case #1: Hyperspectral and Very High Resolution Multispectral Imagery for Urban Material Discrimination

11.3.1 Introduction

This first use case concerns the joint use of hyperspectral and very high resolution (VHR) multispectral imagery for fine urban land cover classification. Indeed, several applications require fine-grained knowledge about urban land cover and especially urban material maps [77,78]. As no geodatabases contain such information, remote-sensing techniques are urgently required.

Mapping urban environments requires VHR optical images. Indeed, such a spatial resolution is necessary to individualize and precisely delineate urban objects and to capture sharp geometrical details (e.g., [79,80]). However, VHR sensors generally have a poor spectral configuration (usually four bands, blue–green–red–near infrared), limiting their ability to discriminate fine classes [81–84], compared to superspectral or hyperspectral (HS) sensors. Unfortunately, the latter generally exhibit a lower spatial resolution. To overcome the weaknesses of both sensors, HS and VHR multispectral (MS) images can be jointly used to benefit from their complementary characteristics and thereby efficiently separate the classes of interest. The fusion of such sensors should thus enhance the classification performance at the highest spatial resolution.

It may be recalled here that early fusion (at the observation level), i.e., image sharpening [14], could be applied in this context. However, late fusion is more generic and remains valid even for images that are not acquired simultaneously and are processed by specific land cover labeling approaches.

11.3.2 Fusion Process

As mentioned earlier, the method is based on three main steps:

  1.  Classification of the HS and MS images and generation of the posterior class probabilities: the two images were classified independently. An SVM classifier with a radial basis function (rbf) kernel [72] was used, and the posterior class probabilities were retrieved with the Platt technique [85] (see the sketch after this list). The SVM classifier was used as a baseline, since it was shown to provide good results for this kind of data. However, other supervised classifiers could be used (e.g., random forest), as well as specific methods dedicated to HS imagery such as spectral unmixing [86] (endmember abundances would then substitute for the standard class probabilities).
  2.  Fusion at the decision level: a decision fusion was applied to these posterior class probability maps to combine them at the highest resolution. Different fusion rules listed in Sect. 11.2.1 were tested: fuzzy decision rules (Min, Max, compromise, prioritized, accuracy dependent), Bayesian combination (sum and product based rules), evidence theory (Dempster–Shafer rule), and margin theory (margin-Max rule).
  3.  Final optimization: this last step consists in performing a global regularization of the classification map obtained at the previous step, so as to deal with spatial uncertainties between both sources. The graphical model introduced in Sect. 11.2.2 was used: Option 1 ($f(t) = -\log(t)$) was retained for the data term, while the contrast-sensitive regularization term was formulated following Eq. (11.26).
     The parameters differ from the Potts configuration, which over-smooths the decision fusion classification: $\gamma = 0.5$, $\beta = 1$, and $\varepsilon = 1$. For $\lambda$, two configurations were tuned, depending on the decision rule: $\lambda = 0.1$ for the Min and Dempster–Shafer rules, and $\lambda = 10$ for the compromise rule.
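As mentioned in step 1 above, the per-source posterior probabilities can be obtained with a standard scikit-learn pipeline: with probability=True, SVC calibrates its outputs with the Platt technique, which corresponds to the probability-retrieval step described here. Feature and label array names are illustrative assumptions.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def classify_image(pixels, train_features, train_labels):
        """Train an rbf SVM with Platt-calibrated probabilities and apply it per pixel.

        pixels: (n_pixels, n_bands) array of image spectra (HS or MS);
        train_features / train_labels: labeled training samples.
        Returns an (n_pixels, n_classes) array of posterior class probabilities."""
        model = make_pipeline(StandardScaler(),
                              SVC(kernel="rbf", probability=True))   # Platt scaling
        model.fit(train_features, train_labels)
        return model.predict_proba(pixels)

    # Toy example: 100 "pixels" with 4 bands, 3 classes, 30 training samples
    rng = np.random.default_rng(0)
    proba = classify_image(rng.random((100, 4)), rng.random((30, 4)),
                           rng.integers(0, 3, 30))
    print(proba.shape)                                                # (100, 3)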

11.3.3 Datasets

Experiments were performed over three datasets captured over the cities of Pavia (Italy) and Toulouse (France); see Fig. 11.2. For all datasets, an SVM classifier was trained using 50 samples per class extracted from the images.

Image
Figure 11.2 Datasets and corresponding ground truth with labels. From left to right, Toulouse Center, Pavia University, and Pavia Center.

Concerning the city of Pavia (Italy), two datasets called “Pavia University” and “Pavia Center” were used. They are free datasets widely used by the hyperspectral community and available online. Initially captured by the ROSIS hyperspectral sensor, these datasets have, respectively, 103 and 102 spectral bands from 430 to 860 nm. Pavia University is a 335 × 605 pixel image, Pavia Center is a 715 × 1096 pixel image, and both have a GSD of 1.3 m. Both scenes are composed of nine land cover classes (Fig. 11.2): Asphalt, Meadows, Gravel, Trees, Painted Metal Sheets, Bare Soil, Bitumen, Self-Blocking Bricks, and Shadows for Pavia University; Water, Trees, Meadows, Self-Blocking Bricks, Bare Soil, Asphalt, Bitumen roofing, Tiles roofing, and Shadows for Pavia Center. MS images were generated for a Pleiades satellite spectral configuration (limited to three bands, red–green–blue), with a GSD of 1.3 m, while HS images were resampled at a lower spatial resolution of 7.8 m over the full original spectral range (i.e., 103 and 102 bands), so that their pan-sharpening ratio would be the same as for the Toulouse dataset.

The third dataset is called “Toulouse Center” (France). It was captured over the city of Toulouse in 2012 by Hyspex sensors [87]. It has 405 spectral bands ranging from 400 to 2500 nm and an initial GSD of 1.6 m. Its associated land cover is composed of 15 classes (Fig. 11.2): Slate roofing, Asphalt, Cement, Water, Pavements, Bare soil, Gravel roofing, Metal roofing 1, Metal roofing 2, Tiles roofing, Grass, Trees, Railway tracks, Rubber roofing, and Shadows. MS and HS images were created for the fusion purpose: an MS image using the Pleiades satellite spectral configuration (four bands, red (R)–green (G)–blue (B)–near infrared (NIR)) with a GSD of 1.6 m, and an HS image which is a resampled version of the original image at a spatial resolution of 8 m [88].

11.3.4 Results and Discussion

11.3.4.1 Source comparison

The MS image is characterized by a high spatial resolution and few bands, while the HS one has a low spatial resolution and a hundred or more bands. As expected, the SVM classifier applied to these images led to:

  • •  sharp object delineation in the MS image due to its good spatial resolution, but also a lot of artifacts (see Figs. 11.3, 11.4, and 11.5);
Image
Figure 11.3 Pavia University classification results using the best decision fusion rule. From left to right, SVM classification of HS image, SVM classification of MS image, classification fusion by Min rule, global classification regularization.
Image
Figure 11.4 Pavia Center classification results using the best decision fusion rule. From left to right, SVM classification of HS image, SVM classification of MS image, classification fusion by Min rule, global classification regularization.
Image
Figure 11.5 Toulouse Center classification results using the best decision fusion rule. From left to right, SVM classification of HS image, SVM classification of MS image, classification fusion by Min rule, global classification regularization.
  • •  a good discrimination of the different classes in the HS image. However, blurry object delineation is also noticed, due to its low spatial resolution (see Figs. 11.3, 11.4, and 11.5).

The corresponding classification accuracies are listed in Table 11.2: better results are retrieved using the HS image.

11.3.4.2 Decision fusion classification

Ten different decision fusion rules were first tested and compared over the three datasets. The quantitative results provided in Table 11.1 lead us to consider the compromise, Bayesian product, margin-Max and Dempster–Shafer rules as the most efficient ones. The comparison must also take into consideration a visual inspection of the results, as ground truth data remains very limited on these datasets. For Pavia University, four of the best accuracies were reached for the Min, compromise, Bayesian product, and Dempster–Shafer rules. In practice, the Min and compromise rules give the most satisfactory rendering, especially regarding the Self-Blocking Bricks class, which is a conflicting class (see Fig. 11.3, magenta color class). The two other rules seem to overestimate this class and to rely more on the HS classification map in the fusion process, which explains their higher accuracy (Table 11.1). The Min rule acts in a cautious way by taking the best of the lowest memberships, while the compromise rule acts depending on the degree of conflict between sources. The Bayesian product rule is a good and simple trade-off if the initial classification maps are not highly conflicting; otherwise, the result will be degraded by wrong information.

Table 11.1

Classification accuracies (in %) after the fusion procedure; 10 fusion rules at the decision level are compared. (OA = Overall Accuracy; F-score = mean F-score.)

                    | Pavia University      | Pavia Center          | Toulouse Center
Rule                | OA    Kappa  F-score  | OA    Kappa  F-score  | OA    Kappa  F-score
Max                 | 92.8  90.7   90.6     | 98.5  97.8   96.0     | 75.6  62.4   69.8
Min                 | 96.1  94.9   95.1     | 98.6  98.0   96.3     | 72.2  58.7   65.8
Compromise          | 96.1  95.0   95.0     | 98.8  98.3   96.7     | 73.6  60.2   68.0
Prior 1             | 94.7  93.1   93.4     | 98.2  97.5   95.3     | 71.3  57.7   65.5
Prior 2             | 92.8  90.7   90.6     | 98.5  97.8   96.0     | 75.6  62.4   69.8
AD                  | 95.0  93.5   93.5     | 99.0  98.7   97.7     | 75.8  58.1   28.3
Sum Bayes           | 95.0  93.5   93.2     | 98.7  98.1   96.5     | 75.7  62.7   70.5
Prod Bayes          | 96.6  95.5   95.6     | 99.0  98.6   97.2     | 74.5  61.4   69.8
Margin-Max          | 94.0  92.2   92.0     | 98.8  98.3   96.6     | 75.6  62.5   69.6
Dempster–Shafer V1  | 96.4  95.4   95.3     | 98.9  98.5   97.1     | 74.6  61.5   69.8


Concerning Pavia Center, all the rules seem accurate (Fig. 11.4, e.g., with the Dempster–Shafer rule), with an overall accuracy higher than 98% (Table 11.1). When visually inspecting the results, all rules gave similarly good results except Prior 1, whose result is guided by the HS classification map rather than the MS one.

The Toulouse dataset is the largest one, with up to 15 classes, which explains the lower accuracies reached for this dataset. The best results were given by the Max, Prior 2, Bayesian sum and Dempster–Shafer rules. In practice, the Max, prior and sum rules seem to overestimate certain classes, especially tile roofing and vegetation. The best qualitative results are given by the Min, compromise and Dempster–Shafer rules. Despite a satisfactory accuracy, the AD rule exhibits many misclassifications regarding tile roofing (underestimation) and metal roofing 1 (overestimation), as well as an erroneous detection of gravel roofing. This is mainly due to the global accuracy measure included in the rule, which is calculated from limited ground truth data.

However, due to the very limited amount of reference data, the quantitative accuracies do not necessarily reflect the real potential of the fusion rules. The best ones from a quantitative and practical (qualitative) point of view are the compromise, the Bayesian product and the Dempster–Shafer rules.

In this study, VHR-MS images as well as HS ones at lower resolution were generated from VHR HS original images. Thus, working on such synthetic datasets leads to quite optimistic results, but this is sufficient to assess the different fusion rules. Besides, the fusion method is flexible enough for instance to integrate a specific process to deal with shadows in a diachronic acquisition context.

11.3.4.3 Regularization

Global regularization was applied to enhance the classification results and eliminate the artifacts. Table 11.2 presents the optimization results for the best fusion rule per dataset. The optimization procedure indeed permits one to enhance the classification further. Quantitatively, it only slightly improves the decision fusion classification (by 1–2%), but it offers a better visual rendering, with an elimination of the artifacts, a better delineation of the class borders, and a regularization of the scattered pixels (Figs. 11.3, 11.4 and 11.5). These optimized maps seem to better model the real scene. The optimization effect is more visible over Pavia University and Toulouse Center. Concerning Pavia Center, the decision fusion already gives good results, so the optimized maps are only slightly improved (Table 11.2). Results obtained over the Pavia datasets are comparable to other studies (e.g., [17]). For Pavia University, the painted metal sheets are better recovered and no mismatches with the surrounding road are noticeable. The proposed method permits one to extract some bitumen buildings that were difficult to differentiate from roads (i.e., upper right and lower right, Fig. 11.5), even if the gravel buildings could still be better refined. For Pavia Center, the global rendering is enhanced with a minimization of the classification artifacts.

Table 11.2

Classification accuracies of the HS and MS images classified separately, after decision fusion, and after global regularization. For each dataset, results are provided for the fusion rule achieving the best final results after global regularization.

             | HS image classification | MS image classification | Decision fusion | After regularization
Pavia University (Min rule)
OA (%)       | 94.7                    | 68.8                    | 96.1            | 97.0
Kappa (%)    | 93.1                    | 61.6                    | 94.9            | 96.1
F-score (%)  | 93.4                    | 72.8                    | 95.1            | 96.3
Pavia Center (Dempster–Shafer V1 rule)
OA (%)       | 98.2                    | 92.0                    | 98.9            | 99.3
Kappa (%)    | 97.5                    | 89.0                    | 98.5            | 99.0
F-score (%)  | 95.3                    | 83.5                    | 97.1            | 98.0
Toulouse Center (Compromise rule)
OA (%)       | 71.2                    | 69.2                    | 73.5            | 74.6
Kappa (%)    | 57.6                    | 53.8                    | 60.2            | 61.5
F-score (%)  | 65.4                    | 55.9                    | 68.0            | 70.9


11.3.5 Conclusion

Several decision fusion methods were tested and compared. Among the fuzzy rules, the Min and compromise rules are the most efficient. The Max rule often leads to misclassifications since it gives more confidence to the highest membership. The prioritized rules favor one source over the other; reliability is thus not ensured, as noticed for Prior 1, which gives confidence to the less reliable source. The accuracy of the AD rule is too dependent on the reliability of the ground truth: it gives encouraging results for the Pavia datasets, but its accuracy was not sufficient for the Toulouse dataset. The Bayesian sum and product rules can be interesting in the case of low conflict between sources, since they give acceptable results over Pavia Center and Toulouse. The proposed margin-based rule performs well over Pavia Center and correctly over Toulouse, but is not sufficient over Pavia University. Finally, the Dempster–Shafer rule has homogeneous performance over the three datasets, always leading to interesting results.

Even if decision fusion increases the classification accuracy compared to the initial maps, the results remain affected by classification artifacts and unclear borders. The final maps are either guided by one of the initial maps or by both: the final result is, therefore, a better version of the initial maps. The optimization procedure gives encouraging results, with clear borders between the different classes and the elimination of artifacts.

The method can also integrate other decision rules in a fully tunable way. The optimization model is simple and flexible and could be modified according to the dataset used and the spatial resolution of the data sources. Further work will investigate the explicit use of conflict measures from the fusion step within the regularization framework. At the moment, the selection of the optimization parameters is rather manual; some automation could be introduced, and other contrast measures could be tested to improve the accuracy.

11.4 Use Case #2: Urban Footprint Detection

11.4.1 Introduction

This second use case focuses on the detection of urban areas from SPOT 6 and Sentinel-2 satellite imagery. Mapping urban areas is important to monitor urban sprawl and soil imperviousness, and to predict their further evolution [89,90]. Remote sensing is highly relevant for such a regular and continuous monitoring over time. Supervised classification approaches using satellite imagery have been extensively studied in order to automate the production of land cover (LC) maps [91–93,38], but they often rely on a single sensor.

Urban and peri-urban areas are complex and heterogeneous landscapes containing impervious areas, trees, grass, bare ground, and water [94,95]. “Artificialized areas” can be defined as irreversibly impervious areas, including buildings and roads, but also small enclosed pervious structures such as gardens, backyards, and green public spaces [96]. There is no unique clear definition of the urban area or footprint. It generally corresponds to a simplification of the artificialized area: road networks outside of built areas are then excluded.

This study aims at detecting such areas automatically from multisource remote-sensing data, trying to follow the real-world city boundary contours as closely as possible. Isolated built-up areas should also be retrieved.

The remote-sensing paradigm has drastically changed in recent years with the advent of new sensors exhibiting enhanced spectral, spatial, swath or revisit period characteristics, making it possible to acquire datasets at country scale in a limited time. SPOT 6/7 and Sentinel-2 are examples of these new sensors, and they will be used in this use case. Indeed, on the one hand, they are freely available over the whole French territory thanks to the Théia initiative and the GEOSUD Equipex. On the other hand, they exhibit complementary characteristics:

  • •  VHR sensors, e.g., SPOT 6/7, enable the delineation of small features and the use of texture information. However, they often do not have enough spectral information to distinguish fine Land cover types.
  • •  Sentinel-2 sensors exhibit more spectral bands coupled with an important revisit frequency but with a rather limited geometric resolution (10–20 m).

Recent years have witnessed the advent of deep learning methods, and especially Convolutional Neural Networks (CNNs) [97–99]. Such approaches have shown their superiority compared to standard classification processes. Indeed, thanks to their end-to-end design, CNNs directly learn optimized (spectral and textural) features (convolution filters) for each classification problem as well as the best way to use them (i.e., the classification model). Besides, the learned features directly take into account the context and thus perform a multiscale analysis of the image. As a counterpart, CNNs require a huge amount of training data.

New studies on urban footprint detection have been initiated by the advent of Sentinel data. Sentinel-2 optical images exhibit excellent spectral and temporal characteristics and are well tailored for land cover production. In [91], Sentinel-2 time series are directly classified by a random forest for the yearly extraction of 20-class land cover maps. The method presented in [38] can also be applied to such time series, classifying each date independently before merging the results through a Dempster–Shafer process. It must be noted that, as Sentinel-2 exhibits a 10 m GSD for some bands, several studies have tried to use both its radiometric and texture information to detect urban areas [92,100]. However, it was decided here to focus on Sentinel-2 specificities (enhanced spectral characteristics and time series) and not to exploit its texture information, which is poorer than that of SPOT 6/7.

To summarize, deep learning approaches are optimal to analyze SPOT 6/7 images: their spatial resolution makes it possible to detect urban elements (e.g., buildings) and to use them to derive urban areas. For Sentinel-2 data, it is more interesting to focus on their specificities (enhanced spectral characteristics and time series). Thus, the fusion of such sources combines their advantages to reduce spatial and semantic uncertainties. The late fusion scheme proposed in Sect. 11.2 is adapted, considering again that the original data have been classified beforehand and independently by specific methods. Besides, it enables the integration of existing land cover maps, such as Théia's maps based on [91].

11.4.2 Proposed Framework: A Two-Step Urban Footprint Detection

The proposed workflow (Fig. 11.6) consists of three steps.

  1.  The SPOT 6/7 image and the Sentinel-2 time series are individually classified according to a 5-class nomenclature: buildings, roads, water, forest, and other vegetation. A membership to each class is provided per pixel.
  2.  The two classification results are merged at the decision level, aiming at the best detection of building objects.
  3.  These detected buildings are considered as seeds of urban areas: they are used as prior knowledge of being in an urban area, which is then merged with a binary urban/non-urban Sentinel-2 classification in a second fusion, still at the decision level.

Figure 11.6 Proposed framework for urban footprint detection.

Both fusions (for the 5-class and the binary classifications) are performed following the scheme presented in Sect. 11.2.

11.4.2.1 Initial classifications

Both sources are classified individually. The Sentinel-2 time series is labeled using a random forest (RF) classifier trained with 50,000 samples per class. RF is used to keep a framework similar to that of [91], whose land cover maps are intended to be available at the national scale. The SPOT 6/7 image is classified using a deep Convolutional Neural Network (CNN) [98], because of its high ability to exploit context and texture information from VHR images. The CNN was trained with 10,000 samples per class (of which 10% were kept for cross-validation). Both classifiers produce membership probabilities for the five classes.
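As a minimal illustration of this step, the sketch below shows how per-pixel class membership maps could be obtained from a stacked Sentinel-2 time series with a random forest; the file names, array shapes, number of dates/bands and sampling strategy are assumptions made for the example, not the exact implementation used here (the SPOT 6/7 CNN probabilities would be produced analogously by a softmax output layer).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed inputs: a (H, W, 60) stack of 6 Sentinel-2 dates x 10 bands, and
# sparse training labels in {0..4} with -1 for unlabeled pixels (both hypothetical).
s2_stack = np.load("s2_time_series.npy")      # shape (H, W, 60)
train_labels = np.load("train_labels.npy")    # shape (H, W)

H, W, D = s2_stack.shape
X_all = s2_stack.reshape(-1, D)
y_all = train_labels.ravel()
labeled = y_all >= 0                          # keep labeled pixels only

rf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
rf.fit(X_all[labeled], y_all[labeled])

# Per-pixel class membership probabilities for the 5 classes.
proba_s2 = rf.predict_proba(X_all).reshape(H, W, -1)
```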

11.4.2.2 First fusion and regularization

Both fusions (i.e., for the 5-class nomenclature and the urban/non-urban one) are performed according to Sect. 11.2, involving a per-pixel decision fusion followed by a spatial regularization.

  1.  Per-pixel fusion: several rules are compared. In addition to the rules tested in Sect. 11.3, supervised rules, which learn from the ground truth how to best merge both sources, are investigated for the 5-class problem. A classifier is trained to label feature vectors formed by concatenating the class membership measures of both sources. Two classifiers are compared for this supervised fusion, RF and Support Vector Machine (SVM), in three configurations (a sketch of the unsupervised rules is given after this list):
     •  RF, with 10,000 training samples per class and 100 trees;
     •  SVM with a linear kernel, with 10,000 training samples per class;
     •  SVM with a radial basis function (rbf) kernel, with a lower number of 500 training samples per class for practical reasons (higher computation times). The parameters of the SVM models were optimized by cross-validation.
  2.  Regularization: in order to smooth the fusion result, which still contains noisy labels, and to make it follow the real-world image contours more accurately, a global regularization is performed. The graphical model introduced in Sect. 11.2.2 is used: Option 2 (f(t) = 1 − t) is retained for the data term, while the contrast-sensitive regularization term is formulated as in Eq. (11.25). For the 5-class fusion, the contrast term is calculated on the SPOT 6/7 image blurred by a Gaussian filter of standard deviation 2 to obtain smoother contours. For the binary fusion, it is calculated on the Sentinel-2 image, also blurred by a Gaussian filter of standard deviation 2.
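The sketch below illustrates, under simplifying assumptions, two of the unsupervised per-pixel fusion rules (fuzzy Min and Bayesian product) applied to two co-registered membership maps, together with the data term f(t) = 1 − t used by the regularization; the exact formulations, including normalization and conflict handling, are those of Sect. 11.2.1, not this simplified version.

```python
import numpy as np

def fuse_min(p_a, p_b):
    """Fuzzy 'Min' rule: per class, keep the least optimistic membership,
    then renormalize so that memberships sum to 1 per pixel."""
    fused = np.minimum(p_a, p_b)
    return fused / np.clip(fused.sum(axis=-1, keepdims=True), 1e-12, None)

def fuse_bayes_product(p_a, p_b):
    """Naive Bayesian product rule (sources assumed independent)."""
    fused = p_a * p_b
    return fused / np.clip(fused.sum(axis=-1, keepdims=True), 1e-12, None)

# p_spot, p_s2: hypothetical (H, W, 5) membership maps resampled to the same 1.5 m grid.
# fused = fuse_min(p_spot, p_s2)
# Data term of the regularization ("Option 2"): u_c = 1 - P(c).
# unary = 1.0 - fused
```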

11.4.2.3 Binary classification and fusion

The “urban/non-urban” fusion requires one to derive binary class probabilities from the results of the previous steps. Buildings from the 5-class fusion result are considered as seeds of urban areas and used to define a prior probability of being in an urban area (see Fig. 11.7): a linearly decreasing function is applied around all buildings, so that this probability starts from 1 on buildings and reaches 0 at a distance of 100 m from the nearest building.
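A minimal sketch of this distance-based prior is given below; it assumes a boolean building mask on the 1.5 m grid and uses a Euclidean distance transform, which is one possible way to implement the linearly decreasing function described above.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def urban_prior_from_buildings(building_mask, pixel_size=1.5, max_dist=100.0):
    """Prior probability of being in an urban area: 1 on buildings,
    decreasing linearly with distance and reaching 0 at max_dist metres."""
    # Distance (in metres) from every pixel to the nearest building pixel.
    dist = distance_transform_edt(~building_mask) * pixel_size
    return np.clip(1.0 - dist / max_dist, 0.0, 1.0)

# building_mask: hypothetical boolean (H, W) map of buildings from the 5-class fusion.
# prior = urban_prior_from_buildings(building_mask)
```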

Figure 11.7 From left to right, buildings detected from the 5-class fusion scheme (red), distance to these buildings (black: low → white: high), prior probability to be in an urban area (black: low → white: high).

Class posterior probabilities from the Sentinel-2 RF 5-class classification are converted into a binary classification with an urban area class (u) and a non-urban area class (¬u): P(u) = P(b) + P(r) and P(¬u) = 1 − P(u), with P(b) and P(r) the class probabilities for buildings and roads, respectively.

Then, the prior probability map of being in an urban area, derived from the previous building detection, is merged with the binary class probabilities from the Sentinel-2 RF classifier.
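Continuing the previous sketches, the conversion to binary probabilities and their fusion with the building-based prior could look as follows; the class ordering and the use of the Min rule at this stage are assumptions of the example.

```python
import numpy as np

def binary_urban_fusion(proba_s2, prior, idx_buildings=0, idx_roads=1):
    """Derive urban / non-urban probabilities from the Sentinel-2 5-class RF
    output and merge them with the building-based urban prior using the fuzzy
    Min rule (the class indices are assumptions, not the chapter's)."""
    p_u_s2 = proba_s2[..., idx_buildings] + proba_s2[..., idx_roads]  # P(u) = P(b) + P(r)
    p_nu_s2 = 1.0 - p_u_s2                                            # P(not u) = 1 - P(u)
    p_u = np.minimum(p_u_s2, prior)
    p_nu = np.minimum(p_nu_s2, 1.0 - prior)
    norm = np.clip(p_u + p_nu, 1e-12, None)
    return p_u / norm, p_nu / norm
```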

11.4.3 Data

Sentinel-2 offers a spectral configuration improved over usual multispectral sensors, together with the ability to acquire time series (5-day revisit). In the following experiments, only the 10 spectral bands with a 10 or 20 m GSD are used; all are upsampled to 10 m GSD. Six dates (August 15th 2016, January 25th 2017, March 16th 2017, April 12th 2017 and May 25th 2017) are kept, retained both for their low cloud cover and to cover different seasons/appearances of the land cover classes.

SPOT 6/7 includes four spectral bands (red–green–blue–near infrared) pan-sharpened to 1.5 m. A single date (April 16th) is used.

A ground truth of five classes is generated (Fig. 11.8) from available national reference geodatabases, used for both training and evaluation (the number of pixels used as training samples is very small compared to the total number of samples). Buildings, roads and water areas are extracted from IGN's BD Topo® topographic database, forests from IGN's BD Forêt® database, and crops from the Référentiel Parcellaire Graphique of the French Ministry of Agriculture.

Figure 11.8 Left: SPOT 6/7 image. Right: the associated ground truth.

Experiments are performed over a test area spanning 648 km² in Finistère, northwestern France. This study area contains urban, peri-urban, rural, and natural landscapes.

11.4.4 Results

11.4.4.1 Five-class classifications

For the sake of visibility, the results are shown over a restricted area of 0.64 km² (Figs. 11.9, 11.10 and 11.11), where the original classifications exhibit several errors so that the impact of data fusion can be clearly demonstrated. Quantitative evaluation is performed over five tiles of 3000 × 3000 m, totaling an area of 45 km², distributed over the entire 648 km² study zone. Each classification is compared to the class labels of the ground truth (Fig. 11.8). Evaluation measures for the individual classifications, the fusion and the regularization, all using five classes, are reported in Table 11.3.

Figure 11.9 The two initial classifications. (A) SPOT 6 CNN classification. (B) S2 RF classification. The classifications are superimposed on the SPOT 6/7 image.

Table 11.3

Accuracy scores (in %) for the first step. OA = overall accuracy, AA = average accuracy, Fm = Mean F-score, Fb = F-score for buildings. Fusion rules are described in Sect. 11.2.1.

Method                    | Kappa | OA   | AA   | Fm   | Fb
Original classifications
Sentinel-2                | 72.0  | 83.7 | 81.5 | 64.6 | 52.2
SPOT 6                    | 73.3  | 85.2 | 70.8 | 63.4 | 62.5
Per-pixel fusion (before regularization)
Fuzzy Min                 | 79.1  | 88.4 | 84.7 | 76.7 | 73.8
Fuzzy Max                 | 77.3  | 87.4 | 84.2 | 73.2 | 70.5
Compromise                | 78.8  | 88.2 | 84.5 | 76.2 | 72.5
Prior 1                   | 73.3  | 85.2 | 70.8 | 63.4 | 62.5
Prior 2                   | 73.3  | 85.2 | 70.8 | 63.4 | 62.5
Bayesian Sum              | 78.3  | 87.9 | 84.8 | 74.8 | 71.9
Bayesian Product          | 79.1  | 88.5 | 85.0 | 76.7 | 73.8
Dempster–Shafer V1        | 79.1  | 88.4 | 85.1 | 76.5 | 73.7
Dempster–Shafer V2        | 79.0  | 88.4 | 85.1 | 76.4 | 73.7
Margin Maximum            | 77.7  | 87.6 | 84.5 | 73.4 | 70.2
Margin Bayesian Product   | 78.4  | 88.0 | 84.7 | 75.1 | 72.0
Margin Bayesian Sum       | 78.0  | 87.7 | 84.6 | 74.0 | 71.0
RF                        | 81.8  | 90.0 | 90.1 | 81.2 | 81.6
SVM linear                | 80.5  | 89.3 | 88.6 | 77.9 | 80.4
SVM rbf                   | 81.0  | 89.6 | 89.1 | 79.1 | 83.4
Fusion and regularization
Fuzzy Min                 | 75.1  | 85.8 | 82.3 | 73.9 | 73.8
SVM rbf                   | 81.4  | 89.8 | 89.3 | 79.8 | 83.9

Initial classifications. The original classifications confirm the initial observation that the SPOT 6/7 CNN result tends to preserve small objects, although some confusions between (bare soil) crops and built-up areas occur. The Sentinel-2 RF classification behaves better overall, but it mixes buildings and roads due to its coarse spatial resolution. The results of the individual classifications of the SPOT 6/7 and Sentinel-2 images are shown in Fig. 11.9. This area was selected to illustrate the problems of both classifications and the improvements obtained with fusion and regularization (better results are observed in all other areas). Confusion between water and built-up areas can be noticed. This phenomenon is caused by the fact that the water database used to generate training samples included ponds at the bottom of quarries, which appear white and very similar to built-up areas.

Per-pixel fusion. This first fusion is performed at the SPOT 6/7 resolution (1.5 m), as it aims at the finest possible detection of building objects.

Among the classic fusion rules proposed by [69], the fuzzy Min rule produces the best results, following the objects' borders most precisely while producing the fewest class confusions (Fig. 11.10 and Table 11.3). Considering Fig. 11.10, all rules managed to eliminate the wrongly classified building patch (top left), preferring the Sentinel-2 classification over the SPOT 6/7 one. The Min fusion rule follows the field contours a bit less smoothly than the other rules. The industrial area (at the center of the displayed area) is still confused with water in both results (such a confusion can be explained by the presence of very white water training samples).

Figure 11.10 Fusion results before regularization. (A) Min rule. (B) SVM rbf with buffer class.

The RF/SVM supervised fusions initially tend to produce patches of buildings rather than separate buildings, due to missing training data (and thus constraints) between buildings and roads (Fig. 11.8). Indeed, the ground truth contains gaps between buildings, and the classifiers tend to aggregate individual buildings because no training data are available around buildings to prevent them from doing so. Adding an additional, sixth “buffer” class around buildings helps to refine the contours and to obtain a higher level of detail than just using the Min or Bayes rules (a sketch of how such a class can be generated is given below). It preserves more details of individual buildings, but can cause some confusion between buildings and this buffer class at the center of very wide buildings in industrial areas (see Fig. 11.10). It can also erase small patches of buildings. However, this remains quite exceptional and is not a real problem for the final goal of urban area detection. The same observations are made in all areas, even those on which the supervised fusion model is not trained. Thus, the result of the supervised fusion using an rbf SVM classifier with the buffer class is used in the subsequent steps.
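As an illustration, such a buffer class could be generated by a morphological dilation of the building ground truth, as sketched below; the class indices and the dilation radius are hypothetical choices for the example.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def add_buffer_class(labels, building_id=0, unlabeled_id=-1, buffer_id=5, radius_px=10):
    """Create a sixth 'around buildings' training class by dilating the
    building ground truth and relabeling the unlabeled ring around buildings
    (a sketch; radius and class ids are assumptions)."""
    buildings = labels == building_id
    ring = binary_dilation(buildings, iterations=radius_px) & ~buildings
    out = labels.copy()
    out[ring & (labels == unlabeled_id)] = buffer_id
    return out
```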

Regularization. The parameters are as follows: λ = 10, γ = 0.7, ε = 50. They were selected by cross-validation, optimizing the visual quality of the regularization result. The contrast term is calculated from the SPOT 6/7 image: it especially aims at obtaining the finest detection of building objects. The result is shown in Fig. 11.11. The regularization performs well, smoothing out small noisy patches and yielding a visually more appealing result. However, the accuracy measures of the regularization remain similar to those of the fusion (cf. Table 11.3).
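The sketch below gives a rough, self-contained stand-in for this regularization: a contrast-sensitive Potts smoothing minimized by a few ICM sweeps. The exponential pairwise weight and the roles given to (λ, γ, ε) are assumptions made for the example, and ICM replaces the graph-cut optimization of the chapter's graphical model; Eq. (11.25) itself is not reproduced here.

```python
import numpy as np

def contrast_weights(guide, lam=10.0, gamma=0.7, eps=50.0):
    """Pairwise weights between horizontal/vertical neighbors of a guide
    image of shape (H, W, C). The exponential form and the roles of
    (lam, gamma, eps) are assumptions, not the chapter's Eq. (11.25)."""
    dh = np.sum((guide[:, 1:] - guide[:, :-1]) ** 2, axis=-1)
    dv = np.sum((guide[1:, :] - guide[:-1, :]) ** 2, axis=-1)
    return lam * gamma * np.exp(-dh / eps), lam * gamma * np.exp(-dv / eps)

def regularize_icm(unary, guide, n_iter=5, **kw):
    """Contrast-sensitive Potts smoothing of a (H, W, K) unary cost map,
    minimized by a few ICM sweeps (a stand-in for a graph-cut optimizer)."""
    H, W, K = unary.shape
    wh, wv = contrast_weights(guide.astype(float), **kw)
    labels = unary.argmin(axis=-1)
    for _ in range(n_iter):
        cost = unary.copy()
        onehot = np.eye(K)[labels]                      # (H, W, K)
        # Pay the pairwise weight for each neighbor holding a different label.
        cost[:, 1:] += wh[..., None] * (1 - onehot[:, :-1])
        cost[:, :-1] += wh[..., None] * (1 - onehot[:, 1:])
        cost[1:, :] += wv[..., None] * (1 - onehot[:-1, :])
        cost[:-1, :] += wv[..., None] * (1 - onehot[1:, :])
        labels = cost.argmin(axis=-1)
    return labels

# Example: labels = regularize_icm(1.0 - fused_probabilities, blurred_spot_image)
```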

Figure 11.11 SVM rbf fusion after regularization.

11.4.4.2 Urban footprint extraction

Raw urban area maps directly derived from the binary class probabilities are shown in Fig. 11.12. Again, they are first merged using a per-pixel fusion rule. The supervised learning-based fusion approaches cannot be used here since no training data for urban/non-urban areas were available. Several rules yield visually similar results; the fuzzy Min fusion rule is eventually chosen.

Figure 11.12 Urban area: input dilated detected building mask (top left), Sentinel-2 classification (top right) and fusion result before (bottom left) and after (bottom right) regularization.

Global regularization is then performed with the same (λ, γ, ε) parameters as for the 5-class regularization, but the image contrast term is now calculated from the Sentinel-2 image. The result is shown in Fig. 11.12. Compared to the Sentinel-2 detection, roads and small misclassifications have been removed. Compared to the dilated building mask from the previous step, the objects' borders fit the true urban area borders better.

As no true reference data for urban areas are available, a strict evaluation is not possible. However, the detected artificialized area can be compared to binary ground truth maps derived from the following related databases:

  •  Dilated BD Topo® ground truth: the buildings of this database used as ground truth for the 5-class classification were dilated by 20 m;
  •  OpenStreetMap (OSM) data, which use refined Corine Land Cover data;
  •  The CESBIO OSO land cover product [91], gathering the classes “urban diffuse”, “dense urban” and “industrial and commercial zones”.

While a strict quantitative evaluation is thus not possible, such data can provide some hints: accuracies are reported in Table 11.4. Besides, a visual comparison with the different sources is shown in Fig. 11.13.
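For reference, the binary agreement measures reported in Table 11.4 could be computed from boolean prediction and ground truth masks as sketched below; this is not the exact evaluation code used in the study.

```python
import numpy as np

def binary_agreement(pred, ref):
    """Agreement between a binary urban map and a reference mask.
    pred, ref: boolean arrays of identical shape."""
    tp = int(np.sum(pred & ref))
    fp = int(np.sum(pred & ~ref))
    fn = int(np.sum(~pred & ref))
    tn = int(np.sum(~pred & ~ref))
    n = tp + fp + fn + tn
    oa = (tp + tn) / n                                # overall accuracy
    f_u = 2 * tp / (2 * tp + fp + fn)                 # F-score, urban class
    iou_u = tp / (tp + fp + fn)                       # intersection over union
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (oa - p_e) / (1 - p_e)
    return {"F_u": f_u, "kappa": kappa, "OA": oa, "IoU_u": iou_u}
```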

Table 11.4

Accuracy measures: F-score for the urban class (F-scoreu), Kappa, overall accuracy (OA) and intersection over union for the urban class (IoUu).

Classification           | Ground truth | F-scoreu [%] | Kappa [%] | OA [%] | IoUu [%]
Binary dilated buildings | BD Topo®     | 86.7 | 83.2 | 94.5 | 76.5
                         | OSO          | 56.8 | 50.1 | 86.8 | 39.7
                         | OSM          | 58.3 | 51.9 | 87.3 | 41.2
Binary Sentinel-2        | BD Topo®     | 65.2 | 56.4 | 85.9 | 48.3
                         | OSO          | 63.9 | 58.3 | 89.3 | 46.9
                         | OSM          | 52.6 | 45.5 | 86.0 | 35.7
Fusion Min               | BD Topo®     | 79.8 | 75.2 | 92.4 | 66.3
                         | OSO          | 66.9 | 62.2 | 91.2 | 50.3
                         | OSM          | 62.9 | 57.6 | 90.2 | 45.9
Regularization           | BD Topo®     | 79.7 | 75.1 | 92.5 | 66.2
                         | OSO          | 67.4 | 62.8 | 91.5 | 50.9
                         | OSM          | 64.4 | 59.4 | 90.7 | 47.4


Figure 11.13 Binary evaluation for our urban footprint. (A) Comparison to dilated BD Topo®. (B) Comparison to OSM. (C) Comparison to OSO.

The following aspects can be underlined:

  •  The OSM-derived ground truth is rather simplified and contains few, rough urbanized patches.
  •  The OSO-derived ground truth is partial, containing gaps in artificialized areas within urban patches. This is due to the resolution and methodology used for the classification and to the selection of classes considered as urban.
  •  The dilated BD Topo® ground truth is, as expected, the most faithful. However, it is also approximate and exceeds the urban area because of the dilation used to generate it.

The dilated BD Topo® ground truth generally yields the highest agreement with the classifications in terms of accuracy measures. The fusion and regularization steps improve the accuracy measures over the individual input classifications, with the exception of the dilated building input compared to the dilated BD Topo® ground truth, which can be explained by the fact that both have been produced by dilation.

11.4.5 Conclusion

A framework was proposed to detect urban areas. Sentinel-2 and SPOT 6/7 data were classified individually into five topographic classes. Decision-level fusion and regularization then made it possible to obtain a result preserving high geometric detail while reducing misclassifications. The results were presented for one dataset, but the processing chain was also applied to another region (the Gironde department in southwestern France) exhibiting a different (climatic and topographic) landscape. It led to similar conclusions, showing a high generalization potential despite varying behaviors of the initial classifiers.

Traditional fusion methods enable artifact removal but can keep confusions between buildings and roads. In contrast, supervised fusion methods enable an enhanced detection of buildings, at the price of introducing an artificial “building buffer” class and thus a certain additional amount of semantic uncertainty. However, the new class confusions mostly occur between the vegetation and forest classes, which is not a problem here. Buildings could thus be extracted with a higher level of semantic detail.

Second, the urban area can be approximated by fusion and regularization, merging the urban/non-urban class membership probabilities of the Sentinel-2 classification with an urban prior measure derived from the previously detected buildings. A simple function was used to derive this prior probability map of urbanized areas. Improvements could be obtained with a more advanced urban membership prior, for instance one decreasing faster around uncertain buildings.

Furthermore, the promising results of supervised fusion would justify the use of CNNs for fusion. Although such a strategy also looks promising for the identification of artificialized areas, it would have to cope with missing ground truth data and with heterogeneous, user- and application-dependent definitions of such areas.

11.5 Final Outlook and Perspectives

A fusion framework was proposed to merge very high spatial resolution, monodate, multispectral images with time series of images exhibiting an enhanced spectral configuration but a lower spatial resolution. It mostly relies on existing state-of-the-art methods, combined in order to cope with both semantic and spatial uncertainties, and it is flexible enough to integrate other fusion rules. It is a late fusion strategy, permitting one to initially process each input data source independently through specific methods, and even to use already computed results from existing operational land cover classification services. Besides, in this case, it can be used without any ground truth information, contrary to intermediate-level fusion methods.

The proposed framework was applied to two different use cases, and for each of them the classification results were improved. Several fusion rules were tested; good results were reached for several of them, but the best results were obtained by the fuzzy “Minimum” rule, by the Dempster–Shafer rules, and, when sufficient training data are available, by supervised learning-based methods.

At present, the proposed fusion framework has been applied to only two sources, but it could easily be extended to more. Besides, it has so far been tested only on optical images exhibiting different characteristics, but it is generic enough to be applied to input data from different modalities. It has also been used only to merge class membership probability maps from classification results; it would be interesting to apply it to other kinds of outputs, especially, for the lower spatial resolution source, class abundances obtained from an unmixing process (applied to hyperspectral data or to time series).

References

[1] L. Gomez-Chova, D. Tuia, G. Moser, G. Camps-Valls, Multimodal classification of remote sensing images: a review and future directions, Proceedings of the IEEE 2015;103(9):1560–1584.

[2] S. Chavez, J. Stuart, C. Sides, J. Anderson, Comparison of three different methods to merge multiresolution, multispectral data Landsat TM and SPOT panchromatic, Photogrammetric Engineering and Remote Sensing 1991;57(3):259–303.

[3] P. Gamba, Image and data fusion in remote sensing of urban areas: status issues and research trends, International Journal of Image and Data Fusion 2014;5(1):2–12.

[4] N. Joshi, M. Baumann, A. Ehammer, B. Waske, A review of the application of optical and radar remote sensing data fusion to land use mapping and monitoring, Remote Sensing 2016;8(1):70.

[5] M.-T. Pham, G. Mercier, O. Regniers, J. Michel, Texture retrieval from VHR optical remote sensed images using the local extrema descriptor with application to vineyard parcel detection, Remote Sensing 2016;8(5):368.

[6] C. Pohl, J. Van Genderen, Multi-sensor image fusion in remote sensing: concepts methods and applications, International Journal of Remote Sensing 1998;19(5):823–854.

[7] M. Schmitt, X. Zhu, Data fusion and remote sensing: an ever-growing relationship, IEEE Geoscience and Remote Sensing Magazine 2016;4(4):6–23.

[8] J.A. Benediktsson, G. Cavallaro, N. Falco, I. Hedhli, V.A. Krylov, G. Moser, S.B. Serpico, J. Zerubia, Remote sensing data fusion: Markov models and mathematical morphology for multisensor, multiresolution, and multiscale image classification. Springer International Publishing; 2018:277–323.

[9] W. Liao, J. Chanussot, W. Philips, Remote sensing data fusion: guided filter-based hyperspectral pansharpening and graph-based feature-level fusion. Springer International Publishing; 2018:243–275.

[10] H. Ghassemian, A review of remote sensing image fusion methods, Information Fusion 2016;32:75–89.

[11] W. Carper, T. Lillesand, R. Kiefer, The use of intensity-hue-saturation transform for merging SPOT panchromatic and multispectral image data, Photogrammetric Engineering and Remote Sensing 1990;56(4):459–467.

[12] S. Yang, M. Wang, L. Jiao, Fusion of multispectral and panchromatic images based on support value transform and adaptive principal component analysis, Information Fusion 2012;13(3):177–184.

[13] R. Gharbia, A. Azar, A. El Baz, A. Hassanien, Image fusion techniques in remote sensing, arXiv preprint arXiv:1403.5473.

[14] L. Loncan, L. Almeida, J. Bioucas-Dias, W. Liao, X. Briottet, J. Chanussot, N. Dobigeon, S. Fabre, W. Liao, G. Licciardi, M. Simoes, J.-Y. Tourneret, M. Veganzones, G. Vivone, Q. Wei, N. Yokoya, Hyperspectral pansharpening: a review, IEEE Geoscience and Remote Sensing Magazine 2015;3(3):27–46.

[15] Y. Yuan, X. Zheng, X. Lu, Hyperspectral image superresolution by transfer learning, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2017;10(5):1963–1974.

[16] A. Gressin, C. Mallet, M. Paget, C. Barbanson, P.L. Frison, J.P. Rudant, N. Paparoditis, N. Vincent, Un-sensored very high resolution land-cover mapping, Proc. of the IEEE International Geoscience and Remote Sensing Symposium. IGARSS. 2015:2939–2942.

[17] M. Fauvel, J. Benediktsson, J. Sveinsson, J. Chanussot, Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles, Proc. of the IEEE International Geoscience and Remote Sensing Symposium. IGARSS. 2007:4834–4837.

[18] J.D. Wegner, R. Hänsch, A. Thiele, U. Soergel, Building detection from one orthophoto and high-resolution InSAR data using conditional random fields, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2011;4:83–91.

[19] Y. Ban, A. Jacob, Object-based fusion of multitemporal multiangle ENVISAT ASAR and HJ-1B multispectral data for urban land-cover mapping, IEEE Transactions on Geoscience and Remote Sensing 2013;51(4):1998–2006.

[20] C. Dechesne, C. Mallet, A. Le Bris, V. Gouet-Brunet, Semantic segmentation of forest stands of pure specie as a global optimisation problem, ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences 2017;4(1/W1):141–148.

[21] C. Dechesne, C. Mallet, A. Le Bris, V. Gouet-Brunet, Semantic segmentation of forest stands of pure species combining airborne lidar data and very high resolution multispectral imagery, ISPRS Journal of Photogrammetry and Remote Sensing 2017;126:129–145.

[22] J. Xia, Z. Ming, A. Iwasaki, Multiple sources data fusion via deep forest, Proc. of the IEEE International Geoscience and Remote Sensing Symposium. IGARSS. 2018:1722–1725.

[23] N. Audebert, B. Le Saux, S. Lefèvre, Fusion of heterogeneous data in convolutional networks for urban semantic labeling, Proc. of the Joint Urban Remote Sensing Event. JURSE. 2017.

[24] X.X. Zhu, D. Tuia, L. Mou, G. Xia, L. Zhang, F. Xu, F. Fraundorfer, Deep learning in remote sensing: a comprehensive review and list of resources, IEEE Geoscience and Remote Sensing Magazine 2017;5(4):8–36.

[25] C. Berger, M. Voltersen, R. Eckardt, J. Eberle, T. Heyer, N. Salepci, S. Hese, C. Schmullius, J. Tao, S. Auer, R. Bamler, K. Ewald, M. Gartley, J. Jacobson, A. Buswell, Q. Du, F. Pacifici, Multi-modal and multi-temporal data fusion: outcome of the 2012 GRSS Data Fusion Contest, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2013;6(3):1324–1340.

[26] W. Liao, X. Huang, F.V. Coillie, S. Gautama, A. Pižurica, W. Philips, H. Liu, T. Zhu, M. Shimoni, G. Moser, D. Tuia, Processing of multiresolution thermal hyperspectral and digital color data: outcome of the 2014 IEEE GRSS Data Fusion Contest, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2015;8(6):2984–2996.

[27] N. Yokoya, P. Ghamisi, J. Xia, S. Sukhanov, R. Heremans, I. Tankoyeu, B. Bechtel, B. Le Saux, G. Moser, D. Tuia, Open data for global multimodal land use classification: outcome of the 2017 IEEE GRSS Data Fusion Contest, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2018;11(5):1363–1377.

[28] D. Dubois, H. Prade, Combination of fuzzy information in the framework of possibility theory, M.A. Abidi, R.C. Gonzalez, eds. Data Fusion in Robotics and Machine Intelligence. New York: Academic Press; 1992:481–505.

[29] D. Dubois, H. Prade, Possibility theory and data fusion in poorly informed environments, Control Engineering Practice 1994;2(5):811–823.

[30] L. Zadeh, Fuzzy sets, Information and Control 1965;8(3):338–353.

[31] N.R. Pal, J.C. Bezdek, Measuring fuzzy uncertainty, IEEE Transactions on Fuzzy Systems 1994;2(2):107–118.

[32] M. Fauvel, J. Chanussot, J. Benediktsson, Decision fusion for the classification of urban remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 2006;44(10):2828–2838.

[33] G. Shafer, A Mathematical Theory of Evidence. Princeton University Press; 1976.

[34] V. Poulain, J. Inglada, M. Spigai, J.Y. Tourneret, P. Marthon, High-resolution optical and SAR image fusion for building database updating, IEEE Transactions on Geoscience and Remote Sensing 2011;49(8):2900–2910.

[35] F. Rottensteiner, J. Trinder, S. Clode, K. Kubik, Building detection by fusion of airborne laser scanner data and multi-spectral images: performance evaluation and sensitivity analysis, ISPRS Journal of Photogrammetry and Remote Sensing 2007;62(2):135–149.

[36] J. Tian, J. Dezert, Fusion of multispectral imagery and dsms for building change detection using belief functions and reliabilities, International Journal of Image and Data Fusion 2019;10(1):1–27.

[37] R.O. Chavez-Garcia, O. Aycard, Multiple sensor fusion and classification for moving object detection and tracking, IEEE Transactions on Intelligent Transportation Systems 2016;17(2):525–534.

[38] A. Lefebvre, C. Sannier, T. Corpetti, Monitoring urban areas with Sentinel-2A data: application to the update of the Copernicus high resolution layer imperviousness degree, Remote Sensing 2016;8(7):606.

[39] S. Le Hégarat-Mascle, R. Seltz, Automatic change detection by evidential fusion of change indices, Remote Sensing of Environment 2004;91(3):390–404.

[40] S. Le Hégarat-Mascle, R. Seltz, L. Hubert-Moy, S. Corgne, N. Stach, Performance of change detection using remotely sensed data and evidential fusion: comparison of three cases of application, International Journal of Remote Sensing 2006;27(16):3515–3532.

[41] S.I. Oh, H.B. Kang, Object detection and classification by decision-level fusion for intelligent vehicle systems, Sensors 2017;17(1):207.

[42] J. Dezert, Fondations pour une nouvelle théorie du raisonnement plausible et paradoxal: application à la fusion d'informations incertaines et conflictuelles. [Technical Report 1/06769 DTIM, Tech. rep.] ONERA; 2003.

[43] A. Martin, C. Osswald, Une nouvelle règle de combinaison répartissant le conflit – applications en imagerie sonar et classification radar, Traitement du Signal 2007;24:71–82.

[44] A. Martin, Modélisation et gestion du conflit dans la théorie des fonctions de croyance. [Habilitation à Diriger des Recherches de l'Université de Bretagne Occidentale, France] 2009.

[45] A. Appriou, Probabilités et incertitude en fusion de données multi-senseurs, Revue Scientifique Et Technique de la Défense 1991;11:27–40.

[46] Y. Cao, H. Lee, H. Kwon, Enhanced object detection via fusion with prior beliefs from image classification, arXiv preprint arXiv:1610.06907.

[47] H. Lee, H. Kwon, R.M. Robinson, W.D. Nothwang, A.M. Marathe, Dynamic belief fusion for object detection, Proc. of the 2016 IEEE Winter Conference on Applications of Computer Vision. WACV. 2016.

[48] Z. Tu, X. Bai, Auto-context and its application to high-level vision tasks and 3D brain image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 2010;32(10):1744–1757.

[49] C. Wendl, A. Le Bris, N. Chehata, A. Puissant, T. Postadjian, Decision fusion of SPOT-6 and multitemporal Sentinel-2 images for urban area detection, Proc. of the IEEE International Geoscience and Remote Sensing Symposium. IGARSS. 2018:1734–1737.

[50] U. Knauer, U. Seiffert, A comparison of late fusion methods for object detection, Proc. of IEEE International Conference on Image Processing. ICIP. 2013:3297–3301.

[51] B. Waske, J. Benediktsson, Fusion of support vector machines for classification of multisensor data, IEEE Transactions on Geoscience and Remote Sensing 2007;45:3858–3866.

[52] X. Ceamanos, B. Waske, J. Benediktsson, J. Chanussot, M. Fauvel, J. Sveinsson, A classifier ensemble based on fusion of support vector machines for classifying hyperspectral data, International Journal of Image and Data Fusion 2010;1(4):293–307.

[53] J. Benediktsson, I. Kanellopoulos, Decision fusion methods in classification of multisource and hyperdimensional data, IEEE Transactions on Geoscience and Remote Sensing 1999;37(3):1367–1377.

[54] F. Tupin, I. Bloch, H. Maitre, A first step toward automatic interpretation of SAR images using evidential fusion of several structure detectors, IEEE Transactions on Geoscience and Remote Sensing 1999;37(3):1327–1343.

[55] B. Jeon, Decision fusion approach for multitemporal classification, IEEE Transactions on Geoscience and Remote Sensing 1999;37(3):1227–1233.

[56] A. Mohammad-Djafari, A bayesian approach for data and image fusion, AIP Conference Proceedings 2003;659:386–408.

[57] A. Le Bris, D. Boldo, Extraction of land cover themes from aerial ortho-images in mountainous areas using external information, Photogrammetric Record 2008;23(124):387–404.

[58] A. Le Bris, N. Chehata, Change detection in a topographic building database using submetric satellite images, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2011;38(3/W22):25–30.

[59] M. Aitkenhead, I. Aalders, Automating land cover mapping of Scotland using expert system and knowledge integration methods, Remote Sensing of Environment 2011;115(5):1285–1295.

[60] X. Huang, L. Zhang, A multilevel decision fusion approach for urban mapping using very high-resolution multi-hyper-spectral imagery, International Journal of Remote Sensing 2012;33(11):3354–3372.

[61] S. Paisitkriangkrai, J. Sherrah, P. Janney, A. Van-Den Hengel, Effective semantic pixel labelling with convolutional networks and conditional random fields, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition Workshop. CVPR, Boston, USA. 2015:36–43.

[62] P. Perona, J. Malik, Scale-space and edge detection using anisotropic diffusion, IEEE Transactions on Pattern Analysis and Machine Intelligence 1990;12(7):629–639.

[63] P. Gong, P. Howarth, Performance analyses of probabilistic relaxation methods for land-cover classification, Remote Sensing of Environment 1989;30(1):33–42.

[64] K. Schindler, An overview and comparison of smooth labeling methods for land-cover classification, IEEE Transactions on Geoscience and Remote Sensing 2012;50(11):4534–4545.

[65] G. Moser, S. Serpico, J. Benediktsson, Land-cover mapping by Markov modeling of spatial contextual information in very-high-resolution remote sensing images, Proceedings of the IEEE 2013;101(3):631–651.

[66] A. Hervieu, A. Le Bris, C. Mallet, Fusion of hyperspectral and vhr multispectral image classifications in urban areas, ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences 2016;III-3:457–464.

[67] V. Andrejchenko, R. Heylen, W. Liao, W. Philips, P. Scheunders, MRF-based decision fusion for hyperspectral image classification, Proc. of the IEEE International Geoscience and Remote Sensing Symposium. IGARSS, Valencia, Spain. 2018:8070–8073.

[68] D. Dubois, H. Prade, Possibility theory and data fusion in poorly informed environment, Control Engineering Practice 1997;2:811–823.

[69] W. Ouerghemmi, A. Le Bris, N. Chehata, C. Mallet, A two-step decision fusion strategy: application to hyperspectral and multispectral images for urban classification, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2017;XLII-1-W1:167–174.

[70] I. Bloch, Information combination operators for data fusion: a comparative review, IEEE Transactions on Systems, Man and Cybernetics 1996;26(1):52–67.

[71] L. Breiman, Random forests, Machine Learning 2001;45(1):5–32.

[72] V.N. Vapnik, An overview of statistical learning theory, IEEE Transactions on Neural Networks 1999;10:988–999.

[73] V. Kolmogorov, R. Zabih, What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence 2004;26(2):65–81.

[74] C. Rother, V. Kolmogorov, A. Blake, “GrabCut”: interactive foreground extraction using iterated graph cuts, ACM Transactions on Graphics 2004;23(3):309–314.

[75] V. Kolmogorov, C. Rother, Minimizing non-submodular functions with graph cuts – a review, IEEE Transactions on Pattern Analysis and Machine Intelligence 2007;29(7):1274–1279.

[76] C. Rother, V. Kolmogorov, V. Lempitsky, M. Szummer, Optimizing binary MRFs via extended roof duality, Conference on Computer Vision and Pattern Recognition. CVPR. 2007.

[77] W. Heldens, U. Heiden, T. Esch, E. Stein, A. Muller, Can the future EnMAP mission contribute to urban applications? A literature survey, Remote Sensing 2011;3:1817–1846.

[78] H. Shafri, E. Taherzadeh, S. Mansor, R. Ashurov, Hyperspectral remote sensing of urban areas: an overview of techniques and applications, Research Journal of Applied Sciences, Engineering and Technology 2012;4(11):1557–1565.

[79] M. Herold, X. Liu, K. Clarke, Spatial metrics and image texture for mapping urban land-use, Photogrammetric Engineering and Remote Sensing 2003;69(9):991–1001.

[80] C. Cleve, M. Kelly, F. Kearns, M. Moritz, Classification of the wildland–urban interface: a comparison of pixel- and object-based classifications using high-resolution aerial photography, Computers, Environment and Urban Systems 2008;32(4):317–326.

[81] N. Thomas, C. Hendrix, R. Congalton, A comparison of urban mapping methods using high-resolution digital imagery, Photogrammetric Engineering and Remote Sensing 2003;69(9):963–972.

[82] A.P. Carleer, O. Debeir, E. Wolff, Assessment of very high spatial resolution satellite image segmentations, Photogrammetric Engineering and Remote Sensing 2005;71(11):1285–1294.

[83] Q. Yu, P. Gong, N. Clinton, G. Biging, M. Kelly, D. Schirokauer, Object-based detailed vegetation classification with airborne high resolution remote sensing imagery, Photogrammetric Engineering and Remote Sensing 2006;72(7):799–811.

[84] A. Le Bris, N. Chehata, X. Briottet, N. Paparoditis, Spectral band selection for urban material classification using hyperspectral libraries, ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences 2016;3(7):33–40.

[85] J. Platt, Probabilistic outputs for support vector machines and comparison to regularized likelihood methods, Advances in Large Margin Classifiers. MIT Press; 2000:61–74.

[86] J. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, J. Chanussot, Hyperspectral unmixing overview: geometrical, statistical, and sparse regression-based approaches, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2012;5(2):354–379.

[87] K. Adeline, A. Le Bris, F. Coubard, X. Briottet, N. Paparoditis, F. Viallefont, N. Rivière, J.-P. Papelard, P. Deliot, J. Duffaut, S. Airault, N. David, G. Maillet, L. Poutier, P.-Y. Foucher, V. Achard, J.-P. Souchon, C. Thom, Description de la campagne aéroportée UMBRA: étude de l'impact anthropique sur les écosystèmes urbains et naturels avec des images THR multispectrales et hyperspectrales, Revue Française de Photogrammétrie et de Télédétection 2013;202:79–92.

[88] S. Michel, M.-J. Lefèvre-Fonollosa, S. Hosford, HYPXIM – a hyperspectral satellite defined for science, security and defence users, Proc. of the 3rd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing. WHISPERS, Lisbon, Portugal. 2011.

[89] C. Kurtz, N. Passat, P. Gançarski, A. Puissant, Extraction of complex patterns from multiresolution remote sensing images: a hierarchical top-down methodology, Pattern Recognition 2012;45(2):685–706.

[90] C. Wemmert, A. Puissant, G. Forestier, P. Gancarski, Multiresolution remote sensing image clustering, IEEE Geoscience and Remote Sensing Letters 2009;6(3):533–537.

[91] J. Inglada, A. Vincent, M. Arias, B. Tardy, D. Morin, I. Rodes, Operational high resolution land cover map production at the country scale using satellite image time series, Remote Sensing 2017;9(1):95.

[92] M. Pesaresi, C. Corbane, A. Julea, V. Florczyk, A. Syrris, P. Soille, Assessment of the added value of Sentinel-2 for detecting built-up areas, Remote Sensing 2016;8(4):299.

[93] M. Li, A. Stein, W. Bijker, Q. Zhan, Urban land use extraction from Very High Resolution remote sensing imagery using a Bayesian network, ISPRS Journal of Photogrammetry and Remote Sensing 2016;122:192–205.

[94] M.K. Ridd, Exploring a V-I-S (vegetation-impervious surface-soil) model for urban ecosystem analysis through remote sensing: comparative anatomy for cities, International Journal of Remote Sensing 1995;16:2165–2185.

[95] Q. Weng, Remote sensing of impervious surfaces in the urban areas: requirements, methods, and trends, Remote Sensing of Environment 2012;117:34–49.

[96] A. Puissant, S. Rougier, A. Stumpf, Object-oriented mapping of urban trees using random forest classifiers, International Journal of Applied Earth Observation and Geoinformation 2014;26:235–245.

[97] E. Maggiori, Y. Tarabalka, G. Charpiat, P. Alliez, Convolutional neural networks for large-scale remote-sensing image classification, IEEE Transactions on Geoscience and Remote Sensing 2017;55(2):645–657.

[98] T. Postadjian, A. Le Bris, H. Sahbi, C. Mallet, Investigating the potential of deep neural networks for large-scale classification of very high resolution satellite images, ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences 2017;IV-1-W1:183–190.

[99] M. Volpi, D. Tuia, Dense semantic labeling of sub-decimeter resolution images with convolutional neural networks, IEEE Transactions on Geoscience and Remote Sensing 2017;55(2):881–893.

[100] F. Sabo, C. Corbane, S. Ferri, Inter-sensor comparison of built-up derived from Landsat, Sentinel-1, Sentinel-2 and SPOT5/SPOT6 over selected cities. [Tech. rep.] JRC; 2017.
