Arnaud Le Bris⁎; Nesrine Chehata⁎,†; Walid Ouerghemmi⁎,¶; Cyril Wendl⁎,‡; Tristan Postadjian⁎; Anne Puissant§; Clément Mallet⁎ ⁎Univ. Paris-Est, LASTIG STRUDEL, IGN, ENSG, Saint-Mande, France
†EA G&E Bordeaux INP, Université Bordeaux Montaigne, Pessac, France
‡Student at Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
§CNRS UMR 7362 LIVE-Université de Strasbourg, Strasbourg, France
¶Aix-Marseille Université, CNRS ESPACE UMR 7300, Aix-en-Provence, France
Very high spatial resolution (VHR) multispectral imagery enables a fine delineation of objects and a possible use of texture information. Other sensors provide a lower spatial resolution but an enhanced spectral or temporal information, permitting one to consider richer land cover semantics. So as to benefit from the complementary characteristics of these multimodal sources, a decision late fusion scheme is proposed. This makes it possible to benefit from the full capacities of each sensor, while dealing with both semantic and spatial uncertainties. The different remote-sensing modalities are first classified independently. Separate class membership maps are calculated and then merged at the pixel level, using decision fusion rules. A final label map is obtained from a global regularization scheme in order to deal with spatial uncertainties while conserving the contrasts from the initial images. It relies on a probabilistic graphical model involving a fit-to-data term related to merged class membership measures and an image-based contrast-sensitive regularization term. Conflict between sources can also be integrated into this scheme.
Two experimental cases are presented. The first case considers the fusion of VHR multispectral imagery with lower spatial resolution hyperspectral imagery for a fine-grained land cover classification problem in dense urban areas. The second case uses SPOT 6/7 satellite imagery and Sentinel-2 time series to extract urban area footprints through a two-step process: classifications are first merged in order to detect building objects, from which an urban area prior probability is derived and eventually merged with the Sentinel-2 classification output for urban footprint detection.
Late fusion; Decision fusion; Multimodal remote sensing; Classification; Land cover; Very high spatial resolution; Hyperspectral; Time series; Urban area; Urban footprint
Recent years have witnessed the emergence of a large variety of new sensors with various characteristics. The possibility to collect different kinds of observations over the same area has considerably increased: remote sensing can now be considered generically multimodal [1]. Those sensors can use different modalities (radar, Lidar, or optical). They can be airborne or satellite-borne. Even for the same modality, they can exhibit very distinct characteristics: for instance, optical sensors show a large range of spectral configurations (number, position, and width of the spectral bands), spatial resolutions, coverage and, for spaceborne sensors, revisit times (i.e., the minimum delay between two possible consecutive acquisitions over the same area, thus conditioning the possibility to capture genuine time series). As a consequence, combining remote-sensing data with different characteristics is a standard remote-sensing problem that has been extensively investigated in the literature [2]. The overall aim consists in fusing multisensor information as a means of combining the respective advantages of each sensor. Complementary observations can thus be exploited for land cover mapping purposes, which is a core remote-sensing application and the necessary input for a large number of public policies and environmental models. Combining existing sensors can mitigate the limitations of any one particular sensor for various land cover issues [3,4].
This chapter specifically focuses on the fusion of one data type, exhibiting a very high spatial resolution, with another one, exhibiting a lower spatial resolution but enhanced complementary characteristics. Indeed, very high spatial resolution (VHR) multispectral imagery enables an accurate spatial delineation of objects and a possible use of texture information for enhanced class discrimination [5]. On the other hand, sensors with lower spatial resolutions offer enhanced spectral or temporal information, making it possible to consider richer land cover semantics. To illustrate this problem, two use cases will be considered, accompanied by two methodological contributions:
In both cases, the land cover fusion scheme targets benefiting from the complementary characteristics of these multimodal sources.
Existing data fusion approaches will be analyzed in Sect. 11.1.1. From this review, existing methods will be discussed, and a fusion strategy elaborated (Sect. 11.1.2). This proposed framework will then be presented in detail (Sect. 11.2), before being applied to the two above-mentioned use cases (Sects. 11.3 and 11.4, respectively).
Fusion of heterogeneous data sources has been widely investigated in the remote-sensing literature (e.g., [6–9]). Fusion can be carried out at three different levels [10]:
11.1.1.1 Early fusion – fusion at the observation level
Fusion can be achieved at the observation level, i.e., through the direct joint analysis of pixel values (with or without calibration procedures). Pan-sharpening is a well-known technique of this kind: it integrates the geometric details of a high resolution panchromatic image and the color information of a lower spatial resolution multispectral (or hyperspectral) image to produce a high spatial resolution multispectral (or hyperspectral) image. Pan-sharpening methods usually use the panchromatic image to replace the high-frequency part of the low resolution image [11]. Other fusion algorithms have been proposed to merge multispectral (or hyperspectral) and panchromatic images (but also multispectral and hyperspectral images) so as to combine complementary characteristics in terms of spatial and spectral resolutions [12–14]. A review of such methods can be found in [14]. Finally, super-resolution is another approach relying on early fusion of several sensors [15].
11.1.1.2 Intermediate fusion – fusion at the attribute/feature level
Data sources can also be merged at the feature level. Features (spectral indices, texture measures, etc.) are computed for each source separately, or jointly for both, and fed into a single classifier through a unique feature set [16]. Examples of remote-sensing pipelines involving fusion at the attribute level can be found in [17–22]. For instance, [18] proposed a conditional random field (CRF) model for building detection using InSAR and orthoimage features. Reference [20] merged Lidar and optical aerial image features for forest stand extraction: the proposed approach involves several steps (segmentation, classification, and regularization), fusion is performed at each of them, and an improvement is observed at each step. More recently, deep convolutional neural networks have been used to perform data fusion at the attribute level [23]. For instance, [22] applied deep forests to Lidar and hyperspectral imagery features. A detailed review can be found in [24]. Several related datasets and challenges have also been released in the last decades, under the aegis of the IEEE GRSS society [25–27].
11.1.1.3 Late fusion – fusion at the decision level
Late decision fusion happens after the classification process: the outputs of multiple independent classifiers are combined in order to provide a more reliable decision. Such classification results can be either label maps or class membership probability maps. Various late decision fusion methods have been proposed. Most of them fall into four categories: consensus rules (e.g., majority voting), probabilistic approaches (e.g., Bayesian fusion), credibilist or evidential approaches, and possibilist ones. The probabilistic, evidential, and credibilist decision fusion approaches are generic and can be applied to different fusion problems. They only require class membership measures (probabilities or belief masses, depending on the approach) for each source and for each class, or at least a confidence measure for each source. Possibilist methods use fuzzy logic-based fusion rules [28–30]. They require one to define weights [31] in order to better deal with the uncertainty of the different sources. Such generic approaches have been applied to remote-sensing data [32]. Evidential approaches are a generalization of probabilistic ones. They include the well-known Dempster–Shafer fusion rule [33]. In remote sensing, this rule has often been used to merge classifications or alarm detection results. For instance, [34–36] applied the Dempster–Shafer rule to combine several supervised building detectors based on different remote-sensing modalities (optical, Lidar, radar). This rule was also used by [37] for the fusion of different road obstacle detectors in the context of intelligent vehicle development. Reference [38] used the Dempster–Shafer rule for the fusion of urban footprints detected at different dates from satellite archival images. References [39,40] applied this rule to detecting changes in an unsupervised classification context.
Another evidential fusion rule is the Yager rule, applied by [41] to combine different road obstacle detectors for vehicle navigation. Further evidential rules have been proposed more recently, such as the Dezert–Smarandache rule [42]. The rules introduced in [43,44] extend the Dempster–Shafer rule toward a better management of the conflict between sources. However efficient in some cases, Dempster–Shafer remains a theoretically complex framework that does not apply easily to heterogeneous and multiple data. Another important issue consists in defining the input of the different fusion rules. Two situations can be distinguished, depending on whether a global confidence measure is assigned to each source or, conversely, per-pixel class posterior probabilities are directly available for each source (provided by the initial classifier). In the former situation, the global confidence measures assigned to each source are calculated from confusion matrices (indeed, a confusion matrix provides the probability that an object labeled as class A by one source belongs in fact to class B). Validation data are thus necessary to calculate these weights. Furthermore, for evidential methods, and especially Dempster–Shafer ones, uncertainty classes (i.e., unions of original classes) must be defined, and the belief masses associated to these classes must be computed. If a global confidence is associated to a source, solutions such as Appriou's method [45] or, more recently, the method presented in [46,47] have been proposed. If a class membership measure is available for each pixel, each source, and each class, it is also possible to derive belief masses for these uncertainty (union) classes. Finally, [38] proposes an alternative way to integrate these two kinds of information.
Last but not least, the last category of late fusion approaches consists in supervised learning: the best way to merge the input sources is learned automatically from training examples. Per-source posterior class probabilities are concatenated and considered as a feature vector, which is then provided as input to a classifier trained to perform the best possible fusion. Such methods can thus be considered to lie at the interplay between late and intermediate fusion (the classifiers previously applied to each source act as a kind of feature generator, a setting also referred to as auto-context classification [48]). Such approaches have been used with different classifiers: random forests [49,50], Adaboost [50], and support vector machines [51,52]. These supervised learning-based methods yield good results but require a sufficient amount of training data to model the classes and avoid over-fitting (especially for deep learning). In addition, most fusion methods mentioned in this section generate measures assessing the conflict between two sources. Late approaches have often been applied to remote-sensing fusion problems [53–56,32,51,57–61]. Fusion methods operating at the decision level can be applied in two situations, depending on whether they merge multiple classifiers applied to the same data source or to multiple sources. For instance, [53] combined neural network and statistical maximum likelihood classifiers using several consensus theory rules (i.e., majority voting, complete agreement) to classify multispectral and hyperspectral images. References [57,58] merged posterior probabilities from maximum likelihood classifications of optical images with prior class information derived from, respectively, digital terrain models or digital surface models, as well as information from existing land cover databases.
Reference [55] investigated the fusion of multitemporal Thematic Mapper images using decision fusion-based methods (i.e., joint likelihood and weighted majority fusion). A characterization of the spatial organization of SAR image elements was investigated by [54], merging the responses of multiple low-level detectors applied to the same image within a Dempster–Shafer scheme. Reference [32] investigated the use of fuzzy decision rules to combine the classification results of a conjugate gradient neural network and a fuzzy classifier over an IKONOS image. Reference [61] combined convolutional neural networks and random forest classifiers using a multiplicative Bayesian scheme.
Fusion can be performed at different levels. Fusion at the observation level, e.g., pan or multisharpening, is limited to specific situations where it has a real physical meaning (e.g., hyperspectral and multispectral images acquired simultaneously). It is not generic enough.
Fusion at the feature level or at the decision level is more generic and thus applicable to the present fusion problem. Two main issues remain: (i) for both, the spatial scale of analysis and, subsequently, the interpolation process; (ii) for feature-based approaches, the ability to correctly handle the various data sources in the decision process. In the case of imbalanced feature sets, supervised techniques such as random forests or support vector machines, even with feature selection strategies, may favor the data source generating the larger number of attributes. Thus, the process will not fully benefit from the advantages of all datasets.
As a consequence, a late fusion strategy is adopted in this chapter. Indeed, contrary to intermediate-level fusion methods, it makes it possible to first process each input data source independently with specific, optimal methods. Moreover, it even enables one to use already existing results from available operational land cover classification services, as long as they provide class membership confidence measures. Besides, especially in this last situation, it can be used without any ground truth (training) information, contrary to intermediate-level fusion methods.
Most existing decision fusion methods do not explicitly take into account the fact that input data sources have different spatial resolutions, and thus do not explicitly deal with both semantic and spatial uncertainties. Spatial uncertainty handling here consists in removing classification noise, and enforcing that the classification result follows as closely as possible the natural borders in the original images. Such a task can be cast in the form of a smoothing problem. Local smoothing methods exist: majority voting, Gaussian and bilateral filtering [62], as well as probabilistic relaxation [63] are possible. The majority vote can be used in particular when a segmentation of the area is available: the major class is assigned to the segment. The vote can also be weighted by class probabilities of the different pixels. The probabilistic relaxation is another local smoothing method that aims at homogenizing probabilities of a pixel according to its neighbors. It is an iterative algorithm in which the class probability at each pixel is updated at each iteration in order to have it closer to the probabilities of its neighbors.
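To make the local smoothing idea concrete, the sketch below runs one iteration of a probabilistic-relaxation-style update on a small grid of per-pixel class probability vectors. The averaging update, the `alpha` mixing weight, and the function name are illustrative assumptions, not the exact scheme of [63]:

```python
# Illustrative sketch: one iteration of probabilistic relaxation on a grid of
# per-pixel class probability vectors (4-neighborhood). The simple averaging
# update and the names are assumptions, not the exact scheme of [63].

def relax_once(probs, alpha=0.5):
    """Pull each pixel's class probabilities toward the mean of its neighbors."""
    h, w = len(probs), len(probs[0])
    k = len(probs[0][0])
    out = [[None] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            nbrs = [probs[i + di][j + dj]
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= i + di < h and 0 <= j + dj < w]
            mean = [sum(p[c] for p in nbrs) / len(nbrs) for c in range(k)]
            mixed = [(1 - alpha) * probs[i][j][c] + alpha * mean[c]
                     for c in range(k)]
            s = sum(mixed)  # renormalize to keep a probability vector
            out[i][j] = [m / s for m in mixed]
    return out

# A noisy pixel surrounded by confident neighbors gets smoothed:
grid = [[[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]],
        [[0.9, 0.1], [0.2, 0.8], [0.9, 0.1]],
        [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]]
smoothed = relax_once(grid)
```

After the update, the central pixel's probabilities are pulled toward those of its neighborhood, so its label flips to the locally dominant class; iterating the procedure amplifies this homogenization.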
However, these local smoothing methods are generally outperformed by global regularization strategies [64,20]. Global regularization methods consider the whole image by connecting each pixel to its neighbors. They traditionally adopt Markov random fields (MRFs): the labels at different locations are not considered to be independent and the global solution can be retrieved with the simple knowledge of the close neighborhood for each pixel. The optimal configuration of labels is retrieved when finding the maximum a posteriori over the entire field [65,64]. The problem is therefore considered as a minimization procedure of a global energy over the whole image. Despite a simple neighborhood encoding (pairwise relations are often preferred), the optimization procedure propagates over large distances. Global regularization is often considered as a post-processing step within a classification process. It has been associated to late fusion in recent works, as for instance in [66,67].
As a consequence, so as to benefit from the complementarity of a very high spatial resolution sensor and another one exhibiting a lower spatial resolution but enhanced complementary characteristics, the proposed fusion framework involves (i) fusion at the decision level, (ii) associated with a global regularization. It mostly relies on existing state-of-the-art methods, but combines them in order to cope with both semantic and spatial uncertainties. Besides, it is flexible enough to integrate several fusion rules and be applied to various use cases.
A late fusion framework is proposed in order to benefit both from low spatial resolution data (but enhanced spectrally or temporally) and from very high spatial resolution monodate multispectral data. It aims at dealing with both semantic and spatial uncertainties. It consists of three main steps, presented in Fig. 11.1.
The first tested fusion approach is based on fuzzy rules [30]. Fuzzy set theory states that a fuzzy set A in a reference set of classes $\Omega$ is characterized by its membership function $\mu_A : \Omega \rightarrow [0,1]$, where $\mu_A(c)$ gives the membership degree of class c in A.
In order to account for the fact that fuzzy sets with a strong fuzziness possibly hold unreliable information, each fuzzy set i (among the n sources) is weighted according to a pointwise confidence measure derived from its class membership probabilities.
In the subsequent experiments, all fuzzy rules were tested using as input the probabilities weighted by this pointwise confidence measure.
The following fusion rules based on fuzzy logic were considered:
A straightforward approach is to sum or multiply the input class membership probabilities, as a Bayesian sum (akin to a majority vote) or a Bayesian product [70]:

$P_{\mathrm{sum}}(c \mid x) = \frac{1}{n} \sum_{s=1}^{n} P_s(c \mid x), \qquad P_{\mathrm{prod}}(c \mid x) \propto \prod_{s=1}^{n} P_s(c \mid x).$
Those rules will be referred to as Bayesian sum and Bayesian product, respectively.
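These two rules can be sketched in a few lines, operating on per-pixel class probability vectors (the function names and the two-source example are illustrative):

```python
# Minimal sketch of the Bayesian sum and product fusion rules for n sources,
# operating on per-pixel class probability vectors (names are illustrative).

def bayes_sum(prob_vectors):
    """Average the class probabilities of all sources."""
    n = len(prob_vectors)
    k = len(prob_vectors[0])
    return [sum(p[c] for p in prob_vectors) / n for c in range(k)]

def bayes_product(prob_vectors):
    """Multiply per-class probabilities across sources, then renormalize."""
    k = len(prob_vectors[0])
    prod = [1.0] * k
    for p in prob_vectors:
        for c in range(k):
            prod[c] *= p[c]
    s = sum(prod)
    return [v / s for v in prod] if s > 0 else [1.0 / k] * k

# Two sources disagreeing on a 3-class pixel:
p_ms = [0.6, 0.3, 0.1]   # e.g., a VHR multispectral source
p_hs = [0.2, 0.7, 0.1]   # e.g., a hyperspectral source
fused_sum = bayes_sum([p_ms, p_hs])
fused_prod = bayes_product([p_ms, p_hs])
```

Note that the product rewards classes that are jointly plausible for all sources, while the sum behaves more like an averaged vote.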
The aim of these rules is to take into account the confidence of each source, measured by the classification margin. The margin is defined for each pixel x and each source s as the difference between the two highest class probabilities:

$M_s(x) = P_s(c_{(1)} \mid x) - P_s(c_{(2)} \mid x),$

with $c_{(1)}$ and $c_{(2)}$ the classes receiving the first and second highest probabilities from source s.
Fusion can then be carried out by preferring, for each pixel, the most confident source, i.e., the one with the highest margin. This fusion rule (referred to as margin-Max) selects, for each pixel x, the source for which the margin between the two highest probabilities is the largest:

$P_{\mathrm{fused}}(c \mid x) = P_{s^{*}}(c \mid x), \quad \text{where } s^{*} = \arg\max_{s} M_s(x).$
The classifier confidence information provided by the margin can also be used to weight the class probabilities of each source in the Bayesian sum and product (referred to as margin-weighted Bayesian sum and margin-weighted Bayesian product, respectively):

$P_{\mathrm{sum}}(c \mid x) \propto \sum_{s} M_s(x)\, P_s(c \mid x), \qquad P_{\mathrm{prod}}(c \mid x) \propto \prod_{s} P_s(c \mid x)^{M_s(x)}.$
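The margin-based rules can be sketched as follows; the margin-weighted sum shown here is one assumed form of margin weighting, and all names are illustrative:

```python
# Sketch of margin-based fusion (illustrative names): the margin of a source
# at a pixel is the gap between its two highest class probabilities; the
# margin-Max rule keeps the probabilities of the most confident source.

def margin(p):
    """Difference between the two highest class probabilities."""
    top_two = sorted(p, reverse=True)[:2]
    return top_two[0] - top_two[1]

def margin_max(prob_vectors):
    """Select, for the pixel, the source with the largest margin."""
    return max(prob_vectors, key=margin)

def margin_weighted_sum(prob_vectors):
    """Bayesian sum with each source weighted by its margin (assumed form)."""
    k = len(prob_vectors[0])
    total = sum(margin(p) for p in prob_vectors) or 1.0
    return [sum(margin(p) * p[c] for p in prob_vectors) / total
            for c in range(k)]

p_ms = [0.5, 0.4, 0.1]   # hesitant source, margin 0.1
p_hs = [0.1, 0.8, 0.1]   # confident source, margin 0.7
chosen = margin_max([p_ms, p_hs])
blended = margin_weighted_sum([p_ms, p_hs])
```

The hesitant source contributes little to the blended probabilities, while the margin-Max rule discards it entirely for this pixel.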
According to the Dempster–Shafer (DS) formalism, the information from a source s is given as a mass function $m_s$ defined over the set of simple classes and their unions. Masses associated to each simple class are directly the class membership probabilities: $m_s(\{c\}) = P_s(c \mid x)$. For mixed classes (i.e., unions of simple classes), a mass accounting for the uncertainty of the source is derived from these probabilities. The fusion rule is based on the following conflict measure between two sources A and B:

$K = \sum_{b \cap c = \emptyset} m_A(b)\, m_B(c).$

The fusion is performed by the normalized orthogonal sum

$m(c) = \frac{1}{1 - K} \sum_{b \cap b' = c} m_A(b)\, m_B(b').$
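For intuition, the sketch below combines two sources whose masses sit on singleton classes only (mixed/union classes are omitted for brevity, and the names are illustrative):

```python
# Sketch of Dempster's rule for two sources whose masses are placed on
# singleton classes only (union classes omitted for brevity).

def ds_combine(m_a, m_b):
    """Return (combined masses, conflict K) via the normalized orthogonal sum."""
    k = len(m_a)
    agree = [m_a[c] * m_b[c] for c in range(k)]
    conflict = 1.0 - sum(agree)   # mass assigned to incompatible class pairs
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    return [a / (1.0 - conflict) for a in agree], conflict

m_ms = [0.6, 0.3, 0.1]
m_hs = [0.2, 0.7, 0.1]
fused, K = ds_combine(m_ms, m_hs)
```

In this degenerate singleton-only setting Dempster's rule reduces to a renormalized Bayesian product; the union classes of the full formalism are precisely what allows it to represent source uncertainty beyond that.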
In addition to the previous standard fusion rules, learning-based supervised methods were also tested [51,52,23]. Such methods consist in learning how to best merge both sources (based on a ground truth). A classifier is trained to label feature vectors corresponding to the concatenation of class membership measures from both sources. Thus, such a strategy can be considered at the interplay between late and intermediate fusion (classifiers applied independently to each source can then be considered as a kind of feature generators). It is similar to auto-context approaches. A drawback stems from the fact that they require a significant amount of reference data.
In the following experiments, two classifiers were considered for supervised fusion: random forests (RFs) [71] and support vector machines (SVMs) [72] with a linear or a radial basis function (RBF) kernel.
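The supervised fusion principle can be sketched with a deliberately tiny stand-in model: per-source class probabilities are concatenated into a feature vector and a classifier is trained on them. A two-class logistic model replaces the RF/SVM of the text here purely for self-containedness; all names and the toy data are illustrative:

```python
import math

# Sketch of supervised (learned) fusion: per-source class probabilities are
# concatenated into a feature vector on which a classifier is trained. A tiny
# logistic model stands in for the RF/SVM classifiers mentioned in the text.

def train_fusion(features, labels, lr=0.5, epochs=200):
    """Stochastic gradient descent for a two-class logistic fusion model."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            g = 1.0 / (1.0 + math.exp(-z)) - y      # prediction error
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Fused posterior probability for the positive class."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

# Toy training set: [P_ms(urban), P_hs(urban)] per pixel, label = urban or not.
X = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3], [0.7, 0.6], [0.3, 0.2]]
y = [1, 1, 0, 0, 1, 0]
w, b = train_fusion(X, y)
```

The learned weights encode how much each source's opinion should count, which is exactly the quantity the hand-crafted rules above fix a priori.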
After fusion rules have been applied at the pixel level, a spatial regularization of the obtained classification map is performed. This regularization here aims at dealing with spatial uncertainties between both sources. Indeed, the fusion result still contains noisy patches, especially in transition areas between neighboring classes. Besides, considering original image information, such a regularization also enables one to preserve real-world contours more accurately.
A global regularization strategy is adopted [66]. The problem is expressed using an energy-based graphical model and solved as a min-cut problem. Indeed, such a formulation has been successfully used for many image processing purposes in recent years [73].
The problem is formulated in terms of an energy E that has to be minimized over the whole image I in order to retrieve a labeling C of the entire image corresponding to a minimum of E. As commonly adopted in the literature, E consists of two terms, one related to data fidelity and one to prior spatial knowledge, setting constraints on class transitions between the pairs of neighboring pixels $\mathcal{N}$. Several options were considered for the different energy terms. We have

$E(C) = \sum_{x \in I} E_{\mathrm{data}}(c_x) + \lambda \sum_{(x, y) \in \mathcal{N}} E_{\mathrm{pairwise}}(c_x, c_y),$

where $c_x$ denotes the label of pixel x and $\lambda$ balances the fit-to-data and regularization terms.
The final label map corresponds to the configuration C which minimizes E over I.
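The energy of a candidate labeling can be computed explicitly; the sketch below uses a 4-neighborhood and a `1 - P` data cost (one assumed option among those discussed below), with illustrative names:

```python
# Sketch of the global energy: a fit-to-data term derived from the merged
# class probabilities plus a pairwise penalty on label transitions between
# 4-neighbors, balanced by lam. The 1 - P data cost is one assumed option.

def energy(labels, probs, lam=1.0):
    h, w = len(labels), len(labels[0])
    data = sum(1.0 - probs[i][j][labels[i][j]]      # unlikely label => high cost
               for i in range(h) for j in range(w))
    pair = sum(1 for i in range(h) for j in range(w)
               for di, dj in ((1, 0), (0, 1))       # each edge counted once
               if i + di < h and j + dj < w
               and labels[i][j] != labels[i + di][j + dj])
    return data + lam * pair

# A noisy labeling pays for its isolated pixel through the pairwise term:
probs = [[[0.9, 0.1], [0.9, 0.1]], [[0.9, 0.1], [0.2, 0.8]]]
noisy = [[0, 0], [0, 1]]
smooth = [[0, 0], [0, 0]]
```

With `lam=1.0` the smooth labeling has the lower energy despite a worse data fit at one pixel; with `lam=0.0` the order reverses, which is the trade-off $\lambda$ controls.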
The data term is a fit-to-data attachment term. It relies on the merged class membership probabilities $P(c \mid x)$ obtained at the decision fusion step.
Earlier experiments [20] verified that Option 2 tends to smooth the classification map more than Option 1. Thus, the data term will be selected among these options depending on the targeted application and on the input data. Option 1 will be selected to keep small regions as long as they are relevant according to class probabilities, while Option 2 will be used to obtain smoother maps with wider flat areas [20].
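Since the exact Option 1 / Option 2 formulas are not reproduced in this excerpt, the sketch below contrasts two standard fit-to-data costs (a bounded linear one and an unbounded negative-log one) to illustrate how such a choice changes the smoothing behavior; mapping them to the chapter's options is an assumption:

```python
import math

# Two commonly used fit-to-data costs for assigning class c with merged
# probability p. Linking them to the chapter's Option 1 / Option 2 is an
# assumption: the exact formulas are not reproduced in this excerpt.

def data_linear(p):
    """Linear cost, bounded in [0, 1]."""
    return 1.0 - p

def data_neglog(p, eps=1e-6):
    """Negative log-likelihood cost, unbounded for unlikely labels."""
    return -math.log(max(p, eps))

cheap = data_linear(0.05)    # bounded even for a very unlikely label
steep = data_neglog(0.05)    # grows without bound as p -> 0
```

A bounded cost lets the pairwise term overrule even a confident pixel fairly easily (smoother maps), whereas the unbounded cost lets small but strongly supported regions resist regularization.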
The regularization term $E_{\mathrm{pairwise}}$ penalizes class transitions between neighboring pixels. It is contrast-sensitive: the penalty is modulated by the local contrast of the original images, so that the resulting label borders are encouraged to follow real-world contours.
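A contrast-sensitive pairwise penalty can be sketched as follows; the exponential decay form and the `gamma` parameter are illustrative assumptions, not the chapter's exact formulation:

```python
import math

# Sketch of a contrast-sensitive pairwise penalty (the exponential form and
# gamma are assumptions): a label change between neighbors costs less across
# a strong image gradient, so classification borders can snap to contours.

def pairwise(label_p, label_q, intensity_p, intensity_q, gamma=0.05):
    """Zero cost for equal labels; otherwise decay with local contrast."""
    if label_p == label_q:
        return 0.0
    return math.exp(-gamma * abs(intensity_p - intensity_q))

flat_cost = pairwise(0, 1, 100, 102)   # label change inside a flat area
edge_cost = pairwise(0, 1, 100, 200)   # label change on an image edge
```

Because a transition on a real image edge is cheap while one inside a homogeneous area is expensive, the minimization pushes class borders onto the contours of the input images.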
Once the energy E has been defined, it is minimized in order to obtain a labeling configuration C (i.e., a classification map) of the entire image. This model can be expressed as a graphical model and solved as a min-cut problem [73,75].
The graph-cut algorithm employed here is the quadratic pseudo-Boolean optimization (QPBO)1 [75,76]. QPBO is a classical graph-cut method that builds a graph where each pixel is a node; the minimization is computed by finding the minimal cut. Contrary to several standard graph-cut methods, for which the pairwise term must be submodular, QPBO can also cope with nonsubmodular energies (possibly leaving some nodes unlabeled).
QPBO performs binary classification. Extension to the multiclass problem is performed using an α-expansion routine [73]. Each label α is visited in turn and a binary labeling is solved between that label and all others, thus flipping the labels of some pixels to α. These expansion steps are iterated until convergence and at the end the algorithm returns a labeling C of the entire image which corresponds to a minimum of the energy E.
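The α-expansion control flow described above can be sketched as follows; note that the binary expansion move is solved here by a naive per-pixel acceptance test rather than a true min-cut/QPBO solver, so only the iteration structure (visiting each label α in turn until the labeling stabilizes) is faithful:

```python
# Sketch of the alpha-expansion control flow. The binary expansion move is
# solved here by a naive per-pixel acceptance test instead of a true min-cut
# (QPBO); only the iteration structure is faithful to the text.

def alpha_expansion(labels, probs, lam=1.0, sweeps=3):
    h, w = len(labels), len(labels[0])
    n_classes = len(probs[0][0])

    def local_cost(i, j, c):
        """Unary (1 - P) cost plus pairwise disagreement with 4-neighbors."""
        cost = 1.0 - probs[i][j][c]
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            if 0 <= i + di < h and 0 <= j + dj < w:
                cost += lam * (c != labels[i + di][j + dj])
        return cost

    for _ in range(sweeps):
        for alpha in range(n_classes):     # visit each label alpha in turn
            for i in range(h):
                for j in range(w):         # flip pixel to alpha if it lowers E
                    if local_cost(i, j, alpha) < local_cost(i, j, labels[i][j]):
                        labels[i][j] = alpha
    return labels

# An isolated, weakly supported label is absorbed by its neighborhood:
probs = [[[0.9, 0.1]] * 3 for _ in range(3)]
probs[1][1] = [0.4, 0.6]
result = alpha_expansion([[0, 0, 0], [0, 1, 0], [0, 0, 0]], probs)
```

A production implementation would replace the inner pixel loop with one global binary graph cut per α, which is what guarantees a large move toward the optimum at each expansion step.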
The regularization part of the energy E (Eq. (11.21)) is controlled by up to four parameters, depending on the formulation retained for the pairwise term.
A simple Potts model can be obtained using the following parameterization: $E_{\mathrm{pairwise}}(c_x, c_y) = \mathbb{1}_{[c_x \neq c_y]}$, i.e., a constant penalty for any pair of distinct neighboring labels, independent of the image contrast.
A greedy strategy for parameter optimization was presented in [66]: parameter values are selected one after another, the others being kept fixed.
Such a strategy can be performed by quantitative cross-validation when sufficient reference data are available. Otherwise, it can be performed empirically, by qualitative (visual) evaluation of the results: the set of parameters yielding the smoothest possible result while still following the real object contours can thus be identified. This solution is relevant when regularization also targets improving the visual quality and the interpretability of classification results in operational contexts.
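The quantitative variant of this selection can be sketched as a search over candidate regularization weights scored against held-out reference labels; the scoring function and names below are illustrative, and the greedy per-parameter sweep of [66] would repeat this for each parameter in turn:

```python
# Sketch of selecting the regularization weight against held-out reference
# labels (illustrative scoring; a greedy procedure would tune each of the
# regularization parameters in turn this way).

def select_lambda(candidates, regularize, reference):
    """Return the candidate whose regularized map best matches the reference."""
    def accuracy(pred):
        hits = sum(p == r for row_p, row_r in zip(pred, reference)
                   for p, r in zip(row_p, row_r))
        return hits / sum(len(row) for row in reference)
    return max(candidates, key=lambda lam: accuracy(regularize(lam)))

# Toy check: pretend stronger smoothing moves a noisy map toward reference.
reference = [[0, 0], [0, 0]]
maps = {0.5: [[0, 1], [1, 0]], 1.0: [[0, 0], [1, 0]], 2.0: [[0, 0], [0, 0]]}
best = select_lambda([0.5, 1.0, 2.0], maps.get, reference)
```

Passing `maps.get` as the `regularize` callable simply simulates rerunning the regularization with each candidate weight.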
In practice, a set of parameters defined for a classification problem and a decision fusion rule is stable enough to be used in other, similar situations.
This first use case concerns the joint use of hyperspectral and very high resolution (VHR) multispectral imagery for fine urban land cover classification. Indeed, several applications require fine-grained knowledge of urban land cover, and especially urban material maps [77,78]. As no geodatabase contains such information, remote-sensing techniques are required.
Mapping urban environments requires VHR optical images. Indeed, such a spatial resolution is necessary to individualize and precisely delineate urban objects and to consider sharper geometrical details (e.g., [79,80]). However, VHR sensors generally have a poor spectral configuration (usually four bands, blue–green–red–near infrared), limiting their ability to discriminate fine classes [81–84] compared to superspectral or hyperspectral (HS) sensors. Unfortunately, the latter generally exhibit a lower spatial resolution. To overcome the weaknesses of both sensors, HS and VHR multispectral (MS) images can be jointly used to benefit from their complementary characteristics and subsequently efficiently separate the classes of interest. Thus, the fusion of such sensors should enhance the classification performance at the highest spatial resolution.
It may be recalled here that early fusion (at the observation level), i.e., image sharpening [14], could be applied within this context. However, late fusion is more generic and remains valid even for images not acquired simultaneously and processed by specific land cover labeling approaches.
As mentioned earlier, the method is based on three main steps:
Experiments were performed over three datasets captured over the cities of Pavia (Italy) and Toulouse (France; see Fig. 11.2). For all datasets, an SVM classifier was trained using 50 samples per class extracted from the images.
Concerning the city of Pavia (Italy), two datasets called “Pavia University” and “Pavia Center” were used. They are free datasets, widely used by the hyperspectral community and available online.2 Initially captured by a ROSIS hyperspectral sensor, these datasets have, respectively, 103 and 102 spectral bands from 430 to 860 nm. Pavia University is a 335 × 605 pixel image, Pavia Center is a 715 × 1096 pixel image, and both have a GSD of 1.3 m. Both scenes are composed of nine land cover classes (Fig. 11.2): Asphalt, Meadows, Gravel, Trees, Painted Metal Sheets, Bare Soil, Bitumen, Self-Blocking Bricks, and Shadows for Pavia University; and Water, Trees, Meadows, Self-Blocking Bricks, Bare Soil, Asphalt, Bitumen roofing, Tiles roofing, and Shadows for Pavia Center. MS images were generated for a Pleiades satellite spectral configuration (limited to three bands, red–green–blue), with a GSD of 1.3 m, while HS images were resampled at a lower spatial resolution of 7.8 m over the full original spectral range (i.e., 103 and 102 bands), so that their pan-sharpening ratio would be the same as for the Toulouse dataset.
The third dataset is called “Toulouse Center” (France). It was captured over the city of Toulouse in 2012 by HySpex sensors [87]. It has 405 spectral bands ranging from 400 to 2500 nm, and an initial GSD of 1.6 m. Its associated land cover is composed of 15 classes (Fig. 11.2): Slate roofing, Asphalt, Cement, Water, Pavements, Bare soil, Gravel roofing, Metal roofing 1, Metal roofing 2, Tiles roofing, Grass, Trees, Railway tracks, Rubber roofing, and Shadows. MS and HS images were created for the fusion purpose: an MS image using the Pleiades satellite spectral configuration (four bands, red (R)–green (G)–blue (B)–near infrared (NIR)), with a GSD of 1.6 m, and a HS image which is a resampled version of the original image at a spatial resolution of 8 m [88].
The MS image is characterized by a high spatial resolution and few bands, while the HS one has a low spatial resolution and hundreds of bands. As expected, the SVM classifier applied to these images led to complementary results.
The corresponding classification accuracies are listed in Table 11.2: better results are retrieved using the HS image.
Ten different decision fusion rules were first tested and compared over the three datasets. The quantitative results provided in Table 11.1 lead us to consider the compromise, Bayesian product, margin-Max, and Dempster–Shafer rules as the most efficient. The comparison must also take into account a visual inspection of the results, as ground truth data remain very limited on these datasets. For Pavia University, the four best accuracies were reached for the Min, compromise, Bayesian product, and Dempster–Shafer rules. In practice, the Min/compromise rules give the most satisfactory rendering, especially regarding the Self-Blocking Bricks class, which is a conflicting class (see Fig. 11.3, magenta class). The two other rules seem to overestimate this class and give more weight to the HS classification map in the fusion process, which explains their higher accuracy (Table 11.1). The Min rule acts in a cautious way by retaining the lowest memberships, while the compromise rule acts depending on the degree of conflict between sources. The Bayesian product rule is a good and simple trade-off if the initial classification maps are not highly conflicting; otherwise, the result will be degraded by wrong information.
Table 11.1
Classification accuracies (in %) after the fusion procedure; 10 decision-level fusion rules are compared. (OA = Overall Accuracy; F-score = mean F-score.)
Rule | Pavia University | | | Pavia Center | | | Toulouse Center | | |
---|---|---|---|---|---|---|---|---|---|
 | OA | Kappa | F-score | OA | Kappa | F-score | OA | Kappa | F-score |
Max | 92.8 | 90.7 | 90.6 | 98.5 | 97.8 | 96.0 | 75.6 | 62.4 | 69.8 |
Min | 96.1 | 94.9 | 95.1 | 98.6 | 98.0 | 96.3 | 72.2 | 58.7 | 65.8 |
Compromise | 96.1 | 95.0 | 95.0 | 98.8 | 98.3 | 96.7 | 73.6 | 60.2 | 68.0 |
Prior1 | 94.7 | 93.1 | 93.4 | 98.2 | 97.5 | 95.3 | 71.3 | 57.7 | 65.5 |
Prior2 | 92.8 | 90.7 | 90.6 | 98.5 | 97.8 | 96.0 | 75.6 | 62.4 | 69.8 |
AD | 95.0 | 93.5 | 93.5 | 99.0 | 98.7 | 97.7 | 75.8 | 58.1 | 28.3 |
Sum Bayes | 95.0 | 93.5 | 93.2 | 98.7 | 98.1 | 96.5 | 75.7 | 62.7 | 70.5 |
Prod Bayes | 96.6 | 95.5 | 95.6 | 99.0 | 98.6 | 97.2 | 74.5 | 61.4 | 69.8 |
Margin-Max | 94.0 | 92.2 | 92.0 | 98.8 | 98.3 | 96.6 | 75.6 | 62.5 | 69.6 |
Dempster–Shafer V1 | 96.4 | 95.4 | 95.3 | 98.9 | 98.5 | 97.1 | 74.6 | 61.5 | 69.8 |
Concerning Pavia Center, all rules appear accurate (Fig. 11.4, e.g., with the Dempster–Shafer rule), with an overall accuracy higher than 98% (Table 11.1). When visually inspecting the results, all rules gave similarly good results except Prior 1, whose result is guided by the HS classification map rather than the MS one.
The Toulouse dataset is the largest one, with up to 15 classes, which explains the lower accuracies reached for it. The best results were given by the Max, Prior 2, Bayesian sum, and Dempster–Shafer rules. In practice, the Max, prior, and sum rules seem to overestimate certain classes, especially tile roofing and vegetation. The best qualitative results are given by the Min, compromise, and Dempster–Shafer rules. Despite a satisfactory accuracy, the AD rule exhibits many misclassifications regarding tile roofing (underestimation) and metal roofing 1 (overestimation), as well as an erroneous detection of gravel roofing. This is mainly due to the global accuracy measure included in the rule, which is calculated from limited ground truth data.
However, due to the very limited amount of reference data, the quantitative accuracies do not necessarily reflect the real potential of the fusion rules. The best rules, from both quantitative and qualitative points of view, are the compromise, Bayesian product, and Dempster–Shafer rules.
In this study, both the VHR-MS images and the lower-resolution HS ones were generated from original VHR HS images. Working on such synthetic datasets leads to somewhat optimistic results, but it is sufficient to assess the different fusion rules. Besides, the fusion method is flexible enough to integrate, for instance, a specific process dealing with shadows in a diachronic acquisition context.
Global regularization was applied to enhance the classification results and eliminate artifacts. Table 11.2 presents the optimization results for the best fusion rules per dataset. Quantitatively, the optimization only slightly improves the decision fusion classification (by 1–2%), but it offers a better visual rendering: artifacts are eliminated, class borders are better delineated, and scattered pixels are regularized (Figs. 11.3, 11.4 and 11.5). The optimized maps better model the real scene. The optimization effect is most visible over Pavia University and Toulouse Center. Concerning Pavia Center, the decision fusion already gives good results, so the optimized maps are only slightly improved (Table 11.2). Results obtained over the Pavia datasets are comparable to other studies (e.g., [17]). For Pavia University, the painted metal sheets are better recovered, with no noticeable mismatches with the surrounding road. The proposed method extracts some bitumen buildings that were difficult to differentiate from roads (upper right and lower right, Fig. 11.5), even if the gravel buildings could still be better refined. For Pavia Center, the global rendering is enhanced, with fewer classification artifacts.
Table 11.2
Classification accuracy of the HS and MS images separately, after decision fusion, and after global regularization. For each dataset, results are given for the fusion rule achieving the best final result after global regularization.
Metric | HS classification | MS classification | Decision fusion | After regularization |
---|---|---|---|---|
Pavia University (Min rule) | ||||
OA (%) | 94.7 | 68.8 | 96.1 | 97.0 |
Kappa (%) | 93.1 | 61.6 | 94.9 | 96.1 |
F-score (%) | 93.4 | 72.8 | 95.1 | 96.3 |
Pavia Center (Dempster–Shafer V1 rule) | ||||
OA (%) | 98.2 | 92.0 | 98.9 | 99.3 |
Kappa (%) | 97.5 | 89.0 | 98.5 | 99.0 |
F-score (%) | 95.3 | 83.5 | 97.1 | 98.0 |
Toulouse Center (Compromise rule) | ||||
OA (%) | 71.2 | 69.2 | 73.5 | 74.6 |
Kappa (%) | 57.6 | 53.8 | 60.2 | 61.5 |
F-score (%) | 65.4 | 55.9 | 68.0 | 70.9 |
Several decision fusion methods were tested and compared. Among the fuzzy rules, the Min and compromise rules are the most efficient. The Max rule often leads to misclassifications because it puts more confidence in the highest membership. The prioritized rules favor one source over the other; their reliability is not ensured, as noticed for Prior 1, which gives confidence to the less reliable source. The accuracy of the AD rule depends too strongly on the reliability of the ground truth: it gives encouraging results on the Pavia datasets, but its accuracy is not sufficient on the Toulouse dataset. The Bayesian sum and product rules can be interesting when the conflict between sources is low, since they give acceptable results over Pavia Center and Toulouse. The proposed margin-based rule performs well over Pavia Center and adequately over Toulouse, but is not sufficient over Pavia University. Finally, the Dempster–Shafer rule performs homogeneously over the three datasets, consistently leading to good results.
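For concreteness, several of the per-pixel rules compared above can be sketched as follows. This is a minimal illustration on two membership vectors; the `compromise` variant shown here (Min when the two sources agree on the top class, Max otherwise) is a simplified stand-in, not necessarily the exact rule of Sect. 11.2.1.

```python
import numpy as np

def fuse_pixel(p1, p2, rule="min"):
    """Merge two per-pixel class membership vectors (one per source).

    p1, p2: 1-D arrays of length n_classes (e.g., posterior probabilities).
    Returns the fused membership vector (not necessarily normalized).
    """
    if rule == "min":          # fuzzy Min: consensus, pessimistic
        return np.minimum(p1, p2)
    if rule == "max":          # fuzzy Max: optimistic, trusts peaks
        return np.maximum(p1, p2)
    if rule == "sum":          # Bayesian sum (average of the sources)
        return 0.5 * (p1 + p2)
    if rule == "product":      # Bayesian product
        return p1 * p2
    if rule == "compromise":   # simplified: Min if sources agree, Max otherwise
        agree = np.argmax(p1) == np.argmax(p2)
        return np.minimum(p1, p2) if agree else np.maximum(p1, p2)
    raise ValueError(rule)

# Example: two sources disagreeing on a 3-class pixel
p_hs = np.array([0.7, 0.2, 0.1])   # hyperspectral source
p_ms = np.array([0.3, 0.6, 0.1])   # multispectral source
label = int(np.argmax(fuse_pixel(p_hs, p_ms, "product")))
```

Applying the chosen rule at every pixel yields the merged membership maps used as the fit-to-data term of the subsequent regularization.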
Even if decision fusion increases the classification accuracy compared to the initial maps, the results remain affected by classification artifacts and unclear borders. The final map is guided by one of the initial maps or by both: it is, therefore, a better version of the initial maps. The optimization procedure gives encouraging results, with clear borders between the different classes and elimination of artifacts.
The method can also integrate other decision rules in a fully tunable way. The optimization model is simple and flexible, and could be modified according to the dataset and the spatial resolution of the data sources. Further work will investigate the explicit use of conflict measures from the fusion step within the regularization framework. At the moment, the selection of the optimization parameters is rather manual; some automation could be included, and other contrast measures could be tested to improve the accuracy.
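The regularization itself combines a fit-to-data term (the negative log of the merged memberships) with a contrast-sensitive Potts pairwise term. The toy implementation below uses Iterated Conditional Modes as an easily readable stand-in for the graph-based solvers usually applied to such energies; the parameter names (`lam`, `beta`) and the exponential contrast weight are illustrative choices, not the chapter's exact model.

```python
import numpy as np

def regularize_icm(prob, image, lam=1.0, beta=10.0, n_iter=5):
    """Contrast-sensitive Potts regularization of a label map, minimized
    with Iterated Conditional Modes (ICM).

    prob  : (H, W, C) merged class membership maps (fit-to-data term).
    image : (H, W, B) image guiding the contrast-sensitive term.
    """
    H, W, C = prob.shape
    unary = -np.log(np.clip(prob, 1e-10, 1.0))        # fit-to-data cost
    labels = unary.argmin(axis=2)                      # initial label map
    for _ in range(n_iter):
        for i in range(H):
            for j in range(W):
                cost = unary[i, j].copy()
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W:
                        d = image[i, j] - image[ni, nj]
                        # Penalize label changes less across image contrasts
                        w = np.exp(-beta * float(d @ d))
                        cost += lam * w * (np.arange(C) != labels[ni, nj])
                labels[i, j] = int(cost.argmin())
    return labels
```

Increasing `lam` smooths the map more aggressively, while `beta` controls how strongly image contrasts (e.g., object borders) relax the smoothing.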
This second use case focuses on the detection of urban areas from SPOT 6/7 and Sentinel-2 satellite imagery. Mapping urban areas is important to monitor urban sprawl and soil imperviousness, and to predict their further evolution [89,90]. Remote sensing is highly relevant for such regular and continuous monitoring over time. Supervised classification approaches using satellite imagery have been extensively studied in order to automate the process of land cover (LC) classification [91–93,38], but they often rely on a single sensor.
Urban and peri-urban areas are complex and heterogeneous landscapes containing impervious areas, trees, grass, bare ground, and water [94,95]. “Artificialized areas” can be defined as irreversibly impervious areas, including buildings and roads, but also small enclosed pervious structures such as gardens, backyards, and green public spaces [96]. There is no unique clear definition of the urban area or footprint. It generally corresponds to a simplification of the artificialized area: road networks outside of built areas are then excluded.
This study aims at detecting such areas automatically from multisource remote-sensing data, following the real-world city boundary contour as closely as possible. Isolated built-up areas should also be retrieved.
The remote-sensing paradigm has drastically changed in recent years with the advent of new sensors exhibiting enhanced spectral, spatial, swath, or revisit characteristics, making it possible to acquire datasets at country scale in a limited time. SPOT 6/7 and Sentinel-2 are examples of these new sensors, and they are used in this use case. On one hand, they are freely available over the whole French territory thanks to the Théia initiative3 and GEOSUD Equipex.4 On the other hand, they exhibit complementary characteristics:
Recent years have witnessed the advent of deep learning methods, and especially Convolutional Neural Networks (CNNs) [97–99]. Such approaches have shown their superiority over standard classification processes. Thanks to their end-to-end design, CNNs directly learn (spectral and textural) features (convolution filters) optimized for each classification problem, as well as the best way to use them (i.e., the classification model). Besides, these implicit features directly take the context into account and thus perform a multiscale analysis of the image. In return, CNNs require a huge amount of training data.
New studies on urban footprint detection have been initiated by the advent of Sentinel data. Sentinel-2 optical images exhibit excellent spectral and temporal characteristics and are well tailored for land cover production. In [91], Sentinel-2 time series are directly classified by a Random Forest for the yearly extraction of 20-class land cover maps. The method presented in [38] can also be applied to such time series, classifying each date independently before merging the results by a Dempster–Shafer process. Since Sentinel-2 offers a 10 m GSD for some bands, several studies have tried to use both its radiometric and texture information to detect urban areas [92,100]. Here, however, it was decided to focus on the specificities of Sentinel-2 (enhanced spectral characteristics and time series) and not to exploit its texture information, which is poorer than that of SPOT 6/7.
To summarize, deep learning approaches are optimal to analyze SPOT 6/7 images: their spatial resolution makes it possible to detect urban elements (e.g., buildings) and to use them to derive urban areas. For Sentinel-2 data, it is more interesting to focus on their specificities (enhanced spectral characteristics and time series). The fusion of these sources thus combines their advantages and reduces both spatial and semantic uncertainties. The late fusion scheme proposed in Sect. 11.2 is adopted, considering again that the original data have been classified beforehand and independently by specific methods. Besides, it enables the integration of existing land cover maps, such as Théia's maps based on [91].
The proposed workflow (Fig. 11.6) consists of three steps.
Both sources are classified individually. The Sentinel-2 time series is labeled using a random forest (RF) classifier trained with 50,000 samples per class. RF is used to keep a framework similar to that of [91], whose LC maps are intended to be available at national scale. The SPOT 6/7 image is classified using a deep Convolutional Neural Network (CNN) [98], because of its ability to efficiently exploit context and texture information from VHR images. The CNN was trained with 10,000 samples per class (of which 10% were kept for cross-validation). Both classifiers produce membership probabilities for the five classes.
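The Sentinel-2 branch can be sketched with scikit-learn as below. The feature layout (the 10 bands stacked over 6 dates, 60 values per pixel) and the random data are purely illustrative; what matters is that `predict_proba` yields the per-class membership maps consumed by the fusion step.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-pixel features: 10 Sentinel-2 bands stacked over
# 6 dates (60 values per pixel). Real features would come from the imagery.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 60))
y_train = np.tile(np.arange(5), 100)          # 5 land cover classes

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Per-class membership probabilities for 4 pixels, as fed to the fusion.
proba = rf.predict_proba(rng.normal(size=(4, 60)))
```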
Both fusions (i.e., for the 5-class nomenclature and the urban/non-urban one) are performed according to Sect. 11.2, involving a per-pixel decision fusion followed by a spatial regularization.
The “urban/non-urban” fusion requires one to derive binary class probabilities from the results of the previous steps. Buildings from the 5-class fusion result are considered as seeds of urban areas and are used to define a prior probability of being in an urban area (see Fig. 11.7): a linearly decreasing probability is assigned around all buildings, starting from 1 at a building and reaching 0 at a distance of 100 m from the nearest building.
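This linearly decreasing prior can be sketched with a Euclidean distance transform; the helper name and the `pixel_size` parameter are illustrative, while the 100 m decay follows the description above.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def urban_prior(building_mask, pixel_size=1.5, max_dist=100.0):
    """Prior probability of being in an urban area from a building mask.

    The prior is 1 on buildings and decreases linearly to 0 at
    `max_dist` meters from the nearest building.
    building_mask: (H, W) boolean array (True on detected buildings).
    """
    # Distance (in meters) from each pixel to the nearest building pixel.
    dist = distance_transform_edt(~building_mask) * pixel_size
    return np.clip(1.0 - dist / max_dist, 0.0, 1.0)
```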
Class posterior probabilities from the Sentinel-2 image RF 5-class classification are converted to a binary classification with an urban area class (u), and a non-urban area class (¬u):
Then, the prior probability map derived from the building detection is merged with the binary class probabilities from the Sentinel-2 RF classifier.
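For the Min fuzzy rule, which is the one eventually retained for this binary step, the merge reduces to a per-pixel minimum over both classes; the helper below is a minimal sketch with illustrative names.

```python
import numpy as np

def merge_urban(prior_u, p_u_s2):
    """Min-rule fusion of the building-derived urban prior with the
    Sentinel-2 binary urban probability (both (H, W) arrays in [0, 1]).
    Returns fused memberships for the urban and non-urban classes."""
    fused_u = np.minimum(prior_u, p_u_s2)
    fused_nu = np.minimum(1.0 - prior_u, 1.0 - p_u_s2)
    return fused_u, fused_nu
```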
Sentinel-2 offers both a spectral configuration richer than those of usual multispectral sensors and the ability to acquire time series (5-day revisit). In the following experiments, only the 10 spectral bands with a 10 or 20 m GSD are used, all upsampled to 10 m GSD. Six dates (among them August 15th 2016, January 25th 2017, March 16th 2017, April 12th 2017 and May 25th 2017) are kept, retained both for their low cloud cover and to cover different seasons/appearances of the land cover classes.
SPOT 6/7 includes four spectral bands (red–green–blue–near infrared) pan-sharpened to 1.5 m. A single date (April 16th) is used.
A ground truth of five classes is generated (Fig. 11.8) from available national reference geodatabases and is used for both training and evaluation (the number of pixels used as training samples is very small compared to the total number of samples). Buildings, roads, and water areas are extracted from IGN's BD Topo®5 topographic database, forests from IGN's BD Forêt®6 database, and crops from the Référentiel Parcellaire Graphique7 of the French Ministry of Agriculture.
Experiments are performed over a test area spanning 648 km2 in Finistère, North Western France. This study area contains urban, peri-urban, rural, and natural landscapes.
For the sake of readability, the results are shown over a restricted area of 0.64 km2 (Figs. 11.9, 11.10 and 11.11), where the original classifications exhibit several errors: the impact of data fusion can thus be clearly demonstrated. Quantitative evaluation is performed over five tiles of 3000 × 3000 m (45 km2 in total), distributed over the entire 648 km2 study zone. Each classification is compared to the class labels of the ground truth (Fig. 11.8). Evaluation measures for the individual classifications, fusion, and regularization, all using five classes, are given in Table 11.3.
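The reported scores can be computed per map as sketched below; the helper assumes macro-averaged per-class F-scores for the mean F-score, and the `ignore` label for pixels without reference data is an illustrative convention.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score

def evaluate(pred, truth, ignore=-1):
    """Accuracy scores as reported in the tables: overall accuracy,
    Cohen's kappa, and the mean (macro) per-class F-score. Pixels
    labeled `ignore` in the ground truth are excluded."""
    mask = truth != ignore
    p, t = pred[mask], truth[mask]
    return {
        "OA": accuracy_score(t, p),
        "Kappa": cohen_kappa_score(t, p),
        "Fm": f1_score(t, p, average="macro"),
    }
```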
Table 11.3
Accuracy scores (in %) for the first step. OA = overall accuracy, AA = average accuracy, Fm = Mean F-score, Fb = F-score for buildings. Fusion rules are described in Sect. 11.2.1.
Method | Kappa | OA | AA | Fm | Fb |
---|---|---|---|---|---|
Original classifications | |||||
Sentinel-2 | 72.0 | 83.7 | 81.5 | 64.6 | 52.2 |
SPOT 6 | 73.3 | 85.2 | 70.8 | 63.4 | 62.5 |
Per pixel fusion (before regularization) | |||||
Fuzzy Min | 79.1 | 88.4 | 84.7 | 76.7 | 73.8 |
Fuzzy Max | 77.3 | 87.4 | 84.2 | 73.2 | 70.5 |
Compromise | 78.8 | 88.2 | 84.5 | 76.2 | 72.5 |
Prior 1 | 73.3 | 85.2 | 70.8 | 63.4 | 62.5 |
Prior 2 | 73.3 | 85.2 | 70.8 | 63.4 | 62.5 |
Bayesian Sum | 78.3 | 87.9 | 84.8 | 74.8 | 71.9 |
Bayesian Product | 79.1 | 88.5 | 85.0 | 76.7 | 73.8 |
Dempster–Shafer V1 | 79.1 | 88.4 | 85.1 | 76.5 | 73.7 |
Dempster–Shafer V2 | 79.0 | 88.4 | 85.1 | 76.4 | 73.7 |
Margin Maximum | 77.7 | 87.6 | 84.5 | 73.4 | 70.2 |
Margin Bayesian Product | 78.4 | 88.0 | 84.7 | 75.1 | 72.0 |
Margin Bayesian Sum | 78.0 | 87.7 | 84.6 | 74.0 | 71.0 |
RF | 81.8 | 90.0 | 90.1 | 81.2 | 81.6 |
SVM linear | 80.5 | 89.3 | 88.6 | 77.9 | 80.4 |
SVM rbf | 81.0 | 89.6 | 89.1 | 79.1 | 83.4 |
Fusion and regularization | |||||
Fuzzy Min | 75.1 | 85.8 | 82.3 | 73.9 | 73.8 |
SVM rbf | 81.4 | 89.8 | 89.3 | 79.8 | 83.9 |
Initial classifications. The original classifications confirm the initial observation that the SPOT 6/7 CNN result tends to preserve small objects, although some confusions between (bare soil) crops and built-up areas occur. The Sentinel-2 RF classification behaves better overall, but it mixes buildings and roads due to its coarse spatial resolution. The results of the individual classifications on the SPOT 6/7 and Sentinel-2 images are shown in Fig. 11.9. This area was selected to illustrate the problems of both classifications and the improvements obtained with fusion and regularization (better results are observed in all other areas). Confusion between water and built areas can be noticed: the water area database used to generate training samples included ponds at the bottom of quarries, which appear white and very similar to built-up areas.
Per pixel fusion. This first fusion is performed at the SPOT 6/7 resolution (1.5 m), as it aims at the finest possible detection of building objects.
Among the classic fusion rules proposed by [69], the Min fuzzy rule produces the best results, following the objects' borders most precisely while producing the least class confusions (Fig. 11.10 and Table 11.3). Considering Fig. 11.10, all rules managed to eliminate the wrongly classified building patch (top-left), preferring the Sentinel-2 classification over the SPOT 6/7 one. The Min fusion rule follows the field contours a bit less smoothly than the other ones. The industrial area (at the center of the displayed area) is still confused with water in both results (such a confusion can be explained by the presence of very white water area training samples).
The RF/SVM supervised fusions initially tend to produce patches of buildings rather than separate buildings, due to missing training data (and thus constraints) between buildings and roads (Fig. 11.8): the ground truth contains gaps between buildings, and with no training data available around buildings, the classifiers tend to aggregate individual buildings. Adding a sixth, artificial “around buildings” buffer class helps to refine the contours and to obtain a higher level of detail than the Min or Bayes rules alone. It preserves more details of individual buildings, but can cause some confusion between buildings and this buffer class in the center of very wide buildings in industrial areas (see Fig. 11.10). It can also erase small patches of buildings. However, this remains exceptional and is not a real problem for the final goal of urban area detection. The same observations are made on all areas, even those on which the supervised fusion model is not trained. Thus, the result of the supervised fusion using an RBF SVM classifier with the buffer class is used in the subsequent steps.
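The buffer class can be built morphologically, as sketched below; the ring width (`radius`, in pixels) is an illustrative parameter, since the chapter does not specify the buffer size.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def add_buffer_class(labels, building_id, buffer_id, radius=5):
    """Add an artificial "around buildings" class to training labels:
    a ring of `radius` pixels around every building is relabeled so the
    supervised fusion classifier learns to separate adjacent buildings.
    `buffer_id` is assumed to be an unused class index."""
    buildings = labels == building_id
    ring = binary_dilation(buildings, iterations=radius) & ~buildings
    out = labels.copy()
    out[ring] = buffer_id
    return out
```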
Regularization. The parameters are as follows:
Raw urban area maps directly derived from the binary class probabilities are shown in Fig. 11.12. Again, they are first merged using a per-pixel fusion rule. The supervised learning-based fusion approaches cannot be used here, since no training data for urban/non-urban areas is available. Several rules yield visually similar results; the Min fuzzy fusion rule is eventually chosen.
Global regularization is then performed with the same parameters.
As no true reference data for urban areas is available, strict evaluation is not possible. However, the detected artificialized area can be compared to binary ground truth maps derived from the following other related databases:
Strict quantitative evaluation is not possible, but such data can provide some hints: accuracies are provided in Table 11.4. Besides, a visual comparison with the different sources is shown in Fig. 11.13.
Table 11.4
Accuracy measures: F-score for the urban class (F-scoreu), Kappa, overall accuracy (OA), and intersection over union for the urban class (IoUu).
Classification | Ground Truth | F-Scoreu [%] | Kappa [%] | OA [%] | IoUu [%] |
---|---|---|---|---|---|
Binary dilated buildings | BD Topo® | 86.7 | 83.2 | 94.5 | 76.5 |
 | OSO | 56.8 | 50.1 | 86.8 | 39.7 |
 | OSM | 58.3 | 51.9 | 87.3 | 41.2 |
Binary Sentinel-2 | BD Topo® | 65.2 | 56.4 | 85.9 | 48.3 |
 | OSO | 63.9 | 58.3 | 89.3 | 46.9 |
 | OSM | 52.6 | 45.5 | 86.0 | 35.7 |
Fusion Min | BD Topo® | 79.8 | 75.2 | 92.4 | 66.3 |
 | OSO | 66.9 | 62.2 | 91.2 | 50.3 |
 | OSM | 62.9 | 57.6 | 90.2 | 45.9 |
Regularization | BD Topo® | 79.7 | 75.1 | 92.5 | 66.2 |
 | OSO | 67.4 | 62.8 | 91.5 | 50.9 |
 | OSM | 64.4 | 59.4 | 90.7 | 47.4 |
The following aspects can be underlined:
The dilated BD Topo® ground truth generally yields the highest agreement with the classifications in terms of accuracy measures. The fusion and regularization steps improve the accuracy measures over the individual input classifications, with the exception of the binary dilated-buildings input, whose very high agreement with the BD Topo® ground truth can be explained by the fact that both were produced by dilation.
A framework was proposed to detect urban areas. Sentinel-2 and SPOT 6/7 data were classified individually into five topographic classes. Decision-level fusion and regularization then yield a result preserving high geometric detail while reducing misclassifications. The results in this study were presented for one dataset, but the processing chain was also applied to another region (the Gironde department in South Western France) exhibiting a different (climatic and topographic) landscape. It led to similar conclusions, showing a high generalization potential despite varying behaviors of the initial classifiers.
Traditional fusion methods enable artifact removal but can keep confusion between buildings and roads. In contrast, supervised fusion methods enable an enhanced detection of buildings at the price of a new artificial “building buffer” class, which introduces some additional semantic uncertainty. However, the new class confusions mostly occur between the vegetation and forest classes, which is not a problem here. Buildings could thus be extracted with a higher amount of semantic detail.
Second, the urban area can be approximated by fusing and regularizing the urban/non-urban class membership probabilities of the Sentinel-2 classification together with an urban prior measure derived from the previously detected buildings. A simple function was used to derive the probability map of urbanized areas; improvements could be made with a more advanced urban membership prior, decreasing faster around uncertain buildings.
Furthermore, the promising results of supervised fusion would justify the use of CNNs for fusion. Such a strategy also looks promising for the identification of artificialized areas, but it would have to face missing ground truth data and heterogeneous, user- and application-dependent definitions of such areas.
A fusion framework was proposed to merge very high spatial resolution, monodate, multispectral images with time series of images exhibiting an enhanced spectral configuration but a lower spatial resolution. It mostly relies on existing state-of-the-art methods, combined in order to cope with both semantic and spatial uncertainties, and is flexible enough to integrate other fusion rules. It is a late fusion strategy, permitting one to initially process each input data source independently through specific methods, and even to use already computed results from existing operational land cover classification services. Besides, in this case, it can be used without any ground truth information, contrary to intermediate-level fusion methods.
The proposed framework was applied to two different use cases. For each of them, classification results were improved. Several fusion rules were tested. Good results were reached for several of them, but the best results were obtained by the “Minimum” fuzzy rule, by Dempster–Shafer rules, and, when sufficient training data is available, by supervised learning-based methods.
At present, the proposed fusion framework has been applied to only two sources, but it could easily be extended to more sources. Besides, for the moment, it has been tested only for optical images exhibiting different characteristics, but it is generic enough to be applied to input data of different modalities. It has also been used only to merge class membership probability maps from classification results, but it would be interesting to apply it to other kinds of results. Especially for the lower spatial resolution source, it would be relevant to use class abundances obtained from an unmixing process (applied to the hyperspectral case or time series of data).