Index

A

Accuracy, 
classification, 361, 364, 397
detection, 80, 104, 125
localization, 4, 68, 87, 236, 253
positioning, 237, 251
Adversarial, 
training, 17
Adversarial variational Bayes (AVB), 13, 17
Adversarially learned inference (ALI), 13, 19
Aerial image scenes, 62
Alarm detection, 344
Amazon Mechanical Turk (AMT), 151
Ambient light sensors, 249
Anchor point relation (APR), 225
Annotated datasets, 280
Application programming interface (API), 246
Architecture, 11, 20, 22, 25, 26, 29, 105, 106, 109, 110, 112, 117, 121, 209, 214–216, 220, 238, 241, 244, 249, 387, 388, 391, 392
baseline, 388
CNNs, 15
FuseNet, 57
fusion, 103, 109, 125, 214, 215
model, 27
multimodal, 31
network, 43–45, 48, 49, 81, 386
optimal, 109
performance, 109
TRPN, 112
Artificial intelligence (AI), 240
Artificial neural network (ANN), 71, 247
Automated drone, 257
Automated localization, 239
Automatic guided vehicles (AGV), 251
Automatic multimodal coding systems, 3
Autonomous driving (AD), 1, 10, 102, 160, 202, 238, 244, 252, 260, 261, 384
Autonomous driving vehicles, 202, 207, 248, 252
Autonomous drone, 192
Autonomous mobile robots (AMR), 251, 262

B

Baseline, 
architecture, 388
SegNet architecture, 60
TRPN, 122
Bayesian fusion, 344
Benchmark datasets, 11, 49
BicycleGAN, 151
BigLittle architecture, 244
Binary class probabilities, 368, 373
Binary classification, 356, 367, 368
Binary classification fusion, 368
Bonnland dataset, 321, 331
Bounding box (BB), 4, 43, 80, 81, 85–87, 92, 93, 95, 96, 98, 102, 103, 112, 113, 115, 117, 118
Broadcast GNSS signal, 219
Building information modeling (BIM), 264
Buildings, 
enhanced detection, 376
key geometric features, 326
Bundle adjustment (BA), 224

C

Camera, 
coordinate system, 210
localization, 81
motion, 160, 162, 173
networks, 254
parameters, 160, 263
sensor, 213
smart, 210
standard, 210
system, 245
tracking, 223
trajectory, 235
Candidate architecture, 242
Cellular networks, 216
Central fusion unit, 216
Centralized architecture, 203, 204, 207–209, 212–215, 219–224, 227–229, 232–235, 240–247, 249, 250
Centralized fusion, 214
CEVA deep neural network (CDNN), 242
Cityscapes dataset, 32, 34, 154
Classes, 
borders, 363
classification, 368
land cover, 359, 369
probabilities, 81, 82, 84, 85, 347, 351, 352, 354, 368
target, 32
Classification, 69, 77, 80–82, 91, 92, 95, 107, 225, 238, 241, 247, 253, 317, 321, 324, 344, 346–349, 354, 356, 358, 361, 363, 364, 367, 369, 370, 372, 375, 376, 387, 391
accuracy, 361, 364, 397
dataset, 6, 281
fusion, 3
land cover, 6, 346, 377
layer, 84, 113
map, 354, 355
margin, 351
methods, 387
models, 107, 365
performance, 357
problem, 357, 365
process, 344, 347
scene, 11, 308, 317, 323, 331, 337
standard, 365
supervised, 365
target, 108, 112
tasks, 67
unsupervised, 345
Classification scores (CS), 112, 115
Classifier, 4, 46, 47, 81, 82, 84, 85, 317, 322, 344–346, 351, 353, 367, 372
CNNs, 80
confidence, 352
head, 48
network, 46
trained, 321
CMOS camera, 210, 263
CNNs, 11, 43–45, 47, 67, 81, 98, 106, 222, 253, 365
architecture, 15
classifier, 80
deep, 42
training, 285
training process, 283
Coarse tracking, 167, 168, 170, 171, 181, 182
Color, 
cameras, 152, 264
image, 4, 75, 136, 138, 139, 144, 146, 147, 149, 151, 152, 154
segmentation images, 146
space fusion, 54
Computation architectures, 202
Computer aided design (CAD), 251
Computer vision (CV), 1–3, 137, 201, 221, 247, 384
Concat fusion, 123
Concat layer, 116
Concatenation, 12, 13, 29, 104, 105, 111, 119, 120, 130, 353, 367
Concatenation fusion, 105, 109, 111
Concatenation fusion scheme, 4, 104
Conditional adversarial networks, 146, 147
Conditional random field (CRF), 344
Conduct multitask learning, 117
Confusion, 372, 376
matrix, 55, 61, 345
Connected components (CC), 323
Consistency loss, 139
Constrain, 10
Constructive Solid Geometry (CSG), 330
Convolutional, 
layers, 30, 45, 46, 48, 55, 62, 69, 73, 75–77, 81, 83, 84, 110–113, 116–118, 147, 388
networks, 44
neural networks, 3–6, 10, 42, 66–68, 70, 75, 79–81, 84, 92, 222, 253, 332, 346, 365
Convolutional accelerators (CA), 246
Convolutional neural network (CNN), 6, 10, 67, 106, 138, 222, 253, 365, 367
Credibilist decision fusion, 344
Crime scene, 256
Cross correlation function (CCF), 225
Cross entropy loss, 285
Cumulative distribution functions (CDF), 315
Cumulative matching characteristic (CMC), 151
Cumulative weight brightness transfer function (CWBTF), 137
Curb detection, 252
Cyber physical systems (CPS), 262
CycleGAN, 21, 22, 145

D

Data fusion, 44, 60, 212, 214, 215, 227, 229, 230, 233, 234, 252, 253, 259, 260, 344, 370
algorithms, 227, 230
detection, 260
filter, 203
multimodal, 11
system, 227
Data modalities, 385
Dataset, 
annotation, 142
augmentation, 154
classification, 6, 281
EuRoC, 231
GTSRB, 81
KAIST, 118
multimodal, 231, 386
pedestrian, 107
statistics, 294
target, 298
traffic sign, 68
Decentralized architecture, 203, 204, 207–209, 212–215, 219–224, 227–229, 232–235, 240–247, 249, 250
Decoder, 10, 13, 14, 19, 24–27, 36, 44, 47
depth, 29
features, 47
functions, 27
networks, 10, 13, 27
Deconvolutional layer, 147
Deep, 
autoencoder architecture, 387
CNNs, 42
convolutional neural networks, 138, 246, 344
convolutional neural networks architecture, 106
learning, 2, 42, 98, 222, 224, 236–238, 241, 245, 253, 266, 280, 281
algorithms, 280
approaches, 222, 384, 387
CNN framework, 3
methods, 49
models, 47, 51
techniques, 280
Deep learning accelerator (DLA), 245
Deep neural network (DNN), 12, 19, 104, 236, 246, 280, 283, 337
Depth, 
branches, 48
channel, 397
cues, 384
data, 49, 57, 58, 62, 384, 392, 398
data stream, 386, 387
decoder, 29
estimation, 11, 29–32, 87
methods, 31
task, 32
feature, 58, 388
feature maps, 391, 392
filtering, 51, 53, 54, 57, 59
fusion, 42, 44, 47, 50, 53
fusion approaches, 57
fusion networks, 42
image, 12, 36, 42, 43, 47, 48, 50, 51, 53, 56, 60, 61, 397
information, 44, 48, 53, 58, 384
map, 26–29, 34, 36, 211, 236, 313, 314
measure, 3
modality, 397
model, 393
network, 387, 393, 395
perception, 383, 384
reconstruction, 10
relative, 384
representation, 48
scene, 47
sensors, 210, 248, 384, 397
slice, 76
stream, 391, 399
stream network, 393, 395
values, 29, 48, 51
variables, 171
Detection, 109, 118, 125, 128, 216, 259, 260, 263, 332, 333, 337, 343, 344, 364, 366, 372, 374
accuracy, 80, 104, 125
algorithm, 244
data fusion, 260
loss term, 115, 119, 123
multimodal pedestrian, 4, 103, 104, 108, 109, 112, 116, 125, 130
pedestrian, 4, 101–103, 105–110, 121, 123, 124, 138, 280
performances, 119, 123, 125
results, 103, 108, 119, 120, 123, 124
target, 105, 112
traffic sign, 81
Differential GNSS, 207, 219, 220
Digital elevation model (DEM), 235, 258
Digital signal processors (DSP), 242
Digital surface models (DSM), 49, 50
Direct memory access (DMA), 244
Direct sparse odometry (DSO), 160, 165
Discriminative features, 4
Discriminator network, 19, 147
Distillation loss, 386, 396
Drone, 163, 187, 189, 190, 200, 202, 212, 242, 247, 251, 256–258
devices, 253
Dropout layer, 69, 73, 75–77, 83, 84
Dynamic vision sensor (DVS), 210

E

eBee drone, 258
Electronic control units (ECU), 261
Encoder, 10, 13, 14, 18, 19, 24–27, 36, 44, 47, 48
Encoder network, 13, 139, 147
Ensemble classifier, 58
Erroneous detection, 363
ETRIMS dataset, 333
Euclidean loss, 387, 391
EuRoC dataset, 5, 160, 163, 186, 189, 191, 193, 196, 255
Evidential fusion rule, 345
Expectation maximization (EM), 309
Extended Kalman filter (EKF), 213, 228

F

Feature, 
fusion, 110, 111, 119, 130
layers, 119
learning, 80, 81, 92, 98, 103, 110
maps, 4, 6, 27, 76, 95, 104, 105, 110, 111, 113, 115–117, 119–121, 123, 130, 387, 390, 394, 396
modalities, 12
Floating diffusion, 208
Floating Point Unit (FPU), 244
Functional safety features, 244
FuseNet, 48, 55, 57, 58, 60, 62
architecture, 57
NRG, 60–62
RGB, 57, 58, 62
Fusion, 372
algorithms, 217, 343
approaches, 49, 62, 343, 349
architecture, 103, 109, 125, 214, 215
classifications, 3, 216
depth, 42, 44, 47, 50, 53
feature, 110, 111, 119, 130
filter, 227, 260
framework, 376
functions, 111, 119
layer, 112
methods, 346, 347, 363, 377
model classification result, 356
multimodal, 2, 4, 103, 105, 110
multimodal feature, 119
pose, 224
problem, 346
process, 3, 358, 362
purpose, 360
result, 350, 353, 367, 368
rules, 6, 344, 345, 347–353, 358, 363, 372, 373, 376, 377
stages, 109
step, 364
strategy, 343
supervised, 353, 372, 376
system, 235
unit, 214
Fuzzy classifier, 346

G

Generative adversarial network (GAN), 4, 5, 13, 15, 16, 19, 22, 24, 136, 138, 145, 147, 154, 236
Generator network, 147
Geometrical features, 222, 234
Global positioning system (GPS), 205
GNSS, 5, 201–203, 205, 216, 218, 220, 226, 232, 233, 235, 237, 249, 252, 265
accuracy, 219
antennas, 206
measurements, 233
position, 207
receiver, 219
satellites, 206
services, 206
signal, 206, 216, 219, 220
spoofing attack, 233
Graphic processing units (GPU), 242
Gravity direction, 173, 174, 182, 185, 192, 193, 196
Ground sampling distance (GSD), 51
Ground truth, 20, 28, 29, 32, 34, 56, 60–62, 91, 93, 96, 137, 187, 238, 254, 266, 347, 351, 353, 361, 363, 364, 367, 369, 370, 372, 374–377
bounding box, 93
labels, 49
traffic sign, 96

H

Halfway fusion, 109, 125, 129
Hallucinated feature maps, 387, 393
Hallucination, 
learning, 387, 393
loss, 392
network, 6, 385, 386, 391–393, 395–397, 399
network learning process, 396
Handcrafted features, 387
Heterogeneous sensor, 201, 247, 248, 253, 266
Heterogeneous sensor data fusion, 212, 216
Heterogeneous sensor fusion, 239
Hidden layer, 69, 71, 73, 75–77, 83, 84
Hierarchical architectures, 203, 204, 207–209, 212–215, 219–224, 227–229, 232–235, 240–247, 249, 250
High dynamic range (HDR), 244
High performance computer (HPC), 246
HOG features, 79, 107
Holographic processing unit (HPU), 250
HS classification, 362, 363
Hyspex sensors, 360

I

Image signal processors (ISP), 242
ImageNet classes, 285, 293
ImageNet dataset, 118, 333
IMU measurements, 162, 172, 173, 232, 260
Independent classifiers, 344
Indian regional navigation satellite system (IRNSS), 206
Indoor, 
datasets, 62
environments, 216, 218, 237
images, 45
images segmentation, 45
images semantic segmentation, 62
localization, 216, 237, 259, 265
navigation, 218
pedestrian navigation, 232
positioning, 220
scene, 51, 57, 62
scene understanding, 211
semantic segmentation, 49
situations, 220
Inertial data fusion, 230
Inertial measurement unit (IMU), 2, 5, 160, 203
Inertial navigation system (INS), 201
Information fusion, 108
Infrared cameras, 136, 253
Infrared sensors, 4
Insufficient training data, 80
Integral channel features (ICF), 106
Integrated circuit (IC), 209
Intellectual property (IP), 241
ISPRS dataset, 43, 51, 53, 58, 62
ISPRS Vaihingen dataset, 58, 60
Iterative dual correspondence (IDC), 225

J

Joint multitask training, 12
Joint supervision, 106

K

KAIST, 
dataset, 108, 118
multimodal dataset, 109
testing dataset, 105, 118, 121, 123, 125
Kalman filter (KF), 227
Keyframe pose, 182
Kinematic GNSS, 207

L

Label decoders, 28
Label encoders, 28
Land cover, 49, 342, 343, 346, 357, 360, 365, 366
classes, 359, 369
classification, 6, 346, 377
fusion scheme, 343
urban, 357
Laser imaging detection, 201
Layer, 45, 46, 71–73, 75–77, 83, 84, 110, 116, 333, 335, 336, 387, 388, 390, 391
classification, 84, 113
fusion, 112
multimodal feature fusion, 110
network, 73
Layer for classification, 106
Learned features, 92, 98
Learning, 4, 13, 15, 67, 68, 75, 125, 130, 235, 238, 247, 253, 266, 281–283, 285, 385, 392, 393, 396, 399
algorithms, 68, 69
deep, 2, 42, 98, 222, 224, 236–238, 241, 245, 253, 266, 280, 281
embedding models, 293
feature, 80, 81, 92, 98, 103, 110
hallucination, 387, 393
machine, 244, 387
multimodal feature, 110
multitask, 10–12, 36, 130
package, 3
paradigm, 385, 391, 392, 399
problems, 24, 236
procedure, 385
process, 68, 69, 387, 392, 393, 396
rate, 29, 74, 119, 285, 394
representations, 6, 286, 398
rule, 73
supervised, 68, 69, 73, 75–77, 83, 84
unsupervised, 68–70, 73, 75–77, 83, 84
Least class confusions, 372
LiDAR, 2, 42, 201, 211–213, 218, 225, 226, 231, 234, 235, 237, 252, 262, 265, 342, 344, 345
measurements, 225, 234
sensors, 5, 202, 211, 212, 216, 225, 237
technology, 256
ToF, 251
Local features, 221, 222
Local networks, 262
Local steering kernel (LSK), 108
Localization, 203, 204, 207–209, 212–215, 219–224, 227–229, 232–235, 240–247, 249, 250
accuracy, 4, 68, 87, 236, 253
algorithms, 249
applications, 247, 253
approaches, 218
camera, 81
error, 96, 97, 218
indoor, 216, 237, 259, 265
maps, 96
methods, 217, 255
multimodal, 5, 202, 218, 221, 226, 227, 231, 237, 239, 248, 255, 259, 261, 265, 266
pedestrian, 202, 255, 259
performance, 96
problem, 223
process, 235, 236
purposes, 202, 222, 231
step, 217
system, 238, 239, 259, 260, 266
techniques, 238, 251–253, 255
Localizing ground penetrating radar (LGPR), 255
Long term evolution (LTE), 249
Loss, 
function, 26, 28, 74, 75, 147, 385, 386, 391, 392, 395
multitask, 103, 105, 117
hallucination, 392
localization, 391
Loss term, 113
detection, 119
for classification, 113
for detection, 112, 115, 123
for scene prediction, 123
segmentation supervision, 117

M

Machine learning, 3, 66, 67, 221, 222, 243, 246, 247, 256, 280
Machine learning performance, 243
Margin loss, 282
Markov random fields (MRF), 347
Masked depth maps, 393
Maximally stable color regions (MSCR), 148
Maximally stable extremal region (MSER), 148, 221
Maximization fusion, 111
Mean intersection over union (MIOU), 30
Megapixel, 
cameras, 258
RGB camera, 258
video camera, 250
Micro aerial vehicle (MAV), 160
MIRFlickr dataset, 295, 298, 304
Misclassifications, 363, 364, 374
Miss rate, 118, 119, 123
Mixed reality, 262
Mobile industry processor interface (MIPI), 245
Modalities, 6, 10, 12, 26, 29, 235, 237, 238, 251, 265, 384–386, 388, 390, 393, 394, 396, 398
Modalities feature, 12
Model, 
architecture, 27
depth, 393
performance, 399
scene prediction, 113
spatiotemporal features, 387
Monocular camera, 162
Monocular video camera, 233
Monomodal localization, 218
Motion features, 388
Motion sensors, 223
Multicore architecture, 246
Multilayer perceptrons, 72
Multimodal, 
aggregated feature, 108
architecture, 31
characteristics, 113
conversion, 36
data, 5, 6, 12, 281, 388
data fusion, 11
deep learning, 2, 13
deep learning techniques, 11
feature, 
fusion, 119
fusion architectures, 4
learning, 110
maps, 103, 110–113, 124
fusion, 2, 4, 103, 105, 110
fusion architectures, 105
image, 5, 19, 103, 105, 108, 113, 116, 118, 121
image generation, 139
image translation, 145
localization, 5, 202, 218, 221, 226, 227, 231, 237, 239, 248, 255, 259, 261, 265, 266
networks, 12
pedestrian, 117, 118, 120, 124
detection, 4, 103, 104, 108, 109, 112, 116, 125, 130
detectors, 104, 105, 109, 130
detectors performance, 122
detectors quantitative performance, 118
performances, 111
reconstruction, 150
retrieval, 5
scene understanding, 2, 3
segmentation, 117, 130
segmentation supervision, 116, 124
architectures, 116, 117, 130
infusion architectures, 104
joint training, 105
networks, 124
semantic segmentation, 3
sensing technology, 108
thermal image generation, 150
Multimodality, 146
Multiple, 
data modalities, 386
layers, 72
modalities, 31
multispectral datasets, 138
sensors, 90, 250
stream architecture, 388
stream networks, 388
Multipurpose applications, 230
Multiscale discriminator architecture, 21
Multispectral, 
datasets, 140, 142
pedestrian detection performance, 4
semantic segmentation, 154
sensors, 368
ThermalWorld dataset, 137, 154
Multistage pipeline, 80
Multitask, 
learning, 10–12, 36, 130
framework, 10
methods, 12
scheme, 106
setting, 21
loss function, 103, 105, 117
network, 10
training, 28

N

Navigation purposes, 205, 218
Network, 
architecture, 43–45, 48, 49, 81, 386
branches, 48, 58
classifier, 46
configurations, 390
depth, 387, 393, 395
distillation, 386
hallucination, 6, 385, 386, 391–393, 395–397, 399
in semantic segmentation, 42
input, 47
interface communication tasks, 241
layer, 73
multitask, 10
optimization, 149
parameters, 49
performance, 50, 56, 77
processing, 386
SegNet, 48
single, 80
structure, 21, 47, 48, 55
Neural networks, 4, 12, 13, 20, 42, 48, 51, 67–73, 75, 81, 82, 84, 91, 98, 242, 244, 286
architecture, 62, 238
classification, 253, 266
convolutional, 3–6, 10, 42, 66–68, 70, 75, 79–81, 84, 92, 222, 253, 332, 346, 365
Neural processing unit (NPU), 242
Next generation mobile network (NGMN), 217
Nighttime scenes, 113, 115, 121–124, 128
Nighttime segmentation prediction, 117
Noisy, 
depth, 397
depth data, 397
thermal images, 108
training data, 283
NRG based network, 61
NTU dataset, 397
NTU RGB, 386, 387, 390, 393, 394, 399

O

Object detection, 2, 4, 11, 44, 67, 79, 91, 92, 98, 106, 222, 236, 384, 400
Obstacle detection, 256
Online localization, 201, 217
Optimal, 
architecture, 109
fusion architecture, 116
segmentation fusion scheme, 125
Optimum performances, 286
Outdoor localization, 218, 232, 238
Outdoor scenes, 149

P

Paired training data, 22
Particle filter (PF), 229
Pavia datasets, 363, 364
Peak performance, 241
Pedestrian, 
dataset, 107
detection, 4, 101–103, 105–110, 121, 123, 124, 138, 280
joint training, 110
learning task, 103
methods, 108
performance, 106
localization, 202, 255, 259
multimodal, 117, 118, 120, 124
Perception sensors, 202, 207, 234, 265
Performance, 
architecture, 109
classification, 357
comparable, 388
comparison, 5, 125, 283, 294
gain, 32, 106, 123, 125, 384
improvements, 36, 294
localization, 96
measure, 152, 397
metrics, 31, 241
model, 399
multimodal pedestrian detection, 119, 124, 125
multitask, 36
network, 50, 56, 77
pedestrian detection, 106
positioning, 220
scene prediction, 121, 122
segmentation supervision, 117, 125
Photometric camera parameters, 167
Pixelwise classification, 333
Pooling layer, 47, 69, 73, 75–77, 81, 83, 84, 113
Portable cameras, 237
Pose, 203, 204, 207–209, 212–215, 219–224, 227–229, 232–235, 240–247, 249, 250
change, 226
estimate, 168, 181
estimation, 6, 182, 192, 221, 225, 246, 256, 263, 308–310, 312–314, 337
fusion, 224
information, 238
relative, 168, 176, 202, 223–225, 309, 310
Position accuracy, 204, 220
Positioning, 
accuracy, 237, 251
indoor, 220
performance, 220
sensors, 201
Posterior class probabilities, 348, 358
Potential relocalization, 263
Power networks, 256
Powerful computing architecture, 266
Pretrained SegNet, 57
Privileged information, 385, 386, 392, 399
learning theories, 385
source, 392
theory, 386
PRO cameras, 139, 140, 144
Probabilistic autoencoder, 17
Professional mapping drone, 257
Programmable vision accelerators (PVA), 244
Pseudo random code (PRC), 206

Q

Quadratic entropy (QE), 349

R

Random forest, 4, 6, 67–69, 73, 75–77, 79, 81–84, 91, 92, 95, 98, 317, 343, 345, 346, 353, 358, 365, 367
Random forest classification, 82, 92
Recognition performance, 91–93, 152
RegDB dataset, 138
Region proposal network (RPN), 112
ReID datasets, 140
ReID performance, 136, 138, 139, 151, 152, 154
Relative, 
accuracy, 258
depth, 384
features, 317
pose, 168, 176, 202, 223–225, 309, 310
pose error, 189
Relocalization, 223
Remote sensing, 1–3, 6, 44, 45, 47, 49, 62, 342, 344, 364
Remote sensing modalities, 6
ResNet layer, 390
RF classifier, 321, 368
RGB, 
feature maps, 393
FuseNet, 57, 58, 62
image, 11, 26–29, 32, 34, 42, 47, 48, 51, 53
image decoders, 28
image encoder, 28
image generation from depth, 28
SegNet, 57, 58
stream networks, 395
Robot localization, 234
Root mean square error (RMSE), 187

S

Safety features, 266
Salient features, 225
Satisfactory accuracy, 363
Scene, 
building, 313
classification, 11, 308, 317, 323, 331, 337
complexity, 317
components, 2
conditions, 115, 121, 122
decomposition, 323, 324
depth, 47
geometry, 5, 47, 236
information, 117
mapping, 202, 255, 256
prediction, 122
model, 113
networks, 113, 121
performance, 121, 122
reconstruction, 2, 234, 308
segmentation, 47
understanding, 2, 10, 201, 202, 249, 384
Scene prediction network (SPN), 113, 121
Segmentation, 11, 12, 42, 47, 116, 117, 125, 141, 147, 344, 347
images, 147
masks supervision learning, 103
multimodal, 117, 130
outputs, 117
prediction, 117
problems, 43
process, 57
scene, 47
semantic, 2, 4, 10–12, 26, 27, 29–33, 41, 42, 81, 109, 116
supervision, 103, 104, 110, 116–118, 124, 125, 130
joint training, 103, 124
loss term, 117
performance, 117, 125
thermal, 146–148
SegNet, 48, 56, 57, 60, 62
HSD, 58
network, 48
NRG, 58, 60
NRG classifier, 61
NRGD, 58
NRGD outperforms, 58
RGB, 57, 58
RGB network, 62
RGBD, 57
RGBN, 57
Semantic, 
feature, 222
feature maps, 110
labels, 10–12, 26, 28–30, 32, 34, 144
scene understanding, 42
segmentation, 2, 4, 10–12, 26, 27, 29–33, 41, 42, 81, 109, 116
Sensors, 
data, 251
data processing, 213
depth, 210, 248, 384, 397
fusion architectures, 214
multiple, 90, 250
multispectral, 368
networks, 216
positioning, 201
specifications, 204
Shape distribution histogram (SDH), 107
Siamese network, 42, 62
Siamese network structures, 48
SIFT features, 79
Simultaneous localization and mapping (SLAM), 5, 201, 243
Single, 
block, 213
channel, 148
color image, 139, 146, 147, 152
exposure, 209
frame, 243
fusion processor, 214
image, 95
image classifier, 42
input color image, 152
instruction, 209
model, 330
network, 80
patch, 91
sample, 79
sensor, 213, 232
Smart, 
camera, 210
glasses, 202, 205, 207, 248, 249, 251
phone, 66–68, 89, 90, 97, 205, 218, 220, 241–243, 248, 249, 253, 254, 259, 260
phone localization application, 254
Social networks, 241
Software Development Kit (SDK), 249
Spaceborne sensors, 342
Spoofed GNSS signal, 233
Synchronous dynamic random access memory (SDRAM), 245
Stanford dataset, 43, 50, 51, 56
Static random access memory (SRAM), 246
Stereo camera, 210, 231, 251, 262
Stereo camera sensor, 251
Stereo thermal images, 107
Stochastic gradient descent (SGD), 18, 55
Student network, 392
Supervised, 
classification, 365
deep learning, 281
fusion, 353, 372, 376
learning, 68, 69, 73, 75–77, 83, 84
learning algorithms, 87
Supervision, 4, 23, 24, 105, 109, 116, 117, 124, 130, 386
Supervision information, 106
Supervision segmentation, 103, 104, 110, 116–118, 124, 125, 130
Support Vector Machine (SVM), 353, 367
SVM classifier, 358, 360
SVM supervised fusions, 372
Synthetic datasets, 363

T

Target, 
classes, 32
classification, 108, 112
dataset, 298
detection, 105, 112
Teacher learning phase, 393
Teacher network, 386, 387, 391, 392
Temperature sensors, 203
Text embedding space, 284, 298, 302
Text embeddings, 5, 281, 283–286, 289, 293, 294, 298, 299, 302, 304
Thermal, 
cameras, 136, 138, 152, 154
images, 107, 108, 121, 122, 136, 138, 139, 142, 144, 145, 148, 149, 151–153
segmentation, 146–148
streams, 111, 120, 121, 123
Toulouse dataset, 359, 363, 364
Traffic signs, 67, 68, 81, 85–89, 91, 92, 94–98
bounding boxes, 91
dataset, 68
detection, 81
Trained, 
classifier, 321
CNN, 284
model, 285
network, 139
TRPN, 112, 113, 116, 119, 123–125, 130
architecture, 112
baseline, 122
feature maps, 125
models, 119, 123

U

Ultrasonic sensors, 253
Underwater localization purposes, 233
Unmanned aerial vehicles (UAV), 2, 202, 233
Unmanned ground vehicle (UGV), 237
Unmatched detections, 118
Unpaired RGB images, 30
Unpooling layers, 47
Unscented Kalman filter (UKF), 213, 229
Unscented transform (UT), 229
Unsupervised, 
classification, 345
feature learning methods, 80
learning, 68–70, 73, 75–77, 83, 84
Urban, 
footprint detection, 365
land cover, 357
land cover classification, 343, 357
scene classification, 317
scene reconstruction, 6, 337
scenes, 102, 222

V

Vaihingen dataset, 50
Vehicle localization, 226
Vehicle pose, 233
VGG network, 44, 45, 47
VHR sensors, 357, 365
Video classification, 390
Video graphics array (VGA), 210
Visible features, 108
Visible pedestrian detection, 105, 106, 109
Vision processing unit (VPU), 246

W

Weak features, 111, 121
WebVision, 287, 293
Weighted parallel iterative closest point (WPICP), 226
Wide video graphics array (WVGA), 231
WiFi communication networks, 216
WiFi localization, 232
Window classification, 107