Chapter 1: Introduction to Multimodal Scene Understanding
Chapter 2: Deep Learning for Multimodal Data Fusion
2.3. Basics of Multimodal Deep Learning: VAEs and GANs
2.4. Multimodal Image-to-Image Translation Networks
2.5. Multimodal Encoder–Decoder Networks
Chapter 5: Multimodal Fusion Architectures for Pedestrian Detection
5.4. Experimental Results and Discussion
Chapter 6: Multispectral Person Re-Identification Using GANs for Color-to-Thermal Image Translation
Chapter 7: A Review and Quantitative Evaluation of Direct Visual–Inertial Odometry
7.3. Background: Nonlinear Optimization and Lie Groups
7.4. Background: Direct Sparse Odometry
7.5. Direct Sparse Visual–Inertial Odometry
7.6. Calculating the Relative Jacobians
Chapter 8: Multimodal Localization for Embedded Systems: A Survey
8.2. Positioning Systems and Perception Sensors
8.3. State of the Art in Localization Methods
8.4. Multimodal Localization for Embedded Systems
Chapter 9: Self-Supervised Learning from Web Data for Multimodal Retrieval
9.3. Multimodal Text–Image Embedding
9.6. Retrieval on InstaCities1M and WebVision Datasets
9.7. Retrieval on the MIRFlickr Dataset
9.8. Comparing the Image and Text Embeddings
9.9. Visualizing CNN Activation Maps
9.10. Visualizing the Learned Semantic Space with t-SNE
Chapter 10: 3D Urban Scene Reconstruction and Interpretation from Multisensor Imagery
10.2. Pose Estimation for Wide-Baseline Image Sets
10.5. Scene and Building Decomposition
10.7. Conclusion and Future Work
Chapter 11: Decision Fusion of Remote-Sensing Data for Land Cover Classification
11.4. Use Case #2: Urban Footprint Detection
11.5. Final Outlook and Perspectives
Chapter 12: Cross-Modal Learning by Hallucinating Missing Modalities in RGB-D Vision
12.3. Generalized Distillation with Multiple Stream Networks