Table of Contents

Cover image

Title page

Copyright

List of Contributors

Chapter 1: Introduction to Multimodal Scene Understanding

Abstract

1.1. Introduction

1.2. Organization of the Book

References

Chapter 2: Deep Learning for Multimodal Data Fusion

Abstract

2.1. Introduction

2.2. Related Work

2.3. Basics of Multimodal Deep Learning: VAEs and GANs

2.4. Multimodal Image-to-Image Translation Networks

2.5. Multimodal Encoder–Decoder Networks

2.6. Experiments

2.7. Conclusion

References

Chapter 3: Multimodal Semantic Segmentation: Fusion of RGB and Depth Data in Convolutional Neural Networks

Abstract

3.1. Introduction

3.2. Overview

3.3. Methods

3.4. Results and Discussion

3.5. Conclusion

References

Chapter 4: Learning Convolutional Neural Networks for Object Detection with Very Little Training Data

Abstract

Acknowledgement

4.1. Introduction

4.2. Fundamentals

4.3. Related Work

4.4. Traffic Sign Detection

4.5. Localization

4.6. Clustering

4.7. Dataset

4.8. Experiments

4.9. Conclusion

References

Chapter 5: Multimodal Fusion Architectures for Pedestrian Detection

Abstract

Acknowledgement

5.1. Introduction

5.2. Related Work

5.3. Proposed Method

5.4. Experimental Results and Discussion

5.5. Conclusion

References

Chapter 6: Multispectral Person Re-Identification Using GAN for Color-to-Thermal Image Translation

Abstract

Acknowledgements

6.1. Introduction

6.2. Related Work

6.3. ThermalWorld Dataset

6.4. Method

6.5. Evaluation

6.6. Conclusion

References

Chapter 7: A Review and Quantitative Evaluation of Direct Visual–Inertial Odometry

Abstract

7.1. Introduction

7.2. Related Work

7.3. Background: Nonlinear Optimization and Lie Groups

7.4. Background: Direct Sparse Odometry

7.5. Direct Sparse Visual–Inertial Odometry

7.6. Calculating the Relative Jacobians

7.7. Results

7.8. Conclusion

References

Chapter 8: Multimodal Localization for Embedded Systems: A Survey

Abstract

8.1. Introduction

8.2. Positioning Systems and Perception Sensors

8.3. State of the Art on Localization Methods

8.4. Multimodal Localization for Embedded Systems

8.5. Application Domains

8.6. Conclusion

References

Chapter 9: Self-Supervised Learning from Web Data for Multimodal Retrieval

Abstract

Acknowledgements

9.1. Introduction

9.2. Related Work

9.3. Multimodal Text–Image Embedding

9.4. Text Embeddings

9.5. Benchmarks

9.6. Retrieval on InstaCities1M and WebVision Datasets

9.7. Retrieval in the MIRFlickr Dataset

9.8. Comparing the Image and Text Embeddings

9.9. Visualizing CNN Activation Maps

9.10. Visualizing the Learned Semantic Space with t-SNE

9.11. Conclusions

References

Chapter 10: 3D Urban Scene Reconstruction and Interpretation from Multisensor Imagery

Abstract

10.1. Introduction

10.2. Pose Estimation for Wide-Baseline Image Sets

10.3. Dense 3D Reconstruction

10.4. Scene Classification

10.5. Scene and Building Decomposition

10.6. Building Modeling

10.7. Conclusion and Future Work

References

Chapter 11: Decision Fusion of Remote-Sensing Data for Land Cover Classification

Abstract

11.1. Introduction

11.2. Proposed Framework

11.3. Use Case #1: Hyperspectral and Very High Resolution Multispectral Imagery for Urban Material Discrimination

11.4. Use Case #2: Urban Footprint Detection

11.5. Final Outlook and Perspectives

References

Chapter 12: Cross-Modal Learning by Hallucinating Missing Modalities in RGB-D Vision

Abstract

12.1. Introduction

12.2. Related Work

12.3. Generalized Distillation with Multiple Stream Networks

12.4. Experiments

12.5. Conclusions and Future Work

References

Index
