Multimodal Spatial Intelligence for Interacting in a Dynamic World

Abstract:
Artificial intelligence and machine learning are enjoying a period of tremendous progress, driven in large part by scale, compute, and learnable neural representations. However, these innovations have yet to translate to the physical world; technologies such as self-driving vehicles remain restricted to limited deployments. In this talk, I will argue that autonomy requires three-dimensional spatial understanding integrated with intuitive physical models of a changing world. To build such understanding, I will discuss a variety of models that revisit classic "analysis by synthesis" approaches to scene understanding, taking advantage of recent advances in differentiable rendering and simulation. But to enable data-driven autonomy for safety-critical applications, I will also argue that the community needs new perspectives on data curation and annotation. Toward this end, I will discuss approaches that leverage multimodal vision-language models to better characterize datasets and models.

Bio:
Deva Ramanan is a Professor in the Robotics Institute at Carnegie Mellon University and the former director of the CMU Center for Autonomous Vehicle Research. His research interests span computer vision and machine learning, with a focus on visual recognition. He was awarded the David Marr Prize in 2009, the PASCAL VOC Lifetime Achievement Prize in 2010, and the IEEE PAMI Young Researcher Award in 2012; was named one of Popular Science's Brilliant 10 researchers in 2012 and a National Academy of Sciences Kavli Fellow in 2013; won the Longuet-Higgins Prize for fundamental contributions in computer vision in both 2018 and 2024; and received best paper finalist / honorable mention recognition at CVPR 2019, ECCV 2020, and ICCV 2021. His work is supported by NSF, ONR, and DARPA, as well as through industrial collaborations with Intel, Google, and Microsoft.

He served as program chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018. He is on the editorial board of the International Journal of Computer Vision (IJCV) and is an associate editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). He regularly serves as a senior program committee member for CVPR, the International Conference on Computer Vision (ICCV), and the European Conference on Computer Vision (ECCV). He also regularly serves on NSF panels for computer vision and machine learning.