Yunhao (Andy) Ge (葛云皓)

Research Scientist @ NVIDIA

Email: yunhaog at nvidia dot com



About Me

I am a Research Scientist in NVIDIA's Deep Imagination Research group. I have broad research interests in Computer Vision and Robotics, with a recent focus on building multimodal foundation models for physical AI. My research has been integrated into several NVIDIA products, including NVIDIA Cosmos and NVIDIA Edify. I received my Ph.D. in Computer Science from the University of Southern California, advised by Prof. Laurent Itti, and was honored with the Amazon ML Fellowship. I was also a Visiting Ph.D. Student at the Stanford Vision and Learning Lab (SVL), advised by Prof. Jiajun Wu.

Previously, I was fortunate to intern or work at Google Research, Google Cloud AI, Microsoft Research, United Imaging Intelligence, and Flexiv Robotics. Before that, I received my M.Sc. from the Robotics Institute at Shanghai Jiao Tong University.



News & Updates


Research Highlights

Multimodal Foundation Model for Physical AI

Pre-training: Cosmos-Predict2 | Cosmos-Predict1
Post-training: Cosmos-Transfer1


Physical Scene Generation & Understanding

Generation: GenUSD | Edify 3D | Edify Image | BEHAVIOR Vision Suite | 3D Copy-Paste | Scenethesis | ArtiScene
Understanding: Visual Fact Checker | Describe Anything

Controllable Generation

DreamDistribution | Neural-Sim | Group-Supervised Learning | DALL-E for Detection | Pose Augmentation


Selected Publications [Google Scholar]

Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control

NVIDIA (Yunhao Ge: core contributor)

[paper] [project page] [code] [huggingface] [video]

Cosmos: World Foundation Model Platform for Physical AI

NVIDIA (Yunhao Ge: core contributor)

Best AI and Best Overall of CES 2025

[paper] [project page] [code] [huggingface] [video] [Demo API]

Describe Anything: Detailed Localized Image and Video Captioning

Long Lian, Yifan Ding, Yunhao Ge, Sifei Liu, Hanzi Mao, Boyi Li, Marco Pavone, Ming-Yu Liu, Trevor Darrell, Adam Yala, Yin Cui
ICCV 2025.

[paper] [code] [project page] [demo]

Edify 3D: Scalable High-Quality 3D Asset Generation

NVIDIA (Yunhao Ge: core contributor)

[paper] [project page] [video]

Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models

NVIDIA (Yunhao Ge: core contributor)

[paper] [project page] [video]

GenUSD: 3D scene generation made easy

Core contributor
SIGGRAPH 2024.

[paper] [project page]

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Yunhao Ge*, Yihe Tang*, Jiashu Xu*, Cem Gokmen*, Chengshu Li, Wensi Ai, Benjamin Jose Martinez, Arman Aydin, Mona Anvari, Ayush K Chakravarthy, Hong-Xing Yu, Josiah Wong, Sanjana Srivastava, Sharon Lee, Shengxin Zha, Laurent Itti, Yunzhu Li, Roberto Martín-Martín, Miao Liu, Pengchuan Zhang, Ruohan Zhang, Li Fei-Fei, Jiajun Wu
(*=equal contribution)
CVPR 2024 (IEEE Conference on Computer Vision and Pattern Recognition), Highlight.

[paper] [code] [project page] [tools]

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

Yunhao Ge, Xiaohui Zeng, Jacob Samuel Huffman, Tsung-Yi Lin, Ming-Yu Liu, Yin Cui
CVPR 2024 (IEEE Conference on Computer Vision and Pattern Recognition).

[paper] [video] [project page]

DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models

Brian Nlong Zhao, Yuhang Xiao*, Jiashu Xu*, Xinyang Jiang, Yifan Yang, Dongsheng Li, Laurent Itti, Vibhav Vineet, Yunhao Ge (*=co-second author, †=equal contribution)
ICLR 2025.

[paper] [code] [project page]

3D Copy-Paste: Physically-Plausible Object Insertion for Monocular 3D Detection

Yunhao Ge, Hong-Xing Yu, Cheng Zhao, Yuliang Guo, Xinyu Huang, Liu Ren, Laurent Itti, Jiajun Wu
NeurIPS 2023 (Advances in Neural Information Processing Systems).

[paper] [code] [project page]

DALL-E for Detection: Language-driven Compositional Image Synthesis for Object Detection
Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation

Yunhao Ge*, Jiashu Xu*, Brian Nlong Zhao, Neel Joshi, Laurent Itti, Vibhav Vineet (*=equal contribution)
arXiv:2206.09592, 2022.

[paper(Beyond Generation)] [paper(DALL-E for Detection)] [code]

Improving Zero-shot Generalization and Robustness of Multi-modal Models

Yunhao Ge*, Jie Ren*, Andrew Gallagher, Yuxiao Wang, Ming-Hsuan Yang, Hartwig Adam, Laurent Itti, Balaji Lakshminarayanan, and Jiaping Zhao (*=equal contribution)
CVPR 2023 (IEEE/CVF Conference on Computer Vision and Pattern Recognition).

[paper] [code] [project page]

Neural-Sim: Learning to Generate Training Data with NeRF

Yunhao Ge, Harkirat Behl*, Jiashu Xu*, Suriya Gunasekar, Neel Joshi, Yale Song, Xin Wang, Laurent Itti, and Vibhav Vineet (*=equal contribution as second author)
ECCV 2022 (European Conference on Computer Vision).

[paper] [code]

Zero-shot Synthesis with Group-Supervised Learning

Yunhao Ge, Sami Abu-El-Haija, Gan Xin and Laurent Itti
ICLR 2021 (International Conference on Learning Representations).

[paper] [code] [project page] [Fonts Dataset] [USC Viterbi Press] [Zhihu (知乎)] [AI Technology Review (AI科技评论)]
[USC News] [Tech Xplore] [Technology Networks]


Intern & Work Experience

Google Research, Los Angeles, USA (May 2022 - Dec. 2022)

Google Cloud AI, Mountain View, USA (Aug. 2021 - May 2022)

Microsoft Research, Redmond, USA (May 2021 - Aug. 2021)

UII America, Inc., Boston, USA (May 2020 - Aug. 2020)

Flexiv Ltd, Shanghai, China (May 2019 - Aug. 2019)

United Imaging Intelligence Co., Ltd, Shanghai, China (Jun. 2018 - Apr. 2019)


Scholarships


Academic Service

Reviewer for the following conferences and journals:

NeurIPS 2023, 2022, 2021
CVPR 2023, 2022
ECCV 2022
ICCV 2023, 2021
ICLR 2023, 2022
ICML 2022
WACV 2023
IEEE Transactions on Medical Imaging (TMI)
IEEE Access
Applied Optics


Last update: August 15, 2025