【AI Seminar】2025.03.25 Video Understanding and Generation with Multimodal Foundation Models - Prof. Ming-Hsuan Yang

20250325_MingHsuanYang.png

Topic: Video Understanding and Generation with Multimodal Foundation Models

Speaker: Ming-Hsuan Yang, Department of Electrical Engineering and Computer Science, University of California, Merced

Time: 2025/03/25 (Tue) 14:10-16:00

Venue: The Management Building, 11F, AI Lecture Hall

Join Online: https://reurl.cc/b3M6rd or scan the QR code on the poster

About the Speaker:  

Ming-Hsuan Yang is a Professor at the University of California, Merced, and a Research Scientist at Google DeepMind. He has received numerous prestigious awards, including the Google Faculty Award 2009, the NSF CAREER Award 2012, and the Nvidia Pioneer Research Award 2017 and 2018. He received the Best Paper Honorable Mention at UIST 2017, the Best Paper Honorable Mention at CVPR 2018, the Best Student Paper Honorable Mention at ACCV 2018, the Longuet-Higgins Prize (Test-of-Time Award) at CVPR 2023, the Best Paper Award at ICML 2024, and the Test-of-Time Award at WACV 2025. Yang is an Associate Editor-in-Chief of IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and an Associate Editor for the International Journal of Computer Vision (IJCV). Previously, he was the Editor-in-Chief of Computer Vision and Image Understanding (CVIU) and Program Co-Chair for ICCV 2019. He is a Fellow of IEEE, ACM, and AAAI.

Abstract:

Recent advances in vision and language models have significantly improved visual understanding and generation tasks. In this talk, I will present our latest research on designing effective tokenizers for transformers and our efforts to adapt frozen large language models for diverse vision tasks. These tasks include visual classification, video-text retrieval, visual captioning, visual question answering, visual grounding, video generation, stylization, outpainting, and video-to-audio conversion. If time permits, I will also discuss our recent findings on learning diffusion models and dynamic 3D vision.


Organizers: 
College of Intelligent Computing & Artificial Intelligence Research Center

No registration needed.