I am a third-year Master’s student in the CS Department at Fudan University, advised by Prof. Weifeng Ge. Previously, I received my Bachelor’s degree from the CS Department at Southeast University, where I worked with Prof. Ding Ding. My research focuses primarily on Multimodal Large Language Models and their broad applications (Visual Question Answering, Video Understanding, Embodied AI, Image/Video Generation, etc.). I’m currently working with Prof. Lifu Huang on building unified models for video understanding and generation.
I am looking for a Ph.D. position starting in Fall 2025. Feel free to reach out if you are interested :)
📝 Publications and Preprints
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models. [Preprint]
Haibo Wang, Zhiyang Xu, Yu Cheng, Shizhe Diao, Yufan Zhou, Yixin Cao, Qifan Wang, Weifeng Ge, Lifu Huang.
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering. [ACM Multimedia 2024]
Haibo Wang, Chenghang Lai, Yixuan Sun, Weifeng Ge.
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge. [ECCV 2024]
Haibo Wang, Weifeng Ge.
Pixel-level Semantic Correspondence through Layout-aware Representation Learning and Multi-scale Matching Integration. [CVPR 2024] [Paper]
Yixuan Sun*, Zhangyue Yin*, Haibo Wang, Yan Wang, Xipeng Qiu, Weifeng Ge, Wenqiang Zhang.
Object-Centric Cross-Modal Knowledge Reasoning for Future Event Prediction in Videos. [IEEE TCSVT 2024] [Paper]
Chenghang Lai, Haibo Wang, Weifeng Ge, Xiangyang Xue.
IVRSandplay: An Immersive Virtual Reality Sandplay System Coupled with Hand Motion Capture and Eye Tracking. [CSCWD 2023] [Paper]
Haibo Wang, Ding Ding, Yuhao Liu, Chi Wang.
🎖 Honors and Awards
- 2024.10, China National Scholarship (Top 1%)
👩‍💻 Academic Services
- Reviewer: ICLR 2025.