Qingpei Guo

Ant Group

About me

I am currently a staff algorithm engineer at Ant Group, where I lead a team of more than ten people working on research and development of multimodal LLMs. My main interest is designing and delivering cognitive computing services, including computer vision, vision-language alignment, and multimodal video understanding. I have published 30+ papers in top-tier conferences and journals, including CVPR/ICML/SIGIR/ECCV/IJCAI/AAAI.

I am currently looking for full-time algorithm engineers and research interns. Please contact me (qingpei.gqp@antgroup.com) with your CV if you are interested!

🔥 News

2025.05: 🎉🎉 We proposed Ming-Omni, a unified multimodal model capable of processing images, text, audio, and video, while demonstrating strong proficiency in both speech and image generation.
2025.06: 🎉🎉 Three papers were accepted to ICCV 2025!

📝 Selected Publications

(Out of 30+ publications)

🧠 Omni LLMs

Ming-Omni: A Unified Multimodal Model for Perception and Generation

Project lead.

Project

M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance

Qingpei Guo, Kaiyou Song, Zipeng Feng, Ziping Ma, Qinglong Zhang, Sirui Gao, Xuzheng Yu, Yunxiao Sun, Tai-Wei Chang, Jingdong Chen, Ming Yang, Jun Zhou

🤖 ModelScope

📸 Multimodal LLMs

CVPR 2025

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding

Yudong Han, Qingpei Guo^*, Liyuan Pan, Liu Liu, Yu Guan, Ming Yang

Corresponding author

CVPR 2025

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

Muzhi Zhu, Yuzhuo Tian, Hao Chen, Chunluan Zhou, Qingpei Guo^*, Yang Liu, Ming Yang, Chunhua Shen

Corresponding author

M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining

Qingpei Guo, Furong Xu, Hanxiao Zhang, Wang Ren, Ziping Ma, Lin Ju, Jian Wang, Jingdong Chen, Ming Yang

Project | 🤖 ModelScope

SIGIR 2024

M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval

Xingning Dong, Zipeng Feng, Chunluan Zhou, Xuzheng Yu, Ming Yang, Qingpei Guo^*

Corresponding author

ICML 2024

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment

Ziping Ma, Furong Xu, Jian Liu, Ming Yang, Qingpei Guo^*

Corresponding author

NeurIPS 2024

Referencing Where to Focus: Improving Visual Grounding with Referential Query

Yabing Wang, Zhuotao Tian, Qingpei Guo, Zheng Qin, Sanping Zhou, Ming Yang, Le Wang

CVPR 2024

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

Shiyu Xuan, Qingpei Guo, Ming Yang, Shiliang Zhang

🛠️ Application of MLLMs

ACL 2024

HOTVCOM: Generating Buzzworthy Comments for Videos

Yuyan Chen, Yiwen Qian, Songzhou Yan, Jiyuan Jia, Zhixu Li, Yanghua Xiao, Xiaobo Li, Aaron Xuxiang Tian, Ming Yang, Qingpei Guo^*

Corresponding author

💬 Invited Talks

2024.07, Invited talk on our model Ming-Omni.

📌 Patents

US Patents

Guo Q, Chu W. Coding apparatuses, and data processing methods and apparatuses: U.S. Patent Application 18/348,122[P]. 2024-4-25.

Guo Q. Method and system for facilitating recognition of vehicle parts based on a neural network: U.S. Patent 11,475,660[P]. 2022-10-18.

Others

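A robust deep-learning-based method for continuous emotion tracking (CN106919903B)

Image processing method and apparatus (CN111524150B)

Method for training a feature extraction model, feature extraction method, and apparatus (CN116522142A)

Implicit identity authentication method based on natural interaction (CN106888204B)

Method and apparatus for determining the similarity between text and video (CN116958868A)

Image-based intent detection method and apparatus (CN115512340A)

Computer-executed method and apparatus for recognizing vehicle parts (CN110705590B)

Video feature model training method and apparatus, and video feature extraction method and apparatus (CN116721375A)

Method and apparatus for computing contrastive loss across multiple graphics processing units (CN117556273A)

Chatbot response method and apparatus (CN110457456A)

Training method and apparatus for a multimodal model (CN117541894A)

Method and apparatus for training a content understanding model and a content generation model (CN117235534A)

Video editing method and apparatus (CN117315056A)

Method and apparatus for determining the similarity between text and video (CN117556276A)

Training method and apparatus for a large model (CN117521759A)

Method and apparatus for obtaining multimodal features (CN117521017A)

Coding apparatus, and data processing method and apparatus (CN115062782A)

Computer-executed neural network system for vehicle part recognition, method for vehicle part recognition via the neural network system, and apparatus and computing device for vehicle part recognition (TWI742382B)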

📖 Education

2014 - 2017, Master of Computer Applied Technology, Institute of Software, Chinese Academy of Sciences.
2010 - 2014, Bachelor of Telecommunications Engineering, Huazhong University of Science and Technology.
2010 - 2014, Bachelor of Business Management (Dual Degree), Huazhong University of Science and Technology.

🎖 Fellowships and Awards

Master's Enterprise Supervisor (enterprise mentor), Chinese Academy of Sciences.
Master's Enterprise Supervisor, Fudan University.
First place in ICDAR MLT Text Localization.
Third place in ICCV COCO Panoptic Segmentation Challenge.
Project Li Pei Bao - First fully automated claims settlement without manual intervention.
Project Ding Sun Bao - Shenzhen Fintech Innovation Award.
Reviewer for TPAMI/CVPR/ICCV/IJCAI/ECCV/ACL/ACM MM/CoLM.
China National Scholarship.
Extreme Ownership Award - Ant Group Annual Awards.