About me

I am currently a staff algorithm engineer at Ant Group, leading a team of more than ten people in research and development on multimodal LLMs. My main interest is designing and delivering cognitive computing services, including computer vision, vision-language alignment, and multimodal video understanding. I have published 30+ papers in top-tier conferences and journals, including CVPR, ICML, SIGIR, ECCV, IJCAI, and AAAI.

I am currently looking for full-time algorithm engineers and research interns. Please contact me (qingpei.gqp@antgroup.com) with your CV if you are interested!

🔥 News

  • 2025.05:  🎉🎉 We proposed Ming-Omni, a unified multimodal model capable of processing images, text, audio, and video, with strong proficiency in both speech and image generation.

  • 2025.06:  🎉🎉 3 papers were accepted to ICCV 2025!

📝 Selected Publications

(Out of 30+ publications)

🧠 Omni LLMs


M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance

Qingpei Guo, Kaiyou Song, Zipeng Feng, Ziping Ma, Qinglong Zhang, Sirui Gao, Xuzheng Yu, Yunxiao Sun, Tai-Wei Chang, Jingdong Chen, Ming Yang, Jun Zhou

🤖 ModelScope

📸 Multimodal LLMs

CVPR 2025

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding

Yudong Han, Qingpei Guo*, Liyuan Pan, Liu Liu, Yu Guan, Ming Yang

*Corresponding author

Code

CVPR 2025

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

Muzhi Zhu, Yuzhuo Tian, Hao Chen, Chunluan Zhou, Qingpei Guo*, Yang Liu, Ming Yang, Chunhua Shen

*Corresponding author

Code


M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining

Qingpei Guo, Furong Xu, Hanxiao Zhang, Wang Ren, Ziping Ma, Lin Ju, Jian Wang, Jingdong Chen, Ming Yang

Project | 🤖 ModelScope

SIGIR 2024

M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval

Xingning Dong, Zipeng Feng, Chunluan Zhou, Xuzheng Yu, Ming Yang, Qingpei Guo*

*Corresponding author

ICML 2024

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment

Ziping Ma, Furong Xu, Jian Liu, Ming Yang, Qingpei Guo*

*Corresponding author

NeurIPS 2024

Referencing Where to Focus: Improving Visual Grounding with Referential Query

Yabing Wang, Zhuotao Tian, Qingpei Guo, Zheng Qin, Sanping Zhou, Ming Yang, Le Wang

CVPR 2024

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

Shiyu Xuan, Qingpei Guo, Ming Yang, Shiliang Zhang

Code

🛠️ Application of MLLMs

ACL 2024

HOTVCOM: Generating Buzzworthy Comments for Videos

Yuyan Chen, Yiwen Qian, Songzhou Yan, Jiyuan Jia, Zhixu Li, Yanghua Xiao, Xiaobo Li, Aaron Xuxiang Tian, Ming Yang, Qingpei Guo*

*Corresponding author

💬 Invited Talks

  • 2024.07: A talk about our model, Ming-Omni.

📌 Patents

US Patents

Guo Q, Chu W. Coding apparatuses, and data processing methods and apparatuses: U.S. Patent Application 18/348,122[P]. 2024-4-25.

Guo Q. Method and system for facilitating recognition of vehicle parts based on a neural network: U.S. Patent 11,475,660[P]. 2022-10-18.

Others

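A robust deep-learning-based method for continuous emotion tracking - CN106919903B

Image processing method and apparatus - CN111524150B

Method for training a feature extraction model, and feature extraction method and apparatus - CN116522142A

Implicit identity authentication method based on natural interaction - CN106888204B

Method and apparatus for determining the similarity between text and video - CN116958868A

Image-based intent detection method and apparatus - CN115512340A

Computer-implemented method and apparatus for recognizing vehicle parts - CN110705590B

Video feature model training method and apparatus, and video feature extraction method and apparatus - CN116721375A

Method and apparatus for computing contrastive loss across multiple GPUs - CN117556273A

Chatbot response method and apparatus - CN110457456A

Training method and apparatus for a multimodal model - CN117541894A

Method and apparatus for training a content understanding model and a content generation model - CN117235534A

Video editing method and apparatus - CN117315056A

Method and apparatus for determining the similarity between text and video - CN117556276A

Training method and apparatus for large models - CN117521759A

Method and apparatus for obtaining multimodal features - CN117521017A

Coding apparatus, and data processing method and apparatus - CN115062782A

Computer-implemented neural network system for vehicle part recognition, method for vehicle part recognition via the neural network system, and apparatus and computing device for vehicle part recognition - TWI742382B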

📖 Education

🎖 Fellowships and Awards