Multi-level, Multi-aspect, Multi-modal

We live in a multi-modal world: we learn, think, and express ourselves through multiple modalities. AI systems should therefore be able to understand that multi-modal world.
Our research on building AI systems focuses on multi-level, multi-aspect, and multi-modal understanding.

AIM3 Research

Vision, Audio & Language

Building deep learning models that can reason about images, videos, audio, and text.

3D & Embodied

Understanding and interacting with 3D environments and physical entities.

Affective & Social Computing

Modeling human emotions, social behaviors, and interpersonal dynamics with AI.

AI Safety

Ensuring that AI systems behave reliably, ethically, and in alignment with human values.

Awards

ACM Multimedia 2024 Honorable Mention Demo Award
2017-2024 TRECVID (Video to Text Description) Grand Challenge (Rank 1st)
3rd-5th CVPR/ECCV Affective Behavior Analysis in-the-wild (Rank 1st)
CVPR 2021 ActivityNet Entities Object Localization (Rank 1st)
2018-2020 CVPR “ActivityNet Dense Captioning Events in Videos” (Rank 1st)
CVPR 2020 The End-of-End-to-End: A Video Understanding Pentathlon (Rank 2nd)
2017-2019 Audio-Visual Emotion Challenge (Rank 1st)
ICCV 2019 Outstanding Method Award in VATEX Video Captioning Challenge
2019 Zhijiang Cup Global Artificial Intelligence Competition, Video Content Description Generation (1st Place, 300,000 RMB prize)
CVPR 2019 ActivityNet Large Scale Activity Recognition Challenge (ANET) Temporal Captioning Task (Winner)
ACM Multimedia 2019 Audio-Visual Emotion Challenge (Winner)
CVPR 2018 ActivityNet Large Scale Activity Recognition Challenge (ANET) Temporal Captioning Task (Winner)
ACM Multimedia 2017 Best Grand Challenge Paper Award
2017 ACM Multimedia (Video to Language) Grand Challenge (Rank 1st)
2016 ACM Multimedia (Video to Language) Grand Challenge (Rank 1st)
2016 Audio-Visual Emotion Challenge (AVEC) (Rank 2nd)
2016 MediaEval Movie Emotion Impact Challenge (Rank 1st)
2016 Chinese Multimodal Emotion Challenge (MEC) (Rank 2nd)
2016 NLPCC Chinese Weibo Stance Detection (Rank 1st)
"Spoken English Assistant" system in IBM Bluemix computing contest (2nd Place Prize)
2015 ImageCLEF (Image Sentence Generation) Evaluation (Rank 1st)

Demos