Multi-level, Multi-aspect, Multi-modal

We live in a multi-modal world: we learn, think, and express ourselves through multiple modalities. AI systems should therefore be able to understand this multi-modal world.
Our research on building AI systems focuses on multi-level, multi-aspect, and multi-modal understanding.

AIM3 Research

Vision & Language
Building deep learning models that can reason about images, videos, and text.
3D & Embodied
Understanding and interacting with 3D environments and physical entities.
Affective Computing
Recognizing, interpreting, and responding to human emotions.
Music Intelligence
Generating lyrics, melodies, and synthesizing music and singing.

Demos

Awards
  • 2017-2023 TRECVID (Video to Text Description) Grand Challenge (Rank 1st)
  • 3rd-5th CVPR/ECCV Affective Behavior Analysis in-the-wild (Rank 1st)
  • CVPR 2021 ActivityNet Entities Object Localization (Rank 1st)
  • 2018-2020 CVPR “ActivityNet Dense Captioning Events in Videos” (Rank 1st)
  • CVPR 2020 The End-of-End-to-End: A Video Understanding Pentathlon (Rank 2nd)
  • 2017-2019 Audio-Visual Emotion Challenge (Rank 1st)
  • ICCV 2019 Outstanding Method Award in VATEX Video Captioning Challenge
  • 2019 Zhijiang Cup Global Artificial Intelligence Competition, Video Content Description Generation (Rank 1st, 300,000 RMB prize)
  • CVPR 2019 ActivityNet Large Scale Activity Recognition Challenge (ANET) Temporal Captioning Task (Winner)
  • ACM Multimedia 2019 Audio-Visual Emotion Challenge (Winner)
  • CVPR 2018 ActivityNet Large Scale Activity Recognition Challenge (ANET) Temporal Captioning Task (Winner)
  • ACM Multimedia 2017 Best Grand Challenge Paper Award
  • 2017 ACM Multimedia (Video to Language) Grand Challenge (Rank 1st)
  • 2016 ACM Multimedia (Video to Language) Grand Challenge (Rank 1st)
  • 2016 Audio-Visual Emotion Challenge (AVEC) (Rank 2nd)
  • 2016 MediaEval Movie Emotion Impact Challenge (Rank 1st)
  • 2016 Chinese Multimodal Emotion Challenge (MEC) (Rank 2nd)
  • 2016 NLPCC Chinese Weibo Stance Detection (Rank 1st)
  • "Spoken English Assistant" system in IBM Bluemix computing contest (2nd Place Prize)
  • 2015 ImageCLEF (Image Sentence Generation) Evaluation (Rank 1st)