Multi-level, Multi-aspect, Multi-modal

We live in a multi-modal world: we learn, think, and express ourselves through multiple modalities. AI systems should therefore be able to understand this multi-modal world.
Our research on building such AI systems focuses on multi-level, multi-aspect, and multi-modal understanding.

AIM3 Research

Vision, Audio & Language
Building deep learning models that can reason about images, videos, audio, and text.
3D & Embodied
Understanding and interacting with 3D environments and physical entities.
Affective & Social Computing
Modeling human emotions, social behaviors, and interpersonal dynamics with AI.
AI Safety
Ensuring that AI systems behave reliably, ethically, and in alignment with human values.

Demos

Awards
  • ACM Multimedia 2024 Honorable Mention Demo Award
  • 2017-2024 TRECVID (Video to Text Description) Grand Challenge (Rank 1st)
  • 3rd-5th CVPR/ECCV Affective Behavior Analysis in-the-wild (ABAW) Challenge (Rank 1st)
  • CVPR 2021 ActivityNet Entities Object Localization (Rank 1st)
  • 2018-2020 CVPR “ActivityNet Dense Captioning Events in Videos” (Rank 1st)
  • CVPR 2020 The End-of-End-to-End: A Video Understanding Pentathlon (Rank 2nd)
  • 2017-2019 Audio-Visual Emotion Challenge (Rank 1st)
  • ICCV 2019 Outstanding Method Award in VATEX Video Captioning Challenge
  • 2019 Zhijiang Cup Global Artificial Intelligence Competition, Video Content Description Generation (1st Place, RMB 300,000 prize)
  • CVPR 2019 ActivityNet Large Scale Activity Recognition Challenge (ANET) Temporal Captioning Task (Winner)
  • ACM Multimedia 2019 Audio-Visual Emotion Challenge (Winner)
  • CVPR 2018 ActivityNet Large Scale Activity Recognition Challenge (ANET) Temporal Captioning Task (Winner)
  • ACM Multimedia 2017 Best Grand Challenge Paper Award
  • 2017 ACM Multimedia (Video to Language) Grand Challenge (Rank 1st)
  • 2016 ACM Multimedia (Video to Language) Grand Challenge (Rank 1st)
  • 2016 Audio-Visual Emotion Challenge (AVEC) (Rank 2nd)
  • 2016 MediaEval Movie Emotion Impact Challenge (Rank 1st)
  • 2016 Chinese Multimodal Emotion Challenge (MEC) (Rank 2nd)
  • 2016 NLPCC Chinese Weibo Stance Detection (Rank 1st)
  • "Spoken English Assistant" system in IBM Bluemix computing contest (2nd Place Prize)
  • 2015 ImageCLEF (Image Sentence Generation) Evaluation (Rank 1st)