Research Overview
Vision & Language
Building deep learning models that can reason about images, videos, and text.
3D & Embodied
Understanding and interacting with 3D environments and physical entities.
Affective Computing
Recognizing, interpreting, and responding to human emotions.
Music Intelligence
Generating lyrics and melodies, and synthesizing music and singing voices.
Vision & Language
TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline
Movie101v2: Improved Movie Narration Benchmark
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-view World
Respond in my Language: Mitigating Language Inconsistency in Response Generation based on Large Language Models
Affective Computing
ESCoT: Towards Interpretable Emotional Support Dialogue Systems
ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains
DialogueEIN: Emotion Interaction Network for Dialogue Affective Analysis
MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition
3D & Embodied
SPAFormer: Sequential 3D Part Assembly with Transformers
Think-Program-reCtify: 3D Situated Reasoning with Large Language Models
Music Intelligence
Be with you (与你同在)
AI Song Contest 2023
Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis
UniLG: A Unified Structure-aware Framework for Lyrics Generation
PHONEIX: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor