Home Research Publication People Join Us

Research Overview

Vision, Audio, Language
Building deep learning models that can reason about images, videos, audio, and text.
3D & Embodied
Understanding and interacting with 3D environments and physical entities.
Affective & Social
Modeling human emotions and social behaviors, and interpersonal dynamics via AI.
AI Safety
Ensure AI behave reliably, ethically, and aligned with human values.
Vision, Audio & Language
Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective
TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation
Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline
Be with you (与你同在)
AI Song Contest 2023
Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis
Affective & Social Computing
ESCoT: Towards Interpretable Emotional Support Dialogue Systems
ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains
Emotionally Situated Text-to-Speech Synthesis in User-Agent Conversation
On The Humanity Of Conversational AI: Evaluating The Psychological Portrayal Of LLMs
3D & Embodied
Scaling Large Motion Models with Million-Level Human Motions
Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions?
Explore and Tell: Embodied Visual Captioning in 3D Environments
QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds
AI Safety
Gpt-4 Is Too Smart To Be Safe: Stealthy Chat With Llms Via Cipher
BiasAsker: Measuring the Bias in Conversational AI System
Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models
All Languages Matter: On the Multilingual Safety of LLMs
New Job, New Gender? Measuring the Social Bias in Image Generation Models

Copyright © 2024 AIM3 Lab