Publications

2024


  • SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models
    Yuxun Tang, Yuning Wu, Jiatong Shi, Qin Jin
    Interspeech 2024
  • TokSing: Singing Voice Synthesis based on Discrete Tokens
    Yuning Wu, Chunlei Zhang, Jiatong Shi, Yuxun Tang, Shan Yang, Qin Jin
    Interspeech 2024
  • Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing
    Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, Shinji Watanabe
    Interspeech 2024
  • The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
    Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin
    Interspeech 2024
  • Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective
    Zihao Yue, Liang Zhang, Qin Jin
    ACL 2024
  • Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline
    Dingyi Yang, Chunru Zhan, Ziheng Wang, Biao Wang, Tiezheng Ge, Bo Zheng, Qin Jin
    ACL 2024
  • ESCoT: Towards Interpretable Emotional Support Dialogue Systems
    Tenggan Zhang, Xinjie Zhang, Jinming Zhao, Li Zhou, Qin Jin
    ACL 2024
  • Respond in my Language: Mitigating Language Inconsistency in Response Generation based on Large Language Models
    Liang Zhang, Qin Jin, Haoyang Huang, Dongdong Zhang, Furu Wei
    ACL 2024
  • ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains
    Zhaopei Huang, Jinming Zhao, Qin Jin
    IJCAI 2024
  • Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition
    Fengyuan Zhang, Xinjie Zhang, Zhaopei Huang, Qin Jin
    ICME 2024
  • UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos
    Yuting Mei, Linli Yao, Qin Jin
    ICMR 2024

2023


  • Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation
    Zihao Yue, Anwen Hu, Liang Zhang, Qin Jin
    NeurIPS 2023
  • Explore and Tell: Embodied Visual Captioning in 3D Environments
    Anwen Hu, Shizhe Chen, Liang Zhang, Qin Jin
    ICCV 2023
  • Prompt-Oriented View-agnostic Learning for Egocentric Hand-Object Interaction in the Multi-view World
    Boshen Xu, Sipeng Zheng, Qin Jin
    ACM MM 2023
  • Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences
    Dingyi Yang, Hongyu Chen, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Qin Jin
    ACM MM 2023
  • Emotionally Situated Text-to-Speech Synthesis in User-Agent Conversation
    Yuchen Liu, Haoyu Zhang, Shichao Liu, Xiang Yin, Zejun Ma, Qin Jin
    ACM MM 2023
  • InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation
    Anwen Hu, Shizhe Chen, Liang Zhang, Qin Jin
    ACL 2023
  • UniLG: A Unified Structure-aware Framework for Lyrics Generation
    Tao Qian, Zhong Tian, Jiatong Shi, Yuning Wu, Shuai Guo, Xiang Yin, Qin Jin
    ACL 2023
  • Attractive Storyteller: Stylized Visual Storytelling with Unpaired Text
    Dingyi Yang, Qin Jin
    ACL 2023
  • Movie101: A New Movie Understanding Benchmark
    Zihao Yue, Qi Zhang, Anwen Hu, Liang Zhang, Ziheng Wang, Qin Jin
    ACL 2023
  • Rethinking Benchmarks for Cross-modal Image-Text Retrieval
    Weijing Chen, Linli Yao, Qin Jin
    SIGIR 2023
  • Knowledge Enhanced Model for Live Video Comment Generation
    Jieting Chen, Junkai Ding, Wenping Chen, Qin Jin
    ICME 2023
  • Open-Category Human-Object Interaction Pre-training via Language Modeling Framework
    Sipeng Zheng, Boshen Xu, Qin Jin
    CVPR 2023
  • MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
    Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Nicholas Jing Yuan, Qin Jin, Jianlong Fu, Baining Guo
    CVPR 2023
  • CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge
    Linli Yao, Weijing Chen, Qin Jin
    WWW 2023
  • PHONEIX: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor
    Yuning Wu, Jiatong Shi, Tao Qian, Dongji Gao, Qin Jin
    ICASSP 2023
  • Token Mixing: Parameter-Efficient Transfer Learning from Image-Language to Video-Language
    Yuqi Liu, Luhui Xu, Pengfei Xiong, Qin Jin
    AAAI 2023
  • Accommodating Audio Modality in CLIP for Multimodal Processing
    Ludan Ruan, Anwen Hu, Yuqing Song, Liang Zhang, Sipeng Zheng, Qin Jin
    AAAI 2023
  • MPMQA: Multimodal Question Answering on Product Manuals
    Liang Zhang, Anwen Hu, Jing Zhang, Shuo Hu, Qin Jin
    AAAI 2023
  • Multi-Modal Knowledge Hypergraph for Diverse Image Retrieval
    Yawen Zeng, Qin Jin, Tengfei Bao, Wenfeng Li
    AAAI 2023

2022


  • Multi-Lingual Acquisition on Multimodal Pre-training for Cross-modal Retrieval
    Liang Zhang, Anwen Hu, Qin Jin
    NeurIPS 2022
  • DialogueEIN: Emotion Interaction Network for Dialogue Affective Analysis
    Yuchen Liu, Jinming Zhao, Jingwen Hu, Ruichen Li, Qin Jin
    COLING 2022
  • TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
    Yuqi Liu, Pengfei Xiong, Luhui Xu, Shengming Cao, Qin Jin
    ECCV 2022
  • Few-shot Action Recognition with Hierarchical Matching and Contrastive Learning
    Sipeng Zheng, Shizhe Chen, Qin Jin
    ECCV 2022
  • Unifying Event Detection and Captioning as Sequence Generation via Pre-Training
    Qi Zhang, Yuqing Song, Qin Jin
    ECCV 2022
  • Progressive Learning for Image Retrieval with Hybrid-Modality Queries
    Yida Zhao, Yuqing Song, Qin Jin
    SIGIR 2022
  • VRDFormer: End-to-End Video Visual Relation Detection with Transformers
    Sipeng Zheng, Shizhe Chen, Qin Jin
    CVPR 2022
  • M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database
    Jinming Zhao, Tenggan Zhang, Jingwen Hu, Yuchen Liu, Qin Jin, Xinchao Wang, Haizhou Li
    ACL 2022
  • Image Difference Captioning with Pre-Training and Contrastive Learning
    Linli Yao, Weiying Wang, Qin Jin
    AAAI 2022
  • Training strategies for automatic song writing: a unified framework perspective
    Tao Qian, Jiatong Shi, Shuai Guo, Peter Wu, Qin Jin
    ICASSP 2022
  • MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition
    Jinming Zhao, Ruichen Li, Qin Jin, Xinchao Wang, Haizhou Li
    ICASSP 2022

Earlier


  • Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training
    Yuqing Song, Shizhe Chen, Qin Jin, Wei Luo, Jun Xie, Fei Huang
    ACM MM 2021
  • Question-controlled Text-aware Image Captioning
    Anwen Hu, Shizhe Chen, Qin Jin
    ACM MM 2021
  • Enhancing Neural Machine Translation with Dual-Side Multimodal Awareness
    Yuqing Song, Shizhe Chen, Qin Jin, Wei Luo, Jun Xie, Fei Huang
    IEEE Transactions on Multimedia 2021
  • Speech Emotion Recognition via Multi-Level Cross-Modal Distillation
    Ruichen Li, Jinming Zhao, Qin Jin
    Interspeech 2021
  • Missing Modality Imagination Network for Emotion Recognition with Uncertain Missing Modalities
    Jinming Zhao, Ruichen Li, Qin Jin
    ACL 2021
  • MMGCN: Multimodal Fusion via Deep Graph Convolution Network for Emotion Recognition in Conversation
    Jingwen Hu, Yuchen Liu, Jinming Zhao, Qin Jin
    ACL 2021
  • Towards Diverse Paragraph Captioning for Untrimmed Videos
    Yuqing Song, Shizhe Chen, Qin Jin
    CVPR 2021
  • Sequence-To-Sequence Singing Voice Synthesis With Perceptual Entropy Loss
    Jiatong Shi, Shuai Guo, Nan Huo, Yuekai Zhang, Qin Jin
    ICASSP 2021
  • Context-aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training
    Jiatong Shi, Nan Huo, Qin Jin
    Interspeech 2020
  • VideoIC: A Video Interactive Comments Dataset and Multimodal Multitask Learning for Comments Generation
    Weiying Wang, Jieting Chen, Qin Jin
    ACM MM 2020
  • ICECAP: Information Concentrated Entity-aware Image Captioning
    Anwen Hu, Shizhe Chen, Qin Jin
    ACM MM 2020
  • Semi-supervised Multi-modal Emotion Recognition with Cross-Modal Distribution Matching
    Jingjun Liang, Ruichen Li, Qin Jin
    ACM MM 2020
  • Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs
    Shizhe Chen, Qin Jin, Peng Wang, Qi Wu
    CVPR 2020
  • Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning
    Shizhe Chen, Yida Zhao, Qin Jin, Qi Wu
    CVPR 2020
  • Better Captioning With Sequence-Level Exploration
    Jia Chen, Qin Jin
    CVPR 2020
  • Skeleton-based Interactive Graph Network for Human Object Interaction Detection
    Sipeng Zheng, Shizhe Chen, Qin Jin
    ICME 2020
  • Unsupervised Bilingual Lexicon Induction from Mono-lingual Multimodal Data
    Shizhe Chen, Qin Jin, Alexander Hauptmann
    AAAI 2019
  • Cross-culture Multimodal Emotion Recognition with Adversarial Learning
    Jingjun Liang, Shizhe Chen, Jinming Zhao, Qin Jin, Haibo Liu, Li Lu
    ICASSP 2019
  • Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Video
    Shizhe Chen, Yuqing Song, Yida Zhao, Qin Jin, Zhaoyang Zeng, Bei Liu, Jianlong Fu, Alexander Hauptmann
    CVPR 2019 ActivityNet Large Scale Activity Recognition Challenge
  • From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots
    Shizhe Chen, Qin Jin, Jianlong Fu
    IJCAI 2019
  • Generating Video Descriptions With Latent Topic Guidance
    Shizhe Chen, Qin Jin, Jia Chen, Alexander G. Hauptmann
    IEEE Transactions on Multimedia 2019
  • Speech Emotion Recognition in Dyadic Dialogues
    Jinming Zhao, Shizhe Chen, Jingjun Liang, Qin Jin
    Interspeech 2019
  • Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards
    Yuqing Song, Shizhe Chen, Qin Jin
    ACM MM 2019
  • Visual Relation Detection with Multi-Level Attention
    Sipeng Zheng, Shizhe Chen, Qin Jin
    ACM MM 2019
  • Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
    Shizhe Chen, Bei Liu, Jianlong Fu, Ruihua Song, Qin Jin, Pingping Lin, Xiaoyu Qi, Chunting Wang, Jin Zhou
    ACM MM 2019
  • Relation Understanding in Videos
    Sipeng Zheng, Xiangyu Chen, Shizhe Chen, Qin Jin
    ACM MM Grand Challenge: Relation Understanding in Videos 2019
  • Adversarial Domain Adaption for Multi-Cultural Dimensional Emotion Recognition in Dyadic Interactions
    Jinming Zhao, Ruichen Li, Jingjun Liang, Qin Jin
    AVEC 2019
  • Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019
    Shizhe Chen, Yida Zhao, Yuqing Song, Qin Jin, Qi Wu
    ICCV VATEX Video Captioning Challenge 2019
  • YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension
    Weiying Wang, Yongcheng Wang, Shizhe Chen, Qin Jin
    EMNLP 2019
  • RUC_AIM3 at TRECVID 2019: Video to Text
    Yuqing Song, Yida Zhao, Shizhe Chen, Qin Jin
    NIST TRECVID 2019
  • Semi-supervised Multimodal Emotion Recognition With Improved Wasserstein GANs
    Jingjun Liang, Shizhe Chen, Qin Jin
    APSIPA ASC 2019
  • RUC+CMU: System Report for Dense Captioning Events in Videos
    Shizhe Chen, Yuqing Song, Yida Zhao, Qin Jin, Alexander Hauptmann
    CVPR ActivityNet Large Scale Activity Recognition Challenge 2018
  • Class-aware Self-Attention for Audio Event Recognition
    Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann
    ICMR 2018 (Best Paper Runner-up)
  • Multimodal Dimensional and Continuous Emotion Recognition in Dyadic Video Interactions
    Jinming Zhao, Shizhe Chen, Qin Jin
    Pacific-Rim Conference on Multimedia (PCM) 2018
  • iMakeup: Makeup Instructional Video Dataset for Fine-grained Dense Video Captioning
    Xiaozhu Lin, Qin Jin, Shizhe Chen, Yuqing Song, Yida Zhao
    Pacific-Rim Conference on Multimedia (PCM) 2018
  • Multi-modal Multi-cultural Dimensional Continues Emotion Recognition in Dyadic Interactions
    Jinming Zhao, Ruichen Li, Shizhe Chen, Qin Jin
    ACM MM Audio-Visual Emotion Challenge (AVEC) Workshop 2018
  • Video Captioning with Guidance of Multimodal Latent Topics
    Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann
    ACM MM 2017
  • Knowing Yourself: Improving Video Caption via In-depth Recap
    Qin Jin, Shizhe Chen, Jia Chen, Alexander Hauptmann
    ACM MM 2017 (Best Grand Challenge Paper)
  • Multimodal Multi-task Learning for Dimensional and Continuous Emotion Recognition
    Shizhe Chen, Qin Jin, Jinming Zhao, Shuai Wang
    ACM MM Audio-Visual Emotion Challenge (AVEC) Workshop 2017
  • Generating Video Descriptions with Topic Guidance
    Shizhe Chen, Jia Chen, Qin Jin
    ICMR 2017
  • Emotion Recognition with Multimodal Features and Temporal Models
    Shuai Wang, Wenxuan Wang, Jinming Zhao, Shizhe Chen, Qin Jin, Shilei Zhang, Yong Qin
    ICMI 2017
  • Facial Action Units Detection with Multi-Features and-AUs Fusion
    Xinrui Li, Shizhe Chen, Qin Jin
    Automatic Face & Gesture Recognition (FGR) 2017
  • Boosting Recommendation in Unexplored Categories by User Price Preference
    Jia Chen, Qin Jin, Shiwan Zhao, Shenghua Bao, Li Zhang, Zhong Su, Yong Yu
    ACM Transactions on Information Systems (TOIS) 2016
  • Video Emotion Recognition in the Wild Based on Fusion of Multimodal Features
    Shizhe Chen, Xinrui Li, Qin Jin, Shilei Zhang, Yong Qin
    ICMI 2016
  • Describing Videos using Multi-modal Fusion
    Qin Jin, Jia Chen, Shizhe Chen, Yifan Xiong
    ACM MM 2016
  • Semantic Image Profiling for Historic Events: Linking Images to Phrases
    Jia Chen, Qin Jin, Yifan Xiong
    ACM MM 2016
  • Multi-modal Conditional Attention Fusion for Dimensional Emotion Prediction
    Shizhe Chen, Qin Jin
    ACM MM 2016
  • History Rhyme: Searching Historic Events by Multimedia Knowledge
    Yifan Xiong, Jia Chen, Qin Jin, Chao Zhang
    ACM MM 2016
  • Detecting Violence in Video using Subclasses
    Xirong Li, Yujia Huo, Qin Jin, Jieping Xu
    ACM MM 2016
  • Generating Natural Video Descriptions via Multimodal Processing
    Qin Jin, Junwei Liang, Xiaozhu Lin
    Interspeech 2016
  • Improving Image Captioning by Concept-based Sentence Reranking
    Xirong Li, Qin Jin
    Pacific-Rim Conference on Multimedia (PCM) 2016 (Best Paper Runner-up)
  • Video Description Generation using Audio and Visual Cues
    Qin Jin, Junwei Liang
    ICMR 2016
  • Exploitation and Exploration Balanced Hierarchical Summary for Landmark Images
    Jia Chen, Qin Jin, Shenghua Bao, Junfeng Ye, Zhong Su, Shimin Chen, Yong Yu
    IEEE Transactions on Multimedia 2015
  • Lead Curve Detection in Drawings with Complex Cross-Points
    Jia Chen, Min Li, Qin Jin, Yongzhe Zhang, Shenghua Bao, Zhong Su, Yong Yu
    Neurocomputing 2015
  • Image Profiling for History Events on the Fly
    Jia Chen, Qin Jin, Yong Yu, Alexander G. Hauptmann
    ACM MM 2015
  • Persistent B+-Trees in Non-Volatile Main Memory
    Shimin Chen, Qin Jin
    VLDB 2015
  • Semantic Concept Annotation for User Generated Videos Using Soundtracks
    Qin Jin, Junwei Liang, Xixi He, Gang Yang, Jieping Xu, Xirong Li
    ICMR 2015
  • Speech Emotion Recognition With Acoustic And Lexical Features
    Qin Jin, Chengxin Li, Shizhe Chen, Huimin Wu
    ICASSP 2015
  • Detecting Semantic Concepts In Consumer Videos Using Audio
    Junwei Liang, Qin Jin, Xixi He, Gang Yang, Jieping Xu, Xirong Li
    ICASSP 2015
  • Does Product Recommendation Meet its Waterloo in Unexplored Categories? No, Price Comes to Help
    Jia Chen, Qin Jin, Shiwan Zhao, Shenghua Bao, Li Zhang, Zhong Su, Yong Yu
    SIGIR 2014
  • Semantic Concept Annotation of Consumer Videos at Frame-level Using Audio
    Junwei Liang, Qin Jin, Xixi He, Xirong Li, Gang Yang, Jieping Xu
    Pacific-Rim Conference on Multimedia (PCM) 2014
  • Speech Emotion Classification using Acoustic Features
    Shizhe Chen, Qin Jin, Xirong Li, Gang Yang, Jieping Xu
    ISCSLP 2014