Journal and Conference

Progressive Learning for Image Retrieval with Hybrid-Modality Queries
Yida Zhao, Yuqing Song, Qin Jin
[pdf]
SIGIR, 2022.
VRDFormer: End-to-End Video Visual Relation Detection with Transformers
Sipeng Zheng, Shizhe Chen, Qin Jin
CVPR, 2022.
M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database
Jinming Zhao, Tenggan Zhang, Jingwen Hu, Yuchen Liu, Qin Jin, Xinchao Wang, Haizhou Li
[code]
ACL, 2022.
Image Difference Captioning with Pre-Training and Contrastive Learning
Linli Yao, Weiying Wang, Qin Jin
[pdf]   [code]
AAAI, 2022.
Training strategies for automatic song writing: a unified framework perspective
Tao Qian, Jiatong Shi, Shuai Guo, Peter Wu, Qin Jin
[pdf]
ICASSP, 2022.
MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition
Jinming Zhao, Ruichen Li, Qin Jin, Xinchao Wang, Haizhou Li
[pdf]
ICASSP, 2022.
Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training
Yuqing Song, Shizhe Chen, Qin Jin, Wei Luo, Jun Xie, and Fei Huang
[pdf]   [code]
ACM Multimedia, 2021.
Question-controlled Text-aware Image Captioning
Anwen Hu, Shizhe Chen, Qin Jin
[pdf]   [code]
ACM Multimedia, 2021.
Enhancing Neural Machine Translation with Dual-Side Multimodal Awareness
Yuqing Song, Shizhe Chen, Qin Jin, Wei Luo, Jun Xie, and Fei Huang
[pdf]
IEEE Transactions on Multimedia, 2021.
Speech Emotion Recognition via Multi-Level Cross-Modal Distillation
Ruichen Li, Jinming Zhao, Qin Jin
[pdf]
Interspeech, 2021.
Missing Modality Imagination Network for Emotion Recognition with Uncertain Missing Modalities
Jinming Zhao, Ruichen Li, Qin Jin
[pdf]
ACL, 2021.
MMGCN: Multimodal Fusion via Deep Graph Convolution Network for Emotion Recognition in Conversation
Jingwen Hu, Yuchen Liu, Jinming Zhao, Qin Jin
[pdf]
ACL, 2021.
Towards Diverse Paragraph Captioning for Untrimmed Videos
Yuqing Song, Shizhe Chen, Qin Jin
[pdf]   [code]
CVPR, 2021.
Sequence-To-Sequence Singing Voice Synthesis With Perceptual Entropy Loss
Jiatong Shi, Shuai Guo, Nan Huo, Yuekai Zhang, Qin Jin
[pdf]
ICASSP, 2021.
Context-aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training
Jiatong Shi, Nan Huo, Qin Jin
[pdf]
Interspeech, 2020.
VideoIC: A Video Interactive Comments Dataset and Multimodal Multitask Learning for Comments Generation
Weiying Wang, Jieting Chen, Qin Jin
[pdf]   [code]
ACM Multimedia, 2020.
ICECAP: Information Concentrated Entity-aware Image Captioning
Anwen Hu, Shizhe Chen, Qin Jin
[pdf]   [code]
ACM Multimedia, 2020.
Semi-supervised Multi-modal Emotion Recognition with Cross-Modal Distribution Matching
Jingjun Liang, Ruichen Li, Qin Jin
[pdf]
ACM Multimedia, 2020.
Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs
Shizhe Chen, Qin Jin, Peng Wang, Qi Wu
[pdf]
CVPR, 2020.
Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning
Shizhe Chen, Yida Zhao, Qin Jin, Qi Wu
[pdf]
CVPR, 2020.
Better Captioning With Sequence-Level Exploration
Jia Chen, Qin Jin
[pdf]
CVPR, 2020.
Skeleton-based Interactive Graph Network for Human Object Interaction Detection
Sipeng Zheng, Shizhe Chen, Qin Jin
[pdf]
ICME, 2020.
Unsupervised Bilingual Lexicon Induction from Mono-lingual Multimodal Data
Shizhe Chen, Qin Jin, Alexander Hauptmann
[pdf]
AAAI, 2019.
Cross-culture Multimodal Emotion Recognition with Adversarial Learning
Jingjun Liang, Shizhe Chen, Jinming Zhao, Qin Jin, Haibo Liu, Li Lu
[pdf]
ICASSP, 2019.
Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Video
Shizhe Chen, Yuqing Song, Yida Zhao, Qin Jin, Zhaoyang Zeng, Bei Liu, Jianlong Fu, Alexander Hauptmann
[pdf]
CVPR 2019, ActivityNet Large Scale Activity Recognition Challenge.
From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots
Shizhe Chen, Qin Jin, Jianlong Fu
[pdf]
IJCAI, 2019.
Generating Video Descriptions With Latent Topic Guidance
Shizhe Chen, Qin Jin, Jia Chen, Alexander G. Hauptmann
[pdf]
IEEE Transactions on Multimedia, 2019.
Speech Emotion Recognition in Dyadic Dialogues
Jinming Zhao, Shizhe Chen, Jingjun Liang, Qin Jin
[pdf]
Interspeech, 2019.
Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards
Yuqing Song, Shizhe Chen, Qin Jin
[pdf]
ACM Multimedia, 2019.
Visual Relation Detection with Multi-Level Attention
Sipeng Zheng, Shizhe Chen, Qin Jin
[pdf]
ACM Multimedia, 2019.
Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
Shizhe Chen, Bei Liu, Jianlong Fu, Ruihua Song, Qin Jin, Pingping Lin, Xiaoyu Qi, Chunting Wang, Jin Zhou
[pdf]
ACM Multimedia, 2019.
Relation Understanding in Videos
Sipeng Zheng, Xiangyu Chen, Shizhe Chen, Qin Jin
[pdf]
ACM Multimedia, Grand Challenge: Relation Understanding in Videos, 2019.
Adversarial Domain Adaption for Multi-Cultural Dimensional Emotion Recognition in Dyadic Interactions
Jinming Zhao, Ruichen Li, Jingjun Liang, Qin Jin
[pdf]
AVEC, 2019.
Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019
Shizhe Chen, Yida Zhao, Yuqing Song, Qin Jin, Qi Wu
[pdf]
ICCV, VATEX Video Captioning Challenge 2019.
YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension
Weiying Wang, Yongcheng Wang, Shizhe Chen, Qin Jin
[pdf]
EMNLP, 2019.
RUC_AIM3 at TRECVID 2019: Video to Text
Yuqing Song, Yida Zhao, Shizhe Chen, Qin Jin
[pdf]
NIST TRECVID, 2019.
Semi-supervised Multimodal Emotion Recognition With Improved Wasserstein GANs
Jingjun Liang, Shizhe Chen, Qin Jin
[pdf]
APSIPA ASC, 2019.
RUC+CMU: System Report for Dense Captioning Events in Videos
Shizhe Chen, Yuqing Song, Yida Zhao, Qin Jin, Alexander Hauptmann
[pdf]
CVPR ActivityNet Large Scale Activity Recognition Challenge, 2018.
Class-aware Self-Attention for Audio Event Recognition
Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann
[pdf]
ICMR, 2018. (Best Paper Runner-up)
Multimodal Dimensional and Continuous Emotion Recognition in Dyadic Video Interactions
Jinming Zhao, Shizhe Chen, Qin Jin
[pdf]
Pacific-Rim Conference on Multimedia (PCM), 2018.
iMakeup: Makeup Instructional Video Dataset for Fine-grained Dense Video Captioning
Xiaozhu Lin, Qin Jin, Shizhe Chen, Yuqing Song, Yida Zhao
[pdf]
Pacific-Rim Conference on Multimedia (PCM), 2018.
Multi-modal Multi-cultural Dimensional Continues Emotion Recognition in Dyadic Interactions
Jinming Zhao, Ruichen Li, Shizhe Chen, Qin Jin
[pdf]
ACM Multimedia Audio-Visual Emotion Challenge (AVEC) Workshop, 2018.
Video Captioning with Guidance of Multimodal Latent Topics
Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann
[pdf]
ACM Multimedia, 2017.
Knowing Yourself: Improving Video Caption via In-depth Recap
Qin Jin, Shizhe Chen, Jia Chen, Alexander Hauptmann
[pdf]
ACM Multimedia, 2017. (Best Grand Challenge Paper)
Multimodal Multi-task Learning for Dimensional and Continuous Emotion Recognition
Shizhe Chen, Qin Jin, Jinming Zhao and Shuai Wang
[pdf]
ACM Multimedia Audio-Visual Emotion Challenge (AVEC) Workshop, 2017.
Generating Video Descriptions with Topic Guidance
Shizhe Chen, Jia Chen, Qin Jin
[pdf]
ICMR, 2017.
Emotion Recognition with Multimodal Features and Temporal Models
Shuai Wang, Wenxuan Wang, Jinming Zhao, Shizhe Chen, Qin Jin, Shilei Zhang, Yong Qin
[pdf]
ICMI, 2017.
Facial Action Units Detection with Multi-Features and -AUs Fusion
Xinrui Li, Shizhe Chen, and Qin Jin
[pdf]
Automatic Face & Gesture Recognition (FGR), 2017.
Boosting Recommendation in Unexplored Categories by User Price Preference
Jia Chen, Qin Jin, Shiwan Zhao, Shenghua Bao, Li Zhang, Zhong Su, Yong Yu
[pdf]
ACM Transactions on Information Systems (TOIS), 2016.
Video Emotion Recognition in the Wild Based on Fusion of Multimodal Features
Shizhe Chen, Xinrui Li, Qin Jin, Shilei Zhang, Yong Qin
[pdf]
ICMI, 2016.
Describing Videos using Multi-modal Fusion
Qin Jin, Jia Chen, Shizhe Chen, Yifan Xiong
[pdf]
ACM Multimedia, 2016.
Semantic Image Profiling for Historic Events: Linking Images to Phrases
Jia Chen, Qin Jin, Yifan Xiong
[pdf]
ACM Multimedia, 2016.
Multi-modal Conditional Attention Fusion for Dimensional Emotion Prediction
Shizhe Chen, Qin Jin
[pdf]
ACM Multimedia, 2016.
History Rhyme: Searching Historic Events by Multimedia Knowledge
Yifan Xiong, Jia Chen, Qin Jin, Chao Zhang
[pdf]
ACM Multimedia, 2016.
Detecting Violence in Video using Subclasses
Xirong Li, Yujia Huo, Qin Jin, Jieping Xu
[pdf]
ACM Multimedia, 2016.
Generating Natural Video Descriptions via Multimodal Processing
Qin Jin, Junwei Liang, Xiaozhu Lin
[pdf]
Interspeech, 2016.
Improving Image Captioning by Concept-based Sentence Reranking
Xirong Li, Qin Jin
[pdf]
Pacific-Rim Conference on Multimedia (PCM), 2016. (Best Paper Runner-up)
Video Description Generation using Audio and Visual Cues
Qin Jin, Junwei Liang
[pdf]
ICMR, 2016.
Exploitation and Exploration Balanced Hierarchical Summary for Landmark Images
Jia Chen, Qin Jin, Shenghua Bao, Junfeng Ye, Zhong Su, Shimin Chen, Yong Yu
[pdf]
IEEE Transactions on Multimedia (TMM), 2015.
Lead Curve Detection in Drawings with Complex Cross-Points
Jia Chen, Min Li, Qin Jin, Yongzhe Zhang, Shenghua Bao, Zhong Su, Yong Yu
[pdf]
Neurocomputing, 2015, 168: 35-46.
Image Profiling for History Events on the Fly
Jia Chen, Qin Jin, Yong Yu, Alexander G. Hauptmann
[pdf]
ACM Multimedia, 2015.
Persistent B+-Trees in Non-Volatile Main Memory
Shimin Chen and Qin Jin
[pdf]
VLDB, Hawaii, USA, 2015 (VLDB’15).
Semantic Concept Annotation for User Generated Videos Using Soundtracks
Qin Jin, Junwei Liang, Xixi He, Gang Yang, Jieping Xu, Xirong Li
[pdf]
ICMR, 2015.
Speech Emotion Recognition With Acoustic And Lexical Features
Qin Jin, Chengxin Li, Shizhe Chen, Huimin Wu
[pdf]
ICASSP, 2015.
Detecting Semantic Concepts In Consumer Videos Using Audio
Junwei Liang, Qin Jin, Xixi He, Gang Yang, Jieping Xu, Xirong Li
[pdf]
ICASSP, 2015.
Does Product Recommendation Meet its Waterloo in Unexplored Categories? No, Price Comes to Help
Jia Chen, Qin Jin, Shiwan Zhao, Shenghua Bao, Li Zhang, Zhong Su, Yong Yu
[pdf]
SIGIR 2014 (SIGIR’14).
Semantic Concept Annotation of Consumer Videos at Frame-level Using Audio
Junwei Liang, Qin Jin, Xixi He, Xirong Li, Gang Yang, Jieping Xu
[pdf]
Pacific-Rim Conference on Multimedia 2014 (PCM’14).
Speech Emotion Classification using Acoustic Features
Shizhe Chen, Qin Jin, Xirong Li, Gang Yang, Jieping Xu
[pdf]
ISCSLP, 2014.