Jia Jia's Homepage

About me

Jia Jia
Tenured Professor

Department of Computer Science and Technology
Tsinghua University

Contact:

Email ["jjia", at "tsinghua", "dot", "edu", dot, "cn"]

Office:

3-522, FIT Building, Tsinghua University, Beijing, 100084, P.R. China.

Biography:

Dr. Jia Jia is a tenured professor in the Department of Computer Science and Technology, Tsinghua University. She received her bachelor's degree from Tsinghua University in 2003 and her Ph.D. degree from Tsinghua University in 2008.

Research Interests

Her main research interests are digital avatar synthesis, affective computing, and human-computer speech interaction.

Selected Publications

Recent Conference Papers

Shikun Sun, Chengrui Wang, Min Zhou, Zixuan Wang, Xiaoyu Qin, Tiezheng Ge, Bo Zheng, Jia Jia. DEPO: Enhancing E-commerce Image Background Generation with Short Trajectory Direct Expected Preference Optimization. In Proceedings of the 33rd ACM International Conference on Multimedia (MM'25) [PDF]
Sihan Zhao, Zixuan Wang, Tianyu Luan, Jia Jia, Wentao Zhu, Jiebo Luo, Junsong Yuan, Nan Xi. PP-Motion: Physical-Perceptual Fidelity Evaluation for Human Motion Generation. In Proceedings of the 33rd ACM International Conference on Multimedia (MM'25) [PDF]
Xingqi Wang, Xiaoyuan Yi, Xing Xie, and Jia Jia. Specify Privacy Yourself: Assessing Inference-Time Personalized Privacy Preservation Ability of Large Vision-Language Models. In Proceedings of the 33rd ACM International Conference on Multimedia (MM'25) [Code] [PDF]
Songtao Zhou, Xiaoyu Qin, Yixuan Zhou, Qixin Wang, Zeyu Jin, Zixuan Wang, Zhiyong Wu, Jia Jia. HarmoniVox: Painting Voices to Match the Avatar's Soul. In Proceedings of the 33rd ACM International Conference on Multimedia (MM'25) [Demo] [PDF]
Houlun Chen, Xin Wang, Hong Chen, Wei Feng, Zihan Song, Jia Jia, Wenwu Zhu. Localizing Step-by-Step: Multimodal Long Video Temporal Grounding with LLM. In Proceedings of the IEEE International Conference on Multimedia & Expo 2025 (ICME'25) [PDF]
Shikun Sun, Min Zhou, Zixuan Wang, Xubin Li, Tiezheng Ge, Zijie Ye, Xiaoyu Qin, Junliang Xing, Bo Zheng, Jia Jia. Minimal Impact ControlNet: Advancing Multi-ControlNet Integration. The Thirteenth International Conference on Learning Representations, 2025 (ICLR'25) [Arxiv] [PDF]
Zijie Ye, Jia-Wei Liu, Shikun Sun, Jia Jia, Mike Zheng Shou. Skinned Motion Retargeting with Dense Geometric Interaction Perception. Advances in Neural Information Processing Systems (NeurIPS'24 Spotlight)
Houlun Chen, Xin Wang, Hong Chen, Zeyang Zhang, Wei Feng, Bin Huang, Jia Jia, Wenwu Zhu. VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding. In Proceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS'24) [Paper]
Yixuan Zhou, Xiaoyu Qin, Zeyu Jin, Shuoyi Zhou, Shun Lei, Songtao Zhou, Zhiyong Wu, Jia Jia. VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling. In Proceedings of the 32nd ACM International Conference on Multimedia (MM'24, Oral) [Demo]
Zeyu Jin, Jia Jia, Qixin Wang, Kehan Li, Shuoyi Zhou, Songtao Zhou, Xiaoyu Qin, Zhiyong Wu. SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description. In Proceedings of the 32nd ACM International Conference on Multimedia (MM'24) [PDF] [Arxiv] [Page] [Code]
Zixuan Wang, Jiayi Li, Xiaoyu Qin, Shikun Sun, Songtao Zhou, Jia Jia, Jiebo Luo. DanceCamAnimator: Keyframe-Based Controllable 3D Dance Camera Synthesis. In Proceedings of the 32nd ACM International Conference on Multimedia (MM'24) [PDF] [Arxiv] [Page] [Code]
Zixuan Wang, Jia Jia, Shikun Sun, Haozhe Wu, Rong Han, Zhenyu Li, Di Tang, Jiaqing Zhou, and Jiebo Luo. 2024. DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR'24) [PDF] [Arxiv] [Page] [Code]
Shikun Sun, Longhui Wei, Zhicai Wang, Zixuan Wang, Junliang Xing, Jia Jia, Qi Tian. Inner Classifier-Free Guidance and Its Taylor Expansion for Diffusion Models. In Proceedings of the Twelfth International Conference on Learning Representations (ICLR'24)
Zijie Ye, Jia Jia, Junliang Xing. Semantics2Hands: Transferring Hand Motion Semantics between Avatars. In Proceedings of the 31st ACM International Conference on Multimedia (MM'23 Best BNI Paper) [PDF] [Page]
Zeyu Jin, Zixuan Wang, Qixin Wang, Ye Bai, Yi Zhao, Hao Li, Xiaorui Wang, and Jia Jia. HoloSinger: Semantics and Music Driven Motion Generation with Octahedral Holographic Projection. In Proceedings of the 31st ACM International Conference on Multimedia (MM'23) [PDF]
Houlun Chen, Xin Wang, Xiaohan Lan, Hong Chen, Xuguang Duan, Jia Jia, Wenwu Zhu. Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Grounding. In Proceedings of the 31st ACM International Conference on Multimedia (MM'23) [PDF]
Haoyu Wang, Haozhe Wu, Junliang Xing, Jia Jia. Versatile Face Animator: Driving Arbitrary 3D Facial Avatar in RGBD Space. In Proceedings of the 31st ACM International Conference on Multimedia (MM'23) [PDF]
Shuo Huang, Zongxin Yang, Liangting Li, Yi Yang, Jia Jia. AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion. In Proceedings of the 31st ACM International Conference on Multimedia (MM'23) [PDF]
Haozhe Wu, Songtao Zhou, Jia Jia, Junliang Xing, Qi Wen, Xiang Wen. Speech-Driven 3D Face Animation with Composite and Regional Facial Movements. In Proceedings of the 31st ACM International Conference on Multimedia (MM'23) [PDF]
Xianhao Wei, Jia Jia, Xiang Li, Zhiyong Wu, Ziyi Wang. A Discourse-level Multi-scale Prosodic Model for Fine-Grained Emotion Analysis. China Multimedia 2023 (China MM'23 Best Paper) [PDF]
Shikun Sun, Longhui Wei, Junliang Xing, Jia Jia, Qi Tian. SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation. In Proceedings of the 40th International Conference on Machine Learning (ICML'23) [PDF]
Zijie Ye, Jia Jia, Haozhe Wu, Shuo Huang, Shikun Sun, Junliang Xing. Salient Co-Speech Gesture Synthesizing with Discrete Motion Representation. International Conference on Acoustics, Speech and Signal Processing (ICASSP'23) [PDF]
Shikun Sun, Jia Jia, Haozhe Wu, Zijie Ye, Junliang Xing. MSNet: A Deep Architecture Using Multi-Sentiment Semantics for Sentiment-Aware Image Style Transfer. International Conference on Acoustics, Speech and Signal Processing (ICASSP'23) [PDF]
Shuo Huang, Jia Jia, Zongxin Yang, Wei Wang, Haozhe Wu, Yi Yang, Junliang Xing. Shuffled Autoregression for Motion Interpolation. International Conference on Acoustics, Speech and Signal Processing (ICASSP'23) [PDF]
Jinghe Cai, Xiaohan Li, Bohan Chen, Zhigang Wang, Jia Jia. CatHill: Emotion-Based Interactive Storytelling Game as a Digital Mental Health Intervention. ACM Conference on Human Factors in Computing Systems (CHI'23) [PDF]
Zhihan Yang, Zhiyong Wu, Ying Shan, Jia Jia. What Does Your Face Sound Like? 3D Face Shape Towards Voice. In Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI'23) [PDF]
Zixuan Wang, Jia Jia, Haozhe Wu, Junliang Xing, Jinghe Cai, Fanbo Meng, Guowen Chen, Yanfeng Wang. GroupDancer: Music to Multi-People Dance Synthesis with Style Collaboration. In Proceedings of the 30th ACM International Conference on Multimedia (MM'22) [PDF]
Jingbei Li, Yi Meng, Xixin Wu, Zhiyong Wu, Jia Jia, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang. Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks. In Proceedings of the 30th ACM International Conference on Multimedia (MM'22) [PDF]
Jia Jia, Wei Chen, Kai Yu, Xiaodong He, Jun Du, Heung-Yeung Shum. The Practice of Speech and Language Processing in China. Communications of the ACM. [PDF]
Haozhe Wu, Jia Jia, Haoyu Wang, Yishun Dou, Chao Duan, Qingshan Deng. Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis. In Proceedings of the 29th ACM International Conference on Multimedia (MM'21) [PDF]
Suping Zhou, Jia Jia, Zhiyong Wu, Zhihan Yang, Yanfeng Wang, Wei Chen, Fanbo Meng, Shuo Huang, Jialie Shen, Xiaochuan Wang. Inferring Emotion from Large-scale Internet Voice Data: A Semi-supervised Curriculum Augmentation based Deep Learning Approach. In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI'21) [PDF]
Yaohua Bu, Tianyi Ma, Weijun Li, Hang Zhou, Jia Jia, Shengqi Chen, Kaiyuan Xu, Dachuan Shi, Haozhe Wu, Zhihan Yang, Kun Li, Zhiyong Wu, Yuanchun Shi, Xiaobo Lu, Ziwei Liu. PTeacher: a Computer-Aided Personalized Pronunciation Training System with Exaggerated Audio-Visual Corrective Feedback. ACM Conference on Human Factors in Computing Systems (CHI'21) [PDF]
Zhiyuan Hu, Jia Jia, Bei Liu, Yaohua Bu, Jianlong Fu. Aesthetic-Aware Image Style Transfer. In Proceedings of the 28th ACM International Conference on Multimedia (MM'20) [PDF]
Zijie Ye, Haozhe Wu, Jia Jia, Yaohua Bu, Wei Chen, Fanbo Meng, Yanfeng Wang. ChoreoNet: Towards Music to Dance Synthesis with Choreographic Action Unit. In Proceedings of the 28th ACM International Conference on Multimedia (MM'20) [PDF]
Haozhe Wu, Jia Jia, Lingxi Xie, Guojun Qi, Yuanchun Shi, Qi Tian. Cross-VAE: Towards Disentangling Expression from Identity For Human Faces. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'20) [PDF]
Suping Zhou, Jia Jia, Long Zhang, Yanfeng Wang, Wei Chen, Fanbo Meng, Fei Yu, Jialie Shen. Inferring Emphasis for Real Voice Data: an Attentive Multimodal Neural Network Approach. The 26th Anniversary International Conference on MultiMedia Modeling (MMM'20) [PDF]
Tiancheng Shen, Jia Jia, Yan Li, Yihui Ma, Yaohua Bu, Hanjie Wang, Bo Chen, Tat-Seng Chua, Wendy Hall. PEIA: Personality and Emotion Integrated Attentive Model for Music Recommendation on Social Media Platforms. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI'20) [PDF]
Haozhe Wu, Zhiyuan Hu, Jia Jia, Yaohua Bu, Xiangnan He, Tat-Seng Chua. Mining Unfollow Behavior in Large-Scale Online Social Networks via Spatial-Temporal Interaction. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI'20) [PDF]
Yulan Chen, Zhiyong Wu, Jia Jia. Modeling Emotion Influence Using Attention-based Graph Convolutional Recurrent Network. In Proceedings of the 21st ACM International Conference on Multimodal Interaction (ICMI'19) [PDF]
Suping Zhou, Jia Jia, Yufeng Yin, Xiang Li, Yang Yao, Ying Zhang, Zeyang Ye, Kehua Lei, Yan Huang, Jialie Shen. Understanding the Teaching Styles by an Attention based Multi-task Cross-media Dimensional Modelling. In Proceedings of the 27th ACM International Conference on Multimedia (MM'19) [PDF]
Runnan Li, Zhiyong Wu, Jia Jia, Yaohua Bu, Sheng Zhao, Helen Meng. Towards Discriminative Representation Learning for Speech Emotion Recognition. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI'19) [PDF]
Runnan Li, Zhiyong Wu, Jia Jia, Sheng Zhao, Helen Meng. Dilated Residual Network with Multi-Head Self-Attention for Speech Emotion Recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'19) [PDF]
Dongyang Dai, Zhiyong Wu, Runnan Li, Xixin Wu, Jia Jia, Helen Meng. Learning Discriminative Features from Spectrograms Using Center Loss for Speech Emotion Recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'19) [PDF]
Pan Zhou, Wenwen Yang, Wei Chen, Yanfeng Wang, Jia Jia. Modality Attention for End-to-End Audio-Visual Speech Recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'19) [PDF]
Taoran Tang, Hanyang Mao, Jia Jia. AniDance: Real-Time Dance Motion Synthesize to the Song. In Proceedings of the 26th ACM International Conference on Multimedia (MM'18 Best Demo) [PDF]
Taoran Tang, Jia Jia, Hanyang Mao. Dance with Melody: An LSTM-autoencoder Approach to Music-oriented Dance Synthesis. In Proceedings of the 26th ACM International Conference on Multimedia (MM'18) [PDF]
Runnan Li, Zhiyong Wu, Jingbei Li, Jia Jia, Chen Wei, Helen Meng. Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs. In Proceedings of the 26th ACM International Conference on Multimedia (MM'18) [PDF]
Suping Zhou, Jia Jia, Yanfeng Wang, Wei Chen, Fanbo Meng, Ya Li, Jianhua Tao. Emotion Inferring from Large-scale Internet Voice Data: A Multimodal Deep Learning Approach. 2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia'18). [PDF]
Long Zhang, Jia Jia, Fanbo Meng, Suping Zhou, Wei Chen, Cunjun Zhang, Runnan Li. Emphasis Detection for Voice Dialogue Applications Using Multi-channel Convolutional Bidirectional Long Short-Term Memory Network. In Proceedings of the 11th International Symposium on Chinese Spoken Language Processing (ISCSLP'18) [PDF]
Wenjing Cai, Jia Jia, Wentao Han. Inferring Emotions from Image Social Networks Using Group-Based Factor Graph Model. In Proceedings of the 19th International Conference on Multimedia & Expo (ICME'18) [PDF]
Jia Jia. Mental Health Computing via Harvesting Social Media Data. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI'18) [PDF]
Tiancheng Shen, Jia Jia, Guangyao Shen, Fuli Feng, Xiangnan He, Huanbo Luan, Jie Tang, Thanassis Tiropanis, Tat-Seng Chua and Wendy Hall. Cross-Domain Depression Detection via Harvesting Social Media. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI'18) [PDF]
Peijun Zhao, Jia Jia, Yongsheng An, Jie Liang, Lexing Xie and Jiebo Luo. Analyzing and Predicting Emoji Usages in Social Media. In Proceedings of the Web Conference 2018 (WWW'18) [PDF]
Yihui Ma, Jia Jia, Yufan Hou, Yaohua Bu and Wentao Han. Understanding the Aesthetic Styles of Social Images. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'18) [PDF]
Suping Zhou, Jia Jia, Qi Wang, Yufei Dong, Yufeng Yin and Kehua Lei. Inferring Emotion from Conversational Voice Data: A Semi-supervised Multi-path Generative Neural Network Approach. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI'18) [PDF]
Ye Ma, Xinxing Li, Mingxing Xu, Jia Jia and Lianhong Cai. Multi-scale Context Based Attention for Dynamic Music Emotion Prediction. In Proceedings of the 25th ACM International Conference on Multimedia (MM'17) [PDF]
Guangyao Shen, Jia Jia, Liqiang Nie, Fuli Feng, Cunjun Zhang, Tianrui Hu, Tat-Seng Chua and Wenwu Zhu. Depression Detection via Harvesting Social Media: A Multimodal Dictionary Learning Solution. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI'17) [PDF]
Yihui Ma, Jia Jia, Suping Zhou, Jingtian Fu, Yejun Liu and Zijian Tong. Towards Better Understanding the Clothing Fashion Styles: A Multimodal Deep Learning Approach. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI'17) [PDF]
Yishuang Ning, Jia Jia, Zhiyong Wu, Runnan Li, Yongsheng An, Yanfeng Wang and Helen Meng. Multi-task Deep Learning for User Intention Understanding in Speech Interaction Systems. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI'17) [PDF]
Shumei Zhang, Jia Jia and Yishuang Ning. Inferring Emotions from Heterogeneous Social Media Data: A Cross-Media Auto-Encoder Solution. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'17) [PDF]
Yishuang Ning, Zhiyong Wu, Runnan Li, Jia Jia, Helen Meng and Lianhong Cai. Learning Cross-Lingual Knowledge with Multilingual BLSTM for Emphasis Detection with Limited Training Data. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'17) [PDF]
Yuhao Wu, Jia Jia, Feng Lu, and Lianhong Cai. A Systematic Approach to Compute Perceptual Distribution of Monosyllables. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'17) [PDF]

Recent Journals

Jia Jia, Suping Zhou, Yufeng Yin, Boya Wu, Wei Chen, Fanbo Meng and Yanfeng Wang. Inferring Emotions From Large-scale Internet Voice Data. IEEE Transactions on Multimedia, 2019 (TMM'19) [PDF]
Huijie Lin, Jia Jia, Jiezhong Qiu, Yongfeng Zhang, Guangyao Shen, Lexing Xie, Jie Tang, Ling Feng and Tat-Seng Chua. Detecting Stress Based on Social Interactions in Social Networks. IEEE Transactions on Knowledge & Data Engineering, 2017, PP(99):1820-1833 (TKDE'17) [PDF]
Boya Wu, Jia Jia, Yang Yang, Peijun Zhao, Jie Tang and Qi Tian. Inferring Emotional Tags From Social Images With User Demographics. IEEE Transactions on Multimedia, 2017, PP(99):1-1 (TMM'17) [PDF]
Xishan Zhang, Jia Jia, Ke Gao, Yongdong Zhang, Dongming Zhang, Jintao Li and Qi Tian. Trip Outfits Advisor: Location-Oriented Clothing Recommendation. IEEE Transactions on Multimedia, 2017, PP(99):1-1 (TMM'17) [PDF]
Chao Wu, Yaoxue Zhang, Jia Jia and Wenwu Zhu. Mobile Contextual Recommender System for Online Social Media. IEEE Transactions on Mobile Computing, 2017, PP(99):1-1 (TMC'17) [PDF]