IEEE 4th International Conference on Multimedia Information Processing and Retrieval (IEEE MIPR 2021)

September 8-10, 2021, online (originally scheduled for March 22-24, 2021, in Tokyo, Japan).

TCMC Award Talks


2021 TCMC Impact Award

Awardee: Dr. Yong Rui (Lenovo)
Talk Title: Multimedia Computing -- A Journey of Three Decades
Abstract
Since I was at the University of Illinois at Urbana-Champaign in the 1990s, I have been conducting multimedia research and witnessing the development of this field for nearly 30 years. The advancement of deep learning, which gave rise to a multimodality-based algorithm framework, has pushed forward and will continue to push forward this field. In my talk at MIPR'21, I will look into the past and present of multimedia research, and share my personal reflections on its future.
Biography of Dr. Yong Rui
Dr. Yong Rui is the Chief Technology Officer and Senior Vice President of Lenovo Group. He directs Lenovo's corporate technical strategy and research and development directions, and leads the Lenovo Research organization, which investigates intelligent devices, artificial intelligence, 5G, cloud and edge computing, and smart vertical solutions. A Fellow of ACM, IEEE, IAPR, and SPIE, and a Foreign Member of Academia Europaea and the Canadian Academy of Engineering, Dr. Rui is a world-renowned technologist. He holds 70 patents and is the recipient of the prestigious 2018 ACM SIGMM Technical Achievement Award and the 2016 IEEE Computer Society Edward J. McCluskey Technical Achievement Award.

2021 TCMC Mid-term Career Award

Awardee: Dr. Weiyao Lin (Shanghai Jiao Tong University)
Talk Title: Multi-modality multimedia semantic information understanding and compression
Abstract
With the rapid growth of multimedia applications and services, semantic information, such as objects' motion, action, and properties, is of increasing importance in many emerging multimedia applications whose data has become extremely "big". This imposes a huge demand for the efficient extraction and compression of semantic information. In this talk, I will introduce our work on multi-modality multimedia information analysis and compression. Firstly, I will introduce our work on object activity and interaction recognition. We re-model the existing action detection architectures and develop a long-term parsing and short-term sampling structure. Secondly, I will introduce our work on multi-modality multimedia analysis, which aims to accurately localize and analyze objects based on the joint analysis of audio and video streams. Thirdly, I will present our new work on semantic information compression. We construct a new model to describe the spatial-temporal redundancies in semantic data, and design a new architecture that can compress more than 70% of the semantic data. Finally, I will give some industry application examples of our work.
Biography of Dr. Weiyao Lin
Weiyao Lin received the B.E. degree from Shanghai Jiao Tong University, China, in 2003, the M.E. degree from Shanghai Jiao Tong University, China, in 2005, and the Ph.D. degree from the University of Washington, Seattle, USA, in 2010, all in electrical engineering. He is currently a Professor with the Department of Electronic Engineering, Shanghai Jiao Tong University, China. He has authored or coauthored 100+ technical papers in top journals and conferences including TPAMI, IJCV, CVPR, and ICCV. He holds 25 patents and has 10+ patents under review. His research interests include multimedia content understanding, computer vision, video/image compression, and video/image processing applications. Dr. Lin has served as an associate editor for IEEE Trans. Image Processing, IEEE Trans. Circuits & Systems for Video Technology, and IEEE Trans. Intelligent Transportation Systems. He was an organizing committee chair of the International Conference on Image and Graphics (ICIG) 2017, an area chair/senior PC member of AAAI'21, ICPR'20, ACM MM'20, BMVC'19, ICIP'19, and ICME'2018, and an organizer of 6+ workshops in ICCV, ECCV, ACM MM, and ICME. He is a member of a number of international technical committees including IEEE TCMC TC, MMSP TC, IEEE MSA TC, and IEEE VSPC TC. He received the TCMC Mid-term Career Award in 2021, the Multimedia Rising Star Award in ICME'2019, and the Outstanding Area Chair Award in ICME'2018. He is a senior member of IEEE.

2020 TCMC Rising Star Award

Awardee: Dr. Lianli Gao (University of Electronic Science and Technology of China)
Talk Title: Integrating knowledge and natural language for Visual Understanding
Abstract
A long-term goal of AI research is to build intelligent agents to see and understand the complex visual environment in the physical world and then communicate their understanding to other agents or humans with natural language. This goal brings new problems and research challenges. In this talk, I will introduce the motivations and advantages of integrating knowledge and natural language processing techniques for improving visual understanding. Specifically, I will present our recent discoveries at the intersection of vision and language, including image scene graph generation, visual dialogue, visual captioning, VQA and GAN-related image generation with natural language description or prior knowledge.
Biography of Dr. Lianli Gao
Lianli Gao is a researcher in the School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC). She received her PhD in Information Technology from The University of Queensland (UQ), Australia, in 2015. Her research interests mainly relate to investigating and implementing novel solutions for multimedia understanding, e.g., video and image content analysis, knowledge discovery, and visual understanding by integrating natural language processing techniques. She has published 50+ papers in prestigious venues. Dr. Gao was named an Alibaba Damo Academy Young Fellow in 2019.

2020 TCMC Impact Award

Awardee: Dr. C.-C. Jay Kuo (University of Southern California)
Talk Title: Bridging Gap between Image Pixels and Semantics via Supervision
Abstract
The gap between the pixels and the semantics of images, called the semantic gap, has been known for decades. Its resolution is a long-standing problem. Judged by today’s norm, the sizes of labeled datasets were quite small before 2010. The situation began to change with the introduction of ImageNet, which was viewed as the engine that drove deep learning in the last decade. The chase for more and more annotated data is clear evidence of supervision’s role. Supervision manifests itself through two aspects: 1) large-scale, high-quality annotated data, and 2) well-designed optimization objectives. The two aspects come into play synergistically. For example, the design of optimization objectives highly depends on annotations. The optimization procedure often entails a minimum amount of labeled data, and it is expected to scale well with more data. To illustrate various forms of supervision, experiences are drawn from two application domains in this talk: object detection and metric learning for content-based image retrieval (CBIR).
Biography of Dr. C.-C. Jay Kuo
Dr. C.-C. Jay Kuo received his Ph.D. degree from the Massachusetts Institute of Technology in 1987. He is now with the University of Southern California (USC) as William M. Hogue Professor, Distinguished Professor of Electrical and Computer Engineering and Computer Science, and Director of the Media Communications Laboratory. His research interests are in visual computing and communication. He is a Fellow of AAAS, NAI, IEEE and SPIE. Dr. Kuo has received numerous awards for his outstanding research contributions, including the 2010 Electronic Imaging Scientist of the Year Award, the 2010-11 Fulbright-Nokia Distinguished Chair in Information and Communications Technologies, the 2019 IEEE Computer Society Edward J. McCluskey Technical Achievement Award, the 2019 IEEE Signal Processing Society Claude Shannon-Harry Nyquist Technical Achievement Award, the 2020 IEEE TCMC Impact Award, the 72nd annual Technology and Engineering Emmy Award (2020), and the 2021 IEEE Circuits and Systems Society Charles A. Desoer Technical Achievement Award. Dr. Kuo was Editor-in-Chief for the IEEE Transactions on Information Forensics and Security (2012-2014) and the Journal of Visual Communication and Image Representation (1997-2011). He has guided 160 students to their PhD degrees and supervised 31 postdoctoral research fellows.

2019 TCMC Rising Star Award

Awardee: Dr. Ting Yao (JD AI Research, Beijing, China)
Talk Title: Vision to Language: from Independency, Interaction, to Symbiosis
Abstract
Vision and Language are two fundamental capabilities of human intelligence. Humans routinely perform tasks through the interactions between vision and language, supporting the uniquely human capacity to talk about what they see. That motivates us researchers to expand the horizons of such cross-modal analysis. In particular, vision to language is probably one of the hottest topics in the past five years, with a significant growth in both volume of publications and extensive applications. In this talk, we look into the problem of vision to language, from three different perspectives: 1) Independency – aim for a thorough image/video understanding for language generation; 2) Interaction – explore the (1st, 2nd, …) interaction across vision and language inputs; 3) Symbiosis – learn a universal encoder-decoder structure for vision-language tasks. Moreover, we will also discuss the real-world deployments or services of vision to language.
Biography of Dr. Ting Yao
Ting Yao is currently a Principal Researcher in the Vision and Multimedia Lab at JD AI Research, Beijing, China. His research interests include video understanding, vision and language, and deep learning. Prior to joining JD.com, he was a Researcher with Microsoft Research Asia, Beijing, China. Ting is the principal architect of the top-performing multimedia analytic systems in international benchmark competitions such as the ActivityNet Large Scale Activity Recognition Challenge 2016-2021, the Visual Domain Adaptation Challenge 2017-2019, and the COCO Image Captioning Challenge. He is the lead organizer of the Pre-training for Video Understanding Challenge in ACM Multimedia 2020 & 2021, and the MSR Video to Language Challenge in ACM Multimedia 2016 & 2017. He also built MSR-VTT, a large-scale video-to-text dataset that is widely used worldwide. His work has led to many awards, including the ACM SIGMM Outstanding Ph.D. Thesis Award 2015, the ACM SIGMM Rising Star Award 2019, and the IEEE TCMC Rising Star Award 2019. He is an Associate Editor of IEEE Trans. on Multimedia.

2019 TCMC Impact Award

Awardee: Ramesh Jain
Talk Title: Multimedia For Healthy Society
Abstract
From its earliest days, the multimedia community has developed fundamental principles in the context of a challenge that society was facing. A well-defined, challenging application focuses research and development while offering concrete problems that help in evaluating technology. The last few years have seen more focus on machine learning related tools, applied to making old applications more practical and more efficient. The last two years, however, have shown that the most important challenge for human society is HEALTH, which is not the focus of the multimedia community. Fortunately, sensing, knowledge representation, machine learning, data management, as well as the biological and health sciences have now created a nexus that makes the next challenge clear as well as practical. I believe that multimedia technology, and its natural new reincarnation, multimodal technology, is the most relevant technology for making the world a healthy and happy place. A healthy society depends on the aggregate of its individuals’ health as well as on planetary health. Human food and lifestyle are clearly the source of human health and happiness, but also the source of planetary health. In this presentation we will discuss how to create personal health models, and how to build and use them in recommending lifestyle and food choices that are enjoyable, healthy, and sensitive to planetary health. It is now within the realm of science and technology to help redefine the food ecosystem in a way that will help make individuals, as well as our society and the planet, healthy. I believe that multimedia technology is ready to address this important challenge. Let’s seize this opportunity to make multimodal computing the most impactful area of computing.
Biography of Ramesh Jain
Ramesh Jain is an entrepreneur, researcher, and educator. His research interests have covered Control Systems (cybernetics), Computer Vision, Artificial Intelligence, and Multimedia Computing. His current research passion is addressing health issues using cybernetic principles, building on the progress in sensors, mobile, processing, artificial intelligence, computer vision, and storage technologies. He is the founding director of the Institute for Future Health at UCI. He is a Fellow of AAAS, ACM, IEEE, AAAI, IAPR, and SPIE. Ramesh co-founded several companies, managed them in their initial stages, and then turned them over to professional management. He enjoys new challenges and likes to use technology to solve them. He is participating in addressing the biggest challenge for us all: how to enjoy a long life in good health. In pursuit of this new passion, he is working to build a global community to address food and lifestyle for individual and planetary health.