Keynote 1
Kien A. Hua: Two Innovative Applications in Multimedia Information Processing and Retrieval
Keynote 2
Akio Yamada: On the road to digital transformation via multimedia information processing
Keynote 3
Mohan Kankanhalli: Exploring Visual Sentiment: From Experimental Psychology to Computational Modeling
Keynote 1: Two Innovative Applications in Multimedia Information Processing and Retrieval
Kien A. Hua
IEEE Fellow, Professor, University of Central Florida, USA
Abstract
The Internet of Things is a new frontier for MIPR research. With billions of connected devices deployed at all “corners” of the Internet, it is desirable to have a scalable platform to facilitate sharing of these devices and enable collaboration on IoT software development. We discuss the ThingStore approach, in which IoT app developers may use EQL (Event Query Language) to query events detected by the online devices. It allows formulation of queries on complex events captured in data of different modalities (e.g., video, sensor data). The EQL Server provides a gateway for the IoT apps to access and share the Internet of Things.
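To make the idea concrete, below is a purely illustrative sketch of how an IoT app might register a continuous event query with an EQL server. The query syntax, endpoint, and field names here are assumptions for exposition only; the actual EQL grammar and server API are defined in the ThingStore work and may differ.

```python
# Hypothetical client for an EQL server; everything below is illustrative.
import json
import urllib.request

# A made-up EQL-style query: fire an event when a camera detects loitering
# AND a nearby motion sensor reports activity within the same 30-second window.
query = """
SELECT event
FROM camera_17.loitering, sensor_42.motion
WHEN loitering.confidence > 0.8 AND motion.active = true
WITHIN 30 SECONDS
"""

def register_query(server_url: str, eql: str) -> str:
    """Submit the query and return a subscription id (hypothetical API)."""
    payload = json.dumps({"eql": eql}).encode("utf-8")
    req = urllib.request.Request(
        server_url + "/queries",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["subscription_id"]

# Example (assuming such a gateway exists):
# sub_id = register_query("http://eql-server.example", query)
```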
Another innovative MIPR application is Deep Composer, a new approach to music generation. AI research currently focuses on mimicking universal human abilities such as face recognition and human activity understanding. A new direction in AI research can emphasize learning to become a specific expert, i.e., intelligence duplication. For instance, we can obtain tens of thousands of tiny music segments from Mozart’s compositions and learn his way of placing these tiny music building blocks (MBBs) in an abstract high-dimensional space (i.e., a learned embedding) so that the distance between any two MBBs indicates how likely Mozart would be to use them as adjoining segments in his compositions. This embedding model thus duplicates Mozart’s musical mind. It allows us to artificially generate new Mozart-style music by piecing his MBBs together in millions of different ways through a series of KNN retrievals. Besides music, humans also express their minds through art and language, and the Deep Composer approach can potentially be applied to these media as well. While mimicking universal human intelligence is sufficient to support many useful applications such as industrial robots, duplicating talented experts offers the possibility of taking automation applications to a whole new level.
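As a minimal sketch of the generation step, the Python code below assumes an already-learned embedding that maps each MBB to a vector in which small distances mean "likely adjoining in the composer's style"; random vectors stand in for the learned model, and the sampling strategy is a plausible reading of "a series of KNN retrievals", not the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
num_mbbs, dim = 10_000, 64
# Placeholder for the learned MBB embedding; in Deep Composer this would be
# trained so distance reflects how likely two MBBs adjoin in the composer's works.
embedding = rng.standard_normal((num_mbbs, dim))

def knn(query: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k MBBs nearest to `query` in embedding space."""
    dists = np.linalg.norm(embedding - query, axis=1)
    return np.argsort(dists)[:k]

def generate(start: int, length: int, k: int = 8) -> list:
    """Grow a piece by repeatedly sampling among the current MBB's neighbours."""
    piece = [start]
    for _ in range(length - 1):
        neighbours = knn(embedding[piece[-1]], k + 1)
        neighbours = neighbours[neighbours != piece[-1]]  # skip the trivial self-match
        piece.append(int(rng.choice(neighbours)))
    return piece

print(generate(start=0, length=16))  # a sequence of MBB indices forming a new piece
```

Sampling among the k nearest neighbours, rather than always taking the single nearest, is what lets the same MBB vocabulary yield many distinct pieces.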
Biography
Kien A. Hua is a Pegasus Professor and Director of the Data Systems Lab at the University of Central Florida. He served as the Associate Dean of Research of the College of Engineering and Computer Science at UCF. Prior to joining the university, he was a Lead Architect at IBM Mid-Hudson Laboratory, where he led a team of senior engineers in developing a highly parallel computer system, the precursor to the highly successful commercial parallel computer known as SP2. More recently, Prof. Hua served as a domain expert on spaceport technology at NASA, and as a data analytics expert advising the U.S. Air Force on the Air Force Strategy 2030 Initiative.
Prof. Hua received his B.S. in Computer Science, and M.S. and Ph.D. in Electrical Engineering, all from the University of Illinois at Urbana-Champaign. His diverse expertise includes multimedia computing, network and wireless communications, the Internet of Things, machine learning, and big data analytics. He has published widely, with 16 papers recognized as best/top papers at conferences and a journal. Much of his research has had significant impact. His paper on the Chaining technique introduced peer-to-peer computing technology, which now has many important applications such as data sharing, video streaming, blockchains, and cryptocurrencies. His Skyscraper Broadcast, Patching, and Zigzag techniques have each been heavily cited in the literature and have also inspired many commercial systems in use today.
Prof. Hua has served as a Conference Chair, an Associate Chair, and a Technical Program Committee Member of numerous international conferences, and on the editorial boards of several professional journals. Professor Hua is a Fellow of the IEEE.
Keynote 2: On the road to digital transformation via multimedia information processing
Akio Yamada
Senior Vice President, NEC Corporation, Japan
Abstract
On the road to digital transformation (DX), technology trends and challenges are changing dynamically in step with changes in people and society. For example, in the wake of the global COVID-19 pandemic, people have begun shifting to new lifestyles oriented toward security and safety, and society toward sustainability. People are rethinking the meaning of travel and face-to-face interaction, discovering new values different from those of the past, and renewing their lifestyles according to those values. At the same time, society is trying to divide labor between humans and machines so that each can exert its essential strengths, to demand safer and fairer services than before, and to achieve global optimization by making everything as visible as possible. Amid these changes on the road to DX, a number of big challenges remain, spanning five aspects: remote, online, touchless, automation, and transparency/trust.
Motivated by these challenges, Dr. Yamada will introduce in this talk an industrial-level framework for how NEC is working to realize DX, demonstrating a series of selected research achievements that have contributed to both academia and industry.
At the beginning of the talk, Dr. Yamada will summarize the research and business activities conducted at NEC's global laboratories from the past to the present, introducing how NEC is utilizing multimedia information processing technologies to realize the future DX society.
Subsequently, an industrial-level framework for bridging the gaps between the real world and social values will be introduced. Dr. Yamada will zoom into the main research category of recognition AI technologies, which directly contribute to realizing the DX society. A series of recent research achievements will be selected to demonstrate how NEC is driving research on recognition AI technologies for sensing and understanding persons, objects, and environments in the real world, and for predicting the future, toward the realization of DX.
The latest research achievements on face recognition, iris recognition, re-identification, behavior analysis, and action detection, concerning the visualization of persons, will be introduced first, accompanied by demos. Dr. Yamada will then introduce several novel technologies, including fingerprint of things, heterogeneous object recognition, high-speed camera imaging, 3D scene understanding, and spatiotemporal reasoning, which are proposed to visualize and understand objects and environments in the real world.
Finally, Dr. Yamada will conclude the talk by pointing out open challenges in other research areas, including analytics AI, control AI, security, networks, and system platforms, and will zoom back out to the big picture to show how NEC is conducting research on cutting-edge AI and ICT platforms for realizing the future DX society.
Biography
Akio Yamada received his Ph.D. in Electronic and Information Engineering from Nagoya University and joined the Central Research Laboratories of NEC Corporation in 1993. He began his research career in digital media distribution systems and expanded it to media content recognition, ICT system architecture, and knowledge discovery science. After gaining business experience in the enterprise DX market as Vice President of Technology, he is now Senior Vice President and Head of NEC Laboratories. He has also made high-impact contributions over about 20 years to many international standards in the media content distribution and processing area, notably MPEG and JPEG, and has received numerous awards, including the METI Industrial Standardization Award, the ITSCJ Standardization Contribution Award, and the ITE Niwa-Takayanagi Award (Best Paper Award). He is currently serving as Vice President of the Operations Research Society of Japan (ORSJ), and from 2020 to 2021 was a Director of the Institute of Electronics, Information and Communication Engineers (IEICE), responsible for international coordination and publicity.
Keynote 3: Exploring Visual Sentiment: From Experimental Psychology to Computational Modeling
Mohan Kankanhalli
IEEE Fellow, Provost's Chair Professor, National University of Singapore
Abstract
A picture is worth a thousand words. Visual representation is one of the dominant forms of social media. The emotions that viewers feel when observing visual content are often referred to as the content's visual sentiment. Analysis of visual sentiment has become increasingly important due to the huge volume of online visual data generated by users of social media. Automatic assessment of visual sentiment has many applications, such as monitoring the mood of the population on social media platforms (e.g., Twitter, Facebook), facilitating advertising, and understanding user behavior. However, in contrast to the extensive research on predicting textual sentiment, relatively little work has been done on sentiment analysis of visual content. Moreover, visual sentiment is more subjective and implicit than textual sentiment, and there exists a significant semantic gap between high-level visual perception and low-level computational attributes.
In this talk, we argue that these challenges can be addressed by drawing on findings from psychology and cognitive science. We will first briefly overview our human-centric research framework, which applies the paradigms and methodologies of experimental psychology to computer science. We will then present four of our works on visual sentiment, guided by this framework. Our first work focuses on how multiple visual factors affect human perception of digital images. We build a dataset with quantitative measures of human perception of image attributes under different viewing conditions. Statistical analyses indicate the varying importance of holistic cues, color information, semantics, and saliency for different types of attributes. Based on these insights, we build an empirical model of human image perception and design computational models that predict high-level image attributes. Extensive experiments demonstrate that understanding human visual perception helps create better computational models.
In our second work, we investigate the relation between human attention and visual sentiment. We quantitatively measure how human attention interacts with various emotional properties of an image. We build a unique emotional eye-fixation dataset with object- and scene-level human annotations, and explore comprehensively how human attention is affected by the emotional properties of images. Guided by the results of our human studies, we design a deep convolutional neural network for human attention prediction. Results demonstrate that efficient encoding of image sentiment information helps boost its performance.
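The following is a minimal sketch, not the authors' architecture: a tiny encoder-decoder CNN in PyTorch that maps an RGB image to a single-channel fixation (attention) map. In the actual work, image sentiment information would additionally be encoded into the network, e.g., as extra input channels or a conditioning vector.

```python
import torch
import torch.nn as nn

class AttentionNet(nn.Module):
    """Toy encoder-decoder producing a per-pixel fixation probability map."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
            nn.Sigmoid(),  # values in [0, 1]: predicted fixation probability
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

model = AttentionNet()
saliency = model(torch.randn(1, 3, 224, 224))  # -> (1, 1, 224, 224) attention map
```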
Our third work explores the opposite direction, i.e., how human attention can be used to predict visual sentiment. We experimentally disentangle the effects of focal information and contextual information on human emotional reactions, and then incorporate the resulting insights into computational models. On two benchmark datasets, the proposed models demonstrate superior performance on visual sentiment prediction.
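As an assumption-level illustration of this direction, the sketch below pools image features separately under a given attention map (focal) and under its complement (contextual) before classifying sentiment; the module names and architecture are ours for exposition, not the authors' exact model.

```python
import torch
import torch.nn as nn

class SentimentFromAttention(nn.Module):
    """Classify sentiment from focal vs. contextual feature pools (illustrative)."""
    def __init__(self, feat_dim: int = 64, num_classes: int = 2):
        super().__init__()
        self.features = nn.Conv2d(3, feat_dim, 3, padding=1)
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, image: torch.Tensor, saliency: torch.Tensor) -> torch.Tensor:
        f = self.features(image)                          # (B, C, H, W)
        focal = (f * saliency).mean(dim=(2, 3))           # features under attention
        context = (f * (1 - saliency)).mean(dim=(2, 3))   # features outside attention
        return self.classifier(torch.cat([focal, context], dim=1))

model = SentimentFromAttention()
logits = model(torch.randn(1, 3, 64, 64), torch.rand(1, 1, 64, 64))
```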
Finally, we extend our research on image sentiment to video sentiment, and explore the role of motion in sentiment perception. In particular, we compare emotions and perceptions elicited by short videos versus static frames extracted from the videos. We show that static frames and videos elicit most emotions similarly, but static frames elicit negative emotions more strongly than videos. We test two complementary explanations: differential activation of suspense and the peak-end rule. These findings help us to computationally model human reactions more faithfully with fewer video frames. We will end with future research directions on visual sentiment analysis. Our studies highlight the importance of understanding human cognition for interpreting the latent sentiments behind visual scenes. Our interdisciplinary results have important implications for methods, theory, and applications in diverse fields, including social psychology, computer vision, mass media, and marketing.
Biography
Mohan Kankanhalli is Provost's Chair Professor of Computer Science at the National University of Singapore (NUS). He is also the Dean of NUS School of Computing. Before becoming the Dean in July 2016, he was the NUS Vice Provost (Graduate Education) during 2014-2016 and Associate Provost during 2011-2013. Mohan obtained his BTech from IIT Kharagpur and MS & PhD from the Rensselaer Polytechnic Institute. Mohan’s research interests are in Multimedia Computing, Information Security and Privacy, Image/Video Processing and Social Media Analysis.
He directs N-CRiPT (NUS Centre for Research in Privacy Technologies) which conducts research on privacy on structured as well as unstructured (multimedia, sensors, IoT) data. N-CRiPT looks at privacy at both individual and organizational levels along the entire data life cycle. He is personally involved in privacy research related to images, video and social media as well as privacy risk management. N-CRiPT, which has been funded by Singapore’s National Research Foundation, works with many industry, government and academic partners. Mohan is a Fellow of IEEE.