With the rapid development of social networks and search engines, there has been a surge of interest in jointly analyzing multimodal data such as text, images, audio, and video. To cope with this situation, information acquisition and processing have to shift from single-modality to multi-modality forms. Challenges stemming from the “media gap”, meaning that the representations of different media types are inconsistent, are attracting increasing attention. Different types of data from heterogeneous sources are usually employed together to provide a comprehensive view of an entity.
We solicit submissions of high-quality manuscripts reporting state-of-the-art techniques and trends in this field. The session aims to gather and present innovative research on the following aspects: 1) multi-modal representation learning: joint representation learning, coordinated representation learning, and so on; 2) modality alignment: explicit alignment, latent alignment of modalities, and so on; 3) modality translation: example-based and model-driven approaches; 4) modality fusion: early fusion, late fusion, and so on; 5) co-learning: transferring knowledge between modalities, including their representations and predictive models.
List of Topics
- Neural networks, probabilistic graphical models, and sequential models in representation learning
- Example-based models and generative models in modality translation
- Unsupervised and supervised approaches in modality alignment
- Model-agnostic and model-based approaches in modality fusion
- Co-training, transfer learning, and zero-shot learning
- Methodologies and architectures to improve model explainability
- Multi-modal techniques for cross-modal retrieval, recommendation, social media mining, and so on
Important Dates
- Paper Submission Deadline: February 12, 2021 (extended from November 20, 2020)
- Notification of Acceptance: March 19, 2021 (extended from December 25, 2020)
- Camera-Ready Deadline: March 26, 2021 (extended from January 8, 2021)
Paper Submission Instructions
Special session paper manuscripts must be written in English and be up to 6 pages excluding references (following the IEEE two-column template instructions). Submissions should include the title, author(s), affiliation(s), e-mail address(es), abstract, and postal address(es) on the first page. The templates in Word or LaTeX format are available here. To submit your paper to this session, please select “Special Session: Knowledge-Driven Multi-modal Deep Analysis for Multimedia” on the Microsoft CMT submission site.
Special Session Organizers
- Jianwei Zhang (Iwate University, Japan) zhang@iwate-u.ac.jp
- Xiaohui Tao (University of Southern Queensland, Australia) Xiaohui.tao@usq.edu.au