
MMVPR
ICPR

The workshop on Multi-Modal Visual Pattern Recognition aims to provide a comprehensive platform for researchers and practitioners to discuss recent advancements, challenges, and opportunities in the field of multi-modal visual pattern recognition. The workshop is held in conjunction with the 27th International Conference on Pattern Recognition (ICPR 2024).

Workshop

Agenda

Time (CST) Session Title
17:00-17:30 Session 1: Workshop Reports & Challenge Results
  • Opening remarks:
    Prof. Josef Kittler (University of Surrey, UK),
    Prof. Xiao-Jun Wu (Jiangnan University, China)
  • Workshop Reports & Challenge Results:
    Dr. Tianyang Xu (Jiangnan University, China)
17:30-18:00 Session 2: Invited Talk 1 - Dr. Han Xu (Southeast University, China)
18:00-18:30 Session 3: Invited Talk 2 - Dr. Hui Li (Jiangnan University, China)
18:30-19:00 Coffee Break
19:00-20:30 Session 4: Winner Presentations & Contributed Papers
  1. Paper Title: Multi-Modal Fusion of LiDAR and PRISMA Data for Cobalt Mapping: A Case Study from the Áramo Mine, Spain
    Affiliations: Geological Survey of Finland (GTK), Finland; Department of Computing, University of Turku, Finland; Aurum Exploration Limited (Aurum), Kells, Ireland
  2. Paper Title: Adapting SAM2 for Visual Object Tracking
    Affiliations: University of Washington, Seattle WA, USA; Electronics and Telecommunications Research Institute, Daejeon, South Korea; National Center for High-performance Computing, Hsinchu, Taiwan
  3. Paper Title: Visual Prompt with Larger Model for Multi-Modal Tracking
    Affiliations: Dalian University of Technology, China; Dalian Minzu University, China
  4. Paper Title: Enhancing Multi-Modal Object Detection with Data Augmentation, Focal Loss, and Model Ensembling
    Affiliation: State Key Laboratory for Novel Software Technology, Nanjing University, China
  5. Paper Title: Advancing Multi-Modal Visual Pattern Recognition: Object Detection
    Affiliations: Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, India; Founding Minds Software
  6. Paper Title: Action Recognition Using Temporal Shift Module and Ensemble Learning
    Affiliations: University of Limoges, Limoges, France; L3i Laboratory, La Rochelle University, France
  7. Paper Title: Modality Fusion Adaptor-Enhanced Vision Transformer for Multimodal Action Recognition
    Affiliation: School of Computer Science and Technology, Xidian University, China
  8. Paper Title: An Effective End-to-End Solution for Multimodal Action Recognition
    Affiliations: Nanjing University, China; Nanjing University of Science and Technology, China
  9. Paper Title: Evolution of Hybrid Multi-Modal Action Recognition: From DA-CNN+Bi-GRU to EfficientNet-CNN-ViT
    Affiliations: Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, India; Founding Minds Software
20:30 Closing Remarks

Meeting Info

Call for papers

We invite submissions presenting new and original research on topics including but not limited to the following:

Paper Submission and Review

ICPR-2024 will follow a single-blind review process, so authors may include their names and affiliations in the manuscript. Papers that are already available on arXiv may also be submitted to ICPR-2024. Note: Please submit your paper by email by October 15, 2024, 11:59 pm Anywhere on Earth.

Paper Format and Length

Submissions must use the Springer LNCS format, with a maximum of 15 pages (including references) at the time of paper submission; there is no minimum page limit. To address reviewers’ comments, one additional page is allowed (free of charge) in the revised/camera-ready submission. Authors may also purchase up to 2 extra pages; extra-page charges must be paid at the time of registration. Springer LNCS formatting instructions and templates for ICPR-2024 are available here: DOC and LaTeX.

Supplementary materials

By the submission deadline, the authors may optionally submit additional material that was ready at the time of paper submission but could not be included due to constraints of format or space. The authors should refer to the contents of the supplementary material appropriately in the paper. Reviewers will be encouraged to look at it, but are not obligated to do so.

Supplementary material may include videos, proofs, additional figures or tables, or more detailed analyses of the experiments presented in the paper. There is no page limit for supplementary material, but only a single file with a maximum size of 50 MB may be submitted.

Where possible, we encourage authors to upload their code as part of the supplementary material to help reviewers assess the quality of the work.

Overview

The workshop aims to foster collaboration and exchange of ideas among researchers from different domains, including pattern recognition, computer vision, machine learning, signal processing, and artificial intelligence. By addressing technical issues such as feature heterogeneity, data fusion, and cross-modal correlation modeling, the workshop aims to advance the state-of-the-art in multi-modal visual pattern recognition and promote the development of innovative solutions for real-world applications. The topics of interest include but are not limited to:

Multi-modal visual pattern recognition has become increasingly important in various domains, including surveillance, robotics, healthcare, and multimedia analysis. The ability to integrate information from multiple modalities enables a more robust and comprehensive understanding of complex real-world environments. As such, the workshop on Multi-Modal Visual Pattern Recognition with Challenge Tracks is highly relevant to the community. By incorporating challenge tracks into the workshop, participants have the opportunity to benchmark their algorithms and techniques against state-of-the-art methods in multi-modal pattern recognition. This not only fosters healthy competition but also encourages the development of novel approaches and solutions to the challenges in the field. Furthermore, the workshop provides a unique platform for researchers to showcase their work, share insights, and engage in discussions on emerging trends and future directions in multi-modal visual pattern recognition. The challenge platform provides datasets, evaluation metrics, baseline algorithms, and an evaluation server.

Challenge

The workshop will feature three challenge tracks, each focusing on a specific aspect of multi-modal pattern recognition. To participate, please fill out this online Multi-Modal Visual Pattern Recognition Challenge Datasets Request Form.

Track 1: Multi-Modal Tracking: This track aims to address the technical challenges associated with tracking objects using multi-modal data. You can participate in Track 1 through the link.

Track 2: Multi-Modal Detection: The goal of this track is to explore techniques for detecting objects of interest in multi-modal data streams. You can participate in Track 2 through the link.

Track 3: Multi-Modal Action Recognition: This track focuses on recognizing human actions or activities from multi-modal data sources. You can participate in Track 3 through the link.

Details

The Multi-Modal Visual Pattern Recognition Workshop will feature three challenge tracks. The datasets for the tracks involve modalities including RGB, infrared thermal, depth, and event. The details of each track are as follows:

Track 1: Multi-Modal Tracking
This track aims to address the technical challenges associated with tracking objects in multi-modal data. The dataset for this task comprises 500 multi-modal videos, with 400 allocated for training purposes and the remaining 100 for testing.

Track 2: Multi-Modal Detection
The goal of this track is to explore techniques for detecting objects of interest in multi-modal data streams. The dataset for this task comprises 5000 multi-modal images in total, with 4000 images allocated for training and the remaining 1000 images for testing.

Track 3: Multi-Modal Action Recognition
This track focuses on recognizing human actions from multi-modal data sources. The dataset for this track contains 2500 multi-modal videos (2000 for training and 500 for test) spanning across 20 action classes.

Note: The top-3 teams in each track are required to submit a workshop paper describing their solutions. The workshop presents awards to the top-3 teams in each track, as well as 3 best research paper awards.
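For reference, the sketch below shows one way a participant might pair the per-modality files of a sample before training. It is a minimal illustration only: the directory layout (one sub-folder per modality named rgb, thermal, depth, and event), the convention that files belonging to the same sample share a file stem, and the helper name pair_multimodal_samples are all assumptions made for this example, not the official format of the released challenge datasets.

from pathlib import Path

# Assumed (hypothetical) layout, e.g. for Track 2 (detection):
#   track2/train/rgb/0001.png
#   track2/train/thermal/0001.png
#   track2/train/depth/0001.png
#   track2/train/event/0001.npy
MODALITIES = ["rgb", "thermal", "depth", "event"]

def pair_multimodal_samples(root, split="train"):
    """Return one dict per sample, mapping each available modality to its file."""
    split_dir = Path(root) / split
    samples = []
    for rgb_path in sorted((split_dir / "rgb").glob("*")):
        sample = {"rgb": rgb_path}
        for modality in MODALITIES[1:]:
            # Look up the companion file by shared stem; extensions may differ.
            matches = sorted((split_dir / modality).glob(rgb_path.stem + ".*"))
            if matches:
                sample[modality] = matches[0]
        samples.append(sample)
    return samples

if __name__ == "__main__":
    train_samples = pair_multimodal_samples("track2", "train")  # expect 4000 for Track 2
    print(f"{len(train_samples)} paired training samples")

Tracks 1 and 3 are video-based, so the same pairing idea would apply at the level of per-video folders rather than single image files.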

Important Dates

Organizing Committee

Tianyang Xu, Jiangnan University
Xiao-Jun Wu, Jiangnan University
Josef Kittler, University of Surrey
Umapada Pal, Indian Statistical Institute
Jiwen Lu, Tsinghua University
Xi Li, Zhejiang University
Vasile Palade, Coventry University

Challenge Group

Xuefeng Zhu, Jiangnan University
Linze Li, Jiangnan University
Xiao Yang, Jiangnan University
Yifan Pan, Jiangnan University
Minzhi Li, Jiangnan University
Han Zang, Jiangnan University
Youchen Xie, Jiangnan University