
MMVPR
ICPR

The workshop on Multi-Modal Visual Pattern Recognition aims to provide a comprehensive platform for researchers and practitioners to discuss recent advancements, challenges, and opportunities in the field of multi-modal visual pattern recognition. The workshop is held in conjunction with the 27th International Conference on Pattern Recognition (ICPR 2024).

Workshop

Agenda

Time (CST) Session Title
17:00-17:30 Session 1: Workshop Reports & Challenge Results
  • Opening remarks:
    Prof. Josef Kittler (University of Surrey, UK),
    Prof. Xiao-Jun Wu (Jiangnan University, China)
  • Workshop Reports & Challenge Results:
    Dr. Tianyang Xu (Jiangnan University, China)
17:30-18:00 Session 2: Invited Talk 1 - Dr. Han Xu (Southeast University, China)
18:00-18:30 Session 3: Invited Talk 2 - Dr. Hui Li (Jiangnan University, China)
18:30-19:00 Coffee Break
19:00-20:30 Session 4: Winner Presentations & Contributed Papers
  1. Paper Title: Multi-Modal Fusion of LiDAR and PRISMA Data for Cobalt Mapping: A Case Study from the Áramo Mine, Spain
    Affiliations: Geological Survey of Finland (GTK), Finland; Department of Computing, University of Turku, Finland; Aurum Exploration Limited (Aurum), Kells, Ireland
  2. Paper Title: Adapting SAM2 for Visual Object Tracking
    Affiliations: University of Washington, Seattle WA, USA; Electronics and Telecommunications Research Institute, Daejeon, South Korea; National Center for High-performance Computing, Hsinchu, Taiwan
  3. Paper Title: Visual Prompt with Larger Model for Multi-Modal Tracking
    Affiliations: Dalian University of Technology, China; Dalian Minzu University, China
  4. Paper Title: Enhancing Multi-Modal Object Detection with Data Augmentation, Focal Loss, and Model Ensembling
    Affiliation: State Key Laboratory for Novel Software Technology, Nanjing University, China
  5. Paper Title: Advancing Multi-Modal Visual Pattern Recognition: Object Detection
    Affiliations: Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, India; Founding Minds Software
  6. Paper Title: Action Recognition Using Temporal Shift Module and Ensemble Learning
    Affiliations: University of Limoges, Limoges, France; L3i Laboratory, La Rochelle University, France
  7. Paper Title: Modality Fusion Adaptor-Enhanced Vision Transformer for Multimodal Action Recognition
    Affiliation: School of Computer Science and Technology, Xidian University, China
  8. Paper Title: An Effective End-to-End Solution for Multimodal Action Recognition
    Affiliations: Nanjing University, China; Nanjing University of Science and Technology, China
  9. Paper Title: Evolution of Hybrid Multi-Modal Action Recognition: From DA-CNN+Bi-GRU to EfficientNet-CNN-ViT
    Affiliations: Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, India; Founding Minds Software
20:30 Closing Remarks

Meeting Info

Call for papers

We invite submissions presenting new and original research on topics including but not limited to the following:

Paper Submission and Review

ICPR-2024 will follow a single-blind review process, so authors may include their names and affiliations in the manuscript. Papers that are already available on arXiv may also be submitted to ICPR-2024. Note: Please submit your paper by email by October 15, 2024, 11:59 pm Anywhere on Earth.

Paper Format and Length

Submissions must use the Springer LNCS format, with a maximum of 15 pages (including references) at the time of paper submission; there is no minimum page limit. To address reviewers’ comments, one additional page is allowed (free of charge) in the revised/camera-ready submission. Authors may also purchase up to 2 extra pages; extra-page charges must be paid at the time of registration. Springer LNCS formatting instructions and templates for ICPR-2024 are available here: DOC and LaTeX.

Supplementary materials

By the submission deadline, the authors may optionally submit additional material that was ready at the time of paper submission but could not be included due to constraints of format or space. The authors should refer to the contents of the supplementary material appropriately in the paper. Reviewers will be encouraged to look at it, but are not obligated to do so.

Supplementary material may include videos, proofs, additional figures or tables, or more detailed analyses of the experiments presented in the paper. There is no page limit for supplementary material, but only a single file with a maximum size of 50 MB may be submitted.

Where possible, we encourage authors to upload their code as part of the supplementary material to help reviewers assess the quality of the work.

Overview

The workshop aims to foster collaboration and exchange of ideas among researchers from different domains, including pattern recognition, computer vision, machine learning, signal processing, and artificial intelligence. By addressing technical issues such as feature heterogeneity, data fusion, and cross-modal correlation modeling, the workshop aims to advance the state-of-the-art in multi-modal visual pattern recognition and promote the development of innovative solutions for real-world applications. The topics of interest include but are not limited to:

Multi-modal visual pattern recognition has become increasingly important in various domains, including surveillance, robotics, healthcare, and multimedia analysis. The ability to integrate information from multiple modalities enables a more robust and comprehensive understanding of complex real-world environments. As such, the workshop on Multi-Modal Visual Pattern Recognition with Challenge Tracks is highly relevant to the community. By incorporating challenge tracks into the workshop, participants have the opportunity to benchmark their algorithms and techniques against state-of-the-art methods in multi-modal pattern recognition. This not only fosters healthy competition but also encourages the development of novel approaches and solutions to the challenges in the field. Furthermore, the workshop provides a unique platform for researchers to showcase their work, share insights, and engage in discussions on emerging trends and future directions in multi-modal visual pattern recognition. The challenge platform provides datasets, evaluation metrics, baseline algorithms, and an evaluation server.

Challenge

The workshop will feature three challenge tracks, each focusing on a specific aspect of multi-modal pattern recognition. To participate, please fill out this online Multi-Modal Visual Pattern Recognition Challenge Datasets Request Form.

Track 1: Multi-Modal Tracking: This track aims to address the technical challenges associated with tracking objects using multi-modal data. You can participate in Track 1 through the link.

Track 2: Multi-Modal Detection: The goal of this track is to explore techniques for detecting objects of interest in multi-modal data streams. You can participate in Track 2 through the link.

Track 3: Multi-Modal Action Recognition: This track focuses on recognizing human actions or activities from multi-modal data sources. You can participate in Track 3 through the link.

Details

The Multi-Modal Visual Pattern Recognition Workshop will feature three challenge tracks. The datasets for the tracks involve modalities including RGB, infrared thermal, depth, and event. The details of each track are as follows:

Track 1: Multi-Modal Tracking
This track aims to address the technical challenges associated with tracking objects in multi-modal data. The dataset for this task comprises 500 multi-modal videos, with 400 allocated for training purposes and the remaining 100 for testing.

Track 2: Multi-Modal Detection
The goal of this track is to explore techniques for detecting objects of interest in multi-modal data streams. The dataset for this task comprises 5000 multi-modal images in total, with 4000 images allocated for training and the remaining 1000 images for testing.

Track 3: Multi-Modal Action Recognition
This track focuses on recognizing human actions from multi-modal data sources. The dataset for this track contains 2500 multi-modal videos (2000 for training and 500 for test) spanning across 20 action classes.

Note: The top-3 teams in each track are required to submit a workshop paper describing their solutions. The workshop presents awards to the top-3 teams in each track, as well as 3 best research paper awards.
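For reference, the sketch below shows one way a participant might pair the per-modality files of a sample before training. It is a minimal illustration only: the directory layout (one sub-folder per modality named rgb, thermal, depth, and event), the convention that files belonging to the same sample share a file stem, and the helper name pair_multimodal_samples are all assumptions made for this example, not the official format of the released challenge datasets.

from pathlib import Path

# Assumed (hypothetical) layout, e.g. for Track 2 (detection):
#   track2/train/rgb/0001.png
#   track2/train/thermal/0001.png
#   track2/train/depth/0001.png
#   track2/train/event/0001.npy
MODALITIES = ["rgb", "thermal", "depth", "event"]

def pair_multimodal_samples(root, split="train"):
    """Return one dict per sample, mapping each available modality to its file."""
    split_dir = Path(root) / split
    samples = []
    for rgb_path in sorted((split_dir / "rgb").glob("*")):
        sample = {"rgb": rgb_path}
        for modality in MODALITIES[1:]:
            # Look up the companion file by shared stem; extensions may differ.
            matches = sorted((split_dir / modality).glob(rgb_path.stem + ".*"))
            if matches:
                sample[modality] = matches[0]
        samples.append(sample)
    return samples

if __name__ == "__main__":
    train_samples = pair_multimodal_samples("track2", "train")  # expect 4000 for Track 2
    print(f"{len(train_samples)} paired training samples")

Tracks 1 and 3 are video-based, so the same pairing idea would apply at the level of per-video folders rather than single image files.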

Important Dates

Organizing Committee

Tianyang Xu, Jiangnan University
Xiao-Jun Wu, Jiangnan University
Josef Kittler, University of Surrey
Umapada Pal, Indian Statistical Institute
Jiwen Lu, Tsinghua University
Xi Li, Zhejiang University
Vasile Palade, Coventry University

Challenge Group

Xuefeng Zhu, Jiangnan University
Linze Li, Jiangnan University
Xiao Yang, Jiangnan University
Yifan Pan, Jiangnan University
Minzhi Li, Jiangnan University
Han Zang, Jiangnan University
Youchen Xie, Jiangnan University