The workshop on Multi-Modal Visual Pattern Recognition aims to provide a comprehensive platform for researchers and practitioners to discuss recent advancements, challenges, and opportunities in the field of multi-modal visual pattern recognition. The workshop is held in conjunction with the 27th International Conference on Pattern Recognition (ICPR 2024).
Workshop
- Date: December 1st 2024
- Time: 17:00-20:30 Beijing Time (CST)
Agenda
Time (CST) | Session Title |
---|---|
17:00-17:30 | Session 1: Workshop Reports & Challenge Results |
17:30-18:00 | Session 2: Invited Talk 1 - Dr. Han Xu (Southeast University, China) |
18:00-18:30 | Session 3: Invited Talk 2 - Dr. Hui Li (Jiangnan University, China) |
18:30-19:00 | Coffee Break |
19:00-20:30 | Session 4: Winner Presentations & Contributed Papers |
20:30 | Closing Remarks |
Meeting Info
- Virtual link: Link.
- Virtual meeting info: Microsoft Teams ID: 444 031 699 394, Password: xe6iP2n7
Call for papers
We invite submissions presenting new and original research on topics including but not limited to the following:
- Cross-Modal Learning and Representation
- Multi-Modal Object Detection and Tracking
- Multi-Modal Video Understanding
- Multi-Modal Machine Learning in Healthcare
- Few-Shot and Zero-Shot Learning in Multi-Modal Systems
- Applications of Multi-Modal Learning in Social Media
Paper Submission and Review
ICPR-2024 will follow a single-blind review process, so authors may include their names and affiliations in the manuscript. Authors may also submit papers that have already been posted on arXiv. Note: Please submit your paper to the email by October 15, 2024, 11:59 pm Anywhere on Earth (AoE).
Paper Format and Length
Papers must follow the Springer LNCS format, with a maximum of 15 pages (including references) at submission. There is no minimum page limit. To address reviewers’ comments, one additional page is allowed (free of charge) in the revised/camera-ready submission. In addition, authors may purchase up to 2 extra pages. Extra page charges must be paid at the time of registration. Springer LNCS paper formatting instructions and templates for ICPR-2024 are available here: DOC and LaTeX.
Supplementary materials
By the submission deadline, the authors may optionally submit additional material that was ready at the time of paper submission but could not be included due to constraints of format or space. The authors should refer to the contents of the supplementary material appropriately in the paper. Reviewers will be encouraged to look at it, but are not obligated to do so.
Supplementary material may include videos, proofs, additional figures or tables, or more detailed analysis of experiments presented in the paper. There is no page limit for supplementary material, but only one file, with a maximum size of 50 MB, may be submitted.
We encourage authors, where possible, to upload their code as part of the supplementary material to help reviewers assess the quality of the work.
Overview
The workshop aims to foster collaboration and the exchange of ideas among researchers from different domains, including pattern recognition, computer vision, machine learning, signal processing, and artificial intelligence. By addressing technical issues such as feature heterogeneity, data fusion, and cross-modal correlation modeling, it seeks to advance the state of the art in multi-modal visual pattern recognition and promote the development of innovative solutions for real-world applications. The topics of interest include but are not limited to:
- Integration of multiple modalities (such as images, videos, text, audio, and other sensor data) for pattern recognition tasks.
- Novel algorithms and techniques for multi-modal feature extraction, representation learning, and fusion.
- Applications of multi-modal visual pattern recognition in various domains, including computer vision, multimedia analysis, biometrics, healthcare, robotics, and more.
- Evaluation methodologies and benchmark datasets for assessing the performance of multi-modal visual pattern recognition systems.
Multi-Modal Visual Pattern Recognition has become increasingly important in various domains, including surveillance, robotics, healthcare, and multimedia analysis. The ability to integrate information from multiple modalities enables a more robust and comprehensive understanding of complex real-world environments. As such, the workshop on Multi-Modal Visual Pattern Recognition with Challenge Tracks is highly relevant and of interest to the community. By incorporating challenge tracks into the workshop, participants will have the opportunity to benchmark their algorithms and techniques against state-of-the-art methods in multi-modal pattern recognition. This not only fosters healthy competition but also encourages the development of novel approaches and solutions to address the challenges in the field. Furthermore, the workshop provides a unique platform for researchers to showcase their work, share insights, and engage in discussions on emerging trends and future directions in multi-modal visual pattern recognition. The platform includes datasets, evaluation metrics, baseline algorithms, and an evaluation server.
Challenge
The workshop will feature three challenge tracks, each focusing on a specific aspect of multi-modal pattern recognition. To participate, please fill out this online Multi-Modal Visual Pattern Recognition Challenge Datasets Request Form.
Track 1: Multi-Modal Tracking: This track aims to address the technical challenges associated with tracking objects using multi-modal data. You can participate in Track 1 through the link.
Track 2: Multi-Modal Detection: The goal of this track is to explore techniques for detecting objects of interest in multi-modal data streams. You can participate in Track 2 through the link.
Track 3: Multi-Modal Action Recognition: This track focuses on recognizing human actions or activities from multi-modal data sources. You can participate in Track 3 through the link.
Details
The Multi-Modal Visual Pattern Recognition Workshop will feature three challenge tracks. The datasets for the tracks involve modalities including RGB, infrared thermal, depth, and event. The details of each track are as follows:
Track 1: Multi-Modal Tracking
This track aims to address the technical challenges associated with tracking objects in multi-modal data. The dataset for this task comprises 500 multi-modal videos, with 400 allocated for training purposes and the remaining 100 for testing.
Track 2: Multi-Modal Detection
The goal of this track is to explore techniques for detecting objects of interest in multi-modal data streams. The dataset for this task comprises 5000 multi-modal images in total, with 4000 images allocated for training and the remaining 1000 images for testing.
Track 3: Multi-Modal Action Recognition
This track focuses on recognizing human actions from multi-modal data sources. The dataset for this track contains 2500 multi-modal videos (2000 for training and 500 for testing) spanning 20 action classes.
Note: The top-3 teams in each track are required to submit a workshop paper describing their respective solutions. The workshop offers awards for the top-3 teams in each track, as well as 3 best research paper awards.
Important Dates
- Challenge Open (Training & Test dataset release): July 26, 2024
- Results Submission Deadline: October 07, 2024
- Paper Submission Deadline: October 15, 2024
- Notification to Authors: October 27, 2024
- Workshop: December 01, 2024
Organizing Committee
- Tianyang Xu, Jiangnan University
- Xiao-Jun Wu, Jiangnan University
- Josef Kittler, University of Surrey
- Umapada Pal, Indian Statistical Institute
- Jiwen Lu, Tsinghua University
- Xi Li, Zhejiang University
- Vasile Palade, Coventry University
- Xuefeng Zhu, Jiangnan University
- Linze Li, Jiangnan University
- Xiao Yang, Jiangnan University
- Yifan Pan, Jiangnan University
- Minzhi Li, Jiangnan University
- Han Zang, Jiangnan University
- Youchen Xie, Jiangnan University
Challenge Group
- If you have any questions about the challenge, you can discuss it in the Google Group.
- You can also join the WeChat Group by scanning the QR code.