
Workshop Description

The overall goal of this workshop is to bring researchers, academics, professionals, and policymakers under a single umbrella to develop innovative data engineering methods that make the most of the limited data available in the medical domain. Our past two editions focused on several key issues in data engineering.

Contributions to the workshop are compiled and published in the proceedings. For example, Khanal et al. (2023), from an earlier edition, proposed pre-training models with self-supervised learning and then applying methods for learning from noisy labels. Poudel et al. (2024) and Thrasher et al. (2024) proposed task-aware active learning methods that sample the most informative unlabeled data, reducing the number of training examples needed by 50%. Dener et al. (2023), Reyes-Amezcua et al. (2024), and Rau et al. (2023) focused on curating both task-aware and task-unaware synthetic data and on addressing biases in synthetic data. Babu et al. (2024) highlighted biases in data augmentation, and Pokhrel et al. (2024) explored the use of data augmentation for out-of-distribution detection.

These contributions developed innovative, principled methods that integrate different aspects of data engineering to maximize the benefit of available data, which is central to this workshop's theme. As Ilya Sutskever remarked in his talk at NeurIPS 2024, "Computers are advancing with better hardware, algorithms, and clusters." Data, however, which he called the "fossil fuel" of AI, has reached its growth limit. His observation underscores the growing importance of AI tools and techniques that make effective use of limited data.

Data-driven deep learning architectures such as UNet, VGGNet, ResNet, V-Net, DenseNet, and Vision Transformers are widely used in downstream tasks such as detection, classification, and 3D reconstruction. These architectures require large volumes of annotated data to train their millions of parameters, and such data are difficult and expensive to collect and annotate. In medical image analysis, standard sensors are often unsuitable for in vivo use. Even when they are suitable, data collection requires patient consent and lengthy acquisition procedures, and the large inter-rater variability among annotators makes high-quality labels even harder to obtain.

To address these challenges, data engineering methods such as geometric transformations (e.g., rotation, flipping, cropping), MixUp, and Cutout have been introduced over the past decade to expand training datasets, although they are limited to a single modality. Such contributions remain infrequent compared with architectural engineering, yet they have proven effective in improving model generalization.

In recent years, there has been a growing trend toward leveraging multimodal data. Large language models, vision-language models, and multimodal generative models have been used to synthesize multimodal content and expand training datasets. Although these models can generate large volumes of synthetic data, its fidelity and quality are often insufficient, leading to only modest improvements or even a deterioration in model generalization. Moreover, existing data engineering methods are predominantly designed for unimodal datasets, which underscores the need to extend them to handle multimodal data effectively.

Workshop Themes

  1. Data Augmentation in the Medical Domain: This subtheme covers data augmentation in the medical domain through geometric transformations, simulated data from phantoms, generative models, large language models, and multimodal data. It also investigates methods for designing application-aware data augmentation policies.

  2. Active Learning and Active Synthesis: This subtheme focuses on methods for identifying the most discriminative and diverse subsets of unlabeled unimodal and multimodal data to train models for various clinical applications. Active synthesis involves generating synthetic data relevant to the target application.

  3. Self-Supervised Learning: This subtheme explores methods for designing application-specific pretext tasks for pre-training models in a self-supervised manner. Generic pretext tasks are often suboptimal for downstream tasks, highlighting the need for tailored approaches for both unimodal and multimodal datasets.

  4. Datasets and Benchmarking for Data Engineering: This subtheme explores datasets and benchmarks specifically designed for developing, assessing, and validating data engineering methods in both unimodal and multimodal setups, including the validation of newly generated samples. Standard metrics are known to be suboptimal for this purpose, and expert ratings are highly subjective and depend on the medical application.

Imaging Themes (not limited to):

Optical imaging, endoscopy, OCT, histopathology, hyperspectral imaging, opto-acoustics, fundus imaging, CT, PET, MRI, X-ray, ultrasound, new imaging biomarkers, multimodal imaging, synthetic data of various imaging types, other imaging

Clinical Applications (not limited to):

Surgical data science, classification, detection, and diagnosis in medical image analysis, organ/instrument/lesion segmentation, image registration and data fusion, image reconstruction, prognosis and prediction, tissue characterisation, biological image analysis

Organs (not limited to):

Brain, Head and neck, Liver, Gastrointestinal tract diseases, Lungs

Keynote speakers

Coming Soon

Important Dates

Coming Soon

Submission

Coming Soon

Proceedings



Accepted papers will be published in Lecture Notes in Computer Science (LNCS) as a separate DEMI 2025 (MICCAI Workshop) proceedings volume.

Organising committee

Binod Bhattarai
University of Aberdeen, UK
Anita Rau
Stanford University, USA
Razvan Caramalau
Digital Technologies, Medtronic, UK
Annika Reinke
German Cancer Research Center (DKFZ), Germany
Anh Nguyen
University of Liverpool, UK
Ana Namburete
University of Oxford, UK
Prashnna Gyawali
West Virginia University, USA
Danail Stoyanov
University College London, UK

Technical Program Committee

TBD

Web and Publicity Chairs

  • Anju Chhetri, NAAMII, Nepal
  • Sandesh Pokhrel, NAAMII, Nepal

Sponsors

We are seeking additional academic and industrial sponsors. Please contact us for more details: demiworkshop23@gmail.com

Past Iterations

  • DEMI@MICCAI2024
  • DEMI@MICCAI2023