One Map Does Not Fit All
Evaluating AI Explanation on Multi-Modal Medical Image Task
- Arxiv Paper, Video paper presentation, Slides
- The work-in-progress paper was accepted to the ICML 2021 Workshop on Interpretable Machine Learning in Healthcare as a spotlight paper.
Related Publication
- The full paper is available in: Guidelines and evaluation for clinical explainable AI
Introduction
Being able to explain predictions to clinical end-users is a necessity for leveraging the power of AI models in clinical decision support. For medical images, saliency maps are the most common form of explanation: they highlight the features that are important for the AI model's prediction. Although many saliency map methods have been proposed, it is unknown how well they explain decisions on multi-modal medical images, where each modality/channel carries distinct clinical meanings of the same underlying biomedical phenomenon.
Understanding such modality-dependent features is essential for clinical users to interpret AI decisions. However, we do not know whether existing saliency maps can fulfill this particular clinical requirement on multi-modal images. Therefore, we pose this clinically motivated requirement to the technical community: the need for modality-specific explanations that align with clinical prior knowledge, or simply, multi-modal explanation.
Methods
As a first step in this direction, we propose evaluation metrics and conduct experiments on existing saliency map methods regarding their multi-modal explanation properties. We included 16 commonly used methods in the evaluation, covering activation-based, gradient-based, and perturbation-based explainable AI approaches.
The evaluation was conducted on a brain tumor classification task with two datasets: the BraTS dataset, which comes from real patients, and a synthetic dataset in which we have better control over the ground truth.
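As an illustration of how the maps are produced, a gradient-based saliency map can be computed per modality channel with an attribution library such as Captum. The sketch below is only illustrative: the 3D classifier, input shapes, and target class are placeholders, not our actual experimental setup.

```python
import torch
from captum.attr import IntegratedGradients

# Placeholder 3D classifier over a 4-channel (e.g. T1, T1c, T2, FLAIR) MRI volume;
# the architecture and shapes are illustrative only.
model = torch.nn.Sequential(
    torch.nn.Conv3d(4, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool3d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(8, 2),
).eval()

mri = torch.randn(1, 4, 32, 32, 32)  # (batch, modality, depth, height, width)

# Integrated Gradients assigns an attribution score to every voxel of every
# modality channel, so the resulting map can be inspected modality by modality.
ig = IntegratedGradients(model)
attributions = ig.attribute(mri, target=1)  # attribution w.r.t. class index 1
print(attributions.shape)                   # torch.Size([1, 4, 32, 32, 32])
```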
With the generated saliency maps, we first conducted a user study with doctors to elicit their clinical requirements for, and ratings of, the saliency maps generated on multi-modal MRI.
Given the task, doctors tend to prioritize the modalities that are important for the prediction, and they expect the AI to correctly localize the discriminative features.
To capture these clinical requirements, we propose the computational metric MSFI (Modality-Specific Feature Importance), which encodes both modality prioritization and modality-specific feature localization.
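As a rough intuition (a minimal sketch rather than the exact formulation in the paper): given per-modality importance weights and ground-truth feature masks, an MSFI-style score rewards saliency maps whose saliency mass falls inside the ground-truth features of the modalities that matter for the prediction. The NumPy sketch below illustrates this idea; the variable names and normalization details are illustrative assumptions.

```python
import numpy as np

def msfi_sketch(saliency, masks, weights):
    """Sketch of a Modality-Specific Feature Importance (MSFI) style score.

    saliency : array (M, H, W), non-negative saliency map per modality
    masks    : array (M, H, W), binary ground-truth feature mask per modality
    weights  : array (M,), modality importance weights (e.g. from modality ablation)

    Returns a score in [0, 1]: high when saliency mass concentrates inside the
    ground-truth features of the modalities that matter for the prediction.
    """
    saliency = np.clip(saliency, 0, None)
    per_modality = []
    for s, m in zip(saliency, masks):
        total = s.sum()
        # fraction of this modality's saliency that falls inside its feature mask
        per_modality.append((s * m).sum() / total if total > 0 else 0.0)
    per_modality = np.asarray(per_modality)
    return float((weights * per_modality).sum() / (weights.sum() + 1e-8))

# Toy example: saliency falls entirely inside the important modality's features.
sal = np.zeros((2, 8, 8)); sal[0, 2:4, 2:4] = 1.0
msk = np.zeros((2, 8, 8)); msk[0, 2:4, 2:4] = 1.0
w = np.array([1.0, 0.2])
print(msfi_sketch(sal, msk, w))  # ~0.83: high, as the important modality is well localized
```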
Results
Our evaluation shows that, although most saliency map methods captured modality importance information in general, most failed to highlight modality-specific important features consistently and precisely.
Conclusions
We pose a clinically motivated problem to the technical community: explaining AI decisions on multi-modal medical images.
We also propose the evaluation metric MSFI and conducted experiments to evaluate saliency maps on their ability to fulfill the clinical requirements. Based on the MSFI metric, there are discrepancies between current saliency map methods and the clinical requirements for multi-modal image explanation.
Significance
Applying MSFI could help evaluate and select saliency map methods before clinical deployment. It also provides clinical insights for proposing new AI methods, such as methods that incorporate the clinical requirements on multi-modal image explanation into model training.