- The Clinical Explainable AI Guidelines provide design and evaluation criteria that support XAI design and evaluation for clinical use.
- Explanation form is chosen based on G1 (Understandability) and G2 (Clinical relevance).
- Explanation method is chosen based on G3 (Truthfulness) and G4 (Informative plausibility).
- Evaluations on two medical datasets showed existing heatmap methods met G1, partially met G2, but failed G3 and G4.
- We propose the novel problem of multi-modal medical image explanation, along with metrics for it.
- arXiv: Guidelines and Evaluation for Clinical Explainable AI in Medical Image Analysis. Under revision at Medical Image Analysis, 2022.
A precursor of this work was published at the AAAI 22 Social Impact Track:
Evaluating Explainable AI on a Multi-Modal Medical Imaging Task: Can Existing Algorithms Fulfill Clinical Requirements?
- AAAI: Evaluating Explainable AI on a Multi-Modal Medical Imaging Task: Can Existing Algorithms Fulfill Clinical Requirements? Proceedings of the AAAI Conference on Artificial Intelligence, June 2022.
Acceptance rate: 15%
The overarching problem is how to design and evaluate explainable AI in real-world, high-stakes domains. We propose a novel problem in the medical domain, multi-modal medical image explanation, and use it as an example to demonstrate our evaluation process, which incorporates both technical and clinical requirements.
Our evaluation focuses on commonly used heatmap methods, chosen for their end-user understandability; it covers both gradient-based and perturbation-based methods.
Based on the explanation goals in real-world critical tasks, we set two primary evaluation objectives: faithfulness and plausibility. Three faithfulness evaluations show that none of the examined algorithms faithfully represented the AI model's decision process at the feature level. The plausibility evaluation shows that users' assessments of how plausible an explanation is are not indicative of model decision quality.
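To illustrate what a feature-level faithfulness evaluation can look like, here is a minimal deletion-style check: pixels are masked from most to least salient, and a faithful heatmap should produce a steep drop in the model's score. This is only a sketch of the general idea; the function names, the toy model, and the specific protocol are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def deletion_curve(model, image, saliency, steps=10, baseline=0.0):
    """Mask pixels from most to least salient and record the model score.

    If the saliency map is faithful, the score should fall quickly as the
    'important' pixels are removed (lower area under the curve = better).
    Illustrative sketch, not the paper's exact evaluation protocol.
    """
    order = np.argsort(saliency, axis=None)[::-1]  # flat indices, most salient first
    flat = image.flatten().astype(float)
    n = flat.size
    scores = [model(flat.reshape(image.shape))]     # score with nothing masked
    for k in range(1, steps + 1):
        masked = flat.copy()
        masked[order[: int(n * k / steps)]] = baseline
        scores.append(model(masked.reshape(image.shape)))
    return np.array(scores)

# Toy "model": responds only to the mean intensity of the top-left quadrant.
def toy_model(img):
    return img[:4, :4].mean()

rng = np.random.default_rng(0)
img = rng.random((8, 8))
good_map = np.zeros((8, 8))
good_map[:4, :4] = 1.0  # a heatmap that points at the true evidence
curve = deletion_curve(toy_model, img, good_map)
```

Because `good_map` highlights exactly the region the toy model uses, masking its top pixels first collapses the score early in the curve; an unfaithful map would leave the score high for longer.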
Our systematic evaluation provides a roadmap and concrete objectives for designing and evaluating explainable AI in critical tasks.
Link to the previous work-in-progress paper: One Map Does Not Fit All.
- ICML-w: One Map Does Not Fit All: Evaluating Saliency Map Explanation on Multi-Modal Medical Images. ICML 2021 Workshop on Interpretable Machine Learning in Healthcare, 2021.
Spotlight paper (top 10%), oral presentation