TLDR: A clinical user study with 35 neurosurgeons found that AI assistance improved physicians' glioma grading performance, but the additional AI explanation did not further help in the collaborative doctor-AI clinical setting.
Clinical evaluation evidence and model explainability are key gatekeepers to ensure the safe, accountable, and effective use of artificial intelligence (AI) in clinical settings. We conducted a clinical user-centered evaluation with 35 neurosurgeons to assess the utility of AI assistance and its explanation on the glioma grading task. Each participant read 25 brain MRI scans of patients with gliomas, and gave their judgment on the glioma grade without and then with the assistance of AI prediction and explanation. The AI model was trained on the BraTS dataset with 88.0% accuracy. The AI explanation was generated using the SmoothGrad explainable AI algorithm, which was selected from 16 algorithms based on the criterion of being truthful to the AI decision process. Results showed that, compared to the average accuracy of 82.5±8.7% when physicians performed the task alone, physicians' task performance increased to 87.7±7.3% with statistical significance (p-value = 0.002) when assisted by AI prediction, and remained at almost the same level of 88.5±7.0% (p-value = 0.35) with the additional assistance of AI explanation. Based on quantitative and qualitative results, the observed improvement in physicians' task performance with AI prediction was mainly because physicians' decision patterns converged to be similar to the AI's, as physicians switched their decisions only when disagreeing with the AI. The insignificant change in physicians' performance with the additional assistance of AI explanation was because the AI explanations did not provide explicit reasons, contexts, or descriptions of clinical features to help doctors discern potentially incorrect AI predictions. The evaluation showed the clinical utility of AI in assisting physicians on the glioma grading task, and identified the limitations and clinical usage gaps of existing explainable AI techniques for future improvement.
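For readers unfamiliar with SmoothGrad, below is a minimal, illustrative sketch (not the study's actual implementation) of how such a heatmap explanation can be generated for a multi-modal 3D MRI classifier, assuming a PyTorch model and a recent version of the Captum library; the function name and parameter values are placeholders.

```python
# Minimal sketch (not the paper's code): generating a SmoothGrad heatmap for a
# multi-modal 3D MRI glioma-grading classifier, assuming PyTorch and Captum.
from captum.attr import Saliency, NoiseTunnel

def smoothgrad_heatmap(model, mri_volume, target_class, n_samples=25, noise_std=0.1):
    """mri_volume: tensor of shape (1, C, D, H, W), one channel per MRI modality."""
    model.eval()
    # SmoothGrad: average the saliency (input gradients) over noisy copies of the input.
    nt = NoiseTunnel(Saliency(model))
    attribution = nt.attribute(
        mri_volume,
        nt_type="smoothgrad",
        nt_samples=n_samples,
        stdevs=noise_std,
        target=target_class,
    )
    # One heatmap per modality/channel, for overlay on the corresponding image.
    return attribution.abs().squeeze(0)  # shape: (C, D, H, W)
```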
2023
MedIA
Guidelines and evaluation of clinical explainable AI in medical image analysis
Jin, Weina, Li, Xiaoxiao, Fatehi, Mostafa, and Hamarneh, Ghassan
TLDR: The Clinical XAI Guidelines provide criteria that explanations should fulfill for critical decision support.
Explainable artificial intelligence (XAI) is essential for enabling clinical users to get informed decision support from AI and comply with evidence-based medical practice. Applying XAI in clinical settings requires proper evaluation criteria to ensure the explanation technique is both technically sound and clinically useful, but specific support is lacking to achieve this goal. To bridge the research gap, we propose the Clinical XAI Guidelines that consist of five criteria a clinical XAI needs to be optimized for. The guidelines recommend choosing an explanation form based on Guideline 1 (G1) Understandability and G2 Clinical relevance. For the chosen explanation form, its specific XAI technique should be optimized for G3 Truthfulness, G4 Informative plausibility, and G5 Computational efficiency. Following the guidelines, we conducted a systematic evaluation on a novel problem of multi-modal medical image explanation with two clinical tasks, and proposed new evaluation metrics accordingly. Sixteen commonly-used heatmap XAI techniques were evaluated and found to be insufficient for clinical use due to their failure in G3 and G4. Our evaluation demonstrated the use of Clinical XAI Guidelines to support the design and evaluation of clinically viable XAI.
MethodsX
Generating post-hoc explanation from deep neural networks for multi-modal medical image analysis tasks
Jin, Weina, Li, Xiaoxiao, Fatehi, Mostafa, and Hamarneh, Ghassan
TLDR: Describes the methods for generating AI heatmap explanations for multi-modal medical images.
Explaining model decisions from medical image inputs is necessary for deploying deep neural network (DNN) based models as clinical decision assistants. The acquisition of multi-modal medical images is pervasive in practice for supporting the clinical decision-making process. Multi-modal images capture different aspects of the same underlying regions of interest. Explaining DNN decisions on multi-modal medical images is thus a clinically important problem. We adopt commonly-used post-hoc artificial intelligence feature attribution methods to explain DNN decisions on multi-modal medical images, spanning two categories: gradient-based and perturbation-based methods.
• Gradient-based explanation methods – such as Guided BackProp and DeepLift – utilize the gradient signal to estimate the feature importance for model prediction.
• Perturbation-based methods – such as occlusion, LIME, and kernel SHAP – utilize input-output sampling pairs to estimate the feature importance.
• We describe the implementation details on how to make these methods work for multi-modal image input, and make the implementation code available.
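As a rough illustration of the two categories (not the paper's released implementation), the sketch below applies one gradient-based method (Guided BackProp) and one perturbation-based method (occlusion) to a classifier whose input stacks each modality as a channel, assuming PyTorch and Captum; the function name, window sizes, and strides are assumptions.

```python
# Illustrative sketch only: post-hoc attribution for a multi-modal image classifier
# where each MRI modality is one input channel, using Captum's gradient- and
# perturbation-based methods. The released code for the paper may differ.
from captum.attr import GuidedBackprop, Occlusion

def explain_multimodal(model, x, target):
    """x: (1, C, D, H, W) tensor, one channel per modality; returns per-modality maps."""
    model.eval()
    # Gradient-based: backpropagate the class score to every input voxel.
    grad_map = GuidedBackprop(model).attribute(x, target=target)
    # Perturbation-based: slide an occluding patch over the volume and record how the
    # prediction changes; the patch covers one modality at a time, so the resulting
    # importance is modality-specific.
    occ_map = Occlusion(model).attribute(
        x,
        target=target,
        sliding_window_shapes=(1, 8, 8, 8),  # (channels, depth, height, width)
        strides=(1, 4, 4, 4),
        baselines=0,
    )
    # Keep the maps separated per modality (channel) so each can be overlaid
    # on its own image for clinical inspection.
    return grad_map.squeeze(0), occ_map.squeeze(0)
```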
2022
AAAI
Evaluating Explainable AI on a Multi-Modal Medical Imaging Task: Can Existing Algorithms Fulfill Clinical Requirements?
Jin, Weina, Li, Xiaoxiao, and Hamarneh, Ghassan
Proceedings of the AAAI Conference on Artificial Intelligence Jun 2022
TLDR: Our systematic evaluation showed that the examined 16 heatmap algorithms failed to fulfill clinical requirements to correctly indicate the AI model decision process or decision quality.
Being able to explain the prediction to clinical end-users is a necessity to leverage the power of artificial intelligence (AI) models for clinical decision support. For medical images, a feature attribution map, or heatmap, is the most common form of explanation that highlights important features for AI models’ prediction. However, it is unknown how well heatmaps perform on explaining decisions on multi-modal medical images, where each image modality or channel visualizes distinct clinical information of the same underlying biomedical phenomenon. Understanding such modality-dependent features is essential for clinical users’ interpretation of AI decisions. To tackle this clinically important but technically ignored problem, we propose the modality-specific feature importance (MSFI) metric. It encodes clinical image and explanation interpretation patterns of modality prioritization and modality-specific feature localization. We conduct a clinical requirement-grounded, systematic evaluation using computational methods and a clinician user study. Results show that the examined 16 heatmap algorithms failed to fulfill clinical requirements to correctly indicate AI model decision process or decision quality. The evaluation and MSFI metric can guide the design and selection of explainable AI algorithms to meet clinical requirements on multi-modal explanation.
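The exact MSFI formulation is defined in the paper; the sketch below is only a simplified reading of the abstract's description (modality prioritization combined with modality-specific feature localization) and may differ from the published metric. All array shapes, the weighting scheme, and the function name are assumptions.

```python
# Simplified, MSFI-style score sketch based only on the abstract's description;
# not the paper's exact metric.
import numpy as np

def msfi_like_score(heatmap, feature_masks, modality_weights):
    """
    heatmap:          (C, D, H, W) non-negative attribution, one map per modality
    feature_masks:    (C, D, H, W) binary masks of modality-specific important regions
    modality_weights: (C,) clinical importance of each modality, in [0, 1]
    """
    heatmap = np.clip(heatmap, 0, None)
    scores = []
    for m in range(heatmap.shape[0]):
        total = heatmap[m].sum()
        # Fraction of this modality's attribution falling inside its clinically
        # important region (localization); 0 if the modality received no attribution.
        localization = heatmap[m][feature_masks[m] > 0].sum() / total if total > 0 else 0.0
        scores.append(modality_weights[m] * localization)
    # The weighted average rewards heatmaps that both prioritize the right modalities
    # and localize features within them.
    return float(np.sum(scores) / (np.sum(modality_weights) + 1e-8))
```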
What Explanations Do Doctors Require From Artificial Intelligence?
TLDR: The precursor of the AAAI 2022 paper "Evaluating Explainable AI on a Multi-Modal Medical Imaging Task: Can Existing Algorithms Fulfill Clinical Requirements?"
Being able to explain the prediction to clinical end-users is a necessity to leverage the power of AI models for clinical decision support. For medical images, saliency maps are the most common form of explanation. The maps highlight important features for AI model’s prediction. Although many saliency map methods have been proposed, it is unknown how well they perform on explaining decisions on multi-modal medical images, where each modality/channel carries distinct clinical meanings of the same underlying biomedical phenomenon. Understanding such modality-dependent features is essential for clinical users’ interpretation of AI decisions. To tackle this clinically important but technically ignored problem, we propose the MSFI (Modality-Specific Feature Importance) metric to examine whether saliency maps can highlight modality-specific important features. MSFI encodes the clinical requirements on modality prioritization and modality-specific feature localization. Our evaluations on 16 commonly used saliency map methods, including a clinician user study, show that although most saliency map methods captured modality importance information in general, most of them failed to highlight modality-specific important features consistently and precisely. The evaluation results guide the choices of saliency map methods and provide insights to propose new ones targeting clinical applications.
arXiv
EUCA: the End-User-Centered Explainable AI Framework
TLDR: EUCA provides design suggestions from the end-users' perspective on explanation forms and goals.
The ability to explain decisions to end-users is a necessity to deploy AI as critical decision support. Yet making AI explainable to non-technical end-users is a relatively ignored and challenging problem. To bridge the gap, we first identify twelve end-user-friendly explanatory forms that do not require technical knowledge to comprehend, including feature-, example-, and rule-based explanations. We then instantiate the explanatory forms as prototyping cards in four AI-assisted critical decision-making tasks, and conduct a user study to co-design low-fidelity prototypes with 32 layperson participants. The results confirm the relevance of using explanatory forms as building blocks of explanations, and identify their properties: pros, cons, applicable explanation goals, and design implications. The explanatory forms, their properties, and prototyping supports (including a suggested prototyping process, design templates and exemplars, and associated algorithms to actualize explanatory forms) constitute the End-User-Centered explainable AI framework EUCA, which is available at http://weinajin.github.io/end-user-xai. It serves as a practical prototyping toolkit for HCI/AI practitioners and researchers to understand user requirements and build end-user-centered explainable AI.
2020
JNE
Artificial Intelligence in Glioma Imaging: Challenges and Advances
TLDR: A review of the challenges and advances of implementing AI in clinical settings (including the interpretable AI challenge) in the neuro-oncology domain.
Primary brain tumors including gliomas continue to pose significant management challenges to clinicians. While the presentation, the pathology, and the clinical course of these lesions are variable, the initial investigations are usually similar. Patients who are suspected to have a brain tumor will be assessed with computed tomography (CT) and magnetic resonance imaging (MRI). The imaging findings are used by neurosurgeons to determine the feasibility of surgical resection and plan such an undertaking. Imaging studies are also an indispensable tool in tracking tumor progression or its response to treatment. As these imaging studies are non-invasive, relatively cheap and accessible to patients, there have been many efforts over the past two decades to increase the amount of clinically-relevant information that can be extracted from brain imaging. Most recently, artificial intelligence (AI) techniques have been employed to segment and characterize brain tumors, as well as to detect progression or treatment-response. However, the clinical utility of such endeavours remains limited due to challenges in data collection and annotation, model training, and the reliability of AI-generated information. We provide a review of recent advances in addressing the above challenges. First, to overcome the challenge of data paucity, different image imputation and synthesis techniques along with annotation collection efforts are summarized. Next, various training strategies are presented to meet multiple desiderata, such as model performance, generalization ability, data privacy protection, and learning with sparse annotations. Finally, standardized performance evaluation and model interpretability methods have been reviewed. We believe that these technical approaches will facilitate the development of a fully-functional AI tool in the clinical care of patients with gliomas.
2019
IEEE VIS poster
Bridging AI Developers and End Users: an End-User-Centred Explainable AI Taxonomy and Visual Vocabularies
Jin, Weina, Carpendale, Sheelagh, Hamarneh, Ghassan, and Gromala, Diane
In IEEE VIS 2019 Conference Poster Abstract Apr 2019
TLDR: We conducted a literature review and summarized end-user-friendly explanation forms as visual vocabularies. This is the precursor of the EUCA framework.
Researchers in the re-emerging field of explainable/interpretable artificial intelligence (XAI) have not paid enough attention to the end users of AI, who may be lay persons or domain experts such as doctors, drivers, and judges. We took an end-user-centric lens and conducted a literature review of 59 technique papers on XAI algorithms and/or visualizations. We grouped the existing explanatory forms in the literature into the end-user-friendly XAI taxonomy. It consists of three forms that explain AI's decisions: feature attribute, instance, and decision rules/trees. We also analyzed the visual representations for each explanatory form, and summarized them as the XAI visual vocabularies. Our work is a synergy of XAI algorithm, visualization, and user-centred design. It provides a practical toolkit for AI developers to define the explanation problem from a user-centred perspective, and expand the visualization space of explanations to develop more end-user-friendly XAI systems.
2018
IEEE GEM
Automatic Prediction of Cybersickness for Virtual Reality Games
Jin, Weina, Fan, Jianyu, Gromala, Diane, and Pasquier, Philippe
In 2018 IEEE Games, Entertainment, Media Conference (GEM) Apr 2018
TLDR: Applying machine learning to predict cybersickness.
Cybersickness, which is also called Virtual Reality (VR) sickness, poses a significant challenge to the VR user experience. Previous work demonstrated the viability of predicting cybersickness for VR 360° videos. Is it possible to automatically predict the level of cybersickness for interactive VR games? In this paper, we present a machine learning approach to automatically predict the level of cybersickness for VR games. First, we proposed a novel ranking-rating (RR) score to measure the ground-truth annotations for cybersickness. We then verified the RR scores by comparing them with the Simulator Sickness Questionnaire (SSQ) scores. Next, we extracted features from heterogeneous data sources including the VR visual input, head movement, and individual characteristics. Finally, we built three machine learning models and evaluated their performance: a Convolutional Neural Network (CNN) trained from scratch, a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) trained from scratch, and Support Vector Regression (SVR). The results indicated that the best cybersickness prediction performance was obtained by the LSTM-RNN, providing a viable solution for automatic cybersickness prediction for interactive VR games.
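As a loose illustration of the LSTM-RNN approach described above (not the paper's implementation; the feature set, dimensions, architecture, and score range here are assumptions), a minimal PyTorch regressor mapping per-frame feature sequences to a continuous cybersickness score might look like this:

```python
# Minimal sketch: an LSTM regressor from a time series of per-frame features
# (e.g., head-movement and visual statistics) to a continuous cybersickness score.
import torch
import torch.nn as nn

class CybersicknessLSTM(nn.Module):
    def __init__(self, n_features=16, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # regress a single sickness score

    def forward(self, x):            # x: (batch, time_steps, n_features)
        _, (h_n, _) = self.lstm(x)   # final hidden state: (1, batch, hidden)
        return self.head(h_n[-1]).squeeze(-1)

model = CybersicknessLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One training step on a dummy batch of 8 sessions, 120 time steps each.
features = torch.randn(8, 120, 16)
rr_scores = torch.rand(8)            # ground-truth ranking-rating (RR) scores, here scaled to [0, 1]
optimizer.zero_grad()
loss = loss_fn(model(features), rr_scores)
loss.backward()
optimizer.step()
```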
2017
CSCW poster
A Collaborative Visualization Tool to Support Doctors’ Shared Decision-Making on Antibiotic Prescription
Jin, Weina, Gromala, Diane, Neustaedter, Carman, and Tong, Xin
In Companion of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing Apr 2017
TLDR: A visualization prototype to support asynchronous collaboration among healthcare professionals.
The inappropriate prescription of antibiotics may cause severe medical outcomes such as antibiotic resistance. To prevent such situations and facilitate appropriate antibiotic prescribing, we designed and developed an asynchronous collaborative visual analytics tool. It visualizes the antibiotics' coverage spectrum and allows users to choose the most appropriate antibiotics. The asynchronous collaboration around the visualization mimics actual collaboration scenarios in clinical settings, and provides supportive information during physicians' decision-making process. Our work contributes to the CSCW community by providing a design prototype to support asynchronous collaboration among healthcare professionals, which is crucial but lacking in many present clinical decision support systems.
2016
CHI poster
AS IF: A Game as an Empathy Tool for Experiencing the Activity Limitations of Chronic Pain Patients
Jin, Weina, Ulas, Servet, and Tong, Xin
In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems Apr 2016
TLDR: The design of an empathy game for people with chronic pain.
Pain is both a universal and a unique experience for its sufferers. Nonetheless, pain is so invisible and incommunicable that it becomes difficult for the public to understand or even believe the suffering, especially for the persistent form of pain: Chronic Pain. Therefore, we designed and developed the game – AS IF – to foster non-patients' empathy for Chronic Pain sufferers. In this game, players engage with connecting-the-dots tasks through whole-body interaction. After they form a connection with their virtual body, they experience a certain degree of activity limitation that mimics one of the sufferings of Chronic Pain. In this paper, we introduce the game design that facilitates the enhancement of empathy for the Chronic Pain experience, and illustrate how this game acts as a communication medium that may help to enhance understanding.
2015
IEEE GEM poster
Serious game for serious disease: Diminishing stigma of depression via game experience
Jin, Weina, Gromala, Diane, and Tong, Xin
In 2015 IEEE Games Entertainment Media Conference (GEM) Apr 2015
TLDR: The design of an empathy game for people with depression.
Stigma is a common and serious problem for patients who suffer from depression and other mental illnesses. We designed a serious game to address this problem. The game enables players to experience and strive to overcome the disempowering aspects of depression during the journey to recovery. Through the game's interaction, players may gain a better understanding of the relationship between the patient and the disease, which in turn may help replace the players' moral model of depression with a disease model, and thereby diminish the stigma of depression.