Machine Learning: Interpretability vs Explainability
Authors: Navya Tyagi & Esha Srivastava, Tyagi Lab, RMIT University, Melbourne, Australia
Introduction
Machine learning models are built on vast amounts of pre-existing data and are constantly learning, and they are meant to assist humans in making complex decisions. However, there are still many misconceptions and uncertainties about how much trust to place in machine learning when it comes to decision-making, especially when those decisions could affect an individual's life. But what if there were a way to understand, or at least backtrack, how a model reaches its decisions? That is what this blog is about: interpretability and explainability.
We often come across these terms and may use them interchangeably, but it is important to understand the distinction between them. Interpretability refers to converting the implicit information inside a neural network into human-interpretable insights. This matters for debugging, for making discoveries, and for providing clear explanations; in healthcare, for example, it can help clinicians understand how a diagnosis was reached, which in turn improves trust and treatment outcomes. In other words, interpretability is about asking "Why?": why did the model produce this result for this image?
On the other hand, explainability deals with the "How": how the process takes place and what mechanisms are involved in arriving at a result for a particular image. Explainability therefore addresses the processes behind the results, allowing humans to follow and understand the outcomes. The two terms overlap, and they converge on the same goal: making sense of the mechanisms inside machine learning models.
1. Interpretability
Interpretability involves understanding deep learning models to gain insight into why they behave in a certain way. This lets us pinpoint or backtrack the process when something goes wrong, and understand why a certain decision was made or a specific result obtained.
For example, in disease diagnosis, it is crucial to understand why a doctor prescribes a particular medicine—not just because a machine learning model suggested it, but to understand the reasoning behind the recommendation and how it would be effective for treatment.
1.1 Building an ML Model
The typical approach to machine learning involves building a large dataset, utilizing significant computational power, and designing a robust model. What truly matters is how useful the model is in tasks such as classification, classification with localization, object detection, and instance segmentation.
It is important to extract interpretable information from these "black boxes" to verify that the model works as expected and to improve or debug the classifier. Gaining insight into what the machine learning model is doing enables collaboration with doctors and clinicians to understand the decision-making process, which can lead to new discoveries.
Furthermore, it is essential that the model can provide explanations for its outputs, as there are often biases embedded in the training data that may influence decision-making.
There are two types of machine learning:
Standard machine learning: This approach involves collecting data, building a machine learning model, and generating predictions. It focuses on evaluating training error, test error, and generalization error.
Interpretable machine learning: This approach emphasizes extracting interpretability from the machine learning model. It allows a human to inspect what the model is learning and understand its behavior. Feedback can be obtained through the data or the model itself by interpreting mistakes and analyzing what is happening inside the black-box model. The goal is not only to optimize for generalization error but also to enhance the human experience—making the model's decisions interpretable and understandable for humans.
1.2 Types of Interpretability
There are two types of interpretability:
1.2.1 Ante-hoc interpretability – Ante-hoc methods design interpretable models from the outset, such as decision trees or rule-based models. In other words, we choose an interpretable model, train it, and check whether it is expressive enough to represent the data. Every time a decision is made, it can be traced through the decision tree to understand why that decision was made (a minimal sketch follows after these two definitions).
1.2.2 Post-hoc interpretability – Post-hoc methods analyze complex models after they have been trained, using dedicated techniques to interpret their behavior. Deep learning models have millions of parameters, so for post-hoc interpretability we first build the complex model and then develop special techniques to work out what it is actually doing.
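As a minimal sketch of the ante-hoc route (assuming scikit-learn and its built-in Iris dataset as a stand-in for a real task), the snippet below trains a shallow decision tree and prints its rules, so every prediction can be traced through a handful of human-readable comparisons:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy stand-in for a real dataset.
iris = load_iris()

# Ante-hoc: the model is interpretable by construction (a depth-3 tree).
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

# Every prediction can be traced through these if/else rules.
print(export_text(tree, feature_names=list(iris.feature_names)))
```

The post-hoc route, in contrast, keeps the complex model and adds an interpretation step afterwards, as in the surrogate and gradient sketches later in this post.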
1.3 Levels of Interpretability
1.3.1 Interpreting models (macroscopic) – This level is called macroscopic because we look at the model's overall decision boundary and summarize the deep neural network with a simpler model (e.g., a decision tree). We approximate not the reality of the world but the decision logic of the model: the model is presumably learning a cleaned-up version of the world, and we aim to approximate the model rather than reality. This includes building prototypical examples of categories and finding inputs that maximize the activation of individual neurons at different levels of the hierarchy, to better understand the model's internal representations.
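To make the "maximize the activation of individual neurons" idea concrete, here is a minimal sketch (assuming PyTorch; the tiny untrained network and the choice of hidden unit are placeholders for a real trained model). It runs gradient ascent on the input, rather than on the weights, to find a pattern that a chosen unit responds to most strongly:

```python
import torch
import torch.nn as nn

# Placeholder network; in practice this would be a trained model.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
model.eval()

# Activation maximization: gradient ascent on the input to find the pattern
# that most excites hidden unit 5 of the first layer.
x = torch.zeros(1, 10, requires_grad=True)
optimizer = torch.optim.Adam([x], lr=0.1)

for _ in range(200):
    optimizer.zero_grad()
    activation = model[0](x)[0, 5]        # response of the chosen unit
    loss = -activation + 0.01 * x.norm()  # maximize activation, keep x bounded
    loss.backward()
    optimizer.step()

print(x.detach().squeeze())  # the input pattern the unit "prefers"
```

For an image classifier, the same loop run over pixel values produces the prototype-like visualizations mentioned above.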
1.3.2 Interpreting decisions (microscopic) – At the microscopic level, we ask why the DNN made a particular decision, verify that the model is indeed behaving as expected, and identify the evidence underlying that decision. This level is the more important one for practical applications.
There are several ways to interpret models through representation analysis. Some of them are as follows:
Weight visualization – Visualizing the weights of a neural network helps in understanding which features contribute to the model's decisions. For instance, rendering the weights of the early convolutional filters as images shows which low-level patterns the network has learned to detect.
Surrogate model – A simple, interpretable model is trained to mimic the predictions of the neural network, giving an approximate but human-readable view of its decision logic (see the sketch after this list).
Data generation – Data generation, also known as activation maximization, generates inputs that maximize the activation of particular units, revealing which patterns those neurons respond to most strongly (as sketched in Section 1.3.1 above).
Example-based – Example-based methods identify specific instances from the training data that influenced a model’s decision.
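The following sketch illustrates the surrogate idea under simple assumptions (scikit-learn, a synthetic dataset, and a small MLP standing in for the black box). The key point is that the shallow decision tree is fitted to the black box's predictions rather than to the true labels, so it approximates the model's decision logic, and its agreement with the black box (its "fidelity") indicates how far to trust that approximation:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data standing in for a real task.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)

# 1. Train the "black box" we want to interpret.
black_box = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                          random_state=0).fit(X, y)

# 2. Fit a simple surrogate to the black box's *predictions*, not the labels,
#    so the tree approximates the model's decision logic.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how often the surrogate agrees with the black box.
fidelity = accuracy_score(black_box.predict(X), surrogate.predict(X))
print(f"surrogate fidelity: {fidelity:.2f}")
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(8)]))
```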
Although models help in understanding and interpreting the underlying processes, model interpretability has limitations—primarily in (1) finding a prototype and (2) providing decision explanations.
Now that we have understood various types of model interpretability, let’s look at how neural network decisions can be interpreted:
Example-based – This approach asks which training instance influences decisions the most. It is useful for interpreting black-box methods and studying the fragility of neural network models.
Attribution methods – These assign an attribution value to each input feature, quantifying how much it contributed to a particular prediction.
Gradient-based – These methods use the gradient of the output with respect to the input as a saliency map; because raw gradients are often noisy, variants such as SmoothGrad average the gradient over noisy copies of the input (see the sketch after this list).
Backpropagation-based attribution – This includes techniques such as deconvolution, which approximately reverses the operations of the convolutional layers and propagates signals back toward the input to highlight the features responsible for an activation.
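As an illustration of gradient-based attribution, the sketch below (assuming PyTorch and a small placeholder model in place of a real trained network) computes a vanilla saliency map as the gradient of a class score with respect to the input, plus a SmoothGrad-style variant that averages the gradient over noisy copies of the input to reduce noise:

```python
import torch
import torch.nn as nn

# Placeholder classifier standing in for a trained network.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
model.eval()

def saliency(x, target_class):
    """Vanilla gradient attribution: d(class score)/d(input)."""
    x = x.clone().requires_grad_(True)
    model(x)[0, target_class].backward()
    return x.grad.abs().squeeze()

def smoothgrad(x, target_class, n_samples=25, sigma=0.1):
    """SmoothGrad: average the gradient over noisy copies of the input."""
    grads = [saliency(x + sigma * torch.randn_like(x), target_class)
             for _ in range(n_samples)]
    return torch.stack(grads).mean(dim=0)

x = torch.randn(1, 10)  # one (random) input example
print(saliency(x, target_class=1))
print(smoothgrad(x, target_class=1))
```

The per-feature values indicate how sensitive the chosen class score is to each input feature; for images, reshaping them to the image dimensions gives the familiar saliency heatmaps.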
2. Explainability
Explainability is a set of processes that help humans understand the workings of AI systems. In AI-powered decision-making, it aids in describing model correctness, fairness, transparency, and results. When implementing AI models in production, companies need explainable AI to foster confidence and trust. Organizations can adopt a responsible approach to AI development with the help of explainability.
The increasing sophistication of AI makes it harder for humans to understand and trace the algorithm's path to a conclusion. The computation turns into what is often called a "black box": a model generated directly from the data whose inner workings are not open to inspection. Often, not even the data scientists or engineers who developed the algorithm can comprehend or explain how the AI arrived at a particular conclusion.
Understanding how an AI system produced a certain result has several benefits. Explainability may be required to comply with regulatory requirements, assist developers in ensuring the system works as intended, or enable individuals affected by a decision to contest or request a change in its outcome.
2.1 Why Explainable AI Matters
Explainable AI matters across many domains. It aids in understanding AI decision-making processes, thus promoting accountability and monitoring models rather than trusting them blindly. Machine learning models are often thought of as “black boxes,” which raises questions about whether they are making the right decisions. Explainable AI helps to understand these mechanisms.
ML models can exhibit biases and stereotypes based on factors like age, gender, socio-economic status, color, or race. Therefore, it is crucial to recognize such patterns in decision-making. Additionally, because training and production data often differ, AI model performance may deteriorate or drift over time. For this reason, organizations must monitor and manage models to promote explainability and assess the financial impact of using such algorithms.
Explainable AI also supports model auditability, end-user trust, and effective application. It reduces legal, security, compliance, and reputational risks associated with production AI. Organizations must build AI systems based on trust and transparency to incorporate ethical principles and support responsible AI adoption.
To understand what explainable AI adds, it helps to briefly contrast it with conventional AI and to look at what each delivers and how.
While AI often uses machine learning algorithms to arrive at conclusions (without necessarily understanding the path taken), XAI (Explainable AI) employs specific strategies and procedures to ensure that every choice made during the ML process can be tracked and justified. Lack of explainability results in limited control, accountability, and auditability, and makes accuracy checks challenging.
2.2 Setting Up XAI Approaches
There are three primary ways to set up XAI approaches:
Prediction accuracy – Accuracy is crucial in evaluating an AI system's performance. It can be assessed by running simulations and comparing the XAI output with the outcomes on the training data. A commonly used technique is LIME (Local Interpretable Model-Agnostic Explanations), which explains an individual prediction of a classifier by fitting a simple, interpretable model around it (a minimal sketch follows after this list).
Traceability – This is achieved by narrowing the scope of ML rules and restricting decision pathways. An example is DeepLIFT (Deep Learning Important FeaTures), which compares each neuron's activation to a reference activation and shows a traceable link and its dependencies.
Decision understanding – This human-centric aspect builds trust in AI systems by helping users understand the reasoning behind AI decisions.
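To ground the LIME mention above, here is a minimal sketch (assuming the third-party lime package together with scikit-learn, a random forest standing in for the black box, and the built-in breast-cancer dataset). LIME perturbs the chosen instance, observes how the black box's predictions change, and fits a small weighted linear model whose coefficients serve as the explanation:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer  # pip install lime

# A random forest as the black box to be explained.
data = load_breast_cancer()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# Explain one prediction: LIME fits a local linear model around this instance.
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=5
)
for feature, weight in explanation.as_list():
    print(f"{feature}: {weight:+.3f}")
```

Each (feature, weight) pair shows how strongly that feature pushed the prediction for this instance toward or away from class 1 ("benign" here); the explanation is local, so it can differ from instance to instance.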
In simple terms:
Interpretability refers to the extent to which an observer can understand the reasoning behind a model's decision.
Explainability goes a step further—it focuses on how the AI arrived at that outcome, and the likelihood that humans can predict and trust the model's output.
2.3 Benefits of Explainable AI
Promotes trust and confidence in AI adoption
Enables faster outcomes with reduced risk and model costs
Supports critical sectors like healthcare, finance, and law
Enhances model auditability, transparency, and accountability
3. Conclusion
To conclude, it is important for us humans to understand the mechanisms behind what these black-box models do—how the decision-making process takes place, and if something goes wrong, what exactly went wrong.
For instance, consider a car accident: why did the car turn left when it was supposed to turn right to avoid a collision? Situations like this involve multiple stakeholders, and understanding the model's decision becomes crucial not only for the individual using it (in this case, the driver) but also for the associated stakeholders who play significant roles in accountability and safety.
In another example, if a doctor prescribes medication to a patient for back pain, the clinician cannot justify the decision by simply saying, “The model told me to.” The patient has every right to question the authenticity and ethics of that decision. The patient may demand an explanation—why was this particular prescription recommended by the model? This process of demanding and receiving a rationale is known as recourse.
Therefore, understanding interpretability and explainability in AI is essential. Without knowing the why and the how behind a model’s decision, such technology cannot truly serve or benefit humanity.
References:
- Interpretable Deep Learning, Deep Learning in Life Sciences, Lecture 05 (Spring 2021)
- What is explainable AI? IBM. https://www.ibm.com/think/topics/explainable-ai