Introduction to XAI

XAI Course Notes

In this introductory lecture on explainability in AI, we delve into the key topics that surround this emerging field. We first provide an overview of the motivation for explainability, exploring how it helps us achieve more transparent and trustworthy AI systems, particularly from a managerial perspective. We then define some of the key terminology in the field and differentiate between black-box explanation and interpretable ML. We discuss the differences between global and local explanations, and include many examples from different fields and use cases throughout the lecture. Next, we examine the “built-in” feature importance methods commonly used for regression and trees, and discuss their strengths and limitations. Overall, this lecture provides a comprehensive introduction to explainability in AI, covering the key topics and terminology essential for understanding the field.

explainable AI
XAI
machine learning
ML
data science
counterfactuals
causal inference
CI
Authors
Affiliations

Rada Menczel

Maya Bercovitch

Published

Sunday, March 5, 2023

Course

course poster

Course Goals

  • The XAI course provides a comprehensive overview of explainable AI,
    • covering both theory and practice, and
    • exploring various use cases for explainability.
  • Participants will learn how
    • to generate explanations,
    • to evaluate explanations, and
    • to communicate explanations effectively to diverse stakeholders.


Session Description

  • Motivate explainability.
    • Explore how it achieves greater transparency and trustworthiness in AI systems.
  • Define the key terminology.
  • Discuss the differences between global and local explanations
  • Examine the “built-in” feature importance methods commonly used for regression and trees.

Session Video

Speakers

Introduction to XAI

What is Explainability?

Explainability Definition

What do we mean by Explainability?

  • We define explainability as:

“The ability of an AI system or algorithm to explain its decision making process in a way that humans can understand” 1

An explanation is the answer to a why question – (Miller 2017)

What do we mean by Explainability?

The capacity of a model to back its predictions with a human-understandable interpretation of the impact of the inputs on those predictions.

  • What humans find understandable differs widely.
  • Learning in ML can differ greatly:
    • Parametric models learn a handful of parameters,
    • Non-parametric models may learn billions.
  • Explanations are subjective
    • Artifacts of the model, not the data
    • Reflect any inductive bias in the model 2

Agenda

Talk Agenda

  • Motivation
  • What is XAI
  • Introduction to trees
  • XAI in the forest

Motivation

  • The AI market is expanding rapidly and is projected to reach $1.6 trillion by 2030 (Research, n.d.)
  • More ML projects are reaching deployment

Motivation

AI market size in 2022 from (Research, n.d.)

Motivation

Gartner on AI Project deployments

Motivation

AI adoption in the enterprise 2021

How can XAI be useful?

XAI to Avoid Biases in ML Models

Avoiding ML bias in (Dastin 2018)

  • The source of the bias: the model was trained on 10 years of workers’ CVs. Unsurprisingly, the workforce had a bias and the model perpetuated it.

XAI to Avoid Biases in ML Models

  • XAI can reveal bias before models reach production.
  • Example:
    • A US-based client started doing business abroad.
    • New non-US prospects were misclassified.
    • 🤯 XAI showed the country feature biased the model against non-US prospects.
    • \implies the country feature was dropped from the model.

XAI to Avoid Biases in ML Models - Comments 1

  • Devil’s Advocate: 😈
    • Q. Why add a feature like country if all activity is in one country?
    • Q. Why drop it? Won’t country be an informative feature going forward?
    • Q. Won’t this be an issue for each new country added?
    • \implies Partial Pooling can learn to strike a balance 🤔

XAI to Avoid Biases in ML Models - Comments 2

  • What is an unbiased estimator?
    • estimators are unbiased w.r.t. some specific criteria.
    • there is a bias variance trade-off.
    • which is worse depends on the cost of type I errors vs type II errors

XAI to Avoid Biases in ML Models - Comments 3

  • adding more criteria will reduce the model’s performance on the main metric (i.e. increase variance).
  • people tend to prefer a biased estimator with small variance to an unbiased one with high variance.
  • it can look like a class imbalance problem, for which there are well-known solutions like re-sampling and weighting (see the sketch after this list).
  • the datasets in upstream models may be the issue:
    • how can we detect and correct bias in these models?
    • ignoring for the moment the costs of sourcing better data, what do we do when the bias comes from the real world (e.g. the gender pay gap)?
  • and how can we avoid amplifying the bias?
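A minimal sketch of the re-weighting idea with scikit-learn, using made-up labels; "balanced" weights each class inversely to its frequency:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# hypothetical labels: 1 = converted, 0 = did not convert
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))  # the minority class gets the larger weight

# or simply let the estimator re-weight samples during training
clf = LogisticRegression(class_weight="balanced")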

XAI to Avoid Biases in ML Models

Chat GPT political bias

Response

XAI to Avoid Biases in ML Models

  • Predicting which prospective customers will convert:
    • the current market is in the US,
    • model accuracy on the test set is high, yet
    • the distribution of predictions over time is off.
  • What to do next?

Feature selection

XAI for feature selection.
  • One learns in linear regression 101 that the \text{adjusted } R^2 lets you gauge the performance of models built with different features. This means we should already have a principled approach to feature selection.

  • the most obvious method – stepwise regression – is prone to overfitting if there are many features, and the Bonferroni point 3 which governs the admissibility of non-spurious features is \approx \sqrt{2\log p} for the t-test (where p is the number of predictors). However, this threshold will also reject good features.

  • the Benjamini–Hochberg procedure is less conservative: it controls the false discovery rate rather than the family-wise error rate (see the sketch after this list).

  • In black-box models like deep neural networks the model learns its own features, so again I don’t see how XAI is going to be able to help out.

  • Gelman and Hill (2007) point out that adding features to a regression can lead to a regression formula that does not make sense. They suggest a procedure that leads to an interpretable model. However, the culture in ML is rather different from that in statistical learning.

  • If we work with a causal DAG we may well have even more to say about which features to include.

  • Q. So what more can XAI tell us about feature selection?
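A minimal sketch of the Benjamini–Hochberg correction via statsmodels, with made-up p-values:

import numpy as np
from statsmodels.stats.multitest import multipletests

# hypothetical p-values from per-feature significance tests
pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205])

# control the false discovery rate at 5%
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(reject)  # which features survive the correction
print(p_adj)   # BH-adjusted p-values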

XAI for Investigating Bugs 1

Investigating Bugs

XAI for Investigating Bugs 2

Investigating Bugs

XAI to support business decisions

  • External data consumption to improve prediction
  • Explainability to craft a personalized, well-suited sales pitch

Who Needs Explanations?

Who needs explanations

Explaining the Data vs. Explaining the Model

Feature Description

  • Characteristics of the input data
  • E.g.:
    • Feature correlation
    • Anomalies & Extreme values
    • Feature values distribution

Feature Contribution

  • Feature’s impact on predictions
  • Not necessarily aligned with the feature’s correlation to the target variable
  • E.g.:
    • Feature importance in trees
    • SHAP values


Properties of Explanations

White Box

  • An interpretable model.
  • Humans can understand how the model makes predictions.
  • Examples:
    • linear and logistic regression
    • decision tree

Black Box

  • Do not reveal their internal mechanisms
  • Cannot be understood by looking at their parameters
  • Examples:
    • Deep Neural Nets
    • XGBoost, Random Forest

Properties of Explanation Methods

  • Predictive model interpretation level
  • Explanation creation time
  • Model agnostic vs. model specific
  • Global and local explanations
  • Explanation structure
  • Explanation reproducibility

Performance & Interpretability Trade-off

spectrum of interpretability

Performance & Interpretability Trade-off

The trade-off between predictive power and interpretability

Intrinsic & Extrinsic Methods

Intrinsic

  • ML models that are considered interpretable due to their simple structure.
  • Explanation methods that rely on looking into the model’s internals, such as its parameters.
  • No additional complexity or resources required.

Extrinsic

  • Applying methods that analyze the model after training.

  • Post hoc methods can also be applied to intrinsically interpretable models.

  • Additional complexity - XAI algorithms and computation resources are required.

Post Hoc XAI using Surrogate Models

Post Hoc methods create and use a surrogate model to explain predictions
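A minimal sketch of a global surrogate, assuming scikit-learn and the Iris data: the surrogate is trained to mimic the black box’s predictions, not the true labels.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# the black box we want to explain
black_box = RandomForestClassifier(random_state=0).fit(X, y)

# an interpretable model fitted to the black box's predictions
surrogate = DecisionTreeClassifier(max_depth=2, random_state=0)
surrogate.fit(X, black_box.predict(X))

# fidelity: how often the surrogate agrees with the black box
print(surrogate.score(X, black_box.predict(X)))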

Model Specific & Model Agnostic Methods

Model Specific

  • Limited to specific model type.
  • Examples:
    • Regression weights in a linear model
    • GINI importance score in a decision tree

Model Agnostic

  • XAI tools that work with any ML model
  • Post hoc methods that map input-output pairs
  • Examples:
    • SHAP
    • LIME

Local and Global Methods

Contrasting Global with Local views of the data

Explain the Predictions of a Segment

A segment of the data

Explanation Structure

Explanation Structure - Numerical, Graphs and Text

Graphical Representations for SHAP

SHAP visualizations: a global feature importance plot; a bee-swarm plot showing the global importance of each feature and the distribution of effect sizes; and a dependence plot.

SHAP Visualizations
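A minimal sketch of producing such plots with the shap package, assuming the classic TreeExplainer API (return shapes vary slightly across shap versions):

import shap
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True, as_frame=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# bee-swarm summary: global importance and distribution of effect sizes
# (with the classic API, multiclass models yield one array per class)
shap.summary_plot(shap_values, X)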

Explanation Reproducibility

  • Most post hoc techniques use random samples of the data and permutation values.

  • This results in inconsistent results - for the same model we can get different explanations.

  • As data scientists we should be aware of this and consider consistency where applicable/required.


Part 1 Summary

  • The demand for XAI is high

  • XAI can be achieved in many ways

  • Think about the set of considerations discussed above before choosing a method4

  • Choose wisely


Decision Trees

Why Decision Trees?

  • Easy to explain.

  • Clear structure - order and hierarchy.

  • Simple interpretability.

  • Can be converted into rules.

  • Often used as a surrogate model.


How do we build Decision Trees?

Entropy - a measure of the impurity or randomness in the data

Information Theory: Entropy
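For a discrete random variable with outcome probabilities p_1,\ldots,p_n, the entropy is

H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i

It is zero when one outcome is certain and maximal when all outcomes are equally likely.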


Information Theory: Conditional Entropy
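Conditional entropy measures the uncertainty that remains in Y once X is known:

H(Y \mid X) = \sum_{x} p(x)\, H(Y \mid X = x) = -\sum_{x,y} p(x,y) \log_2 p(y \mid x)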


Information Theory: Mutual Information
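Mutual information is the reduction in uncertainty about Y obtained by observing X:

I(X;Y) = H(Y) - H(Y \mid X)

In decision trees, this quantity computed between a candidate feature and the label is exactly the information gain of a split.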



Decision Tree


Example

The running example is the classic play-tennis (weather) dataset: 14 days, of which 9 are labelled Play = Yes and 5 are labelled Play = No.

Decision Tree - Entropy Calculation


\begin{aligned} Entropy(Play)&= -p_{No}\log_2(p_{No})-p_{Yes}\log_2(p_{Yes})\\ &= -\frac{5}{14}\log_2{\frac{5}{14}}-\frac{9}{14}\log_2{\frac{9}{14}}\\ &=0.94 \end{aligned} \qquad \tag{1}
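A quick sanity check of Equation 1 in Python:

from math import log2

p_no, p_yes = 5 / 14, 9 / 14
entropy = -p_no * log2(p_no) - p_yes * log2(p_yes)
print(round(entropy, 2))  # 0.94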


Information Theory: Discretization
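Continuous features are handled by discretizing: candidate thresholds (the midpoints between sorted feature values) are scored by information gain and the best split is kept. A minimal sketch of this idea, with made-up temperature data:

import numpy as np

def entropy(y):
    # Shannon entropy of a label vector, in bits
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_threshold(x, y):
    # score every midpoint between sorted unique values of a continuous
    # feature and keep the split with the highest information gain
    base = entropy(y)
    values = np.unique(x)
    mids = (values[:-1] + values[1:]) / 2
    best, best_gain = None, -1.0
    for t in mids:
        left, right = y[x <= t], y[x > t]
        w = len(left) / len(y)
        gain = base - (w * entropy(left) + (1 - w) * entropy(right))
        if gain > best_gain:
            best, best_gain = t, gain
    return best, best_gain

# hypothetical continuous feature (temperature) and play/no-play labels
x = np.array([64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85])
y = np.array([1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0])
print(best_threshold(x, y))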


Information Theory: Gini Index
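For class proportions p_1,\ldots,p_C at a node, the Gini index is

Gini = 1 - \sum_{i=1}^{C} p_i^2

Like entropy, it is zero for a pure node and maximal for a uniform class mix, but it avoids computing logarithms.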


Entropy and Gini
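For a binary node with class probability p, \text{Entropy}(p) = -p\log_2 p - (1-p)\log_2(1-p) peaks at 1 bit at p = 0.5, while \text{Gini}(p) = 2p(1-p) peaks at 0.5. Both are zero for pure nodes, so in practice they usually select very similar splits.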


Decision Tree - Iris Dataset
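A minimal sketch of fitting such a tree to Iris with scikit-learn and printing its rules:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# the tree is directly readable as a set of if-then rules
print(export_text(tree, feature_names=list(iris.feature_names)))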


Decision Tree - Titanic Dataset


Feature Importance - Mean Decrease in Impurity (MDI)

First introduced in (Breiman 2001b)
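One common formulation: the MDI importance of feature j sums, over every node t that splits on j, the impurity decrease weighted by the fraction of samples n_t/N reaching that node:

\text{MDI}(j) = \sum_{t\,:\,\text{split}(t)=j} \frac{n_t}{N}\left[ i(t) - \frac{n_{t_L}}{n_t}\, i(t_L) - \frac{n_{t_R}}{n_t}\, i(t_R) \right]

where i(t) is the impurity (entropy or Gini) of node t, and t_L, t_R are its children.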

Mean Decrease in Impurity Feature Importance


MDI Feature Importance Calculation
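In scikit-learn this is what feature_importances_ exposes (normalized to sum to 1); a minimal sketch on Iris:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(iris.data, iris.target)

# MDI: summed, weighted impurity decreases per feature
for name, imp in zip(iris.feature_names, tree.feature_importances_):
    print(f"{name}: {imp:.3f}")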


Feature Importance - Permutation Feature Importance

This is defined by scikit-learn as follows:

  • Inputs: fitted predictive model m, tabular dataset (training or validation) D.
  • Compute the reference score s of the model m on data D (for instance the accuracy for a classifier or the R^2 for a regressor).
  • For each feature j (column of D):
    • For each repetition k in 1,\ldots,K:
      • Randomly shuffle column j of dataset D to generate a corrupted version of the data named \bar D_{k,j}.
      • Compute the score s_{k,j} of model m on corrupted data \bar D_{k,j}.
    • Compute the importance i_j for feature j, defined as:

i_j = s - \frac{1}{K}\sum_{k=1}^{K} s_{k,j} \qquad \tag{2}
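A minimal sketch of running this procedure with scikit-learn’s permutation_importance:

from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# shuffle each column K=10 times and measure the mean drop in accuracy
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
print(result.importances_mean)  # the i_j of Equation 2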



Random Forest

Introduced in (Ho 1995) and extended in (Breiman 2001a).

  • Ensemble of decision trees.

    • N – number of training samples

    • M – number of features

    • n_estimators – The number of trees in the forest

    • Create n_estimators decision trees using

      • N samples drawn with replacement (bootstrap), and

      • m < M features at each split, typically m \approx \sqrt{M}
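A minimal sketch of these hyper-parameters in scikit-learn:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# n_estimators trees, each grown on a bootstrap sample of the N rows,
# considering ~sqrt(M) candidate features at every split
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            bootstrap=True, random_state=0)
rf.fit(X, y)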


wisdom of crowds

Decision Tree - Iris Dataset

MDI Feature Importance on the Iris Dataset

How to calculate Feature Importance in a Random Forest?

MDI Feature importance for Random Forest
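As commonly implemented (e.g., in scikit-learn), a forest’s MDI importance for feature j is simply the average of the per-tree importances over the T trees:

\text{MDI}_{\text{forest}}(j) = \frac{1}{T}\sum_{t=1}^{T} \text{MDI}_t(j)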


Feature Importance Methods


Feature Importance Score


Summary

  • Motivation
  • Explain XAI
  • Introduction to Decision Trees
  • XAI in the Forest

Thank You

contact

credits

References

  • https://www.youtube.com/watch?v=6qisPX7o-bg
Breiman, Leo. 2001a. “Random Forests.” Machine Learning 45 (1): 5–32. https://doi.org/10.1023/a:1010933404324.
———. 2001b. “Random Forests.” Machine Learning 45 (1): 5–32. https://doi.org/10.1023/a:1010933404324.
Dastin, Jeffrey. 2018. “Amazon Scraps Secret AI Recruiting Tool That Showed Bias Against Women.” https://www.reuters.com/article/amazoncom-jobs-automation/insight-amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSL2N1VB1FQ/?feedType=RSS%26feedName=companyNews.
Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Vol. Analytical methods for social research. New York: Cambridge University Press.
Ho, Tin Kam. 1995. “Random Decision Forests.” In Proceedings of 3rd International Conference on Document Analysis and Recognition, 1:278–282 vol.1. https://doi.org/10.1109/ICDAR.1995.598994.
Miller, Tim. 2017. “Explanation in Artificial Intelligence: Insights from the Social Sciences.” CoRR abs/1706.07269. http://arxiv.org/abs/1706.07269.
Molnar, Christoph. 2022. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 2nd ed. https://christophm.github.io/interpretable-ml-book.
Research, Precedence. n.d. https://www.globenewswire.com/news-release/2022/04/19/2424179/0/en/Artificial-Intelligence-Market-Size-to-Surpass-Around-US-1-597-1-Bn-By-2030.html.

Footnotes

  1. Unfortunately, this is a circular definition.↩︎

  2. Trees are highly sensitive to small changes in the data↩︎

  3. The Bonferroni point, or adjusted p-value, is the point at which you need to adjust the p-value threshold due to multiple comparisons when performing feature selection. In simpler terms, it’s about accounting for the increased chance of falsely identifying significant features when you test many features simultaneously.↩︎

  4. The last lecture provides some insights and charts to assist this step!↩︎

Reuse

CC SA BY-NC-ND

Citation

BibTeX citation:
@online{bochman2023,
  author = {Bochman, Oren},
  title = {Introduction to {XAI}},
  date = {2023-03-05},
  url = {https://orenbochman.github.io/notes/XAI/l01/},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2023. “Introduction to XAI.” March 5, 2023. https://orenbochman.github.io/notes/XAI/l01/.