2 Local Explanations - Concept and Methods

XAI Course Notes

Machine learning models can be analyzed at a high level using global explanations, such as linear model coefficients. However, these global explanations have several limitations. In this talk, I will review the use cases where local explanations are needed and introduce two popular methods for generating them: LIME and SHAP. Our focus will be on SHAP: its theory, its model-agnostic and model-specific versions, and how to use and read SHAP visualizations.

explainable AI
XAI
machine learning
ML
data science
counterfactuals
global explanations
local explanations
LIME
SHAP
CI
Author

Oren Bochman

Published

Monday, March 13, 2023

XAI is all about illuminating the opaque inner workings of black box models. These are the kinds of models data scientists prefer to deploy to production, as they tend to give better results. The rub is that many end users and other stakeholders, like executives, may not trust the predictions made by such models. After all, we all learned that:

all models are wrong, but some are useful.

XAI empowers the data scientist with post hoc methods that probe the black box model and make its outcomes more approachable to users.

There are added benefits: local explanations let us understand why the model gives bad predictions for specific entries, and that understanding is the best way to improve the model. They also help us surface the biases that tend to creep into our models, so we can take steps to mitigate them.

This is a fascinating session on XAI, building on the previous session. I’ve embedded the video below.

The speakers did not provide code samples. I have tried to add some, but any shortcomings are mine.

Series Poster

series poster


Session Video

This is the video for this session:

Instructor Biographies

  • Bitya Neuhof
    • Ph.D student, Statistics & Data Science
    • HUJI
    • Bitya is a Ph.D. student in Statistics and Data Science at the Hebrew University, exploring and developing explainable AI methods. Before her PhD she worked as a Data Scientist specializing in analyzing high-dimensional tabular data. Bitya is also a Core-Team member at Baot, the largest Israeli community of experienced women in R&D.
    • linkedin profile
  • Yasmin Bokobza
    • ML Scientist Leader
    • Microsoft
    • Yasmin is an ML Scientist Leader and Mentor in the Startups Accelerator program at Microsoft. Her work focuses on developing ML models for Microsoft Cloud Computing Platforms and Services. Part of her work has been filed as patents, published in the Microsoft Journal of Applied Research (MSJAR), and presented at various conferences, meetups and webinars. Previously her work focused on security: developing ML models to detect cyber-attacks, methods to harvest leaked information in social networks using socialbots and crawlers, and techniques to detect the source of a leak. She is listed as the author of a cyber threat detection method patent, and part of her research was published at the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. Yasmin graduated from a fast-track MSc degree focused on ML & Security in the department of Information Systems Engineering at Ben-Gurion University in Israel.
    • linkedin profile

Agenda

  • Approaches:
    • Post-hoc - create a new model to explain the main model.
    • Transparent/Intrinsic models - e.g. a probabilistic model
  • Local vs. Global
  • Post-hoc Explainability
    • Technique Categorization
    • LIME
    • SHAP
  • Conclusions

Explainability approaches

  • Post hoc techniques - use an explainer model to provide explanations.
  • Transparent models - can be queried directly to provide explanations
    • probabilistic models
    • decision trees
    • regression models

Local vs. Global Explanations

Next we look at the difference between global and local explanations.

Global Explanations

  • Global explanations describe the average behavior of an ML model.
    • What for?
      • Provide insights into the overall behavior of the ML model
      • Can help identify patterns and relations in the data learned by the model
    • Techniques:
      • Decision Tree
    • Why?
      • Analyze the general behavior of the model
      • Identify important features for the model’s predictions
      • Feature selection
      • Model optimization
    • Why Not?
      • What is a sensible way to aggregate a model's behavior?
      • May oversimplify a complex model, which leads to inaccurate interpretations.

Local Explanations

  • Local explanations are interpretations of the model's prediction for individual instances. 1
    • What for?
      • Provide a detailed understanding of how a model arrived at its prediction for a specific input.
      • Can help identify and correct model errors
      • Foster trust in stakeholders who are skeptical of black box models.
    • Techniques:
      • LIME
      • SHAP
    • Why?
      • Provides insights into predictions for specific rows.
      • A complex model can be simple locally. 2
      • Can explain why a prediction changed for a specific row even when the model itself has not changed.
    • Why Not?
      • Limited in scope.
      • Does not provide a holistic understanding of the model.
      • Computationally expensive for large datasets.

Local & Global method Comparison

Table 1: Local & Global method Comparison

Post-hoc Explainability

Techniques Categorization

Techniques Categorization Table

Table 2: Post-hoc Explainability Table

LIME

LIME Post-hoc

  • the advantage is that we can perturb the input by adding some noise.
    • this can be done in human-understandable ways
    • we get counterfactuals about which we should have good intuition.
  • the intuition is that it can be much easier to understand a complex model using a local linear model.

Code Examples

Let’s use the salary prediction dataset from Kaggle to try out the XAI methods:

salary prediction dataset overview


Load the dataset

Salary Prediction DS

# import the usual suspects
import numpy as np
import pandas as pd
from itables import show
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
import xgboost as xgb

# load the salary dataset
df = pd.read_csv('./data/Salary Data.csv')
# peek at the data
show(df.head())
Table 3: raw Salary DataSet (columns: Age, Gender, Education Level, Job Title, Years of Experience, Salary)

Cleanup the dataset

Preprocessing

  • we can see that there are lots of categorical features
  • also there are missing values
  • we should encode gender as numeric or boolean
  • we should encode education level using dummy variables
from sklearn.preprocessing import LabelEncoder, OneHotEncoder   # imported in the talk; unused after switching to dummy columns

df = (  df.dropna()              # remove rows with missing values
          .drop_duplicates()     # remove duplicate entries
          # recode gender to is_male
          .assign(is_male=lambda x: x['Gender'].apply(lambda y: 1 if y == 'Male' else 0),
                  # recode categorical education level to dummy columns
                  is_PhD=lambda x: x['Education Level'].apply(lambda y: 1 if y == 'PhD' else 0),
                  is_BA=lambda x: x['Education Level'].apply(lambda y: 1 if y == 'Bachelor\'s' else 0),
                  is_MA=lambda x: x['Education Level'].apply(lambda y: 1 if y == 'Master\'s' else 0),
          )
          .rename(columns={'Years of Experience':'xp'})            # rename columns
          .drop(['Gender','Education Level','Job Title'],axis=1)   # drop columns we no longer need
    )

# alternative: label-encode Education Level / Job Title with LabelEncoder instead of dummies
show(df.head())   # peek at the data
Table 4: cleaned Salary DataSet (columns: Age, xp, Salary, is_male, is_PhD, is_BA, is_MA)


<class 'pandas.core.frame.DataFrame'>
Index: 324 entries, 0 to 371
Data columns (total 7 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Age      324 non-null    float64
 1   xp       324 non-null    float64
 2   Salary   324 non-null    float64
 3   is_male  324 non-null    int64  
 4   is_PhD   324 non-null    int64  
 5   is_BA    324 non-null    int64  
 6   is_MA    324 non-null    int64  
dtypes: float64(3), int64(4)
memory usage: 20.2 KB

Fit a Decision Tree

# import the usual suspects
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn import metrics

y = df['Salary']                 # target variable
X = df.drop(['Salary'], axis=1)  # features

# perform a test/train split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
dt_clf_model = DecisionTreeRegressor(
  max_depth=3, 
  random_state=123)
dt_clf_model.fit(X_train, y_train)
# predict the response for the test dataset
y_pred = dt_clf_model.predict(X_test)

# model quality: for a regressor use a regression metric such as R^2 rather than accuracy
#print("R^2:", metrics.r2_score(y_test, y_pred))
DecisionTreeRegressor(max_depth=3, random_state=123)
import graphviz
from sklearn import tree

dot_data = tree.export_graphviz(dt_clf_model, out_file=None, 
                              feature_names=X_train.columns,  
                              filled=True, rounded=True,  
                              special_characters=True)

graph = graphviz.Source(dot_data) 
graph
Figure 1: A simple decision tree for the Salary DataSet

LIME for Tabular Data

LIME for Tabular

from lime import lime_tabular

y_pred = dt_clf_model.predict(X_test)

feature_names = list(X_train.columns)
# LimeTabularExplainer expects categorical_features as column indices rather than names
categorical_features = [feature_names.index(c) for c in ['is_male','is_BA','is_MA','is_PhD']]

lime_explainer = lime_tabular.LimeTabularExplainer(
      training_data=X_train.to_numpy(),
      feature_names=feature_names,
      class_names=['Salary'],
      categorical_features=categorical_features,
      verbose=True,
      mode='regression')

# pick a random test instance to explain
i = np.random.randint(0, X_test.shape[0])

exp = lime_explainer.explain_instance(X_test.values[i,:], 
                                      dt_clf_model.predict, 
                                      num_features=5,
                                      num_samples=100)
exp.as_list()
Intercept 120535.02045561386
Prediction_local [43651.18047193]
Right: 39310.46511627907
/home/oren/work/blog/env/lib/python3.10/site-packages/sklearn/base.py:493: UserWarning:

X does not have valid feature names, but DecisionTreeRegressor was fitted with feature names
[('xp <= 4.00', -50278.823048445236),
 ('Age <= 31.00', -20345.143828960205),
 ('is_male <= 0.00', -4587.43533357944),
 ('is_BA <= 0.00', -2615.2342590476005),
 ('0.00 < is_MA <= 1.00', 942.7964863512598)]

text output for a lime explainer

LIME for Tabular Viz

exp.show_in_notebook(show_table=True)
A graphical LIME explanation for an entry in the Salary DataSet

import shap
# TreeExplainer: fast, exact Shapley values for tree models
explainer = shap.TreeExplainer(dt_clf_model, X_test)
# SHAP values for the same random test instance i that LIME explained above
shap_values = explainer.shap_values(X_test)
shap_values[i]
Listing 1
array([57669.41742788, 11277.1547476 ,     0.        ,     0.        ,
       12894.87367788,     0.        ])

LIME: an intuitive explanation

LIME Post-hoc

  1. Our data is a complex manifold with a non-convex boundary (the pink region in the slide).
  2. Repeat:
  3. We pick a single row x_i in the data set, which we call an instance.
  4. We then perturb it by modifying the instance randomly: p_i = x_i + \delta.
  5. We generate a prediction for the perturbation using our black box model: \hat y_{p_i}.
  6. We weigh each perturbation by its proximity to the original instance, w_i = \pi_{x_i}(p_i), so nearby perturbations count more.
  7. Finally, we fit a simple interpretable model (e.g. a weighted linear regression) to the perturbations; its coefficients are the local explanation.

More precisely, the explanation for a data point x is the model g that minimizes the locality-aware loss L(f, g, \pi_x), which measures how unfaithfully g approximates the model f to be explained in the vicinity defined by the proximity measure \pi_x, while keeping the model complexity \Omega(g) low:

\xi(x) = \arg\min_{g \in G} L(f, g, \pi_x) + \Omega(g)

LIME therefore trades off model fidelity against model complexity.

For more information on LIME, consult the LIME chapter of (Molnar 2022).
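
To make the recipe above concrete, here is a minimal from-scratch sketch of the LIME idea applied to the salary model. It is not how the lime package implements it; the helper name lime_sketch, the Gaussian perturbation scheme, and the kernel width are illustrative choices.

# a minimal from-scratch sketch of the LIME recipe (not the lime package's implementation)
import numpy as np
from sklearn.linear_model import Ridge

def lime_sketch(model, x, X_train, num_samples=500, kernel_width=1.0):
    # 1. perturb the instance with Gaussian noise scaled by each feature's std
    scale = X_train.std(axis=0).values + 1e-9
    perturbations = x + np.random.normal(size=(num_samples, x.shape[0])) * scale
    # 2. query the black box model on the perturbations
    y_perturbed = model.predict(perturbations)
    # 3. weigh each perturbation by its proximity to the original instance
    distances = np.linalg.norm((perturbations - x) / scale, axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # 4. fit a weighted linear surrogate; its coefficients are the local explanation
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(perturbations, y_perturbed, sample_weight=weights)
    return dict(zip(X_train.columns, surrogate.coef_))

# e.g. explain the same random test row used above
lime_sketch(dt_clf_model, X_test.values[i, :], X_train)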


LIME for Images


LIME Pros & Cons

SHAP

Terminology


Shapley Values


  • Link to Wikipedia article
  • Lloyd Shapley won the Nobel Memorial Prize in Economics for this gem back in 2012.
  • For a cooperative game, it considers all possible coalitions and lets us see how much each player contributes to the overall surplus.
  • This idea can then be used to decide how to divide the surplus (profit) most fairly.
  • Think of how an extremist party can set the tone for a coalition by threatening to break it up.

  1. Efficiency - the sum of the Shapley values of all agents equals the value of the grand coalition, so that all the gain is distributed among the agents: \sum_{i \in N} \varphi_i(v) = v(N).
  2. Symmetry - equal treatment of equals.
  3. Linearity - if two coalition games described by gain functions v and w are combined, then the distributed gains should correspond to the gains derived from v plus the gains derived from w: \varphi_i(v + w) = \varphi_i(v) + \varphi_i(w).
  4. Monotonicity.
  5. Null player - the Shapley value \varphi_i(v) of a null player i in a game v is zero.

Shapley Fairness

Shapley Formula

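For reference, the standard Shapley value of player i in a game v over the player set N is

\varphi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \left( v(S \cup \{i\}) - v(S) \right)

i.e. player i's marginal contribution v(S \cup \{i\}) - v(S), averaged over all the orders in which the grand coalition can be assembled.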

In ML

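In brief: when Shapley values are applied to ML, the "players" are the features of a single instance, the "game" is the model's prediction for that instance, and a feature's Shapley value is its share of the difference between that prediction and the average prediction over the data set.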

Shapley Problems


Shapley for ML


SHAP


SHAP - Shapley Additive Explanations
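
SHAP (Lundberg and Lee 2017) frames an explanation as an additive feature attribution: the explanation model g is a linear function of simplified binary features,

g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i

where z' \in \{0,1\}^M marks which of the M features are present and the coefficients \phi_i are the features' Shapley values.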

Kernel SHAP


Tree SHAP


Decision Tree


TreeExplainer

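A minimal sketch of using TreeExplainer on the decision tree fitted above; the variable name sv is mine, and calling the explainer directly to get a shap.Explanation object assumes a reasonably recent version of the shap package.

import shap

# TreeExplainer computes exact Shapley values efficiently for tree models
tree_explainer = shap.TreeExplainer(dt_clf_model)
# calling the explainer returns a shap.Explanation object, which the plotting API below expects
sv = tree_explainer(X_test)
sv.shape   # (number of test instances, number of features)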

Kernel Explainer

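When there is no dedicated explainer for the model class, KernelExplainer is the model-agnostic (and much slower) fallback. A sketch, under the assumption that we summarize the background data with shap.kmeans to keep the runtime manageable:

import shap

# model-agnostic Kernel SHAP: only needs a prediction function and background data
background = shap.kmeans(X_train, 10)   # summarize the background with 10 centroids
kernel_explainer = shap.KernelExplainer(dt_clf_model.predict, background)
# Kernel SHAP is expensive, so explain only a handful of test rows
kernel_shap_values = kernel_explainer.shap_values(X_test.iloc[:5])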

SHAP Visualization

Local View – Waterfall Plot

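The slide image is not reproduced here; a sketch of how a waterfall plot for a single instance might be produced, assuming the sv Explanation object from the TreeExplainer sketch above:

# waterfall plot: how each feature pushes one prediction away from the base (average) value
shap.plots.waterfall(sv[0])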

Local View – Bar Plot

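A local bar plot shows the same per-instance attributions as bare magnitudes; again assuming sv from above:

# local bar plot: absolute SHAP values for a single instance
shap.plots.bar(sv[0])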

Global View – Bar Plot

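Passing the whole Explanation object rather than a single row aggregates the attributions into a global view; assuming sv from above:

# global bar plot: mean |SHAP value| per feature over the whole test set
shap.plots.bar(sv)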

Global View – Beeswarm Plot

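The beeswarm plot shows the full distribution of SHAP values per feature, colored by the feature's value; assuming sv from above:

# beeswarm: one dot per instance per feature, colored by the feature's value
shap.plots.beeswarm(sv)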

Global View – Scatter Plot

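The scatter (dependence) plot shows how a feature's value relates to its SHAP value across the data set; assuming sv from above and picking the xp feature as an example:

# dependence plot for years of experience: feature value vs. its SHAP value
shap.plots.scatter(sv[:, "xp"])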

Model Hierarchy


Local Uncertainty



Conclusion

This session presented so much information that it is easy to lose sight of the key points, so here are a few conclusions.

  • There are other approaches, including EDA.
  • We can use more transparent models, e.g. regressions or other statistical models.
  • By far the most prevalent approach in XAI is post hoc methods.
  • We defined global and local explanations and noted their limitations.

What do we mean by explanations in XAI?

  • could be any number of visualizations.
  • could be a simplified model. 💡 locally, a complex manifold may look flat.
  • could be a ranking of the features by their contribution. 💡 SHAP and MIE
  • could be a set of related examples. 💡 KNN

References

Breiman, L., J. Friedman, C. J. Stone, and R. A. Olshen. 1984. Classification and Regression Trees. Taylor & Francis. https://www.google.com/books?id=JwQx-WOmSyQC.
Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emily Pitkin. 2013. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24: 44–65. https://api.semanticscholar.org/CorpusID:88519447.
Lundberg, Scott, and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” https://arxiv.org/abs/1705.07874.
Molnar, Christoph. 2022. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 2nd ed. https://christophm.github.io/interpretable-ml-book.
Poyiadzi, Rafael, Kacper Sokol, Raúl Santos-Rodriguez, Tijl De Bie, and Peter A. Flach. 2019. “FACE: Feasible and Actionable Counterfactual Explanations.” CoRR abs/1909.09369. http://arxiv.org/abs/1909.09369.
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “Why Should I Trust You?: Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44. KDD ’16. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2939672.2939778.
Selvaraju, Ramprasaath R., Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, and Dhruv Batra. 2016. “Grad-CAM: Why Did You Say That? Visual Explanations from Deep Networks via Gradient-Based Localization.” CoRR abs/1610.02391. http://arxiv.org/abs/1610.02391.
Zilke, Jan Ruben, Eneldo Loza Mencía, and Frederik Janssen. 2016. “DeepRED - Rule Extraction from Deep Neural Networks.” In IFIP Working Conference on Database Semantics. https://api.semanticscholar.org/CorpusID:10289003.

Footnotes

  1. i.e. for a breakdown for the given prediction↩︎

  2. think anomalies and sub-populations↩︎

Reuse

CC SA BY-NC-ND

Citation

BibTeX citation:
@online{bochman2023,
  author = {Bochman, Oren},
  title = {2 {Local} {Explanations} - {Concept} and {Methods}},
  date = {2023-03-13},
  url = {https://orenbochman.github.io/notes/XAI/l02/},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2023. “2 Local Explanations - Concept and Methods.” March 13, 2023. https://orenbochman.github.io/notes/XAI/l02/.