Tree-Based Feature Importance
Imagine you're buying a house and the model predicts it's worth $500K. You'd want to know: "Was it mostly the location? The size? The number of bathrooms?" Feature importance answers exactly this! For tree-based models (Random Forest, XGBoost), we can measure how much each feature helped the model make better decisions during training. The more a feature reduces "confusion" (impurity) across all the trees, the more important it is.
Interpretability vs Explainability
Interpretability: "Can I understand HOW this model works?" — like looking at a simple linear equation y = 2x + 3. You understand the whole thing. Explainability: "Can I understand WHY it made THIS specific prediction?" — like asking "why did it predict $500K for THIS house?" Both matter! Banks need to explain loan denials, doctors need to explain diagnoses, and you need to debug when things go wrong.
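The distinction can be made concrete with a toy sketch of the y = 2x + 3 example (hypothetical code, not part of the breast cancer pipeline below):

```python
# A fully interpretable model: y = 2x + 3 — we can read the WHOLE rule.
def model(x):
    return 2 * x + 3

# Interpretability: the slope tells us y rises by 2 per unit of x, everywhere.
# Explainability: why did THIS input produce THIS output?
x = 10
prediction = model(x)                 # the prediction to explain
baseline = model(0)                   # prediction with x "switched off"
credit_to_x = prediction - baseline   # credit assigned to x for this prediction
print(f"Predicted {prediction} = baseline {baseline} + contribution of x {credit_to_x}")
```

For complex models we can't read the whole rule, so tools like SHAP and LIME estimate this per-prediction credit instead.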
Random Forest Feature Importance
# Step 1: Train a model and extract importance
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Load dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
# Train model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
Setting up our experiment: We're using the breast cancer dataset — a classic dataset with 30 features (things like tumor radius, texture, perimeter, etc.) to predict if a tumor is malignant or benign.
- 30 features: Various measurements of cell nuclei from images
- Random Forest: 100 decision trees working together — each tree votes and we take the majority
- The goal: Which of these 30 measurements matter most for detecting cancer?
# Step 2: Extract and visualize feature importance
import matplotlib.pyplot as plt
# Get feature importance
importances = rf.feature_importances_
indices = np.argsort(importances)[::-1]
# Create importance DataFrame
importance_df = pd.DataFrame({
'feature': X.columns[indices],
'importance': importances[indices]
}).head(15)
print("Top 15 Features by Importance:")
print(importance_df.to_string(index=False))
What's happening: feature_importances_ returns a score for each feature showing how much it helped reduce "impurity" (confusion) across all 100 trees.
- Higher importance: The feature was frequently used at decision splits AND those splits were effective at separating the classes
- argsort[::-1]: Sort from highest to lowest importance
- Think of it like: "Which ingredients matter most for this recipe?" — the ones that dramatically change the taste!
# Step 3: Visualize feature importance
plt.figure(figsize=(10, 8))
plt.barh(range(15), importance_df['importance'], align='center')
plt.yticks(range(15), importance_df['feature'])
plt.xlabel('Feature Importance')
plt.title('Random Forest Feature Importance (Top 15)')
plt.gca().invert_yaxis() # Highest importance at top
plt.tight_layout()
plt.show()
Reading the chart: The longest bar = most important feature. This tells you what the model relies on most.
- For doctors: "These 3 measurements are what the model cares about most"
- For debugging: "Wait, why is 'patient ID' important? That's a data leak!"
- invert_yaxis(): Puts the most important feature at the top for easier reading
Pro tip: If a useless feature (like ID) ranks high, something's wrong with your data!
Practice Questions
Task: Train both Random Forest and Gradient Boosting on the same data. Compare their feature importance rankings.
Show Solution
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
import pandas as pd
# Train both models
rf = RandomForestClassifier(n_estimators=100, random_state=42)
gb = GradientBoostingClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
gb.fit(X_train, y_train)
# Compare top 10 features
comparison = pd.DataFrame({
'feature': X.columns,
'rf_importance': rf.feature_importances_,
'gb_importance': gb.feature_importances_
})
comparison['rf_rank'] = comparison['rf_importance'].rank(ascending=False)
comparison['gb_rank'] = comparison['gb_importance'].rank(ascending=False)
print(comparison.nsmallest(10, 'rf_rank')[['feature', 'rf_rank', 'gb_rank']])
Task: Normalize feature importance values so they sum to 1 (100%). Display as percentages.
Show Solution
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
# Train model
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
# Normalize to sum to 1
importances = rf.feature_importances_
normalized = importances / importances.sum()
# Create DataFrame with percentages
importance_df = pd.DataFrame({
'feature': X.columns,
'importance': normalized,
'percentage': normalized * 100
}).sort_values('importance', ascending=False)
print("Top 10 Features (as % of total importance):")
print(importance_df.head(10).to_string(index=False))
print(f"\nSum of all percentages: {importance_df['percentage'].sum():.1f}%")
Task: Create a cumulative importance plot. Find how many features are needed to capture 90% of total importance.
Show Solution
import matplotlib.pyplot as plt
import numpy as np
# Get sorted importance
sorted_idx = np.argsort(rf.feature_importances_)[::-1]
sorted_importance = rf.feature_importances_[sorted_idx]
cumulative = np.cumsum(sorted_importance)
# Find number of features for 90% importance
n_features_90 = np.argmax(cumulative >= 0.9) + 1
# Plot
plt.figure(figsize=(10, 5))
plt.plot(range(1, len(cumulative) + 1), cumulative, 'b-', linewidth=2)
plt.axhline(0.9, color='r', linestyle='--', label='90% threshold')
plt.axvline(n_features_90, color='g', linestyle='--', label=f'{n_features_90} features')
plt.xlabel('Number of Features')
plt.ylabel('Cumulative Importance')
plt.title('Cumulative Feature Importance')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
print(f"Only {n_features_90} features capture 90% of importance!")
Task: Get feature importance from each tree in the Random Forest and plot with confidence intervals (std across trees).
Show Solution
import numpy as np
import matplotlib.pyplot as plt
# Get importance from each tree
importances_per_tree = np.array([tree.feature_importances_ for tree in rf.estimators_])
# Calculate mean and std
mean_importance = importances_per_tree.mean(axis=0)
std_importance = importances_per_tree.std(axis=0)
# Sort by mean importance
sorted_idx = np.argsort(mean_importance)[::-1][:15]
# Plot with error bars
plt.figure(figsize=(10, 8))
plt.barh(range(15), mean_importance[sorted_idx],
xerr=std_importance[sorted_idx], capsize=3)
plt.yticks(range(15), X.columns[sorted_idx])
plt.gca().invert_yaxis()
plt.xlabel('Feature Importance (mean ± std)')
plt.title('Feature Importance with Confidence Intervals')
plt.tight_layout()
plt.show()
Task: Create a dataset with a "leaky" feature (highly correlated with target). Show how feature importance reveals the leak.
Show Solution
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
# Create normal data
np.random.seed(42)
n = 500
data = pd.DataFrame({
'feature_1': np.random.randn(n),
'feature_2': np.random.randn(n),
'feature_3': np.random.randn(n),
'target': np.random.randint(0, 2, n)
})
# Add a LEAKY feature (derived from target with some noise)
data['leaky_feature'] = data['target'] * 10 + np.random.randn(n) * 0.5
X = data.drop('target', axis=1)
y = data['target']
# Train model
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)
# Check importance
importance_df = pd.DataFrame({
'feature': X.columns,
'importance': rf.feature_importances_
}).sort_values('importance', ascending=False)
print("Feature Importance (look for suspiciously high values!):")
print(importance_df)
print("\n'leaky_feature' has extremely high importance - DATA LEAK!")
Task: Use feature importance to select top features. Compare model performance with all features vs top N features.
Show Solution
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
import numpy as np
# Train model to get importance
rf_full = RandomForestClassifier(n_estimators=100, random_state=42)
rf_full.fit(X_train, y_train)
# Get top features
sorted_idx = np.argsort(rf_full.feature_importances_)[::-1]
results = []
# Test with different numbers of top features
for n_features in [5, 10, 15, 20, 30]:
    top_features = X.columns[sorted_idx[:n_features]]
    X_subset = X[top_features]
    rf = RandomForestClassifier(n_estimators=100, random_state=42)
    scores = cross_val_score(rf, X_subset, y, cv=5, scoring='accuracy')
    results.append({'n_features': n_features, 'accuracy': scores.mean()})
# Show results
results_df = pd.DataFrame(results)
print("Model Performance vs Number of Features:")
print(results_df)
print("\nOften fewer features = similar or better performance!")
Permutation Importance
Here's a clever trick: what if we randomly scrambled one feature's values and measured how much the model's accuracy dropped? If scrambling "age" makes accuracy plummet, age must be important! If scrambling "favorite color" changes nothing, it's not useful. This is permutation importance — it works with ANY model (not just trees) and measures importance on real test data, not just training data.
How Permutation Importance Works
- Train model and evaluate baseline performance on test data
- For each feature: shuffle its values, re-evaluate, measure performance drop
- Larger drop = more important feature
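The three steps above can be run by hand. Here is a minimal self-contained sketch on the same breast cancer data (one shuffle per feature; sklearn's `permutation_importance` adds repeats and parallelism):

```python
# Minimal by-hand permutation importance: shuffle each feature once
# and measure the accuracy drop on held-out data.
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

rng = np.random.default_rng(42)
baseline = model.score(X_test, y_test)                   # Step 1: baseline accuracy
drops = {}
for col in X_test.columns:                               # Step 2: shuffle one feature
    X_perm = X_test.copy()
    X_perm[col] = rng.permutation(X_perm[col].values)
    drops[col] = baseline - model.score(X_perm, y_test)  # Step 3: accuracy drop

for name, drop in sorted(drops.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{name}: accuracy drop {drop:.4f}")
```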
# Step 1: Calculate permutation importance
from sklearn.inspection import permutation_importance
# Calculate on test set (avoids overfitting bias)
perm_importance = permutation_importance(
rf, X_test, y_test,
n_repeats=30, # Number of times to shuffle each feature
random_state=42,
n_jobs=-1 # Use all CPU cores
)
# Create DataFrame with results
perm_df = pd.DataFrame({
'feature': X.columns,
'importance_mean': perm_importance.importances_mean,
'importance_std': perm_importance.importances_std
}).sort_values('importance_mean', ascending=False)
print("Permutation Importance (Top 10):")
print(perm_df.head(10).to_string(index=False))
The process: For each feature, shuffle its values 30 times and measure the average accuracy drop.
- n_repeats=30: Shuffle each feature 30 times to get a stable average (not just one random shuffle)
- On test set: We measure on data the model hasn't seen — this tells us real predictive importance, not just training fit
- importance_mean: Average accuracy drop across all shuffles
- importance_std: How much the drop varies — low std = reliable importance
Analogy: It's like removing a player from a team and seeing how much worse the team performs. The bigger the drop, the more important that player was!
# Step 2: Visualize with error bars
sorted_idx = perm_importance.importances_mean.argsort()[::-1][:15]
fig, ax = plt.subplots(figsize=(10, 8))
ax.barh(
range(15),
perm_importance.importances_mean[sorted_idx],
xerr=perm_importance.importances_std[sorted_idx],
align='center',
capsize=3
)
ax.set_yticks(range(15))
ax.set_yticklabels(X.columns[sorted_idx])
ax.invert_yaxis()
ax.set_xlabel('Mean Decrease in Accuracy')
ax.set_title('Permutation Importance with Std Dev')
plt.tight_layout()
plt.show()
Error bars are your friend: They show how much the importance varies across shuffles.
- Tall bar + small error: Reliably important — trust this feature!
- Small bar + large error: Uncertain importance — might just be noise
- xerr parameter: Adds horizontal error bars showing the standard deviation
Pro tip: Features with importance overlapping zero (error bar crosses zero) might not be truly important!
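One quick way to act on this tip is to flag features whose mean minus roughly two standard deviations stays above zero (the numbers below are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical permutation-importance results (illustrative values only)
perm_df = pd.DataFrame({
    'feature': ['worst radius', 'mean texture', 'smoothness error'],
    'importance_mean': [0.120, 0.015, 0.004],
    'importance_std':  [0.020, 0.012, 0.006],
})
# "Reliable" if mean - 2*std is still above zero (interval doesn't cross zero)
perm_df['reliable'] = perm_df['importance_mean'] - 2 * perm_df['importance_std'] > 0
print(perm_df)
```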
Comparing Tree vs Permutation Importance
# Compare both importance measures
comparison = pd.DataFrame({
'feature': X.columns,
'tree_importance': rf.feature_importances_,
'perm_importance': perm_importance.importances_mean
})
# Normalize for comparison
comparison['tree_norm'] = comparison['tree_importance'] / comparison['tree_importance'].sum()
comparison['perm_norm'] = comparison['perm_importance'] / comparison['perm_importance'].sum()
# Find disagreements
comparison['rank_diff'] = abs(
comparison['tree_norm'].rank(ascending=False) -
comparison['perm_norm'].rank(ascending=False)
)
print("Features with largest rank disagreement:")
print(comparison.nlargest(5, 'rank_diff')[['feature', 'tree_norm', 'perm_norm', 'rank_diff']])
When the two methods disagree, investigate!
- Tree says important, Permutation says not: The feature might be overfitting — useful for training but not for real predictions
- Tree says not important, Permutation says important: The feature might work through interactions with other features
- Both agree: You can trust the importance ranking!
Rule of thumb: When in doubt, trust permutation importance — it's measured on held-out data!
Practice Questions
Task: Calculate permutation importance for Logistic Regression and compare with its coefficients. Do they agree?
Show Solution
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.inspection import permutation_importance
# Scale features for logistic regression
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train logistic regression
lr = LogisticRegression(max_iter=5000, random_state=42)
lr.fit(X_train_scaled, y_train)
# Permutation importance
perm = permutation_importance(lr, X_test_scaled, y_test, n_repeats=30, random_state=42)
# Compare coefficients vs permutation importance
comparison = pd.DataFrame({
'feature': X.columns,
'coefficient': np.abs(lr.coef_[0]), # Absolute value for comparison
'perm_importance': perm.importances_mean
})
comparison['coef_rank'] = comparison['coefficient'].rank(ascending=False)
comparison['perm_rank'] = comparison['perm_importance'].rank(ascending=False)
print(comparison.nsmallest(10, 'coef_rank')[['feature', 'coef_rank', 'perm_rank']])
Task: Calculate permutation importance for a trained model and find the top 5 most important features.
Show Solution
from sklearn.inspection import permutation_importance
from sklearn.ensemble import RandomForestClassifier
# Train model
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
# Calculate permutation importance on TEST set
perm_importance = permutation_importance(
rf, X_test, y_test,
n_repeats=10,
random_state=42
)
# Create ranking
top_features = pd.DataFrame({
'feature': X.columns,
'importance': perm_importance.importances_mean
}).sort_values('importance', ascending=False)
print("Top 5 Most Important Features:")
print(top_features.head(5))
Task: Calculate permutation importance on both training and test sets. Compare the results - are they different?
Show Solution
from sklearn.inspection import permutation_importance
# Permutation importance on TRAIN set
perm_train = permutation_importance(rf, X_train, y_train, n_repeats=10, random_state=42)
# Permutation importance on TEST set
perm_test = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)
# Compare
comparison = pd.DataFrame({
'feature': X.columns,
'train_importance': perm_train.importances_mean,
'test_importance': perm_test.importances_mean
})
comparison['difference'] = comparison['train_importance'] - comparison['test_importance']
print(comparison.sort_values('difference', ascending=False))
print("\nLarge differences may indicate overfitting to that feature!")
print("→ Features important on train but not test = may be overfitting")
Task: Use a custom scoring metric (F1 score) for permutation importance instead of default accuracy.
Show Solution
from sklearn.inspection import permutation_importance
from sklearn.metrics import make_scorer, f1_score
# Calculate permutation importance with F1 score
perm_f1 = permutation_importance(
rf, X_test, y_test,
n_repeats=10,
scoring='f1', # Use F1 instead of accuracy
random_state=42
)
# Also try with accuracy for comparison
perm_acc = permutation_importance(
rf, X_test, y_test,
n_repeats=10,
scoring='accuracy',
random_state=42
)
# Compare rankings
comparison = pd.DataFrame({
'feature': X.columns,
'f1_importance': perm_f1.importances_mean,
'acc_importance': perm_acc.importances_mean
})
print("Importance may differ based on what metric you care about!")
print(comparison.sort_values('f1_importance', ascending=False))
Task: Permutation importance can be misleading with correlated features. Demonstrate this problem and explain why it happens.
Show Solution
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
# Create correlated features
np.random.seed(42)
n = 500
feature_1 = np.random.randn(n)
feature_2 = feature_1 + np.random.randn(n) * 0.1 # Highly correlated with feature_1
feature_3 = np.random.randn(n) # Independent
y = (feature_1 + feature_3 > 0).astype(int) # Both feature_1 and feature_3 are important
X = pd.DataFrame({
'feature_1': feature_1,
'feature_2': feature_2, # Correlated copy
'feature_3': feature_3
})
# Train and get permutation importance
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)
perm = permutation_importance(rf, X, y, n_repeats=30, random_state=42)
print("Permutation Importance:")
for name, imp in zip(X.columns, perm.importances_mean):
    print(f"  {name}: {imp:.4f}")
print("\nProblem: feature_1 appears LESS important than expected!")
print("→ Because feature_2 can 'cover' for feature_1 when it's shuffled")
print("→ Solution: Drop one of the correlated features, or use SHAP")
Task: Apply permutation importance to a regression problem using RMSE as the scoring metric.
Show Solution
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
# Load regression data
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train regressor
rf_reg = RandomForestRegressor(n_estimators=100, random_state=42)
rf_reg.fit(X_train, y_train)
# Permutation importance with RMSE (sklearn uses negated error metrics)
# importances_mean = baseline_score - permuted_score, so with
# 'neg_root_mean_squared_error' important features already come out POSITIVE
perm_reg = permutation_importance(
    rf_reg, X_test, y_test,
    n_repeats=10,
    scoring='neg_root_mean_squared_error',
    random_state=42
)
# Display results
result = pd.DataFrame({
    'feature': X.columns,
    'importance': perm_reg.importances_mean  # RMSE increase caused by shuffling
}).sort_values('importance', ascending=False)
print("Feature Importance (RMSE increase when shuffled):")
print(result)
SHAP Values
SHAP is the gold standard for model explanations, and here's why: it's built on Shapley values, an idea from cooperative game theory due to Nobel laureate Lloyd Shapley. Imagine a team project where everyone gets an A. How do you fairly assign credit? SHAP does exactly this — it fairly distributes the "credit" for each prediction among all the features. The beautiful part? All the SHAP values for a prediction add up exactly to that prediction (minus the baseline). No other method gives you this guarantee!
Shapley Values
The team analogy: Imagine feature A, B, and C are teammates. To find A's contribution, we try every possible team combination: just A, A+B, A+C, A+B+C, and measure how much A adds each time. Average all those contributions = Shapley value! It's provably the only "fair" way to distribute credit. Key property: baseline + SHAP(feature1) + SHAP(feature2) + ... = final prediction.
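The team analogy can be computed exactly for a tiny made-up game (hypothetical payoffs; SHAP's TreeExplainer computes the same quantity efficiently for tree models):

```python
# Exact Shapley values by brute force: average each player's marginal
# contribution over every possible joining order of the team.
from itertools import permutations

# Hypothetical payoff for every possible "team" of features A, B, C
PAYOFF = {
    frozenset(): 0, frozenset('A'): 10, frozenset('B'): 6, frozenset('C'): 2,
    frozenset('AB'): 14, frozenset('AC'): 12, frozenset('BC'): 8,
    frozenset('ABC'): 18,
}

def value(coalition):
    return PAYOFF[frozenset(coalition)]

players = 'ABC'
orderings = list(permutations(players))
shapley = {p: 0.0 for p in players}
for order in orderings:                              # every joining order
    team = set()
    for p in order:
        marginal = value(team | {p}) - value(team)   # what p adds to this team
        shapley[p] += marginal / len(orderings)      # average over orderings
        team.add(p)

print({p: round(v, 3) for p, v in shapley.items()})
# Efficiency property: contributions sum to value(full team) - value(empty team)
print(round(sum(shapley.values()), 3))
```

The final sum reproduces the key property from above: baseline + all Shapley contributions = the full prediction.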
# Step 1: Install and import SHAP
# pip install shap
import shap
# Create SHAP explainer for tree model
explainer = shap.TreeExplainer(rf)
# Calculate SHAP values for test set
shap_values = explainer.shap_values(X_test)
# For binary classification, older SHAP versions return a list of two
# [n_samples, n_features] arrays:
# shap_values[0] = contributions toward class 0
# shap_values[1] = contributions toward class 1
# (Recent SHAP versions return a single [n_samples, n_features, 2] array instead)
print(f"SHAP values shape: {np.array(shap_values).shape}")
Setting up SHAP:
- TreeExplainer: A super-fast SHAP calculator optimized for tree models (Random Forest, XGBoost, LightGBM)
- shap_values[0]: How features push toward class 0 (malignant)
- shap_values[1]: How features push toward class 1 (benign)
- One SHAP value per feature per sample: Unlike global importance, SHAP explains each prediction individually!
Note: For other model types, use shap.Explainer() or shap.KernelExplainer() (slower but works with anything).
# Step 2: Summary plot (global importance)
plt.figure(figsize=(12, 8))
shap.summary_plot(shap_values[1], X_test, plot_type="bar", show=False)
plt.title("SHAP Feature Importance (Mean |SHAP value|)")
plt.tight_layout()
plt.show()
Global importance from SHAP: We average the absolute SHAP values across all samples to get overall importance.
- Why absolute values? A feature that pushes predictions up for some samples and down for others is still important!
- Theoretically grounded: Unlike tree importance, SHAP importance comes from game theory with mathematical guarantees
- Consistent: If a model relies on feature A more than B for every prediction, A will have higher SHAP importance
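Why absolute values matter can be seen with a few made-up SHAP values:

```python
import numpy as np

# Hypothetical SHAP values for 2 features over 4 samples: feature 0 pushes
# predictions up for some samples and down for others; feature 1 is steady.
shap_vals = np.array([
    [ 0.5, 0.1],
    [-0.5, 0.1],
    [ 0.4, 0.1],
    [-0.4, 0.1],
])
signed_mean = shap_vals.mean(axis=0)                # feature 0 cancels to ~zero
global_importance = np.abs(shap_vals).mean(axis=0)  # feature 0 correctly ranks higher
print("signed mean:  ", signed_mean)
print("mean |SHAP|:  ", global_importance)
```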
# Step 3: SHAP beeswarm plot (shows value distributions)
plt.figure(figsize=(12, 10))
shap.summary_plot(shap_values[1], X_test, show=False)
plt.title("SHAP Values: Feature Impact on Predictions")
plt.tight_layout()
plt.show()
The beeswarm is magic! Each dot is one sample from your data:
- Horizontal position: How much this feature pushed the prediction (left = lower, right = higher)
- Color: The actual feature value (red = high value, blue = low value)
- Reading patterns: If red dots are all on the right, high values of this feature increase predictions
- Spread: Features with dots spread wide have big impacts; tight clusters have small impacts
Example insight: "High values of 'worst radius' (red dots on the left) strongly push predictions away from benign — toward a malignant diagnosis."
Individual Prediction Explanations
# Step 4: Explain a single prediction
sample_idx = 0
sample = X_test.iloc[[sample_idx]]
# Waterfall plot for one prediction
shap.plots.waterfall(shap.Explanation(
values=shap_values[1][sample_idx],
base_values=explainer.expected_value[1],
data=sample.values[0],
feature_names=X.columns.tolist()
))
Explaining ONE prediction step by step: The waterfall shows exactly how the model arrived at its decision.
- Base value: The average prediction (what you'd guess with no information)
- Red bars: These features pushed the prediction UP (toward positive class)
- Blue bars: These features pushed the prediction DOWN (toward negative class)
- Final value: Base + all contributions = the actual prediction
Perfect for stakeholders: "The model predicted cancer because worst_radius was high (+0.3) and mean_texture was unusual (+0.2)."
# Step 5: Force plot for single prediction
shap.initjs() # Initialize JavaScript for interactive plots
shap.force_plot(
explainer.expected_value[1],
shap_values[1][sample_idx],
sample,
feature_names=X.columns.tolist()
)
Same information, different view: Force plots show contributions horizontally like a tug-of-war.
- Red arrows: Features pulling the prediction up (toward higher value)
- Blue arrows: Features pulling the prediction down (toward lower value)
- Arrow width: The magnitude of each feature's contribution
- Interactive: In Jupyter notebooks, you can hover to see exact values!
Practice Questions
Task: Create a SHAP dependence plot for the most important feature, colored by its strongest interaction.
Show Solution
import shap
# Find most important feature
mean_abs_shap = np.abs(shap_values[1]).mean(axis=0)
top_feature_idx = np.argmax(mean_abs_shap)
top_feature = X.columns[top_feature_idx]
print(f"Most important feature: {top_feature}")
# Dependence plot with automatic interaction detection
plt.figure(figsize=(10, 6))
shap.dependence_plot(
top_feature,
shap_values[1],
X_test,
interaction_index='auto', # Automatically find strongest interaction
show=False
)
plt.title(f"SHAP Dependence: {top_feature}")
plt.tight_layout()
plt.show()
Task: Install SHAP, calculate SHAP values for a Random Forest model, and display the summary plot.
Show Solution
# pip install shap
import shap
from sklearn.ensemble import RandomForestClassifier
# Train model
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
# Create explainer for tree model
explainer = shap.TreeExplainer(rf)
# Calculate SHAP values
shap_values = explainer.shap_values(X_test)
# Summary plot - shows feature importance + impact direction
shap.summary_plot(shap_values[1], X_test, plot_type="bar")
print("Higher bar = more important feature!")
Task: Use SHAP to explain why a specific sample was classified the way it was using a force plot.
Show Solution
import shap
# Explain first test sample
sample_idx = 0
# Get SHAP values for this sample
sample_shap = shap_values[1][sample_idx]
# Force plot for single prediction
shap.initjs()
shap.force_plot(
explainer.expected_value[1], # Base value (average prediction)
sample_shap, # SHAP values for this sample
X_test.iloc[sample_idx], # Feature values
feature_names=X.columns.tolist()
)
# Text explanation
print(f"Base prediction: {explainer.expected_value[1]:.3f}")
print(f"Features pushing UP (toward class 1):")
for feat, val in zip(X.columns, sample_shap):
    if val > 0:
        print(f"  {feat}: +{val:.3f}")
Task: Create a SHAP beeswarm plot and identify which features have a clear positive/negative relationship.
Show Solution
import shap
# Calculate SHAP values
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X_test)
# Beeswarm plot - each dot is one prediction
shap.summary_plot(shap_values[1], X_test)
# How to read this plot:
# - Position: left = pushes DOWN, right = pushes UP
# - Color: red = high feature value, blue = low feature value
# - Pattern to look for:
# → Red on right = high value increases prediction
# → Blue on right = low value increases prediction
print("""
Interpretation Guide:
- Feature with red dots on RIGHT: Higher values → Higher predictions
- Feature with blue dots on RIGHT: Lower values → Higher predictions
- Mixed colors on both sides: Complex or non-linear relationship
""")
Task: Apply SHAP to a regression problem (house prices). Show how SHAP values explain price predictions.
Show Solution
import shap
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import fetch_california_housing
# Load regression data
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)[:500]
y = data.target[:500]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train regressor
rf_reg = RandomForestRegressor(n_estimators=100, random_state=42)
rf_reg.fit(X_train, y_train)
# SHAP explainer (for regression, shap_values is NOT a list)
explainer = shap.TreeExplainer(rf_reg)
shap_values = explainer.shap_values(X_test)
# Summary plot
shap.summary_plot(shap_values, X_test)
# Waterfall for one house
shap.waterfall_plot(shap.Explanation(
    values=shap_values[0],
    base_values=explainer.expected_value,
    data=X_test.iloc[0].values,
    feature_names=X.columns.tolist()
))
Task: Use KernelExplainer (model-agnostic SHAP) to explain an SVM model, which doesn't have tree structure.
Show Solution
import shap
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
# Scale data for SVM
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train SVM (a model SHAP TreeExplainer can't handle directly)
svm = SVC(probability=True, random_state=42)
svm.fit(X_train_scaled, y_train)
# Use KernelExplainer (works with ANY model!)
# Use a small background sample for speed
background = shap.sample(X_train_scaled, 50)
explainer = shap.KernelExplainer(svm.predict_proba, background)
# Calculate SHAP values (slower than TreeExplainer)
shap_values = explainer.shap_values(X_test_scaled[:10]) # Subset for speed
# Summary plot
shap.summary_plot(shap_values[1], X_test.iloc[:10], feature_names=X.columns.tolist())
print("KernelExplainer is slower but works with ANY model!")
LIME Explanations
What if your model is a complete black box (like a deep neural network) and you can't peek inside? LIME to the rescue! Here's the clever idea: we can't understand the complex model globally, but we can understand it locally around one specific prediction. LIME creates fake data points near your sample, asks the black box to predict them, then fits a simple model (like linear regression) that explains what's happening in that neighborhood. Simple model = easy to understand!
How LIME Works
- Generate perturbed samples around the instance to explain
- Get predictions for perturbed samples from the black-box model
- Fit a simple model (e.g., linear regression) weighted by proximity
- Use simple model's coefficients as feature importance
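The four steps can be sketched by hand with a hypothetical black-box function (`black_box`, the instance `x0`, and the kernel width are made up for illustration; the real LimeTabularExplainer also handles discretization and feature selection):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)

# Hypothetical black box: we may only call it, never inspect it
def black_box(X):
    return 1 / (1 + np.exp(-(2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * X[:, 2])))

x0 = np.array([1.0, 0.5, -0.2])                   # instance to explain

Z = x0 + rng.normal(scale=0.5, size=(1000, 3))    # 1) perturb around x0
preds = black_box(Z)                              # 2) query the black box
dists = np.linalg.norm(Z - x0, axis=1)            # 3) weight by proximity to x0
weights = np.exp(-dists**2 / (2 * 0.5**2))
local = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)  # 4) simple surrogate
print("local coefficients:", local.coef_.round(3))  # sign/size = local importance
```

The surrogate's coefficients recover the local behavior: feature 0 pushes the prediction up, feature 1 down, feature 2 barely matters.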
# Step 1: Install and setup LIME
# pip install lime
import lime
import lime.lime_tabular
# Create LIME explainer
lime_explainer = lime.lime_tabular.LimeTabularExplainer(
training_data=X_train.values,
feature_names=X.columns.tolist(),
class_names=['Malignant', 'Benign'],
mode='classification',
random_state=42
)
Setting up LIME: We tell LIME about our data so it knows how to create realistic fake samples.
- training_data: LIME uses this to understand the range and distribution of each feature
- feature_names: So explanations use readable names, not "feature 0, feature 1"
- class_names: For classification, what do 0 and 1 mean? ("Malignant", "Benign")
- mode='classification': Tell LIME we're doing classification, not regression
# Step 2: Explain a single prediction
sample_idx = 0
sample = X_test.iloc[sample_idx].values
# Generate explanation
explanation = lime_explainer.explain_instance(
sample,
rf.predict_proba, # Model's probability predictions
num_features=10, # Top features to show
num_samples=1000 # Number of perturbations
)
# Display explanation
print("LIME Explanation for sample {}:".format(sample_idx))
print(f"Prediction: {rf.predict([sample])[0]}")
print(f"Probability: {rf.predict_proba([sample])[0]}")
print("\nTop contributing features:")
for feature, weight in explanation.as_list():
    print(f"  {feature}: {weight:.4f}")
The LIME magic:
- num_samples=1000: Create 1000 fake samples by tweaking our original sample
- rf.predict_proba: Ask the black box to predict all 1000 fake samples
- Fit simple model: LIME fits a linear model to approximate what the black box is doing locally
- num_features=10: Show the top 10 most influential features for this prediction
The output: Rules like "worst_radius > 14.5" with a weight showing how much it pushed the prediction!
# Step 3: Visualize LIME explanation
fig = explanation.as_pyplot_figure()
plt.tight_layout()
plt.show()
Reading the LIME chart:
- Green bars: These features support the predicted class ("This is WHY it predicted Benign")
- Red bars: These features oppose the predicted class ("Despite these, it still predicted Benign")
- Rules format: "worst_radius > 14.5" — tells you the threshold, not just the feature name
- Bar length: How strongly this feature influenced the local prediction
# Step 4: Interactive HTML explanation
# explanation.show_in_notebook() # For Jupyter notebooks
# Or save to HTML file
explanation.save_to_file('lime_explanation.html')
print("Explanation saved to lime_explanation.html")
Sharing explanations: LIME can create beautiful interactive HTML reports!
- show_in_notebook(): Interactive explanation right in your Jupyter notebook
- save_to_file(): Export to HTML for sharing with non-technical stakeholders
- The HTML includes: Prediction probabilities, feature contributions, and "what-if" sliders
Use case: Send the HTML to a doctor and say "Here's why the model thinks this tumor is benign."
LIME Advantages
- Model-agnostic (works with any model)
- Human-readable rules
- Fast for single predictions
- Works with text and images too
LIME Limitations
- Explanations can be unstable
- Local linear model may be poor fit
- No theoretical guarantees (unlike SHAP)
- Hyperparameter sensitive
Practice Questions
Task: Generate both LIME and SHAP explanations for the same sample and compare the top 5 features.
Show Solution
import shap
import lime.lime_tabular
sample_idx = 0
sample = X_test.iloc[sample_idx]
# SHAP explanation
explainer_shap = shap.TreeExplainer(rf)
shap_vals = explainer_shap.shap_values(sample.values.reshape(1, -1))
# Older shap versions return a list with one array per class;
# newer versions return a single array of shape (n_samples, n_features, n_classes)
vals = shap_vals[1][0] if isinstance(shap_vals, list) else shap_vals[0][:, 1]
shap_importance = pd.DataFrame({
    'feature': X.columns,
    'shap_value': vals
}).sort_values('shap_value', key=abs, ascending=False)
# LIME explanation
explainer_lime = lime.lime_tabular.LimeTabularExplainer(
X_train.values, feature_names=X.columns.tolist(),
class_names=['0', '1'], mode='classification', random_state=42
)
lime_exp = explainer_lime.explain_instance(sample.values, rf.predict_proba, num_features=10)
# LIME rules look like "12.06 < mean radius <= 14.50" and these feature
# names contain spaces, so match against the columns instead of splitting
lime_features = [next(c for c in sorted(X.columns, key=len, reverse=True) if c in f)
                 for f, _ in lime_exp.as_list()[:5]]
print("SHAP Top 5:", shap_importance['feature'].head(5).tolist())
print("LIME Top 5:", lime_features)
# Calculate agreement
agreement = len(set(shap_importance['feature'].head(5)) & set(lime_features))
print(f"Agreement: {agreement}/5 features overlap")
Task: Create a LIME explainer and generate an explanation for a single prediction. Display as a list.
Show Solution
# pip install lime
import lime
import lime.lime_tabular
# Create LIME explainer
explainer = lime.lime_tabular.LimeTabularExplainer(
training_data=X_train.values,
feature_names=X.columns.tolist(),
class_names=['Class 0', 'Class 1'],
mode='classification',
random_state=42
)
# Explain first test sample
sample = X_test.iloc[0].values
explanation = explainer.explain_instance(
sample,
rf.predict_proba,
num_features=10
)
# Print as simple list
print("LIME Explanation (feature: weight):")
for feature, weight in explanation.as_list():
direction = "↑" if weight > 0 else "↓"
print(f" {feature}: {weight:.3f} {direction}")
Task: Create a LIME explanation and display it as a visual bar chart showing feature contributions.
Show Solution
import lime
import lime.lime_tabular
import matplotlib.pyplot as plt
# Create explainer
explainer = lime.lime_tabular.LimeTabularExplainer(
X_train.values,
feature_names=X.columns.tolist(),
class_names=['Negative', 'Positive'],
mode='classification'
)
# Generate explanation
sample_idx = 5
explanation = explainer.explain_instance(
X_test.iloc[sample_idx].values,
rf.predict_proba,
num_features=10
)
# Display as matplotlib figure
fig = explanation.as_pyplot_figure()
plt.title(f"LIME Explanation for Sample {sample_idx}")
plt.tight_layout()
plt.show()
# Green bars push toward this class, red bars push away
Task: LIME uses random sampling. Run LIME multiple times on the same sample and check if the explanations are stable.
Show Solution
import lime.lime_tabular
import numpy as np
sample = X_test.iloc[0].values
# Run LIME 5 times with different random states
all_features = []
for seed in range(5):
explainer = lime.lime_tabular.LimeTabularExplainer(
X_train.values,
feature_names=X.columns.tolist(),
mode='classification',
random_state=seed # Different seed each time
)
exp = explainer.explain_instance(sample, rf.predict_proba, num_features=5)
# Feature names contain spaces ("mean radius"), so match against columns
# rather than splitting the rule string on whitespace
top_features = [next(c for c in sorted(X.columns, key=len, reverse=True) if c in f)
                for f, _ in exp.as_list()]
all_features.append(set(top_features))
print(f"Run {seed+1}: {top_features}")
# Check overlap across runs
common = all_features[0].intersection(*all_features[1:])
print(f"\nFeatures consistent across ALL runs: {common}")
print(f"Stability: {len(common)}/5 features are stable")
# Higher stability = more trustworthy explanation
Task: Apply LIME to a regression problem (predicting house prices) instead of classification.
Show Solution
import lime
import lime.lime_tabular
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import fetch_california_housing
# Load regression data
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)[:500]
y = data.target[:500]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train regressor
rf_reg = RandomForestRegressor(n_estimators=100, random_state=42)
rf_reg.fit(X_train, y_train)
# LIME explainer for REGRESSION (mode='regression')
explainer = lime.lime_tabular.LimeTabularExplainer(
X_train.values,
feature_names=X.columns.tolist(),
mode='regression' # Key difference!
)
# Explain prediction
sample = X_test.iloc[0].values
explanation = explainer.explain_instance(sample, rf_reg.predict, num_features=8)
print(f"Actual price: ${y_test[0] * 100000:.0f}")  # y_test is a NumPy array here, not a Series
print(f"Predicted: ${rf_reg.predict(sample.reshape(1, -1))[0] * 100000:.0f}")
print("\nFeature contributions:")
for feat, weight in explanation.as_list():
print(f" {feat}: {weight:.4f}")
Task: Generate a LIME explanation and save it as a standalone HTML file that can be shared with stakeholders.
Show Solution
import lime.lime_tabular
# Create explainer
explainer = lime.lime_tabular.LimeTabularExplainer(
X_train.values,
feature_names=X.columns.tolist(),
class_names=['Negative', 'Positive'],
mode='classification'
)
# Generate explanation for specific sample
sample_idx = 10
sample = X_test.iloc[sample_idx].values
explanation = explainer.explain_instance(
sample,
rf.predict_proba,
num_features=10
)
# Save as HTML file
html_path = 'lime_explanation.html'
explanation.save_to_file(html_path)
print(f"Explanation saved to: {html_path}")
print("Open in any web browser to view!")
print("Great for sharing with non-technical stakeholders.")
# The HTML file includes:
# - Prediction probabilities
# - Feature contributions (interactive)
# - Actual feature values
Partial Dependence & ICE Plots
Ever wondered: "How does the prediction change as I increase the house size from 1000 to 3000 sq ft?" That's exactly what PDP shows! Instead of explaining one prediction, PDP shows the average relationship between a feature and the prediction across all your data. ICE plots go further — they show this relationship for each individual sample, revealing if different people/houses/patients respond differently to the same feature.
Partial Dependence Plots
# Step 1: Create PDP for top features
from sklearn.inspection import PartialDependenceDisplay
# Find top 2 features by importance
top_features = importance_df.head(2)['feature'].tolist()
print(f"Plotting PDPs for: {top_features}")
# Single feature PDPs
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
PartialDependenceDisplay.from_estimator(
rf, X_test, features=top_features,
ax=axes, grid_resolution=50
)
plt.suptitle('Partial Dependence Plots', fontsize=14)
plt.tight_layout()
plt.show()
What PDP shows: The x-axis is the feature value, the y-axis is the average prediction.
- Upward slope: Higher feature values → higher predictions (positive relationship)
- Downward slope: Higher feature values → lower predictions (negative relationship)
- Flat line: This feature doesn't affect predictions much
- Curved line: Non-linear relationship (effect changes depending on the value)
How it works: For each x-value, we set EVERYONE's feature to that value and average the predictions. This "averages out" all other features.
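That recipe is short enough to write out by hand. A hedged, minimal sketch on a synthetic dataset (toy model and data invented for illustration), which mirrors what `partial_dependence` computes with the brute-force method:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X_toy = rng.normal(size=(200, 3))
y_toy = 2 * X_toy[:, 0] + rng.normal(scale=0.1, size=200)  # feature 0 drives the target
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_toy, y_toy)

def manual_pdp(model, X, feature, grid):
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value                      # force EVERYONE to this value
        averages.append(model.predict(X_mod).mean())   # average the predictions
    return np.array(averages)

grid = np.linspace(X_toy[:, 0].min(), X_toy[:, 0].max(), 20)
pdp = manual_pdp(model, X_toy, feature=0, grid=grid)
print(pdp[:3], pdp[-3:])  # rises with the grid, since feature 0 has positive effect
```

Because the target depends linearly on feature 0, the resulting curve slopes upward: the average prediction at the top of the grid is clearly higher than at the bottom.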
# Step 2: 2D PDP (interaction between two features)
fig, ax = plt.subplots(figsize=(8, 6))
PartialDependenceDisplay.from_estimator(
rf, X_test,
features=[(top_features[0], top_features[1])], # 2D feature pair
ax=ax, grid_resolution=30
)
plt.title(f'2D PDP: {top_features[0]} vs {top_features[1]}')
plt.tight_layout()
plt.show()
Spotting interactions! If two features work together (interact), a 2D PDP reveals it.
- Parallel contour lines: No interaction — each feature has an independent effect
- Non-parallel/curved contours: Interaction! The effect of one feature depends on the other's value
- Example: "Bedroom count matters a lot in expensive neighborhoods, but not in cheap areas" = interaction!
Pro tip: Start with SHAP to find potentially interacting features, then use 2D PDP to visualize the interaction.
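One hedged way to put a number on what the contours show: for a purely additive model, the 2D partial dependence equals the sum of the two 1D curves (up to a constant), so the gap between them is a rough interaction score — a simplified stand-in for Friedman's H-statistic, shown here on a toy model built from a pure interaction term:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_toy = rng.uniform(-1, 1, size=(400, 2))
y_toy = X_toy[:, 0] * X_toy[:, 1]  # pure interaction: effect of x0 depends on x1
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_toy, y_toy)

grid = np.linspace(-0.8, 0.8, 9)

def pdp_1d(f, vals):                       # brute-force 1D partial dependence
    out = []
    for v in vals:
        Xm = X_toy.copy(); Xm[:, f] = v
        out.append(model.predict(Xm).mean())
    return np.array(out)

def pdp_2d(vals):                          # brute-force 2D partial dependence
    out = np.empty((len(vals), len(vals)))
    for i, a in enumerate(vals):
        for j, b in enumerate(vals):
            Xm = X_toy.copy(); Xm[:, 0] = a; Xm[:, 1] = b
            out[i, j] = model.predict(Xm).mean()
    return out

p0, p1, p01 = pdp_1d(0, grid), pdp_1d(1, grid), pdp_2d(grid)
additive = p0[:, None] + p1[None, :] - y_toy.mean()  # what an additive model would give
interaction = np.sqrt(np.mean((p01 - additive) ** 2))
print(f"Interaction score: {interaction:.3f}")       # near 0 => no interaction
```

For an additive target the score would be near zero; here the pure x0*x1 interaction produces a clearly positive score, matching the non-parallel contours you'd see in the 2D PDP.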
ICE Plots
ICE plots show individual prediction curves, revealing heterogeneity that PDPs average away:
# Step 3: ICE plot with PDP overlay
fig, ax = plt.subplots(figsize=(10, 6))
PartialDependenceDisplay.from_estimator(
rf, X_test.head(50), # Subset for clarity
features=[top_features[0]],
kind='both', # Show both ICE lines and PDP
ax=ax,
ice_lines_kw={'color': 'lightblue', 'alpha': 0.3},
pd_line_kw={'color': 'red', 'linewidth': 2}
)
ax.set_title(f'ICE Plot with PDP: {top_features[0]}')
plt.tight_layout()
plt.show()
ICE reveals what PDP hides! PDP shows the average, but what if different samples behave differently?
- Light blue lines: Each line is one sample's prediction as we vary the feature
- Red line: The PDP (average of all the blue lines)
- Parallel lines: Everyone responds the same way — homogeneous effect
- Crossing/diverging lines: Different samples respond differently — heterogeneous effect!
Example insight: "For most patients, age doesn't matter. But for patients with high blood pressure (the diverging lines), age dramatically increases risk!"
# Step 4: Centered ICE (c-ICE) for clearer comparison
from sklearn.inspection import partial_dependence
# Calculate partial dependence
pdp_result = partial_dependence(
rf, X_test,
features=[top_features[0]],
kind='individual',
grid_resolution=50
)
# Center by subtracting initial value
individual_preds = pdp_result['individual'][0]
centered = individual_preds - individual_preds[:, [0]] # Subtract first value
plt.figure(figsize=(10, 6))
for i in range(min(50, len(centered))):
plt.plot(pdp_result['grid_values'][0], centered[i], alpha=0.3, color='blue')
plt.axhline(0, color='red', linestyle='--')
plt.xlabel(top_features[0])
plt.ylabel('Centered Partial Dependence')
plt.title('Centered ICE Plot (c-ICE)')
plt.show()
Centered ICE (c-ICE): All lines start at zero, making it easier to compare SHAPES.
- Why center? Regular ICE lines start at different heights (different baseline predictions), making comparison hard
- c-ICE trick: Subtract each line's starting value so they all begin at zero
- All lines overlap: Everyone responds the same way (homogeneous effect)
- Lines spread apart: Different samples respond differently (heterogeneous effect — investigate!)
This is the best plot for spotting heterogeneity!
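A hedged numeric companion to the plot: after centering, the spread (standard deviation) of the ICE curves at each grid point measures heterogeneity directly. This sketch uses an invented toy dataset where feature 0 only matters for half the samples, so the block stands alone:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(1)
X_toy = rng.uniform(0, 1, size=(300, 2))
# Heterogeneous effect: feature 0 only matters when feature 1 > 0.5
y_toy = np.where(X_toy[:, 1] > 0.5, 3 * X_toy[:, 0], 0.0) + rng.normal(scale=0.05, size=300)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_toy, y_toy)

res = partial_dependence(model, X_toy, features=[0], kind='individual', grid_resolution=20)
ice = res['individual'][0]            # shape (n_samples, n_grid_points)
centered = ice - ice[:, [0]]          # c-ICE: subtract each curve's starting value
spread = centered.std(axis=0)         # heterogeneity at each grid point
print(f"Max c-ICE spread: {spread.max():.3f}")  # near 0 => homogeneous effect
```

Here the spread grows large across the grid because half the samples respond strongly to feature 0 and the rest not at all — exactly the diverging-lines pattern a c-ICE plot reveals visually.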
Practice Questions
Task: Create 2D PDPs for multiple feature pairs and identify which pairs show strong interactions (non-additive effects).
Show Solution
from sklearn.inspection import PartialDependenceDisplay
import itertools
# Get top 4 features
top_4 = importance_df.head(4)['feature'].tolist()
# All pairs
pairs = list(itertools.combinations(top_4, 2))
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()
for i, (f1, f2) in enumerate(pairs):
PartialDependenceDisplay.from_estimator(
rf, X_test,
features=[(f1, f2)],
ax=axes[i], grid_resolution=20
)
axes[i].set_title(f'{f1} vs {f2}')
plt.suptitle('2D PDPs: Feature Interactions', fontsize=14)
plt.tight_layout()
plt.show()
# Interaction is strong if contours are not parallel
print("Look for non-parallel contour lines indicating interactions")
Task: Create a PDP for the most important feature and interpret what the curve tells you.
Show Solution
from sklearn.inspection import PartialDependenceDisplay
import matplotlib.pyplot as plt
# Find most important feature
sorted_idx = np.argsort(rf.feature_importances_)[::-1]
top_feature = X.columns[sorted_idx[0]]
print(f"Creating PDP for: {top_feature}")
# Create PDP
fig, ax = plt.subplots(figsize=(10, 6))
PartialDependenceDisplay.from_estimator(
rf, X_test,
features=[top_feature],
ax=ax
)
plt.title(f"Partial Dependence: {top_feature}")
plt.tight_layout()
plt.show()
print("""
How to read this plot:
- X-axis: Feature values (e.g., age from 20 to 80)
- Y-axis: Average prediction change
- Upward curve: Higher values → Higher predictions
- Flat line: Feature has little effect in that range
""")
Task: Create an ICE plot and identify if different samples respond differently to the same feature.
Show Solution
from sklearn.inspection import PartialDependenceDisplay
# Create ICE plot (Individual Conditional Expectation)
fig, ax = plt.subplots(figsize=(12, 6))
PartialDependenceDisplay.from_estimator(
rf, X_test,
features=[top_feature],
kind='both', # Show both ICE lines AND PDP
subsample=50, # Show 50 individual lines
ax=ax
)
plt.title(f"ICE + PDP Plot: {top_feature}")
plt.tight_layout()
plt.show()
print("""
Reading ICE plots:
- Each thin gray line = one sample's response to feature changes
- Thick line = PDP (average of all ICE lines)
- Lines PARALLEL = Everyone responds similarly → PDP is representative
- Lines CROSSING = Different subgroups respond differently → PDP hides variation!
""")
Task: Create PDPs for the top 4 most important features in a 2x2 grid and compare their relationships.
Show Solution
from sklearn.inspection import PartialDependenceDisplay
# Get top 4 features
sorted_idx = np.argsort(rf.feature_importances_)[::-1][:4]
top_4_features = X.columns[sorted_idx].tolist()
print(f"Top 4 features: {top_4_features}")
# Create 2x2 grid of PDPs
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
PartialDependenceDisplay.from_estimator(
rf, X_test,
features=top_4_features,
ax=axes.flatten(),
grid_resolution=50
)
plt.suptitle('Partial Dependence Plots: Top 4 Features', fontsize=14)
plt.tight_layout()
plt.show()
print("""
Compare the plots:
- Steeper curves = Stronger effect on prediction
- Similar shapes = Features may be correlated
- Different shapes = Features affect prediction differently
""")
Task: Create a centered ICE plot (c-ICE) to better detect heterogeneous effects across samples.
Show Solution
from sklearn.inspection import PartialDependenceDisplay
# Centered ICE: all lines start at 0, easier to compare slopes
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Regular ICE
PartialDependenceDisplay.from_estimator(
rf, X_test, features=[top_feature],
kind='individual', subsample=50, ax=axes[0]
)
axes[0].set_title('Regular ICE')
# Centered ICE (lines centered at left edge)
PartialDependenceDisplay.from_estimator(
rf, X_test, features=[top_feature],
kind='individual', subsample=50, ax=axes[1],
centered=True # Center all lines at the starting point
)
axes[1].set_title('Centered ICE (c-ICE)')
plt.suptitle(f'ICE vs c-ICE for {top_feature}')
plt.tight_layout()
plt.show()
print("""
Why c-ICE is useful:
- All lines start at 0 (differences from starting point)
- Easier to spot if slopes DIFFER between samples
- If lines diverge (spread out) → heterogeneous effects
- If lines stay parallel → homogeneous effects, PDP is reliable
""")
Task: Create PDP for categorical features (e.g., after one-hot encoding) and interpret the bar heights.
Show Solution
from sklearn.inspection import PartialDependenceDisplay, partial_dependence
import matplotlib.pyplot as plt
# Assuming you have one-hot encoded categorical features
# Let's find features that might be categorical (binary values)
categorical_candidates = []
for col in X.columns:
unique = X[col].nunique()
if unique <= 5: # Likely categorical
categorical_candidates.append(col)
if categorical_candidates:
cat_feature = categorical_candidates[0]
print(f"Creating PDP for categorical feature: {cat_feature}")
# For categorical, we'll show as bar chart
pdp_result = partial_dependence(
rf, X_test, features=[cat_feature],
kind='average', grid_resolution=10
)
values = pdp_result['grid_values'][0]  # key renamed from 'values' in newer scikit-learn
average = pdp_result['average'][0]
plt.figure(figsize=(10, 6))
plt.bar(range(len(values)), average)
plt.xticks(range(len(values)), [f'{v:.1f}' for v in values])
plt.xlabel(cat_feature)
plt.ylabel('Partial Dependence')
plt.title(f'PDP for {cat_feature} (Categorical)')
plt.show()
print("Bar heights show average prediction for each category")
else:
print("No categorical features found. Try with encoded features.")
Key Takeaways
Tree Importance Has Bias
Built-in feature_importances_ can favor high-cardinality features. Always validate with permutation importance on test data.
Permutation is Model-Agnostic
Permutation importance works with any model and measures predictive importance on held-out data, not just training fit.
SHAP Has Theoretical Guarantees
SHAP values are additive (sum to prediction) and consistent. Use for both global importance and individual explanations.
LIME is Local & Flexible
LIME explains individual predictions with interpretable rules. Works with any model but explanations may be unstable.
PDP Shows Average Effects
PDPs reveal marginal feature effects. Use ICE plots to detect heterogeneous effects hidden by averaging.
Choose for Your Audience
SHAP for data scientists, LIME for stakeholders who want rules, PDPs for understanding feature relationships.
Knowledge Check
What is the main advantage of permutation importance over tree-based feature importance?
What mathematical property makes SHAP values unique?
How does LIME generate explanations?
When would you use ICE plots instead of PDPs?
What does a waterfall plot in SHAP show?
Why might tree-based feature importance be biased?