
implement shap-feature-importance #16385

Open
wendycwong opened this issue Sep 11, 2024 · 0 comments
wendycwong (Contributor) commented Sep 11, 2024

We have SHAP summary plots, but the user wants to see the actual importance values. Here is the answer according to @tomasfryda:

AFAIK we don’t have a method/function to do that. Usually the mean absolute contribution is used for variable importance (https://christophm.github.io/interpretable-ml-book/shap.html#shap-feature-importance), but I don’t think there is just one correct way to do it.
Also, I would probably recommend the SHAP summary plot instead, as it shows more information without additional computation.
The calculation itself is quite trivial:
import matplotlib.pyplot as plt

# SHAP contributions for each row of the test frame
# (optionally pass background_frame=train; see the note below)
contr = model.predict_contributions(test)

# feature importance = mean absolute contribution per column
feature_importances = dict(zip(contr.names, contr.abs().mean()))

# horizontal bar chart, least to most important
fi = sorted(feature_importances.items(), key=lambda x: x[1])
plt.barh([x[0] for x in fi], [x[1] for x in fi])
plt.title("Feature Importances")
plt.show()
For tree models you don’t have to specify the background frame. Calculation with a background frame is usually much slower (IIRC the number of operations is the number of rows in the background frame times the number of operations without a background frame).
Generally, it’s recommended to use background_frame, as the choice of background frame influences the results. The problem with not using a background frame is that you don’t know how important individual splits in the trees are. For example, if the model denies a mortgage to people taller than 3 m (~10 ft), the contributions calculated without a background frame would treat this split as just as important as other splits, but with a background frame we would find out that there are no people that tall (or at least very few), so the contribution would end up lower.
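A minimal end-to-end sketch of the calculation above, with and without a background frame, following the snippet in this comment. The dataset, column choices, and GBM settings are illustrative assumptions only (any H2O tree model would work the same way); the importance computation reuses the zip/abs/mean pattern from the snippet, dropping the BiasTerm column that predict_contributions appends:

import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()

# illustrative dataset and model -- not part of the original comment
prostate = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv")
prostate["CAPSULE"] = prostate["CAPSULE"].asfactor()
train, test = prostate.split_frame(ratios=[0.8], seed=42)

model = H2OGradientBoostingEstimator(ntrees=50, seed=42)
model.train(y="CAPSULE", x=["AGE", "RACE", "DPROS", "DCAPS", "PSA", "VOL", "GLEASON"],
            training_frame=train)

# TreeSHAP without a background frame: fast, but splits on values that never
# occur in the data can still look important
contr = model.predict_contributions(test)

# Baseline SHAP against a background frame: slower, but importances reflect
# the data distribution in the background frame
contr_bg = model.predict_contributions(test, background_frame=train)

# mean absolute contribution per feature, dropping the BiasTerm column
fi = {c: v for c, v in zip(contr_bg.names, contr_bg.abs().mean()) if c != "BiasTerm"}
print(sorted(fi.items(), key=lambda x: -x[1]))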
[Attached screenshot: Screenshot 2024-09-11 at 8.51.44.png]


Implement this for R and Python clients.
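For reference, a rough sketch of what such a helper could look like on the Python client side, built around the snippet above. The name shap_feature_importance, its signature, and the plotting behavior are hypothetical illustrations, not a decided API; the R client would presumably wrap its predict_contributions call the same way:

# Hypothetical helper -- name, signature, and behavior are illustrative only;
# the actual API for this ticket is still to be decided.
def shap_feature_importance(model, frame, background_frame=None, plot=True):
    """Return the mean absolute SHAP contribution per feature, optionally plotted."""
    if background_frame is not None:
        contr = model.predict_contributions(frame, background_frame=background_frame)
    else:
        contr = model.predict_contributions(frame)
    importances = dict(zip(contr.names, contr.abs().mean()))
    importances.pop("BiasTerm", None)  # keep only feature columns
    if plot:
        import matplotlib.pyplot as plt
        fi = sorted(importances.items(), key=lambda x: x[1])
        plt.barh([x[0] for x in fi], [x[1] for x in fi])
        plt.title("SHAP Feature Importances")
        plt.show()
    return importances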
