Overview: Interpretability tools make machine learning models more transparent by displaying how each feature influences ...
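A minimal sketch of one such tool: permutation feature importance with scikit-learn. This is illustrative only, not the specific tool the overview refers to; the dataset and estimator are assumptions chosen to keep the example self-contained.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure the drop in test accuracy:
# a large drop means the model relies heavily on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
top = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])[:5]
for name, score in top:
    print(f"{name}: {score:.3f}")
```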
Pathologies of Neural Models Make Interpretations Difficult: Model pathologies stem from over-confidence or second-order sensitivity. The first issue is addressed by training a classifier to be ...
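A hedged sketch of the entropy-regularization idea this snippet alludes to: penalize low predictive entropy on reduced or meaningless inputs so the classifier is less over-confident on them. The names (model, x_reduced, lambda_ent) are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def entropy_regularized_loss(model, x, y, x_reduced, lambda_ent=1e-3):
    """Cross-entropy on real inputs minus lambda * predictive entropy on reduced inputs."""
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)

    # Predictive entropy H(p) = -sum p log p on the reduced inputs;
    # subtracting it from the loss rewards high-entropy (uncertain)
    # predictions where the input carries little real evidence.
    probs = F.softmax(model(x_reduced), dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()

    return task_loss - lambda_ent * entropy
```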
Background: There are clinical trials that use composite measures, indices, or scales as proxies for independent variables or outcomes. The interpretability of such derived measures may not be satisfactory. Adopting ...
Abstract: Interpretability for machine learning models in medical imaging (MLMI) is an important direction of research. However, there is a general sense of murkiness in what interpretability means.
Introduction: Local interpretability methods such as LIME and SHAP are widely used to explain model decisions. However, they rely on assumptions of local continuity that often fail in recursive, ...
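A minimal sketch of the LIME idea the introduction refers to: sample perturbations around an instance, weight them by proximity, and fit a weighted linear surrogate whose coefficients serve as the explanation. The local-continuity assumption is visible here: the surrogate is only faithful if predict_fn varies smoothly near x. Illustrative only; the lime package implements the full method.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(predict_fn, x, n_samples=1000, scale=0.1, kernel_width=0.75):
    rng = np.random.default_rng(0)
    # Perturb the instance with Gaussian noise.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    preds = predict_fn(Z)  # model scores for the class of interest
    # Exponential kernel: nearby samples count more in the fit.
    dists = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dists ** 2) / (kernel_width ** 2))
    # Weighted linear surrogate; its coefficients are the local explanation.
    surrogate = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)
    return surrogate.coef_
```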
This repository contains two projects aimed at enhancing the mechanistic interpretability of transformer-based models, specifically focusing on GPT-2. The projects provide insights into two critical ...
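A hedged sketch of the kind of mechanistic-interpretability probe such a repository might include: extracting per-head attention patterns from GPT-2 with Hugging Face transformers. This is an assumed illustration of the general workflow, not this repository's code.

```python
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)
model.eval()

inputs = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one (batch, heads, seq, seq) tensor per layer.
attn = outputs.attentions
print(f"{len(attn)} layers, {attn[0].shape[1]} heads, seq len {attn[0].shape[-1]}")
# e.g. attention weights from the last token in layer 0, head 0:
print(attn[0][0, 0, -1])
```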