Grace Guo, Lifu Deng, et al.
FAccT 2024
In this paper, we investigate how concept-based models (CMs) respond to out-of-distribution (OOD) inputs. CMs are interpretable neural architectures that first predict a set of high-level concepts (e.g., "stripes", "black") and then predict a task label from those concepts. In particular, we study the impact of concept interventions (i.e., operations where a human expert corrects a CM's mispredicted concepts at test time) on CMs' task predictions when inputs are OOD. Our analysis reveals a weakness in current state-of-the-art CMs, which we term leakage poisoning, that prevents them from properly improving their accuracy when intervened on for OOD inputs. To address this, we introduce MixCEM, a new CM that learns to dynamically exploit leaked information missing from its concepts only when this information is in-distribution. Our results across tasks with and without complete sets of concept annotations demonstrate that MixCEMs outperform strong baselines by significantly improving their accuracy for both in-distribution and OOD samples in the presence and absence of concept interventions.
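For readers unfamiliar with the setup the abstract describes, below is a minimal sketch of a vanilla concept bottleneck model with a test-time concept intervention, assuming a PyTorch implementation. The names (ConceptBottleneckModel, concept_net, label_net) are illustrative, not from the paper, and this sketch does not implement MixCEM's leakage handling.

```python
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    """Minimal concept bottleneck: inputs -> predicted concepts -> task label."""

    def __init__(self, n_features, n_concepts, n_classes):
        super().__init__()
        # Concept predictor: maps raw features to concept logits.
        self.concept_net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, n_concepts)
        )
        # Label predictor: maps concept activations to task logits.
        self.label_net = nn.Linear(n_concepts, n_classes)

    def forward(self, x, interventions=None):
        # Predicted concept probabilities.
        c_hat = torch.sigmoid(self.concept_net(x))
        if interventions is not None:
            # interventions: dict {concept_index: ground-truth value in {0.0, 1.0}}.
            # A human expert overwrites mispredicted concepts at test time.
            c_hat = c_hat.clone()
            for idx, value in interventions.items():
                c_hat[:, idx] = value
        return self.label_net(c_hat), c_hat

# Example: intervene on concept 0 (e.g. "stripes") for a batch of inputs.
model = ConceptBottleneckModel(n_features=32, n_concepts=5, n_classes=3)
x = torch.randn(4, 32)
logits, concepts = model(x, interventions={0: 1.0})
```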
Taku Ito, Luca Cocchi, et al.
ICML 2025
Natalia Martinez Gil, Kanthi Sarpatwar, et al.
NeurIPS 2023
Chen-chia Chang, Wan-hsuan Lin, et al.
ICML 2025