
Explainable AI

Explanations can go a long way toward building trust in AI systems. We’re creating tools to help debug AI by having systems explain what they’re doing. This work includes training highly optimized, directly interpretable models, explaining black-box models, and visualizing information flows in neural networks.

Our work

Debugging LLMs to improve their credibility
Research
Kim Martineau
30 Jul 2025

Teaching AI models to improve themselves
Research
Peter Hess
14 Aug 2024

IBM and RPI researchers demystify in-context learning in large language models
News
Peter Hess
25 Jul 2024

The latest AI safety method is a throwback to our maritime past
Research
Kim Martineau
16 Nov 2023

Find and fix IT glitches before they crash the system
News
Kim Martineau
28 Sep 2023

What is retrieval-augmented generation?
Explainer
Kim Martineau
22 Aug 2023

Publications

CLEAR: Error Analysis via LLM-as-a-Judge Made Easy
- Asaf Yehudai, Lilach Edelstein, et al.
- 2026
- AAAI 2026

Advances in Emulating Earth System Models
- Björn Lütjens, Kalyn Dorheim, et al.
- 2025
- AGU 2025

Secure and Safe AI Agents for Big Data Infrastructures
- Bhavya Bhavya, Sai Sree Laya Chukkapalli
- 2025
- Big Data 2025

Toward a Coherent Virtual Cell Model: Probing Biological World-Model Coherence in Transcriptomic Foundation Models
- Noa Moriel, Yishai Shimoni, et al.
- 2025
- NeurIPS 2025

Deferring Concept Bottleneck Models: Learning to Defer Interventions to Inaccurate Experts
- Andrea Pugnana, Riccardo Massidda, et al.
- 2025
- NeurIPS 2025

Specifying exact circuit algorithms in universal transformers
- Taku Ito, Ruchir Puri, et al.
- 2025
- NeurIPS 2025
