Recent advances in Large Language Models (LLMs) have yielded impressive successes on many language tasks. However, efficient processing of long contexts with LLMs remains a significant challenge. We introduce EpMAN, a method that processes long contexts in an episodic memory module while holistically attending to semantically relevant context chunks. The output of this episodic attention is then used to reweight the decoder's self-attention over the stored KV cache of the context during training and generation. When an LLM decoder is trained with EpMAN, its performance on several challenging single-hop long-context recall and question-answering benchmarks is stronger and more robust across context lengths from 16k to 256k tokens than that of baseline decoders trained with self-attention and of popular retrieval-augmented generation frameworks. Our source code will be made available at https://github.com/IBM/epman.
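The abstract describes two coupled steps: chunk-level episodic attention over the stored context, followed by reweighting of the decoder's token-level self-attention by those chunk scores. The sketch below is a minimal, hypothetical PyTorch illustration of that reweighting idea only; it is not the authors' implementation (which is to be released at the repository above). The dot-product chunk scoring, the broadcast of chunk scores to token positions, and the renormalization step are assumptions made purely for illustration.

```python
# Illustrative sketch of chunk-level episodic scoring and self-attention
# reweighting. NOT the EpMAN implementation; all design details are assumed.
import torch
import torch.nn.functional as F

def episodic_chunk_scores(query_emb, chunk_embs):
    """Score each stored context chunk against the query (assumed: scaled dot-product + softmax)."""
    # query_emb: (d,), chunk_embs: (num_chunks, d) -> (num_chunks,)
    return F.softmax(chunk_embs @ query_emb / chunk_embs.shape[-1] ** 0.5, dim=-1)

def reweighted_self_attention(q, k_cache, v_cache, chunk_ids, chunk_scores):
    """Scaled dot-product attention over the cached context, with per-token
    weights broadcast from the chunk-level episodic scores."""
    # q: (heads, d), k_cache / v_cache: (heads, ctx_len, d), chunk_ids: (ctx_len,)
    d = q.shape[-1]
    logits = torch.einsum("hd,htd->ht", q, k_cache) / d ** 0.5  # (heads, ctx_len)
    attn = F.softmax(logits, dim=-1)
    token_weights = chunk_scores[chunk_ids]                     # (ctx_len,)
    attn = attn * token_weights                                 # reweight by episodic relevance
    attn = attn / attn.sum(dim=-1, keepdim=True)                # renormalize
    return torch.einsum("ht,htd->hd", attn, v_cache)            # (heads, d)

# Toy usage: 4 chunks of 8 tokens each, 2 heads, dim 16.
torch.manual_seed(0)
heads, d, num_chunks, chunk_len = 2, 16, 4, 8
ctx_len = num_chunks * chunk_len
chunk_ids = torch.arange(ctx_len) // chunk_len
scores = episodic_chunk_scores(torch.randn(d), torch.randn(num_chunks, d))
out = reweighted_self_attention(
    torch.randn(heads, d), torch.randn(heads, ctx_len, d),
    torch.randn(heads, ctx_len, d), chunk_ids, scores)
print(out.shape)  # torch.Size([2, 16])
```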
Robert Farrell, Rajarshi Das, et al.
AAAI-SS 2010
Lina Berrayana, Sean Rooney, et al.
ACL 2025
Navve Wasserman, Roi Pony, et al.
ACL 2025
Chen-chia Chang, Wan-hsuan Lin, et al.
ICML 2025