Recurrent Transformers Trade-off Parallelism for Length Generalization on Regular Languages. Paul Soulos, Aleksandar Terzic, et al. 2024. NeurIPS 2024.
TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing. Aleksandar Terzic, Michael Hersche, et al. 2023. NeurIPS 2023.