Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis
Hongkang Li, Meng Wang, et al.
2024. SAM 2024.