Lessons Learned: A Multi-Agent Framework for Code LLMs to Learn and Improve. Yuanzhe Liu, Ryan Deng, et al. 2025. NeurIPS 2025.
MermaidSeqBench: An Evaluation Benchmark for LLM-to-Mermaid Sequence Diagram Generation. Basel Shbita, Farhan Ahmed, et al. 2025. NeurIPS 2025.
When Agents go Astray: Course-Correcting SWE Agents with PRMs. Shubham Gandhi, Jason Tsay, et al. 2025. NeurIPS 2025.
Multiple Schema-Conformant Declarative Code Generation. Mehant Kammakomati, Srikanth Tamilselvam. 2025. ASE 2025.
Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding. Ziv Nevo, Orna Raz, et al. 2025. ASE 2025.
Vintage Code, Modern Judges: Meta-Validation in Low Data Regimes. Gal Amram, Ora Nova Fandina, et al. 2025. ASE 2025.
Towards Enforcing Company Policy Adherence in Agentic Workflows. Naama Zwerdling, David Boaz, et al. 2025. EMNLP 2025.