NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls. Kinjal Basu, Ibrahim Abdelaziz, et al. 2025. EMNLP 2025.
Group, Embed and Reason: A Hybrid LLM and Embedding Framework for Semantic Attribute Alignment. Shramona Chakraborty, Shashank Mujumdar, et al. 2025. EMNLP 2025.
Mind the Query: A Benchmark Dataset towards Text2Cypher Task. Vashu Chauhan, Shobhit Raj, et al. 2025. EMNLP 2025.
Classifier-Augmented Generation for Structured Workflow Prediction. Thomas Gschwind, Shramona Chakraborty, et al. 2025. EMNLP 2025.
Towards Enforcing Company Policy Adherence in Agentic Workflows. Naama Zwerdling, David Boaz, et al. 2025. EMNLP 2025.
Declarative Techniques for NL Queries over Heterogeneous Data. Elham Khabiri, Jeff Kephart, et al. 2025. EMNLP 2025.
Enhancing Study-Level Inference from Clinical Trial Papers via Reinforcement Learning-Based Numeric Reasoning. Massimiliano Pronesti, Michela Lorandi, et al. 2025. EMNLP 2025.
Divide, Link, and Conquer: Recall-oriented Schema Linking for NL-to-SQL via Question Decomposition. Kiran Pradeep, Kirushikesh D B, et al. 2025. EMNLP 2025.
Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation. Noy Sternlicht, Ariel Gera, et al. 2025. EMNLP 2025.