Control Flow Operators in PyTorch
Yidi Wu, Thomas Bohnstingl, et al.
ICML 2025
In many industrial settings, users wish to ask questions in natural language, the answers to which require assembling information from diverse structured data sources. With the advent of Large Language Models (LLMs), applications can now translate natural language questions into a set of API calls or database calls, execute them, and combine the results into an appropriate natural language response. However, these applications remain impractical in realistic industrial settings because they do not cope with the data source heterogeneity that typifies such environments. In this paper, we simulate the heterogeneity of real industry settings by introducing two extensions of the popular Spider benchmark dataset that require a combination of database and API calls. Then, we introduce and evaluate a declarative approach to handling such data heterogeneity. We demonstrate that our declarative approach does a significantly better job of coping with data source heterogeneity than state-of-the-art LLM-based agentic or imperative code generation systems. Our augmented benchmarks will soon be available to the research community.
Yidi Wu, Thomas Bohnstingl, et al.
ICML 2025
Gosia Lazuka, Andreea Simona Anghel, et al.
SC 2024
Robert Farrell, Rajarshi Das, et al.
AAAI-SS 2010
Chen-chia Chang, Wan-hsuan Lin, et al.
ICML 2025