A Perspective on LLM Data Generation with Few-shot Examples: from Intent to Kubernetes Manifest

Abstract

The advent of Large Language Models (LLMs) has transformed how complex tasks across various domains can be automated, including cloud computing. In this domain, one key task is service deployment, which requires generating Kubernetes (K8s) manifests, structured files that define the containerized environment. However, applying LLMs effectively to a specific domain often reveals gaps in domain-specific knowledge that affect the generated output. To address this, fine-tuning techniques are adopted to specialize LLMs in the domain of interest by training them on a customized dataset. Fine-tuning these models for domain-specific applications, however, presents unique challenges. First, a high-quality and diverse dataset that represents the domain is needed; the scarcity of such datasets, combined with the highly structured form of the service deployment domain, can hinder the fine-tuning process. Second, the fine-tuned model must generate outputs that are not only syntactically correct but also valid with respect to YAML structure and K8s-specific requirements. Finally, the computational cost of fine-tuning large-scale models can be substantial, in terms of both hardware requirements and expenses, highlighting the need to select models that balance efficiency and scalability to optimize resource usage.
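As a concrete illustration of the validity requirement, the following minimal Python sketch (not the validator used in this work) checks that a generated manifest parses as YAML and declares the top-level fields every K8s object must carry; a complete check would additionally validate against the K8s API schema, for instance via kubectl apply --dry-run=server.

import yaml

# Every K8s object must declare these top-level fields.
REQUIRED_TOP_LEVEL = {"apiVersion", "kind", "metadata"}

def is_plausible_manifest(text: str) -> bool:
    """Return True if `text` parses as YAML and looks like a K8s object."""
    try:
        doc = yaml.safe_load(text)
    except yaml.YAMLError:
        return False  # syntactically invalid YAML
    if not isinstance(doc, dict):
        return False  # a manifest must be a mapping, not a scalar or list
    return REQUIRED_TOP_LEVEL.issubset(doc.keys())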

To address these challenges, in this paper we propose BRAT, a pipeline for generating K8s manifests directly from user-described intents using LLMs. Our approach leverages an extensive n-shot learning analysis to choose the number of examples that best guides the adopted models in generating the manifests, while also accounting for the computational cost. The resulting combination can then be used to populate a dataset for fine-tuning the models. Surprisingly, our results show that while increasing the number of n-shot examples can improve the quality of the generated configurations for more specialized models, such as Mixtral-8x7B (which uses a mixture-of-experts approach), for more general-purpose models like Llama3-8B and Llama3-70B it can lead to fewer valid K8s manifests. These results highlight that each analyzed LLM performed differently when generating structured Kubernetes manifests, with smaller models sometimes outperforming larger ones, encouraging an in-depth analysis of LLMs to determine the most effective setup for each domain-specific task.
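To make the n-shot setup concrete, the following Python sketch shows one way to assemble a prompt from intent-manifest example pairs; the prompt wording and the build_prompt helper are illustrative assumptions, as the exact template used by BRAT is not reproduced here.

def build_prompt(examples: list[tuple[str, str]], intent: str, n: int) -> str:
    """Prepend n (intent, manifest) example pairs before the target intent."""
    parts = ["Generate a Kubernetes manifest for the described deployment.\n"]
    for ex_intent, ex_manifest in examples[:n]:
        parts.append(f"Intent: {ex_intent}\nManifest:\n{ex_manifest}\n")
    parts.append(f"Intent: {intent}\nManifest:\n")
    return "\n".join(parts)

# Usage with a single hypothetical example pair (n = 1):
shots = [("Deploy nginx with 2 replicas",
          "apiVersion: apps/v1\nkind: Deployment\n...")]
prompt = build_prompt(shots, "Expose a redis service on port 6379", n=1)

Varying n in such a template is what the analysis sweeps over: larger n gives the model more structural guidance but lengthens the prompt, and, as reported above, does not uniformly improve manifest validity across models.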
