Talk

Scaling Up Code Agents: Infrastructure, Quality Assessment and Synthetic Data

Room 2

This talk will cover the practical challenges of scaling code agents: moving from individual local runs to large-scale experiments, assessing quality reproducibly, and generating synthetic data for training.

I will demonstrate how to build infrastructure for code agents based on DAG pipelines, where each experiment is described as a sequence of isolated steps: running the agent in a repository environment, extracting changes (git diff/patch), running tests, calculating quality metrics, and automatically filtering out poor-quality results with an LLM-as-judge. Particular focus will be given to generating and cleaning synthetic data, and to building a feedback loop for further training of the agents.
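A minimal sketch of the experiment pipeline described above, assuming each step is a plain function that consumes the previous step's result; the step names, the dummy outputs, and the fixed judge score are all illustrative stand-ins, not the talk's actual implementation (a real setup would run each step in an isolated container and call a real LLM for judging):

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    name: str
    output: dict


def run_agent(repo: str) -> StepResult:
    # Stand-in: a real pipeline launches the agent in an isolated repo environment.
    return StepResult("run_agent", {"repo": repo, "changed": True})


def extract_patch(prev: StepResult) -> StepResult:
    # Stand-in for capturing changes via `git diff` against the pre-run commit.
    return StepResult("extract_patch", {"patch": "--- a/f.py\n+++ b/f.py"})


def run_tests(prev: StepResult) -> StepResult:
    # Stand-in for executing the repository's test suite on the patched tree.
    return StepResult("run_tests", {"passed": 12, "failed": 0})


def judge(prev: StepResult) -> StepResult:
    # Placeholder for an LLM-as-judge call; here a fixed score derived from tests.
    score = 0.9 if prev.output["failed"] == 0 else 0.2
    return StepResult("judge", {"score": score, "keep": score > 0.5})


def run_pipeline(repo: str) -> list[StepResult]:
    # Execute the steps as a linear chain; a DAG engine generalises this
    # to branching dependencies and retries.
    results = [run_agent(repo)]
    for step in (extract_patch, run_tests, judge):
        results.append(step(results[-1]))
    return results


trace = run_pipeline("example/repo")
print([r.name for r in trace])
```

Trajectories whose final `keep` flag is false would be dropped before the data reaches training, which is the filtering role the LLM-as-judge plays in the described loop.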

The talk will cover the following technologies and approaches: containerisation and environment isolation, orchestration via Argo Workflows, DAG-based experiment modelling, automated testing, LLM-as-judge, and asynchronous training and model update pipelines.
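To illustrate the DAG-based experiment modelling mentioned above: a sketch using Python's standard-library `graphlib`, where each experiment step declares the steps it depends on and execution order falls out of a topological sort. The step names are illustrative assumptions; in the setup the talk describes, such a DAG would be handed to an orchestrator like Argo Workflows rather than resolved in-process:

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps it depends on.
dag = {
    "run_agent": set(),
    "extract_patch": {"run_agent"},
    "run_tests": {"extract_patch"},
    "metrics": {"run_tests"},
    "judge_filter": {"metrics"},
}

# static_order() yields steps so that dependencies always precede dependents.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

The same declarative shape (steps plus dependency edges) is what an Argo Workflows DAG template expresses, with each node running as its own isolated container.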

Target audience: ML engineers, backend and infrastructure engineers, researchers and team leads who work with LLMs, agents or ML systems in production and face challenges related to scaling, reproducibility and quality assessment.
