Talk

Scaling Up Code Agents: Infrastructure, Quality Assessment and Synthetic Data

Room 2 · In Russian

This talk will cover the practical challenges of scaling code agents: moving from individual local runs to large-scale experiments, reproducible quality assessment, and generating synthetic data for training.

I will demonstrate how to build an infrastructure for code agents based on DAG pipelines, where each experiment is described as a sequence of isolated steps: running the agent in a repository environment, extracting changes (git diff/patch), running tests, calculating quality metrics, and automatically filtering out poor-quality steps using LLM-as-judge. Particular focus will be given to the generation and cleaning of synthetic data, as well as building a feedback loop for further training of the agents.
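The DAG idea described above can be sketched in a few lines: each step is an isolated function that only runs once its dependencies have finished, and its output is merged into a shared experiment context. This is a minimal, hypothetical illustration, not the speaker's actual pipeline; all step names and payloads are assumptions.

```python
from typing import Callable

# A step takes the shared experiment context and returns an update to it.
Step = Callable[[dict], dict]

def run_agent(ctx: dict) -> dict:
    # Placeholder: run the agent in an isolated repository environment.
    return {"workdir": f"/tmp/{ctx['repo']}"}

def extract_patch(ctx: dict) -> dict:
    # Placeholder: collect the agent's changes, e.g. via `git diff`.
    return {"patch": f"diff for {ctx['workdir']}"}

def run_tests(ctx: dict) -> dict:
    # Placeholder: run the test suite against the extracted patch.
    return {"tests_passed": True}

def judge(ctx: dict) -> dict:
    # Placeholder: LLM-as-judge filtering of poor-quality runs.
    return {"keep": ctx["tests_passed"]}

# The DAG: step name -> (function, names of steps it depends on).
DAG: dict[str, tuple[Step, list[str]]] = {
    "run_agent": (run_agent, []),
    "extract_patch": (extract_patch, ["run_agent"]),
    "run_tests": (run_tests, ["extract_patch"]),
    "judge": (judge, ["run_tests"]),
}

def execute(dag: dict, ctx: dict) -> dict:
    """Run each step once all of its dependencies are done."""
    done: set[str] = set()
    while len(done) < len(dag):
        for name, (fn, deps) in dag.items():
            if name not in done and all(d in done for d in deps):
                ctx.update(fn(ctx))
                done.add(name)
    return ctx

result = execute(DAG, {"repo": "example-repo"})
print(result["keep"])
```

In a real system, an orchestrator such as Argo Workflows plays the role of `execute`, scheduling each step in its own container; the scheduler here is just enough to show the dependency ordering.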

The talk will cover the following technologies and approaches: containerisation and environment isolation, orchestration via Argo Workflows, DAG-based experiment modelling, automated testing, LLM-as-judge, and asynchronous training and model update pipelines.

Target audience: ML engineers, backend and infrastructure engineers, researchers and team leads who work with LLMs, agents or ML systems in production and face challenges related to scaling, reproducibility and quality assessment.

Speakers

Invited experts
