Submit your SLM to the arena. We spawn a GPU container, stream 15 Polars questions through it, run the generated code and compare its results against gold outputs, and rank your team in real time.
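To make the scoring step concrete, here is a minimal sketch of how generated code might be run and checked against gold outputs. It is not the arena's actual harness: the `result` variable convention, the `run_generated` and `score` helpers, and the plain `==` comparison are all assumptions for illustration. With Polars frames the comparison would need a frame-level check such as `DataFrame.equals`, and a real harness would isolate the execution.

```python
import operator
from typing import Any, Callable, List


def run_generated(code: str, result_name: str = "result") -> Any:
    """Execute generated code in a fresh namespace and return one variable.

    Assumption: each answer stores its output in a variable called `result`.
    No sandboxing is done here; a real harness would isolate this step.
    """
    namespace: dict = {}
    exec(code, namespace)
    return namespace[result_name]


def score(
    generated: List[str],
    gold: List[Any],
    same: Callable[[Any, Any], bool] = operator.eq,
) -> int:
    """Count the answers whose executed result matches the gold output."""
    passed = 0
    for code, expected in zip(generated, gold):
        try:
            if same(run_generated(code), expected):
                passed += 1
        except Exception:
            # Code that crashes or never sets `result` counts as a miss.
            pass
    return passed


answers = [
    "result = 1 + 1",                      # correct
    "result = [x * 2 for x in range(3)]",  # correct
    "result = 1 / 0",                      # raises, so it is a miss
    "x = 5",                               # never sets `result`, so a miss
]
gold_outputs = [2, [0, 2, 4], 0, 5]
print(score(answers, gold_outputs))
```

Passing the comparison in as `same` keeps the sketch independent of the output type, so a frame-aware check can be swapped in without touching the loop.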
Submit a GitHub repo. We clone it, install its dependencies, and spin up your inference server on a GPU.
Test runs give you full visibility. Global runs are score-only, for fair ranking.
15 questions, each streamed via SSE. Watch your model think in real time.
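For readers who have not worked with SSE, the sketch below shows how a client might turn a server-sent event stream into events. It follows the standard SSE wire format (`event:` and `data:` fields, a blank line to dispatch, `:` for comments). The `parse_sse` helper and the `token` event name are hypothetical and do not describe the arena's actual payloads.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from the lines of an SSE stream."""
    event = "message"      # default event type when no `event:` field is sent
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue       # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip the single optional leading space
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # Other fields (`id`, `retry`) are ignored in this sketch.


stream = [
    ": keep-alive\n",
    "event: token\n",
    "data: import\n",
    "\n",
    "data: line one\n",
    "data: line two\n",
    "\n",
    "data: dropped\n",  # no closing blank line, so it is never dispatched
]
for name, payload in parse_sse(stream):
    print(name, repr(payload))
```

Because `parse_sse` accepts any iterable of lines, the same function can be fed from a file, a socket wrapper, or an HTTP client's line iterator.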