Small Models.
Big Queries.

Submit your SLM to the arena. We spawn a GPU container, stream 15 Polars questions through your model, execute the generated code, compare its output against gold outputs, and rank your team in real time.


$ polars-bench run --repo=https://github.com/team/slm

[01/15] question_started  Count premium customers by country...

[01/15] question_result   ✓ exact_match  gen=2.1s · peak_ram=1.2GB

[02/15] question_started  Compute total revenue...

[02/15] question_result   ✗ mismatch

[03/15] question_started _
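The page doesn't say how a generated answer is judged against its gold output. As a hypothetical illustration, the `exact_match` / `mismatch` verdicts in the log above could come from a row-level comparison like this Python sketch. The function name `compare_results`, the row-tuple representation, and the order-insensitive default are all assumptions, and to stay dependency-free it compares plain tuples rather than Polars DataFrames.

```python
from __future__ import annotations

from collections import Counter
from typing import Sequence

Row = tuple  # hypothetical: one result row, e.g. ("DE", 3); fields must be hashable


def compare_results(generated: Sequence[Row], gold: Sequence[Row],
                    order_sensitive: bool = False) -> str:
    """Return "exact_match" if the rows agree, else "mismatch" (assumed rule)."""
    if order_sensitive:
        # Same rows in the same order.
        same = list(generated) == list(gold)
    else:
        # Compare as multisets: row order is ignored, duplicate rows still count.
        same = Counter(generated) == Counter(gold)
    return "exact_match" if same else "mismatch"


gold = [("DE", 3), ("FR", 5)]
print(compare_results([("FR", 5), ("DE", 3)], gold))                        # exact_match
print(compare_results([("FR", 5), ("DE", 3)], gold, order_sensitive=True))  # mismatch
```

A real checker would probably also need a float tolerance for aggregates such as revenue totals; that is left out of this sketch.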

Repo-based

Submit a GitHub repo. We clone, install, and spin up your inference server on GPU.

Two benchmarks

Test gives full visibility into each question and result. Global reports the score only, for fair ranking.

Live stream

15 questions, each streamed via SSE. Watch your model think in real time.
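A client consuming the stream could parse it roughly as follows. This is a minimal Python sketch of a `text/event-stream` parser following the standard SSE framing (`event:` and `data:` fields, blank line dispatches, `:` lines are comments). The event names `question_started` and `question_result` come from the terminal log above; the JSON payload fields (`index`, `total`, `question`, `status`, `gen_s`) are assumptions, since the page does not document the schema.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_type, data) pairs from text/event-stream lines."""
    event_type = "message"  # SSE default when no "event:" field is sent
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if any data was seen.
            if data_lines:
                yield event_type, "\n".join(data_lines)
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips one leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        # "id" and "retry" fields are ignored in this sketch


# Hypothetical sample stream; payload fields are assumed, not documented.
SAMPLE_STREAM = [
    ": keep-alive\n",
    "event: question_started\n",
    'data: {"index": 1, "total": 15, "question": "Count premium customers by country"}\n',
    "\n",
    "event: question_result\n",
    'data: {"index": 1, "status": "exact_match", "gen_s": 2.1}\n',
    "\n",
]

for event, data in parse_sse(SAMPLE_STREAM):
    print(event, json.loads(data)["index"])
# question_started 1
# question_result 1
```

In practice the lines would come from an HTTP response read incrementally, so each event can be rendered as soon as its terminating blank line arrives.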
