
Limitations

An honest account of where argot works, where it doesn't, and what the v1 roadmap covers.

argot is alpha software. It ships honest benchmarks and a public research log, but real gaps remain — both in the model and in the surfaces around it. The GitHub issue tracker is the source of truth.

Where it works today

argot’s benchmark harness runs the production scorer against six pinned open-source repos: fastapi, rich, and faker (Python), and hono, ink, and faker-js (TypeScript). It uses a hand-crafted catalog of paradigm-break fixtures, scored against hundreds of thousands of real PR hunks as negative controls. Recent results: 108 of 115 fixtures caught (about 94% recall), a false-positive rate ≤ 2.0% on all six corpora, and a reproducible threshold (coefficient of variation, CV, of 0% across seeds).
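To make the three headline numbers concrete, here is a minimal sketch of how recall, false-positive rate, and the coefficient of variation are usually computed. It is not argot's harness code: the function names are hypothetical, and only the 108-of-115 figure comes from the results above. The control counts and per-seed thresholds are illustrative placeholders.

```python
from statistics import mean, pstdev


def recall(caught: int, total: int) -> float:
    """Fraction of paradigm-break fixtures the scorer flagged."""
    return caught / total


def false_positive_rate(flagged: int, controls: int) -> float:
    """Fraction of negative-control hunks that were flagged anyway."""
    return flagged / controls


def coefficient_of_variation(values: list[float]) -> float:
    """Population standard deviation divided by the mean.

    A CV of 0 means every seed produced the same threshold.
    """
    m = mean(values)
    return pstdev(values) / m if m else 0.0


# Figure reported above: 108 of 115 fixtures caught.
print(f"recall: {recall(108, 115):.1%}")

# Hypothetical counts, chosen only to illustrate a 2.0% rate.
print(f"FP rate: {false_positive_rate(20, 1000):.1%}")

# Hypothetical thresholds from three seeds; identical values give CV = 0.
print(f"CV: {coefficient_of_variation([0.5, 0.5, 0.5]):.0%}")
```

Recall and false-positive rate pull in opposite directions when the threshold moves, which is why the roadmap treats "push FP ≤ 1%" and "close the recall gap" as a single goal.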

Modeling caveats

Surface gaps

These are the adoption blockers we’re working to remove on the way to v1:

What v1 needs

Goal                                    Why it blocks v1
Push FP ≤ 1% and close the recall gap   Trust at the gate
Validate on application corpora         Prove it beyond libraries
Suppression mechanism                   One stubborn FP shouldn’t be un-silenceable
Repo suitability check                  Tell users up front if it’ll work
Official CI integration                 Action + pre-commit + SARIF
This documentation site                 Tutorials, how-tos, reference

Already shipped since the early roadmap: per-language calibration for mixed monorepos, and a per-hunk evidence line that names the tokens carrying each score.

Browse everything, including non-v1 work, at the issue tracker.