LLM paste-through
argot learns your repo’s voice from its own git history, then flags the hunks that don’t sound like anyone on your team wrote them. No model. No cloud. No GPU.
MIT · macOS & Linux · fits in seconds, scores in milliseconds
The second question
Linters and type checkers answer “is this valid?” They can’t answer “is this how this team writes things?” That second question used to live in code review, until an LLM could bury it under a hundred clean, type-correct PRs in an afternoon. argot is the layer that asks it again.
    @router.get("/{user_id}", response_model=UserResponse)
    async def get_user(user_id: int, db=Depends(get_db)) -> UserResponse:
        user = db.get(user_id)
        if user is None:
            raise ValueError(f"User {user_id} not found")
        return user

    argot check · 1 hunk above threshold (1 foreign)
    note: argot is a probabilistic style linter — verify before action.

    routers/users.py
    ● L11-L14  8.21 foreign · workdir · wrong exception type (bpe)
      ↳ ValueError (0×) — repo raises HTTPException (214×)
      11 │     user = db.get(user_id)
      12 │     if user is None:
      13 │         raise ValueError(f"User {user_id} not found")
      14 │     return user
Decorators, Depends, the typed return — all idiomatic FastAPI. The one break is a bare ValueError where this repo always raises HTTPException. mypy is happy. The linter has nothing. argot flags the line.
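The attestation counts in that report, ValueError (0×) against HTTPException (214×), can be approximated by hand. The sketch below is not how argot works internally (argot compares token tables, not syntax trees). It only counts which exception classes a body of code raises, using Python's standard ast module. The function name raised_exception_names and both sample sources are hypothetical.

```python
import ast
from collections import Counter


def raised_exception_names(source: str) -> Counter:
    """Count the exception class names used in `raise` statements."""
    counts: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Raise) or node.exc is None:
            continue  # skip everything except `raise X`; ignore bare `raise`
        exc = node.exc
        # `raise Foo(...)` wraps the class in a Call; `raise Foo` is the class itself.
        target = exc.func if isinstance(exc, ast.Call) else exc
        if isinstance(target, ast.Name):
            counts[target.id] += 1
        elif isinstance(target, ast.Attribute):
            counts[target.attr] += 1
    return counts


# Hypothetical stand-in for the rest of the repo.
repo_source = '''
def a(x):
    if x is None:
        raise HTTPException(status_code=404, detail="missing")

def b(y):
    if not y:
        raise HTTPException(status_code=400, detail="bad")
'''

# Hypothetical hunk under review.
hunk_source = '''
def get_user(user):
    if user is None:
        raise ValueError("User not found")
    return user
'''

repo_counts = raised_exception_names(repo_source)
hunk_counts = raised_exception_names(hunk_source)
usual, usual_n = repo_counts.most_common(1)[0]
for name in hunk_counts:
    print(f"{name} ({repo_counts[name]}×), repo raises {usual} ({usual_n}×)")
```

A count like this only covers one pattern. The point of a token-level model is that nobody has to decide in advance which patterns to count.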
What it catches
argot does not replace ESLint, ruff, or your type checker. It catches the things they can’t articulate — the patterns your team agreed on by repetition, never by writing them down.
A block whose style diverges sharply from the surrounding file — fluent in the average voice of every public repo, not yours.
Error handling, logging, or control-flow shapes that don’t match how the rest of the codebase does it.
Class-based OOP dropped into a functional codebase. A sync def on a hot async path. The wrong import for the job.
Code that’s correct, typed, and lint-clean — but doesn’t sound like anyone on this team wrote it.
How it stays honest
argot builds two token distributions, one from your repo and one from a generic open-source baseline, and flags hunks that are far more likely under the generic one. That is the whole model. It fits on CPU in seconds and calibrates its threshold per repo.
No GPU, no cloud, no telemetry. The model is two frequency tables and a max log-ratio — it fits in seconds and scores in milliseconds.
The threshold is set from your own code, so “normal” means normal here — not the average of every public repo a model trained on.
A tree-sitter tokenizer parses partial, invalid hunks. Python and TypeScript out of the box; mixed monorepos get one threshold per language.
Every flag names the tokens that carried the score, how often they appear in your repo, and what’s common here instead.
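The idea above, two frequency tables and a max log-ratio, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not argot's implementation: the regex tokenizer stands in for tree-sitter, add-one smoothing is an assumed choice, the threshold is simply the highest score among the repo's own hunks, and the names tokenize, fit, and score and all sample hunks are hypothetical.

```python
import math
import re
from collections import Counter

# Crude regex tokenizer, a stand-in for the tree-sitter tokenizer described above.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> list[str]:
    return TOKEN_RE.findall(code)


def fit(hunks: list[str]) -> Counter:
    """Build one token frequency table from a list of code hunks."""
    table: Counter = Counter()
    for hunk in hunks:
        table.update(tokenize(hunk))
    return table


def score(hunk: str, repo: Counter, generic: Counter) -> tuple[float, str]:
    """Return the largest per-token log(P_generic / P_repo) and the token carrying it.

    Assumes the hunk contains at least one token.
    """
    vocab = len(set(repo) | set(generic))
    repo_total = sum(repo.values()) + vocab        # add-one smoothing (assumption)
    generic_total = sum(generic.values()) + vocab

    def log_ratio(tok: str) -> float:
        p_generic = (generic[tok] + 1) / generic_total
        p_repo = (repo[tok] + 1) / repo_total
        return math.log(p_generic) - math.log(p_repo)

    top = max(set(tokenize(hunk)), key=log_ratio)
    return log_ratio(top), top


# Hypothetical training data: the repo's own hunks and a generic baseline.
repo_hunks = [
    'raise HTTPException(status_code=404, detail="User not found")',
    'raise HTTPException(status_code=400, detail="Bad request")',
    'raise HTTPException(status_code=403, detail="Forbidden")',
]
generic_hunks = [
    'raise ValueError("User not found")',
    'raise ValueError("bad input")',
    'raise HTTPException(status_code=404)',
]

repo, generic = fit(repo_hunks), fit(generic_hunks)

# Calibrate on the repo itself: "normal" is whatever the repo's own code scores.
threshold = max(score(h, repo, generic)[0] for h in repo_hunks)

value, token = score('raise ValueError("User not found")', repo, generic)
print(f"score={value:.2f} token={token} "
      f"(repo {repo[token]}×, generic {generic[token]}×) flagged={value > threshold}")
```

Taking the max rather than the mean means a single out-of-place token can carry the flag, which is also why every hit can name the token responsible and how often the repo uses it.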
Why argot
argot check runs on every commit, groups hits by file, and exits non-zero when something diverges. Wire it in like ESLint.
Point it at a repo, run extract → fit once, then check forever. No annotations, no config to get started.
Each hit shows the offending tokens with their repo attestation, such as startedAt (0×) vs use (88×), alongside the vocabulary the repo typically uses instead.
Hits are graded unusual · suspicious · foreign, relative to the calibrated threshold. Filter the noise with --min-severity.
A Python + TypeScript monorepo gets one threshold per language, dispatched by file extension, so neither language's distribution dominates the other.
Public benchmarks, a 35-doc research log, and a probabilistic-linter disclaimer printed in every run. Verify before you act.
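The check flow described in this section (a threshold picked by file extension, three severity levels, hits grouped by file, a non-zero exit on any hit) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the threshold values, the severity cut-offs at 1.25× and 1.5× the threshold, and the names Hit, grade, and check are hypothetical and say nothing about argot's real internals.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical per-language thresholds, keyed by file extension.
THRESHOLDS = {".py": 5.0, ".ts": 4.0}
# Severity levels in increasing order; the cut-offs in grade() are assumptions.
SEVERITIES = ["unusual", "suspicious", "foreign"]


@dataclass
class Hit:
    path: str
    score: float
    severity: str


def grade(score: float, threshold: float) -> Optional[str]:
    """Map a score to a severity relative to the threshold, or None if below it."""
    if score <= threshold:
        return None
    if score < 1.25 * threshold:
        return "unusual"
    if score < 1.5 * threshold:
        return "suspicious"
    return "foreign"


def check(scored: list[tuple[str, float]], min_severity: str = "unusual"):
    """Group hits by file and return (hits_by_file, exit_code)."""
    floor = SEVERITIES.index(min_severity)
    grouped: dict[str, list[Hit]] = defaultdict(list)
    for path, score in scored:
        threshold = THRESHOLDS.get(PurePosixPath(path).suffix)
        if threshold is None:
            continue  # no model for this language, so skip the file
        severity = grade(score, threshold)
        if severity is None or SEVERITIES.index(severity) < floor:
            continue
        grouped[path].append(Hit(path, score, severity))
    return dict(grouped), (1 if grouped else 0)


# Hypothetical (path, score) pairs, as if produced by an earlier scoring step.
scored = [
    ("routers/users.py", 8.21),
    ("routers/items.py", 3.1),
    ("web/app.ts", 4.4),
    ("README.md", 9.9),
]

hits, exit_code = check(scored)
for path, file_hits in hits.items():
    for hit in file_hits:
        print(f"{path}  {hit.score:.2f} {hit.severity}")
print("exit code:", exit_code)
```

Calling check(scored, min_severity="suspicious") drops the web/app.ts hit in this example, which mirrors what the --min-severity filter is described as doing.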
argot is MIT and alpha. Calibrate it on your repo in a couple of minutes, then see what it flags.