Document the contributor onboarding path:

- CONTRIBUTING.md: setup, npm scripts, coding conventions, PR process, the block-claude-coauthor enforcement, and the five providers without test coverage today (claude, gemini, goose, qwen, antigravity).
- docs/architecture.md: 12-command CLI surface, parser pipeline, three cache layers, 14 optimize detectors, and the mac / gnome / build layouts with cited line numbers.
- docs/providers/: one file per provider (17 providers plus the shared vscode-cline-parser helper). Each covers data path, storage format, caching, dedup key, quirks, and a "when fixing a bug here" checklist.

Also fix two pre-existing documentation issues surfaced while writing the new docs:

- RELEASING.md claimed GitHub Actions auto-publishes the CLI when a `v*` tag is pushed. There is no such workflow; CLI publishing is manual via `npm publish`. Updated the CLI section to reflect reality and kept the menubar (`mac-v*` tag) automation accurate.
- .gitignore had CLAUDE.md unanchored, which on case-insensitive filesystems also matched docs/providers/claude.md. Anchored to `/CLAUDE.md` so the root-level memory file stays ignored without affecting subdirectory docs.

All cited file paths, line numbers, function names, and test counts were verified against current code (41 test files, 558 tests passing).
# Codex

OpenAI Codex CLI.
- Source: `src/providers/codex.ts`
- Loading: eager (`src/providers/index.ts:2`)
- Test: `tests/providers/codex.test.ts` (374 lines)
## Where it reads from

`$CODEX_HOME` if set, otherwise `~/.codex`. Sessions are nested by date:

`~/.codex/sessions/<YYYY>/<MM>/<DD>/rollout-*.jsonl`
The discovery walk uses strict regexes (`^\d{4}$`, `^\d{2}$`) on each path component.
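A minimal sketch of that component check (the helper name is hypothetical; the real walk lives in `src/providers/codex.ts`):

```typescript
// Hypothetical helper illustrating the strict per-component validation.
// Each directory level under sessions/ must match an exact-width numeric
// pattern, and the leaf file must look like rollout-*.jsonl.
const YEAR_RE = /^\d{4}$/;
const TWO_DIGIT_RE = /^\d{2}$/;
const ROLLOUT_RE = /^rollout-.*\.jsonl$/;

function isSessionFilePath(rel: string): boolean {
  const parts = rel.split("/");
  if (parts.length !== 4) return false;
  const [year, month, day, file] = parts;
  return (
    YEAR_RE.test(year) &&
    TWO_DIGIT_RE.test(month) &&
    TWO_DIGIT_RE.test(day) &&
    ROLLOUT_RE.test(file)
  );
}
```

Anchored patterns mean `2025/3/7` or a stray `latest/` symlink directory never enter the walk.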
## Storage format
JSONL. The first line must be a `session_meta` entry with `payload.originator` starting with `codex` (case-insensitive). Files that fail this check are silently skipped.
The first line read is capped at 1 MB (`FIRST_LINE_READ_CAP`). Codex CLI 0.128+ embeds the full system prompt in `session_meta`, which can run 20-27 KB; the cap leaves headroom while bounding memory if a corrupt file has no newline.
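A hedged sketch of that validation (the `type` field name and exact entry shape are assumptions; only `payload.originator` and the cap come from the text above):

```typescript
// Assumed from the text above: the first line is read up to 1 MB.
const FIRST_LINE_READ_CAP = 1024 * 1024;

// Hypothetical validator: parse at most the capped prefix of the file's
// first line and require a session_meta entry whose payload.originator
// starts with "codex" (case-insensitive).
function isValidFirstLine(buf: string): boolean {
  const firstLine = buf.slice(0, FIRST_LINE_READ_CAP).split("\n", 1)[0];
  try {
    const entry = JSON.parse(firstLine);
    if (entry?.type !== "session_meta") return false; // assumed field name
    const originator = entry?.payload?.originator;
    return typeof originator === "string" && originator.toLowerCase().startsWith("codex");
  } catch {
    return false; // unparseable first line -> file is silently skipped
  }
}
```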
## Caching
`src/codex-cache.ts` writes `~/.cache/codeburn/codex-results.json` (or `$CODEBURN_CACHE_DIR/codex-results.json`). Each entry is keyed by absolute file path and validated against `mtimeMs` + `sizeBytes`. Cached entries are returned wholesale.
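The path resolution amounts to something like this (a simplified illustration, not the actual `src/codex-cache.ts` code):

```typescript
import { join } from "node:path";
import { homedir } from "node:os";

// Resolve the cache file: $CODEBURN_CACHE_DIR wins when set, otherwise
// fall back to the default ~/.cache/codeburn directory.
function codexCacheFile(env: Record<string, string | undefined> = process.env): string {
  const dir = env.CODEBURN_CACHE_DIR ?? join(homedir(), ".cache", "codeburn");
  return join(dir, "codex-results.json");
}
```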
A session that yielded zero parseable lines does not write to the cache (`codex.ts:419`); this prevents a transient read failure from pinning an empty result against a fingerprint.
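The fingerprint check can be sketched as follows (the entry shape is illustrative; `fs.Stats` supplies the `mtimeMs` and `size` fields in the real lookup):

```typescript
// Illustrative cache-entry shape: fingerprint fields plus the parsed result.
interface CacheEntry<T> {
  mtimeMs: number;
  sizeBytes: number;
  result: T;
}

// A stat-like view of the file on disk (fs.Stats has these fields).
interface FileStat {
  mtimeMs: number;
  size: number;
}

// Return the cached result wholesale when both fingerprint fields match;
// any mismatch (or a missing entry) forces a re-parse of the session file.
function cachedResult<T>(entry: CacheEntry<T> | undefined, st: FileStat): T | undefined {
  if (!entry) return undefined;
  if (st.mtimeMs !== entry.mtimeMs || st.size !== entry.sizeBytes) return undefined;
  return entry.result;
}
```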
## Deduplication
`codex:<sessionId>:<timestamp>:<cumulativeTotal>` for accounted events, plus `codex:<sessionId>:<timestamp>:est<n>` for estimated events that fall back to char-counting.
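As a sketch (the helper is hypothetical; only the two key shapes come from the code):

```typescript
// Build the dedup key for one Codex event. Accounted events key on the
// cumulative token total; estimated events carry an "est"-prefixed count
// so the two key spaces cannot collide.
function codexDedupKey(
  sessionId: string,
  timestamp: string,
  usage: { cumulativeTotal: number } | { estimatedTokens: number }
): string {
  if ("cumulativeTotal" in usage) {
    return `codex:${sessionId}:${timestamp}:${usage.cumulativeTotal}`;
  }
  return `codex:${sessionId}:${timestamp}:est${usage.estimatedTokens}`;
}
```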
## Quirks
- Codex CLI emits both `last_token_usage` (per turn) and `total_token_usage` (cumulative). The parser handles three modes:
  - `last_token_usage` present: use it directly.
  - Only cumulative: compute deltas against the prior turn.
  - Neither: estimate from message text length (`CHARS_PER_TOKEN = 4`).
- `prevCumulativeTotal` is initialized to `null`, not `0`. A session whose first event reports `total = 0` would otherwise be dropped as a "duplicate" of the initial state.
- `prev*` token counters are advanced on every event, including ones that used `last_token_usage`. Earlier code only updated them on the fallback branch, which double-counted any session that mixed modes.
- OpenAI counts cached tokens inside `input_tokens`. The parser subtracts them so the rest of the codebase can assume Anthropic semantics (cached are separate).
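The three modes and the `null` initialization can be sketched together (the event shape and function names are illustrative, not the real parser's):

```typescript
const CHARS_PER_TOKEN = 4;

// Illustrative event shape: per-turn usage, cumulative usage, both,
// or neither may be present on a given event.
interface UsageEvent {
  last_token_usage?: { total_tokens: number };
  total_token_usage?: { total_tokens: number };
  text?: string;
}

interface TurnTokens {
  tokens: number;
  estimated: boolean;
  nextPrev: number | null; // advanced on every event, not just the fallback branch
}

function tokensForEvent(ev: UsageEvent, prevCumulativeTotal: number | null): TurnTokens {
  if (ev.last_token_usage) {
    // Mode 1: per-turn usage is authoritative. Still advance the prev
    // counter so a later cumulative-only event computes a correct delta.
    const next = ev.total_token_usage?.total_tokens ?? prevCumulativeTotal;
    return { tokens: ev.last_token_usage.total_tokens, estimated: false, nextPrev: next };
  }
  if (ev.total_token_usage) {
    // Mode 2: cumulative only -- delta against the prior turn. prev starts
    // as null (not 0) so a first event reporting total = 0 is treated as a
    // fresh observation rather than a duplicate of the initial state.
    const total = ev.total_token_usage.total_tokens;
    const tokens = prevCumulativeTotal === null ? total : total - prevCumulativeTotal;
    return { tokens, estimated: false, nextPrev: total };
  }
  // Mode 3: neither present -- estimate from message text length.
  const est = Math.ceil((ev.text ?? "").length / CHARS_PER_TOKEN);
  return { tokens: est, estimated: true, nextPrev: prevCumulativeTotal };
}
```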
## When fixing a bug here
- Reproduce against a real `rollout-*.jsonl` if you can. Drop a redacted copy under `tests/fixtures/codex/` and reference it from `tests/providers/codex.test.ts`.
- If the bug is "zero tokens reported", first check whether the file is being skipped by `isValidCodexSession`.
- If the bug is "tokens counted twice", look at `prevCumulativeTotal` and the prev-counter advancement.
- If you change the dedup key shape, run `tests/providers/codex.test.ts` and `tests/parser-filter.test.ts` together; cross-provider dedup happens via the global `seenKeys` `Set`.