mirror of
https://github.com/AgentSeal/codeburn.git
synced 2026-05-17 03:56:45 +00:00
Document the contributor onboarding path: - CONTRIBUTING.md: setup, npm scripts, coding conventions, PR process, the block-claude-coauthor enforcement, and the five providers without test coverage today (claude, gemini, goose, qwen, antigravity). - docs/architecture.md: 12-command CLI surface, parser pipeline, three cache layers, 14 optimize detectors, and the mac / gnome / build layouts with cited line numbers. - docs/providers/: one file per provider (17 providers plus the shared vscode-cline-parser helper). Each covers data path, storage format, caching, dedup key, quirks, and a "when fixing a bug here" checklist. Also fix two pre-existing documentation issues surfaced while writing the new docs: - RELEASING.md claimed GitHub Actions auto-publishes the CLI when a v* tag is pushed. There is no such workflow; CLI publishing is manual via npm publish. Updated the CLI section to reflect reality and kept the menubar (mac-v* tag) automation accurate. - .gitignore had CLAUDE.md unanchored, which on case-insensitive filesystems also matched docs/providers/claude.md. Anchored to /CLAUDE.md so the root-level memory file stays ignored without affecting subdirectory docs. All cited file paths, line numbers, function names, and test counts were verified against current code (41 test files, 558 tests passing).
50 lines
2.9 KiB
Markdown
50 lines
2.9 KiB
Markdown
# Codex
|
|
|
|
OpenAI Codex CLI.
|
|
|
|
- **Source:** `src/providers/codex.ts`
|
|
- **Loading:** eager (`src/providers/index.ts:2`)
|
|
- **Test:** `tests/providers/codex.test.ts` (374 lines)
|
|
|
|
## Where it reads from
|
|
|
|
`$CODEX_HOME` if set, otherwise `~/.codex`. Sessions are nested by date:
|
|
|
|
```
|
|
~/.codex/sessions/<YYYY>/<MM>/<DD>/rollout-*.jsonl
|
|
```
|
|
|
|
The discovery walk uses strict regex (`^\d{4}$`, `^\d{2}$`) on each path component.
|
|
|
|
## Storage format
|
|
|
|
JSONL. The first line must be a `session_meta` entry with `payload.originator` starting with `codex` (case-insensitive). Files that fail this check are silently skipped.
|
|
|
|
The first line read is capped at 1 MB (`FIRST_LINE_READ_CAP`). Codex CLI 0.128+ embeds the full system prompt in `session_meta`, which can run 20-27 KB; the cap leaves headroom while bounding memory if a corrupt file has no newline.
|
|
|
|
## Caching
|
|
|
|
`src/codex-cache.ts` writes `~/.cache/codeburn/codex-results.json` (or `$CODEBURN_CACHE_DIR/codex-results.json`). Each entry is keyed by absolute file path and validated against `mtimeMs + sizeBytes`. Cached entries are returned wholesale.
|
|
|
|
A session that yielded zero parseable lines does **not** write to the cache (`codex.ts:419`); this prevents a transient read failure from pinning an empty result against a fingerprint.
|
|
|
|
## Deduplication
|
|
|
|
`codex:<sessionId>:<timestamp>:<cumulativeTotal>` for accounted events, plus `codex:<sessionId>:<timestamp>:est<n>` for estimated events that fall back to char-counting.
|
|
|
|
## Quirks
|
|
|
|
- Codex CLI emits both `last_token_usage` (per turn) and `total_token_usage` (cumulative). The parser handles three modes:
|
|
1. `last_token_usage` present: use it directly.
|
|
2. Only cumulative: compute deltas against the prior turn.
|
|
3. Neither: estimate from message text length (`CHARS_PER_TOKEN = 4`).
|
|
- `prevCumulativeTotal` is initialized to `null`, not `0`. A session whose first event reports `total = 0` would otherwise be dropped as a "duplicate" of the initial state.
|
|
- `prev*` token counters are advanced on **every** event, including ones that used `last_token_usage`. Earlier code only updated them on the fallback branch, which double-counted any session that mixed modes.
|
|
- OpenAI counts cached tokens **inside** `input_tokens`. The parser subtracts them so the rest of the codebase can assume Anthropic semantics (cached are separate).
|
|
|
|
## When fixing a bug here
|
|
|
|
1. Reproduce against a real `rollout-*.jsonl` if you can. Drop a redacted copy under `tests/fixtures/codex/` and reference it from `tests/providers/codex.test.ts`.
|
|
2. If the bug is "zero tokens reported", first check whether the file is being skipped by `isValidCodexSession`.
|
|
3. If the bug is "tokens counted twice", look at `prevCumulativeTotal` and the prev-counter advancement.
|
|
4. If you change the dedup key shape, run `tests/providers/codex.test.ts` and `tests/parser-filter.test.ts` together; cross-provider dedup happens via the global `seenKeys` Set.
|