airi/apps/server/docs/ai-context/billing-architecture.md
RainbowBird c627bce9c9
refactor(server): split services into domain/adapter layers, drop dead code
Why
- src/services/ was an unordered mix of single-file services and module
  directories with no shared classification axis, plus several long-dead
  admin batch helpers that survived the move to the simpler synchronous
  admin-flux-grants flow.

What
- services/ now has two top-level layers:
    domain/   — DB state + business rules (billing, characters, chats,
                flux, flux-transaction, llm-router, providers, request-log,
                stripe, user-deletion, admin/{flux-grants,router-config})
    adapters/ — thin wrappers over external SDKs / infra (config-kv, email,
                posthog, tts/)
- admin/* moved under domain/admin/ with consistent plural names
  (flux-grants, router-config).
- tts-adapters/ collapsed to adapters/tts/ (no redundant -adapters suffix
  once nested under adapters/).
- 63 src files + scripts/e2e-llm-router.ts + tests/verifications/_harness.ts
  had relative imports rewritten; git mv preserves blame.
- apps/server/CLAUDE.md and docs/ai-context/*.md updated to match new paths.

Dead code removed
- services/admin-flux-grant-batches/ (service + worker + tests, 1090 LOC) —
  superseded by admin-flux-grants and never wired into app.ts.
- routes/admin/flux-grant-batches/ — same.
- utils/redis-compressed.ts + test — zero production call sites.
- llm-router/index.ts re-exports trimmed from 26 to 6; only symbols with
  external consumers are kept.

Intentionally kept
- schemas/flux-grant-batch.ts and its schemas/index.ts export remain so the
  drizzle-kit generate diff stays empty. Removing them is a separate PR
  that owns the drop-table migration for flux_grant_batch /
  flux_grant_batch_recipient.

Verification
- pnpm -F @proj-airi/server typecheck: passes.
- pnpm exec eslint apps/server: 49 errors, identical to main baseline
  (all are pre-existing node/prefer-global/buffer in envelope-crypto and
  scripts/e2e-llm-router; untouched by this change).
- Vitest passes per-file; the 6 mockDB hook timeouts under full-parallel
  run are the known pushSchema-per-worker infra cost, not a regression.
2026-05-18 23:36:45 +08:00


Billing Architecture

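Architecture Overview

Billing chain in apps/server: Postgres is the single source of truth for the ledger. Every balance write (debit / credit) and its ledger row are committed in the same DB transaction. Redis serves only as a read cache for balances. Redis Streams and background consumers are no longer used for billing side effects.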

Data Model

  • user_flux: per-user balance snapshot (one row per user)
  • flux_transaction: append-only ledger (type: credit / debit / initial / promo, amount, balanceBefore, balanceAfter, requestId, metadata)
    • partial unique index (userId, requestId) WHERE requestId IS NOT NULL: DB-level idempotency guard against duplicate writes
  • llm_request_log: observability record for each LLM/TTS request (model / status / duration / fluxConsumed / token usage)
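To make the idempotency index concrete, here is a minimal TypeScript sketch of a flux_transaction row and the rule the partial unique index enforces. The column types, the convention that `amount` is stored as a positive number for debits, and both helper functions are assumptions for illustration; they are not taken from the actual schema.

```typescript
// Hypothetical shape of a flux_transaction row. Column names follow the
// list above; the TypeScript types are assumptions.
type FluxTransactionType = 'credit' | 'debit' | 'initial' | 'promo'

interface FluxTransactionRow {
  userId: string
  type: FluxTransactionType
  amount: number // assumed to be stored as a positive number for every type
  balanceBefore: number
  balanceAfter: number
  requestId: string | null
  metadata: Record<string, unknown>
}

// Mirrors the partial unique index on (userId, requestId) WHERE requestId IS NOT NULL:
// rows without a requestId never conflict with anything.
function violatesIdempotencyIndex(ledger: FluxTransactionRow[], candidate: FluxTransactionRow): boolean {
  if (candidate.requestId === null)
    return false
  return ledger.some(row => row.userId === candidate.userId && row.requestId === candidate.requestId)
}

// A ledger row is internally consistent when balanceAfter equals balanceBefore
// adjusted by the amount in the direction implied by its type.
function isConsistentRow(row: FluxTransactionRow): boolean {
  const delta = row.type === 'debit' ? -row.amount : row.amount
  return row.balanceAfter === row.balanceBefore + delta
}
```

Because the index is partial, a write without a requestId is never deduplicated; only callers that supply one get replay protection.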

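debitFlux Path

BillingService.consumeFluxForLLM() calls debitFlux(), which performs the following, with every DB step inside a single transaction:

  1. If a requestId is present, first check whether flux_transaction already has a row for the same (userId, requestId). On a hit, return the historical result without debiting again or writing a new row (idempotent replay)
  2. SELECT user_flux FOR UPDATE to lock the row
  3. Check the balance (return 402 if insufficient)
  4. Update user_flux.flux
  5. INSERT INTO flux_transaction (...), recording the debited amount, token usage, and source in metadata
  6. After the transaction commits, best-effort redis.set to refresh the Flux balance cache (a failure only logs a warning)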
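The six steps can be sketched with an in-memory stand-in for Postgres and Redis. This is an illustrative model, not the real debitFlux(): the class and method names are hypothetical, the Maps replace the tables and the cache, and the row lock is only implied because the code is single-threaded.

```typescript
// In-memory model of the debit path. A real implementation would run steps
// 1 to 5 inside one DB transaction with SELECT ... FOR UPDATE.
interface DebitLedgerEntry {
  userId: string
  requestId: string | null
  amount: number
  balanceBefore: number
  balanceAfter: number
}

class InsufficientFluxError extends Error {
  readonly status = 402
  constructor() {
    super('insufficient flux')
  }
}

class InMemoryDebitModel {
  private balances = new Map<string, number>()
  private ledger: DebitLedgerEntry[] = []
  readonly cache = new Map<string, number>() // plays the role of Redis

  seed(userId: string, flux: number): void {
    this.balances.set(userId, flux)
  }

  ledgerSize(): number {
    return this.ledger.length
  }

  debit(userId: string, amount: number, requestId: string | null): DebitLedgerEntry {
    // Step 1: idempotent replay when (userId, requestId) was already charged.
    if (requestId !== null) {
      const existing = this.ledger.find(e => e.userId === userId && e.requestId === requestId)
      if (existing)
        return existing
    }
    // Steps 2 and 3: read the (conceptually locked) balance and check it.
    const before = this.balances.get(userId) ?? 0
    if (before < amount)
      throw new InsufficientFluxError()
    // Steps 4 and 5: update the balance and append the ledger row together.
    const entry: DebitLedgerEntry = { userId, requestId, amount, balanceBefore: before, balanceAfter: before - amount }
    this.balances.set(userId, entry.balanceAfter)
    this.ledger.push(entry)
    // Step 6: refresh the cache after "commit"; a failure here must not fail the debit.
    try {
      this.cache.set(userId, entry.balanceAfter)
    }
    catch (error) {
      console.warn('balance cache refresh failed', error)
    }
    return entry
  }
}
```

Calling `debit('u1', 4, 'req-1')` twice on a balance of 10 leaves the balance at 6 with a single ledger row, which is the replay behaviour step 1 describes.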

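credit Path

creditFlux() / creditFluxFromStripeCheckout() / creditFluxFromInvoice() all run synchronously inside a transaction:

  • Claim the row (Stripe path) / idempotency lookup in flux_transaction (admin path)
  • Lock the user_flux row → add the amount → update
  • Insert the flux_transaction row
  • After the transaction commits, redis.set refreshes the cache

The Stripe path relies on the stripe_checkout_session.fluxCredited / stripe_invoice.fluxCredited flags for per-object idempotency; the admin path relies on the (userId, requestId) unique index.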
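The per-object idempotency of the Stripe path can be illustrated with a small in-memory model: the credit is applied only by the call that flips fluxCredited from false to true. The class, the `fluxAmount` field, and the method names below are hypothetical; in the real service the claim and the balance update would share one DB transaction.

```typescript
// Hypothetical row shape; only fluxCredited is named in the document.
interface CheckoutSessionRow {
  id: string
  userId: string
  fluxAmount: number // hypothetical column: Flux to grant for this session
  fluxCredited: boolean
}

class InMemoryCheckoutCredit {
  private sessions = new Map<string, CheckoutSessionRow>()
  private balances = new Map<string, number>()

  addSession(row: CheckoutSessionRow): void {
    this.sessions.set(row.id, { ...row })
  }

  balanceOf(userId: string): number {
    return this.balances.get(userId) ?? 0
  }

  // Returns true when this call applied the credit, false when the session
  // had already been credited (for example on a repeated webhook delivery).
  creditFromCheckout(sessionId: string): boolean {
    const session = this.sessions.get(sessionId)
    if (!session)
      throw new Error(`unknown checkout session: ${sessionId}`)
    if (session.fluxCredited)
      return false
    // Claim the row first, then add the amount to the balance.
    session.fluxCredited = true
    this.balances.set(session.userId, this.balanceOf(session.userId) + session.fluxAmount)
    return true
  }
}
```

A second call for the same session returns false and leaves the balance unchanged, so duplicate deliveries cannot double-credit.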

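LLM Request Log

The OpenAI route (routes/openai/v1/index.ts) calls requestLogService.logRequest(...) after consumeFluxForLLM completes, writing llm_request_log synchronously. A failure is logged as a warning and does not block the response already returned to the user (a streaming response has already been sent and the error cannot be recovered; in the non-streaming case the debit has already been applied, so a lost request log is only an observability loss).

llm_request_log has no foreign keys and no secondary indexes; it is append-only, so the write cost is negligible.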
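The "log, but never fail the response" behaviour can be sketched as a small wrapper. The entry fields and the `logRequestBestEffort` helper are assumptions for illustration; the real route calls requestLogService.logRequest(...) directly and its signature may differ.

```typescript
// Hypothetical entry shape loosely based on the llm_request_log description.
interface RequestLogEntry {
  requestId: string
  model: string
  status: number
  durationMs: number
  fluxConsumed: number
}

// Writes the log entry and swallows failures so an already-sent response is
// unaffected. Returns true when the row was written, false when it failed.
async function logRequestBestEffort(
  logRequest: (entry: RequestLogEntry) => Promise<void>,
  entry: RequestLogEntry,
  warn: (message: string, error: unknown) => void = console.warn,
): Promise<boolean> {
  try {
    await logRequest(entry)
    return true
  }
  catch (error) {
    warn(`llm_request_log write failed for request ${entry.requestId}`, error)
    return false
  }
}
```

The boolean return value is only there to make the outcome observable in tests; a caller in the request path can ignore it.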

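Process Roles

There is only one role, api (src/bin/run.ts), and there are no long-running background loops or fire-and-forget async tasks. All write paths (including admin flux grant) complete within the request thread; multi-instance safety relies on the (userId, requestId) idempotency index. See workers-and-runtime.md.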

Stripe Pricing

Flux top-up pricing is managed entirely through Stripe Product/Price. See stripe-pricing.md.

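Sub-Flux Metering (Debt Ledger)

Services priced below 1 Flux per unit, such as TTS characters and STT seconds, accumulate fractional amounts through FluxMeter and debit only when a threshold is crossed, so short requests are not rounded up to 1 Flux. See flux-meter.md.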
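As a rough illustration of threshold-based metering, the sketch below accumulates usage in integer millionths of a Flux and reports whole Flux to debit only once the running total crosses 1 Flux. Everything here is an assumption: the `SubFluxMeter` class, the micro-Flux unit, and the 1 Flux threshold are hypothetical, and the real FluxMeter described in flux-meter.md may work differently.

```typescript
// Hypothetical meter: tracks the unbilled remainder per user in micro-Flux
// (1 Flux = 1000000 micro-Flux) to avoid floating-point drift.
const MICRO_PER_FLUX = 1000000

class SubFluxMeter {
  private pendingMicro = new Map<string, number>()

  // Records `units` of usage at `unitPriceMicro` micro-Flux per unit and
  // returns the whole Flux that should be debited now (0 while below 1 Flux).
  record(userId: string, units: number, unitPriceMicro: number): number {
    const total = (this.pendingMicro.get(userId) ?? 0) + units * unitPriceMicro
    const wholeFlux = Math.floor(total / MICRO_PER_FLUX)
    // Keep only the fractional remainder for the next request.
    this.pendingMicro.set(userId, total - wholeFlux * MICRO_PER_FLUX)
    return wholeFlux
  }

  pending(userId: string): number {
    return this.pendingMicro.get(userId) ?? 0
  }
}
```

With a hypothetical price of 2000 micro-Flux per TTS character, a 300-character request accrues 0.6 Flux and debits nothing; a second 300-character request brings the total to 1.2 Flux, debits 1 Flux, and carries 0.2 Flux forward.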

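Key Services

BillingService (services/domain/billing/billing-service.ts)

The single entry point for all balance writes:

  • consumeFluxForLLM(): debit wrapper for LLM requests; inside the transaction it runs lock → check → update → insert ledger, then refreshes the Redis cache after commit; supports idempotent replay when a requestId is supplied
  • creditFlux(): general-purpose credit (admin promo / regular credit), idempotent
  • creditFluxFromStripeCheckout(): credit for Stripe one-time payments, idempotent per session
  • creditFluxFromInvoice(): credit for Stripe subscription invoices, idempotent per invoice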

FluxService (services/domain/flux.ts)

Handles read operations only:

  • getFlux(): Redis cache-aside read (miss → DB → populate Redis); new users are initialized automatically
  • updateStripeCustomerId()
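The cache-aside read in getFlux() follows a familiar pattern, sketched here with Maps standing in for Redis and Postgres. The `CachedFluxReader` class, the synchronous signature, and the initial grant amount are assumptions for illustration; the real method is presumably async and talks to actual stores.

```typescript
// In-memory model of a cache-aside balance read.
class CachedFluxReader {
  dbReads = 0
  private db: Map<string, number>
  private cache: Map<string, number>
  private initialFlux: number

  constructor(db: Map<string, number>, cache: Map<string, number>, initialFlux: number) {
    this.db = db
    this.cache = cache
    this.initialFlux = initialFlux // hypothetical starting balance for new users
  }

  getFlux(userId: string): number {
    // Cache hit: return without touching the DB.
    const cached = this.cache.get(userId)
    if (cached !== undefined)
      return cached
    // Cache miss: fall through to the DB, initializing new users on first read.
    this.dbReads += 1
    let flux = this.db.get(userId)
    if (flux === undefined) {
      flux = this.initialFlux
      this.db.set(userId, flux)
    }
    // Populate the cache so the next read is served from it.
    this.cache.set(userId, flux)
    return flux
  }
}
```

Dropping the cache entry only costs one extra DB read, which is why losing the cache has no effect on correctness.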

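Redis Responsibilities

Redis is not the source of truth for balances. It is used only for:

  • getFlux() read cache (loss has no impact)
  • Config KV
  • WebSocket broadcast

Redis Streams are no longer used in the billing chain.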

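Implementation Status

Phase | Status | Key points
1. DB-first ledger | | flux_transaction table, atomic debit via SELECT FOR UPDATE, Redis demoted to a cache
2. Synchronous in-transaction ledger writes | | debit / credit update the balance and write the ledger in a single transaction; no stream consumer remains
3. Stripe idempotency | | checkout + invoice idempotency checks inside the transaction
4. LLM billing improvements | ⚠️ | requestId and in-transaction DB debit are in place; tiktoken fallback still to be added
5. Single-process deployment | | only the api role remains; admin flux grant runs synchronously in the POST request thread, with no background loop
6. Idempotency guard | | flux_transaction partial unique index on (userId, requestId) + in-transaction replay hit check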

Removed

  • flux-write-back.ts: periodic write-back compensation mechanism
  • FluxService.consumeFlux() / addFlux(): writes are consolidated in BillingService
  • llm_request_log.settled: had no consumers
  • outbox_events table and the outbox-dispatcher process
  • cache-sync-consumer process role
  • Redis Stream billing-events + worker role + billing-consumer-handler: all async side effects were pulled back into synchronous in-transaction execution; the window where "the transaction committed but XADD failed → ledger row lost" no longer exists
  • Related env vars: BILLING_EVENTS_STREAM / BILLING_EVENTS_CONSUMER_NAME / BILLING_EVENTS_BATCH_SIZE / BILLING_EVENTS_BLOCK_MS / BILLING_EVENTS_MIN_IDLE_MS

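Remaining TODO

LLM Billing Accuracy

  • tiktoken fallback: when the gateway does not return usage, compute token counts with tiktoken from the request messages + response body
  • Eliminate silent failures: for non-streaming requests, a failed debit throws and blocks the response; for streaming requests, what has been sent cannot be recalled, so log at error level and record the requestId for tracing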

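Explicitly Out of Scope

  • No Kafka / RabbitMQ
  • No split into multiple independent repos
  • No pre-debit mode (LLM response token counts cannot be estimated accurately)
  • No separate worker process just for "async side effects"; doing the work synchronously inside the transaction is enough. If a genuinely blocking, long-running side effect appears later, it will be evaluated on its own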