mirror of
https://github.com/badlogic/pi-mono.git
synced 2026-05-31 05:14:19 +00:00
Fireworks prompt caching is enabled by default (automatic prefix matching), but on serverless infrastructure, requests hit random replicas. Without session affinity, the per-replica cache misses, negating cache hit rates and the discounted cacheRead pricing. Changes: - Add sendSessionAffinityHeaders and supportsCacheControlOnTools to AnthropicMessagesCompat interface - Send x-session-affinity header for Fireworks (and Cloudflare AI Gateway Anthropic) when sessionId is available and caching is enabled - Omit cache_control on tool definitions for Fireworks (unsupported per https://docs.fireworks.ai/tools-sdks/anthropic-compatibility) - Default supportsEagerToolInputStreaming to false for Fireworks (unsupported field) - Default supportsLongCacheRetention to false for Fireworks (cache_control.ttl not supported) - Add compat settings to Fireworks models in generate-models.ts - Update generated models with Fireworks compat settings - Add integration tests for session affinity and tool compat Refs: https://docs.fireworks.ai/guides/prompt-caching Refs: https://docs.fireworks.ai/tools-sdks/anthropic-compatibility |
||
|---|---|---|
| .. | ||
| generate-image-models.ts | ||
| generate-models.ts | ||
| generate-test-image.ts | ||