diff --git a/docs/developers/channel-plugins.md b/docs/developers/channel-plugins.md
index 7837532b6..4dbcdadaf 100644
--- a/docs/developers/channel-plugins.md
+++ b/docs/developers/channel-plugins.md
@@ -68,23 +68,61 @@ export class MyChannel extends ChannelBase {

 The normalized message object you build from platform data. The boolean flags drive gate logic, so they must be accurate.

-| Field | Type | Required | Notes |
-| ---------------- | ------- | -------- | -------------------------------------------------------------------------- |
-| `channelName` | string | Yes | Use `this.name` |
-| `senderId` | string | Yes | Must be stable across messages (used for session routing + access control) |
-| `senderName` | string | Yes | Display name |
-| `chatId` | string | Yes | Must distinguish DMs from groups |
-| `text` | string | Yes | Strip bot @mentions |
-| `threadId` | string | No | For `sessionScope: "thread"` |
-| `messageId` | string | No | Platform message ID — useful for response correlation |
-| `isGroup` | boolean | Yes | GroupGate relies on this |
-| `isMentioned` | boolean | Yes | GroupGate relies on this |
-| `isReplyToBot` | boolean | Yes | GroupGate relies on this |
-| `referencedText` | string | No | Quoted message — prepended as context |
-| `imageBase64` | string | No | Base64-encoded image for multimodal models |
-| `imageMimeType` | string | No | e.g., `image/jpeg` |
+| Field | Type | Required | Notes |
+| ---------------- | ------------ | -------- | -------------------------------------------------------------------------- |
+| `channelName` | string | Yes | Use `this.name` |
+| `senderId` | string | Yes | Must be stable across messages (used for session routing + access control) |
+| `senderName` | string | Yes | Display name |
+| `chatId` | string | Yes | Must distinguish DMs from groups |
+| `text` | string | Yes | Strip bot @mentions |
+| `threadId` | string | No | For `sessionScope: "thread"` |
+| `messageId` | string | No | Platform message ID — useful for response correlation |
+| `isGroup` | boolean | Yes | GroupGate relies on this |
+| `isMentioned` | boolean | Yes | GroupGate relies on this |
+| `isReplyToBot` | boolean | Yes | GroupGate relies on this |
+| `referencedText` | string | No | Quoted message — prepended as context |
+| `imageBase64` | string | No | Base64-encoded image (legacy — prefer `attachments`) |
+| `imageMimeType` | string | No | e.g., `image/jpeg` (legacy — prefer `attachments`) |
+| `attachments` | Attachment[] | No | Structured media attachments (see below) |

-For **files**: download from your platform, save to a temp directory, include the file path in `text`.
+### Attachments
+
+Use the `attachments` array for images, files, audio, and video. `handleInbound()` resolves them automatically: images with base64 `data` are sent to the model as vision input, files with a `filePath` get their path appended to the prompt so the agent can read them.
+
+```typescript
+interface Attachment {
+  type: 'image' | 'file' | 'audio' | 'video';
+  data?: string; // base64-encoded data (images, small files)
+  filePath?: string; // absolute path to local file (large files saved to disk)
+  mimeType: string; // e.g. 'application/pdf', 'image/jpeg'
+  fileName?: string; // original file name from the platform
+}
+```
+
+Example — handling a file upload in your adapter:
+
+```typescript
+import { writeFileSync, mkdirSync, existsSync } from 'node:fs';
+import { join } from 'node:path';
+import { tmpdir } from 'node:os';
+
+const buf = await downloadFromPlatform(fileId);
+const dir = join(tmpdir(), 'channel-files');
+if (!existsSync(dir)) mkdirSync(dir, { recursive: true });
+const filePath = join(dir, fileName);
+writeFileSync(filePath, buf);
+
+envelope.attachments = [
+  {
+    type: 'file',
+    filePath,
+    mimeType: 'application/pdf',
+    fileName,
+  },
+];
+```
+
+The legacy `imageBase64`/`imageMimeType` fields still work for backwards compatibility but `attachments` is preferred for new code.
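The upload example above joins the platform-supplied `fileName` straight into the temp path, so a name such as `../../x` could resolve outside `channel-files`. Below is a minimal sketch of one way to guard against that. It assumes two hypothetical helpers, `safeFileName` and `tempPathFor`, which are not part of the channel base package, and the fallback name `upload` is chosen purely for illustration.

```typescript
import { basename, join } from 'node:path';
import { tmpdir } from 'node:os';

// Hypothetical helper, not part of the channel base package.
// Reduces a platform-supplied name to a single safe path segment.
function safeFileName(name: string): string {
  // Treat Windows separators like POSIX ones, then keep only the last segment.
  const base = basename(name.replace(/\\/g, '/'));
  // Replace control characters and characters reserved on common filesystems.
  const cleaned = base.replace(/[\u0000-\u001f\/:*?"<>|]/g, '_');
  // Assumption: fall back to a fixed name when nothing usable remains.
  return cleaned === '' || cleaned === '.' || cleaned === '..' ? 'upload' : cleaned;
}

// Hypothetical helper: the on-disk location for a downloaded file,
// using the same 'channel-files' temp directory as the example above.
function tempPathFor(name: string): string {
  return join(tmpdir(), 'channel-files', safeFileName(name));
}
```

In the adapter example, `join(dir, fileName)` would then become `tempPathFor(fileName)`, while the original `fileName` can still be passed through on the attachment for display.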

 ## Extension Manifest

@@ -126,7 +164,11 @@ override async handleInbound(envelope: Envelope): Promise {

 **Tool call hooks** — override `onToolCall()` to display agent activity (e.g., "Running shell command...").

-**Media** — download from your platform, set `imageBase64`/`imageMimeType` on the Envelope before calling `handleInbound()`.
+**Streaming hooks** — override `onResponseChunk(chatId, chunk, sessionId)` for per-chunk progressive display (e.g., editing a message in-place). Override `onResponseComplete(chatId, fullText, sessionId)` to customize final delivery.
+
+**Block streaming** — set `blockStreaming: "on"` in the channel config. The base class automatically splits responses into multiple messages at paragraph boundaries. No plugin code needed — it works alongside `onResponseChunk`.
+
+**Media** — populate `envelope.attachments` with images/files. See [Attachments](#attachments) above.

 ## Reference Implementations

diff --git a/docs/users/features/channels/overview.md b/docs/users/features/channels/overview.md
index 80bbe6f6a..471b63f0a 100644
--- a/docs/users/features/channels/overview.md
+++ b/docs/users/features/channels/overview.md
@@ -47,20 +47,23 @@ Channels are configured under the `channels` key in `settings.json`. Each channe

 ### Options

-| Option | Required | Description |
-| -------------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
-| `type` | Yes | Channel type: `telegram`, `weixin`, `dingtalk`, or a custom type from an extension (see [Plugins](./plugins)) |
-| `token` | Telegram | Bot token. Supports `$ENV_VAR` syntax to read from environment variables. Not needed for WeChat or DingTalk |
-| `clientId` | DingTalk | DingTalk AppKey. Supports `$ENV_VAR` syntax |
-| `clientSecret` | DingTalk | DingTalk AppSecret. Supports `$ENV_VAR` syntax |
-| `model` | No | Model to use for this channel (e.g., `qwen3.5-plus`). Overrides the default model. Useful for multimodal models that support image input |
-| `senderPolicy` | No | Who can talk to the bot: `allowlist` (default), `open`, or `pairing` |
-| `allowedUsers` | No | List of user IDs allowed to use the bot (used by `allowlist` and `pairing` policies) |
-| `sessionScope` | No | How sessions are scoped: `user` (default), `thread`, or `single` |
-| `cwd` | No | Working directory for the agent. Defaults to the current directory |
-| `instructions` | No | Custom instructions prepended to the first message of each session |
-| `groupPolicy` | No | Group chat access: `disabled` (default), `allowlist`, or `open`. See [Group Chats](#group-chats) |
-| `groups` | No | Per-group settings. Keys are group chat IDs or `"*"` for defaults. See [Group Chats](#group-chats) |
+| Option | Required | Description |
+| ------------------------ | -------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
+| `type` | Yes | Channel type: `telegram`, `weixin`, `dingtalk`, or a custom type from an extension (see [Plugins](./plugins)) |
+| `token` | Telegram | Bot token. Supports `$ENV_VAR` syntax to read from environment variables. Not needed for WeChat or DingTalk |
+| `clientId` | DingTalk | DingTalk AppKey. Supports `$ENV_VAR` syntax |
+| `clientSecret` | DingTalk | DingTalk AppSecret. Supports `$ENV_VAR` syntax |
+| `model` | No | Model to use for this channel (e.g., `qwen3.5-plus`). Overrides the default model. Useful for multimodal models that support image input |
+| `senderPolicy` | No | Who can talk to the bot: `allowlist` (default), `open`, or `pairing` |
+| `allowedUsers` | No | List of user IDs allowed to use the bot (used by `allowlist` and `pairing` policies) |
+| `sessionScope` | No | How sessions are scoped: `user` (default), `thread`, or `single` |
+| `cwd` | No | Working directory for the agent. Defaults to the current directory |
+| `instructions` | No | Custom instructions prepended to the first message of each session |
+| `groupPolicy` | No | Group chat access: `disabled` (default), `allowlist`, or `open`. See [Group Chats](#group-chats) |
+| `groups` | No | Per-group settings. Keys are group chat IDs or `"*"` for defaults. See [Group Chats](#group-chats) |
+| `blockStreaming` | No | Progressive response delivery: `on` or `off` (default). See [Block Streaming](#block-streaming) |
+| `blockStreamingChunk` | No | Chunk size bounds: `{ "minChars": 400, "maxChars": 1000 }`. See [Block Streaming](#block-streaming) |
+| `blockStreamingCoalesce` | No | Idle flush: `{ "idleMs": 1500 }`. See [Block Streaming](#block-streaming) |

 ### Sender Policy

@@ -219,6 +222,34 @@ Files work with any model — no multimodal support required.
 | Files | Direct download via Bot API (20MB limit) | CDN download with AES decryption | downloadCode API (two-step) |
 | Captions | Photo/file captions included as message text | Not applicable | Rich text: mixed text + images in one message |

+## Block Streaming
+
+By default, the agent works for a while and then sends one large response. With block streaming enabled, the response arrives as multiple shorter messages while the agent is still working — similar to how ChatGPT or Claude show progressive output.
+
+```json
+{
+  "channels": {
+    "my-channel": {
+      "type": "telegram",
+      "blockStreaming": "on",
+      "blockStreamingChunk": { "minChars": 400, "maxChars": 1000 },
+      "blockStreamingCoalesce": { "idleMs": 1500 },
+      ...
+    }
+  }
+}
+```
+
+### How it works
+
+- The agent's response is split into blocks at paragraph boundaries and sent as separate messages
+- `minChars` (default 400) — don't send a block until it's at least this long, to avoid spamming tiny messages
+- `maxChars` (default 1000) — if a block gets this long without a natural break, send it anyway
+- `idleMs` (default 1500) — if the agent pauses (e.g., running a tool), send what's buffered so far
+- When the agent finishes, any remaining text is sent immediately
+
+Only `blockStreaming` is required. The chunk and coalesce settings are optional and have sensible defaults.
+
 ## Slash Commands

 Channels support slash commands. These are handled locally (no agent round-trip):
diff --git a/packages/channels/base/README.md b/packages/channels/base/README.md
index 65a46043a..ef89ca1f4 100644
--- a/packages/channels/base/README.md
+++ b/packages/channels/base/README.md
@@ -59,15 +59,16 @@ For a complete working example, see [`@qwen-code/channel-plugin-example`](../plu

 ```
 Inbound: Platform message
-  → Envelope
+  → Envelope (with attachments)
   → GroupGate (group policy + mention gating)
   → SenderGate (allowlist / pairing / open)
   → Slash commands (/clear, /help, /status)
   → SessionRouter (resolve or create ACP session)
+  → Resolve attachments (images → bridge, files → prompt text)
   → AcpBridge.prompt() → agent

 Outbound: Agent response
-  → ChannelBase
+  → BlockStreamer (if enabled: split into blocks at paragraph boundaries)
   → sendMessage() → platform
 ```

@@ -81,6 +82,7 @@ Everything between `handleInbound()` and `sendMessage()` is handled by the base
 | --------------- | ---------------------------------------------------------------- |
 | `ChannelBase` | Abstract base class — extend this to build a channel adapter |
 | `AcpBridge` | Spawns and communicates with the `qwen-code --acp` agent process |
+| `BlockStreamer` | Progressive multi-message delivery for block streaming |
 | `SessionRouter` | Maps senders to ACP sessions with configurable scoping |
 | `SenderGate` | DM access control (allowlist / pairing / open) |
 | `GroupGate` | Group chat policy and @mention gating |
@@ -90,6 +92,7 @@ Everything between `handleInbound()` and `sendMessage()` is handled by the base

 | Type | Description |
 | --------------- | ---------------------------------------------- |
+| `Attachment` | Structured file/image/audio/video attachment |
 | `ChannelConfig` | Channel configuration from `settings.json` |
 | `ChannelPlugin` | Plugin factory interface (what you export) |
 | `Envelope` | Normalized inbound message format |
@@ -117,12 +120,16 @@ constructor(name: string, config: ChannelConfig, bridge: AcpBridge, options?: Ch

 **Provided methods:**

-| Method | Description |
-| -------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
-| `handleInbound(envelope)` | Route an inbound message through the full pipeline (gate checks, commands, session, prompt). Call this from your message handler. |
-| `setBridge(bridge)` | Replace the ACP bridge after crash recovery |
-| `registerCommand(name, handler)` | Register a custom slash command (e.g. `/mycommand`) |
-| `onToolCall(chatId, event)` | Hook called on agent tool invocations — override to show indicators |
+| Method | Description |
+| ------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
+| `handleInbound(envelope)` | Route an inbound message through the full pipeline (gate checks, commands, session, prompt). Call this from your message handler. |
+| `setBridge(bridge)` | Replace the ACP bridge after crash recovery |
+| `registerCommand(name, handler)` | Register a custom slash command (e.g. `/mycommand`) |
+| `onToolCall(chatId, event)` | Hook called on agent tool invocations — override to show indicators |
+| `onResponseChunk(chatId, chunk, sessionId)` | Hook called per streaming text chunk — override for progressive display (default: no-op) |
+| `onResponseComplete(chatId, fullText, sessionId)` | Hook called when full response is ready — override to customize delivery (default: `sendMessage()`) |
+
+**Block streaming:** When `blockStreaming: "on"` is set in the channel config, the base class automatically splits the agent's streaming response into multiple messages at paragraph boundaries. See [Block Streaming](#block-streaming) below.

 **Built-in slash commands:** `/clear` (`/reset`, `/new`), `/help`, `/status`

@@ -244,11 +251,44 @@ interface Envelope {
   isMentioned: boolean; // true if bot was @mentioned
   isReplyToBot: boolean; // true if replying to bot's message
   referencedText?: string; // quoted message text
-  imageBase64?: string; // base64-encoded image
-  imageMimeType?: string; // e.g. 'image/jpeg'
+  imageBase64?: string; // base64-encoded image (legacy — prefer attachments)
+  imageMimeType?: string; // e.g. 'image/jpeg' (legacy — prefer attachments)
+  attachments?: Attachment[]; // structured file/image/audio/video attachments
+}
+
+interface Attachment {
+  type: 'image' | 'file' | 'audio' | 'video';
+  data?: string; // base64-encoded data (images, small files)
+  filePath?: string; // absolute path to local file (large files)
+  mimeType: string; // e.g. 'application/pdf', 'image/jpeg'
+  fileName?: string; // original file name from the platform
 }
 ```

+`handleInbound()` automatically resolves attachments: images with `data` are sent to the model as vision input, files with `filePath` get their path appended to the prompt text so the agent can read them with its tools.
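As a rough illustration of the resolution step just described, the sketch below separates image attachments (kept as base64 payloads for vision input) from file attachments (whose paths are appended to the prompt text). It is not the actual `handleInbound()` code: the `resolveAttachments` function, the `ResolvedPrompt` shape, and the `[Attached ...]` wording are all assumptions made for this example.

```typescript
interface Attachment {
  type: 'image' | 'file' | 'audio' | 'video';
  data?: string;
  filePath?: string;
  mimeType: string;
  fileName?: string;
}

// Hypothetical result shape, invented for this sketch.
interface ResolvedPrompt {
  text: string;
  images: { data: string; mimeType: string }[];
}

// Hypothetical function; the real pipeline and its prompt wording may differ.
function resolveAttachments(text: string, attachments: Attachment[] = []): ResolvedPrompt {
  const images: ResolvedPrompt['images'] = [];
  const notes: string[] = [];

  for (const att of attachments) {
    if (att.type === 'image' && att.data) {
      // Images carrying base64 data become vision input.
      images.push({ data: att.data, mimeType: att.mimeType });
    } else if (att.filePath) {
      // Anything saved to disk is referenced by path so the agent can open it.
      const label = att.fileName ?? att.type;
      notes.push(`[Attached ${label}: ${att.filePath}]`);
    }
  }

  // Leave the text untouched when there is nothing to append.
  const fullText = notes.length > 0 ? [text, ...notes].join('\n') : text;
  return { text: fullText, images };
}
```

One consequence of this shape is that an attachment with neither `data` nor `filePath` is silently skipped; an adapter may prefer to log or reject such entries instead.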
+
+## Block Streaming
+
+When `blockStreaming: "on"` is set in a channel's config, the agent's response is delivered as multiple separate messages instead of one large wall of text. The `BlockStreamer` accumulates streaming chunks and emits completed blocks based on paragraph boundaries and size heuristics.
+
+**Config fields** (on `ChannelConfig`):
+
+| Field | Type | Default | Description |
+| ------------------------ | ------------------------ | --------------- | --------------------------------------------------------------------------- |
+| `blockStreaming` | `'on' \| 'off'` | `'off'` | Enable/disable block streaming |
+| `blockStreamingChunk` | `{ minChars, maxChars }` | `{ 400, 1000 }` | `minChars`: don't emit until this size. `maxChars`: force-emit at this size |
+| `blockStreamingCoalesce` | `{ idleMs }` | `{ 1500 }` | Emit buffered text after this many ms of silence from the agent |
+
+**How it works:**
+
+1. Text accumulates as the agent streams its response
+2. When the buffer reaches `minChars` and hits a paragraph break (`\n\n`), that block is sent as a separate message
+3. If the buffer reaches `maxChars` without a paragraph break, it force-splits at the best break point (newline > space)
+4. If the agent goes quiet for `idleMs`, the buffer is flushed (as long as it's past `minChars`)
+5. When the agent finishes, any remaining text is sent immediately regardless of `minChars`
+
+Block streaming and `onResponseChunk` work independently — plugins can override `onResponseChunk` for their own purposes while block streaming handles delivery.
+
 ## Further reading

 - [Channel Plugin Developer Guide](../../docs/developers/channel-plugins.md)
diff --git a/packages/channels/plugin-example/README.md b/packages/channels/plugin-example/README.md
index a814fdd54..a17461080 100644
--- a/packages/channels/plugin-example/README.md
+++ b/packages/channels/plugin-example/README.md
@@ -94,4 +94,11 @@ See `src/MockPluginChannel.ts` for a working example. The key points:
 3. Export a `plugin` object conforming to `ChannelPlugin`
 4. Add a `qwen-extension.json` manifest

+### Features you get for free
+
+- **Block streaming** — enable `blockStreaming: "on"` in config and the agent's response is automatically split into multiple messages at paragraph boundaries
+- **Attachments** — populate `envelope.attachments` with images/files and `handleInbound()` routes them to the agent (images as vision input, files as paths in the prompt)
+- **Streaming hooks** — override `onResponseChunk()` for progressive display (e.g., editing a message in-place)
+- Access control (allowlist, pairing, open), session routing, slash commands, crash recovery
+
 Full guide: [Channel Plugin Developer Guide](../../docs/developers/channel-plugins.md)
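The block-streaming rules listed in the base README (emit at a paragraph break once `minChars` is reached, force-split at `maxChars`, flush on idle, flush everything at the end) can be sketched roughly as follows. This is an illustrative stand-in only: the class `SketchBlockStreamer` and its `push`, `flushIdle`, and `finish` methods are invented for this example, the real `BlockStreamer` API and split heuristics may differ, and the `idleMs` timer is assumed to be owned by the caller.

```typescript
interface ChunkBounds {
  minChars: number;
  maxChars: number;
}

// Hypothetical stand-in for illustration; not the real BlockStreamer.
class SketchBlockStreamer {
  private buffer = '';
  private readonly bounds: ChunkBounds;

  constructor(bounds: ChunkBounds = { minChars: 400, maxChars: 1000 }) {
    if (bounds.minChars < 0 || bounds.maxChars < 1 || bounds.minChars > bounds.maxChars) {
      throw new RangeError('expected 0 <= minChars <= maxChars and maxChars >= 1');
    }
    this.bounds = bounds;
  }

  // Append a streamed chunk and return any blocks that are ready to send.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const blocks: string[] = [];
    let cut = this.nextCut();
    while (cut !== -1) {
      const block = this.buffer.slice(0, cut).trim();
      this.buffer = this.buffer.slice(cut).replace(/^\s+/, '');
      if (block) blocks.push(block);
      cut = this.nextCut();
    }
    return blocks;
  }

  // Called by the caller's idle timer: flush only once minChars is reached.
  flushIdle(): string[] {
    if (this.buffer.trim().length < this.bounds.minChars) return [];
    return this.finish();
  }

  // Called when the agent is done: flush whatever remains.
  finish(): string[] {
    const rest = this.buffer.trim();
    this.buffer = '';
    return rest ? [rest] : [];
  }

  // Index to cut the buffer at, or -1 when no block is ready yet.
  private nextCut(): number {
    const { minChars, maxChars } = this.bounds;
    // Prefer a paragraph break at or after minChars, within maxChars.
    const para = this.buffer.indexOf('\n\n', minChars);
    if (para !== -1 && para <= maxChars) return para;
    if (this.buffer.length < maxChars) return -1;
    // Force split: newline first, then space, then a hard cut.
    const head = this.buffer.slice(0, maxChars);
    const newline = head.lastIndexOf('\n');
    if (newline > 0) return newline;
    const space = head.lastIndexOf(' ');
    if (space > 0) return space;
    return maxChars;
  }
}
```

A caller would feed each streamed chunk to `push()`, send every returned block as its own message, reset a timer of `idleMs` after each chunk that calls `flushIdle()`, and call `finish()` once the agent completes.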