chat-relay/docs/server-architecture.md
BinaryBeastMaster 25abcd2974 Initial commit
2025-05-10 15:09:23 -07:00

Server Architecture Overview

This document outlines the architecture and operation of the WebSocket relay server in api-relay-server/src/server.ts. The server acts as an intermediary between an HTTP-based API and browser extensions: it accepts HTTP requests, relays them to the extension as JSON messages over WebSocket, routes the extension's responses back to the waiting HTTP client, and manages request flow so that the browser extension is not overloaded.


🌐 Server Layers

graph TD
    subgraph HTTP Layer
        A[Express App] --> B{"/v1/chat/completions"}
    end

    subgraph Request Handling Logic
        B -- Request --> C{Extension Busy?}
        C -- No --> D["processRequest()"]
        C -- Yes --> E{Behavior: Drop or Queue?}
        E -- Drop --> F[Respond 429]
        E -- Queue --> G[requestQueue]
        G -- Dequeue --> D
    end

    subgraph WebSocket Communication
        D -- SEND_CHAT_MESSAGE --> H["activeConnections[0]"]
        H -- CHAT_RESPONSE_* --> I[WebSocket Server]
        I -- Resolve/Reject --> D
    end

    D --> J[pendingRequests Map]
    D --> K["finishProcessingRequest()"]

    subgraph Admin UI & Config
        L[Express App] --> M["/admin & /v1/admin/*"]
        M <--> N[server-config.json]
    end

📁 Core File

  • server.ts: Main file in which the entire Express server, the WebSocket infrastructure, the request queue, and the admin interface logic are defined.

🧩 Components

1. Express HTTP API

  • /v1/chat/completions: Accepts OpenAI-compatible requests.
    • Checks whether the browser extension is busy (via activeExtensionProcessingId).
    • Based on newRequestBehavior setting ('queue' or 'drop'):
      • Queue: Adds incoming request to requestQueue if extension is busy. The HTTP response is deferred.
      • Drop: Responds with 429 Too Many Requests if extension is busy.
    • If extension is free, directly calls processRequest().
  • /v1/admin/server-info: Provides current server status and configuration, including port, requestTimeoutMs, and newRequestBehavior.
  • /v1/admin/update-settings: Allows updating port, requestTimeoutMs, and newRequestBehavior. Changes are saved to server-config.json.
  • /v1/admin/message-history: Retrieves recent message logs for the admin UI.
  • /v1/admin/restart-server: Triggers a server restart.
  • /admin: Serves the admin HTML interface.
  • /health: Basic health check.
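
The routing decision made by /v1/chat/completions can be sketched as a pure function. This is a minimal illustration, not the code in server.ts: the function name decideDispatch, the Dispatch type, and the ordering of the checks (connection check first, then busy check) are assumptions made for the example. Only the 503, 429, 'queue', and 'drop' behaviors come from this document.

```typescript
type NewRequestBehavior = "queue" | "drop";

// Hypothetical result type: what the HTTP handler should do with a request.
type Dispatch =
  | { action: "process" }
  | { action: "queue" }
  | { action: "reject"; status: 429 | 503 };

// Hypothetical helper mirroring the decision described above.
function decideDispatch(
  connectedExtensions: number,
  activeExtensionProcessingId: number | null,
  newRequestBehavior: NewRequestBehavior,
): Dispatch {
  // No extension connected: nothing can serve the request.
  if (connectedExtensions === 0) {
    return { action: "reject", status: 503 };
  }
  // Extension is free: hand the request straight to processRequest().
  if (activeExtensionProcessingId === null) {
    return { action: "process" };
  }
  // Extension is busy: queue or drop, depending on configuration.
  return newRequestBehavior === "queue"
    ? { action: "queue" }
    : { action: "reject", status: 429 };
}
```

Keeping the decision separate from Express makes it easy to unit test without starting a server.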

2. WebSocket Server

  • WebSocketServer: Accepts WebSocket connections from browser extensions.
  • activeConnections: Array storing active WebSocket client connections. Currently, only the first connection (activeConnections[0]) is used for sending messages.
  • Message Handling: Receives messages from the extension (e.g., CHAT_RESPONSE, CHAT_RESPONSE_CHUNK, CHAT_RESPONSE_ERROR) and resolves or rejects promises in the pendingRequests map.
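
The message handling described above might look roughly like the sketch below. The three message type names come from this document; the payload field names (requestId, response, chunk, error), the handler name, and its return values are assumptions made for illustration. How chunks are assembled is not described here, so the sketch leaves CHAT_RESPONSE_CHUNK unhandled.

```typescript
// Assumed message shapes; only the `type` values appear in this document.
type ExtensionMessage =
  | { type: "CHAT_RESPONSE"; requestId: number; response: string }
  | { type: "CHAT_RESPONSE_CHUNK"; requestId: number; chunk: string }
  | { type: "CHAT_RESPONSE_ERROR"; requestId: number; error: string };

interface PendingHandlers {
  resolve: (text: string) => void;
  reject: (err: Error) => void;
}

const pendingRequests = new Map<number, PendingHandlers>();

// Hypothetical handler for one raw WebSocket text frame.
function handleExtensionMessage(raw: string): "resolved" | "rejected" | "ignored" {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return "ignored"; // not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) return "ignored";

  const msg = parsed as ExtensionMessage;
  if (typeof msg.requestId !== "number") return "ignored";

  const pending = pendingRequests.get(msg.requestId);
  if (!pending) return "ignored"; // unknown, timed-out, or already finished request

  // Removal from pendingRequests is left to finishProcessingRequest().
  switch (msg.type) {
    case "CHAT_RESPONSE":
      pending.resolve(msg.response);
      return "resolved";
    case "CHAT_RESPONSE_ERROR":
      pending.reject(new Error(msg.error));
      return "rejected";
    default:
      return "ignored"; // chunk handling is out of scope for this sketch
  }
}
```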

3. Queuing & Processing System

  • activeExtensionProcessingId: number | null: Tracks the requestId of the message currently being processed by the extension. If null, the extension is considered free.
  • newRequestBehavior: 'queue' | 'drop': Global variable determining how to handle new requests when the extension is busy. Loaded from server-config.json (defaults to 'queue').
  • requestQueue: QueuedRequest[]: An in-memory array holding QueuedRequest objects when newRequestBehavior is 'queue' and the extension is busy.
  • QueuedRequest Interface: Defines the structure for storing an original HTTP request (req, res) and its parameters, to be processed later.
  • async function processRequest(queuedItem: QueuedRequest):
    • Sets activeExtensionProcessingId to the current queuedItem.requestId.
    • Logs CHAT_REQUEST_PROCESSING.
    • Sends the SEND_CHAT_MESSAGE to the extension via WebSocket.
    • Manages a Promise in pendingRequests for the response, including a timeout (currentRequestTimeoutMs).
    • On response/error/timeout, formats and sends the HTTP response using the stored queuedItem.res.
    • Calls finishProcessingRequest() in a finally block.
  • function finishProcessingRequest(completedRequestId: number):
    • Clears activeExtensionProcessingId.
    • Removes the request from pendingRequests.
    • If newRequestBehavior is 'queue' and requestQueue is not empty, dequeues the next request and calls processRequest() for it.
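
The interplay between processRequest(), finishProcessingRequest(), and requestQueue can be sketched as follows. This is a simplified model under stated assumptions, not the code in server.ts: the state object groups the globals for the example, QueuedRequest carries a hypothetical respond callback in place of Express's req/res, sendToExtension is a placeholder for the WebSocket send plus the pending Promise, and the 500 status on failure is an assumption.

```typescript
type NewRequestBehavior = "queue" | "drop";

// Simplified stand-in for the real QueuedRequest (which stores req and res).
interface QueuedRequest {
  requestId: number;
  prompt: string; // hypothetical field
  respond: (status: number, body: string) => void; // stand-in for res
}

interface RelayState {
  activeExtensionProcessingId: number | null;
  newRequestBehavior: NewRequestBehavior;
  requestQueue: QueuedRequest[];
}

const state: RelayState = {
  activeExtensionProcessingId: null,
  newRequestBehavior: "queue",
  requestQueue: [],
};

// Placeholder transport: never settles until replaced with a real send.
let sendToExtension: (item: QueuedRequest) => Promise<string> =
  () => new Promise<string>(() => {});

async function processRequest(item: QueuedRequest): Promise<void> {
  // Runs synchronously up to the first await, so the busy flag is set at once.
  state.activeExtensionProcessingId = item.requestId;
  try {
    const text = await sendToExtension(item);
    item.respond(200, text);
  } catch (err) {
    // The 500 status is an assumption; this document does not name one.
    item.respond(500, err instanceof Error ? err.message : String(err));
  } finally {
    finishProcessingRequest(item.requestId);
  }
}

function finishProcessingRequest(completedRequestId: number): void {
  // The real function also removes completedRequestId from pendingRequests.
  state.activeExtensionProcessingId = null;
  if (state.newRequestBehavior === "queue" && state.requestQueue.length > 0) {
    const next = state.requestQueue.shift()!;
    void processRequest(next);
  }
}

// Entry point used by the HTTP handler in this sketch.
function handleIncoming(item: QueuedRequest): void {
  if (state.activeExtensionProcessingId === null) {
    void processRequest(item);
  } else if (state.newRequestBehavior === "queue") {
    state.requestQueue.push(item);
  } else {
    item.respond(429, "Too Many Requests");
  }
}
```

Because finishProcessingRequest() runs in a finally block, the next queued request starts whether the previous one succeeded, failed, or timed out.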

4. State Management

  • pendingRequests: A Map that stores Promise resolve/reject handlers, keyed by requestId. Used by processRequest to await responses from the WebSocket.
  • requestCounter: Generates unique requestIds.
  • adminMessageHistory: In-memory store for admin log entries.
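
One way to combine pendingRequests, requestCounter, and the request timeout is sketched below. The helper names (nextRequestId, awaitExtensionResponse, settlePending) and the choice to keep the timer handle in the map entry are assumptions for the example. For brevity the entry is removed at settle time here, whereas this document places that cleanup in finishProcessingRequest.

```typescript
interface PendingRequest {
  resolve: (text: string) => void;
  reject: (err: Error) => void;
  timer: ReturnType<typeof setTimeout>; // assumed: timer kept with the entry
}

const pendingRequests = new Map<number, PendingRequest>();
let requestCounter = 0;

// Hypothetical helper: monotonically increasing request ids.
function nextRequestId(): number {
  requestCounter += 1;
  return requestCounter;
}

// Registers resolve/reject handlers and rejects if nothing arrives in time.
function awaitExtensionResponse(requestId: number, timeoutMs: number): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    const timer = setTimeout(() => {
      pendingRequests.delete(requestId);
      reject(new Error(`Request ${requestId} timed out after ${timeoutMs} ms`));
    }, timeoutMs);
    pendingRequests.set(requestId, { resolve, reject, timer });
  });
}

type Outcome = { ok: true; text: string } | { ok: false; error: string };

// Settles a pending request once; returns false if it is unknown or already settled.
function settlePending(requestId: number, outcome: Outcome): boolean {
  const entry = pendingRequests.get(requestId);
  if (!entry) return false;
  clearTimeout(entry.timer);
  pendingRequests.delete(requestId);
  if (outcome.ok) {
    entry.resolve(outcome.text);
  } else {
    entry.reject(new Error(outcome.error));
  }
  return true;
}
```

Clearing the timer on settlement keeps a late timeout from rejecting a request that has already been answered.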

🔄 Lifecycle Flow (with Queuing)

sequenceDiagram
    participant User
    participant ServerAPI
    participant RequestLogic
    participant ProcessRequestFunc
    participant Extension
    participant RequestQueue

    User->>ServerAPI: POST /v1/chat/completions (req1)
    ServerAPI->>RequestLogic: Handle req1
    alt Extension is Free
        RequestLogic->>ProcessRequestFunc: processRequest(req1)
        ProcessRequestFunc->>Extension: SEND_CHAT_MESSAGE (req1)
        Note over ProcessRequestFunc,Extension: activeExtensionProcessingId = req1.id
        User->>ServerAPI: POST /v1/chat/completions (req2)
        ServerAPI->>RequestLogic: Handle req2
        RequestLogic->>RequestLogic: Extension Busy (req1.id)
        alt newRequestBehavior == 'queue'
            RequestLogic->>RequestQueue: Enqueue req2
            Note over RequestLogic: HTTP Response for req2 deferred
        else newRequestBehavior == 'drop'
            RequestLogic-->>ServerAPI: Respond 429 for req2
            ServerAPI-->>User: HTTP 429 (req2 dropped)
        end
        Extension-->>ProcessRequestFunc: CHAT_RESPONSE (req1)
        ProcessRequestFunc-->>ServerAPI: Respond HTTP OK (req1)
        ServerAPI-->>User: HTTP OK (req1)
        ProcessRequestFunc->>RequestLogic: finishProcessingRequest(req1.id)
        RequestLogic->>RequestLogic: activeExtensionProcessingId = null
        alt Queue Not Empty and Behavior is 'queue'
            RequestLogic->>RequestQueue: Dequeue req2
            RequestLogic->>ProcessRequestFunc: processRequest(req2)
            ProcessRequestFunc->>Extension: SEND_CHAT_MESSAGE (req2)
            Note over ProcessRequestFunc,Extension: activeExtensionProcessingId = req2.id
            Extension-->>ProcessRequestFunc: CHAT_RESPONSE (req2)
            ProcessRequestFunc-->>ServerAPI: Respond HTTP OK (req2 via stored res)
            ServerAPI-->>User: HTTP OK (req2)
            ProcessRequestFunc->>RequestLogic: finishProcessingRequest(req2.id)
        end
    else Extension is Busy (initial state)
        RequestLogic->>RequestLogic: Extension Busy
        alt newRequestBehavior == 'queue'
            RequestLogic->>RequestQueue: Enqueue req1
        else newRequestBehavior == 'drop'
            RequestLogic-->>ServerAPI: Respond 429 for req1
            ServerAPI-->>User: HTTP 429 (req1 dropped)
        end
    end

🛡️ Error Handling

  • If no browser extension is connected when a request arrives: Server responds with 503 Service Unavailable.
  • If no browser extension is connected when processRequest attempts to send a message (e.g., after the request has been dequeued): The request fails, and an error is sent to the original client if the response headers have not already been sent.
  • If newRequestBehavior is 'drop' and the extension is busy: Server responds with 429 Too Many Requests.
  • Request Timeout: Each request processed by processRequest has a timeout (currentRequestTimeoutMs, configurable). If the extension doesn't respond in time, the promise is rejected, and an error is sent to the client.
  • Errors from extension (CHAT_RESPONSE_ERROR): Logged, and the corresponding request promise is rejected, leading to an error response to the client.

⚙️ Configuration

The server's behavior is configured through server-config.json, which is located in the dist directory and is created and managed by server.ts. The Admin UI also allows these settings to be viewed and modified.

Key configurable options:

  • port: The port on which the server listens. Requires server restart.
  • requestTimeoutMs: Timeout in milliseconds for waiting for a response from the browser extension. Effective immediately.
  • newRequestBehavior: Determines how new requests are handled while the extension is busy. Effective immediately. Can be:
    • 'queue' (default): New requests are queued and processed sequentially.
    • 'drop': New requests are rejected with a 429 Too Many Requests error.
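
Loading these options with fallbacks could be sketched as below. The function name parseServerConfig and the validation rules are assumptions for the example, and the default port and requestTimeoutMs values are placeholders, since this document only states that newRequestBehavior defaults to 'queue'. The sketch takes the file contents as a string so it stays independent of where server-config.json lives.

```typescript
type NewRequestBehavior = "queue" | "drop";

interface ServerConfig {
  port: number;
  requestTimeoutMs: number;
  newRequestBehavior: NewRequestBehavior;
}

// Placeholder defaults; only the 'queue' default is stated in this document.
const DEFAULT_CONFIG: ServerConfig = {
  port: 3000,
  requestTimeoutMs: 120000,
  newRequestBehavior: "queue",
};

function isPositiveInteger(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value > 0;
}

// Hypothetical parser: invalid or missing fields fall back to the defaults.
function parseServerConfig(raw: string): ServerConfig {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ...DEFAULT_CONFIG };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ...DEFAULT_CONFIG };
  }
  const input = parsed as Record<string, unknown>;
  const port = input["port"];
  const timeout = input["requestTimeoutMs"];
  const behavior = input["newRequestBehavior"];
  return {
    port: isPositiveInteger(port) && port <= 65535 ? port : DEFAULT_CONFIG.port,
    requestTimeoutMs: isPositiveInteger(timeout) ? timeout : DEFAULT_CONFIG.requestTimeoutMs,
    newRequestBehavior:
      behavior === "queue" || behavior === "drop" ? behavior : DEFAULT_CONFIG.newRequestBehavior,
  };
}
```

Falling back field by field means a single bad value in the file does not discard the other settings.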

🔌 Connection Monitoring

  • The server maintains an array of activeConnections.
  • The WebSocket protocol defines ping/pong frames for keep-alive. The ws library replies to incoming pings with pongs automatically, but it does not send pings on its own, and explicit server-side ping logic is not currently implemented in server.ts.
  • Disconnected clients are removed from activeConnections.
  • Entries in pendingRequests are removed via finishProcessingRequest when a request completes, whether successfully, with an error, or by timing out.
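
If explicit server-side pings were added later, a heartbeat sweep could look like the sketch below. It is not part of server.ts. The HeartbeatSocket interface is a stand-in so the example runs without the ws package: ping() and terminate() correspond to methods on ws sockets, while isAlive is an application-level flag that the server would set back to true in a 'pong' listener. The function name sweepConnections is hypothetical.

```typescript
// Minimal stand-in for a ws socket plus an application-level liveness flag.
interface HeartbeatSocket {
  isAlive: boolean;
  ping(): void;
  terminate(): void;
}

// One heartbeat pass: drop sockets that missed the last ping, ping the rest.
// Returns the connections that should remain in activeConnections.
function sweepConnections<T extends HeartbeatSocket>(activeConnections: T[]): T[] {
  const survivors: T[] = [];
  for (const socket of activeConnections) {
    if (!socket.isAlive) {
      // No pong arrived since the previous sweep: treat the peer as gone.
      socket.terminate();
      continue;
    }
    // Mark as unconfirmed; a pong handler is expected to set this back to true.
    socket.isAlive = false;
    socket.ping();
    survivors.push(socket);
  }
  return survivors;
}
```

In a real server this would be called from a setInterval timer, with the interval chosen to be longer than the expected round-trip time.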

Summary

This architecture decouples HTTP clients from the browser extension by placing a WebSocket relay between them. The queue-or-drop mechanism ensures that the browser extension processes only one message at a time, which prevents race conditions and makes the server's behavior while the extension is busy configurable. The Admin UI provides visibility into, and control over, the key operational parameters.