Server Architecture Overview

This document describes the architecture and operation of the WebSocket relay server defined in api-relay-server/src/server.ts. The server sits between an OpenAI-compatible HTTP API and a browser extension: it relays JSON messages in both directions and controls request flow so that the extension is never handed more than one request at a time.

🌐 Server Layers

graph TD
    subgraph HTTP Layer
        A[Express App] --> B{"/v1/chat/completions"}
    end

    subgraph Request Handling Logic
        B -- Request --> C{Extension Busy?}
        C -- No --> D["processRequest()"]
        C -- Yes --> E{Behavior: Drop or Queue?}
        E -- Drop --> F[Respond 429]
        E -- Queue --> G[requestQueue]
        G -- Dequeue --> D
    end

    subgraph WebSocket Communication
        D -- SEND_CHAT_MESSAGE --> H["activeConnections[0]"]
        H -- CHAT_RESPONSE_* --> I[WebSocket Server]
        I -- Resolve/Reject --> D
    end
    
    D --> J[pendingRequests Map]
    D --> K["finishProcessingRequest()"]

    subgraph Admin UI & Config
        L[Express App] --> M["/admin & /v1/admin/*"]
        M <--> N[server-config.json]
    end

📁 Core File

server.ts: The main file, which defines the Express server, the WebSocket infrastructure, the request queue, and the admin interface logic.

🧩 Components

1. Express HTTP API

/v1/chat/completions: Accepts OpenAI-compatible requests.
- Implements logic to check if a browser extension is busy (via activeExtensionProcessingId).
- Based on newRequestBehavior setting ('queue' or 'drop'):
  - Queue: Adds incoming request to requestQueue if extension is busy. The HTTP response is deferred.
  - Drop: Responds with 429 Too Many Requests if extension is busy.
- If extension is free, directly calls processRequest().
/v1/admin/server-info: Provides current server status and configuration, including port, requestTimeoutMs, and newRequestBehavior.
/v1/admin/update-settings: Allows updating port, requestTimeoutMs, and newRequestBehavior. Changes are saved to server-config.json.
/v1/admin/message-history: Retrieves recent message logs for the admin UI.
/v1/admin/restart-server: Triggers a server restart.
/admin: Serves the admin HTML interface.
/health: Basic health check.
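
The busy check on /v1/chat/completions can be summarized as a small decision function. This is a minimal sketch rather than the code in server.ts: decideAdmission and AdmissionDecision are hypothetical names, and checking for a connected extension before the busy check is an assumed ordering.

```typescript
// Hypothetical names: decideAdmission and AdmissionDecision are illustrative
// and do not come from server.ts.
type NewRequestBehavior = "queue" | "drop";

type AdmissionDecision =
  | { action: "process" }
  | { action: "enqueue" }
  | { action: "reject"; status: 429 | 503 };

function decideAdmission(
  connectedExtensions: number,
  activeExtensionProcessingId: number | null,
  newRequestBehavior: NewRequestBehavior,
): AdmissionDecision {
  // Assumed ordering: the "no extension connected" check runs first.
  if (connectedExtensions === 0) {
    return { action: "reject", status: 503 };
  }
  // A null id means the extension is free, so the request runs right away.
  if (activeExtensionProcessingId === null) {
    return { action: "process" };
  }
  // Extension is busy: queue the request or drop it with 429.
  return newRequestBehavior === "queue"
    ? { action: "enqueue" }
    : { action: "reject", status: 429 };
}

console.log(decideAdmission(1, 42, "drop"));
```

Keeping the decision free of side effects makes the three outcomes easy to test without starting an HTTP server.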

2. WebSocket Server

WebSocketServer: Accepts WebSocket connections from browser extensions.
activeConnections: Array storing active WebSocket client connections. Currently, only the first connection (activeConnections[0]) is used for sending messages.
Message Handling: Receives messages from the extension (e.g., CHAT_RESPONSE, CHAT_RESPONSE_CHUNK, CHAT_RESPONSE_ERROR) and resolves or rejects promises in the pendingRequests map.
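
The message handling described above might look roughly like the following. It is a sketch under stated assumptions: the payload field names (response, error), the handleExtensionMessage helper, and the choice to leave CHAT_RESPONSE_CHUNK unsettled are all illustrative and are not confirmed by this document.

```typescript
// Illustrative shapes: the real message fields in server.ts may differ.
interface PendingEntry {
  resolve: (response: string) => void;
  reject: (error: Error) => void;
}

interface ExtensionMessage {
  type: string;
  requestId: number;
  response?: string; // assumed payload field for CHAT_RESPONSE
  error?: string; // assumed payload field for CHAT_RESPONSE_ERROR
}

// Returns true when the message settled a pending request.
function handleExtensionMessage(
  raw: string,
  pendingRequests: Map<number, PendingEntry>,
): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return false; // not valid JSON: ignore the frame
  }
  if (typeof parsed !== "object" || parsed === null) return false;

  const message = parsed as Partial<ExtensionMessage>;
  if (typeof message.requestId !== "number") return false;

  const entry = pendingRequests.get(message.requestId);
  if (!entry) return false; // unknown or already-settled request

  switch (message.type) {
    case "CHAT_RESPONSE":
      entry.resolve(message.response ?? "");
      return true;
    case "CHAT_RESPONSE_ERROR":
      entry.reject(new Error(message.error ?? "Extension reported an error"));
      return true;
    default:
      // In this sketch, chunks and unknown types do not settle the promise.
      return false;
  }
}
```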

3. Queuing & Processing System

activeExtensionProcessingId: number | null: Tracks the requestId of the message currently being processed by the extension. If null, the extension is considered free.
newRequestBehavior: 'queue' | 'drop': Global variable determining how to handle new requests when the extension is busy. Loaded from server-config.json (defaults to 'queue').
requestQueue: QueuedRequest[]: An in-memory array holding QueuedRequest objects when newRequestBehavior is 'queue' and the extension is busy.
QueuedRequest Interface: Defines the structure for storing an original HTTP request (req, res) and its parameters, to be processed later.
async function processRequest(queuedItem: QueuedRequest):
- Sets activeExtensionProcessingId to the current queuedItem.requestId.
- Logs CHAT_REQUEST_PROCESSING.
- Sends the SEND_CHAT_MESSAGE to the extension via WebSocket.
- Manages a Promise in pendingRequests for the response, including a timeout (currentRequestTimeoutMs).
- On response/error/timeout, formats and sends the HTTP response using the stored queuedItem.res.
- Calls finishProcessingRequest() in a finally block.
function finishProcessingRequest(completedRequestId: number):
- Clears activeExtensionProcessingId.
- Removes the request from pendingRequests.
- If newRequestBehavior is 'queue' and requestQueue is not empty, dequeues the next request and calls processRequest() for it.
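
The interplay between processRequest() and finishProcessingRequest() reduces to a small state machine. The sketch below keeps only the state variables named above; createExtensionGate, submit, finish, and the startProcessing callback are hypothetical stand-ins, and the guard against stale request ids is an assumption rather than documented behavior.

```typescript
// Hypothetical helper names; only the state variables mirror the document.
interface QueuedItem {
  requestId: number;
}

function createExtensionGate(
  newRequestBehavior: "queue" | "drop",
  startProcessing: (item: QueuedItem) => void, // stands in for processRequest()
) {
  let activeExtensionProcessingId: number | null = null;
  const requestQueue: QueuedItem[] = [];

  function begin(item: QueuedItem): void {
    activeExtensionProcessingId = item.requestId;
    startProcessing(item);
  }

  return {
    submit(item: QueuedItem): "started" | "queued" | "dropped" {
      if (activeExtensionProcessingId === null) {
        begin(item);
        return "started";
      }
      if (newRequestBehavior === "queue") {
        requestQueue.push(item);
        return "queued";
      }
      return "dropped";
    },
    // Mirrors finishProcessingRequest(): free the extension, then dequeue.
    finish(completedRequestId: number): void {
      // Assumption: ids that are not the active one are ignored.
      if (activeExtensionProcessingId !== completedRequestId) return;
      activeExtensionProcessingId = null;
      if (newRequestBehavior === "queue") {
        const next = requestQueue.shift();
        if (next !== undefined) begin(next);
      }
    },
    activeId(): number | null {
      return activeExtensionProcessingId;
    },
  };
}

// Usage: the second request waits until the first one finishes.
const gate = createExtensionGate("queue", (item) => {
  console.log(`processing ${item.requestId}`);
});
gate.submit({ requestId: 1 });
gate.submit({ requestId: 2 });
gate.finish(1);
```

Because finish() starts the next item itself, at most one request is in flight at any time, which is the property the queue exists to guarantee.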

4. State Management

pendingRequests: A Map that stores Promise resolve/reject handlers, keyed by requestId. Used by processRequest to await responses from the WebSocket.
requestCounter: Generates unique requestIds.
adminMessageHistory: In-memory store for admin log entries.
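
One way to pair pendingRequests with a per-request timeout is sketched here. awaitExtensionResponse is a hypothetical helper, and for brevity the map cleanup lives inside the resolve/reject wrappers, whereas this document places it in finishProcessingRequest().

```typescript
interface PendingEntry<T> {
  resolve: (value: T) => void;
  reject: (error: Error) => void;
}

const pendingRequests = new Map<number, PendingEntry<string>>();
let requestCounter = 0;

// Hypothetical helper: registers a pending entry and arms its timeout.
function awaitExtensionResponse(
  timeoutMs: number,
): { requestId: number; response: Promise<string> } {
  const requestId = ++requestCounter;
  const response = new Promise<string>((resolve, reject) => {
    const timer = setTimeout(() => {
      pendingRequests.delete(requestId);
      reject(new Error(`Request ${requestId} timed out after ${timeoutMs} ms`));
    }, timeoutMs);

    // The wrappers cancel the timer so a late timeout cannot fire after settling.
    pendingRequests.set(requestId, {
      resolve: (value) => {
        clearTimeout(timer);
        pendingRequests.delete(requestId);
        resolve(value);
      },
      reject: (error) => {
        clearTimeout(timer);
        pendingRequests.delete(requestId);
        reject(error);
      },
    });
  });
  return { requestId, response };
}
```

The WebSocket message handler would look up the entry by requestId and call resolve or reject; whichever of response, error, or timeout happens first settles the promise.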

🔄 Lifecycle Flow (with Queuing)

sequenceDiagram
    participant User
    participant ServerAPI
    participant RequestLogic
    participant ProcessRequestFunc
    participant Extension
    participant RequestQueue

    User->>ServerAPI: POST /v1/chat/completions (req1)
    ServerAPI->>RequestLogic: Handle req1
    alt Extension is Free
        RequestLogic->>ProcessRequestFunc: processRequest(req1)
        ProcessRequestFunc->>Extension: SEND_CHAT_MESSAGE (req1)
        Note over ProcessRequestFunc,Extension: activeExtensionProcessingId = req1.id
        User->>ServerAPI: POST /v1/chat/completions (req2)
        ServerAPI->>RequestLogic: Handle req2
        RequestLogic->>RequestLogic: Extension Busy (req1.id)
        alt newRequestBehavior == 'queue'
            RequestLogic->>RequestQueue: Enqueue req2
            Note over RequestLogic: HTTP Response for req2 deferred
        else newRequestBehavior == 'drop'
            RequestLogic-->>ServerAPI: Respond 429 for req2
            ServerAPI-->>User: HTTP 429 (req2 dropped)
        end
        Extension-->>ProcessRequestFunc: CHAT_RESPONSE (req1)
        ProcessRequestFunc-->>ServerAPI: Respond HTTP OK (req1)
        ServerAPI-->>User: HTTP OK (req1)
        ProcessRequestFunc->>RequestLogic: finishProcessingRequest(req1.id)
        RequestLogic->>RequestLogic: activeExtensionProcessingId = null
        alt Queue Not Empty and Behavior is 'queue'
            RequestLogic->>RequestQueue: Dequeue req2
            RequestLogic->>ProcessRequestFunc: processRequest(req2)
            ProcessRequestFunc->>Extension: SEND_CHAT_MESSAGE (req2)
            Note over ProcessRequestFunc,Extension: activeExtensionProcessingId = req2.id
            Extension-->>ProcessRequestFunc: CHAT_RESPONSE (req2)
            ProcessRequestFunc-->>ServerAPI: Respond HTTP OK (req2 via stored res)
            ServerAPI-->>User: HTTP OK (req2)
            ProcessRequestFunc->>RequestLogic: finishProcessingRequest(req2.id)
        end
    else Extension is Busy (initial state)
        RequestLogic->>RequestLogic: Extension Busy
        alt newRequestBehavior == 'queue'
            RequestLogic->>RequestQueue: Enqueue req1
        else newRequestBehavior == 'drop'
            RequestLogic-->>ServerAPI: Respond 429 for req1
            ServerAPI-->>User: HTTP 429 (req1 dropped)
        end
    end

🛡️ Error Handling

If no browser extension is connected when a request arrives: Server responds with 503 Service Unavailable.
If no browser extension is connected when processRequest attempts to send a message (e.g., after the request has been dequeued): The request fails, and an error is sent to the original client if the response headers have not already been sent.
If newRequestBehavior is 'drop' and the extension is busy: Server responds with 429 Too Many Requests.
Request Timeout: Each request processed by processRequest has a timeout (currentRequestTimeoutMs, configurable). If the extension doesn't respond in time, the promise is rejected, and an error is sent to the client.
Errors from extension (CHAT_RESPONSE_ERROR): Logged, and the corresponding request promise is rejected, leading to an error response to the client.
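
The "headers not already sent" guard can be captured in a tiny helper. The sketch assumes an Express-style response object (headersSent, status(), json()); sendErrorIfPossible and the { error: { message } } body shape are hypothetical, and the status code is left to the caller because this document only specifies 503 and 429.

```typescript
// Minimal subset of an Express-style response, declared locally so the
// sketch stays self-contained.
interface ResponseLike {
  headersSent: boolean;
  status(code: number): ResponseLike;
  json(body: unknown): void;
}

// Hypothetical helper: sends an error only if nothing has been sent yet.
// Returns true when the error response was written.
function sendErrorIfPossible(
  res: ResponseLike,
  status: number,
  message: string,
): boolean {
  if (res.headersSent) {
    return false; // a response is already in flight; writing again would throw
  }
  res.status(status).json({ error: { message } }); // assumed body shape
  return true;
}
```

A queued request can outlive its client connection or time out before it is dequeued, which is why the guard matters more in 'queue' mode than in 'drop' mode.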

⚙️ Configuration

The server's behavior can be configured via server-config.json located in the dist directory (created/managed by server.ts). The Admin UI also allows viewing and modifying these settings.

Key configurable options:

port: The port on which the server listens. Requires server restart.
requestTimeoutMs: Timeout in milliseconds for waiting for a response from the browser extension. Effective immediately.
newRequestBehavior: Determines how new requests are handled while the extension is busy. Effective immediately. Can be:
- 'queue' (default): New requests are queued and processed sequentially.
- 'drop': New requests are rejected with a 429 error.
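
Reading server-config.json defensively could look like the sketch below. parseServerConfig is a hypothetical helper, and the fallback values in the usage lines are placeholders: the only default this document states is 'queue' for newRequestBehavior.

```typescript
type NewRequestBehavior = "queue" | "drop";

interface ServerConfig {
  port: number;
  requestTimeoutMs: number;
  newRequestBehavior: NewRequestBehavior;
}

// Hypothetical helper: validates each field and falls back per field.
function parseServerConfig(rawJson: string, fallback: ServerConfig): ServerConfig {
  let source: Record<string, unknown> = {};
  try {
    const value: unknown = JSON.parse(rawJson);
    if (typeof value === "object" && value !== null) {
      source = value as Record<string, unknown>;
    }
  } catch {
    // Malformed JSON: keep every fallback value.
  }

  const port = source["port"];
  const timeout = source["requestTimeoutMs"];
  const behavior = source["newRequestBehavior"];

  return {
    port:
      typeof port === "number" && Number.isInteger(port) && port >= 1 && port <= 65535
        ? port
        : fallback.port,
    requestTimeoutMs:
      typeof timeout === "number" && Number.isFinite(timeout) && timeout > 0
        ? timeout
        : fallback.requestTimeoutMs,
    newRequestBehavior:
      behavior === "queue" || behavior === "drop"
        ? (behavior as NewRequestBehavior)
        : fallback.newRequestBehavior,
  };
}

// Placeholder fallbacks for illustration; only 'queue' is a documented default.
const exampleFallback: ServerConfig = {
  port: 3000,
  requestTimeoutMs: 120000,
  newRequestBehavior: "queue",
};
const exampleConfig = parseServerConfig(
  '{"newRequestBehavior":"drop","port":"oops"}',
  exampleFallback,
);
console.log(exampleConfig);
```

Validating field by field means a single bad value written through the Admin UI or by hand does not discard the other settings.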

🔌 Connection Monitoring

The server maintains an array of activeConnections.
The ws library automatically answers incoming ping frames with pong frames, but it does not send pings on its own. Explicit server-side ping logic is not currently implemented in server.ts.
Disconnected clients are removed from activeConnections.
pendingRequests are cleared on timeout or when a request completes (successfully or with an error) via finishProcessingRequest.
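
If explicit heartbeats were added later, a periodic sweep along these lines is one common pattern. This is not part of server.ts: sweepConnections and TrackedConnection are hypothetical, and SocketLike only mirrors the ping() and terminate() methods that ws sockets expose. The caller is assumed to set isAlive back to true whenever a 'pong' event arrives.

```typescript
// Local stand-in for the two ws socket methods this sketch needs.
interface SocketLike {
  ping(): void;
  terminate(): void;
}

interface TrackedConnection {
  socket: SocketLike;
  isAlive: boolean; // the caller flips this to true on each 'pong'
}

// Hypothetical sweep: drops connections that never answered the last ping,
// pings the rest, and returns the connections that remain active.
function sweepConnections(
  activeConnections: TrackedConnection[],
): TrackedConnection[] {
  const survivors: TrackedConnection[] = [];
  for (const connection of activeConnections) {
    if (!connection.isAlive) {
      connection.socket.terminate(); // no pong since the previous sweep
      continue;
    }
    connection.isAlive = false; // must be re-armed by the next pong
    connection.socket.ping();
    survivors.push(connection);
  }
  return survivors;
}
```

Running the sweep on an interval would let the server notice a silently dead extension and fail its in-flight request instead of waiting for the full request timeout.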

✅ Summary

This architecture decouples HTTP clients from the browser extension. The queue-or-drop mechanism ensures that the extension processes only one message at a time, which prevents race conditions and makes the behavior under load configurable. The Admin UI provides visibility into, and control over, the key operational parameters.
