Commit graph

3 commits

Author SHA1 Message Date
Alessandro
91f43e28b4 fix: preserve safe remote fetch compatibility for public sites
Restore remote document fetch compatibility for public sites after the
CVE-2026-4308 SSRF hardening.

The initial security fix correctly blocked non-public destinations, but
it also changed the outbound request fingerprint for `document_query`
remote fetches. Some public sites, including https://nvd.nist.gov/vuln/detail/CVE-2026-4308, used for testing, responded with HTTP
403 to the default `requests` user agent even though they remained safe
and publicly routable.

This change keeps the centralized SSRF protections in place while
restoring the previous request compatibility behavior by sending the
configured `USER_AGENT` header, falling back to the prior
`@mixedbread-ai/unstructured` value.

What is fixed:
- public URLs such as
  `https://nvd.nist.gov/vuln/detail/CVE-2026-4308`
  no longer fail with site-specific HTTP 403 due to request fingerprint
  changes introduced by the SSRF mitigation
2026-04-12 02:08:13 +02:00
Alessandro
6397acc092 Fix SSRF in document_query remote fetching (CVE-2026-4308)
Address CVE-2026-4308 in the document_query tool remote-fetch path.

The issue was originally reported by @YLChen-007.

This change replaces ad hoc remote document fetching with a centralized
safe fetch flow that validates remote URLs before any network request is
used for parsing. It blocks localhost and non-public IPv4/IPv6 targets,
validates every redirect hop, disables implicit trust of proxy env
settings for this path, and enforces a strict remote document size cap.

It also removes direct third-party loader access to attacker-controlled
URLs by prefetching remote content first and then parsing only trusted
local bytes or temp files for HTML, text, PDF, image, and unstructured
document handling.

Refs:
- CVE-2026-4308
- Report by @YLChen-007
2026-04-12 02:00:01 +02:00
keyboardstaff
1d81f72a31 refactor: Backend core rewrite - WsHandler + WsManager + handler migration
- Add WsHandler base class, WsManager (connection tracking / event routing / buffering), WsResult
- Extract network.py (is_loopback_address) and context_utils.py (use_context) to eliminate duplication
- Migrate three handlers to api/ following the ws_* py naming convention
- Simplify run_ui.py WebSocket init from ~170 lines to ~10
- Update import paths in api.py, plugins.py, state_monitor.py
2026-03-26 00:58:01 -07:00