# SPECIFICATION - Fileshed Tool
## Overview
Fileshed is an Open WebUI tool that lets users store, manipulate, and organize files that persist across conversations. The tool offers **three personal zones** for individual work plus **shared group spaces** for collaboration.
## Philosophy
### Personal space = Workshop
Your personal space is your **private workshop**. You can:
- Import files from chat (Uploads)
- Work freely with any file operation (Storage)
- Extract archives, run batch operations, experiment
- Keep versioned documents (Documents)
**It's okay to be messy here** — it's your space.
### Group space = Collaboration
Group spaces are for **sharing finalized documents** with your team. They are:
- **Documents only** (Git versioned)
- Clean and organized
- Collaborative with clear ownership
**Why no "Storage" in groups?**
- Each member already has personal Storage for messy work
- Group space is for publishing/collaborating on documents
- Avoids "who left this .tmp file?" issues
- Keeps collaboration focused
### Shell Commands First
Fileshed prefers shell commands for any operation that a shell command can perform:
```python
# ✅ CORRECT - use mkdir for directories
shed_exec(zone="storage", cmd="mkdir", args=["-p", "projects/2024"])
# ❌ WRONG - don't use patch_text to create directories
shed_patch_text(zone="storage", path="projects/2024/.keep", content="")
```
- **Reading files**: `shed_exec(cmd="cat/head/tail/sed", ...)`
- **Writing files**: `shed_patch_text()` (direct) or `shed_lockedit_*()` (with locking)
### Workflow
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Your Uploads   │────▶│  Your Storage   │────▶│   Group Space   │
│    (import)     │     │     (work)      │     │  (collaborate)  │
└─────────────────┘     └─────────────────┘     └─────────────────┘
┌─────────────────┐
│ Your Documents  │
│    (version)    │
└─────────────────┘
```
## Architecture
### Personal zones (3 zones per user)
```
{storage_base_path}/
├── users/                      # User personal spaces
│   └── {user_id}/
│       ├── Uploads/            # Import zone (read-only + delete)
│       │   └── {conv_id}/      # Isolated per conversation
│       │       └── files...
│       ├── Storage/            # Free zone (all operations)
│       │   ├── data/           # User files
│       │   ├── editzone/       # Temporary working copies
│       │   │   └── {conv_id}/
│       │   └── locks/          # Edit locks
│       └── Documents/          # Versioned zone (auto Git)
│           ├── data/           # Git repository
│           │   └── .git/
│           ├── editzone/
│           │   └── {conv_id}/
│           └── locks/
```
### Group zones (1 zone per group)
```
{storage_base_path}/
├── users/                      # User personal spaces
│   └── ...
├── groups/                     # Group shared spaces
│   ├── {group_id}/
│   │   ├── data/               # Git repository (Documents only)
│   │   │   └── .git/
│   │   ├── editzone/
│   │   │   └── {conv_id}/
│   │   └── locks/
│   └── {group_id_2}/
│       └── ...
└── access_auth.sqlite          # Permission database
```
### Paths
| Path | Description |
| --- | --- |
| `{storage_base_path}/` | Storage root |
| `{storage_base_path}/users/{user_id}/` | User personal space |
| `{user}/Uploads/{conv_id}/` | Import zone |
| `{user}/Storage/data/` | Free workspace |
| `{user}/Documents/data/` | Versioned documents |
| `{storage_base_path}/groups/{group_id}/data/` | Group documents |
| `{storage_base_path}/access_auth.sqlite` | Permission database |
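
As a rough illustration of the table above, the zone directories can be derived from the base path with `pathlib`. The helper names `user_zone_path` and `group_data_path` are hypothetical and are not part of the tool; only the directory names are taken from the table.

```python
from pathlib import Path
from typing import Optional


def user_zone_path(base: Path, user_id: str, zone: str,
                   conv_id: Optional[str] = None) -> Path:
    """Hypothetical helper: map a personal zone name to its directory."""
    user_root = base / "users" / user_id
    if zone == "uploads":
        if conv_id is None:
            raise ValueError("Uploads is isolated per conversation; conv_id is required")
        return user_root / "Uploads" / conv_id
    if zone == "storage":
        return user_root / "Storage" / "data"
    if zone == "documents":
        return user_root / "Documents" / "data"
    raise ValueError(f"unknown personal zone: {zone}")


def group_data_path(base: Path, group_id: str) -> Path:
    """Hypothetical helper: directory holding a group's Git repository."""
    return base / "groups" / group_id / "data"


# Default storage_base_path from the Valves table.
base = Path("/app/backend/data/user_files")
print(user_zone_path(base, "u1", "storage"))
print(group_data_path(base, "team-alpha"))
```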
## Code architecture
The tool follows a strict layered architecture for maintainability and security.
All internal methods are in a separate `_FileshedCore` class, preventing the LLM from seeing them.
```
┌─────────────────────────────────────────────────────────────────┐
│  class Tools (PUBLIC API)                                       │
│  async def shed_*()  (37 functions)                             │
│                                                                 │
│  These are the ONLY functions visible to the LLM.               │
│  Handle: parameter validation, zone resolution, response format │
│  Access internal methods via: self._core._method()              │
├─────────────────────────────────────────────────────────────────┤
│  class _FileshedCore (INTERNAL)                                 │
│  _exec_command()   _git_run()   _validate_*()                   │
│  _format_response()   _resolve_zone()   _db_*()                 │
│                                                                 │
│  Internal methods, NOT visible to LLM.                          │
│  Provide: subprocess wrapper, path validation, Git operations   │
│  Instantiated in Tools.__init__: self._core = _FileshedCore()   │
├─────────────────────────────────────────────────────────────────┤
│  LAYER 3: INFRASTRUCTURE                                        │
│  subprocess.run()   sqlite3   shutil   pathlib                  │
│                                                                 │
│  External dependencies. Never called directly from Tools.       │
└─────────────────────────────────────────────────────────────────┘
```
**Why `_FileshedCore`?**
- Open WebUI exposes ALL methods of the `Tools` class to the LLM
- Without separation, LLM could see `_exec_command`, `_validate_path`, etc.
- LLMs sometimes attempted to call these internal methods directly
- Now only `shed_*` functions are visible to the LLM
**Critical rules:**
1. **Tools class calls _FileshedCore**: `self._core._exec_command()`, not `subprocess.run()`
2. **All paths use `_resolve_chroot_path()`**: prevents path traversal attacks
3. **All user input is validated**: `_validate_command()`, `_validate_args()`, `_validate_relative_path()`
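
The second rule can be sketched as follows. This is a minimal illustration of confining a path to a zone root, not the tool's actual `_resolve_chroot_path()`; the function name `resolve_chroot_path` and the `PathEscapeError` class are assumptions made for the example.

```python
from pathlib import Path


class PathEscapeError(ValueError):
    """Hypothetical error type; the spec reports this case as PATH_ESCAPE."""


def resolve_chroot_path(root: Path, relative: str) -> Path:
    """Resolve `relative` under `root` and refuse anything that lands outside it."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks that exist.
    candidate = (root / relative).resolve()
    try:
        # relative_to() raises ValueError when candidate is not under root.
        candidate.relative_to(root)
    except ValueError:
        raise PathEscapeError(f"path escapes zone root: {relative}") from None
    return candidate
```

Resolving before comparing matters here: a plain string prefix check would accept `data/../../etc`, whereas the resolved path makes the escape visible. An absolute input such as `/etc/passwd` replaces the root when joined, so it is rejected by the same check.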
### Zone Resolution
The `_resolve_zone()` method centralizes all zone-specific logic and returns a `ZoneContext` dataclass:
```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

@dataclass
class ZoneContext:
    zone_root: Path                 # Data directory path
    zone_name: str                  # Canonical name (Storage, Documents, Uploads, Group:xxx)
    zone_lower: str                 # Lowercase (storage, documents, uploads, group)
    editzone_base: Optional[Path]   # Base for editzones (None for uploads)
    conv_id: str                    # Conversation ID
    group_id: Optional[str]         # Group ID if zone=group, else None
    git_commit: bool                # Auto-commit after modifications
    readonly: bool                  # True for uploads
    whitelist: set                  # Allowed commands for this zone
```
Usage:
```python
ctx = self._core._resolve_zone(zone, group, __user__, __metadata__, require_write=True)
# Now use ctx.zone_root, ctx.git_commit, ctx.whitelist, etc.
```
### Key internal methods (in _FileshedCore)
| Method | Purpose |
| --- | --- |
| `_resolve_zone(zone, group, ...)` | Zone resolution, returns ZoneContext |
| `_exec_command(cmd, args, cwd, timeout)` | Subprocess wrapper with timeout, output truncation |
| `_git_run(args, cwd)` | Git operations for Documents zone versioning |
| `_validate_command(cmd)` | Whitelist check against `allowed_commands` |
| `_validate_args(args)` | Block dangerous patterns (`;`, `\|`, and the other entries listed under Forbidden patterns) |
| `_resolve_chroot_path(root, path)` | Resolve path within zone, prevent escape |
| `_format_response(success, data, message)` | Standardized JSON response format |
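
To make the `_exec_command` row concrete, here is a minimal sketch of a subprocess wrapper with a timeout and output truncation. It is an assumption about the general shape, not the tool's code: the function name `exec_command`, the `max_output` parameter, and the returned dictionary keys are hypothetical, and the defaults are borrowed from the Valves table.

```python
import subprocess
import sys
from typing import List, Optional


def exec_command(cmd: str, args: List[str], cwd: Optional[str] = None,
                 timeout: int = 30, max_output: int = 50_000) -> dict:
    """Run a command without a shell, with a timeout and truncated output."""
    try:
        proc = subprocess.run(
            [cmd, *args],        # argument list, never a shell string
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"returncode": None, "stdout": "", "stderr": "",
                "timed_out": True, "truncated": False}
    return {
        "returncode": proc.returncode,
        "stdout": proc.stdout[:max_output],
        "stderr": proc.stderr[:max_output],
        "timed_out": False,
        "truncated": len(proc.stdout) > max_output,
    }


# Usage: run the current Python interpreter so the example works on any platform.
result = exec_command(sys.executable, ["-c", "print('x' * 100)"], max_output=10)
print(result["stdout"], result["truncated"])
```

Passing an argument list instead of a shell string means metacharacters in `args` are never interpreted by a shell, which complements the argument validation rather than replacing it.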
## API Design
### Unified Zone Parameter
All operations use a `zone=` parameter to specify the target zone:
```python
shed_exec(zone="storage", cmd="ls", args=["-la"])
shed_exec(zone="documents", cmd="git", args=["log"])
shed_exec(zone="uploads", cmd="cat", args=["file.txt"])
shed_exec(zone="group", group="team-alpha", cmd="ls", args=["-la"])
```
### Zone-Specific Parameters
Some parameters only apply to certain zones and are ignored otherwise:
| Parameter | Zones | Purpose |
| --- | --- | --- |
| `group` | group only | Required group identifier |
| `message` | documents, group | Git commit message |
| `mode` | group only | Ownership mode (owner, group, owner_ro) |
### Function Categories
**Core Operations (10 functions):**
- `shed_exec` — Execute shell commands (including reading files with cat/head/tail, stdout_file= for output redirection)
- `shed_patch_text` — Write/create text files (THE standard write function)
- `shed_patch_bytes` — Write binary data to files
- `shed_delete` — Delete files/folders
- `shed_rename` — Rename/move files
- `shed_lockedit_open` — Lock file for editing (locked edit workflow)
- `shed_lockedit_exec` — Run command on locked file
- `shed_lockedit_overwrite` — Overwrite locked file content
- `shed_lockedit_save` — Save changes and unlock
- `shed_lockedit_cancel` — Discard changes and unlock
**Zone-Aware Builtins (9 functions):**
- `shed_tree` — Directory tree view
- `shed_sqlite` — SQLite queries and CSV import
- `shed_zip` — Create ZIP archives
- `shed_unzip` — Extract ZIP archives
- `shed_zipinfo` — List ZIP contents
- `shed_file_type` — Detect file MIME type
- `shed_convert_eol` — Convert line endings
- `shed_hexdump` — Hex dump of binary files
- `shed_force_unlock` — Force unlock a stuck file
**Download Links (3 functions):**
- `shed_link_create` — Create download link (returns clickable_link in Markdown)
- `shed_link_list` — List your download links
- `shed_link_delete` — Delete a download link
**Group Functions (4 functions):**
- `shed_group_list` — List user's groups
- `shed_group_info` — Group details and members
- `shed_group_set_mode` — Change file permissions
- `shed_group_chown` — Transfer file ownership
**Zone Bridges (5 functions):**
- `shed_move_uploads_to_storage` — Move from Uploads to Storage
- `shed_move_uploads_to_documents` — Move from Uploads to Documents
- `shed_copy_storage_to_documents` — Copy from Storage to Documents
- `shed_move_documents_to_storage` — Move from Documents to Storage
- `shed_copy_to_group` — Copy to a group
**Utilities (6 functions):**
- `shed_import` — Import uploaded files
- `shed_help` — Documentation and guides
- `shed_stats` — Storage usage statistics
- `shed_parameters` — Configuration info
- `shed_allowed_commands` — List allowed shell commands
- `shed_maintenance` — Cleanup expired locks
**Total: 37 functions**
## Group permissions
### Access model
**Simple rule**: All members of an Open WebUI group can access the group's storage space.
Group membership is checked via Open WebUI's Groups API:
```python
from open_webui.models.groups import Groups

def _is_group_member(self, user_id: str, group_id: str) -> bool:
    """Check if user is member of group via Open WebUI API."""
    user_groups = Groups.get_groups_by_member_id(user_id)
    return any(g.id == group_id for g in user_groups)
```
### File ownership model
When a member uploads/creates a file in the group space, they choose the **write mode**:
| Mode | Description | Read | Write | Delete |
| --- | --- | --- | --- | --- |
| `owner` | I share but keep control | Everyone | Owner only | Owner only |
| `group` | Full collaboration | Everyone | Everyone | Everyone |
| `owner_ro` | I publish and protect | Everyone | Nobody | Nobody |
**Use cases:**
- `owner`: Share a template others can read but not modify
- `group`: Collaborative document everyone edits
- `owner_ro`: Finalized document (change mode first to modify or delete)
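
The mode table can be read as a small decision function. The sketch below assumes group membership has already been verified; the function names `can_read` and `can_modify` are hypothetical. Write and delete share one function because the table gives them identical rules.

```python
VALID_MODES = ("owner", "group", "owner_ro")


def can_read(mode: str, owner_id: str, user_id: str) -> bool:
    """Every group member can read, whatever the mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    return True


def can_modify(mode: str, owner_id: str, user_id: str) -> bool:
    """Write and delete follow the same rule in the mode table."""
    if mode == "group":
        return True                    # full collaboration
    if mode == "owner":
        return user_id == owner_id     # owner keeps control
    if mode == "owner_ro":
        return False                   # nobody, until the mode is changed
    raise ValueError(f"unknown mode: {mode}")


print(can_modify("owner", owner_id="alice", user_id="bob"))
print(can_modify("group", owner_id="alice", user_id="bob"))
```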
### Permission database (access_auth.sqlite)
```sql
-- File ownership in group spaces
CREATE TABLE file_ownership (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    group_id TEXT NOT NULL,
    file_path TEXT NOT NULL,       -- Relative path in data/
    owner_id TEXT NOT NULL,        -- Who created/uploaded
    write_access TEXT NOT NULL     -- 'owner' | 'group' | 'owner_ro'
        CHECK(write_access IN ('owner', 'group', 'owner_ro')),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(group_id, file_path)
);

CREATE INDEX idx_file_ownership_group ON file_ownership(group_id);
CREATE INDEX idx_file_ownership_owner ON file_ownership(owner_id);
```
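
A short sketch of how this schema could be exercised from Python with the standard `sqlite3` module, using an in-memory database. The helper names `record_ownership` and `get_ownership` are hypothetical and do not correspond to the tool's `_db_*()` methods.

```python
import sqlite3

SCHEMA = """
CREATE TABLE file_ownership (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    group_id TEXT NOT NULL,
    file_path TEXT NOT NULL,
    owner_id TEXT NOT NULL,
    write_access TEXT NOT NULL
        CHECK(write_access IN ('owner', 'group', 'owner_ro')),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(group_id, file_path)
);
"""


def record_ownership(conn, group_id, file_path, owner_id, mode="group"):
    """Hypothetical helper: register a new group file and its write mode."""
    conn.execute(
        "INSERT INTO file_ownership (group_id, file_path, owner_id, write_access) "
        "VALUES (?, ?, ?, ?)",
        (group_id, file_path, owner_id, mode),
    )


def get_ownership(conn, group_id, file_path):
    """Hypothetical helper: return (owner_id, write_access) or None."""
    return conn.execute(
        "SELECT owner_id, write_access FROM file_ownership "
        "WHERE group_id = ? AND file_path = ?",
        (group_id, file_path),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_ownership(conn, "team-alpha", "docs/plan.md", "user-1", "owner")
print(get_ownership(conn, "team-alpha", "docs/plan.md"))
```

The `CHECK` and `UNIQUE` constraints mean an invalid mode or a duplicate `(group_id, file_path)` pair is rejected by SQLite itself with `sqlite3.IntegrityError`.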
## Security
### Command whitelist
Commands are separated into two whitelists:
**WHITELIST_READONLY** (for Uploads zone):
- `cat`, `head`, `tail`, `less`, `more`
- `ls`, `find`, `tree`, `stat`, `file`
- `grep`, `wc`, `diff`, `sort`, `uniq`
- `md5sum`, `sha256sum`, `base64`
- `hexdump`, `xxd`, `strings`, `od`
**WHITELIST_READWRITE** (for Storage, Documents, Groups):
- All of READONLY plus:
- `cp`, `mv`, `rm`, `mkdir`, `touch`
- `sed`, `awk`, `cut`, `tr`, `paste`
- `tar`, `gzip`, `gunzip`, `zip`, `unzip`
- `git` (for Documents/Groups)
- `curl`, `wget` (if network_mode allows)
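
The two whitelists can be sketched as sets, with the read-write set built on top of the read-only one. The helper names `whitelist_for` and `is_command_allowed` are hypothetical. This sketch deliberately leaves out `git`, `curl`, and `wget`, because their availability depends on the zone and on `network_mode`, which are not modeled here.

```python
WHITELIST_READONLY = frozenset({
    "cat", "head", "tail", "less", "more",
    "ls", "find", "tree", "stat", "file",
    "grep", "wc", "diff", "sort", "uniq",
    "md5sum", "sha256sum", "base64",
    "hexdump", "xxd", "strings", "od",
})

# Read-write zones get everything in READONLY plus the modifying commands.
WHITELIST_READWRITE = WHITELIST_READONLY | frozenset({
    "cp", "mv", "rm", "mkdir", "touch",
    "sed", "awk", "cut", "tr", "paste",
    "tar", "gzip", "gunzip", "zip", "unzip",
})


def whitelist_for(zone: str) -> frozenset:
    """Hypothetical helper: pick the whitelist for a zone name."""
    if zone == "uploads":
        return WHITELIST_READONLY
    if zone in ("storage", "documents", "group"):
        return WHITELIST_READWRITE
    raise ValueError(f"unknown zone: {zone}")


def is_command_allowed(zone: str, cmd: str) -> bool:
    return cmd in whitelist_for(zone)


print(is_command_allowed("uploads", "rm"))
print(is_command_allowed("storage", "rm"))
```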
### Forbidden patterns
The following patterns are blocked in all arguments:
- Shell metacharacters: `;`, `|`, `&&`, `&`, `>`, `>>`, `$(`, `` ` ``
- Path traversal: `..` (normalized away)
- Dangerous options: `find -exec`, `awk system()`, `xargs`
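
A minimal sketch of argument screening based on the list above. It uses plain substring matching, which is an assumption about the approach and may be stricter than the tool's real `_validate_args()`; the names `find_forbidden` and `is_dangerous_invocation` are hypothetical. Path traversal is not covered here.

```python
from typing import List, Optional

# "&&" and ">>" are already caught by the single-character "&" and ">".
FORBIDDEN_TOKENS = (";", "|", "&", ">", "$(", "`")


def find_forbidden(args: List[str]) -> Optional[str]:
    """Return the first forbidden token found in any argument, else None."""
    for arg in args:
        for token in FORBIDDEN_TOKENS:
            if token in arg:
                return token
    return None


def is_dangerous_invocation(cmd: str, args: List[str]) -> bool:
    """Flag the command-specific cases named in the spec."""
    if cmd == "xargs":
        return True
    if cmd == "find" and "-exec" in args:
        return True
    if cmd == "awk" and any("system(" in arg for arg in args):
        return True
    return False


print(find_forbidden(["-la"]))
print(find_forbidden(["file.txt; rm -rf ."]))
print(is_dangerous_invocation("find", [".", "-exec", "rm", "{}", "+"]))
```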
### Network modes
| Mode | Description |
| --- | --- |
| `disabled` | No network access (default) |
| `safe` | Downloads only (curl -o, wget -O, git clone) |
| `all` | Full network access (⚠️ enables data exfiltration) |
### Quotas
| Setting | Default | Description |
| --- | --- | --- |
| `quota_per_user_mb` | 1000 | Personal space limit |
| `quota_per_group_mb` | 2000 | Group space limit |
| `max_file_size_mb` | 300 | Maximum single file size |
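
One way such limits could be enforced before a write is sketched below. The helper names are hypothetical, the error codes are the ones listed under Error Handling, and treating 1 MB as 1024 × 1024 bytes is an assumption, since the spec does not define the unit.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional

MB = 1024 * 1024  # assumption: the spec does not say whether MB is 10^6 or 2^20 bytes


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = Path(dirpath) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def quota_error(root: Path, incoming_bytes: int,
                quota_mb: int = 1000, max_file_size_mb: int = 300) -> Optional[str]:
    """Return an error code from the spec, or None when the write fits."""
    if incoming_bytes > max_file_size_mb * MB:
        return "FILE_TOO_LARGE"
    if directory_size_bytes(root) + incoming_bytes > quota_mb * MB:
        return "QUOTA_EXCEEDED"
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.bin").write_bytes(b"x" * 2048)
    print(directory_size_bytes(root))
    print(quota_error(root, 10 * MB, quota_mb=5))
```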
## Configuration (Valves)
| Setting | Default | Description |
| --- | --- | --- |
| `storage_base_path` | `/app/backend/data/user_files` | Root storage path |
| `quota_per_user_mb` | 1000 | User quota in MB |
| `quota_per_group_mb` | 2000 | Group quota in MB |
| `max_file_size_mb` | 300 | Max file size |
| `network_mode` | `disabled` | `disabled`, `safe`, or `all` |
| `exec_timeout_default` | 30 | Default command timeout |
| `exec_timeout_max` | 300 | Maximum command timeout |
| `max_output_default` | 50000 | Default output truncation (~50KB) |
| `max_output_absolute` | 5000000 | Absolute max output (~5MB) |
| `lock_max_age_hours` | 24 | Lock expiration time |
| `group_default_mode` | `group` | Default write mode for new group files |
| `openwebui_api_url` | `http://localhost:8080` | Open WebUI base URL for download links |
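
The default/maximum pairs suggest that per-call values are bounded by the configured ceiling. The sketch below assumes clamping (rather than rejecting) out-of-range requests, which the spec does not state; the `Limits` dataclass and both helper names are hypothetical stand-ins for the Valves settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    """Hypothetical container mirroring four rows of the Valves table."""
    exec_timeout_default: int = 30
    exec_timeout_max: int = 300
    max_output_default: int = 50_000
    max_output_absolute: int = 5_000_000


def effective_timeout(requested: Optional[int], limits: Limits) -> int:
    """Use the default when nothing valid is requested, else cap at the maximum."""
    if requested is None or requested <= 0:
        return limits.exec_timeout_default
    return min(requested, limits.exec_timeout_max)


def effective_max_output(requested: Optional[int], limits: Limits) -> int:
    """Same clamping idea for output truncation."""
    if requested is None or requested <= 0:
        return limits.max_output_default
    return min(requested, limits.max_output_absolute)


limits = Limits()
print(effective_timeout(None, limits))
print(effective_timeout(900, limits))
```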
## Error Handling
All errors use the `StorageError` class with structured JSON responses:
```python
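from typing import Optional

class StorageError(Exception):
    def __init__(self, code: str, message: str,
                 details: Optional[dict] = None, hint: Optional[str] = None):
        super().__init__(message)  # keeps str(err) informative
        self.code = code
        self.message = message
        self.details = details or {}
        self.hint = hint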
```
Response format:
```json
{
"success": false,
"error": {
"code": "FILE_NOT_FOUND",
"message": "File not found: config.json",
"details": {"path": "config.json", "zone": "storage"},
"hint": "Check the path and try again"
}
}
```
Common error codes:
- `FILE_NOT_FOUND` — Path does not exist
- `FILE_EXISTS` — Destination already exists
- `FILE_TOO_LARGE` — File exceeds max_file_size_mb limit
- `PATH_ESCAPE` — Path traversal attempt blocked
- `PERMISSION_DENIED` — Group ownership check failed
- `COMMAND_FORBIDDEN` — Command not in whitelist
- `QUOTA_EXCEEDED` — Storage quota exceeded
- `FILE_LOCKED` — File locked by another user/conversation
- `INVALID_ZONE` — Unknown zone parameter
- `ZONE_READONLY` — Write operation on read-only zone (Uploads)
- `MISSING_PARAMETER` — Required parameter missing
- `GROUP_ACCESS_DENIED` — User is not a member of the group
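
A minimal sketch of turning a `StorageError` into the JSON error response described above. The `StorageError` class is repeated so the block runs on its own; the helper name `error_response` is hypothetical, and how the tool performs this conversion internally is not specified here.

```python
import json
from typing import Optional


class StorageError(Exception):
    def __init__(self, code: str, message: str,
                 details: Optional[dict] = None, hint: Optional[str] = None):
        super().__init__(message)
        self.code = code
        self.message = message
        self.details = details or {}
        self.hint = hint


def error_response(err: StorageError) -> str:
    """Hypothetical helper: serialize an error in the documented response shape."""
    payload = {
        "success": False,
        "error": {
            "code": err.code,
            "message": err.message,
            "details": err.details,
            "hint": err.hint,
        },
    }
    return json.dumps(payload)


try:
    raise StorageError(
        "FILE_NOT_FOUND",
        "File not found: config.json",
        details={"path": "config.json", "zone": "storage"},
        hint="Check the path and try again",
    )
except StorageError as exc:
    print(error_response(exc))
```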
## Response Format
All functions return JSON with consistent structure:
**Success:**
```json
{
"success": true,
"data": { ... },
"message": "Operation completed"
}
```
**Error:**
```json
{
"success": false,
"error": { ... }
}
```
## Authors
- **Fade78** — Original author
- **Claude Opus 4.5** — Co-developer