navidrome/scanner
Deluan Quintão 5d1c9530ab
feat(cli): add pls export/import subcommands for bulk playlist management (#5412)
* refactor: rename ImportFile to ImportFromFolder in playlists service

* feat: add ImportFile method with library/folder resolution

* feat: allow sync flag upgrade on re-import of non-synced playlists

* feat: add pls export subcommand with bulk and single export

Add `navidrome pls export` command that supports:
- Single playlist export to stdout (-p flag only)
- Single playlist export to directory (-p and -o flags)
- Bulk export of all playlists to a directory (-o flag only)
- Filtering by user (-u flag)
- Automatic filename sanitization and collision detection

Also extracts findPlaylist helper from runExporter for reuse.

* feat: add pls import subcommand with sync flag support

* fix: improve error message for export without output directory

* test: add tests for ImportFile sync flag and sync upgrade behavior

* refactor: streamline export and import logic by removing redundant comments and improving library path matching

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: update ImportFile method to include sync flag for playlist imports

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: implement fetchPlaylists function to streamline playlist retrieval

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: replace inline filename sanitization with centralized utility function

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: refactor playlist import logic to consolidate sync handling and improve method signatures

Signed-off-by: Deluan <deluan@navidrome.org>

* fix: address code review feedback on playlist import/export

- Fix duplicate playlist creation on non-sync re-import: only reconcile
  sync flag when the playlist was actually persisted (has an ID)
- Distinguish "not in any library" from real errors in resolveFolder
  using a sentinel error, so DB/folder errors propagate instead of
  falling back to ImportM3U
- Use bufio.Scanner in countM3UTrackLines instead of reading entire file

* feat: replace bufio.Scanner with UTF8Reader and LinesFrom utility for improved file reading

Signed-off-by: Deluan <deluan@navidrome.org>

* fix: record path for outside-library imports to prevent duplicates

Files outside all libraries now go through updatePlaylist with the
absolute path recorded, so re-importing the same file updates the
existing playlist instead of creating a duplicate.

* refactor: name guard condition in updatePlaylist for readability

Extracted the compound boolean expression into a named local variable
`alreadyImportedAndNotSynced` to make the intent of the early-return
guard clearer at a glance.

* add godocs

Signed-off-by: Deluan <deluan@navidrome.org>

---------

Signed-off-by: Deluan <deluan@navidrome.org>
2026-04-25 20:54:02 -04:00
metadata_old refactor: run Go modernize (#5002) 2026-02-08 09:57:30 -05:00
controller.go refactor: move playlist business logic from repositories to service layer (#5027) 2026-02-21 19:57:13 -05:00
controller_test.go feat: add artist image uploads and image-folder artwork source (#5198) 2026-03-15 22:19:55 -04:00
external.go chore(deps): bump golangci-lint to v2.10.0 and suppress new gosec false positives 2026-02-17 09:28:42 -05:00
external_test.go feat(scanner): implement file-based target passing for large target lists 2025-12-16 16:08:32 -05:00
folder_entry.go refactor: move playlist business logic from repositories to service layer (#5027) 2026-02-21 19:57:13 -05:00
folder_entry_test.go feat(scanner): implement selective folder scanning and file system watcher improvements (#4674) 2025-11-14 22:15:43 -05:00
ignore_checker.go refactor: run Go modernize (#5002) 2026-02-08 09:57:30 -05:00
ignore_checker_test.go feat(scanner): implement selective folder scanning and file system watcher improvements (#4674) 2025-11-14 22:15:43 -05:00
phase_1_folders.go fix(scanner): store scan errors in the database and update UI error handling 2026-02-01 16:18:26 +01:00
phase_2_missing_tracks.go fix(scanner): prevent duplicate tracks when multiple missing files match same target (#5183) 2026-03-14 00:07:21 -04:00
phase_2_missing_tracks_test.go fix(scanner): prevent duplicate tracks when multiple missing files match same target (#5183) 2026-03-14 00:07:21 -04:00
phase_3_refresh_albums.go feat(scanner): implement selective folder scanning and file system watcher improvements (#4674) 2025-11-14 22:15:43 -05:00
phase_3_refresh_albums_test.go feat(scanner): implement selective folder scanning and file system watcher improvements (#4674) 2025-11-14 22:15:43 -05:00
phase_4_playlists.go feat(cli): add pls export/import subcommands for bulk playlist management (#5412) 2026-04-25 20:54:02 -04:00
phase_4_playlists_test.go feat(cli): add pls export/import subcommands for bulk playlist management (#5412) 2026-04-25 20:54:02 -04:00
README.md docs(scanner): add overview README document 2025-04-25 12:54:29 -04:00
scanner.go refactor: move playlist business logic from repositories to service layer (#5027) 2026-02-21 19:57:13 -05:00
scanner_benchmark_test.go feat: add artist image uploads and image-folder artwork source (#5198) 2026-03-15 22:19:55 -04:00
scanner_internal_test.go feat(bfr): Big Refactor: new scanner, lots of new fields and tags, improvements and DB schema changes (#2709) 2025-02-19 20:35:17 -05:00
scanner_multilibrary_test.go ci: run Go tests on Windows (#5380) 2026-04-19 13:16:47 -04:00
scanner_selective_test.go ci: run Go tests on Windows (#5380) 2026-04-19 13:16:47 -04:00
scanner_suite_test.go fix(tests): update goleak check condition to use GOLEAK environment variable 2026-01-18 21:11:06 -05:00
scanner_test.go ci: run Go tests on Windows (#5380) 2026-04-19 13:16:47 -04:00
walk_dir_tree.go feat(scanner): implement selective folder scanning and file system watcher improvements (#4674) 2025-11-14 22:15:43 -05:00
walk_dir_tree_test.go ci: run Go tests on Windows (#5380) 2026-04-19 13:16:47 -04:00
watcher.go fix(scanner): increase watcher channel buffers to prevent dropped filesystem events 2026-03-12 17:07:34 -04:00
watcher_test.go ci: run Go tests on Windows (#5380) 2026-04-19 13:16:47 -04:00

Navidrome Scanner: Technical Overview

This document provides a comprehensive technical explanation of Navidrome's music library scanner system.

Architecture Overview

The Navidrome scanner is built on a multi-phase pipeline architecture designed for efficient processing of music files. It systematically traverses file system directories, processes metadata, and maintains a database representation of the music library. For performance, phases that depend on each other run sequentially, while independent phases run in parallel.

flowchart TD
    subgraph "Scanner Execution Flow"
        Controller[Scanner Controller] --> Scanner[Scanner Implementation]
        
        Scanner --> Phase1[Phase 1: Folders Scan]
        Phase1 --> Phase2[Phase 2: Missing Tracks]
        
        Phase2 --> ParallelPhases
        
        subgraph ParallelPhases["Parallel Execution"]
            Phase3[Phase 3: Refresh Albums]
            Phase4[Phase 4: Playlist Import]
        end
        
        ParallelPhases --> FinalSteps[Final Steps: GC + Stats]
    end
    
    %% Triggers that can initiate a scan
    FileChanges[File System Changes] -->|Detected by| Watcher[Filesystem Watcher]
    Watcher -->|Triggers| Controller
    
    ScheduledJob[Scheduled Job] -->|Based on Scanner.Schedule| Controller
    ServerStartup[Server Startup] -->|If Scanner.ScanOnStartup=true| Controller
    ManualTrigger[Manual Scan via UI/API] -->|Admin user action| Controller
    CLICommand[Command Line: navidrome scan] -->|Direct invocation| Controller
    PIDChange[PID Configuration Change] -->|Forces full scan| Controller
    DBMigration[Database Migration] -->|May require full scan| Controller
    
    Scanner -.->|Alternative| External[External Scanner Process]

The execution flow shows that Phases 1 and 2 run sequentially, while Phases 3 and 4 execute in parallel to maximize performance before the final processing steps.

Core Components

Scanner Controller (controller.go)

This is the entry point for all scanning operations. It provides:

  • Public API for initiating scans and checking scan status
  • Event broadcasting to notify clients about scan progress
  • Serialization of scan operations (prevents concurrent scans)
  • Progress tracking and monitoring
  • Error collection and reporting
type Scanner interface {
    // ScanAll starts a full scan of the music library. This is a blocking operation.
    ScanAll(ctx context.Context, fullScan bool) (warnings []string, err error)
    Status(context.Context) (*StatusInfo, error)
}
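
As an illustration of the "serialization of scan operations" point, here is a minimal sketch of a guard that rejects a scan while another one is running. The names scanGuard and errAlreadyScanning are hypothetical and not taken from controller.go; the real controller may use a different mechanism.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errAlreadyScanning is a hypothetical sentinel error for this sketch.
var errAlreadyScanning = errors.New("scan already in progress")

// scanGuard (hypothetical) lets only one scan run at a time.
type scanGuard struct {
	mu sync.Mutex
}

// run executes scan unless another scan currently holds the lock.
func (g *scanGuard) run(scan func() error) error {
	if !g.mu.TryLock() {
		return errAlreadyScanning
	}
	defer g.mu.Unlock()
	return scan()
}

func main() {
	var g scanGuard
	err := g.run(func() error {
		// A second attempt made while the first scan is running is rejected.
		return g.run(func() error { return nil })
	})
	fmt.Println(err) // scan already in progress
}
```

Rejecting the second request, rather than queueing it, is a design choice made only for this sketch.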

Scanner Implementation (scanner.go)

The primary implementation that orchestrates the four-phase scanning pipeline. Each phase implements the generic phase interface:

type phase[T any] interface {
    producer() ppl.Producer[T]
    stages() []ppl.Stage[T]
    finalize(error) error
    description() string
}

This design enables:

  • Type-safe pipeline construction with generics
  • Modular phase implementation
  • Separation of concerns
  • Easy measurement of performance
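
To make the shape of a phase concrete, the following sketch runs a toy phase through a simplified version of the interface. It is an assumption-laden stand-in: plain functions replace ppl.Producer and ppl.Stage (whose real definitions are not reproduced here), and runPhase and folderPhase are hypothetical names.

```go
package main

import "fmt"

// phase is a simplified stand-in for the interface above: plain functions
// replace ppl.Producer and ppl.Stage.
type phase[T any] interface {
	producer() func(emit func(T))
	stages() []func(T) (T, error)
	finalize(error) error
	description() string
}

// runPhase streams every produced item through the stages in order, then
// hands the first error (or nil) to finalize.
func runPhase[T any](p phase[T]) error {
	var firstErr error
	p.producer()(func(item T) {
		if firstErr != nil {
			return // stop processing after the first failure
		}
		for _, stage := range p.stages() {
			var err error
			if item, err = stage(item); err != nil {
				firstErr = err
				return
			}
		}
	})
	return p.finalize(firstErr)
}

// folderPhase is a toy phase that records the folders it processes.
type folderPhase struct {
	paths     []string
	processed []string
}

func (p *folderPhase) producer() func(emit func(string)) {
	return func(emit func(string)) {
		for _, path := range p.paths {
			emit(path)
		}
	}
}

func (p *folderPhase) stages() []func(string) (string, error) {
	return []func(string) (string, error){
		func(path string) (string, error) {
			p.processed = append(p.processed, path)
			return path, nil
		},
	}
}

func (p *folderPhase) finalize(err error) error { return err }
func (p *folderPhase) description() string      { return "toy folder phase" }

func main() {
	p := &folderPhase{paths: []string{"rock", "jazz"}}
	err := runPhase[string](p)
	fmt.Println(p.description(), p.processed, err)
}
```

Because every phase exposes the same four methods, a single generic runner can execute and time any of them.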

External Scanner (external.go)

The External Scanner is a specialized implementation that offloads the scanning process to a separate subprocess. This is specifically designed to address memory management challenges in long-running Navidrome instances.

// scannerExternal is a scanner that runs an external process to do the scanning. It is used to avoid
// memory leaks or retention in the main process, as the scanner can consume a lot of memory. The
// external process will be spawned with the same executable as the current process, and will run
// the "scan" command with the "--subprocess" flag.
//
// The external process will send progress updates to the main process through its STDOUT, and the main
// process will forward them to the caller.
sequenceDiagram
    participant MP as Main Process
    participant ES as External Scanner
    participant SP as Subprocess (navidrome scan --subprocess)
    participant FS as File System
    participant DB as Database
    
    Note over MP: DevExternalScanner=true
    MP->>ES: ScanAll(ctx, fullScan)
    activate ES
    
    ES->>ES: Locate executable path
    ES->>SP: Start subprocess with args:<br>scan --subprocess --configfile ... etc.
    activate SP
    
    Note over ES,SP: Create pipe for communication
    
    par Subprocess executes scan
        SP->>FS: Read files & metadata
        SP->>DB: Update database
    and Main process monitors progress
        loop For each progress update
            SP->>ES: Send encoded progress info via stdout pipe
            ES->>MP: Forward progress info
        end
    end
    
    SP-->>ES: Subprocess completes (success/error)
    deactivate SP
    ES-->>MP: Return aggregated warnings/errors
    deactivate ES

Technical details:

  1. Process Isolation

    • Spawns a separate process using the same executable
    • Uses the --subprocess flag to indicate it's running as a child process
    • Preserves configuration by passing required flags (--configfile, --datafolder, etc.)
  2. Inter-Process Communication

    • Uses a pipe attached to the subprocess's stdout to carry progress updates to the main process
    • Encodes/decodes progress updates using Go's gob encoding for efficient binary transfer
    • Properly handles process termination and error propagation
  3. Memory Management Benefits

    • Scanning operations can be memory-intensive, especially with large music libraries
    • Memory leaks or excessive allocations are automatically cleaned up when the process terminates
    • Main Navidrome process remains stable even if scanner encounters memory-related issues
  4. Error Handling

    • Detects non-zero exit codes from the subprocess
    • Propagates error messages back to the main process
    • Ensures resources are properly cleaned up, even in error conditions

Scanning Process Flow

Phase 1: Folder Scan (phase_1_folders.go)

This phase handles the initial traversal and media file processing.

flowchart TD
    A[Start Phase 1] --> B{Full Scan?}
    B -- Yes --> C[Scan All Folders]
    B -- No --> D[Scan Modified Folders]
    C --> E[Read File Metadata]
    D --> E
    E --> F[Create Artists]
    E --> G[Create Albums]
    F --> H[Save to Database]
    G --> H
    H --> I[Mark Missing Folders]
    I --> J[End Phase 1]

Technical implementation details:

  1. Folder Traversal

    • Uses walkDirTree to traverse the directory structure
    • Handles symbolic links and hidden files
    • Processes .ndignore files for exclusions
    • Maps files to appropriate types (audio, image, playlist)
  2. Metadata Extraction

    • Processes files in batches (defined by filesBatchSize = 200)
    • Extracts metadata using the configured storage backend
    • Converts raw metadata to MediaFile objects
    • Collects and normalizes tag information
  3. Album and Artist Creation

    • Groups tracks by album ID
    • Creates album records from track metadata
    • Handles album ID changes by tracking previous IDs
    • Creates artist records from track participants
  4. Database Persistence

    • Uses transactions for atomic updates
    • Preserves album annotations across ID changes
    • Updates library-artist mappings
    • Marks missing tracks for later processing
    • Pre-caches artwork for performance
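
The batching step can be sketched as below. Only the constant filesBatchSize = 200 comes from the text above; the batches helper is a hypothetical illustration, not the function used in phase_1_folders.go.

```go
package main

import "fmt"

const filesBatchSize = 200 // batch size quoted above

// batches splits paths into consecutive slices of at most size elements.
func batches(paths []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for start := 0; start < len(paths); start += size {
		end := start + size
		if end > len(paths) {
			end = len(paths)
		}
		out = append(out, paths[start:end])
	}
	return out
}

func main() {
	paths := make([]string, 450)
	for i := range paths {
		paths[i] = fmt.Sprintf("track-%03d.flac", i)
	}
	for i, b := range batches(paths, filesBatchSize) {
		fmt.Printf("batch %d: %d files\n", i, len(b))
	}
}
```

With 450 files this yields batches of 200, 200, and 50, so metadata for at most 200 files is held at once.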

Phase 2: Missing Tracks Processing (phase_2_missing_tracks.go)

This phase identifies tracks that have moved or been deleted.

flowchart TD
    A[Start Phase 2] --> B[Load Libraries]
    B --> C[Get Missing and Matching Tracks]
    C --> D[Group by PID]
    D --> E{Match Type?}
    E -- Exact --> F[Update Path]
    E -- Same PID --> G[Update If Only One]
    E -- Equivalent --> H[Update If No Better Match]
    F --> I[End Phase 2]
    G --> I
    H --> I

Technical implementation details:

  1. Track Identification Strategy

    • Uses persistent identifiers (PIDs) to track tracks across scans
    • Loads missing tracks and potential matches from the database
    • Groups tracks by PID to limit comparison scope
  2. Match Analysis

    • Applies three levels of matching criteria:
      • Exact match (full metadata equivalence)
      • Single match for a PID
      • Equivalent match (same base path or similar metadata)
    • Prioritizes matches in order of confidence
  3. Database Update Strategy

    • Preserves the original track ID
    • Updates the path to the new location
    • Deletes the duplicate entry
    • Uses transactions to ensure atomicity
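
The three matching levels can be sketched as a single function applied to candidates that already share a PID. Everything here is an assumption made for illustration: the track struct, the use of a single Tags string as a stand-in for full metadata, and the reading of "same base path" as "same path apart from the extension".

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// track is a hypothetical, trimmed-down record; Tags stands in for the full metadata.
type track struct {
	ID, PID, Path, Tags string
}

// basePath drops the extension, so "a/song.mp3" and "a/song.flac" compare equal.
func basePath(p string) string {
	return strings.TrimSuffix(p, filepath.Ext(p))
}

// findMatch picks a replacement for missing among candidates that share its
// PID, trying the three criteria in decreasing order of confidence.
func findMatch(missing track, candidates []track) (track, bool) {
	for _, c := range candidates { // 1. exact: identical metadata
		if c.Tags == missing.Tags {
			return c, true
		}
	}
	if len(candidates) == 1 { // 2. the only candidate for this PID
		return candidates[0], true
	}
	for _, c := range candidates { // 3. equivalent: same path apart from the extension
		if basePath(c.Path) == basePath(missing.Path) {
			return c, true
		}
	}
	return track{}, false
}

// relocate keeps the original ID and adopts the new path; the caller would then
// delete the duplicate row for found inside the same transaction.
func relocate(missing, found track) track {
	missing.Path = found.Path
	return missing
}

func main() {
	missing := track{ID: "t1", PID: "p1", Path: "old/song.mp3", Tags: "v1"}
	candidates := []track{
		{ID: "t7", PID: "p1", Path: "new/song.mp3", Tags: "v1"},
		{ID: "t8", PID: "p1", Path: "new/other.mp3", Tags: "v2"},
	}
	if found, ok := findMatch(missing, candidates); ok {
		fmt.Printf("%+v\n", relocate(missing, found))
	}
}
```

Keeping the original ID is what lets annotations such as ratings and play counts follow the file to its new location.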

Phase 3: Album Refresh (phase_3_refresh_albums.go)

This phase updates album information based on the latest track metadata.

flowchart TD
    A[Start Phase 3] --> B[Load Touched Albums]
    B --> C[Filter Unmodified]
    C --> D{Changes Detected?}
    D -- Yes --> E[Refresh Album Data]
    D -- No --> F[Skip]
    E --> G[Update Database]
    F --> H[End Phase 3]
    G --> H
    H --> I[Refresh Statistics]

Technical implementation details:

  1. Album Selection Logic

    • Loads albums that have been "touched" in previous phases
    • Uses a producer-consumer pattern for efficient processing
    • Retrieves all media files for each album for completeness
  2. Change Detection

    • Rebuilds album metadata from associated tracks
    • Compares album attributes for changes
    • Skips albums with no media files
    • Avoids unnecessary database updates
  3. Statistics Refreshing

    • Updates album play counts
    • Updates artist play counts
    • Maintains consistency between related entities
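
The change-detection idea (rebuild the album from its tracks, compare, and write only on a difference) can be sketched as follows. The album and mediaFile structs are hypothetical and carry far fewer fields than the real models.

```go
package main

import "fmt"

// Hypothetical, trimmed-down records for this sketch.
type mediaFile struct {
	Album    string
	Duration int // seconds
}

type album struct {
	ID        string
	Name      string
	SongCount int
	Duration  int
}

// rebuild derives the album attributes from its tracks.
func rebuild(id string, files []mediaFile) album {
	a := album{ID: id}
	for _, f := range files {
		a.Name = f.Album
		a.SongCount++
		a.Duration += f.Duration
	}
	return a
}

// refresh returns the album to save and whether a database write is needed.
func refresh(stored album, files []mediaFile) (album, bool) {
	if len(files) == 0 {
		return stored, false // albums with no media files are skipped
	}
	fresh := rebuild(stored.ID, files)
	if fresh == stored {
		return stored, false // nothing changed, avoid the write
	}
	return fresh, true
}

func main() {
	stored := album{ID: "al1", Name: "Blue", SongCount: 2, Duration: 300}
	files := []mediaFile{{Album: "Blue", Duration: 100}, {Album: "Blue", Duration: 200}}
	_, changed := refresh(stored, files)
	fmt.Println(changed) // false

	files = append(files, mediaFile{Album: "Blue", Duration: 50})
	fresh, changed := refresh(stored, files)
	fmt.Println(changed, fresh.SongCount, fresh.Duration) // true 3 350
}
```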

Phase 4: Playlist Import (phase_4_playlists.go)

This phase imports and updates playlists from the file system.

flowchart TD
    A[Start Phase 4] --> B{AutoImportPlaylists?}
    B -- No --> C[Skip]
    B -- Yes --> D{Admin User Exists?}
    D -- No --> E[Log Warning & Skip]
    D -- Yes --> F[Load Folders with Playlists]
    F --> G{For Each Folder}
    G --> H[Read Directory]
    H --> I{For Each Playlist}
    I --> J[Import Playlist]
    J --> K[Pre-cache Artwork]
    K --> L[End Phase 4]
    C --> L
    E --> L

Technical implementation details:

  1. Playlist Discovery

    • Loads folders known to contain playlists
    • Focuses on folders that have been touched in previous phases
    • Handles both playlist formats (M3U, NSP)
  2. Import Process

    • Uses the core.Playlists service for import
    • Handles both regular and smart playlists
    • Updates existing playlists when changed
    • Pre-caches playlist cover art
  3. Configuration Awareness

    • Respects the AutoImportPlaylists setting
    • Requires an admin user for playlist import
    • Logs appropriate messages for configuration issues
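
A rough sketch of the discovery and configuration checks follows. The extension list is an assumption (the text only names the M3U and NSP formats), and playlistKind and importable are hypothetical helpers rather than functions from phase_4_playlists.go.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// playlistKind classifies a file by extension. The extension list is an
// assumption for this sketch.
func playlistKind(name string) string {
	switch strings.ToLower(filepath.Ext(name)) {
	case ".m3u", ".m3u8":
		return "regular"
	case ".nsp":
		return "smart"
	}
	return ""
}

// importable lists the playlist files to import, or nothing when the
// configuration checks described above fail.
func importable(files []string, autoImport, adminExists bool) []string {
	if !autoImport || !adminExists {
		return nil
	}
	var out []string
	for _, f := range files {
		if playlistKind(f) != "" {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	files := []string{"mix.m3u", "cover.jpg", "recent.nsp", "01 - intro.flac"}
	fmt.Println(importable(files, true, true))
	fmt.Println(len(importable(files, true, false))) // no admin user: nothing imported
}
```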

Final Processing Steps

After the four main phases, several finalization steps occur:

  1. Garbage Collection

    • Removes dangling tracks with no files
    • Cleans up empty albums
    • Removes orphaned artists
    • Deletes orphaned annotations
  2. Statistics Refresh

    • Updates artist song and album counts
    • Refreshes tag usage statistics
    • Updates aggregate metrics
  3. Library Status Update

    • Marks scan as completed
    • Updates last scan timestamp
    • Stores persistent ID configuration
  4. Database Optimization

    • Performs database maintenance
    • Optimizes tables and indexes
    • Reclaims space from deleted records

File System Watching

The watcher system (watcher.go) provides real-time monitoring of file system changes:

flowchart TD
    A[Start Watcher] --> B[For Each Library]
    B --> C[Start Library Watcher]
    C --> D[Monitor File Events]
    D --> E{Change Detected?}
    E -- Yes --> F[Wait for More Changes]
    F --> G{Time Elapsed?}
    G -- Yes --> H[Trigger Scan]
    G -- No --> F
    H --> I[Wait for Scan Completion]
    I --> D

Technical implementation details:

  1. Event Throttling

    • Uses a timer to batch changes
    • Prevents excessive rescanning
    • Configurable wait period
  2. Library-specific Watching

    • Each library has its own watcher goroutine
    • Translates paths to library-relative paths
    • Filters irrelevant changes
  3. Platform Adaptability

    • Uses storage-provided watcher implementation
    • Supports different notification mechanisms per platform
    • Graceful fallback when watching is not supported

Edge Cases and Optimizations

Handling Album ID Changes

The scanner carefully manages album identity across scans:

  • Tracks previous album IDs to handle ID generation changes
  • Preserves annotations when IDs change
  • Maintains creation timestamps for consistent sorting

Detecting Moved Files

Moved files are identified in three steps:

  1. Groups missing and new files by their Persistent ID
  2. Applies multiple matching strategies in priority order
  3. Updates paths rather than creating duplicate entries

Resuming Interrupted Scans

If a scan is interrupted:

  • The next scan detects this condition
  • Forces a full scan if the interrupted scan was a full scan
  • Continues from where it left off for incremental scans

Memory Efficiency

Several strategies minimize memory usage:

  • Batched file processing (200 files at a time)
  • External scanner process option
  • Database-side filtering where possible
  • Stream processing with pipelines

Concurrency Control

The scanner implements a sophisticated concurrency model to optimize performance:

  1. Phase-Level Parallelism:

    • Phases 1 and 2 run sequentially due to their dependencies
    • Phases 3 and 4 run in parallel using the chain.RunParallel() function
    • Final steps run sequentially to ensure data consistency
  2. Within-Phase Concurrency:

    • Each phase has configurable concurrency for its stages
    • For example, phase_1_folders.go processes folders concurrently: ppl.NewStage(p.processFolder, ppl.Name("process folder"), ppl.Concurrency(conf.Server.DevScannerThreads))
    • Multiple stages can exist within a phase, each with its own concurrency level
  3. Pipeline Architecture Benefits:

    • Producer-consumer pattern minimizes memory usage
    • Work is streamed through stages rather than accumulated
    • Back-pressure is automatically managed
  4. Thread Safety Mechanisms:

    • Atomic counters for statistics gathering
    • Mutex protection for shared resources
    • Transactional database operations

Configuration Options

The scanner's behavior can be customized through several configuration settings that directly affect its operation:

Core Scanner Options

| Setting | Description | Default |
|---|---|---|
| Scanner.Enabled | Whether the automatic scanner is enabled | true |
| Scanner.Schedule | Cron expression or duration for scheduled scans (e.g., "@daily") | "0" (disabled) |
| Scanner.ScanOnStartup | Whether to scan when the server starts | true |
| Scanner.WatcherWait | Delay before triggering scan after file changes detected | 5s |
| Scanner.ArtistJoiner | String used to join multiple artists in track metadata | " • " |

Playlist Processing

| Setting | Description | Default |
|---|---|---|
| PlaylistsPath | Path(s) to search for playlists (supports glob patterns) | "" |
| AutoImportPlaylists | Whether to import playlists during scanning | true |

Performance Options

| Setting | Description | Default |
|---|---|---|
| DevExternalScanner | Use external process for scanning (reduces memory issues) | true |
| DevScannerThreads | Number of concurrent processing threads during scanning | 5 |

Persistent ID Options

| Setting | Description | Default |
|---|---|---|
| PID.Track | Format for track persistent IDs (critical for tracking moved files) | "musicbrainz_trackid\|albumid,discnumber,tracknumber,title" |
| PID.Album | Format for album persistent IDs (affects album grouping) | "musicbrainz_albumid\|albumartistid,album,albumversion,releasedate" |

These options can be set in the Navidrome configuration file (e.g., navidrome.toml) or via environment variables with the ND_ prefix (e.g., ND_SCANNER_ENABLED=false). For environment variables, dots in option names are replaced with underscores.
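
The naming rule for environment variables can be sketched as a one-line mapping. envName is a hypothetical helper, and the upper-casing is inferred from the ND_SCANNER_ENABLED example rather than stated explicitly.

```go
package main

import (
	"fmt"
	"strings"
)

// envName maps an option name to its environment variable: ND_ prefix, dots
// replaced with underscores, upper-cased (inferred from the example above).
func envName(option string) string {
	return "ND_" + strings.ToUpper(strings.ReplaceAll(option, ".", "_"))
}

func main() {
	for _, opt := range []string{"Scanner.Enabled", "Scanner.WatcherWait", "DevScannerThreads"} {
		fmt.Println(opt, "=>", envName(opt))
	}
}
```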

Conclusion

The Navidrome scanner represents a sophisticated system for efficiently managing music libraries. Its phase-based pipeline architecture, careful handling of edge cases, and performance optimizations allow it to handle libraries of significant size while maintaining data integrity and providing a responsive user experience.