navidrome/persistence/sql_search_fts_test.go
Deluan Quintão 54de0dbc52
Some checks are pending
Pipeline: Test, Lint, Build / Lint i18n files (push) Waiting to run
Pipeline: Test, Lint, Build / Check Docker configuration (push) Waiting to run
Pipeline: Test, Lint, Build / Test JS code (push) Waiting to run
Pipeline: Test, Lint, Build / Get version info (push) Waiting to run
Pipeline: Test, Lint, Build / Lint Go code (push) Waiting to run
Pipeline: Test, Lint, Build / Test Go code (push) Waiting to run
Pipeline: Test, Lint, Build / Build (push) Blocked by required conditions
Pipeline: Test, Lint, Build / Build-1 (push) Blocked by required conditions
Pipeline: Test, Lint, Build / Build-2 (push) Blocked by required conditions
Pipeline: Test, Lint, Build / Build-3 (push) Blocked by required conditions
Pipeline: Test, Lint, Build / Build-4 (push) Blocked by required conditions
Pipeline: Test, Lint, Build / Build-5 (push) Blocked by required conditions
Pipeline: Test, Lint, Build / Build-6 (push) Blocked by required conditions
Pipeline: Test, Lint, Build / Build-7 (push) Blocked by required conditions
Pipeline: Test, Lint, Build / Build-8 (push) Blocked by required conditions
Pipeline: Test, Lint, Build / Build-9 (push) Blocked by required conditions
Pipeline: Test, Lint, Build / Build-10 (push) Blocked by required conditions
Pipeline: Test, Lint, Build / Push to GHCR (push) Blocked by required conditions
Pipeline: Test, Lint, Build / Push to Docker Hub (push) Blocked by required conditions
Pipeline: Test, Lint, Build / Cleanup digest artifacts (push) Blocked by required conditions
Pipeline: Test, Lint, Build / Build Windows installers (push) Blocked by required conditions
Pipeline: Test, Lint, Build / Package/Release (push) Blocked by required conditions
Pipeline: Test, Lint, Build / Upload Linux PKG (push) Blocked by required conditions
feat(server): implement FTS5-based full-text search (#5079)
* build: add sqlite_fts5 build tag to enable FTS5 support

* feat: add SearchBackend config option (default: fts)

* feat: add buildFTS5Query for safe FTS5 query preprocessing

* feat: add FTS5 search backend with config toggle, refactor legacy search

- Add searchExprFunc type and getSearchExpr() for backend selection
- Rename fullTextExpr to legacySearchExpr
- Add ftsSearchExpr using FTS5 MATCH subquery
- Update fullTextFilter in sql_restful.go to use configured backend

* feat: add FTS5 migration with virtual tables, triggers, and search_participants

Creates FTS5 virtual tables for media_file, album, and artist with
unicode61 tokenizer and diacritic folding. Adds search_participants
column, populates from JSON, and sets up INSERT/UPDATE/DELETE triggers.

* feat: populate search_participants in PostMapArgs for FTS5 indexing

* test: add FTS5 search integration tests

* fix: exclude FTS5 virtual tables from e2e DB restore

The restoreDB function iterates all tables in sqlite_master and
runs DELETE + INSERT to reset state. FTS5 contentless virtual tables
cannot be directly deleted from. Since triggers handle FTS5 sync
automatically, simply skip tables matching *_fts and *_fts_* patterns.

* build: add compile-time guard for sqlite_fts5 build tag

Same pattern as netgo: compilation fails with a clear error if
the sqlite_fts5 build tag is missing.

* build: add sqlite_fts5 tag to reflex dev server config

* build: extract GO_BUILD_TAGS variable in Makefile to avoid duplication

* fix: strip leading * from FTS5 queries to prevent "unknown special query" error

* feat: auto-append prefix wildcard to FTS5 search tokens for broader matching

Every plain search token now gets a trailing * appended (e.g., "love" becomes
"love*"), so searching for "love" also matches "lovelace", "lovely", etc.
Quoted phrases are preserved as exact matches without wildcards. Results are
ordered alphabetically by name/title, so shorter exact matches naturally
appear first.

* fix: clarify comments about FTS5 operator neutralization

The comments said "strip" but the code lowercases operators to
neutralize them (FTS5 operators are case-sensitive). Updated comments
to accurately describe the behavior.

* fix: use fmt.Sprintf for FTS5 phrase placeholders

The previous encoding used rune('0'+index) which silently breaks with
10+ quoted phrases. Use fmt.Sprintf for arbitrary index support.

* fix: validate and normalize SearchBackend config option

Normalize the value to lowercase and fall back to "fts" with a log
warning for unrecognized values. This prevents silent misconfiguration
from typos like "FTS", "Legacy", or "fts5".

* refactor: improve documentation for build tags and FTS5 requirements

Signed-off-by: Deluan <deluan@navidrome.org>

* refactor: convert FTS5 query and search backend normalization tests to DescribeTable format

Signed-off-by: Deluan <deluan@navidrome.org>

* fix: add sqlite_fts5 build tag to golangci configuration

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: add UISearchDebounceMs configuration option and update related components

Signed-off-by: Deluan <deluan@navidrome.org>

* fix: fall back to legacy search when SearchFullString is enabled

FTS5 is token-based and cannot match substrings within words, so
getSearchExpr now returns legacySearchExpr when SearchFullString
is true, regardless of SearchBackend setting.

* fix: add sqlite_fts5 build tag to CI pipeline and Dockerfile

* fix: add WHEN clauses to FTS5 AFTER UPDATE triggers

Added WHEN clauses to the media_file_fts_au, album_fts_au, and
artist_fts_au triggers so they only fire when FTS-indexed columns
actually change. Previously, every row update (e.g., play count, rating,
starred status) triggered an unnecessary delete+insert cycle in the FTS
shadow tables. The WHEN clauses use IS NOT for NULL-safe comparison of
each indexed column, avoiding FTS index churn for non-indexed updates.

* feat: add SearchBackend configuration option to data and insights components

Signed-off-by: Deluan <deluan@navidrome.org>

* fix: enhance input sanitization for FTS5 by stripping additional punctuation and special characters

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: add search_normalized column for punctuated name search (R.E.M., AC/DC)

Add index-time normalization and query-time single-letter collapsing to
fix FTS5 search for punctuated names. A new search_normalized column
stores concatenated forms of punctuated words (e.g., "R.E.M." → "REM",
"AC/DC" → "ACDC") and is indexed in FTS5 tables. At query time, runs of
consecutive single letters (from dot-stripping) are collapsed into OR
expressions like ("R E M" OR REM*) to match both the original tokens and
the normalized form. This enables searching by "R.E.M.", "REM", "AC/DC",
"ACDC", "A-ha", or "Aha" and finding the correct results.

* refactor: simplify isSingleUnicodeLetter to avoid []rune allocation

Use utf8.DecodeRuneInString to check for a single Unicode letter
instead of converting the entire string to a []rune slice.

* feat: define ftsSearchColumns for flexible FTS5 search column inclusion

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: update collapseSingleLetterRuns to return quoted phrases for abbreviations

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: implement extractPunctuatedWords to handle artist/album names with embedded punctuation

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: implement extractPunctuatedWords to handle artist/album names with embedded punctuation

Signed-off-by: Deluan <deluan@navidrome.org>

* refactor: punctuated word handling to improve processing of artist/album names

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: add CJK support for search queries with LIKE filters

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: enhance FTS5 search by adding album version support and CJK handling

Signed-off-by: Deluan <deluan@navidrome.org>

* refactor: search configuration to use structured options

Signed-off-by: Deluan <deluan@navidrome.org>

* feat: enhance search functionality to support punctuation-only queries and update related tests

Signed-off-by: Deluan <deluan@navidrome.org>

---------

Signed-off-by: Deluan <deluan@navidrome.org>
2026-02-21 17:52:42 -05:00

333 lines
12 KiB
Go

package persistence
import (
"context"
"github.com/navidrome/navidrome/conf"
"github.com/navidrome/navidrome/conf/configtest"
"github.com/navidrome/navidrome/log"
"github.com/navidrome/navidrome/model"
"github.com/navidrome/navidrome/model/request"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var _ = DescribeTable("buildFTS5Query",
func(input, expected string) {
Expect(buildFTS5Query(input)).To(Equal(expected))
},
Entry("returns empty string for empty input", "", ""),
Entry("returns empty string for whitespace-only input", " ", ""),
Entry("appends * to a single word for prefix matching", "beatles", "beatles*"),
Entry("appends * to each word for prefix matching", "abbey road", "abbey* road*"),
Entry("preserves quoted phrases without appending *", `"the beatles"`, `"the beatles"`),
Entry("does not double-append * to existing prefix wildcard", "beat*", "beat*"),
Entry("strips FTS5 operators and appends * to lowercased words", "AND OR NOT NEAR", "and* or* not* near*"),
Entry("strips special FTS5 syntax characters and appends *", "test^col:val", "test* col* val*"),
Entry("handles mixed phrases and words", `"the beatles" abbey`, `"the beatles" abbey*`),
Entry("handles prefix with multiple words", "beat* abbey", "beat* abbey*"),
Entry("collapses multiple spaces", "abbey road", "abbey* road*"),
Entry("strips leading * from tokens and appends trailing *", "*livia", "livia*"),
Entry("strips leading * and preserves existing trailing *", "*livia oliv*", "livia* oliv*"),
Entry("strips standalone *", "*", ""),
Entry("strips apostrophe from input", "Guns N' Roses", "Guns* N* Roses*"),
Entry("converts slashed word to phrase+concat OR", "AC/DC", `("AC DC" OR ACDC*)`),
Entry("converts hyphenated word to phrase+concat OR", "a-ha", `("a ha" OR aha*)`),
Entry("converts partial hyphenated word to phrase+concat OR", "a-h", `("a h" OR ah*)`),
Entry("converts hyphenated name to phrase+concat OR", "Jay-Z", `("Jay Z" OR JayZ*)`),
Entry("converts contraction to phrase+concat OR", "it's", `("it s" OR its*)`),
Entry("handles punctuated word mixed with plain words", "best of a-ha", `best* of* ("a ha" OR aha*)`),
Entry("strips miscellaneous punctuation", "rock & roll, vol. 2", "rock* roll* vol* 2*"),
Entry("preserves unicode characters with diacritics", "Björk début", "Björk* début*"),
Entry("collapses dotted abbreviation into phrase", "R.E.M.", `"R E M"`),
Entry("collapses abbreviation without trailing dot", "R.E.M", `"R E M"`),
Entry("collapses abbreviation mixed with words", "best of R.E.M.", `best* of* "R E M"`),
Entry("collapses two-letter abbreviation", "U.K.", `"U K"`),
Entry("does not collapse single letter surrounded by words", "I am fine", "I* am* fine*"),
Entry("does not collapse single standalone letter", "A test", "A* test*"),
Entry("preserves quoted phrase with punctuation verbatim", `"ac/dc"`, `"ac/dc"`),
Entry("preserves quoted abbreviation verbatim", `"R.E.M."`, `"R.E.M."`),
Entry("returns empty string for punctuation-only input", "!!!!!!!", ""),
Entry("returns empty string for mixed punctuation", "!@#$%^&", ""),
)
var _ = DescribeTable("normalizeForFTS",
func(expected string, values ...string) {
Expect(normalizeForFTS(values...)).To(Equal(expected))
},
Entry("strips dots and concatenates", "REM", "R.E.M."),
Entry("strips slash", "ACDC", "AC/DC"),
Entry("strips hyphen", "Aha", "A-ha"),
Entry("skips unchanged words", "", "The Beatles"),
Entry("handles mixed input", "REM", "R.E.M.", "Automatic for the People"),
Entry("deduplicates", "REM", "R.E.M.", "R.E.M."),
Entry("strips apostrophe from word", "N", "Guns N' Roses"),
Entry("handles multiple values with punctuation", "REM ACDC", "R.E.M.", "AC/DC"),
)
var _ = DescribeTable("containsCJK",
func(input string, expected bool) {
Expect(containsCJK(input)).To(Equal(expected))
},
Entry("returns false for empty string", "", false),
Entry("returns false for ASCII text", "hello world", false),
Entry("returns false for Latin with diacritics", "Björk début", false),
Entry("detects Chinese characters (Han)", "周杰伦", true),
Entry("detects Japanese Hiragana", "こんにちは", true),
Entry("detects Japanese Katakana", "カタカナ", true),
Entry("detects Korean Hangul", "한국어", true),
Entry("detects CJK mixed with Latin", "best of 周杰伦", true),
Entry("detects single CJK character", "a曲b", true),
)
var _ = Describe("likeSearchExpr", func() {
It("returns nil for empty query", func() {
Expect(likeSearchExpr("media_file", "")).To(BeNil())
})
It("returns nil for whitespace-only query", func() {
Expect(likeSearchExpr("media_file", " ")).To(BeNil())
})
It("generates LIKE filters against core columns for single CJK word", func() {
expr := likeSearchExpr("media_file", "周杰伦")
sql, args, err := expr.ToSql()
Expect(err).ToNot(HaveOccurred())
// Should have OR between columns for the single word
Expect(sql).To(ContainSubstring("OR"))
Expect(sql).To(ContainSubstring("media_file.title LIKE"))
Expect(sql).To(ContainSubstring("media_file.album LIKE"))
Expect(sql).To(ContainSubstring("media_file.artist LIKE"))
Expect(sql).To(ContainSubstring("media_file.album_artist LIKE"))
Expect(args).To(HaveLen(4))
for _, arg := range args {
Expect(arg).To(Equal("%周杰伦%"))
}
})
It("generates AND of OR groups for multi-word query", func() {
expr := likeSearchExpr("media_file", "周杰伦 greatest")
sql, args, err := expr.ToSql()
Expect(err).ToNot(HaveOccurred())
// Two groups AND'd together, each with 4 columns OR'd
Expect(sql).To(ContainSubstring("AND"))
Expect(args).To(HaveLen(8))
})
It("uses correct columns for album table", func() {
expr := likeSearchExpr("album", "周杰伦")
sql, args, err := expr.ToSql()
Expect(err).ToNot(HaveOccurred())
Expect(sql).To(ContainSubstring("album.name LIKE"))
Expect(sql).To(ContainSubstring("album.album_artist LIKE"))
Expect(args).To(HaveLen(2))
})
It("uses correct columns for artist table", func() {
expr := likeSearchExpr("artist", "周杰伦")
sql, args, err := expr.ToSql()
Expect(err).ToNot(HaveOccurred())
Expect(sql).To(ContainSubstring("artist.name LIKE"))
Expect(args).To(HaveLen(1))
})
It("returns nil for unknown table", func() {
Expect(likeSearchExpr("unknown_table", "周杰伦")).To(BeNil())
})
})
var _ = Describe("ftsSearchExpr", func() {
It("returns nil for empty query", func() {
Expect(ftsSearchExpr("media_file", "")).To(BeNil())
})
It("generates rowid IN subquery with MATCH and column filter", func() {
expr := ftsSearchExpr("media_file", "beatles")
sql, args, err := expr.ToSql()
Expect(err).ToNot(HaveOccurred())
Expect(sql).To(ContainSubstring("media_file.rowid IN"))
Expect(sql).To(ContainSubstring("media_file_fts"))
Expect(sql).To(ContainSubstring("MATCH"))
Expect(args).To(HaveLen(1))
Expect(args[0]).To(HavePrefix("{title album artist album_artist"))
Expect(args[0]).To(ContainSubstring("beatles*"))
})
It("generates correct FTS table name per entity", func() {
for _, table := range []string{"media_file", "album", "artist"} {
expr := ftsSearchExpr(table, "test")
sql, _, err := expr.ToSql()
Expect(err).ToNot(HaveOccurred())
Expect(sql).To(ContainSubstring(table + ".rowid IN"))
Expect(sql).To(ContainSubstring(table + "_fts"))
}
})
It("wraps query with column filter for known tables", func() {
expr := ftsSearchExpr("artist", "Beatles")
_, args, err := expr.ToSql()
Expect(err).ToNot(HaveOccurred())
Expect(args[0]).To(Equal("{name sort_artist_name search_normalized} : (Beatles*)"))
})
It("passes query without column filter for unknown tables", func() {
expr := ftsSearchExpr("unknown_table", "test")
_, args, err := expr.ToSql()
Expect(err).ToNot(HaveOccurred())
Expect(args[0]).To(Equal("test*"))
})
It("preserves phrase queries inside column filter", func() {
expr := ftsSearchExpr("media_file", `"the beatles"`)
_, args, err := expr.ToSql()
Expect(err).ToNot(HaveOccurred())
Expect(args[0]).To(ContainSubstring(`"the beatles"`))
})
It("preserves prefix queries inside column filter", func() {
expr := ftsSearchExpr("media_file", "beat*")
_, args, err := expr.ToSql()
Expect(err).ToNot(HaveOccurred())
Expect(args[0]).To(ContainSubstring("beat*"))
})
It("falls back to LIKE search for punctuation-only query", func() {
expr := ftsSearchExpr("media_file", "!!!!!!!")
Expect(expr).ToNot(BeNil())
sql, args, err := expr.ToSql()
Expect(err).ToNot(HaveOccurred())
Expect(sql).To(ContainSubstring("LIKE"))
Expect(args).To(ContainElement("%!!!!!!!%"))
})
It("returns nil for empty string even with LIKE fallback", func() {
Expect(ftsSearchExpr("media_file", "")).To(BeNil())
Expect(ftsSearchExpr("media_file", " ")).To(BeNil())
})
})
var _ = Describe("FTS5 Integration Search", func() {
var (
mr model.MediaFileRepository
alr model.AlbumRepository
arr model.ArtistRepository
)
BeforeEach(func() {
ctx := log.NewContext(context.TODO())
ctx = request.WithUser(ctx, adminUser)
conn := GetDBXBuilder()
mr = NewMediaFileRepository(ctx, conn)
alr = NewAlbumRepository(ctx, conn)
arr = NewArtistRepository(ctx, conn)
})
Describe("MediaFile search", func() {
It("finds media files by title", func() {
results, err := mr.Search("Radioactivity", 0, 10)
Expect(err).ToNot(HaveOccurred())
Expect(results).To(HaveLen(1))
Expect(results[0].Title).To(Equal("Radioactivity"))
Expect(results[0].ID).To(Equal(songRadioactivity.ID))
})
It("finds media files by artist name", func() {
results, err := mr.Search("Beatles", 0, 10)
Expect(err).ToNot(HaveOccurred())
Expect(results).To(HaveLen(3))
for _, r := range results {
Expect(r.Artist).To(Equal("The Beatles"))
}
})
})
Describe("Album search", func() {
It("finds albums by name", func() {
results, err := alr.Search("Sgt Peppers", 0, 10)
Expect(err).ToNot(HaveOccurred())
Expect(results).To(HaveLen(1))
Expect(results[0].Name).To(Equal("Sgt Peppers"))
Expect(results[0].ID).To(Equal(albumSgtPeppers.ID))
})
It("finds albums with multi-word search", func() {
results, err := alr.Search("Abbey Road", 0, 10)
Expect(err).ToNot(HaveOccurred())
Expect(results).To(HaveLen(2))
})
})
Describe("Artist search", func() {
It("finds artists by name", func() {
results, err := arr.Search("Kraftwerk", 0, 10)
Expect(err).ToNot(HaveOccurred())
Expect(results).To(HaveLen(1))
Expect(results[0].Name).To(Equal("Kraftwerk"))
Expect(results[0].ID).To(Equal(artistKraftwerk.ID))
})
})
Describe("CJK search", func() {
It("finds media files by CJK title", func() {
results, err := mr.Search("プラチナ", 0, 10)
Expect(err).ToNot(HaveOccurred())
Expect(results).To(HaveLen(1))
Expect(results[0].Title).To(Equal("プラチナ・ジェット"))
Expect(results[0].ID).To(Equal(songCJK.ID))
})
It("finds media files by CJK artist name", func() {
results, err := mr.Search("シートベルツ", 0, 10)
Expect(err).ToNot(HaveOccurred())
Expect(results).To(HaveLen(1))
Expect(results[0].Artist).To(Equal("シートベルツ"))
})
It("finds albums by CJK artist name", func() {
results, err := alr.Search("シートベルツ", 0, 10)
Expect(err).ToNot(HaveOccurred())
Expect(results).To(HaveLen(1))
Expect(results[0].Name).To(Equal("COWBOY BEBOP"))
Expect(results[0].ID).To(Equal(albumCJK.ID))
})
It("finds artists by CJK name", func() {
results, err := arr.Search("シートベルツ", 0, 10)
Expect(err).ToNot(HaveOccurred())
Expect(results).To(HaveLen(1))
Expect(results[0].Name).To(Equal("シートベルツ"))
Expect(results[0].ID).To(Equal(artistCJK.ID))
})
})
Describe("Album version search", func() {
It("finds albums by version tag via FTS", func() {
results, err := alr.Search("Deluxe", 0, 10)
Expect(err).ToNot(HaveOccurred())
Expect(results).To(HaveLen(1))
Expect(results[0].ID).To(Equal(albumWithVersion.ID))
})
})
Describe("Punctuation-only search", func() {
It("finds media files with punctuation-only title", func() {
results, err := mr.Search("!!!!!!!", 0, 10)
Expect(err).ToNot(HaveOccurred())
Expect(results).To(HaveLen(1))
Expect(results[0].Title).To(Equal("!!!!!!!"))
Expect(results[0].ID).To(Equal(songPunctuation.ID))
})
})
Describe("Legacy backend fallback", func() {
It("returns results using legacy LIKE-based search when configured", func() {
DeferCleanup(configtest.SetupConfig())
conf.Server.Search.Backend = "legacy"
results, err := mr.Search("Radioactivity", 0, 10)
Expect(err).ToNot(HaveOccurred())
Expect(results).To(HaveLen(1))
Expect(results[0].Title).To(Equal("Radioactivity"))
})
})
})