
Pattern — Query Parsing Split (Multi-Intent / Wrong Sub-Intent First)

Scope
A single user query actually contains multiple intents (lookup + policy + transformation + generation), but the pipeline treats it as one retrieval/generation ask. The system answers the easiest/earliest sub-intent and ignores the rest, or mixes intents, causing off-topic retrieval and wrong acceptance decisions.

Why it matters
Multi-intent queries are common (“compare A vs B and give a summary with citations”). If you don't split, retrieval pools and prompts blur constraints, you get false grounding, and audit trails become meaningless (“which intent did this citation serve?”).

Quick nav: Patterns Index · Examples: Example 01 · Example 03 · Eval: Precision & CHR


1) Signals & Fast Triage

Likely symptoms

  • The answer handles only part of the question (e.g., explains A but not B or the comparison).
  • Retrieved chunks mix unrelated facets (policy + tutorial + changelog) → noisy context, low CHR.
  • Auditor (Example 04) flips between VALID and NOT_IN_CONTEXT depending on which fragment the model latched onto.
  • Example 02 labels skew to query_parse_error.

Deterministic checks (no LLM)

  • Separator heuristics: the query contains “and”, “vs”, “;”, “,”, numbered lists (“1)”, “2)”), or colon-scoped asks (“X: do Y, then Z”).
  • Verb phrase count: ≥2 finite verbs across different objects (compare, explain, implement, deploy).
  • Constraint tokens: presence of at least one data intent (find, lookup, cite) and one action intent (summarize, generate, rewrite).
    If ≥2 signals fire → treat as multi-intent and split (see the sketch below).
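
A minimal sketch of these checks, assuming the keyword lists shown (they are illustrative, not a fixed vocabulary; the verb-phrase count is approximated with a verb keyword list rather than real POS tagging):

# tools/signals.py -- count the three deterministic multi-intent signals
import re

SEPARATORS    = re.compile(r"\band\b|\bvs\b|;|,|\b\d\)|:", re.IGNORECASE)
VERBS         = re.compile(r"\b(?:compare|explain|implement|deploy|draft|summarize|generate|rewrite)\b", re.IGNORECASE)
DATA_INTENT   = re.compile(r"\b(?:find|lookup|cite)\b", re.IGNORECASE)
ACTION_INTENT = re.compile(r"\b(?:summarize|generate|rewrite|draft)\b", re.IGNORECASE)

def multi_intent_signals(q: str) -> int:
    signals = 0
    if SEPARATORS.search(q): signals += 1                                # separator heuristic
    if len(VERBS.findall(q)) >= 2: signals += 1                          # verb-phrase proxy
    if DATA_INTENT.search(q) and ACTION_INTENT.search(q): signals += 1   # data + action mix
    return signals

def should_split(q: str) -> bool:
    return multi_intent_signals(q) >= 2

For the sample query in Section 2 this returns 2 (separators plus two verb phrases), which crosses the split threshold.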

2) Minimal Reproducible Case

data/chunks.json:

[
  {"id":"pA#1","text":"Policy A: Only domain example.com is allowed."},
  {"id":"pB#1","text":"Policy B: Allow *.company.com and partner domains."},
  {"id":"pC#1","text":"How to edit email settings in the dashboard."}
]

User query: “Compare Policy A vs B with citations, then draft an email asking IT to switch our domain.”

Naive pipelines either:

  1. Summarize A or B only, or
  2. Draft the email without grounded comparison.
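
A correct split for this query yields a two-step plan, roughly like the following (ids illustrative):

[
  {"id": "a1b2c3d4", "role": "COMPARE", "text": "Compare Policy A vs B with citations,"},
  {"id": "e5f6a7b8", "role": "DRAFT",   "text": "draft an email asking IT to switch our domain."}
]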

3) Root Causes

  • Single-turn monolith: retrieval runs once on the whole sentence; constraints collide.
  • No intent schema: the pipeline can't represent “first compare (grounded), then draft (ungrounded).”
  • Prompt overloading: one template tries to do comparison + generation + policy proof.
  • Acceptance gate blind: Auditor validates a claim that mixes two intents.

4) Standard Fix (Ordered, Minimal, Measurable)

Step 1 — Detect & Split

  • Run deterministic heuristics (Section 1) to produce sub-intents with roles: COMPARE, LOOKUP, DRAFT, REWRITE, etc.
  • Each sub-intent gets its own retrieval pool and acceptance rule.

Step 2 — Bind Contracts per Sub-Intent

  • Evidence-only template for grounded intents (COMPARE, LOOKUP) → requires citations: [id,...].
  • Free-form template for creative intents (DRAFT) → must echo the grounded summary id (handoff contract) but does not add new citations.
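
Concretely, the two output contracts might look like this (values illustrative; field names match the reference implementation in Section 5):

Grounded (COMPARE, LOOKUP):
{"claim": "Policy A allows only example.com; Policy B also allows *.company.com and partner domains.", "citations": ["pA#1", "pB#1"]}

Creative (DRAFT):
{"email": "Hi IT, based on the grounded comparison ...", "handoff_id": "a1b2c3d4"}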

Step 3 — Sequence with Handoffs

  • Output of grounded step → summary.claim, citations.
  • Draft step may rephrase but cannot introduce new factual claims; it references the handoff id.
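
The handoff record passed between the two steps can be as small as this (shape mirrors the handoff variable in Section 5; id illustrative):

{"handoff_id": "a1b2c3d4", "summary": {"claim": "...", "citations": ["pA#1", "pB#1"]}}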

Step 4 — Accept or Refuse

  • Accept only if grounded step is VALID (Example 04) and draft step references the correct handoff id.
  • If grounded step is NOT_IN_CONTEXT, overall request returns refusal with explanation.
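
A minimal gate sketch over the per-turn results produced by run() in Section 5; the citation check here is a stand-in for the Auditor's VALID verdict (Example 04), which would replace it in a real pipeline:

def accept(turns):
    # ship only if every grounded turn carries citations and every draft
    # echoes the handoff id of some grounded turn; otherwise refuse whole
    grounded_ids = set()
    for t in turns:
        role = t["intent"]["role"]
        if role in ("LOOKUP", "COMPARE"):
            if not t["out"].get("citations"):
                return {"ship": False, "reason": "grounded_without_citations"}
            grounded_ids.add(t["handoff_id"])
        elif role == "DRAFT":
            if t["out"].get("handoff_id") not in grounded_ids:
                return {"ship": False, "reason": "handoff_mismatch"}
    return {"ship": True}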

Step 5 — Evaluate

  • Example 08: score per-intent precision and CHR; drafts are graded on schema compliance, not truth.
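
A sketch of per-intent scoring, assuming a hand-labeled gold set; the gold shape and metric names here are assumptions, not Example 08's exact format. CHR is computed as overlap between emitted and gold citations; drafts are only checked for schema compliance:

def score_per_intent(turns, gold):
    # gold: {"COMPARE": {"citations": ["pA#1", "pB#1"]}, ...}  (assumed shape)
    scores = {}
    for t in turns:
        role = t["intent"]["role"]
        if role in ("LOOKUP", "COMPARE"):
            got  = set(t["out"].get("citations", []))
            want = set(gold.get(role, {}).get("citations", []))
            scores[role] = {"chr": len(got & want) / max(len(want), 1)}
        elif role == "DRAFT":
            ok = isinstance(t["out"].get("email"), str) and bool(t["out"].get("handoff_id"))
            scores[role] = {"schema_ok": ok}
    return scores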

5) Reference Implementation — Python (stdlib only)

Create tools/intent_split.py.

# tools/intent_split.py -- rule-based multi-intent splitter + per-intent contracts
import re, json, os, time, urllib.request, uuid

GROUND_REFUSAL = "not in context"

def split_intents(q: str):
    text = q.strip()
    # crude separators
    parts = re.split(r"\bthen\b|;| and then | && | -> ", text, flags=re.IGNORECASE)
    intents = []
    for p in parts:
        role = "LOOKUP"
        if re.search(r"\bcompare|vs\b", p, re.IGNORECASE): role = "COMPARE"
        if re.search(r"\bdraft|email|write|generate|compose\b", p, re.IGNORECASE): role = "DRAFT"
        intents.append({"id": str(uuid.uuid4())[:8], "role": role, "text": p.strip()})
    # if single fragment but has both compare + draft keywords, split into two logical intents
    if len(intents)==1 and re.search(r"\b(?:compare|vs)\b", text, re.IGNORECASE) and re.search(r"\b(?:draft|email|write|generate)\b", text, re.IGNORECASE):
        intents = [
            {"id": str(uuid.uuid4())[:8], "role":"COMPARE", "text": text},
            {"id": str(uuid.uuid4())[:8], "role":"DRAFT",   "text": text}
        ]
    return intents

def retrieve(chunks, q, k=6):
    qs = set(w for w in re.split(r"\W+", q.lower()) if len(w)>=3)
    scored = []
    for c in chunks:
        toks = re.split(r"\W+", c["text"].lower())
        overlap = sum(1 for t in toks if t in qs)
        scored.append((overlap, c))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [c for s,c in scored[:k]]

def build_compare_prompt(q, ctx, allowed):
    ctxs = "\n\n".join(f"[{c['id']}] {c['text']}" for c in ctx)
    return (
        "Task: Compare the two policies strictly from EVIDENCE.\n"
        "Output JSON ONLY: { claim: string, citations: [id,...] }\n"
        f"Citations must be a subset of: {allowed}\n"
        f"If not provable, reply exactly '{GROUND_REFUSAL}'.\n\n"
        f"Question: {q}\nEVIDENCE:\n{ctxs}\n"
    )

def build_draft_prompt(summary_json):
    return (
        "Task: Draft a short email referencing the grounded comparison.\n"
        "You MUST echo {handoff_id} exactly and MUST NOT add new policy facts.\n"
        "Output JSON ONLY: { email: string, handoff_id: string }\n\n"
        f"Grounded summary:\n{json.dumps(summary_json)}\n"
    )

def call_openai(prompt, model=os.getenv("OPENAI_MODEL","gpt-4o-mini")):
    key=os.getenv("OPENAI_API_KEY"); assert key, "OPENAI_API_KEY"
    body = json.dumps({"model":model,"messages":[{"role":"user","content":prompt}],"temperature":0}).encode()
    req  = urllib.request.Request("https://api.openai.com/v1/chat/completions", data=body, headers={"Content-Type":"application/json","Authorization":f"Bearer {key}"})
    with urllib.request.urlopen(req) as r:
        j=json.loads(r.read().decode()); return j["choices"][0]["message"]["content"].strip()

def parse_json(text):
    s=text.find("{"); e=text.rfind("}")
    if s<0 or e<=s: return None
    try: return json.loads(text[s:e+1])
    except Exception: return None

def run(q, chunks):
    turns=[]
    intents = split_intents(q)
    handoff=None
    for it in intents:
        if it["role"] in ("LOOKUP","COMPARE"):
            ctx = retrieve(chunks, it["text"], k=6)
            allowed = [c["id"] for c in ctx]
            out = parse_json(call_openai(build_compare_prompt(it["text"], ctx, allowed)))
            if not out or (isinstance(out, dict) and out.get("claim","").strip().lower()==GROUND_REFUSAL):
                return {"status":"REFUSAL", "reason":"grounding_failed"}
            # schema & scope checks
            if not set(out.get("citations",[])).issubset(set(allowed)):
                return {"status":"REJECT", "reason":"citation_out_of_scope"}
            handoff = {"handoff_id": str(uuid.uuid4())[:8], "summary": out}
            turns.append({"intent": it, "ctx_ids": allowed, "out": out, "handoff_id": handoff["handoff_id"]})
        elif it["role"]=="DRAFT":
            if not handoff: return {"status":"REJECT", "reason":"draft_without_grounding"}
            draft = parse_json(call_openai(build_draft_prompt({"handoff_id": handoff["handoff_id"], **handoff["summary"]})))
            if not draft or draft.get("handoff_id") != handoff["handoff_id"]:
                return {"status":"REJECT", "reason":"handoff_mismatch"}
            turns.append({"intent": it, "out": draft, "handoff_id": handoff["handoff_id"]})
    return {"status":"OK", "turns": turns}

if __name__=="__main__":
    chunks = json.load(open("data/chunks.json",encoding="utf8"))
    print(json.dumps(run("Compare Policy A vs B with citations, then draft an email asking IT to switch our domain.", chunks), indent=2))

Pass criteria

  • For the sample query, the first turn is a grounded COMPARE with citations to pA#1/pB#1.
  • The DRAFT turn echoes the handoff_id and contains no new policy facts.
  • If comparison is not in context, overall REFUSAL (no email is drafted).
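
With OPENAI_API_KEY set, a passing run prints a result shaped like the following (claim text, ids, and context ordering will vary):

python tools/intent_split.py
{
  "status": "OK",
  "turns": [
    {"intent": {"role": "COMPARE", ...}, "ctx_ids": ["pA#1", "pB#1", "pC#1"],
     "out": {"claim": "...", "citations": ["pA#1", "pB#1"]}, "handoff_id": "..."},
    {"intent": {"role": "DRAFT", ...}, "out": {"email": "...", "handoff_id": "..."}, "handoff_id": "..."}
  ]
}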

6) Node Quick Variant (split only, no LLM call)

Create tools/intent_split.mjs.

// tools/intent_split.mjs -- detect multi-intent; emit a small plan
export function splitIntents(q){
  const text = q.trim();
  const cuts = text.split(/(?:\bthen\b|;| and then | && | -> )/i).map(s=>s.trim()).filter(Boolean);
  const parts = cuts.length ? cuts : [text];
  return parts.map(p=>{
    let role = "LOOKUP";
    if (/\b(?:compare|vs)\b/i.test(p)) role = "COMPARE";   // grouped so \b applies to each alternative
    if (/\b(?:draft|email|write|generate|compose)\b/i.test(p)) role = "DRAFT";
    return { role, text: p };
  });
}

// CLI
if (import.meta.url === `file://${process.argv[1]}`) {
  const q = process.argv.slice(2).join(" ");
  console.log(JSON.stringify(splitIntents(q), null, 2));
}
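
Usage from the command line; this is what splitIntents returns for the sample query:

node tools/intent_split.mjs "Compare Policy A vs B with citations, then draft an email asking IT to switch our domain."
[
  {
    "role": "COMPARE",
    "text": "Compare Policy A vs B with citations,"
  },
  {
    "role": "DRAFT",
    "text": "draft an email asking IT to switch our domain."
  }
]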

7) Acceptance Criteria (ship/no-ship)

A multi-intent response may ship only if:

  1. Each grounded sub-intent has citations ⊆ retrieved_ids and passes the Auditor as VALID.
  2. Creative sub-intents (draft/rewrite) echo a valid handoff_id from a VALID grounded step.
  3. If any grounded sub-intent returns “not in context”, the overall request refuses (no partial answers).
  4. Example 08 per-intent gates pass (CHR for grounded, compliance for drafts).

8) Prevention (contracts & defaults)

  • Query schema: role: {COMPARE|LOOKUP|DRAFT|REWRITE}, text, optional constraints (see the sketch after this list).
  • Router default: split when ≥2 deterministic signals fire; otherwise single-intent.
  • Template isolation: distinct prompts per role; never mix compare + draft in the same prompt.
  • UI hinting: suggest quick toggles (“Compare” / “Draft”) for power users; cut ambiguity at the source.
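
A minimal schema sketch for the router contract; field names mirror the splitter output, and constraints is an assumed extension point that the reference code does not read:

from dataclasses import dataclass
from typing import Optional

ROLES = {"COMPARE", "LOOKUP", "DRAFT", "REWRITE"}

@dataclass
class SubIntent:
    role: str                           # one of ROLES
    text: str                           # fragment of the user query this sub-intent covers
    constraints: Optional[dict] = None  # e.g. {"citations_required": True} (assumed)

    def __post_init__(self):
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role}")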

9) Debug Workflow (10 minutes)

  1. Run the splitter; print the plan.
  2. Execute grounded step(s) first and log citations.
  3. Ensure draft step references a real handoff_id.
  4. If grounded fails → return refusal; do not proceed.
  5. Re-score with Example 08; CHR should improve while over-refusal stays controlled.

10) Common Traps & Fixes

  • Draft first temptation → ungrounded emails. Always ground before drafting.
  • One big retrieval for all roles → tail noise. Retrieve per role.
  • Auditor on drafts → meaningless. Audit only grounded claims; drafts check schema and handoff.
  • Partial shipping (“we answered the easy half”) → inconsistent UX. Refuse on missing grounded parts.

11) Minimal Checklist (copy into PR)

  • Split multi-intent queries deterministically into roles.
  • Grounded steps use evidence-only template + citations.
  • Drafts echo handoff_id; no new facts.
  • Acceptance gate enforces per-role rules; no partial ship.
  • Example 08 gates pass per intent.

References to hands-on examples

  • Example 01 — Guarded baseline (evidence-only + refusal)
  • Example 02 — Reflection triage (query_parse_error)
  • Example 03 — Retrieval stabilization for each sub-intent
  • Example 04 — Acceptance gate (Scholar/Auditor + handoff)
  • Example 08 — Eval per intent (CHR for grounded, compliance for drafts)

🔗 Quick-Start Downloads (60 sec)

Tool | Link | 3-Step Setup
WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>”
TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly

Explore More

Layer | Page | What it's for
Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof
Engine | WFGY 1.0 | Original PDF-based tension engine
Engine | WFGY 2.0 | Production tension kernel and math engine for RAG and agents
Engine | WFGY 3.0 | TXT-based Singularity tension engine, 131 S-class set
Map | Problem Map 1.0 | Flagship 16-problem RAG failure checklist and fix map
Map | Problem Map 2.0 | RAG-focused recovery pipeline
Map | Problem Map 3.0 | Global Debug Card, image-as-debug-protocol layer
Map | Semantic Clinic | Symptom → family → exact fix
Map | Grandma's Clinic | Plain-language stories mapped to Problem Map 1.0
Onboarding | Starter Village | Guided tour for newcomers
App | TXT OS | TXT semantic OS, fast boot
App | Blah Blah Blah | Abstract and paradox Q&A built on TXT OS
App | Blur Blur Blur | Text-to-image with semantic control
App | Blow Blow Blow | Reasoning game engine and memory demo

If this repository helped, starring it improves discovery so more builders can find the docs and tools.