Three search lanes fused with Reciprocal Rank Fusion (k = 60)

Same query, different lanes

Query: "how do I expire user logins"

Keyword only (Free): 0 hits. No session contains the literal phrase "expire user logins".

+ Semantic (Pro): 7 hits. Top result: a session about JWT refresh rotation. Zero keyword overlap, semantic match.

768 vector dimensions

~110MB model footprint

CPU only, no GPU required

No API fees, ever

Semantic vector search

Two tiers. Tier-2 is the default and the one you want. Tier-1 is an advanced opt-in that costs plan tokens; reach for it only when summary quality matters more than vector similarity.

#Tier-2 — local CPU, free, zero tokens (recommended)

On-device 768-d embeddings via bge-base-en-v1.5 + sqlite-vec. Runs entirely on your CPU. After the one-time model download there is no network traffic, no Anthropic calls, and no token spend. Search over these vectors fuses three lanes (BM25, vector, recency) with Reciprocal Rank Fusion, so it matches both keywords and concepts.

bash

recall semantic install                   # one-time: download bge-base-en-v1.5 (~110MB)
recall semantic status                    # model + queue health
recall semantic reindex                   # vectorize sessions on local CPU (idempotent)
recall semantic uninstall                 # remove model + vector index
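To make the three-lane fusion concrete, here is a minimal sketch of Reciprocal Rank Fusion with k = 60. It is not recall's actual code: the function name `rrf_fuse`, the lane names, and the session ids are hypothetical, and it assumes each lane simply returns an ordered list of session ids.

```python
from collections import defaultdict


def rrf_fuse(lanes, k=60):
    """Fuse ranked lists with Reciprocal Rank Fusion.

    lanes: mapping of lane name -> list of session ids, best first.
    Each appearance of a session adds 1 / (k + rank), rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in lanes.values():
        for rank, session_id in enumerate(ranking, start=1):
            scores[session_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical lane outputs for one query.
lanes = {
    "bm25": ["s3", "s1"],
    "vector": ["s1", "s2", "s3"],
    "recency": ["s2", "s1"],
}
fused = rrf_fuse(lanes)
print([session_id for session_id, _ in fused])  # ['s1', 's2', 's3']
```

A session that places well in several lanes outranks one that tops a single lane, and an empty lane (such as a keyword search with 0 hits) simply contributes nothing.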

#Tier-1 — Claude-summarized text (advanced, costs plan tokens)

Shells out to your local claude CLI to summarize each session into 3 sentences plus ~15 keywords. The summaries land in session_semantic and feed FTS5. Throughput is roughly 30 sessions/min, and every summary spends tokens from your Claude plan. Off by default. Reach for it only when you specifically want LLM-generated summaries instead of (or in addition to) vector similarity.

bash

recall semantic on                        # enable Tier-1 spawn loop (costs tokens)
recall semantic backfill                  # one-shot summarize N pending sessions
recall semantic auto-extract on           # background nibble (still costs tokens)
recall semantic off                       # disable
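As a rough planning aid, the quoted rate of ~30 sessions/min translates into backfill time as follows. The helper `backfill_minutes` is hypothetical, and the estimate assumes the rate stays constant, which real runs may not honor.

```python
import math


def backfill_minutes(pending_sessions, sessions_per_minute=30):
    """Estimate wall-clock minutes to summarize the pending sessions."""
    return math.ceil(pending_sessions / sessions_per_minute)


print(backfill_minutes(1500))  # 50
```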

#CLI

bash

recall search "expire login"              # auto-fuses on Pro
recall search "expire login" --no-semantic   # BM25-only fallback
recall similar abc12345                   # cosine-ranked related sessions
recall similar abc12345 -n 5              # top 5
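The cosine ranking behind `recall similar` can be sketched as below. This is an illustration under assumptions, not the tool's implementation: it assumes one vector per session (the real index may store per-chunk vectors), uses toy 3-d vectors in place of 768-d ones, and the names `cosine` and `similar` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar(target_id, vectors, n=10):
    """Return up to n (session_id, score) pairs, most similar first."""
    target = vectors[target_id]
    scored = [
        (session_id, cosine(target, vec))
        for session_id, vec in vectors.items()
        if session_id != target_id  # never return the query session itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy vectors; the ids other than abc12345 are made up.
vectors = {
    "abc12345": [1.0, 0.0, 0.0],
    "def67890": [0.9, 0.1, 0.0],
    "0a1b2c3d": [0.0, 1.0, 0.0],
}
top = similar("abc12345", vectors, n=5)
```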

#Privacy

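The embedding model runs entirely on your CPU. Embedding and search never call an external API, so there are zero cloud calls and zero ongoing cost. The one exception is Tier-1, which is off by default and shells out to your local claude CLI when you enable it.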

#Reindex on huge archives (advanced)

recall semantic reindex is idempotent. Sessions already indexed are skipped, so a killed run picks up where it stopped. By default every chunk in every session is embedded (full accuracy). For a faster first pass on multi-thousand-message archives, cap chunks per session via env var:

bash

RECALL_REINDEX_MAX_CHUNKS=200 recall semantic reindex
# Then later, with no cap, to deepen coverage:
recall semantic reindex

The default is 0 (no cap). The cap is a knob, not a recommendation.
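One way an idempotent, capped reindex could behave is sketched here. The bookkeeping is an assumption: the sketch tracks embedded chunks as (session id, chunk index) pairs so that a later uncapped run can fill in what a capped run skipped, and the names `parse_cap`, `reindex`, and `embed` are hypothetical. Only the env var name and its default of 0 come from the text above.

```python
import os


def parse_cap(env=None):
    """Read RECALL_REINDEX_MAX_CHUNKS; 0 or unset means no cap (None)."""
    env = os.environ if env is None else env
    cap = int(env.get("RECALL_REINDEX_MAX_CHUNKS", "0"))
    return cap if cap > 0 else None


def reindex(sessions, indexed, embed, cap=None):
    """Embed chunks that are not yet indexed; return how many were embedded.

    sessions: dict of session id -> list of chunk strings.
    indexed:  set of (session_id, chunk_index) already embedded (assumed schema).
    embed:    callable invoked once per newly embedded chunk.
    cap:      max chunks considered per session, or None for all of them.
    """
    embedded = 0
    for session_id, chunks in sessions.items():
        limit = len(chunks) if cap is None else min(cap, len(chunks))
        for i in range(limit):
            if (session_id, i) in indexed:
                continue  # already done, so reruns are cheap
            embed(chunks[i])
            indexed.add((session_id, i))
            embedded += 1
    return embedded
```

Under these assumptions, a capped first pass followed by an uncapped pass embeds every chunk exactly once, and a third pass does no work.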

#Limitations

bge-base-en-v1.5 is English-optimized. Multilingual support is planned. If the model file goes missing, search falls back gracefully to keyword search.
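The fallback can be pictured as a lane selection step. This is a sketch under assumptions: the function `pick_lanes`, the lane names, and the file name `model.bin` are hypothetical, and the real check may look at more than file presence.

```python
import tempfile
from pathlib import Path


def pick_lanes(model_path, want_semantic=True):
    """Choose search lanes; keyword-only when the model file is absent."""
    if want_semantic and Path(model_path).is_file():
        return ["bm25", "vector", "recency"]
    return ["bm25"]


with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp) / "model.bin"  # hypothetical file name
    missing = pick_lanes(model)      # file not there yet: keyword only
    model.write_bytes(b"stub")
    present = pick_lanes(model)      # file exists: all three lanes
    opted_out = pick_lanes(model, want_semantic=False)
```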