Semantic vector search
Two tiers. Tier-2 is the default and the one you want. Tier-1 is an advanced opt-in that costs plan tokens; only reach for it when LLM-written summaries matter more to you than vector similarity.
#Tier-2 — local CPU, free, zero tokens (recommended)
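On-device 768-dimensional embeddings via bge-base-en-v1.5, stored and searched with sqlite-vec. Everything runs on your CPU: after the one-time model download there are no network calls, no Anthropic calls, and no token spend. Search fuses three ranking lanes (BM25 keyword scores, vector similarity, and recency), so a query can match both exact keywords and related concepts.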
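The docs do not say how the three lanes are combined. One common technique is reciprocal rank fusion (RRF), and the sketch below assumes it, with the conventional constant k=60 and made-up session IDs. It illustrates the idea of fusing ranked lanes; it is not recall's actual implementation.

```python
from collections import defaultdict


def rrf_fuse(lanes, k=60):
    """Reciprocal rank fusion (assumed technique, not confirmed for recall).

    Each lane is a list of session IDs, best first. A session's fused score
    is the sum of 1 / (k + rank) over every lane it appears in, so sessions
    ranked well by several lanes rise to the top.
    """
    scores = defaultdict(float)
    for ranking in lanes:
        for rank, session_id in enumerate(ranking, start=1):
            scores[session_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda sid: (-scores[sid], sid))


# Hypothetical per-lane rankings for the query "expire login".
bm25_lane = ["s3", "s1", "s7"]     # exact keyword hits
vector_lane = ["s1", "s4", "s3"]   # conceptually similar sessions
recency_lane = ["s4", "s1", "s9"]  # most recent first

print(rrf_fuse([bm25_lane, vector_lane, recency_lane]))
# ['s1', 's4', 's3', 's7', 's9']
```

Here s1 wins without topping every lane, because all three lanes rank it near the top. Rank-based fusion like this avoids having to normalize BM25 scores, cosine similarities, and timestamps onto a common scale.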
recall semantic install # one-time: download bge-base-en-v1.5 (~110MB)
recall semantic status # model + queue health
recall semantic reindex # vectorize sessions on local CPU (idempotent)
recall semantic uninstall # remove model + vector index
#Tier-1 — Claude-summarized text (advanced, costs plan tokens)
Shells out to your local claude CLI to summarize each session into 3 sentences plus ~15 keywords. The summaries are stored in session_semantic and indexed by FTS5, so keyword search can match them. Throughput is roughly 30 sessions/min, and every session is billed against your Claude plan. Default is OFF. Reach for it only when you specifically want LLM-generated summaries instead of (or in addition to) vector similarity.
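To show what "feed FTS5" buys you, here is a minimal sketch using Python's built-in sqlite3 module, assuming an SQLite build with FTS5 enabled (most are). The table name semantic_fts, its columns, and the summary text are hypothetical stand-ins; the docs name session_semantic but not its schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: recall's real tables and columns may differ.
conn.execute(
    "CREATE VIRTUAL TABLE semantic_fts USING fts5("
    "session_id UNINDEXED, summary, keywords)"
)

# Stand-ins for Claude-generated output: 3 sentences plus a keyword list.
rows = [
    ("abc12345",
     "Debugged sessions expiring early. Traced it to a token TTL. Raised the TTL.",
     "auth session token ttl expiry login timeout"),
    ("def67890",
     "Set up CI caching. Cut build time. Documented the cache key.",
     "ci cache build pipeline"),
]
conn.executemany("INSERT INTO semantic_fts VALUES (?, ?, ?)", rows)

# MATCH searches both indexed columns; "login" hits the generated keywords,
# even though the word never appears in the summary sentences.
hits = conn.execute(
    "SELECT session_id FROM semantic_fts WHERE semantic_fts MATCH ? ORDER BY rank",
    ("login",),
).fetchall()
print(hits)  # [('abc12345',)]
```

The keyword list is what lets a terse query find a session whose transcript and summary use different wording.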
recall semantic on # enable Tier-1 spawn loop (costs tokens)
recall semantic backfill # one-shot summarize N pending sessions
recall semantic auto-extract on # summarize a few sessions at a time in the background (still costs tokens)
recall semantic off # disable
#CLI
recall search "expire login" # auto-fuses on Pro
recall search "expire login" --no-semantic # BM25-only fallback
recall similar abc12345 # cosine-ranked related sessions
recall similar abc12345 -n 5 # top 5
#Privacy
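The Tier-2 model runs entirely on your CPU. Zero cloud calls, zero ongoing cost: embedding and search never call an external API. The one exception is Tier-1, which, when enabled, sends sessions through your claude CLI to be summarized on your Claude plan.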
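Ranking related sessions, as recall similar does, needs only the stored vectors and some arithmetic, which is why it can stay local. The sketch below shows cosine ranking with tiny made-up 3-dimensional vectors in place of real 768-dimensional embeddings; the function names, the in-memory dict, and the session IDs are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # guard against an all-zero vector
    return dot / (norm_a * norm_b)


def similar(target_id, vectors, n=5):
    """Return the n sessions most similar to target_id, best first."""
    target = vectors[target_id]
    scored = [
        (cosine(target, vec), sid)
        for sid, vec in vectors.items()
        if sid != target_id  # never report a session as similar to itself
    ]
    scored.sort(reverse=True)
    return [sid for _, sid in scored[:n]]


# Toy 3-d vectors standing in for 768-d embeddings (hypothetical IDs).
vectors = {
    "abc12345": [0.9, 0.1, 0.0],
    "s-login": [0.8, 0.2, 0.1],
    "s-ci": [0.0, 0.1, 0.9],
    "s-auth": [0.7, 0.0, 0.3],
}
print(similar("abc12345", vectors, n=2))  # ['s-login', 's-auth']
```

A real index such as sqlite-vec does this search far more efficiently over thousands of vectors, but the ranking criterion is the same.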
#Reindex on huge archives (advanced)
recall semantic reindex is idempotent: sessions already indexed are skipped, so a killed run picks up where it stopped. By default every chunk of every session is embedded (full coverage). For a faster first pass on archives with thousands of messages, cap the number of chunks embedded per session with the RECALL_REINDEX_MAX_CHUNKS environment variable:
RECALL_REINDEX_MAX_CHUNKS=200 recall semantic reindex
# Then later, with no cap, to deepen coverage:
recall semantic reindex
The default is 0 (no cap). The cap is a knob, not a recommendation.
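A minimal sketch of how a resumable, capped reindex could behave, assuming chunk-level bookkeeping so that a later uncapped run only embeds what the capped run skipped. Only RECALL_REINDEX_MAX_CHUNKS and its "0 means no cap" default come from the docs; the reindex function, the embed callback, and the done set are hypothetical.

```python
import os


def reindex(sessions, done, embed, max_chunks=None):
    """Embed chunks not yet in `done`; safe to re-run after a kill.

    sessions: dict of session_id -> list of chunk texts
    done: set of (session_id, chunk_index) pairs already embedded
    embed: callable that embeds one chunk (stand-in for the local model)
    max_chunks: per-session cap; None reads RECALL_REINDEX_MAX_CHUNKS,
                and 0 means no cap
    """
    if max_chunks is None:
        max_chunks = int(os.environ.get("RECALL_REINDEX_MAX_CHUNKS", "0"))
    newly_embedded = 0
    for sid, chunks in sessions.items():
        limit = len(chunks) if max_chunks <= 0 else min(max_chunks, len(chunks))
        for i in range(limit):
            if (sid, i) in done:
                continue  # idempotent: skip work finished in an earlier run
            embed(chunks[i])
            done.add((sid, i))
            newly_embedded += 1
    return newly_embedded


sessions = {"a": ["c0", "c1", "c2"], "b": ["c0"]}
done = set()
print(reindex(sessions, done, embed=lambda text: None, max_chunks=2))  # 3 (capped first pass)
print(reindex(sessions, done, embed=lambda text: None, max_chunks=0))  # 1 (deepens coverage)
print(reindex(sessions, done, embed=lambda text: None, max_chunks=0))  # 0 (nothing left)
```

Recording progress per chunk, rather than per session, is what makes the "cap first, deepen later" workflow cheap: the second run never re-embeds chunks the first run already covered.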
#Limitations
bge-base-en-v1.5 is optimized for English; multilingual support is planned. If the model file goes missing, search falls back gracefully to keyword (BM25) search.
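The fallback could be as simple as checking for the model file before enabling the vector lane. This is a sketch under that assumption; the choose_lanes function, the lane names, and the model path are hypothetical, and the semantic flag mirrors what --no-semantic does.

```python
from pathlib import Path


def choose_lanes(model_path, semantic=True):
    """Pick which ranking lanes a search should use.

    Falls back to keyword-only (BM25) search when semantic search is
    switched off or the model file cannot be found.
    """
    if semantic and Path(model_path).is_file():
        return ["bm25", "vector", "recency"]
    return ["bm25"]


print(choose_lanes("/nonexistent/bge-base-en-v1.5.onnx"))  # ['bm25']
```

Degrading to BM25 keeps search working with reduced recall for concept-level matches, rather than failing outright.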