
Semantic vector search

New in v0.7 (Pro). On-device semantic embeddings that find sessions by meaning, not just keywords.

What it solves

Keyword search fails when your query uses different words than the session content. "How do I expire user logins" won't match a session about "JWT refresh rotation" because there is zero keyword overlap. Semantic vectors close this gap.
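To make the gap concrete, here is a minimal sketch that counts shared word tokens between the example query and the example session topic. The `keyword_overlap` helper and its simple regex tokenizer are illustrative assumptions; they are not how FTS5 tokenizes text, but the result is the same for this pair.

```python
import re

def keyword_overlap(query: str, document: str) -> set[str]:
    """Return the lowercase word tokens that appear in both strings."""
    def tokenize(text: str) -> set[str]:
        # Naive tokenizer (assumption): runs of letters and digits.
        return set(re.findall(r"[a-z0-9]+", text.lower()))
    return tokenize(query) & tokenize(document)

shared = keyword_overlap("How do I expire user logins", "JWT refresh rotation")
print(shared)  # prints set(): no token is shared, so a keyword match has nothing to score
```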

How it works

  1. Each conversation is split into chunks of 3-5 messages, and every chunk is embedded into a 768-dimension vector using bge-base-en-v1.5, a local ONNX model running on your CPU.
  2. Queries are embedded with the same model.
  3. Results from three search lanes are fused using Reciprocal Rank Fusion (RRF, k=60):
    • Lane 1: BM25 full-text search over messages (FTS5)
    • Lane 2: FTS5 over session summaries
    • Lane 3: Cosine kNN over vector chunks (sqlite-vec)
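Lane 3 can be pictured as ranking stored chunk vectors by cosine similarity to the query vector. The sketch below is a brute-force illustration in plain Python, not the sqlite-vec API; the function names are hypothetical and the 3-dimension toy vectors stand in for real 768-dimension embeddings.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)

def knn(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 3):
    """Return the k (chunk_id, similarity) pairs most similar to the query."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy data (made up for illustration only).
chunks = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [1.0, 1.0, 0.0]}
top = knn([1.0, 0.0, 0.0], chunks, k=2)
print([cid for cid, _ in top])  # prints ['a', 'c']
```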

The fused ranking is automatic when semantic search is active.
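Reciprocal Rank Fusion scores each result by summing 1 / (k + rank) over every lane in which it appears. The sketch below shows the standard formula with k=60. It assumes 1-based ranks and equal lane weights, which may differ from Recall's internal implementation; `rrf_fuse` and the session IDs are hypothetical.

```python
from collections import defaultdict

def rrf_fuse(lanes: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion; best result first."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in lanes:
        for rank, session_id in enumerate(ranked_ids, start=1):
            scores[session_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda sid: (-scores[sid], sid))

bm25_lane = ["s1", "s2"]           # Lane 1: message full-text hits
summary_lane = ["s2", "s3"]        # Lane 2: summary full-text hits
vector_lane = ["s3", "s2", "s1"]   # Lane 3: cosine kNN hits
print(rrf_fuse([bm25_lane, summary_lane, vector_lane]))  # prints ['s2', 's3', 's1']
```

Because only ranks are used, the lanes need no score normalization: a BM25 score and a cosine similarity never have to be compared directly.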

Setup

  1. Activate a Pro license.
  2. Install the embedding model:
recall semantic install

This downloads ~110MB once to ~/.recall/models/bge-base-en-v1.5/. Background embedding starts automatically.

  3. Check status:
recall semantic status

CLI commands

recall search "expire user logins"                 # auto-fuses all three lanes on Pro
recall search "expire user logins" --no-semantic   # BM25-only fallback
recall similar abc12345                            # find sessions similar to this one
recall semantic install                            # download the embedding model (~110MB)
recall semantic status                             # model + queue health
recall semantic reindex                            # re-embed everything (rarely needed)
recall semantic uninstall                          # remove model files, free disk
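For scripting, the search command can be wrapped in a few lines of Python. This is a sketch that uses only the `recall search` command and the `--no-semantic` flag listed above; the helper names are hypothetical, and since the output format is not documented here, stdout is returned as raw text.

```python
import subprocess

def build_search_argv(query: str, semantic: bool = True) -> list[str]:
    """Build the argument list for `recall search`, optionally keyword-only."""
    argv = ["recall", "search", query]
    if not semantic:
        argv.append("--no-semantic")
    return argv

def run_search(query: str, semantic: bool = True) -> str:
    """Run the CLI and return its raw stdout (format is not assumed)."""
    completed = subprocess.run(
        build_search_argv(query, semantic),
        capture_output=True, text=True, check=True,
    )
    return completed.stdout

print(build_search_argv("expire user logins", semantic=False))
# prints ['recall', 'search', 'expire user logins', '--no-semantic']
```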

MCP tools

| Tool | What it does |
|---|---|
| search (existing) | Automatically gains three-lane vector fusion on Pro. No schema change. |
| find_similar_sessions | Cosine-ranked related sessions. Pro only; returns upgrade CTA on Free. |
| semantic_status | Model health snapshot for agent orchestration. |
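An MCP client invokes these tools through the protocol's JSON-RPC `tools/call` method. The sketch below only builds the request payload. The argument names `session_id` and `limit` are assumptions, since the tool's input schema is not shown here; check the schema your client reports before relying on them.

```python
import json

def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical arguments; the real parameter names may differ.
payload = build_tool_call(1, "find_similar_sessions", {"session_id": "abc12345", "limit": 5})
print(payload)
```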

HTTP endpoints

| Endpoint | What it does |
|---|---|
| GET /api/semantic/status | Model ID, dimension, chunk count, queue depth, license state |
| POST /api/semantic/install | Kick off background model download |
| GET /api/sessions/:id/similar?limit=N | Cosine-ranked related sessions |
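A minimal sketch of calling the similar-sessions endpoint from Python's standard library follows. The base URL is a placeholder assumption, because the host and port are not stated here, and the response is treated as opaque JSON since its fields are not documented in this section.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: replace with the address your local Recall server listens on.
BASE_URL = "http://localhost:8080"

def similar_sessions_url(session_id: str, limit: int = 5, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /api/sessions/:id/similar?limit=N."""
    path = f"/api/sessions/{quote(session_id, safe='')}/similar"
    return f"{base_url}{path}?{urlencode({'limit': limit})}"

def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)

print(similar_sessions_url("abc12345", limit=3))
# prints http://localhost:8080/api/sessions/abc12345/similar?limit=3
```

With a server running, `fetch_json(similar_sessions_url("abc12345"))` would return the decoded response body.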

Privacy

  • The embedding model runs entirely on your device. No cloud API calls, ever.
  • No data leaves your machine. Same local-first posture as everything else in Recall.
  • Vectors are a derived cache. If lost, they recompute from your original messages.

FAQ

Is my code sent anywhere?

No. The model runs locally on your CPU.

What does the model cost?

Zero ongoing. One ~110MB download, no API fees, no token charges.

Can I use semantic search on Free tier?

No. Semantic search is Pro only. Free tier keeps full BM25 keyword search.

What happens if I delete the model?

Search falls back gracefully to keyword search. Re-download the model anytime via recall semantic install.

Does it work in languages other than English?

The current model (bge-base-en-v1.5) is English-optimized. Multilingual support is planned for a future release.