Training the agent to think like you
The point of a memory layer isn't search. It's reflex transfer. Every steering decision you captured becomes the agent's starting position on the next problem.
In the previous post I described what context engineering actually looks like at scale: thirty terminals, twenty-plus apps, roughly a quarter of every day spent on steering. This post is about what happens when that steering stops leaking away and starts compounding instead.
The short version: the memory layer is the prerequisite. The prize is reflex transfer. And reflex transfer is how you take the 25% orchestrator tax down to roughly 5%.
What "steering" actually encodes
Watch yourself during an agentic session. Ignore the code. Look at the decision points. Here's what you're really doing:
- Picking a stack. Self-hosted droplet over Vercel because you own the box. Cloudflare R2 over S3 because egress is free and you already have R2. Resend over the alternatives because you've used both and Resend's domain verification flow is less painful. Twilio for SMS because you've already built the retry logic around their quirks.
- Picking an architecture. Three different ways to structure the migration, and you pick the one that's easiest to roll back. The agent would have picked the cleanest one. The cleanest one breaks when the deploy fails halfway.
- Picking a convention. "We don't use Prisma on this repo, we use better-sqlite3 directly." The agent would have defaulted to Prisma because the training data is full of Prisma. You know why it doesn't fit here.
- Picking a boundary. "This belongs in a subagent, not a tool call." The agent would have inlined it. You've been burned by inlining when the retry logic got tangled.
None of those decisions live in the code. They live in the conversation. If the conversation is discarded, the next agent session starts with zero of them, and you pay for each one again.
Now imagine those decisions don't get discarded. Imagine every one of them is captured, tagged, and indexed, across every session across every project you've ever driven. That's the memory layer. And it's the floor, not the ceiling.
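What a captured, tagged steering decision might look like as data is worth making concrete. This is a hedged sketch only: the field names and the `SteeringDecision` shape are illustrative assumptions, not Recall's actual schema.

```typescript
// Hypothetical shape for one captured steering decision.
// Field names are illustrative, not Recall's real storage format.
interface SteeringDecision {
  session: string;     // which session it came from
  collection: string;  // concern-level grouping, e.g. "auth-patterns"
  decided: string;     // what was chosen
  rejected: string[];  // alternatives considered and dropped
  reason: string;      // the why -- the part the code itself never records
}

const decision: SteeringDecision = {
  session: "session-0412-auth-refactor",
  collection: "auth-patterns",
  decided: "use better-sqlite3 directly",
  rejected: ["Prisma"],
  reason: "training data defaults to Prisma; it does not fit this repo",
};
```

Note that the code-visible outcome is only `decided`; everything else lives in the conversation, which is exactly why discarding the conversation is expensive.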
What "training the agent to think like you" actually means
Fine-tuning is one answer, but it doesn't fit this problem. You can't fine-tune on your own data in a way that keeps up with the pace at which you actually make decisions. Decisions shift faster than training cycles. The security posture is also wrong. Your steering lives in conversations with a model you're already paying to use. Round-tripping it into a separate fine-tuned model is work nobody should be doing.
The right answer looks different. It's:

1. Index every session. Local, searchable, durable. This is what Recall already does.

2. Organize by concern, not by folder. Features, not projects. Collections in Recall are the substrate for this. Every time I ship an auth flow, every session that contributed to it goes into a collection called auth-patterns. Over time, that collection is a dossier of how I do auth. Not generic auth. My auth.

3. Semantic summaries per session. Not embeddings of every turn; that's expensive and noisy. Summaries of the steering moments per session: "In this session, we decided X, rejected Y, and kept Z for next time." Recall's v0.11 semantic layer does this by shelling out to your own claude CLI. Zero extra cost, same Claude plan you already pay for.

4. Retrieval into the next conversation. The next time I start a feature that touches auth, the opening context isn't empty. It's the auth-patterns collection, pre-digested, piped into the new session via recall context or via an MCP tool call from inside Claude Code itself. The agent opens the conversation with my reflexes already loaded.
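The retrieval step above can be sketched as a pure function: take a collection's per-session steering summaries and assemble the opening context for the next session. A minimal sketch under stated assumptions — `buildOpeningContext` and its format are hypothetical, not Recall's internal API; the real mechanism is `recall context` or an MCP call.

```typescript
// Sketch: turn a collection's steering summaries into the opening
// context of a new session. Illustrative only, not Recall's API.
function buildOpeningContext(collection: string, summaries: string[]): string {
  const header = `Prior steering from the "${collection}" collection:`;
  const body = summaries.map((s, i) => `${i + 1}. ${s}`).join("\n");
  return `${header}\n${body}\nApply these decisions unless told otherwise.`;
}

const context = buildOpeningContext("auth-patterns", [
  "Decided better-sqlite3 over Prisma; rejected ORMs for this repo.",
  "Kept zod validation on all auth inputs.",
]);
```

The point of keeping this step dumb and deterministic is that the expensive, lossy work (summarizing the steering moments) already happened once, at capture time; retrieval is just concatenation.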
Step four is the reflex transfer. It's not the model learning to be me. It's the agent starting every conversation in the state I would have put it in by hand. That used to cost a quarter of my day. It should cost zero.
The 25% to 5% argument
I said earlier that roughly a quarter of my time goes to steering. The other three quarters are agents executing. Let me walk through what changes as reflex transfer gets real.
Today, starting a new auth-related session costs me maybe fifteen minutes of re-stating: no Prisma, use zod for inputs, SPF plus DKIM plus DMARC on any outbound mail, audit-log every grant change, don't leak implementation details in login error messages. If the agent has the auth-patterns collection loaded, that fifteen minutes drops to zero. The starting position already encodes all of it.
Multiply by the number of features I touch in a day. A dozen areas, fifteen minutes each, is three hours of pure re-stating that isn't work, it's replay. The memory layer deletes it.
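The arithmetic is simple enough to state outright; the numbers here (a dozen areas, fifteen minutes each) are the estimates from the paragraph above, not measurements.

```typescript
// The replay tax, in minutes: areas touched per day times minutes of
// re-stating per area. The memory layer drives the second factor to zero.
function dailyReplayMinutes(areasPerDay: number, restateMinutes: number): number {
  return areasPerDay * restateMinutes;
}

const today = dailyReplayMinutes(12, 15);    // 180 minutes: three hours of pure replay
const withRecall = dailyReplayMinutes(12, 0); // 0: starting positions pre-loaded
```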
What remains is the novel steering. The parts that are genuinely new. The calls where my existing reflexes don't cover the case and I have to invent. That's still work. That's still five percent of the day, maybe ten percent on a hard day. It's the interesting part.
The rest of the day, the agent should be running. That's what I want.
Why this is the foundational story
Every other feature of Recall is in service of this. Full-text search is useful, but it's the substrate for summarization. Context re-injection is useful, but it's the shape of a hand-off, not the purpose. Collections are useful, but they're the container for the dossier.
The foundational story is: every steering decision you capture is a brick in the wall that stops you from making the same decision twice. Over a year of serious agentic work, that wall is the difference between being a one-person shop that burns out at twelve apps and being a one-person shop that scales to fifty.
That's the pitch, and it's the whole pitch. You're not paying $29.69 for searchable JSONL files. You're paying $29.69 for the scaffolding that makes your own steering compound instead of evaporating. The searchable JSONL files are just what that scaffolding looks like from the outside.
Where this goes next
A few things on the near horizon:
- Semantic recall into Claude Code itself. An MCP call like "load the auth-patterns collection and summarize the steering" means the next session opens with the summary already in the context window. No manual pipe, no copy-paste. Recall v0.11 is the first step; v0.13 (bidirectional MCP writes) is the next.
- Cross-project reflex transfer. Collections already cross project boundaries. The next iteration is automatic suggestion: "you're starting a session in a repo with auth middleware; here are three past sessions that might matter." Local-only. No cloud, no phoning home.
- Your own taxonomy, not mine. Everyone steers differently. The shape of your reflexes is yours. The tool should be flexible enough that your organization doesn't look like anyone else's. Collections are a tree because a tree is the one structure that survives everyone organizing differently.
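The automatic-suggestion idea above could be as simple as local keyword overlap between repo signals and past session summaries. This is a naive sketch under that assumption; `suggestSessions` is hypothetical, and Recall's eventual matcher may look nothing like it.

```typescript
// Naive local matcher: score past session summaries by keyword overlap
// with signals detected in the current repo (e.g. "auth", "middleware").
// Purely illustrative; not a planned or actual Recall API.
function suggestSessions(
  repoSignals: string[],
  pastSessions: { id: string; summary: string }[],
  limit = 3,
): string[] {
  return pastSessions
    .map((s) => ({
      id: s.id,
      score: repoSignals.filter((sig) =>
        s.summary.toLowerCase().includes(sig.toLowerCase()),
      ).length,
    }))
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((s) => s.id);
}
```

Everything here runs against local files, which is the whole point: suggestion without a cloud round-trip.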
None of this is magic and none of it is hypothetical. The pieces are real. The question is whether the operator cares enough to capture the steering in the first place.
If you do, the rest is tooling, and the tooling is here.