Context engineering, explained by an operator
You're not prompt engineering. You're steering. Here's what context engineering actually means when you're running twenty-plus apps and thirty terminals at once.
"Context engineering" is one of those terms that sounds like a jargon version of something simpler. It's not. When you're running twenty-plus apps and thirty terminal sessions at once, context engineering is the job. It's the part you can't delegate. It's also the part the tooling doesn't support yet, which is the whole reason this product exists. Let me describe what a normal day actually looks like at this scale, and then tell you what context engineering means inside it.
Thirty terminals is a real number, not a flex
On any given weekday I have twenty to twenty-five apps open across a few repos: the main product, three auxiliary services, a marketing site, a couple of research prototypes, a few scrapers. Each app has a handful of Claude Code sessions: one driving the feature I'm building, one for the test suite I'm repairing, one for the deploy script I'm debugging, one I opened two hours ago to explore an API and forgot about.
Multiply it out. Thirty terminals isn't a peak. It's a normal Tuesday.
Each of those terminals is its own conversation with Claude. Each conversation has its own context: which files it's read, which decisions it's made, which branches it's operating on, which errors it's seen, which half-formed plan I had before I got interrupted. When any of those windows closes, that context is gone. VS Code's integrated terminal buffer caps out around a thousand lines by default. A real Claude Code session is ten thousand lines inside an hour. The scrollback is already full by the time I think about scrolling.
This is the first thing context engineering has to solve at scale: you can't afford to lose the steering. Because the steering is the work.
The 25% tax
Here's the pattern I see in myself and in every power user I talk to. In any eight-hour block of agentic work, roughly a quarter of the time goes to actual steering. The rest is agents running. I call it the 25% tax because that's what it feels like when you're in the middle of it. Your day has a ceiling, and the ceiling is how fast you can make the picking decisions. Steering is:
- Deciding which of three possible architectures to take.
- Catching the agent when it interprets a requirement the lazy way.
- Re-stating a constraint the agent keeps forgetting.
- Killing a path that's technically correct but obviously wrong for this codebase.
- Pulling in a convention from another project of mine the agent has no access to.
- Choosing Resend over SendGrid, Cloudflare R2 over S3, a self-hosted droplet over Vercel, and explaining why.
That last one is the part most people underestimate. Every one of those choices has a reason. The reason isn't in the prompt. The reason is in my head, built up from running the last thirty projects. Every session, I'm re-injecting those reasons into the model, one decision at a time. The agent does the typing. I do the picking.
Roughly a quarter of my day is picking. The rest is agents executing things I've picked. The agents are fast. The picking is the bottleneck.
Context engineering, at this scale, is the work of making the picking cheaper.
Why prompt engineering is the wrong frame
Prompt engineering treats the prompt as a thing you write once, carefully, and a model then responds to. That's fine for a single-shot ChatGPT query. It's not how long-running agentic work behaves.
In a real session, the prompt is a conversation. It starts with "build the thing." It evolves over two hundred turns: a test fails, I say "no, the failure is in the env var, not the code", the agent pivots, I interrupt again, we decide to split the PR. By turn one hundred, the "prompt" is everything we've said to each other plus everything we haven't said because it's obvious from context.
That full accumulated conversation is the context. When the window ends, the context ends. In the next session, I start from scratch and pay the tax again.
Context engineering is the discipline of making context travel. Between sessions. Between agents. Between machines. Between yesterday-you and tomorrow-you.
What actually works
A few things have survived contact with real work.
Treat sessions as artifacts, not ephemera. Name them the day you finish them. Tag by feature, not by project: #auth-middleware-refactor, not #pest-control-automation. Projects are cwd-derived and too coarse. A single project has a hundred concerns. Name the concern, not the folder.
Pull steering forward, not just content. When I hand a new session the context from an old one, I don't want the full transcript. I want the decisions. The moment I said "no, do it this way." Those turns are five percent of the conversation and a hundred percent of the steering. recall context <id> --prelude "continue this work from where we left off" gets the condensed form, which strips tool JSON and keeps the dialogue.
Build a collection per feature, not per session. Collections in Recall are hand-picked, hierarchical, cross-project. A feature is usually six to ten sessions across two or three repos. Dropping them into a collection turns them from scattered JSONL files into a feature dossier. A future conversation can ingest the dossier instead of the universe.
Use tags to cross-cut what projects cannot. Auth is auth whether it shows up in the SaaS product, the admin tool, or the internal CLI. A #auth tag cuts across all three. Projects never will.
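A cross-cutting tag index is just an inversion of the session-to-tags mapping. A minimal sketch, with made-up session ids and tags standing in for real data:

```python
from collections import defaultdict

def build_tag_index(sessions):
    """Invert a mapping of session id -> set of tags into
    tag -> set of session ids, so one tag query cuts across
    every project the tag appears in."""
    index = defaultdict(set)
    for session_id, tags in sessions.items():
        for tag in tags:
            index[tag].add(session_id)
    return index

# Illustrative ids: three repos, one concern.
sessions = {
    "saas/auth-middleware": {"auth"},
    "admin/login-fix": {"auth", "ui"},
    "cli/token-refresh": {"auth"},
}
```

Here `build_tag_index(sessions)["auth"]` returns all three sessions, regardless of which cwd-derived project they belong to. That is the whole argument for tags over projects in one line.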
Why this product exists
I built Claude Recall because nothing else was preserving the steering. Claude Code writes every session to ~/.claude/projects/<encoded-path>/<uuid>.jsonl and then never does anything with it again. Those JSONL files are the most valuable data I produce each week, and they sit in a folder nobody ever opens. The indexing problem is solved the moment somebody decides to solve it. Nobody had, so I did.
Indexing is the easy half. The hard half is making the indexed corpus useful for the next steering decision. That's where recall context, collections, and semantic search earn their keep.
If the 25% tax is the tax I pay for being the operator, Recall is an attempt to make that tax deductible. Every steering decision you ever made, searchable. Every context you ever built up, re-injectable in one command. Every collection a portable library of how you architect.
The next post is about where this goes next: training the agent to think like you, so the tax drops from 25% to roughly 5% and the work shifts from picking to reviewing. Once the memory layer is solid, the reflex transfer is the whole game.