#AI #GenerativeCoding #ClaudeCode #VibeCoding #AIAgents #DeveloperWorkflow #ContextEngineering
The Conversation Is the Asset
My AI agents can tell you why we made an architecture decision in January, quote the exact conversation, and link you to the file. Not because the AI remembers — it doesn't — but because we never let the conversation live in the chat window in the first place.
Here's a question that sounds philosophical but is actually practical: when you finish a session with an AI coding assistant, what did you produce?
The obvious answer is "code", or some other agentic result. But that is maybe a third of it. The rest is everything that happened on the way to the result: the requirement you clarified three times before the AI understood it, the approach you tried and abandoned, the moment you said "no, actually, do it this other way," the decision you made about the data structure and — crucially — why you made it.
By default, all of that lives in the context window. And the context window is a rental. To be fair, Claude Code does save a full transcript of every session on your machine, and you can resume an old one. (You've probably noticed that this isn't 100% reliable.) However, saved isn't the same as useful. Those transcripts are machine-format logs, organized by date on one computer — not by feature, not by project, not summarized, not searchable by topic, and not in your repo. And when a long session compacts to free up room, the summary that survives is lossy in ways you don't get to choose. The record of how your app came to be the way it is technically exists somewhere, the way a security-camera archive technically contains your biography. Next session, you and the AI still start over as strangers.
About six months ago I stopped accepting that. The fix turned out to be embarrassingly low-tech: move the conversation out of the chat window and into a markdown file. Everything else in this article — the commands, the memory agent, the spec workflow — is scaffolding around that one move.
The session doc
I work with a small fleet of AI agents, each one assigned to a project. Nova builds my entertainment-tracking app. Pea maintains RunPee. CJ is my chief of staff for everything that isn't code. They're all instances of the same underlying AI — what makes them them is the set of files they wake up with.
Every working session with any of them starts the same way. I type /session 'some title' 'what this session is for' and the agent creates a markdown file: sessions/06-June/09-Tue_blog-session-reflect.md. Date in the path, topic in the name. That file — not the terminal — is where the session happens.
The format inside is dead simple. My messages and the agent's responses alternate, separated by a delimiter:
**Dan:**
Let's fix the calendar bug where shows jump a day.
===
**Nova:**
Found it. The projection code was using the server's timezone...
===
**Dan:**
I write my side directly in the file, in my own editor. Then I switch to the terminal and type one word: go. That tells the agent: re-read the doc from disk, find my latest message, respond — in the doc. If I'm lazy and type a real message into the terminal instead, the agent copies it into the doc before responding. Either way, the file ends up containing the complete conversation. The terminal is just plumbing.
The only thing the session doc is missing is the tool calls and the behind-the-scenes thinking the AI does between my message and its answer. But the part of that thinking worth keeping — what it found, what it concluded, and why — lands in the response, so it lands in the doc.
This sounds like extra ceremony until the first time it saves you. The conversation is now:
- A file I own. Not a scrollback buffer, not a vendor's chat history behind a login. A markdown file, in my repo, under git.
- Readable in any editor, greppable, diffable, linkable.
- Independent of the AI session that produced it. This is the part that changes everything, and I'll come back to it.
The other half: /reflect
A session doc captures the conversation. But raw conversation isn't memory any more than a security-camera feed is a diary. So every session ends with a second command: /reflect.
Reflect is the agent's end-of-day ritual, and it does the work a good employee does before going home:
It hunts for corrections. The reflect command explicitly scans the session for every place I said "no, actually..." or showed a better way, and writes each one to a permanent file so it never has to be corrected again. The principle, stated right at the top of the command file: correct once, never again. This is the difference between an assistant who learns and one who makes the same mistake every Tuesday.
It captures decisions with their reasoning. Every architectural or strategic decision gets appended to a decisions journal: what prompted it, what the options were, what we chose, why, and what trade-offs we accepted. Six months later, when I ask "why on earth did we do it this way?", the answer exists — with the context that made it sensible at the time.
It updates the agent's working memory. Each agent keeps a small "working context" file — current state, active priorities, open items — that it reads at the start of every session. Reflect refreshes it. This is how a brand-new session starts already oriented instead of asking me to re-explain the project.
It writes a summary into the session doc itself. A sub-agent reads the whole conversation and prepends a structured summary to the top of the doc, between machine-parseable delimiters. This last step looks like bookkeeping. It's actually the keystone, because of what comes next.

Mnemosyne, or: don't guess, query the record
Once every session doc has a summary in a known format at a known location, something new becomes possible: search.
Each of my agents has a sub-agent named Mnemosyne (the Greek goddess of memory — the agents name themselves, I just pay the bills). Mnemosyne's whole job is archaeology. When I start a new session and give it a purpose, Mnemosyne automatically digs through every past session summary for related history, and the new session doc opens with a digest: here's what we've done on this topic before, here's what we decided, here's what's still open.
The standing rule for every agent, written into their instructions, is five words: don't guess — query the record. If an agent is about to assume how something works or why something was decided, it's supposed to invoke Mnemosyne first. The record is right there, in plain markdown, with summaries built for exactly this retrieval.
If you're now picturing a session that starts by dumping a mountain of history into the context window — the exact bloat this whole system is supposed to prevent — here's the part that makes it work. Think of how a lawyer takes on a new case. She doesn't reread every file in the firm's archive, and she doesn't walk into the meeting cold either. She sends an associate to research the relevant precedents, and the associate comes back with a brief. Mnemosyne is the associate: she does her reading in her own disposable context, off to the side, and only the brief enters the session.
A real example. I recently asked Nova to check in on her app's data-ingestion pipeline. Before I'd typed a word of substance, Mnemosyne had opened the session doc with a digest like this:
- The corpus is in steady-state maintenance (~52k documents: ~13.9k movies, ~7.4k shows, ~40k people), not growth. Day-to-day refresh runs via 7 scheduled chores.
- Open gotchas: no delete-through hook on the search index (manual deletes leave orphan entries); a within-batch race causes ~1–2 duplicate fetches per batch (cosmetic, known).
…followed by a page of supporting evidence: which past sessions established what, with file paths and dates, so the agent (or I) could go read the originals if anything needed a closer look.
To produce that, Mnemosyne read roughly 77,000 words across nine documents — six past session docs, a protocol runbook, the decisions file. What she seeded into the session was about 1,100 words. Call it a 70:1 compression, and the typical seed runs about one dense page. One page that means the session starts with months of institutional memory instead of "so, remind me how your pipeline works?" — that's not bloat, that's the cheapest context I buy all day.
This pays off in ways I didn't predict. The article you're reading right now started as a session where Mnemosyne dug up the complete evolution of the workflow it's part of — dates, quotes, file paths — before I'd written a word. And mid-project it routinely catches stale assumptions: in one session the doc claimed a branch was 57 files out of date; checking the record showed the work had already shipped and the branch was actually 9 commits behind. Guessing would have meant a bogus merge. Querying took thirty seconds.
The history of the loop is the argument for the loop
Here's the part I find genuinely funny: I can tell you how this workflow evolved, with dates and verbatim quotes, because the workflow recorded its own construction. Every change to the /session SOP was itself a session, and every session is in a doc.
Well — almost every change. When I went digging for the origin, I hit something fitting: the end-of-session ritual was born in Nova's project sometime before September 2025. I can prove it existed by then — its fingerprints are all over her oldest archived sessions — but the moment of its creation is gone, because it happened before we kept the kind of records it would later make automatic. The live, co-edited session doc came that December, on day one of a new project. The single event the archive can't reconstruct is the workflow's own birth — the boundary between before and after. I can't think of a better argument for the system than the shape of the hole it left.
Since the whole fleet adopted it in January 2026, the SOP has gone through a dozen recorded revisions, each one fixing a failure I can quote. Highlights:
February 2: /reflect starts generating summaries and /session starts searching them. Three background agents retroactively summarized all 95 existing session docs, so the archive was searchable from day one. The memory system came online in an afternoon. (14-hours later it became self-aware and launched nuclear missiles.)
February 9: /checkpoint dumps the agent's invisible working state into the doc before a context window dies; this was my fix for Claude Code's auto-compaction. /reanimate points a fresh session at an existing doc and picks up exactly where it left off. Between them, a piece of work is no longer bounded by a context window — I've had one doc carry a thread across five days.
March 11: my favorite doc in the archive — a debugging session on the workflow itself that found five distinct ways doc-writing could fail (wrong insertion point, duplicated messages, forgetting the doc after compaction, stale reads, two sessions writing to each other's docs) and gave each a mechanical fix. At one point the agent insisted my message didn't exist until forced to re-read the file from disk: "There it is — line 56-58 has your message. I was working from a stale read." The bug is preserved in the doc it happened in. The report is its own reproduction.
May 1: the subtlest fix. Reanimating an old session doc is time-travel — Nova's framing — and the project's current state files are the future bleeding into a conversation that paused in the past. The fix is a "temporal frame" rule: the doc defines the era, and anything newer must be explicitly labeled before it's used.
Notice what that list is. It's not a feature roadmap — it's scar tissue. Every rule in the current system exists because a specific failure happened on a specific day, and I can show you the failure. That's what it means to own your history.

The meat is in the files — and the files are mine
Let me make the ownership point as plainly as I can, because it's the core idea here.
Every substantive thing my agents and I have done — every feature, every bug hunt, every architecture debate, every "no, do it this way" — exists as markdown in folders I control, organized by month, committed to git. The meat of the work is not in any vendor's chat history. It's not trapped in a proprietary format. It would survive me switching AI providers tomorrow.
That archive is, today, thousands of documents. And it has properties that no chat scrollback will ever have:
- A complete history of every feature. Not the sanitized changelog version — the actual deliberation. When a feature behaves strangely, the first question isn't "what does the code do?" It's "what did we decide it should do, and when?" The session docs answer that.
- It survives the tools. Session docs from January, written under an older model and an earlier version of the workflow, are exactly as readable and searchable today as the doc from this morning.
- It compounds. Every session makes Mnemosyne's archive deeper. The system gets smarter about my projects every single day, and none of that intelligence lives in the AI — it lives in the files.
If you take nothing else from this article: whatever tool you use, get the conversation into files you own. Everything else is implementation detail.
The spec loop: think in one session, build in another
Once your sessions produce durable documents, a second pattern falls out almost for free — and it's become the way all serious features get built around here.
I used Claude Code's Plan Mode exactly one time. That was enough to prove to me that it wasn't capable of accomplishing what I already had through my pre-existing procedure for creating specs. We never design and build in the same session. The cycle is:
Session one: write the spec. A conversation about what we're building, ending with the agent writing a specification document. The standard for the spec is strict: it must be self-contained — written as if it will be implemented in a completely new session by an agent who never saw the discussion. No "as we discussed." If the spec needs the conversation to make sense, the spec isn't done.
Session two: attack the spec. A separate command (/spec-review) sends fresh sub-agents to critique the spec against the actual codebase, in layers — one checks the data model, one checks integration with existing code, one checks the user experience. They're hunting gaps, conflicts, ambiguities, and complexity the spec is hiding. The cycle repeats until each pass comes back clean.
Session three: build it. A fresh session — clean context, no leftover assumptions — implements the spec. The spec is the contract; the conversation that produced it doesn't need to be in the room.
To be clear, the spec creation is a beast. Depending on the depth of the feature, it can take hours of concentration and back-and-forth deliberation. But once that is done, I can take a break for the next few hours. The /spec-review is largely automated. The only time I need to get involved is if something is discovered that will change the shape of the end product.
Why split it this way? Because a context window full of design deliberation is the wrong context for critique (the agent is invested in its own plan), and a context window full of critique is the wrong one for building (it's full of roads not taken). Separate sessions mean each phase gets a fresh, honest reading. The documents are the only thing that crosses the boundary — which is exactly why they have to be self-contained.
And the critique pass earns its keep constantly. Real examples from the record:
- A review of one calendar spec caught five real blockers the design session had missed — including a lint-rule path that the design agent had simply invented, and a confirmation toast that was unreachable because it lived on a page the user had just navigated away from. The spec grew from 360 lines to 509 before a line of code was written.
- A critique of a translation-service integration found, in five minutes, that the third-party API didn't support the feature the whole design depended on — and a smoke test revealed the account was suspended anyway. Five minutes of critique versus what would have been hours of doomed integration work.
- The biggest version: my entertainment app's calendar subsystem went through this loop as nine separate specs, critiqued and revised, then implemented one spec at a time — each by a fresh sub-agent, with the parent session handling testing and deployment between specs. The test suite grew from 482 to 759 tests across roughly 30 commits, and the whole thing held together because every spec could stand alone.
It is immensely satisfying to start a session by pointing Claude Code to a folder with a half-dozen or more files, starting with the 00-main-context.md, and say, "Go". Then come back hours later and see that Claude Code has implemented a complex feature, written tests, ran the tests and debugged until everything passes. Then it opens the app and runs user tests. In the end, my feedback is long the lines of, "Maybe make the font-size a little bigger on this page header."
One spine, many agents
A detail I'd underline for anyone building their own version of this: the commands themselves are just markdown files, and each agent has its own copy — same spine, different specializations.
CJ, the chief of staff, runs the lean version: corrections, decisions, journal, summary. Nova, who owns the densest codebase, has mandatory reflect steps for keeping data-pipeline documentation in sync with code, plus a step that scans each session for repetitive tasks worth automating. Pea, the longest-running project agent, evolved the furthest: her corrections don't go into a flat file anymore but into a rules database, and when a session starts, a script ranks which rules are relevant to this session's stated purpose and loads only those. Her reflect literally ends with the command that files new rules into the database.
None of that divergence was designed up front. Each agent's variant evolved from what its project kept needing — and because the commands are plain text in version control, evolving them is just another session. The workflow improves itself using itself.
How to start
You don't need a fleet of named agents to steal this. The minimum viable version is three habits:
- Start every working session by creating a markdown file, and tell the AI that all responses go there. Date it, name it, keep it in your repo.
- End every session by making the AI reflect: what was decided and why, what corrections you made, what's still open — written to files that the next session reads first.
- Summarize at the top, in a consistent format. Future-you (and future-AI) will search those summaries constantly.
And if you'd rather start from a working example than a description: I've published a complete miniature version of the loop — the /session and /reflect commands, the memory agent, the append tool, and a fully worked session doc whose Mnemosyne citations actually resolve — at github.com/polyGeek/session-reflect-loop. You don't even need to copy it. Give the URL to Claude Code and ask it to adapt the SOP to your project. That conversation will probably be your first session doc.
The AI's memory will always be a rented room. The files are the house. Build the house.

Comments
No comments yet — be the first.