
If you have ever stashed every conversation in a vector database and called it “memory”, this is what I tried instead, after that approach kept producing a system that was searchable but did not actually know anything.
Below: how my memory setup is put together and what each piece does.
The architecture, in one diagram
~/.claude/projects/-Users-timtrailor-code/memory/
├── MEMORY.md (index, auto-loaded every session, ≤200 lines)
├── topics/*.md (one file per topic, curated current truth)
└── sessions/
└── compressed/
└── YYYY/*.md (one file per closed session, append-only)
Indices, derivable from the above:
├── ChromaDB (semantic embeddings, ~79,000 chunks)
└── SQLite FTS5 (keyword search, ~79,000 rows)
Four layers, two indices. The layers are the source of truth; the indices are derived and can be regenerated. They share a strict precedence rule.
Why two indices
Most projects pick one. I built both because each fails where the other succeeds.
Semantic search uses ChromaDB with a local embedding model and calls no external API. It is right for questions like “what did we try for belt tension on the printer?” where I do not know the exact words the original conversation used. It returns semantically similar chunks even when the query language differs from the source. It is wrong when the answer is a specific token.
Keyword search uses SQLite’s built-in full-text search engine, FTS5. It is right for “where did 192.168.0.108 appear?” or “which session mentioned the SAVE_CONFIG bug on 5 March?” It returns exact matches against literal terms. It is wrong for paraphrase queries where similar content uses different vocabulary.
Early on I noticed the pattern: semantic search would return topically relevant chunks but not the one that answered the question, because the question was about an IP address or error code the embedding model had smoothed into similar-looking tokens. Keyword search returned the specific chunk immediately. The reverse also happened.
Running both in parallel is cheap and produces strictly better retrieval than either alone. The Model Context Protocol (MCP) server that exposes them gives the agent two tools: search_memory for semantic, search_exact for keyword. The agent picks based on the shape of the query.
Do not assume one retrieval strategy is enough. Log which queries fail and check whether the failures are in the class the other strategy would catch.
Why topic files, not just the indices
The topic files are what make the system actually know things, rather than just remember them.
Each topic file lives at memory/topics/<name>.md and contains the current truth about that topic: printer configuration, memory system architecture, school governors project, the UPS and disaster-recovery setup, and so on. There are thirty-six. They are hand-maintained, contain only what is still true, and edited as understanding evolves.
This is different from session logs and compressed summaries, which are append-only records of what was true at a point in time. A summary from two months ago saying “we decided to restart the daemon every hour” may have been true then and wrong now. The topic file says what is true now.
When the system looks something up, it cares about the layer. A topic-file fact is authoritative. A fact from a compressed summary is routing information (where to look next, not what to conclude), and a fact from a raw session log is for arbitration when topic and summary disagree. CLAUDE.md encodes this:
## Memory precedence
1. Topics. CURATED CURRENT TRUTH. Wins.
2. Compressed sessions. RECALL/ROUTING. Not authoritative on facts.
3. Raw JSONL. ARBITRATION when summary and topic disagree.
4. Chroma/FTS5. DERIVED, regenerate from source.
Without this rule the system poisons itself: a bad summary written in a moment of model confusion becomes indistinguishable from a good one written after careful thought. With it, the topic is the authority and the summaries are routing, so poisoning at the summary layer cannot reach the authority layer.
MEMORY.md, the index
MEMORY.md is a single file, under 200 lines, auto-loaded into every session’s system prompt. It is an index, not content: what topic files exist, what each is for, and where to read further. It also holds a small number of high-priority preferences (the printer IP addresses, the statement that email sends are pre-authorised).
I keep it short because it costs context on every session start, so anything that can live in a topic file does. The index’s job is to be a reliable table of contents, not a condensed version of the corpus. The split is cheap always-loaded context plus a much larger searchable corpus, with the index as the bridge.
/dream, the promotion pass
The promotion pass is what makes the system more than a searchable archive.
/dream is a skill that runs every 24 hours, triggered by a session-start hook that checks for a .dream-pending flag file. When it runs, it reads recent session transcripts not yet processed, extracts specific patterns (new facts that contradict existing topic content, repeated-error templates, decisions with clear reasoning), and either updates the relevant topic files or appends to lessons.md.
The key rule: /dream can promote insights from the session layer into the topic layer, never the reverse. This asymmetry prevents the poisoning failure mode. A bad promotion is visibly wrong when I next look at the topic file and I fix it by hand, but it never flows back into session summaries, so it cannot amplify.
What /dream is built to catch:
- Repeated mistakes. If the same category of error appears in two separate sessions, the pattern gets added to
lessons.mdand becomes a pre-flight check for future sessions. - Policy drift. If a decision I made two months ago is contradicted by a recent session without explicit justification,
/dreamflags the contradiction for me to resolve. - Topic staleness. If a topic file has not been touched in months but recent sessions have extended its subject area,
/dreamsurfaces a candidate update.
The promotion is not autonomous editing. It produces a diff and a summary, and I apply the changes by hand after reading. The authority layer has to stay authoritative, which means humans read and approve the changes.
Why this feels sleep-like
The biology framing is not serious, but it is useful. A nightly pass that brings the day’s information into longer-term stores, promotes patterns into general rules, and prunes detail that does not matter is structurally similar to what sleep does for biological memory. The analogy fails at the mechanism level (no neural replay, no transfer between brain regions) but holds at the function level.
The system gets better without me actively maintaining it. Run /dream regularly and the topics stay up to date, the lessons file stays current, the compressed-session layer stays useful for routing. Skip it, and the system slowly degrades.
The corner cases
A few things the architecture does not do.
It does not deduplicate across topics. If two topics say the same thing, both say it, because I prefer the redundancy over the complexity of a single source-of-truth store. When contradictions arise I edit one of the files by hand.
It does not chain its own conclusions. /dream does not infer new facts from combinations of existing ones; that class of inference produces false positives that poison the topic layer. /dream only promotes things stated explicitly in session transcripts.
It does not retain permanently. The compressed-session layer is append-only but can be garbage-collected if the raw sessions are purged. Storage is not a binding constraint for years.
It has one failure mode I have hit and fixed twice: the topic files can become internally inconsistent if I edit them in a hurry. The fix is a review agent that periodically reads the topic files and flags contradictions. It runs weekly; the output is a short report I skim on Mondays.
What I would tell a past me
Build the topic layer first. It is the piece that turns a retrieval system into something that actually knows things; without it you only have search.
Keep the index file (MEMORY.md in my case) under 200 lines. It loads on every session, so anything in it is a per-session tax. Pointers, not content.
Do not pick between semantic and keyword retrieval. Build both. The cost is low and the coverage is materially better.
Write the promotion pass (/dream) as a hand-reviewed promotion step, not an autonomous editor. Machines propose, humans decide.
Run the promotion regularly. Skipping a week costs little, skipping a month costs real precision on retrieval, skipping a quarter produces a haunted archive.
If you build one of these and find places where the architecture does not generalise, I would like to hear. Contact details are on the about page.
The memory MCP server code lives at ~/code/memory_server.py on the Mac Mini. It is roughly 500 lines of Python, uses FastMCP as the server framework, ChromaDB as the semantic store, and the built-in SQLite FTS5 extension for keyword search. The embedding model is all-MiniLM-L6-v2 running locally; no external API is called. Full source will be published with the control plane repo.