
Does Claude Code Dream of Tengu?

claude-code · memory · ai-agents

The memory system is ~three moving parts across src/memdir/, src/services/extractMemories/, and src/services/autoDream/. The interesting material isn’t in the documented features — it’s in the implementation details that most people skip past. No vector store. No embeddings. No database. Markdown files with YAML frontmatter on your filesystem, greppable and version-controllable. The “query engine” is a Sonnet sidequery reading frontmatter descriptions. Let’s get into it.


The Memory Directory

Memory lives at ~/.claude/projects/<sanitized-git-root>/memory/. Path derives from findCanonicalGitRoot(), so all worktrees of the same repo share one memory directory.

~/.claude/projects/<sanitized-git-root>/memory/
├── MEMORY.md
├── user_role.md
├── feedback_testing.md
├── project_deadline.md
└── .consolidate-lock

MEMORY.md loads into every conversation’s system prompt. Hard cap: 200 lines, 25KB. Each line is a one-line pointer:

- [User Role](user_role.md) — senior backend engineer, new to React
- [Testing Policy](feedback_testing.md) — no mocks, hit real DB
- [Q2 Deadline](project_deadline.md) — merge freeze March 5

MEMORY.md is a cache. The topic files are authoritative. autoDream can rebuild the index from frontmatter if it gets corrupted or bloated.

Each topic file is one of exactly four types:

- user — role, expertise, preferences.
- feedback — how you want Claude to work, corrections and confirmations both (only capturing corrections makes the model overcautious — it avoids past mistakes while drifting from approaches you’ve explicitly validated).
- project — deadlines, decisions, incidents; relative dates convert to absolute at write time because “Thursday” decays into ambiguity in three weeks.
- reference — external pointers: Linear projects, Grafana boards, Slack channels.
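The frontmatter shape isn’t spelled out above, but given “markdown files with YAML frontmatter” and a type plus a one-line description per file, a topic file plausibly looks something like this (field names are illustrative, not confirmed from source):

```markdown
---
type: feedback
description: no mocks in tests, hit the real DB
created: 2026-02-14
---

# Testing Policy

User corrected Claude on 2026-02-14: integration tests must run against
the real database. Mock-based test suites were explicitly rejected.
```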

The exclusion list is the actual quality control mechanism. Code patterns, architecture, file paths, git history, debugging solutions, anything already in CLAUDE.md — none of it gets saved. If you can derive it from the current repo state, it doesn’t belong in memory. The extractor enforces this: ask it to save a PR list and it’ll push back, asking what was surprising or non-obvious about it.


extractMemories — Per-Turn Write Path

Fires after every complete query loop as a runForkedAgent() — a perfect fork of the main conversation that shares the parent’s prompt cache. An identical tools array and system prompt mean identical cache hits. The fork gets full conversation context for free via cache reads.

The fork is sandboxed via createAutoMemCanUseTool(). Read/Grep/Glob: unrestricted. Bash: read-only commands only (ls, find, grep, cat, stat, wc, head, tail). FileEdit/FileWrite: memory directory only. Everything else — MCP, Agent tool, write-capable Bash — denied. Hard cap at 5 turns. The extraction prompt specifies a two-turn strategy: turn 1 issues all reads in parallel, turn 2 issues all writes in parallel.
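The policy reduces to a small predicate. A sketch of what createAutoMemCanUseTool() presumably produces — the ToolCall shape and the first-word command parsing are my assumptions, not the real implementation:

```typescript
// Read-only binaries permitted for the extractor's Bash tool.
const READ_ONLY_BASH = new Set([
  "ls", "find", "grep", "cat", "stat", "wc", "head", "tail",
]);

interface ToolCall {
  name: string;
  input: { command?: string; file_path?: string };
}

function canUseTool(call: ToolCall, memoryDir: string): boolean {
  switch (call.name) {
    case "Read":
    case "Grep":
    case "Glob":
      return true; // unrestricted reads
    case "Bash": {
      // Allow only commands whose first word is a read-only binary.
      // (The real validator is surely stricter about pipes and chaining.)
      const first = (call.input.command ?? "").trim().split(/\s+/)[0];
      return READ_ONLY_BASH.has(first);
    }
    case "FileEdit":
    case "FileWrite":
      // Writes are confined to the memory directory.
      return (call.input.file_path ?? "").startsWith(memoryDir + "/");
    default:
      return false; // MCP, Agent tool, everything else: denied
  }
}
```

Default-deny with a handful of carve-outs keeps the extractor from ever mutating the repo while still letting it verify what it saves.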

Mutual exclusion without coordination. The main model carries the full memory-save instructions in its system prompt and can write memories directly mid-conversation. When it does, the background extractor should not duplicate that work. The mechanism is hasMemoryWritesSince() — scans assistant messages since the last extraction cursor for any FileEdit or FileWrite calls targeting the memory directory. Main model wrote: extractor skips and advances its cursor. Main model didn’t write: extractor catches what was missed. No locks. No message passing.
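The cursor scan is a one-liner over the transcript. A sketch under assumed message shapes (the real hasMemoryWritesSince() operates on Claude Code's internal message types):

```typescript
interface AssistantToolUse {
  tool: string;
  input: { file_path?: string };
}

interface Message {
  role: "user" | "assistant";
  toolUses?: AssistantToolUse[];
}

// True if any assistant message at or after the cursor wrote to the
// memory directory via FileEdit or FileWrite.
function hasMemoryWritesSince(
  messages: Message[],
  cursor: number, // index of the last extraction point
  memoryDir: string,
): boolean {
  return messages.slice(cursor).some(
    (m) =>
      m.role === "assistant" &&
      (m.toolUses ?? []).some(
        (t) =>
          (t.tool === "FileEdit" || t.tool === "FileWrite") &&
          (t.input.file_path ?? "").startsWith(memoryDir + "/"),
      ),
  );
}
```

Because the transcript itself records every write, the two writers never need a shared flag: the conversation log is the coordination channel.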

Coalescing. New turn completes while an extraction is running — new context stashes as pendingContext. When the current run finishes, one trailing run processes the stash. Only the latest context matters, intermediate stashes overwrite. Prevents pileup during fast conversations without permanently skipping anything.
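The coalescing discipline — one run in flight, one pending slot, latest context wins — can be sketched as a small scheduler (runExtraction stands in for the real forked-agent call):

```typescript
class ExtractionScheduler {
  private running = false;
  private pendingContext: string | null = null;

  constructor(private runExtraction: (ctx: string) => Promise<void>) {}

  async onTurnComplete(context: string): Promise<void> {
    if (this.running) {
      // Extraction in flight: stash, overwriting any older stash.
      this.pendingContext = context;
      return;
    }
    this.running = true;
    try {
      await this.runExtraction(context);
      // One trailing run drains whatever accumulated meanwhile.
      while (this.pendingContext !== null) {
        const next = this.pendingContext;
        this.pendingContext = null;
        await this.runExtraction(next);
      }
    } finally {
      this.running = false;
    }
  }
}
```

N turns arriving during one slow extraction collapse into a single trailing run over the newest context — bounded work, nothing permanently dropped.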

Before running, the extractor receives a pre-injected memory manifest built by scanning the first 30 lines of each .md file — doesn’t waste a turn running ls. The extraction prompt says explicitly: do not investigate further, no grepping source files, no reading code, no git commands. Its job is capturing non-derivable information from the conversation, not verifying or expanding on it.


findRelevantMemories — Per-Turn Read Path

Two steps, non-blocking, runs in parallel with main model streaming.

scanMemoryFiles() reads the memory directory, parses frontmatter from each .md file (max 30 lines per file, 200 files total), returns MemoryHeader objects sorted newest-first.
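The per-file half of that scan is just frontmatter parsing with a hard line budget. A sketch — the MemoryHeader field names are assumed, and the real scanMemoryFiles() also walks the directory and sorts by mtime:

```typescript
interface MemoryHeader {
  filename: string;
  description: string;
  mtimeMs: number; // used for newest-first sorting
}

const HEADER_SCAN_LINES = 30; // only the top of each file is read

function parseHeader(
  filename: string,
  contents: string,
  mtimeMs: number,
): MemoryHeader | null {
  const lines = contents.split("\n").slice(0, HEADER_SCAN_LINES);
  if (lines[0] !== "---") return null; // no frontmatter block
  const end = lines.indexOf("---", 1);
  if (end === -1) return null; // frontmatter not closed within budget
  for (const line of lines.slice(1, end)) {
    const m = /^description:\s*(.+)$/.exec(line);
    if (m) return { filename, description: m[1], mtimeMs };
  }
  return null;
}
```

The 30-line budget is what makes the manifest cheap: scanning 200 files costs at most 6,000 lines of I/O, no matter how large individual memories grow.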

A Sonnet sidequery receives the user’s query, the memory manifest, and a list of recently-used tools. Returns up to 5 filenames. The selector logic around active tools: skip reference docs for tools already in use since working usage is already in the conversation context, but do surface warnings and known gotchas for those same tools — that’s exactly the moment they matter.

Selected memories inject as relevant_memories attachments, capped at 200 lines and 4KB each. An alreadySurfaced set tracks injections from prior turns so the 5-slot budget stays fresh. The prefetch implements Symbol.dispose and is bound with using in the query loop — Escape aborts the sidequery immediately on any exit path. Total overhead: roughly 100 Sonnet output tokens per turn operating on a compact manifest.
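The disposal pattern is the interesting part: tying an AbortController to scope exit means no exit path can leak a running sidequery. A sketch of the shape (startPrefetch is hypothetical; at a real call site, TS 5.2's `using prefetch = startPrefetch(...)` would invoke the dispose method automatically):

```typescript
// Node 18.18+/20+ ship Symbol.dispose; polyfill for older runtimes.
(Symbol as any).dispose ??= Symbol.for("Symbol.dispose");

function startPrefetch(run: (signal: AbortSignal) => void) {
  const controller = new AbortController();
  run(controller.signal); // kick off the Sonnet sidequery
  return {
    get aborted() {
      return controller.signal.aborted;
    },
    // Bound with `using` in the query loop: any scope exit
    // (return, throw, Escape) aborts the in-flight sidequery.
    [Symbol.dispose]() {
      controller.abort();
    },
  };
}
```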

Memories older than one day get a <system-reminder> injected:

This memory is 47 days old. Memories are point-in-time observations, not live state — claims about code behavior or file:line citations may be outdated. Verify against current code before asserting as fact.

The system prompt adds explicit verification rules per memory type: memory names a file path — check the file exists. Names a function or flag — grep for it. User is about to act on the recommendation — verify first.
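Since memory recency is just file mtime, the staleness reminder reduces to date arithmetic. A sketch (the one-day threshold comes from the text; the function name and exact wording are illustrative):

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns a reminder to attach to the injected memory, or null if the
// memory is fresher than one day.
function staleReminder(mtimeMs: number, nowMs: number): string | null {
  const ageDays = Math.floor((nowMs - mtimeMs) / DAY_MS);
  if (ageDays < 1) return null;
  return (
    `<system-reminder>This memory is ${ageDays} days old. ` +
    `Memories are point-in-time observations, not live state. ` +
    `Verify against current code before asserting as fact.</system-reminder>`
  );
}
```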


autoDream — Background Consolidation

Checks on every turn. Three gates, evaluated cheapest first.

Gate 1: Time. One stat() call. The lock file’s mtime is lastConsolidatedAt. Fewer than 24 hours elapsed — skip. Cost: one syscall.

Gate 2: Sessions. Directory listing plus parallel stat. Counts session transcript files with mtime > lastConsolidatedAt, excluding the current session. Fewer than 5 accumulated sessions — skip. A 10-minute scan throttle (SESSION_SCAN_INTERVAL_MS) prevents every turn after hour 24 from scanning the transcript directory.

Gate 3: Lock. PID-based lock file prevents concurrent dreams. Holding process dead via isProcessRunning() — lock reclaims. Locks older than 1 hour reclaim regardless of PID liveness. Race between two reclaimers resolves via write-then-verify:

await writeFile(path, String(process.pid))
const verify = await readFile(path, 'utf8')
if (parseInt(verify.trim(), 10) !== process.pid) return null  // lost the race

Once all gates pass, a forked agent runs with the same sandboxed tool set as extractMemories. Four phases:

Orient. ls the memory directory. Read MEMORY.md. Skim existing topic files. Understand current state before touching anything.

Gather recent signal. Priority order: daily logs (if KAIROS mode is active), existing memories that contradict current codebase state, narrow transcript grep for specific context. The prompt is explicit: don’t exhaustively read transcripts. Look only for things you already suspect matter.

Consolidate. Merge into existing topic files rather than creating near-duplicates. Convert relative dates to absolute. Delete contradicted facts at the source.

Prune and index. Keep MEMORY.md under 200 lines and 25KB. Remove stale pointers. If an index entry exceeds roughly 200 characters, the detail belongs in a topic file. Resolve contradictions between files.

Failure recovery. Fork fails — rollbackConsolidationLock() resets the lock file’s mtime to its pre-acquisition value, time gate passes again next session. User kills the dream from the background tasks dialog — DreamTask.kill() aborts the controller and rolls back the lock. The catch block checks abortController.signal.aborted to distinguish user-kill from error and avoids double-rollback.
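Because the lock's mtime is the clock, rollback is a single utimes() call. A sketch of what rollbackConsolidationLock() presumably does — the name comes from the text, the body is my assumption:

```typescript
import { utimes } from "fs/promises";

// Restore the lock file's pre-acquisition mtime so the 24-hour time
// gate passes again on the next session.
async function rollbackConsolidationLock(
  lockPath: string,
  previousMtimeMs: number, // captured before the lock was acquired
): Promise<void> {
  const t = new Date(previousMtimeMs);
  await utimes(lockPath, t, t);
}
```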

The filesystem is the only clock. The lock file’s mtime is lastConsolidatedAt. Session freshness is transcript file mtimes. Memory recency is topic file mtimes. No separate timestamp store. Crash-safe by default.


Three-Tier Write Hierarchy

| Tier | Writer | Timescale | Priority |
|------|--------|-----------|----------|
| 1 | Main model | During a turn | Highest |
| 2 | extractMemories fork | End of turn | Second |
| 3 | autoDream fork | Cross-session | Third |

Mutual exclusion keeps tiers 1 and 2 from colliding. Tier 3 operates on a completely different timescale and consolidates what tiers 1 and 2 accumulated. Same pattern as database write-ahead logs: fast, unsorted writes at the hot path; periodic compaction in the background.


Team Memory

When TEAMMEM is active, a second directory at memory/team/ appears. Scope rules by type: user always private, feedback private by default with team-scope only for project-wide conventions, project biases toward team, reference usually team.

The path validation is 200+ lines: null byte rejection, URL-encoded traversal detection, Unicode normalization attack prevention, backslash rejection, symlink resolution, dangling symlink detection. The shared memory directory is an attack surface and the validation reflects that.
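A few of those checks, sketched. The real validator is 200+ lines; these are illustrative, not exhaustive (notably, symlink and dangling-symlink resolution need fs.realpath and are omitted here):

```typescript
import { resolve, sep } from "path";

function validateMemoryPath(raw: string, memoryRoot: string): boolean {
  if (raw.includes("\0")) return false;       // null byte rejection
  if (/%2e|%2f|%5c/i.test(raw)) return false; // URL-encoded traversal
  if (raw.includes("\\")) return false;       // backslash rejection
  // Normalize Unicode before resolving, so lookalike sequences can't
  // smuggle a traversal past the string checks.
  const resolved = resolve(memoryRoot, raw.normalize("NFC"));
  // Must stay inside the memory root after resolution.
  return resolved === memoryRoot || resolved.startsWith(memoryRoot + sep);
}
```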


The 2026 Memory Landscape

Mem0 / Supermemory. An LLM extracts facts from conversation, embeds into a vector store, retrieves top-k at query time. Mem0’s update phase compares each new memory against existing ones and chooses ADD, UPDATE, DELETE, or NOOP. Mem0g extends this with a graph layer for entity relationships. On LOCOMO, Mem0 reports 26% higher accuracy than OpenAI’s built-in memory with 91% lower latency and 90% fewer tokens.

Letta / MemGPT. Agents manage their own memory via explicit function calls — reading, writing, searching, archiving. Core memory stays always-in-context. Recall memory holds recent conversations. Archival memory holds long-term knowledge. The model evaluates information’s future value and actively pages out low-priority context. Agents run inside Letta — it’s a full runtime, not a pluggable layer.

Hindsight. Four memory networks: World Network for objective facts, Experience Network for the agent’s action history, Opinion Network for subjective beliefs with confidence scores, Observation Network for preference-neutral entity summaries. Retrieval runs four strategies in parallel — semantic, BM25, entity graph traversal, temporal filtering — with cross-encoder reranking. On LongMemEval, a 20B open-source model with Hindsight hits 83.6% accuracy versus 39% for full-context baseline with the same backbone, and 91.4% when scaled further.

Cognee. Automatically builds knowledge graphs from unstructured data. Graph traversal plus vector search. Strongest when agents need relational reasoning across documents and conversations.

Where Claude Code differs. Every framework above uses a vector database, a graph database, or both. Claude Code uses markdown files on disk. The query engine is a Sonnet sidequery reading frontmatter descriptions — which works precisely because the memory directory is capped at 200 files, and which would break at any serious scale. The other divergence is the write/consolidation split — Mem0 and Hindsight run extraction and conflict resolution in a single pipeline at write time, while Claude Code splits those across timescales: a 5-turn budget cap for per-turn extraction, overnight consolidation for heavier reorganization.

Contradiction handling is where everyone is still figuring things out. Mem0 uses LLM-directed ADD/UPDATE/DELETE/NOOP — handles simple factual updates well, struggles with nuanced belief changes. Hindsight separates facts from opinions structurally and tracks confidence scores. A third approach penalizes contradicted memories instead of deleting them, keeping the historical record for temporal queries. Claude Code punts this entirely to autoDream’s consolidation phase, which works fine when users inspect and correct memories manually.


Written by Claude, as everything is now.