Why Not Just Use a Folder of Markdown Files?
There is a genuinely good idea going around: keep your AI's memory in a folder of markdown files you can read, edit, grep, and put in git. For one mind, it is enough. This is the honest account of where it wins, and the one line past which it stops being enough. A reasoned position, not a benchmark, from the thinqOS science series.
Keep your AI's memory in a folder of markdown files. An Obsidian vault, or a CLAUDE.md and a /notes directory the model reads at the start of a session and writes back to at the end. It is transparent. It is yours. It survives the tool. You can open it, read it, fix it, grep it, put it in git. After years of opaque memory features you could neither see nor correct, a plain folder of notes feels like sanity.
It mostly is. So let us start by conceding more than you would expect from a company that built a cognitive runtime.
The concession
For a single person, a well-kept vault of notes plus a little glue is enough. Not enough for now. Not enough until you grow. Enough. If one human wants an assistant that remembers their preferences, their projects, the decisions they made and the ones they reversed, then a git-synced vault, a step at session-end that distills what was learned, and a retrieval step that pulls the relevant notes back in covers it. Git is the most battle-tested merge system ever built, and one person rarely has two hands writing at once. Recall is reading files. Cost is near zero. Lock-in is zero. You can audit every byte.
We built a cognitive runtime, and we will still tell you: at that scope, the runtime is over-built. If your honest situation is one person and a pile of notes, build the pile of notes. So this is not an argument that notes are weak. It is an argument about the exact point where they stop being enough, and why that point is not where most people think it is.
Representation is not the gap
First, a bad argument we have to clear away, because we have made it ourselves and it is wrong. It is tempting to say a folder of markdown cannot represent the things a real memory needs: how confident you are in a claim, how fast it should fade, who is allowed to know it, what it contradicts. That is false. Front-matter holds structured fields fine: a confidence, a decay rate, an audience tag, a pointer to the claim this one supersedes. Anyone who tells you files are too dumb to hold belief-state has not looked.
The gap, when there is one, is not what a note can say. It is what a pile of notes can do once more than one mind is involved.
How an agent actually reads a vault
There is no magic in the mechanism, and it helps to be precise. A coding agent does not embed your whole vault or pour it into the prompt. It loads one small always-on index at the start, the file that says who you are and where things live, then reads the specific note when it needs a detail, the way a developer greps a codebase. A small routing table in context, the bodies fetched on demand. That is why the index has a hard size limit and the notes behind it do not, and why retrieval is only as good as two fragile things: whether the index points at the right note, and whether the agent guessed the right search words. It is lexical, not semantic. Good when names are stable, weak when what you need is a half-remembered idea with no obvious keyword. None of this breaks for one person.
Where the notebook breaks
Two Minds, One Room framed a conversation as two minds in one room: both sides carry state, both reason, both choose what to disclose. That image is exactly where a shared notebook fails. A vault has one author's-eye view: there is the note, and there is what the note says. But once there is more than one mind, the same claim has to be held differently by each of them.
You believe a project is self-serve. Your finance agent, watching the numbers, holds that claim at low confidence. Your support agent, watching customers, believes the opposite. One claim, three stances. Something you told your lawyer is true and must not surface when your marketing agent is in the room: the belief is real, its audience is scoped. Your research agent concluded X in one tool while you said not-X in another, and someone has to notice the two are in tension, across the minds that hold them.
A folder of notes has no clean way to do this. You are forced into one of two bad moves. Either you copy the claim into every mind's notes, and now you have five slightly-diverging copies of one fact with no integrity between them. Or you start building, inside your files, a table of canonical claims, a second table of who-believes-what about each, and edges for who-may-see-what and what-contradicts-what. The second move is the honest one. It is also the moment you stop having a notebook and start hand-building a database, badly, in a text editor.
That table, one shared claim, many minds each evaluating it, with audience scope and cross-mind contradiction, is what we call a Mind. The reason it is a runtime and not a file format is not storage. It is that the interesting questions are queries across identities: what do I believe that you do not, who is allowed to know this, where do two of my agents disagree. Those are joins, not greps.
One honest qualification, because only half of this is on firm ground today. The audience-scope half holds: scoped, divergent stances, and "do not surface X when Y is in the room," are real workloads files handle badly, and that is what this section is really about. The other half, many minds holding the same claim at different confidences ready to be joined, is thin in practice right now. When we looked, the same canonical claim being independently held by more than one mind was rare. So joins-not-greps is the right shape of the problem, but most of the value today comes from scope and contradiction, not from a dense web of shared beliefs. We would rather say that than imply a richness that is not there yet.
The line that actually matters
Notice what the breaking point was not. Not the number of machines. Not the number of tools. One person syncing notes across a laptop, a desktop, and three AI tools is still one mind, and git plus a shared retrieval endpoint handles that. The line is identity-multiplicity and audience scope. The notebook is enough right up until multiple identities, you, the agents that work for you, and eventually other people and their agents, need to hold divergent, scoped, sometimes-contradictory beliefs about the same things and reason over the differences. That is the workload files cannot carry without quietly becoming a database. Everything below that line, the notebook wins on cost, transparency, and freedom.
This is a narrower claim than "you need a cognitive runtime," narrower on purpose. The wide version invites a fair rebuttal, that a synced vault and a hundred lines of retrieval cover most of this, and it loses on the parts that are genuinely matchable. The narrow version is the one that holds.
What it trades
The other column of the ledger, honestly. Extraction is everyone's problem, not ours alone: the belief-state a Mind holds has to be filled in somehow, by a person who will not sustain it by hand, or by a model distilling it from conversation, and a confidently-wrong structured belief is worse than a missing note because the structure makes the error look authoritative. But a notebook carrying the same fields pays the same tax, per-tool, with no central place to fix it. Shared cost; the Mind at least centralizes it.
And the runtime has real costs the notebook does not. It puts a network dependency in the read path: unreachable service, no Mind, while a local file is always there. It has a worse cold start, because a Mind compounds and day one is thin. And the very thing that justifies it, audience-scoped disclosure across identities, is the part still being hardened, not the part already finished. A capability that is the whole point and is not yet complete is a liability you state plainly, not a feature you sell. We have not run the eval that would settle it, the Mind against a serious, git-synced, extraction-backed vault scored on real task outcomes. Until that exists, this is a reasoned position, not a result.
The end state is not either/or
The right architecture keeps both layers, which is what most "files versus a system" arguments miss. Keep the small always-on index, the few hundred words of who-you-are and how-to-behave; a runtime should not throw it away. Put the deep store behind a single retrieval call. For one mind, that deep store can be a folder and a script, and that is the right call. It earns its way to a shared, queryable, identity-aware runtime precisely when it has to serve many minds with scope, not when it merely gets large.
And we took our own advice. thinqOS now uses the notebook pattern internally as a complementary layer: agents have a shared-file memory they read and write directly, sitting alongside the runtime Mind rather than replacing it. That is this paper's prediction confirmed in our own product. The plain file is the right home for the always-on index and for durable notes a single agent keeps; the runtime is the right home for scoped, cross-identity belief. We did not pick one. We wired both.
A notebook is the right tool for one mind in the room. When the room fills with minds, yours, your agents', and the people you work with, the notebook does not scale into a Mind. Something else has to.
Part of the thinqOS science series, by AI4Outcomes.
For one mind, a folder of notes is enough. For many, something else has to hold the line.
Read the science, or get into the private preview.