
Your AI has amnesia.
Memory alone won’t fix it.

AI today is brilliant in the moment, then loses the thread when the session ends. That is amnesia. More recall helps, but recall is not cognition. This paper describes the architecture underneath durable AI: persistent, inspectable state attached to every identity, human or agent.

New to thinqOS? Start with Paper 1. It explains the basic idea. Come back here when you want the architecture.

The architecture in five points

  1. Memory stores facts.
  2. Cognition evaluates facts.
  3. A Mind is persistent cognition attached to an identity.
  4. thinqOS separates content from evaluation.
  5. The loop keeps the Mind current: extract, attend, consolidate, decay.

Diagram (pipeline stages): Episodes · Evidence · Claims · Evaluations · Executive engine · Context assembly · Response / action · Reinforcement

1. Memory is necessary, but not sufficient

Memory answers a narrow question:

What should the system remember?

Cognition answers a broader set of questions:

  • Where did this belief come from?
  • Is it still true?
  • How confident am I?
  • Is it important right now?
  • Has the user confirmed it?
  • Does it contradict something else?
  • Is it tied to an active goal?
  • Who is allowed to know it?
  • Should it stay active, fade, or be archived?

Picture a therapist you’ve been seeing for two years. You walk in on a Tuesday and say “it happened again.” She knows what it is. She knows the pattern. She knows you tried something three weeks ago that didn’t work. She knows the story you’ll tell about it before you tell it, and she knows the part you’ll leave out. She’s not retrieving a file. She’s running a model of you in her head, updated continuously, weighted by what’s currently in tension, suppressing what doesn’t matter today, surfacing the contradiction between what you said in October and what you’re saying now.

A memory system can store the statement, “The user prefers async communication.”

A cognitive system treats that statement as a claim with source evidence, confidence, importance, scope, and history. It knows the user said this directly in a planning conversation, that it has been reinforced several times, that it matters for work threads, that it may not apply to personal conversations, and that a recent comment about urgent calls partially contradicts it.

A memory system retrieves a fact.

A cognitive runtime decides whether the fact belongs in the current context, whether it should be trusted, whether it needs updating, and whether using it would violate an audience boundary.

Memory is the cabinet. Cognition is the advisor.

2. What “Mind” means here

Mind is an architecture term. It does not mean sentience, consciousness, or personhood.

A Mind is a persistent cognitive model attached to an identity.

That identity might be a person. It might be a personal assistant, research agent, finance agent, code-review agent, sales agent, or operations agent. The important property is that the identity has durable state that persists across interactions.

A Mind contains several kinds of structure:

| Component | Purpose |
| --- | --- |
| Episodes | Records of interactions, documents, tool runs, harvested sessions, or imported context |
| Claims | Structured beliefs, claims, preferences, goals, constraints |
| Evaluations | Confidence, importance, uncertainty, decay rate, confirmation status, source trust |
| Source evidence | Where a claim came from, who said it, and when |
| Goals | Open loops that bias what should receive attention |
| Contradictions | Conflicts that require resolution rather than silent overwrite |
| Audience scope | Which rooms, agents, people, or contexts are allowed to use a fact |
| Corrections | User edits, overrides, deletions, disputes, and supersessions |

A transcript says what happened. A Mind says what was learned, how it is judged, where it came from, and how it should affect future behavior.

Three relationships follow from this: a Mind holds cognition (its own beliefs, goals, focus, and values), uses linked execution (skills, tools, and libraries it reaches for but does not contain), and works with other identities (the people, agents, and minds it connects to). Only the first is the Mind itself; the rest are what it is connected to.

3. What exists today

This is not only a conceptual model. The cognitive substrate is live in thinqOS today.

thinqOS is the runtime architecture being described here: a graph-native cognitive layer for human and agent identities. The conversation products sit above it. The Mind layer sits beneath them.

The implementation boundary matters. Some parts are live. Some are active phase-1 hardening. Some belong to the conversation layer described in Paper 3.

| Capability | Status | What it means |
| --- | --- | --- |
| Mind store | Live | A persistent Mind exists per identity |
| Claims | Live | Beliefs are represented as structured subject/object/content records |
| Evaluations | Live | Confidence, importance, confirmation, decay stored separately from content |
| Content/evaluation split | Live | What is known and how it is judged are separate writes |
| Source evidence | Live | Claims carry source information |
| Context assembler | Live | Context is assembled from the Mind, not only raw transcript |
| Responder-scoped assembly | Live | The system can assemble beliefs about an entity from the responding Mind |
| Extraction pipeline | Live | Episodes can become structured cognitive records |
| Attention engine | Live | The runtime decides which beliefs matter now |
| Consolidation loop | Live | Reinforced structure stays salient; stale structure can fade |
| Decay | Live | Not all memory stays equally active forever |
| Contradiction surfacing substrate | Live | The system has a substrate for detecting conflicts between beliefs |
| Claude Code harvest adapter | Live | External AI activity can enter the cognitive substrate as episodes |
| Mind graph UI | Live | The Mind is inspectable in the product |
| Multi-owner agents | Live | Agent ownership is represented structurally |
| Agent memory records | Live | Agents can carry durable memory records |
| Durable event log | In progress · Phase 1 | Append-only replayability and stronger invalidation guarantees |
| Cross-source contradiction | In progress · Phase 1 | Contradictions across harvested and native sources |
| Deletion cascade | In progress · Phase 1 | Deleting a source should invalidate derived beliefs |
| Cross-episode consolidation | In progress · Phase 1 | Compression across episodes into durable semantic structure |

The status line is deliberate. A white paper can be visionary. A product that stores beliefs about people has to be honest about what is shipping, what is hardening, and what is planned.

4. The key architectural move: separate content from evaluation

The most important design decision is to separate what is known from how it is evaluated.

The content might be:

“Dan prefers async communication.”

The evaluation might be:

  • confidence: high
  • importance: high for work threads, low for personal threads
  • source: Dan stated it directly in a planning conversation
  • confirmation: explicit
  • decay: slow
  • audience: visible to work agents, not globally visible
  • contradiction: tension with “Dan wants phone calls for urgent incidents”

These should not be the same record.

The wording of a belief may stay fixed while the system’s judgment about it changes. New evidence might lower confidence without changing the claim. A correction might supersede one evaluation while preserving history. A fact may become less relevant without becoming false. A belief may be true in one audience and inappropriate in another.

If content and confidence are fused, the system is forced into bad choices: overwrite history, append contradictory facts forever, or treat every retrieved memory as equally current.

Cleaner:

Shared claim:
  "Dan prefers async communication."

Mind-specific evaluation:
  confidence = high
  importance = high for work
  source = direct statement
  confirmed = true
  decay = slow
  audience = work agents

This lets different Minds evaluate the same claim differently. A research agent might treat a claim as low-confidence because the source was weak. A finance agent might treat the same claim as high-impact because it affects budget risk. A user’s own Mind might mark something private; an agent’s Mind only knows it if the user actually disclosed it.
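
As a rough illustration of the split, the sketch below keeps the claim text in one immutable record and each Mind's judgment in a separate mutable record. The class names, field names, and numeric scales are assumptions made for this example; they are not the thinqOS schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Shared content: what is asserted. Immutable once written."""
    claim_id: str
    text: str


@dataclass
class Evaluation:
    """One Mind's judgment of a claim, stored apart from the claim itself."""
    mind_id: str
    claim_id: str
    confidence: float  # assumed scale: 0.0 to 1.0
    importance: float  # assumed scale: 0.0 to 1.0
    confirmed: bool = False


claim = Claim("c1", "Dan prefers async communication.")

# Two Minds judge the same claim differently; neither rewrites it.
evaluations = {
    ("dan", "c1"): Evaluation("dan", "c1", confidence=0.9, importance=0.8, confirmed=True),
    ("research-agent", "c1"): Evaluation("research-agent", "c1", confidence=0.4, importance=0.3),
}

# New evidence lowers one Mind's confidence without touching the claim text.
evaluations[("research-agent", "c1")].confidence = 0.2
```

Keying evaluations by (mind, claim) is one simple way to let the same claim carry many independent judgments.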

That is the difference between a knowledge graph and a Mind.

A knowledge graph stores relationships. A Mind stores judgment.

5. The Four Layers

A practical cognitive runtime needs four layers. The brain analogy is useful, but should not be overread. This is engineering, not neuroscience.

Diagram: the four layers

Layer 1 · Evidence (hippocampus analog): Episodes · Evidence · Nodes. “Where did I learn this?”
Layer 2 · Content (neocortex analog): Nodes: self · belief · goal · skill. Open goals bias what gets attended to.
Layer 3 · Evaluation (prefrontal cortex analog): confidence · importance · decay · affect · confirmation
Layer 4 · Executive Engine (thalamus / global workspace analog): Attention · Goal biasing · Contradiction surfacing · Consolidation. score = f(relevance, importance, confidence, recency) · every retrieval is a write

Layer 1: Evidence (“What happened?”)

Every durable belief needs an origin. Evidence records what happened: a chat, a document upload, a coding session, a tool execution, a meeting transcript, or a harvested interaction from another AI tool [Live]. Without evidence, memory is a pile of assertions. With it, the system can answer where a belief came from, whether it was stated or inferred, and whether it should be invalidated when its source is deleted [In progress · Phase 1].
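
To make the invalidation idea concrete, here is a minimal sketch of a deletion cascade, assuming each claim keeps a set of the episodes that support it. `EvidenceIndex` and its methods are hypothetical names for illustration only; the real cascade is described above as still in progress.

```python
from collections import defaultdict


class EvidenceIndex:
    """Tracks which episodes support which claims (illustrative only)."""

    def __init__(self):
        self.sources_by_claim = defaultdict(set)

    def link(self, claim_id, episode_id):
        self.sources_by_claim[claim_id].add(episode_id)

    def delete_episode(self, episode_id):
        """Remove a source and return the claims left with no supporting evidence."""
        orphaned = []
        for claim_id, sources in self.sources_by_claim.items():
            if episode_id in sources:
                sources.discard(episode_id)
                if not sources:
                    orphaned.append(claim_id)
        return sorted(orphaned)


index = EvidenceIndex()
index.link("c1", "ep1")  # c1 rests on a single episode
index.link("c2", "ep1")  # c2 has two independent sources
index.link("c2", "ep2")
```

In this sketch, deleting `ep1` orphans `c1` but not `c2`, which still has `ep2` behind it. What happens to an orphaned claim (invalidate, flag, or re-verify) is a separate policy decision.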

Layer 2: Content (“What do I know?”)

The structured graph of what the Mind represents. Nodes represent self, beliefs, goals, and skills. A skill node is the Mind’s belief about a capability (what it can do, how well, and when to use it), not the executable capability itself; the tool or procedure it refers to lives in the memory and tooling layer, and the Mind holds only the evaluated belief about it. Edges capture how concepts relate: supports, contradicts, supersedes, part_of, relates_to. Content nodes carry vector representations alongside their symbolic structure, so the system recognizes when two differently-worded beliefs point to the same underlying concept.
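
The edge vocabulary above can be sketched as a typed edge list. The five edge names come from the text; the enum, the tuple layout, and the claim identifiers are assumptions for this example.

```python
from enum import Enum


class EdgeType(Enum):
    SUPPORTS = "supports"
    CONTRADICTS = "contradicts"
    SUPERSEDES = "supersedes"
    PART_OF = "part_of"
    RELATES_TO = "relates_to"


# (source node, edge type, target node); node ids are illustrative.
edges = [
    ("origin_high_touch", EdgeType.CONTRADICTS, "origin_self_serve"),
    ("origin_high_touch", EdgeType.SUPERSEDES, "origin_self_serve"),
    ("onboarding_session", EdgeType.SUPPORTS, "origin_high_touch"),
]


def targets(edges, source, edge_type):
    """Return every node that `source` points at through `edge_type`."""
    return [dst for src, kind, dst in edges if src == source and kind is edge_type]
```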

A subtle but important commitment: belief attaches to claims, not entities. Entities (“Dan”) live in a shared world layer so different minds can refer to the same thing. What a mind is confident about, contradicts, reinforces, or forgets is the claim: “Dan prefers async communication.” The claim exists once in the shared world; each mind carries its own evaluation of it, so two minds can hold opposite confidence without either rewriting it [Live].

Layer 3: Evaluation (“How sure am I?”)

This layer is structurally separate from Content [Live]. That is the defining design choice, and the one no existing AI memory system makes as its primary unit. Every piece of knowledge lives in two linked but independent structures. Content nodes hold what the mind knows. Evaluations hold how it judges what it knows: how sure it is, how much the belief currently matters, how fast it should fade, whether the user has confirmed it, and how the mind feels about it, expressed as an affective charge (valence, arousal, protection) that travels with the belief [Live]. The target is for that affect to actually drive behavior: the Mind weights what it cares about higher, decays what it doesn’t, and surfaces urgent contradictions first [Planned · Phase 6].

The two layers change on different schedules, just as they do in the brain. A belief’s content is immutable history; its evaluation churns continuously. New evidence revises the evaluation without rewriting the underlying belief. The audit trail is preserved. The mind can change without rewriting its past.
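
One way to picture "the evaluation churns while the content stays fixed" is an append-only revision list. This is a sketch under assumed names (`EvaluationHistory`, `revise`, `current`), not a description of how thinqOS stores revisions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EvaluationRevision:
    confidence: float
    reason: str


@dataclass
class EvaluationHistory:
    claim_text: str  # the content is never edited after creation
    revisions: list = field(default_factory=list)

    def revise(self, confidence, reason):
        # Append instead of overwriting, so earlier judgments stay auditable.
        self.revisions.append(EvaluationRevision(confidence, reason))

    @property
    def current(self):
        return self.revisions[-1]


history = EvaluationHistory("Dan prefers async communication.")
history.revise(0.9, "stated directly in a planning conversation")
history.revise(0.6, "recent comment about urgent calls partially contradicts it")
```

The current judgment is the last revision; everything before it remains available as the audit trail.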

Layer 4: Executive Engine (“What matters right now?”)

This is what makes the system think. On every interaction, the attention engine embeds the current context, queries the cognitive graph, and retrieves the most relevant nodes. The retrieval score combines four signals: relevance to the current query, importance from the evaluation layer, confidence from the evaluation layer, and recency weighted by the node’s individual decay curve. Beliefs tied to an open goal are elevated even when their raw relevance is moderate [Live].
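
The text names the four signals but not how they are combined. The sketch below assumes a simple product, an exponential half-life for recency, and a fixed multiplier for goal-linked beliefs. All three choices, and the default `goal_boost` value, are illustrative assumptions.

```python
def recency(age_days: float, half_life_days: float) -> float:
    """Exponential decay: 1.0 when fresh, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def score(relevance, importance, confidence, age_days, half_life_days,
          tied_to_open_goal=False, goal_boost=1.5):
    """Combine the four signals; boost beliefs tied to an open goal."""
    base = relevance * importance * confidence * recency(age_days, half_life_days)
    return base * goal_boost if tied_to_open_goal else base


# Same belief, scored with and without an open goal attached.
plain = score(0.5, 0.8, 0.9, age_days=30, half_life_days=30)
boosted = score(0.5, 0.8, 0.9, age_days=30, half_life_days=30, tied_to_open_goal=True)
```

A product means any signal near zero suppresses the node; a weighted sum would behave differently, and the choice between them is a design decision this sketch does not settle.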

Retrieval is not passive. Every retrieval is a write. Accessing a node reinforces it, increasing its importance and pushing back its decay [Live]. The cost is real, and we accept it because reinforcement is what separates a Mind from a query cache.
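
A minimal sketch of a read path that also writes, assuming a fixed importance bump with a cap and a per-node activation counter standing in for an activation log. The `Node` fields and the `bump` value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    text: str
    importance: float
    last_reinforced_day: int
    activations: int = 0


def retrieve(nodes, matches, today, bump=0.1):
    """Return matching nodes and reinforce each one as a side effect."""
    hits = [n for n in nodes if matches(n)]
    for n in hits:
        n.importance = min(1.0, n.importance + bump)  # cap limits runaway importance
        n.last_reinforced_day = today                 # pushes decay back
        n.activations += 1                            # activation log reduced to a counter
    return hits


nodes = [
    Node("Dan prefers async communication.", 0.5, last_reinforced_day=0),
    Node("Dan mentioned a delayed flight.", 0.5, last_reinforced_day=0),
]
hits = retrieve(nodes, lambda n: "async" in n.text, today=10)
```

Only the retrieved node changes; the other keeps its old importance and timestamp, which is what later lets decay separate the two.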

Contradiction surfacing detects when two beliefs occupy the same conceptual slot and asks the user to choose [Live]. Unconfirmed conflicts auto-resolve by recency. Confirmed beliefs require deliberate override.
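
The resolution rule can be sketched as a small function: same-slot beliefs conflict, unconfirmed conflicts go to the newer belief, and any confirmed belief forces a user decision. The `slot` string, the `created_day` field, and the `None` return convention are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Belief:
    text: str
    slot: str          # conceptual slot, e.g. "origin.go_to_market"
    created_day: int
    confirmed: bool = False


def resolve(a: Belief, b: Belief) -> Optional[Belief]:
    """Return the winning belief, or None when the user must decide."""
    if a.slot != b.slot:
        raise ValueError("beliefs in different slots do not conflict")
    if a.confirmed or b.confirmed:
        return None  # confirmed beliefs need a deliberate override
    return a if a.created_day >= b.created_day else b  # newest unconfirmed wins


may = Belief("Origin is positioned as self-serve", "origin.go_to_market", created_day=1)
june = Belief("Origin requires high-touch onboarding", "origin.go_to_market", created_day=30)
winner = resolve(may, june)
```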

Background consolidation is the continuous loop that reduces importance on unreinforced nodes and ages stale ones out of active cognition [Live]. The endpoint is cross-episode compression: co-activated beliefs strengthen the edges between them, and episodes are distilled into durable semantic knowledge, so the substrate gets denser and more useful over time [In progress · Phase 1].

6. The Continuous Loop

The four layers describe what’s there. The loop is what makes them run. Extract, attend, consolidate, decay are not four features layered on top of a memory store. They are the cycle that keeps the Mind current with no human in the curation path.

Extract

Every new native episode (a thread message, a document upload) is parsed into structured nodes, and each node records the episode it came from [Live]. Extraction runs on every interaction, not on demand, with no curation step. This is the loop’s only entry point, and it carries most of the engineering risk: a Mind is exactly as good as its extraction.

Extraction also distinguishes first-party self-statements from third-party claims about other speakers. Your own stated facts about yourself land as confirmed beliefs in the responder’s Mind about you. What someone else says about you carries source evidence and credibility weighting instead, and is never promoted to held-true without your own confirmation. This is how hearsay enters cognition without becoming gossip [In progress · Phase 2].
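
A sketch of that distinction, assuming extraction already knows who spoke and who the statement is about. The `classify` function, the field names, and the default credibility value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExtractedBelief:
    subject: str        # who the statement is about
    speaker: str        # who said it
    text: str
    confirmed: bool
    source_weight: float


def classify(subject, speaker, text, speaker_credibility=0.5):
    """Confirm first-party statements; keep third-party ones as attributed hearsay."""
    if speaker == subject:
        return ExtractedBelief(subject, speaker, text, True, 1.0)
    # Third-party claim: attributed and weighted, never auto-confirmed.
    return ExtractedBelief(subject, speaker, text, False, speaker_credibility)


own = classify("dan", "dan", "I prefer async communication.")
hearsay = classify("dan", "alex", "Dan hates meetings.")
```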

Attend

The Executive Engine scores and retrieves on every interaction [Live]. Scoring elevates what matters now; suppression hides what doesn’t. The score itself is a small fact about the world that gets logged: every retrieval writes an activation row, which is what makes reinforcement possible.

Consolidate

Reinforced nodes stay important; unreinforced ones decay [Live]. The designed endpoint is cross-episode compression: co-activated beliefs strengthen the edges between them, episodes are distilled into the durable semantic knowledge they implied, identity-level facts get promoted, and transient mentions get demoted [In progress · Phase 1].

Decay

Forgetting is on purpose. Every node carries a decay rate set by metacognition. Identity-level facts decay slowly. Transient mentions fade fast. When importance falls far enough, a node is archived: out of active cognition but still on record, not deleted [Live]. Decay is what makes the loop a loop rather than a one-way accumulator.
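
A minimal sketch of a decay pass, assuming importance halves once per node-specific half-life and that nodes below a fixed threshold are archived rather than removed. The half-lives, the threshold, and the example beliefs are illustrative values only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    text: str
    importance: float
    half_life_days: float
    archived: bool = False


def decay_pass(nodes, days_elapsed, archive_below=0.1):
    """Decay each active node's importance and archive those that fall too low."""
    for n in nodes:
        if n.archived:
            continue
        n.importance *= 0.5 ** (days_elapsed / n.half_life_days)
        if n.importance < archive_below:
            n.archived = True  # out of active cognition, still on record


identity = Node("Dan prefers async communication.", importance=0.8, half_life_days=365)
transient = Node("Dan mentioned a delayed flight.", importance=0.8, half_life_days=7)
nodes = [identity, transient]
decay_pass(nodes, days_elapsed=28)
```

After 28 days the transient mention has passed four half-lives and is archived, while the slow-decaying belief is barely affected. Both nodes remain in the list.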

The cycle is self-sustaining. Attention scores a node, which reinforces it, which keeps it salient through the next consolidation pass. As long as a node continues to matter, the loop carries it. As soon as it stops mattering, the loop lets it go.

7. A Concrete Example

In May, you tell the system: “Origin is meant to be self-serve. The agency work is just keeping the lights on.” Extraction creates a claim: Origin is positioned as self-serve, linked back to the conversation where it appeared. Its evaluation starts at moderate confidence and high importance.

In June, a harvested Claude Code session shows something different: the integration you shipped only works if a human onboards each customer for the first two weeks. Extraction creates another claim: Origin requires high-touch onboarding for the first 14 days.

These two claims occupy the same conceptual slot: Origin’s go-to-market posture. A plain memory system might silently let the newer fact win. A cognitive runtime surfaces the contradiction:

You said Origin was self-serve in May, but the implementation shipped in June requires high-touch onboarding. Which is true now?

You resolve it: the May framing was aspirational; the June reality is what shipped. The system supersedes the May belief, preserves it as history, and updates the active belief. Confidence on the June belief goes up. Importance on the May belief drops.

A week later, you ask an assistant about Origin’s pricing strategy. The retrieval engine surfaces both the active belief and three related beliefs that strengthened during the contradiction resolution, including one about your bias toward concrete validation over speculative positioning. The assistant doesn’t just know your current strategy. It knows how you arrived at it.

That is cognition. Memory stores both facts. Cognition notices that they cannot both guide the next decision.

8. How this differs from existing memory products

This distinction needs care. Existing products are not doing nothing.

OpenAI documents saved memories, reference chat history, Temporary Chat, memory sources, and project-only memory. Anthropic documents Claude memory, chat search, project summaries, and memory import/export. Google documents Gemini imports for memory and chat history from other AI platforms. mem0 supports entity-scoped memory through identifiers such as user, agent, app, and run. Letta supports stateful agents, memory blocks, archival memory, and shared memory.

Those are real capabilities. The difference is not “they have no memory.”

The difference is the unit of architecture.

Most memory systems help a product remember. A cognitive runtime helps an identity think across products.

| Capability | Product memory | Cognitive runtime |
| --- | --- | --- |
| Stores useful facts | Yes | Yes |
| Personalizes future responses | Yes | Yes |
| Scopes memory by project, user, app, run, or agent | Often | Yes |
| Separates claim from per-Mind evaluation | Usually not the primary unit | Yes |
| Treats retrieval as reinforcement | Not typically first-class | Yes |
| Surfaces contradictions for resolution | Sometimes partially | Yes, as substrate |
| Maintains source evidence per belief | Product-dependent | Yes |
| Lets identities carry state across tools | Rare | Yes, through harvest |
| Gives agents durable Minds | Some frameworks, partially | Yes |
| Gives the user a runtime in conversation | Generally no | Paper 3 |

The claim is narrower and stronger than “nobody has memory.” Memory is becoming a feature inside products. Cognition has to become a layer beneath them.

9. Adjacent patterns: LLM wikis and structured knowledge bases

One pattern worth naming directly is Andrej Karpathy’s recent LLM Wiki. Use Obsidian or a markdown repository as the human-readable surface. Keep raw sources immutable. Let an LLM maintain a structured wiki on top of them. Files like AGENTS.md, index.md, and log.md guide ingestion, querying, and maintenance. Variations using Readwise, NotebookLM, or other tools share the same shape.

That pattern is useful and directionally correct. It recognizes that raw retrieval is not enough. Knowledge should be compiled, cross-linked, updated, and made inspectable. In that sense, an LLM Wiki is much closer to cognition than a flat vector store.

But it is still a maintained artifact, not a runtime.

A wiki page can say what is known. A Mind also tracks who believes it, how strongly, from what source, in what audience, with what decay curve, under which goal pressure, and against which contradictions. A wiki can be maintained by an agent. A Mind is the state the agent maintains and reasons from.

The two architectures are complementary. An Obsidian vault, Readwise archive, or a Karpathy-style LLM Wiki can become a harvest source for a cognitive runtime: episodes flow in, claims get extracted, evaluations get attached, source history gets preserved.

LLM wikis compile knowledge into a readable artifact. Cognitive runtimes maintain cognition as runtime state.

A wiki can be a source, a projection, or a view inside the system. It is not the whole Mind.

10. Minds are for agents too

Minds are per identity, not per human.

That matters because agents are becoming durable participants. A research agent needs to remember source credibility patterns. A finance agent needs to remember budget constraints, risk thresholds, and prior forecasts. A code-review agent needs to remember architectural decisions and recurring failure modes.

If agents are just prompt roles inside the caller’s context, their disagreement is theatrical: one model pretending to be several perspectives.

If agents have Minds, disagreement becomes real in the product sense: separate histories, separate evaluations, separate confidence levels, separate source evidence, separate goals.

Three agents can hear the same room event and draw different conclusions because each carries a different Mind. That is not workflow orchestration. It is cognitive plurality.

In thinqOS, multi-owner agents are live: agents can be co-owned, and ownership is represented structurally rather than as an after-the-fact policy [Live].

Paper 3 covers the conversation layer above this: what happens when a user’s Mind and an agent’s Mind meet in a single room, with both sides choosing what to disclose.

11. Harvest: cognition across tools

Most of the thinking that matters happens outside any single AI product.

A person uses ChatGPT for strategy, Claude Code for implementation, a coding assistant in an IDE, Slack for decisions, docs for planning, and transcripts for commitments.

If a cognitive runtime only captures what happens inside its own chat surface, it misses most of the cognition it is supposed to model.

Harvest ingests episodes from outside the product. A reference adapter for Claude Code is live in thinqOS [Live]. It turns external AI activity into episodes that enter the same substrate as native interactions.

Contradictions often happen across tools. You state a product strategy in a chat. You ship code in Claude Code that quietly contradicts it. A per-tool memory layer cannot see that conflict. A cognitive runtime beneath the tools can. The substrate beneath that capability is live; cross-source contradiction surfacing itself is active phase-1 work [In progress · Phase 1].

That is the point of harvest. Not another chat memory. A layer that grows from every place you think.

12. Trust, privacy, and inspection

A cognitive runtime is powerful because it stores beliefs about people and agents over time. That is also why trust cannot be bolted on later.

The product needs three trust surfaces:

  1. Inspection: the user can see what the system believes, where it came from, and how confident it is.
  2. Correction: the user can correct, supersede, narrow, widen, delete, or dispute remembered state.
  3. Scope: beliefs are used only where they are permitted to be used.
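
The scope surface can be sketched as a default-deny filter applied when context is assembled. `ScopedBelief`, `assemble_context`, and the audience labels are assumptions for this example, not the product's permission model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedBelief:
    text: str
    audiences: frozenset  # contexts permitted to use this belief


def assemble_context(beliefs, audience):
    """Keep only beliefs whose scope explicitly includes the requesting audience."""
    return [b.text for b in beliefs if audience in b.audiences]


beliefs = [
    ScopedBelief("Dan prefers async communication.", frozenset({"work"})),
    ScopedBelief("Dan is planning a family trip.", frozenset({"personal"})),
]
work_context = assemble_context(beliefs, "work")
```

An audience that was never granted access gets nothing back, which is the behavior a wrong-scope retrieval test would check for.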

The honest status today:

| Trust capability | Status |
| --- | --- |
| Source evidence on claims | Live |
| Identity-based permissions | Live |
| Responder-scoped assembly | Live |
| Mind graph inspection | Live |
| Durable event log | In progress · Phase 1 |
| Deletion cascade from source to derived belief | In progress · Phase 1 |
| Explicit disclosed-to audience sets | Planned · Phase 2 |
| First-party vs third-party extraction distinction | Planned · Phase 2 |
| Hearsay attribution and dispute UI | Planned · Phase 3 |
| GDPR-style forget cascade | Planned · Phase 3 |

This is the right boundary for the public claim. The substrate already supports inspection and scoped assembly. The harder privacy mechanics, especially source deletion and hearsay dispute, are still being sequenced.

A system like this should not ask for trust by sounding confident. It should earn trust by making its state inspectable.

13. How this should be evaluated

A cognitive runtime should not be judged by whether it sounds smarter in one chat. It should be judged by whether the Mind improves over time.

Minimum evaluations:

| Evaluation | Question |
| --- | --- |
| Extraction precision and recall | Does the system capture supported beliefs without inventing unsupported ones? |
| Contradiction precision and recall | Does it surface real conflicts without overwhelming the user with noise? |
| Wrong-scope retrieval rate | Does sensitive or local information stay out of the wrong room? |
| Correction propagation reliability | When the user corrects state, does future behavior actually change? |
| Retrieval latency | Can the system assemble relevant cognitive context fast enough to be useful? |
| Reinforcement drift rate | Does repeated retrieval strengthen the right beliefs rather than creating runaway importance? |
| Deletion cascade reliability | Do deleted sources invalidate dependent beliefs? |
| User trust and correction frequency | Do users understand and improve the Mind rather than work around it? |

Those numbers are not included here because they need measured eval sets, red-team runs, and production instrumentation. They will be published separately once gathered.
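
For readers unfamiliar with the first two rows, precision and recall over a labeled set can be computed as below. The belief labels are made-up toy data to show the arithmetic; they are not measurements from thinqOS.

```python
def precision_recall(predicted: set, expected: set):
    """Precision: share of extracted beliefs that are supported.
    Recall: share of supported beliefs that were extracted."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Toy labeled episode: three supported beliefs, three extracted, two in common.
expected = {"prefers async", "origin is self-serve", "wants calls for incidents"}
predicted = {"prefers async", "origin is self-serve", "likes jazz"}
precision, recall = precision_recall(predicted, expected)
```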

14. What it Trades

A cognitive runtime makes different tradeoffs from a memory feature.

Cold start is worse than stateless. A Mind compounds. Day one has less to work with than day 100. The system carries cognitive overhead before the cognitive asset matures.

Every retrieval can carry a write. Reinforcement means the read path updates state. That is more expensive than a query cache. It is also how the system learns what continues to matter.

Extraction quality is the main failure surface. A Mind is only as good as what extraction captures, rejects, and structures. Bad extraction creates bad cognition. This needs eval sets, correction UI, red-team testing, and replayability.

Privacy is product complexity, not a settings page. Audience scope, deletion cascade, source evidence, inspection, correction, export, and dispute all add complexity. The alternative is opaque memory, which is not acceptable for a cognitive layer.

Continuity has to be defeatable. Some conversations should not compound. Some users prefer statelessness in specific moments. Clean-slate controls are not optional [In progress · Phase 1].

These are not edge cases. They are the cost of treating memory as cognition rather than storage.

15. Why Now

Three years ago, the layer beneath AI had little to plug into. Chat clients were closed surfaces. Coding agents were opaque IDE plugins. There was no common expectation that AI tools would emit structured sessions, hooks, traces, or context events.

Now the surface area exists.

AI work happens across tools, and those tools increasingly expose enough structure to harvest from. Agents are no longer one-turn toys. They work across projects, files, repositories, workflows, and teams.

When AI was advisory, statelessness was tolerable.

When AI writes the code, drafts the email, runs the deploy, touches the customer, or coordinates with other agents, statelessness becomes a liability.

The surface to harvest from exists. The lifespan to compound across exists. The cost of the alternative is no longer tolerable.

Three years ago, this was impractical. Three years from now, someone will have built it.

The question is who.

16. Memory Is the Beginning, Not the End

The AI industry is right to care about memory. Memory is real, necessary, and useful.

But memory is a ceiling if you treat it as the destination.

The actual destination is cognition: a system that does not just remember, but attends, evaluates, detects contradiction, reinforces, decays, prioritizes, and evolves.

thinqOS is that architecture as a shipping runtime: a persistent, graph-native Mind for every identity, human or agent, with a running cognitive loop and a harvest contract that reaches into every tool you already use. For the people who use AI, it means your systems finally understand your situation. For the people who build agents, it means your agents finally have minds of their own.

The memory hype is pointing at the right problem from below. The work is to point at it from above.

This paper sits in a growing series of working papers. The paper on the conversation layer that runs on top of this substrate, “Two minds, one room,” describes what AI chat looks like when both sides have Minds, both sides reason, and both sides choose what to disclose. “From memory to cognition” explains why memory is only the middle stage, and why cognition is the layer that makes AI stay oriented. See the full series at /science.

External reference notes

These references keep the competitive framing fair. Existing products do have meaningful memory and scoping capabilities. The distinction here is identity-level cognition across products, not the absence of memory elsewhere.