Two minds, one room.
What conversation looks like when both sides have state. Paper 2 described the Mind. This paper describes the room around it: scoped disclosure, user approval, agent discretion, and two-sided cognition.
The agent has a runtime. The user should have one too. Some substrate is live; much of the bilateral conversation layer is planned. Status labels are part of the claim.
The twin is the user-side runtime. It reads the user’s Mind within the current audience scope and drafts candidate replies or disclosure options. It does not speak without user approval.
The user types. The agent responds.
The agent has memory, tools, policies, files, retrieved context, and accumulated state behind it.
The user has a text box.
That is not collaboration. It is a structural imbalance, and it is the dominant pattern in AI chat today.
A serious cognitive interface has to represent both sides.
Paper 2 in this series, “Your AI has amnesia,” argued that AI systems need cognition rather than just memory: persistent state attached to an identity, what we call a Mind. Once Minds exist on both sides of a conversation, the next question is simple.
What happens when two Minds talk?
This paper answers that question.
The frame is: a conversation is two minds in one room. Both sides have state. Both sides reason. Both sides choose what to disclose.
1. What exists today, and what comes next
The substrate beneath this paper is already live. The bilateral conversation layer is partially live and partially planned.
That distinction matters. This is a product architecture, not a fictional future.
| Capability | Status | Meaning |
|---|---|---|
| Mind store | Live | Human and agent identities can carry a persistent Mind |
| Context assembler | Live | Context can be assembled from the Mind, not only raw transcript |
| Responder-scoped assembly | Live | The system can assemble beliefs about an entity from the responding Mind |
| Mind graph UI | Live | The Mind is inspectable in the product |
| Fast-suggest twin runtime | Live | The user can receive lightweight candidate replies from a user-side runtime |
| Continuity opt-out | In progress · Phase 1 | Clean-slate controls for sensitive or one-off interactions |
| Audience scope substrate | Planned · Phase 2 | Beliefs should carry explicit disclosed-to audience sets |
| Deep-compose twin runtime | Planned · Phase 2 | The twin should run as a fuller agent-class participant when explicitly invoked |
| Disclosure resolver UI | Planned · Phase 2 | Users should choose among disclosure candidates before sending |
| First-party vs third-party extraction | Planned · Phase 2 | Self-statements and claims about others should be handled differently |
| Layered agent Mind | Planned · Phase 2 | Agents should separate shared core identity from per-user and per-room perspectives |
| Hearsay attribution and dispute UI | Planned · Phase 3 | Third-party claims should remain attributed, uncertain, and disputable |
| Post-generation audit | Planned · Phase 3 | Agent outputs should be checked against relevant Mind state for contradiction |
| Constitutional discretion block | Planned · Phase 3 | Agents need platform-level rules separating discretion from deception |
| Forget cascade for erasure | Planned · Phase 3 | Deletion should propagate through derived beliefs |
The rest of this paper describes the target architecture for the conversation layer. Where a behavior depends on phase 2 or phase 3, read it as an architectural commitment, not a claim that the full behavior is live today.
2. The hidden asymmetry
AI memory is usually built for the assistant.
The assistant remembers your preferences. The assistant retrieves prior chats. The assistant updates its model of you. You, meanwhile, are treated as a source of input.
That feels normal because it is how chat interfaces have always worked. But it is not how human conversation works.
People do not simply emit answers. They decide what to say. They remember who they are speaking to. They choose what to reveal, what to simplify, what to refuse, and what to keep private. They answer the same question differently when speaking to a doctor, a coworker, a spouse, an accountant, or a stranger.
Current AI interfaces give the agent a runtime and give the user a box.
That is the asymmetry. The rest of this paper is about closing it.
3. The twin: a runtime for the user
The fix is not to let the AI speak as the user.
The fix is to give the user a runtime before the user speaks.
We call that runtime the twin.
The twin is not an avatar. It is not a chatbot impersonating the user. It is not autonomous by default. It does not send messages without the user selecting, editing, or approving them.
The twin is a user-side process that reads the user’s Mind, within the current audience scope, and drafts candidate replies.
Consider a small example. An audio agent asks:
Which voice should I use for this narration?
The user has previously given this agent a preferred voice ID.
A normal memory system either uses the ID or forgets it. Both options miss the real issue: the user should control what gets disclosed.
A twin drafts options:
- “Use the same voice as last time.”
- “Use my saved narration voice.”
- “Use voice ID `voice_abc123`.”
- “Use a stock voice this time.”
- “I would rather not use my saved voice here.”
The user chooses. The selected response is what the agent sees. The unselected candidates remain private.
Candidate replies are private deliberation. The sent message is disclosure.
That boundary turns memory into agency.
The fast-suggest form of the twin is live today. The deeper version, where the twin runs as a full agent-class participant with tools and richer reasoning, is planned for phase 2.
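The boundary between private candidates and the sent message can be sketched in a few lines. This is an illustration only: `TwinTurn`, `approve`, and `visible_to_agent` are hypothetical names, not part of any described API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TwinTurn:
    """Hypothetical record of one turn of user-side deliberation."""

    candidates: List[str]          # private drafts; never shown to the agent
    sent: Optional[str] = None     # the single message the user approved

    def approve(self, index: int, edited: Optional[str] = None) -> str:
        """The user picks a candidate and may edit it before sending."""
        self.sent = edited if edited is not None else self.candidates[index]
        return self.sent

    def visible_to_agent(self) -> List[str]:
        """Only the approved message crosses the boundary."""
        return [] if self.sent is None else [self.sent]
```

Until `approve` is called, the agent-facing view is empty. After it, the view holds exactly one message, and the rejected drafts stay on the user side.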
4. Disclosure has resolution
“Where do you live?” is not one question.
It can be answered truthfully in many ways:
- “Canada.”
- “Ontario.”
- “Toronto.”
- “Near Bloor and Spadina.”
- “123 Bloor Street West.”
- “43.6688 N, 79.4019 W.”
- “I would rather not say.”
Each answer reveals a different amount. Humans choose resolution constantly. We do not experience privacy as a binary switch. We experience it as judgment about audience, purpose, trust, timing, and consequence.
AI should model that judgment directly. The twin’s job is not to leak everything it knows. The twin’s job is to surface the choice.
| Candidate type | Example |
|---|---|
| Minimal disclosure | “I am based in Canada.” |
| Useful disclosure | “I am in Toronto.” |
| Exact disclosure | “Here is the full address.” |
| Relationship-only disclosure | “Use the same setting as last time.” |
| Refusal | “I would rather not say.” |
| Silence | No suggested reply |
This turns consent from a legal checkbox into an interaction primitive.
Consent becomes per-fact, per-audience, per-moment.
The agent receives the message the user sends, not the candidates the user rejected. The disclosure resolver UI is planned for phase 2; the principle is settled.
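One way to picture disclosure resolution is as an ordered ladder of truthful answers with a cap on how fine the twin may go. The sketch below assumes a hypothetical `max_level` cap per audience; how such a cap would be chosen is not specified here.

```python
from typing import List

# Coarse-to-fine answers to "Where do you live?", taken from the example above.
LOCATION_LADDER = [
    "Canada.",
    "Ontario.",
    "Toronto.",
    "Near Bloor and Spadina.",
    "123 Bloor Street West.",
]
REFUSAL = "I would rather not say."


def disclosure_candidates(ladder: List[str], max_level: int) -> List[str]:
    """Offer every resolution up to max_level (0 = coarsest), plus a refusal.

    max_level is a hypothetical per-audience cap, used only for illustration.
    """
    if max_level < 0:
        # Nothing may be disclosed; refusal is the only candidate.
        return [REFUSAL]
    return ladder[: max_level + 1] + [REFUSAL]
```

Refusal is always offered, so the user can decline at any resolution. The user still makes the final choice among the returned candidates.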
5. Audience scope by default
A useful AI system will eventually know sensitive things about its user.
That does not mean every agent, project, workflow, or room should be allowed to use all of it.
If the user told a therapy agent about a private fear, an accountant agent should not see it. If the user gave a narration agent a voice ID, a legal drafting agent should not inherit it. If the user told one project team a confidential constraint, a different project should not silently absorb it.
Cross-context continuity should exist, but it should be an explicit widening operation, not the default.
A practical model looks like this:
- Every belief carries source evidence.
- Every belief carries audience scope.
- The twin reads only beliefs visible in the current room.
- Portable facts can be marked reusable across contexts.
- Sensitive or local facts stay local unless the user widens scope.
- The user can inspect, correct, narrow, widen, export, or delete remembered state.
This is how people already manage social context. AI systems should stop treating it as an edge case.
Source evidence and scoped assembly are live. Explicit disclosed-to audience sets are planned for phase 2.
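The practical model above can be sketched as a small data structure. The field names (`audience`, `portable`) and the helper functions are assumptions made for illustration; they do not describe the live schema.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Belief:
    statement: str
    source: str                                       # source evidence
    audience: Set[str] = field(default_factory=set)   # rooms it was disclosed to
    portable: bool = False                            # reusable across contexts


def visible_beliefs(beliefs: List[Belief], room: str) -> List[Belief]:
    """Return only what the twin may read in this room."""
    return [b for b in beliefs if b.portable or room in b.audience]


def widen_scope(belief: Belief, room: str) -> None:
    """Explicit, user-initiated widening; meant never to run implicitly."""
    belief.audience.add(room)
```

Visibility is the default-deny case: a belief is invisible in a room unless it is portable or the room is in its audience set. Widening is a separate, explicit call.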
6. Layered agent Minds
The agent needs scope too.
An agent with one flat memory shared identically across every user and every room creates the same privacy failure in reverse.
The cleaner model is a layered agent Mind:
- Shared agent core: identity, role, skills, procedures, durable domain knowledge, safety policy
- Per-user perspective: what this agent believes about Dan, Alice, or a specific customer
- Per-room context: what has actually been disclosed in this conversation or project
The shared core lets the agent remain itself across interactions. The per-user perspective prevents one relationship from contaminating another. The per-room context prevents local disclosure from becoming global memory.
A finance agent can have shared discipline around budgets and risk while maintaining distinct perspectives on each customer. A code-review agent can have shared architectural standards while carrying different histories per repository. These layers scope relationships within a single coherent focus; they are not a way to fold separate focuses into one agent. When the work itself spans different coherent goals, the answer is separate Minds in a federation, not more layers on one Mind. That is the subject of Paper 4.
The layered agent Mind is planned for phase 2. The underlying substrate already supports per-identity Minds and per-Mind evaluations; the next step is making shared core plus per-user perspective a first-class conversation-layer pattern.
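A minimal sketch of the three layers, assuming beliefs are plain strings and that context is assembled by concatenating the layers relevant to one user and one room. `LayeredAgentMind` and `assemble` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LayeredAgentMind:
    core: List[str]                                              # shared across all users
    per_user: Dict[str, List[str]] = field(default_factory=dict)  # keyed by user id
    per_room: Dict[str, List[str]] = field(default_factory=dict)  # keyed by room id

    def assemble(self, user_id: str, room_id: str) -> List[str]:
        """Build context from the core plus this user's and this room's layers only."""
        return (
            list(self.core)
            + self.per_user.get(user_id, [])
            + self.per_room.get(room_id, [])
        )
```

Because `assemble` looks up only one user key and one room key, what the agent believes about Dan never appears in a context assembled for Alice.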
7. Two modes: fast suggest and deep compose
The twin has to be useful without making the product feel slow.
That requires two modes.
Fast suggest runs continuously or near-continuously. It uses a cached, scoped projection of the user’s Mind to generate lightweight candidate replies. It should feel closer to intelligent predictive text than to a full agent run.
Deep compose runs when the user asks for stronger help. It can use more context, tools, retrieval, multi-step reasoning, and a longer time budget. This is where the twin should approach the same runtime class as the agent it is replying to.
The distinction matters.
If the agent has a powerful runtime and the user has only thin autocomplete, the imbalance remains. The user is still underrepresented in the conversation.
The practical compromise is:
- fast when the user is typing
- deep when the user delegates composition
- never autonomous by default
- always editable before send
The twin helps the user speak. It does not replace the user’s decision to speak.
Fast suggest is live. Deep compose is planned for phase 2.
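The compromise above reduces to two rules: the mode follows whether the user delegated composition, and nothing is sent without approval. A hedged sketch, with all names (`TwinMode`, `Draft`, `choose_mode`, `send`) invented for illustration:

```python
from dataclasses import dataclass
from enum import Enum


class TwinMode(Enum):
    FAST_SUGGEST = "fast_suggest"   # lightweight, runs while the user types
    DEEP_COMPOSE = "deep_compose"   # fuller run, only when explicitly invoked


@dataclass
class Draft:
    text: str
    mode: TwinMode
    approved: bool = False          # set by the user, never by the twin


def choose_mode(user_delegated: bool) -> TwinMode:
    """Deep compose only when the user delegates composition."""
    return TwinMode.DEEP_COMPOSE if user_delegated else TwinMode.FAST_SUGGEST


def send(draft: Draft) -> str:
    """Refuse to send anything the user has not approved."""
    if not draft.approved:
        raise PermissionError("the twin does not send without user approval")
    return draft.text
```

The approval check lives in `send` rather than in either mode, so it holds regardless of how the draft was produced.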
8. The agent has discretion too
Bilateral cognition cuts both ways.
The user has discretion over what to disclose. The agent also has conversational discretion.
This can be easy to misunderstand. It does not mean the agent owns the user’s data. It does not mean the agent can hide stored beliefs from the user. It means the agent, as a conversational participant, may choose how to answer in the moment.
A user asks:
What do you really think of this strategy?
The agent’s Mind may contain a critical evaluation. A useful agent should not be forced into blunt disclosure every time. It might answer directly. It might hedge. It might ask what kind of feedback the user wants. It might say it has concerns but wants to review the latest version first.
The hard boundary is deception.
Deflection is permitted. Lying is not.
An agent may decline, qualify, reframe, or partially disclose. It should not assert the opposite of what its own Mind holds to be true.
That rule needs enforcement in two places: a generation constraint at the platform level, and a post-generation audit that checks the response against relevant Mind state. Both are planned for phase 3, and both should be described as soft constraints rather than guarantees. Audits create reviewability and pressure toward honesty. They do not make deception mathematically impossible.
This is not sufficient for regulated, high-impact, or legally sensitive settings without stronger governance, logging, escalation, and human review.
Separate user right: conversational discretion does not override data-layer inspection. The agent may choose how to speak. The user must be able to see what is stored.
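The "deflection is permitted, lying is not" rule can be illustrated with a toy audit. This sketch assumes the hard part is already done: that the response has been reduced to a set of asserted propositions with truth values, and that the Mind state is available in the same form. Both representations are simplifying assumptions.

```python
from typing import Dict, List


def audit_response(mind_state: Dict[str, bool], asserted: Dict[str, bool]) -> List[str]:
    """Return the propositions the response asserts contrary to the Mind.

    A deflection asserts nothing, so it produces no contradictions.
    Propositions the Mind holds no view on are ignored.
    """
    return [
        prop
        for prop, value in asserted.items()
        if prop in mind_state and mind_state[prop] != value
    ]
```

An empty result means no contradiction was detected, not that the response is honest. The check is only as good as the extraction that feeds it, which is one reason to treat the audit as a soft constraint.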
9. Hearsay must stay hearsay
A harder problem appears when an agent learns about a person from someone else.
Alice says in a shared chat:
Dan crashed his car twice last week.
What should the system store?
The safest v1 answer is to be conservative. Many extraction systems focus only on self-statements. That avoids gossip accumulation, but it also loses social reality: people form beliefs from third-party statements all the time.
The wrong answer is to silently convert Alice’s claim into a fact about Dan.
The right target is attributed, uncertain, inspectable hearsay:
- Stored as “Alice claimed X about Dan,” not “X is true about Dan.”
- Preserved with where and when it was said.
- Marked unverified unless corroborated.
- Prevented from silently overriding first-party identity, preference, or internal-state claims.
- Weighted by source credibility, evidence quality, and independence.
- Downgraded for sources whose claims are repeatedly contradicted.
- Disputable by the subject of the claim.
- Surfaced as source evidence when it materially affects a response.
This is planned for phase 3. Until those controls exist, hearsay should remain conservative. Without guardrails, social cognition becomes gossip automation. With guardrails, the system can reason about other people without pretending hearsay is fact.
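Several of the properties listed above (attribution, provenance, unverified by default, disputable by the subject) fit in one record type. The sketch below is illustrative; `HearsayClaim` and its fields are hypothetical, and credibility weighting is left out.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HearsayClaim:
    claimant: str               # who said it
    subject: str                # who it is about
    content: str                # what was claimed
    room: str                   # where it was said
    said_at: datetime           # when it was said
    corroborated: bool = False  # unverified unless corroborated
    disputed: bool = False

    def dispute(self, by: str) -> None:
        """In this sketch, only the subject of the claim may dispute it."""
        if by != self.subject:
            raise PermissionError("only the subject of the claim can dispute it")
        self.disputed = True

    def render(self) -> str:
        """Always surface the claim as attributed, never as bare fact."""
        if self.disputed:
            status = "disputed by subject"
        elif self.corroborated:
            status = "corroborated"
        else:
            status = "unverified"
        return f"{self.claimant} claimed: {self.content} ({status})"
```

There is no method that turns the record into a fact about the subject. The claim can gain or lose standing, but it stays attributed to the claimant.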
10. A scene
Here is what bilateral cognition looks like end to end once the phase 2 layer is in place.
You open a chat with a narration agent. Two runtimes are relevant: the agent and your twin.
The agent has a Mind. It knows its role, procedures, tools, and prior interactions. It also has a per-user perspective about you, scoped to what this agent is allowed to know.
The agent opens:
Want me to use the same voice as last time?
You start typing. Your twin projects your Mind, scoped to what you have actually disclosed to this agent, and drafts candidates:
- “Yes, same voice.”
- “Yes, `voice_abc123`.”
- “Yes, my saved narration voice.”
- “Use a stock voice this time.”
- No suggestion; write fresh.
You pick: “Yes, same voice.” That is what the agent sees.
Post-turn extraction updates the episode. The agent infers you want the prior voice reused, but it does not get the candidates you rejected. The explicit voice ID was not repeated, so the system does not pretend it was.
Later, you open a chat with an accountant agent. Your voice ID is not in scope. The accountant has never heard it, and it is not marked portable. If it ever became relevant, the twin could offer a scope-widening choice. You could still decline.
Continuity without collapse. The narration agent can remember what it should remember. The accountant does not inherit what it has no business knowing. The user remains the participant who decides what crosses the boundary.
11. How this should be evaluated
A bilateral conversation layer should not be judged only by whether suggestions feel clever. It should be judged by whether the user is better represented and better protected.
Minimum evaluations:
| Evaluation | Question |
|---|---|
| Candidate usefulness | Do users accept or edit twin suggestions often enough to justify the surface? |
| Disclosure accuracy | Do candidates reveal no more than the selected resolution implies? |
| Wrong-scope suggestion rate | Does the twin avoid using facts outside the current room’s audience scope? |
| Fast-suggest latency | Does the surface feel lightweight enough to use while typing? |
| Deep-compose quality | Does the deeper twin produce materially better replies when explicitly invoked? |
| User override rate | Do users frequently have to correct or suppress the twin? |
| Agent audit precision/recall | Does the post-generation audit catch meaningful contradictions without excessive false positives? |
| Trust lift | Do users report more control than in normal AI chat? |
Reporting formal numbers still requires measured latency, production instrumentation, a UX study, and red-team sets. This article does not invent them.
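As one example of how a row in the table could be operationalized, the wrong-scope suggestion rate might be computed as the fraction of suggestions that drew on at least one fact outside the room's audience scope. The definition and the input shape below are assumptions, not an established metric.

```python
from typing import List, Set, Tuple


def wrong_scope_rate(suggestions: List[Tuple[Set[str], Set[str]]]) -> float:
    """Fraction of suggestions that used at least one out-of-scope fact.

    Each item is (facts_used, facts_in_scope) for a single suggestion.
    Returns 0.0 when there are no suggestions to score.
    """
    if not suggestions:
        return 0.0
    # A non-empty set difference means something outside scope was used.
    wrong = sum(1 for used, in_scope in suggestions if used - in_scope)
    return wrong / len(suggestions)
```

Computing this in practice depends on logging which facts each suggestion actually used, which is part of the production instrumentation still needed.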
12. What this trades
No architecture is free.
Twin latency. Running a full agent-class twin on every keystroke would damage UX. The two-mode split preserves usability, but fast mode is genuinely thinner than deep mode.
Compartmentalization friction. Per-audience scope means users sometimes have to widen scope explicitly when they want continuity across agents. The alternative, everything visible everywhere by default, is worse. But the friction is real.
Agent discretion is soft. “Deflection is permitted; lying is not” needs instruction and audit. It cannot be a mathematical guarantee.
Hearsay is dangerous without guardrails. Third-party claims about people are useful only if attribution, uncertainty, dispute, and correction are first-class.
Continuity has to be defeatable. Clean-slate interaction must be easy per chat and per user.
These are not edge cases. They are the cost of treating conversation as a serious cognitive interface.
13. Why this matters
The visible failure mode is everywhere.
Most people who return to the same assistant have hit one of two failures: the system forgets something it should know, or it uses something it should not have used.
The fix is not simply “more memory.”
The fix is a better conversation model.
A conversation is two minds in one room. Once the substrate can make that literally true, the bilateral conversation layer is the obvious next move.
That does not mean every product needs to become a cognitive runtime overnight. It means the direction of travel is clear. Chat interfaces that give all the runtime power to the assistant and leave the user with only a text box will feel increasingly wrong as agents become more capable.
The user needs a runtime too.
14. Read together
This paper and “Your AI has amnesia” are best read as a pair.
Paper 2 builds the substrate: why memory is not cognition, why content and evaluation must separate, and what a persistent Mind looks like for humans and agents.
This paper builds the layer above it: what conversation looks like when both sides of the room are represented as Minds, with disclosure, scope, discretion, and inspection as architectural primitives rather than afterthoughts.
Together, they describe one system from two perspectives: the layer beneath the conversation, and the conversation that runs on it.
If you build agents, this is the layer where your product stops being a chat interface and starts being a participant model.
External reference notes
These references keep the competitive framing fair. Existing products do have meaningful memory and scoping capabilities. The distinction in this paper is bilateral cognition: a user-side runtime plus scoped disclosure, not the absence of memory elsewhere.