$ cat ai-engineering/multi-agent-shared-memory.md

Multi-Agent Shared Memory

Shared memory across LLM agents: scoping rules, consistency models, blackboard vs shared-block vs cross-thread store patterns, and the split-brain bugs.

Jatin Bansal@blog:~/ai-engineering$ open multi-agent-shared-memory

A research crew runs three agents in parallel: a web-search specialist, an internal-docs specialist, and a synthesis agent. Each maintains its own working-memory scratchpad and its own episodic store. The first two finish in 90 seconds; the synthesizer takes their outputs and produces a clean report in another 30. Total wall-clock: two minutes. Two days later the same three agents run an overnight job: the web specialist crawls 200 pages, the docs specialist reads 80 internal RFCs, and the synthesizer is supposed to assemble a state-of-the-quarter brief. This time it takes eleven hours. The synthesizer’s report references “the customer churn data the docs agent surfaced”, except the docs agent never surfaced that; the synthesizer made it up. Somewhere in the middle of the run, the web agent silently overwrote the docs agent’s “open questions” list with its own. The bug is not in any single agent. The bug is in what they share. This article is the deep dive on that layer: multi-agent shared memory, the first place in the curriculum where concurrency becomes a first-class memory concern.

Opening bridge

Yesterday’s piece on cross-session identity closed the single-user side of the memory subtree — durable profiles, persona switches, the deletion path, all anchored to one user being remembered by one agent over time. Today flips the axis: one memory store read and written by multiple agents at once. The multi-agent orchestration article deferred this explicitly — “shared memory across agents will get its own piece later in the memory subtree — flagged here because it’s the most common reason a working two-agent system fails when you scale to ten” — and the working-memory article named the blackboard as the multi-writer specialization of the typed-state-object substrate. Both articles handed off the consistency question. This piece picks it up.

Definition

Multi-agent shared memory is any memory store that is concurrently readable and writable by more than one agent loop, and whose contents are intended to affect the prompts those agents receive on subsequent turns. Three properties separate it from single-agent memory. First, the number of writers is greater than one — a single shared corpus that only one writer ever touches (a read-only RAG knowledge base) is a degenerate case that inherits none of the consistency problems. Second, the write timing is non-deterministic — two writes from different agents can arrive in either order and the system must handle both interleavings; the linear “one turn at a time” assumption that holds in a single agent loop does not hold here. Third, every agent’s prompt reflects the store’s current state — a stale read is not a quirk, it’s a correctness failure because the agent will reason and act on the stale view.

What multi-agent shared memory is not. It is not multiple agents each having their own private memory (that’s just N copies of the single-agent stack). It is not message-passing between agents (that’s agent-to-agent communication — A2A, handoffs, function calls — where data flows through the call site, not through a third-party store). It is not the foundation-model’s training data, which all agents read but none can write to in production. The defining feature is the shared mutable substrate.

Intuition

The mental model that pays off is a shared mutable database with no DBA. Every database engineer’s worst Monday morning is when two services started writing to the same row without coordinating. Multi-agent shared memory is that situation by default, with the additional fun that the “services” are non-deterministic LLM-driven agents whose write decisions are stochastic, and the “rows” are unstructured text that the next reader has to interpret in a prompt.

The four concrete shapes shared memory takes in production agent systems, in increasing order of how much concurrency they actually have to handle:

1. The supervisor-mediated channel. One agent (the supervisor) owns the writes; specialist agents read but cannot write. Single-writer-multiple-reader. No concurrency.
2. The shared block. A typed slot (Letta’s memory_block, a LangGraph state field, a Postgres row) is attached to several agents. Each agent can read and write. Multiple writers; concurrency control is mandatory.
3. The cross-thread store. A namespaced key-value layer (LangGraph’s Store, a MongoDB-backed memory store, a Redis hash) that any agent can put/get against. Often used for cross-agent and cross-session memory. Multiple writers across both axes.
4. The blackboard. A typed, multi-region store with multiple agents reading and writing simultaneously, plus a scheduler that wakes agents based on board state. The full distributed-shared-memory model.

The trap most teams fall into is sliding from #1 to #4 without noticing — what starts as “the supervisor writes the plan, the workers read it” silently grows into “the workers also write their findings back, and another worker reads those findings, and now we have a free-for-all on the same store.” The single-writer assumption that made the first version work is gone; nobody updated the design.

The distributed-systems parallel

Three analogies. Each illuminates a different failure mode.

Shared memory across agents is shared memory across processes. The Unix shared-memory primitives (shmget, mmap) give multiple processes access to the same region of physical RAM. The throughput is high; the safety is nonexistent without explicit synchronization. Every concurrency hazard known to OS textbooks — torn reads, lost updates, ABA, race-to-commit — applies. The reason these primitives are dangerous in normal programs is the same reason they’re dangerous for agents: the substrate is fast and simple, but the semantics force every consumer to bring its own coordination. Frameworks like Letta give you the shared block; the coordination discipline (when to insert, when to replace, who owns “heavy edits”) is the part you have to design.
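The lost-update hazard is easiest to see when the interleaving is replayed by hand. The sketch below is illustrative only: the store is a plain dict, the agent names and question strings are made up, and the "concurrency" is a fixed ordering of steps rather than real threads.

```python
def lost_update_demo() -> list[str]:
    """Replay one unlucky interleaving of two read-modify-write cycles."""
    store = {"open_questions": ["Why did churn rise?"]}

    # Both agents read the list before either one writes.
    docs_view = list(store["open_questions"])
    web_view = list(store["open_questions"])

    # Each agent edits its own private copy.
    docs_view.append("Is the migration finished?")
    web_view.append("Did a competitor cut prices?")

    # Each agent writes the whole list back. The second write is built
    # from a stale read, so it silently discards the docs agent's entry.
    store["open_questions"] = docs_view
    store["open_questions"] = web_view
    return store["open_questions"]


if __name__ == "__main__":
    print(lost_update_demo())
```

Neither agent did anything wrong in isolation; the loss comes entirely from writing back a whole value computed from a stale read.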

Multi-agent shared memory is a multi-master replication problem. Postgres logical replication, Cassandra’s tunable consistency, DynamoDB’s global tables — every multi-master system in production has to answer four questions: when does a write become visible to other readers (read-your-writes vs strong vs eventual), how are concurrent conflicting writes resolved (last-writer-wins, CRDT merge, custom conflict resolver), what happens during a network partition (CAP — either some agents see a stale view or some can’t write), and how is the audit trail preserved (event log, version vector, hybrid logical clock). All four show up in multi-agent shared memory. The October 2025 CodeCRDT paper ported the CRDT answer directly to LLM agents — observation-driven coordination over a shared document with strong eventual consistency — and reported 100% convergence with zero merge failures across 600 trials. The price was 5–10% semantic conflicts (the document converged textually but the agents’ intents disagreed) and a 21% speedup on tasks with clean parallelism versus a 39% slowdown on tasks where the agents were really sharing more than the substrate could mediate. The numbers are the point: CRDT-class consistency solves the mechanical problem of converging on a document; it does not solve the semantic problem of converging on a goal.
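To make the conflict-resolution options concrete, here is a minimal sketch contrasting last-writer-wins with a grow-only set, one of the simplest CRDTs. It is not the CodeCRDT implementation; the function names, timestamps, and question strings are assumptions chosen for illustration.

```python
Timestamped = tuple[float, frozenset[str]]


def lww_merge(a: Timestamped, b: Timestamped) -> Timestamped:
    """Last-writer-wins: keep the newer (timestamp, value) pair, drop the other."""
    return a if a[0] >= b[0] else b


def gset_merge(a: frozenset[str], b: frozenset[str]) -> frozenset[str]:
    """Grow-only set: merge is set union, so no concurrent addition is lost."""
    return a | b


# Two agents add an open question concurrently (hypothetical data).
docs_write: Timestamped = (1.0, frozenset({"Is churn seasonal?"}))
web_write: Timestamped = (2.0, frozenset({"Which competitors cut prices?"}))

if __name__ == "__main__":
    print("LWW keeps:", sorted(lww_merge(docs_write, web_write)[1]))
    print("G-Set keeps:", sorted(gset_merge(docs_write[1], web_write[1])))
```

Union is commutative, associative, and idempotent, which is why replicas converge regardless of delivery order. Nothing in the merge checks whether the two questions contradict each other, which is the semantic gap the paragraph above describes.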

The blackboard architecture is a distributed publish-subscribe broker. The 1970s Hearsay-II system (a global typed store, multiple knowledge sources reading and writing concurrently, a scheduler activating the next source based on board state) reads much like a modern message broker with topic filtering and consumer groups. The 2025 LLM blackboard papers replay the pattern with LLMs as the knowledge sources. The architecture resembles what Kafka or NATS implement; the consumers are stochastic and the messages are unstructured prose, which is what makes the consistency story harder than it looks at the API surface.

Pattern 1: The supervisor-mediated channel

Single-writer-multiple-reader. The supervisor owns the canonical state — a plan, an open-questions list, a synthesized brief — and updates it as worker responses come in. Workers read it as part of their input prompt but never write to it directly; their findings flow back through the handoff payload, and the supervisor decides what to commit.

This is the safest pattern. There are no concurrent writes, so there are no consistency questions. Every observation about distributed databases that follows simplifies away — the supervisor is the database, the workers are stateless clients, and the worst that can happen is a stale read in a worker (the supervisor updated the plan after the worker received its input) which the supervisor can detect on next handoff and re-dispatch.
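A minimal sketch of the single-writer shape, assuming an in-process supervisor. The class and function names (SupervisorChannel, worker) are hypothetical and not taken from any framework; the point is only that workers receive a read-only snapshot plus the version it was taken at, so the supervisor can spot a stale handoff.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass
class SupervisorChannel:
    """Single-writer state: only the supervisor commits; workers get snapshots."""
    _state: dict = field(default_factory=dict)
    _version: int = 0

    def snapshot(self):
        # Workers get a read-only copy and the version it reflects.
        return self._version, MappingProxyType(dict(self._state))

    def commit(self, key: str, value: str) -> int:
        # Called only from the supervisor loop, so writes never race.
        self._state[key] = value
        self._version += 1
        return self._version

    def is_stale(self, seen_version: int) -> bool:
        # True if the state moved after the worker's snapshot was taken.
        return seen_version != self._version


def worker(name: str, seen_version: int, view) -> dict:
    # Hypothetical worker: findings travel in the handoff payload, never the store.
    return {"worker": name, "seen_version": seen_version,
            "finding": f"{name} worked from plan: {view['plan']}"}


if __name__ == "__main__":
    channel = SupervisorChannel()
    channel.commit("plan", "survey vendors")
    version, view = channel.snapshot()
    handoff = worker("web_researcher", version, view)
    channel.commit("plan", "survey vendors, then pricing")  # plan moves mid-flight
    if channel.is_stale(handoff["seen_version"]):
        print("stale handoff; re-dispatch", handoff["worker"])
```

The version check is the whole consistency story here: because there is one writer, a stale read is detectable with a single integer comparison.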

The cost: the supervisor is a bottleneck and a context inflater. Every write goes through its loop; every write pays a model call to decide how to commit; the supervisor’s own conversation history grows as it sees every worker’s findings. The 15× token tax from the multi-agent orchestration article is partly an artifact of this — the supervisor pays for the union of all sub-tasks because it’s the only writer.

When this is right: small specialist counts (≤5), tasks where the supervisor’s per-step decision quality matters more than parallel write throughput, anywhere the audit story has to be clean. When this is wrong: highly parallel breadth-first tasks where a hundred subagents are emitting findings in parallel and routing every one through a single supervisor’s context window is a fork-bomb on the supervisor’s prompt rather than on its child processes.

Pattern 2: The shared block

The supervisor’s monopoly is broken. Two or more agents each have read and write access to the same typed slot. Concurrency control becomes mandatory.

This is the pattern Letta’s shared memory blocks productionize. A block is created once (client.blocks.create(...)) and attached to multiple agents via block_ids; each attached agent can read or write to it through three different operations with different safety properties. The published memory-operations table is worth summarizing because the asymmetry it documents is exactly the design lesson:

memory_insert: append-only, multi-writer safe. Multiple agents can call it simultaneously without conflicts; the block grows monotonically. This is the operation you reach for when you don’t know whether you’ll have multi-agent contention.
memory_replace: match-then-replace, mostly safe. Fails if the target string has changed since it was last read. This is optimistic concurrency control, the same idea as a version-checked UPDATE in Postgres (UPDATE ... WHERE version = expected) or Git rejecting a non-fast-forward push.
memory_rethink: last-writer-wins, unsafe under contention. The full block is rewritten with no coordination. Letta’s guidance: “Designate one agent (or sleeptime) as the ‘owner’ for heavy edits. Other agents append via memory_insert.”

The pattern that falls out is a tier-based access policy. Most agents are allowed memory_insert only; one designated owner (often a sleep-time consolidation agent) gets memory_replace and memory_rethink and runs when no other agent is actively writing. The shape is a classic single-writer compaction over a multi-writer log — exactly the Kafka log-compaction model from the reflection article, specialized to a shared working-memory block.
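The tiered policy can be sketched without any framework. The class below is an in-memory stand-in, not Letta's API: SharedBlock, BlockPolicyError, and the method names are assumptions that mirror the three operations described above.

```python
import threading


class BlockPolicyError(PermissionError):
    """Raised when an agent attempts an operation its tier does not allow."""


class SharedBlock:
    """In-memory stand-in for a shared block with per-agent operation tiers."""

    def __init__(self, owner: str):
        self.owner = owner          # the one agent allowed heavy edits
        self.value = ""
        self._lock = threading.Lock()

    def insert(self, agent: str, text: str) -> None:
        # Any attached agent may append; appends never overwrite each other.
        with self._lock:
            self.value = f"{self.value}\n{text}" if self.value else text

    def replace(self, agent: str, old: str, new: str) -> None:
        # Owner only; fails if the target string is no longer present.
        self._require_owner(agent, "replace")
        with self._lock:
            if old not in self.value:
                raise ValueError("target string changed since it was read")
            self.value = self.value.replace(old, new, 1)

    def rethink(self, agent: str, new_value: str) -> None:
        # Owner only; rewrites the whole block (last-writer-wins).
        self._require_owner(agent, "rethink")
        with self._lock:
            self.value = new_value

    def _require_owner(self, agent: str, op: str) -> None:
        if agent != self.owner:
            raise BlockPolicyError(f"{agent} may not {op}; only {self.owner} can")
```

Restricting the unsafe operations to one owner turns the block back into a single-writer structure for everything except appends, which is what makes the compaction step safe to run.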

CrewAI’s unified memory model makes a different design choice. Memory is shared by default across all agents in a crew, with hierarchical scopes (/project/alpha, /agent/researcher) carving private views out of the shared whole. Where Letta gives you tight control over which agents share which block, CrewAI gives you a global namespace with read-time scope filters. Both choices are defensible; the practical difference is that CrewAI’s default is “everyone sees everything” with opt-in privacy, and Letta’s default is “nothing is shared” with opt-in sharing — and the consistency problems show up at different stages of the design.

Pattern 3: The cross-thread store

The store-level shared memory. LangGraph’s Store API — backed by Postgres, Redis, MongoDB, or an in-memory implementation for development — exposes a namespaced key-value layer that any thread (any agent, any session) can put, get, or search against. The Store sits below the per-thread checkpointer; checkpointer state is private to one thread, Store state is shared across all of them.

The semantics are deliberately bare: put((namespace,), key, value) writes a row; get((namespace,), key) reads it; search((namespace,), query=...) does semantic retrieval if the namespace was configured with an embedding model. The concurrency model is whatever the backend implements — Postgres-backed stores get serializable transactions if you ask for them, Redis-backed stores get last-writer-wins by default, the in-memory dev store is racy. The store does not enforce consistency; it inherits the consistency of its backend.

The mental model that pays off: the Store is a shared blackboard with a key-value interface and no scheduler. Every agent can read and write; the namespace is the only scoping mechanism; nothing wakes an agent when a key it cares about changes. This is the cost of the API surface being small: coordination is up to the agents and the harness above them. Production deployments add the missing pieces: a notification mechanism on top of the Store (Redis pub/sub, Postgres LISTEN/NOTIFY), a versioning column for optimistic concurrency, and an audit-log table that captures every write for compliance and debugging.

Pattern 4: The blackboard

The full distributed-shared-memory model. A typed store with multiple writers, multiple readers, and a scheduler that activates agents based on the store’s current state. The Hearsay-II revival in 2025 (the Lu & Sasaki paper and the follow-up information-discovery paper, reporting 13–57% relative gains over baselines on data-science discovery) is the modern reference, and the architecture is the right one to reach for when:

The agent count is genuinely large (10+ specialists).
Which agent runs next is data-dependent and not predictable in advance.
Agents need to react to other agents’ writes, not just to the user or the supervisor.

The shape: a typed board with regions (hypotheses, evidence, open_questions, conclusions), a write log (every write is timestamped, attributed, and versioned), a scheduler that scans the board and dispatches the next agent whose preconditions match the current state, and explicit conflict-resolution rules on each region (hypotheses is append-only; conclusions is single-writer; open_questions is multi-writer with CRDT-style set-union semantics).
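The board-plus-scheduler shape can be sketched in a few dozen lines. Everything here is illustrative: the knowledge sources are plain functions standing in for LLM agents, the preconditions and the SINGLE_WRITER rule are invented for the demo, and the scheduler picks the first ready source rather than ranking candidates.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

SINGLE_WRITER = {"conclusions": "synthesizer"}  # hypothetical ownership rule


@dataclass
class Board:
    hypotheses: list = field(default_factory=list)     # append-only
    evidence: list = field(default_factory=list)       # append-only
    open_questions: set = field(default_factory=set)   # set-union merge
    conclusions: list = field(default_factory=list)    # single writer
    log: list = field(default_factory=list)            # (seq, ts, agent, region)

    def write(self, agent: str, region: str, item: str) -> None:
        owner = SINGLE_WRITER.get(region)
        if owner is not None and agent != owner:
            raise PermissionError(f"{region} is single-writer ({owner})")
        target = getattr(self, region)
        if isinstance(target, set):
            target.add(item)       # duplicate writes merge cleanly
        else:
            target.append(item)    # appends never overwrite
        self.log.append((len(self.log) + 1, time.time(), agent, region))


@dataclass
class KnowledgeSource:
    name: str
    precondition: Callable[[Board], bool]
    act: Callable[[Board], None]


def demo_sources() -> list:
    return [
        KnowledgeSource("hypothesizer",
                        lambda b: bool(b.open_questions) and not b.hypotheses,
                        lambda b: b.write("hypothesizer", "hypotheses", "pricing drives churn")),
        KnowledgeSource("gatherer",
                        lambda b: len(b.evidence) < len(b.hypotheses),
                        lambda b: b.write("gatherer", "evidence",
                                          "evidence for: " + b.hypotheses[len(b.evidence)])),
        KnowledgeSource("synthesizer",
                        lambda b: bool(b.evidence) and not b.conclusions,
                        lambda b: b.write("synthesizer", "conclusions",
                                          f"{len(b.evidence)} hypothesis supported")),
    ]


def run(board: Board, sources: list, max_steps: int = 10) -> list:
    """Dispatch the first source whose precondition matches, until none do."""
    order = []
    for _ in range(max_steps):
        ready = next((s for s in sources if s.precondition(board)), None)
        if ready is None:
            break
        ready.act(board)
        order.append(ready.name)
    return order
```

No source names another source. Each one declares what board state it needs, and the execution order falls out of the data, which is the property that lets the pattern scale past what one supervisor prompt can route.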

The blackboard is what you build when no single supervisor’s prompt is wide enough to route the whole task. The supervisor pattern’s bottleneck — the lead agent’s context window growing linearly with task complexity — is solved by making the board itself the coordination surface; agents look at the board to decide what to do, the supervisor (if there is one) just observes. The cost is that everything subtle about distributed shared memory now has to be designed, not avoided.

The four consistency questions

Whichever pattern you pick beyond #1, four questions have to be answered. The answers are how you turn “shared memory” from a vibes-based design into something an on-call engineer can debug at 3am.

1. When does a write become visible? Read-your-writes (an agent’s own writes are immediately visible to itself) is table stakes. Cross-agent visibility splits into strong consistency (every other agent sees the write on its next read) and eventual consistency (other agents may see a stale view for some bounded period). According to Letta’s docs, shared blocks are strongly consistent within a single Letta server, because each block lives in a single Postgres row and the database serializes writes to it. LangGraph’s Store-with-Redis is eventual by default; Store-with-Postgres is strong if you ask for it. The right choice is workload-dependent: a fan-out research crew where agents work on independent regions of a problem can tolerate eventual visibility; a multi-agent code-edit crew where agents are touching the same file cannot.

2. How are concurrent conflicting writes resolved? Three doctrines, escalating in safety. Last-writer-wins — the simplest, and the one most home-grown shared-memory systems start with — is unsafe whenever the lost write contained information that wasn’t in the winner. Optimistic concurrency (compare-and-swap, the model Letta’s memory_replace and Git’s non-fast-forward push use) catches conflicts at write time and forces the loser to retry against the updated view; the cost is the retry, the benefit is no silent data loss. CRDT-style merge (the CodeCRDT approach) defines a deterministic merge operator on the data type so that any two concurrent writes can be combined without conflict; the cost is the data type has to be designed for CRDT semantics (sets, counters, Y.js-style sequence types — not free-form prose), the benefit is the system never blocks on a conflict.

3. What is the audit trail? Every shared-memory write needs who, when, what, and (sometimes) why. The “who” makes attribution possible; the “when” enables temporal queries; the “what” is the payload; the “why” is the optional reasoning the agent emitted with the write. Skipping this turns every shared-memory bug into a forensics nightmare — the report mentions a fact that wasn’t in any agent’s individual output, and there’s no way to find which write introduced it. The cheapest implementation is an append-only audit log on the side of the store; the most expensive is a full event-sourced reconstruction (the store is the event log, with materialized views for the current state). The middle ground — a store plus an audit table — is where most production systems land.

4. What is the deletion path? When a write is wrong, who can take it back, and what cascading invalidation does that require? The memory-conflict article covered the supersession-vs-deletion distinction for single-agent stores; in multi-agent, the answer is harder because other agents may already have read the wrong write and acted on it. The mature pattern is to surface the deletion as an event on the store rather than a silent state mutation — every reader can observe “the previous value at key K was wrong, here’s the corrected one” and choose whether to re-run its own logic against the new view. The Kafka-tombstone parallel is direct: deletion is just another write with special semantics, recorded in the same log as everything else.
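Questions 3 and 4 share one mechanism: an append-only log where a retraction is just another event. The sketch below is an assumption-laden illustration (EventStore, the field names, and the sample values are all hypothetical). It records who read which write, so a retraction can report which readers acted on the bad value.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    seq: int                      # position in the log
    actor: str                    # who
    ts: float                     # when
    key: str
    value: str | None             # what (None marks a tombstone)
    reason: str                   # why
    retracts: int | None = None   # seq of the write a tombstone takes back


@dataclass
class EventStore:
    log: list = field(default_factory=list)
    reads: dict = field(default_factory=dict)   # seq -> set of reader names

    def write(self, actor: str, key: str, value: str, reason: str = "") -> int:
        seq = len(self.log) + 1
        self.log.append(Event(seq, actor, time.time(), key, value, reason))
        return seq

    def _live(self) -> list:
        # Writes that have not been retracted, in log order.
        retracted = {e.retracts for e in self.log if e.retracts is not None}
        return [e for e in self.log if e.value is not None and e.seq not in retracted]

    def read(self, reader: str, key: str) -> str | None:
        for e in reversed(self._live()):
            if e.key == key:
                self.reads.setdefault(e.seq, set()).add(reader)  # remember who saw it
                return e.value
        return None

    def retract(self, actor: str, seq: int, reason: str) -> set:
        """Log a tombstone for write `seq`; return readers that consumed it."""
        bad = self.log[seq - 1]
        self.log.append(Event(len(self.log) + 1, actor, time.time(),
                              bad.key, None, reason, retracts=seq))
        return set(self.reads.get(seq, set()))

    def materialize(self) -> dict:
        # Current state is a fold over live writes; later events win.
        return {e.key: e.value for e in self._live()}

    def who_wrote(self, needle: str) -> list:
        # Forensics: which writes introduced this text?
        return [e for e in self.log if e.value and needle in e.value]
```

A caller would use the set returned by retract to decide which agents to re-run. The tombstone itself stays in the log, so the correction is as auditable as the original mistake.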

Mechanics: scope and the namespace hierarchy

Whatever pattern you pick, the namespacing of the shared memory is what makes it tractable at scale. The naïve “one global store everyone reads and writes” is the worst case — every write contends with every other write, and the persona-leak production incident from the identity article generalizes from cross-persona to cross-tenant to cross-agent leakage.

Three scoping axes that every production design has to pick a position on:

Per-task scope. Each multi-agent task gets its own namespace; nothing leaks between tasks. The default for blackboard architectures. The cost is no cross-task learning — useful patterns from one task don’t show up in the next unless explicitly promoted to a higher-scoped store.
Per-tenant scope. All tasks for one organization share a namespace; different organizations are strictly isolated. The default for SaaS deployments. The cost is some agents that should learn across the tenant (the support agent learning from the engineering agent’s resolutions) need explicit cross-agent visibility within the tenant.
Per-agent-role scope. All researchers across all tasks share a “researcher” namespace; all synthesizers share a “synthesizer” namespace. Useful for procedural memory — every researcher learns successful research strategies that future researchers can retrieve. The cost is that the role-level memory can carry biases from one task into another (a researcher who became confident about Acme Corp on Monday surfaces those beliefs as facts on a Wednesday task about a different company).

Most production systems run all three axes simultaneously, with explicit promotion rules between them. The LangMem-style namespace hierarchy (("tenant", tenant_id, "task", task_id) for hot scratchpads, ("tenant", tenant_id, "role", role) for role memories, ("system", "shared") for cross-tenant prior art that’s been reviewed) is the architectural pattern; the exact backing store is incidental.
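The tuple hierarchy and its promotion rules can be written down directly. This sketch assumes a plain dict as the store and invents one plausible rule set (task to role within a tenant, anything to the system namespace only after review); real promotion policies will differ.

```python
Namespace = tuple[str, ...]

SYSTEM_SHARED: Namespace = ("system", "shared")


def task_ns(tenant_id: str, task_id: str) -> Namespace:
    return ("tenant", tenant_id, "task", task_id)


def role_ns(tenant_id: str, role: str) -> Namespace:
    return ("tenant", tenant_id, "role", role)


def readable_namespaces(tenant_id: str, task_id: str, role: str) -> list:
    """Namespaces one agent may read, from hottest scratchpad to shared prior art."""
    return [task_ns(tenant_id, task_id), role_ns(tenant_id, role), SYSTEM_SHARED]


def can_promote(src: Namespace, dst: Namespace, reviewed: bool) -> bool:
    # Hypothetical policy: cross-tenant promotion requires human review.
    if dst == SYSTEM_SHARED:
        return reviewed
    # Within a tenant, only task -> role promotion is allowed.
    return (len(src) == 4 and len(dst) == 4
            and src[:2] == dst[:2]          # same tenant
            and src[2] == "task" and dst[2] == "role")


def promote(store: dict, src: Namespace, dst: Namespace, key: str,
            reviewed: bool = False) -> None:
    """Copy one entry to a wider scope if the policy allows it."""
    if not can_promote(src, dst, reviewed):
        raise PermissionError(f"promotion {src} -> {dst} not allowed")
    store[(dst, key)] = store[(src, key)]
```

Making promotion an explicit function call, rather than letting agents write to wider namespaces directly, gives one place to enforce tenant isolation and one place to log what crossed a scope boundary.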

Code: Python — versioned shared scratchpad with optimistic concurrency

A minimal but production-shaped shared scratchpad. Two agents read and write to the same Postgres-backed block; writes are versioned, conflicts are caught at write time, and the audit log is the source of truth. Uses the Anthropic SDK and psycopg. Install: pip install anthropic "psycopg[binary]" (the quotes keep shells such as zsh from expanding the brackets).

python

"""
shared_scratchpad.py

A multi-agent shared scratchpad with:
  - Append-only and CAS-style replace operations
  - Per-task namespacing
  - Audit log of every write
  - Optimistic concurrency: replace fails if the version moved
"""

from __future__ import annotations
import time
import uuid
import psycopg
import anthropic

SCHEMA = """
CREATE TABLE IF NOT EXISTS scratchpad (
    namespace TEXT NOT NULL,
    key TEXT NOT NULL,
    value TEXT NOT NULL,
    version INTEGER NOT NULL,
    updated_by TEXT NOT NULL,
    updated_at DOUBLE PRECISION NOT NULL,
    PRIMARY KEY (namespace, key)
);
CREATE TABLE IF NOT EXISTS scratchpad_audit (
    audit_id UUID PRIMARY KEY,
    namespace TEXT NOT NULL,
    key TEXT NOT NULL,
    op TEXT NOT NULL,             -- 'insert' | 'replace' | 'tombstone'
    old_value TEXT,
    new_value TEXT,
    actor TEXT NOT NULL,
    reason TEXT,                  -- the agent's why-for-this-write
    ts DOUBLE PRECISION NOT NULL,
    version INTEGER NOT NULL
);
"""


class SharedScratchpad:
    """Multi-writer scratchpad backed by Postgres with versioned writes."""

    def __init__(self, dsn: str):
        self.dsn = dsn
        with psycopg.connect(dsn, autocommit=True) as conn:
            conn.execute(SCHEMA)

    # -- append-only: multi-writer safe by construction ----------------------
    def insert(self, ns: str, key: str, value: str, actor: str, reason: str = "") -> int:
        """Append to a multi-writer region. Concatenates with newline."""
        with psycopg.connect(self.dsn, autocommit=True) as conn:
            with conn.transaction():
                # Ensure the row exists before locking it. Without this, two agents
                # racing on a brand-new key both see "no row" (FOR UPDATE locks
                # nothing) and the second INSERT fails with a unique violation.
                conn.execute(
                    "INSERT INTO scratchpad VALUES (%s, %s, '', 0, %s, %s) "
                    "ON CONFLICT (namespace, key) DO NOTHING",
                    (ns, key, actor, time.time()),
                )
                old_value, old_version = conn.execute(
                    "SELECT value, version FROM scratchpad WHERE namespace=%s AND key=%s FOR UPDATE",
                    (ns, key),
                ).fetchone()
                is_new = old_version == 0  # version 0 marks the placeholder row
                new_value = value if is_new else f"{old_value}\n{value}"
                new_version = old_version + 1
                conn.execute(
                    "UPDATE scratchpad SET value=%s, version=%s, updated_by=%s, updated_at=%s "
                    "WHERE namespace=%s AND key=%s",
                    (new_value, new_version, actor, time.time(), ns, key),
                )
                self._audit(conn, ns, key, "insert", None if is_new else old_value,
                            new_value, actor, reason, new_version)
                return new_version

    # -- replace: CAS-style; fails on version mismatch -----------------------
    def replace(self, ns: str, key: str, expected_version: int, value: str,
                actor: str, reason: str = "") -> int:
        """Replace iff version matches. Raises on conflict."""
        with psycopg.connect(self.dsn, autocommit=True) as conn:
            with conn.transaction():
                row = conn.execute(
                    "SELECT value, version FROM scratchpad WHERE namespace=%s AND key=%s FOR UPDATE",
                    (ns, key),
                ).fetchone()
                if row is None:
                    raise KeyError(f"{ns}/{key} not found")
                old_value, current_version = row
                if current_version != expected_version:
                    raise ConflictError(
                        f"version moved: expected {expected_version}, got {current_version}"
                    )
                new_version = current_version + 1
                conn.execute(
                    "UPDATE scratchpad SET value=%s, version=%s, updated_by=%s, updated_at=%s "
                    "WHERE namespace=%s AND key=%s",
                    (value, new_version, actor, time.time(), ns, key),
                )
                self._audit(conn, ns, key, "replace", old_value, value,
                            actor, reason, new_version)
                return new_version

    # -- read with version --------------------------------------------------
    def read(self, ns: str, key: str) -> tuple[str | None, int]:
        with psycopg.connect(self.dsn) as conn:
            row = conn.execute(
                "SELECT value, version FROM scratchpad WHERE namespace=%s AND key=%s",
                (ns, key),
            ).fetchone()
            return (row[0], row[1]) if row else (None, 0)

    def _audit(self, conn, ns, key, op, old, new, actor, reason, version):
        conn.execute(
            "INSERT INTO scratchpad_audit VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)",
            (uuid.uuid4(), ns, key, op, old, new, actor, reason, time.time(), version),
        )


class ConflictError(Exception):
    """Raised when a CAS replace finds the version moved underneath."""


# -----------------------------------------------------------------------------
# Demo: two agents collaborating on a shared scratchpad
# -----------------------------------------------------------------------------


def agent_turn(client, scratchpad: SharedScratchpad, ns: str, agent_name: str,
               instructions: str, user_msg: str):
    """One agent turn: read the shared block, reason, write back."""
    findings_value, _ = scratchpad.read(ns, "findings")
    open_q_value, open_q_version = scratchpad.read(ns, "open_questions")

    system = (
        f"{instructions}\n\n"
        f"You are {agent_name}. The shared scratchpad currently shows:\n"
        f"<findings>\n{findings_value or '(empty)'}\n</findings>\n"
        f"<open_questions version={open_q_version}>\n{open_q_value or '(empty)'}\n</open_questions>\n"
        "Emit your contribution as a short paragraph; do not duplicate existing findings."
    )
    resp = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=400,
        system=system,
        messages=[{"role": "user", "content": user_msg}],
    )
    contribution = resp.content[0].text.strip()

    # Append-only write is safe under any concurrency.
    new_version = scratchpad.insert(
        ns, "findings", f"[{agent_name}] {contribution}", agent_name,
        reason="incremental finding"
    )
    return contribution, new_version


def demo():
    client = anthropic.Anthropic()
    scratch = SharedScratchpad("postgresql://localhost/scratchpad_demo")
    ns = f"task-{uuid.uuid4().hex[:8]}"

    # Seed an open question via CAS.
    try:
        scratch.replace(ns, "open_questions", expected_version=0,
                        value="What are the top 3 production risks for the Q3 launch?",
                        actor="supervisor", reason="initial question")
    except KeyError:
        scratch.insert(ns, "open_questions",
                       "What are the top 3 production risks for the Q3 launch?",
                       actor="supervisor", reason="initial question")

    # Two specialists run in sequence, each appending findings.
    for agent, instr in [
        ("web_researcher", "You are a web research specialist."),
        ("db_researcher", "You are an internal-data specialist."),
    ]:
        contribution, v = agent_turn(
            client, scratch, ns, agent, instr,
            "Contribute one risk you can support with evidence."
        )
        print(f"{agent} wrote v{v}: {contribution[:80]}...")

    # Demonstrate CAS conflict: two agents try to replace the same key concurrently.
    _, v = scratch.read(ns, "open_questions")
    scratch.replace(ns, "open_questions", expected_version=v,
                    value="Updated by agent A", actor="agent_A", reason="first revision")
    try:
        scratch.replace(ns, "open_questions", expected_version=v,
                        value="Updated by agent B", actor="agent_B", reason="second revision")
    except ConflictError as e:
        print(f"Expected conflict caught: {e}")


if __name__ == "__main__":
    demo()

Three things to notice about the shape. First, insert is append-only and safe under any concurrency — the FOR UPDATE row-lock serializes the read-modify-write, but the operation itself never rejects a write. Second, replace is CAS-style — it requires the caller to pass the expected version and rejects with ConflictError if anyone else has written in between. The caller’s recovery is to re-read, re-reason, and re-attempt; the conflict is visible rather than silent. Third, every write writes the audit log in the same transaction — there is no shared-memory mutation that isn’t reflected in scratchpad_audit. When the synthesizer later produces a report that contains a fact nobody remembers writing, the audit log is the place to look.
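The re-read, re-reason, re-attempt recovery can be sketched as a small retry helper. To keep it runnable without a database, it uses an in-memory stand-in (InMemoryPad) with a simplified read/replace interface. Unlike the Postgres version above, the stand-in lets a replace at version 0 create the key, and replace_with_retry and its revise callback are hypothetical names.

```python
from typing import Callable, Optional


class ConflictError(Exception):
    """Raised when a CAS replace finds the version moved underneath."""


class InMemoryPad:
    """Stand-in for the scratchpad: one dict, versioned values, CAS replace."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def read(self, key: str) -> tuple:
        return self._rows.get(key, (None, 0))

    def replace(self, key: str, expected_version: int, value: str) -> int:
        _, current = self.read(key)
        if current != expected_version:
            raise ConflictError(f"expected v{expected_version}, got v{current}")
        self._rows[key] = (value, current + 1)
        return current + 1


def replace_with_retry(pad: InMemoryPad, key: str,
                       revise: Callable[[Optional[str]], str],
                       max_attempts: int = 3) -> int:
    """Re-read, re-run `revise` on the fresh value, and re-attempt on conflict."""
    for _ in range(max_attempts):
        value, version = pad.read(key)      # fresh view on every attempt
        proposed = revise(value)            # in an agent, this is the model call
        try:
            return pad.replace(key, version, proposed)
        except ConflictError:
            continue                        # someone else wrote first; try again
    raise ConflictError(f"gave up on {key!r} after {max_attempts} attempts")
```

The important detail is that revise runs again on every attempt. Retrying the original proposed value against the new version would defeat the check, because the proposal was reasoned from a view that no longer exists. Capping the attempts matters for cost, since each retry is another model call.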

Code: TypeScript — namespaced cross-thread store with semantic visibility events

Functionally equivalent in TypeScript using the Vercel AI SDK and better-sqlite3 for portability; in production swap SQLite for the same Postgres backend as the Python version. Install: npm install ai @ai-sdk/anthropic better-sqlite3.

typescript

// shared_scratchpad.ts
import Database from "better-sqlite3";
import { generateText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { randomUUID } from "node:crypto";

class ConflictError extends Error {
  constructor(msg: string) {
    super(msg);
    this.name = "ConflictError";
  }
}

interface ScratchEntry {
  value: string;
  version: number;
}

class SharedScratchpad {
  private db: Database.Database;
  private listeners = new Set<(ns: string, key: string, version: number) => void>();

  constructor(dbPath = ":memory:") {
    this.db = new Database(dbPath);
    this.db.exec(`
      CREATE TABLE IF NOT EXISTS scratchpad (
        namespace TEXT NOT NULL,
        key TEXT NOT NULL,
        value TEXT NOT NULL,
        version INTEGER NOT NULL,
        updated_by TEXT NOT NULL,
        updated_at REAL NOT NULL,
        PRIMARY KEY (namespace, key)
      );
      CREATE TABLE IF NOT EXISTS scratchpad_audit (
        audit_id TEXT PRIMARY KEY,
        namespace TEXT NOT NULL,
        key TEXT NOT NULL,
        op TEXT NOT NULL,
        old_value TEXT,
        new_value TEXT,
        actor TEXT NOT NULL,
        reason TEXT,
        ts REAL NOT NULL,
        version INTEGER NOT NULL
      );
    `);
  }

  // Append-only, multi-writer safe.
  insert(ns: string, key: string, value: string, actor: string, reason = ""): number {
    const tx = this.db.transaction(() => {
      const row = this.db
        .prepare("SELECT value, version FROM scratchpad WHERE namespace=? AND key=?")
        .get(ns, key) as ScratchEntry | undefined;
      let newValue = value;
      let newVersion = 1;
      let oldValue: string | null = null;
      if (row) {
        oldValue = row.value;
        newValue = `${row.value}\n${value}`;
        newVersion = row.version + 1;
        this.db.prepare(
          "UPDATE scratchpad SET value=?, version=?, updated_by=?, updated_at=? WHERE namespace=? AND key=?"
        ).run(newValue, newVersion, actor, Date.now() / 1000, ns, key);
      } else {
        this.db.prepare(
          "INSERT INTO scratchpad VALUES (?, ?, ?, ?, ?, ?)"
        ).run(ns, key, newValue, newVersion, actor, Date.now() / 1000);
      }
      this.audit(ns, key, "insert", oldValue, newValue, actor, reason, newVersion);
      return newVersion;
    });
    const version = tx();
    this.notify(ns, key, version);
    return version;
  }

  // CAS-style replace.
  replace(
    ns: string, key: string, expectedVersion: number, value: string,
    actor: string, reason = ""
  ): number {
    const tx = this.db.transaction(() => {
      const row = this.db
        .prepare("SELECT value, version FROM scratchpad WHERE namespace=? AND key=?")
        .get(ns, key) as ScratchEntry | undefined;
      if (!row) throw new Error(`${ns}/${key} not found`);
      if (row.version !== expectedVersion) {
        throw new ConflictError(
          `version moved: expected ${expectedVersion}, got ${row.version}`
        );
      }
      const newVersion = row.version + 1;
      this.db.prepare(
        "UPDATE scratchpad SET value=?, version=?, updated_by=?, updated_at=? WHERE namespace=? AND key=?"
      ).run(value, newVersion, actor, Date.now() / 1000, ns, key);
      this.audit(ns, key, "replace", row.value, value, actor, reason, newVersion);
      return newVersion;
    });
    const version = tx();
    this.notify(ns, key, version);
    return version;
  }

  read(ns: string, key: string): ScratchEntry {
    const row = this.db
      .prepare("SELECT value, version FROM scratchpad WHERE namespace=? AND key=?")
      .get(ns, key) as ScratchEntry | undefined;
    return row ?? { value: "", version: 0 };
  }

  // Local pub/sub for visibility events. Production: Postgres LISTEN/NOTIFY or Redis.
  subscribe(fn: (ns: string, key: string, version: number) => void): () => void {
    this.listeners.add(fn);
    return () => this.listeners.delete(fn);
  }

  private notify(ns: string, key: string, version: number) {
    for (const fn of this.listeners) fn(ns, key, version);
  }

  private audit(
    ns: string, key: string, op: string,
    oldValue: string | null, newValue: string,
    actor: string, reason: string, version: number
  ) {
    this.db.prepare(
      "INSERT INTO scratchpad_audit VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
    ).run(
      randomUUID(), ns, key, op, oldValue, newValue, actor, reason,
      Date.now() / 1000, version
    );
  }
}

async function agentTurn(
  scratch: SharedScratchpad, ns: string,
  agentName: string, instructions: string, userMsg: string
): Promise<number> {
  const findings = scratch.read(ns, "findings");
  const openQ = scratch.read(ns, "open_questions");

  const { text } = await generateText({
    model: anthropic("claude-haiku-4-5"),
    system:
      `${instructions}\n\nYou are ${agentName}. The shared scratchpad shows:\n` +
      `<findings>\n${findings.value || "(empty)"}\n</findings>\n` +
      `<open_questions version=${openQ.version}>\n${openQ.value || "(empty)"}\n</open_questions>\n` +
      "Emit one short paragraph; do not duplicate existing findings.",
    prompt: userMsg,
  });
  return scratch.insert(
    ns, "findings", `[${agentName}] ${text.trim()}`,
    agentName, "incremental finding"
  );
}

async function demo() {
  const scratch = new SharedScratchpad();
  const ns = `task-${randomUUID().slice(0, 8)}`;

  const unsubscribe = scratch.subscribe((ns, key, v) =>
    console.log(`event: ${ns}/${key} -> v${v}`)
  );

  scratch.insert(
    ns, "open_questions",
    "What are the top 3 production risks for the Q3 launch?",
    "supervisor", "initial question"
  );

  for (const [agent, instr] of [
    ["web_researcher", "You are a web research specialist."],
    ["db_researcher", "You are an internal-data specialist."],
  ] as const) {
    const v = await agentTurn(
      scratch, ns, agent, instr,
      "Contribute one risk you can support with evidence."
    );
    console.log(`${agent} wrote v${v}`);
  }

  // CAS conflict demo.
  const { version: v0 } = scratch.read(ns, "open_questions");
  scratch.replace(ns, "open_questions", v0, "Updated by A", "agent_A", "first revision");
  try {
    scratch.replace(ns, "open_questions", v0, "Updated by B", "agent_B", "second revision");
  } catch (e) {
    if (e instanceof ConflictError) console.log(`Expected conflict caught: ${e.message}`);
  }

  unsubscribe();
}

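// Surface failures (missing API key, network errors) instead of an unhandled rejection.
demo().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});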

The TypeScript version adds an explicit visibility-event channel — every write emits a notification that subscribers can react to. Production deployments wire this to Postgres LISTEN/NOTIFY, a Redis pub/sub channel, or a server-sent-event stream so that subscribing agents can react to writes from other agents without polling. This is what turns Pattern 3 (cross-thread store) into Pattern 4 (blackboard) — the same store with a scheduler watching for writes and dispatching the right reader.

Trade-offs, failure modes, gotchas

The “we just need to share state, how hard can it be” trap. Every two-week shared-memory prototype ships with a dict in Python or a Map in TypeScript and works fine until the third agent shows up. The bug at that point is not concurrency as such; it is that nobody wrote down a consistency model. The mitigation is to name the consistency model out loud in the first design review: strong vs eventual, read-your-writes vs last-writer-wins, append-only vs CAS. The teams that ship robust multi-agent systems write this down on day one; the ones that ship outages discover it on day 60.

The phantom-fact failure mode. The synthesizer’s report cites a number that no individual agent’s output contains. Chained writes can produce intermediate states that no single agent ever saw in isolation, so a synthesizer reading the current state of the board can cite a fact that was true for only thirty seconds while two writes were in flight. The audit log is the only thing that can reconstruct what happened. The mitigation is for every prompt assembly that uses shared memory to pin to a specific version snapshot: the synthesizer reads (findings, version=k) and the rest of the run uses that snapshot, even if k has been superseded by the time the synthesizer’s call returns. The Postgres MVCC parallel is direct: every reader gets a stable snapshot, and writers don’t block readers.
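
A minimal sketch of snapshot pinning, assuming a board that keeps every version readable. VersionedBoard and Snapshot are hypothetical names for illustration; the SQLite scratchpad above keeps only the latest value, so a real implementation would read old versions from the audit table or a history table.

```typescript
interface Versioned {
  value: string;
  version: number;
}

// Hypothetical board that retains every version of every key.
class VersionedBoard {
  private history = new Map<string, Versioned[]>();

  write(key: string, value: string): number {
    const versions = this.history.get(key) ?? [];
    const version = versions.length + 1;
    versions.push({ value, version });
    this.history.set(key, versions);
    return version;
  }

  latestVersion(key: string): number {
    return this.history.get(key)?.length ?? 0;
  }

  readAt(key: string, version: number): Versioned {
    const hit = this.history.get(key)?.[version - 1];
    if (!hit) throw new Error(`${key}@v${version} not found`);
    return hit;
  }
}

// Pins each key once, up front. Every later read in the run goes through the
// pinned version, no matter what has been written since.
class Snapshot {
  private board: VersionedBoard;
  private pins = new Map<string, number>();

  constructor(board: VersionedBoard, keys: string[]) {
    this.board = board;
    for (const key of keys) this.pins.set(key, board.latestVersion(key));
  }

  read(key: string): Versioned {
    const pinned = this.pins.get(key);
    if (pinned === undefined) throw new Error(`${key} was not pinned`);
    if (pinned === 0) return { value: "", version: 0 }; // key was empty at pin time
    return this.board.readAt(key, pinned);
  }
}
```

Recording the pinned versions alongside the synthesizer’s output also makes a phantom fact traceable: the report can be checked against exactly the versions it was built from.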

The blackboard runaway. A scheduler that wakes any agent whose preconditions match the current board state can ping-pong indefinitely: agent A writes evidence that wakes agent B, which writes a hypothesis that wakes agent A again with a new finding. The same no-progress detector from the agent-loop article applies: track the last N board states (hashed) and terminate the scheduler if they’re effectively identical. The production answer in the blackboard literature is more nuanced (a coordinator that prioritizes wake events and deduplicates preconditions that recently fired), but a hard board-state-cycle detector is the floor below which the system is unsafe.
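
One way to sketch the floor-level detector, under two assumptions: the board can be projected to a flat key/value record, and "effectively identical" is approximated as "hashes equal to a state seen within the last N observations". CycleDetector, hashState, and the default window of 8 are illustrative, not taken from the agent-loop article.

```typescript
// FNV-1a over a canonical serialization; any stable hash would do here.
function hashState(state: Record<string, string>): string {
  const canonical = JSON.stringify(
    Object.keys(state).sort().map((k) => [k, state[k]])
  );
  let h = 0x811c9dc5;
  for (let i = 0; i < canonical.length; i++) {
    h ^= canonical.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

class CycleDetector {
  private recent: string[] = [];
  private windowSize: number;

  constructor(windowSize = 8) {
    this.windowSize = windowSize;
  }

  // Returns true when this state repeats one seen within the window,
  // which is the scheduler's signal to stop dispatching.
  observe(state: Record<string, string>): boolean {
    const h = hashState(state);
    const repeated = this.recent.indexOf(h) !== -1;
    this.recent.push(h);
    if (this.recent.length > this.windowSize) this.recent.shift();
    return repeated;
  }
}
```

Append-only regions such as findings change on every write and would never repeat, so the record passed to observe should cover only the regions that drive preconditions (hypotheses, open questions, status flags).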

The reader-from-a-different-namespace bug. Agent A writes to ("tenant-X", "task-42", "findings"); agent B reads from ("tenant-X", "task-43", "findings"). Different task, no overlap, and the naive implementation gets this right. The version that breaks: agent B’s prompt assembly is templated, the harness passes a partial namespace tuple (("tenant-X",) instead of ("tenant-X", "task-43")), and B’s query against the Store returns A’s findings because the partial tuple matches as a prefix. This is the persona leak generalized: wrong scope, wrong data, wrong context. The mitigation is to make the namespace a typed value, not a string: a Namespace class with required fields, validated at the harness boundary, never assembled by string concatenation. Production systems that skip this ship cross-tenant leaks; the bug is usually found by a security team six months later.
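
A minimal sketch of the typed-namespace idea. The field names (tenant, task), the allowed character set, and ScopedStore are assumptions for illustration; the property that matters is that a partial namespace cannot be constructed and that the storage key is assembled in exactly one place.

```typescript
function requirePart(name: string, value: string): string {
  if (!/^[A-Za-z0-9_-]+$/.test(value)) {
    throw new Error(`invalid namespace ${name}: "${value}"`);
  }
  return value;
}

// Both fields are required, so ("tenant-X",) on its own is not representable.
class Namespace {
  readonly tenant: string;
  readonly task: string;

  constructor(tenant: string, task: string) {
    this.tenant = requirePart("tenant", tenant);
    this.task = requirePart("task", task);
  }

  // The only place a storage key is built. JSON array encoding avoids the
  // delimiter ambiguity of string concatenation.
  storageKey(key: string): string {
    return JSON.stringify([this.tenant, this.task, key]);
  }
}

// Hypothetical store that accepts only full Namespace values and does
// exact-match lookups; there is no prefix query to misuse.
class ScopedStore {
  private rows = new Map<string, string>();

  put(ns: Namespace, key: string, value: string): void {
    this.rows.set(ns.storageKey(key), value);
  }

  get(ns: Namespace, key: string): string | undefined {
    return this.rows.get(ns.storageKey(key));
  }
}
```

If prefix queries are needed (for example, a supervisor listing every task under a tenant), they can be a separate, explicitly named method with its own permission check rather than a side effect of passing a shorter tuple.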

The “let’s share embeddings instead of text” optimization that becomes a bug. When the shared block holds a summary of findings, embedding the summary once and sharing the embedding seems efficient — readers can do similarity search without re-embedding. The bug: when the writer updates the summary, the embedding does not update unless explicitly re-embedded. Six hours later, a reader’s similarity search returns the original embedding’s match while the text has moved on. The memory-conflict article’s embedding-drift failure lands here too — the embedding is a derived view that has to be invalidated whenever the source moves. Either re-embed on every write (high cost, accurate) or treat the embedding as best-effort and re-rank with a cross-encoder at read time (cheaper, more robust).
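
A minimal sketch of treating the embedding as a derived view: it is stamped with the source version it was computed from and is considered stale whenever the two differ. This is a lazy variant (re-embed on the first read after a write), not either of the two options above, and the embed function is injected because the real embedding call is outside the scope of this sketch. All names are hypothetical.

```typescript
type Embed = (text: string) => number[];

class EmbeddedSummary {
  private text = "";
  private version = 0;
  private embedding: number[] = [];
  private embeddedVersion = -1; // version of the text the embedding was built from
  private embed: Embed;

  constructor(embed: Embed) {
    this.embed = embed;
  }

  // Writing bumps the version; the stored embedding is now stale by definition.
  write(text: string): number {
    this.text = text;
    this.version += 1;
    return this.version;
  }

  isStale(): boolean {
    return this.embeddedVersion !== this.version;
  }

  // Re-embeds only when the source has moved since the last embedding.
  getEmbedding(): number[] {
    if (this.isStale()) {
      this.embedding = this.embed(this.text);
      this.embeddedVersion = this.version;
    }
    return this.embedding;
  }
}
```

Compared with re-embedding on every write, this pays the embedding cost only for versions that are actually read, at the price of extra latency on the first read after a write.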

The CRDT-converges-mechanically-but-not-semantically failure. The CodeCRDT paper’s 5–10% semantic conflict rate is the production-quantified version of the deeper issue: two agents can independently write things that are textually compatible (the CRDT merges them cleanly) but semantically incompatible (the merged result is internally contradictory or incoherent). CRDTs solve the convergence problem at the bytes level; they don’t solve the coherence problem at the meaning level. The mitigation is a periodic coherence pass — a reflection-style agent that reads the converged document and emits a coherent: bool plus a list of internal contradictions, which the harness uses to trigger reconciliation. CRDTs are the substrate; semantic coherence is the agent’s job on top of them.

The visibility-event flood. A scheduler that subscribes to every write on the board gets flooded when an agent emits many small writes (say, an iterative refinement loop). Each event wakes the scheduler; the scheduler scans every precondition; most fire spuriously. The cost shows up as scheduler latency and prompt-cache churn. Two mitigations: batch the notifications (the writer can opt into “I’m going to write 50 things, fire once at the end”) via an explicit batch context; filter at the broker (a topic-level filter so the scheduler only subscribes to writes on regions it actually cares about). The Kafka consumer-group analogy is direct — fan-out is configurable per consumer, not just per topic.
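
The batching mitigation can be sketched as a notifier with an explicit batch scope. BatchingNotifier is a hypothetical in-process stand-in for the visibility-event channel; with a real broker the same idea applies at the publishing side.

```typescript
type Listener = (keys: string[]) => void;

class BatchingNotifier {
  private listeners = new Set<Listener>();
  private pending = new Set<string>();
  private depth = 0;

  subscribe(fn: Listener): () => void {
    this.listeners.add(fn);
    return () => {
      this.listeners.delete(fn);
    };
  }

  // Outside a batch: fire immediately. Inside a batch: remember the key only.
  notify(key: string): void {
    if (this.depth > 0) {
      this.pending.add(key);
      return;
    }
    this.emit([key]);
  }

  // Runs fn and fires one event at the end, listing each touched key once.
  // Nested batches flush only when the outermost one finishes.
  batch<T>(fn: () => T): T {
    this.depth += 1;
    try {
      return fn();
    } finally {
      this.depth -= 1;
      if (this.depth === 0 && this.pending.size > 0) {
        const keys = Array.from(this.pending);
        this.pending.clear();
        this.emit(keys);
      }
    }
  }

  private emit(keys: string[]): void {
    this.listeners.forEach((fn) => fn(keys));
  }
}
```

A refinement loop that writes 50 times inside notifier.batch(...) wakes the scheduler once, with the deduplicated list of keys it touched.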

The “all agents see everything by default” governance problem. CrewAI’s default — every agent in the crew shares the same memory — is operationally simple but creates a governance problem: every new agent added to the crew immediately has visibility into everything every other agent wrote. The memory privacy and multi-tenancy article is the deep dive on the patterns that contain this; the design lesson here is that default scope matters more than the API surface. A framework whose default is “isolated, opt-in sharing” ships fewer governance bugs than one whose default is “shared, opt-in isolation.” Neither is wrong, but the failure modes are different.

The audit log nobody reads. Every shared-memory write goes to an audit table; the audit table grows to 50 million rows in three months; nobody ever queries it because the schema is awkward and the tooling is missing. When the phantom-fact bug fires, the audit log technically has the answer but nobody can find it. The mitigation is to treat the audit log as a first-class observability surface, not a compliance afterthought — instrument it with traces (every write emits a span with the actor, the namespace, the version, the reason), wire it to the same observability pipeline the rest of the agent system uses, and assume an on-call engineer will need to reconstruct a write timeline at 3am.

The “let’s add caching on top of the shared store” performance trap. A read-heavy shared store gets a per-agent read cache to reduce backend load. The cache is per-process, the store is global, and cache invalidation is on a TTL. For most of the day, this works fine. Then two agents race: agent B’s cache holds v4 while agent A’s cache still holds the superseded v3, and agent A acts on stale data for several seconds before its TTL expires. The bug looks like model hallucination. The mitigation is cache invalidation by version, not by time: every cache entry stores the version, every read checks “is this still current?” against the store, and the cache is invalidated by the visibility-event subscription rather than by a TTL. The cost is one extra read per request; the benefit is no stale-cache hallucinations.
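
A minimal sketch of version-based invalidation, assuming the store exposes a cheap version lookup and a write-event subscription. VersionedStore and VersionedCache are hypothetical names, and the fullReads counter exists only to make the cache’s behavior observable.

```typescript
interface Entry {
  value: string;
  version: number;
}

// Hypothetical store with a cheap version check and write events.
class VersionedStore {
  private rows = new Map<string, Entry>();
  private listeners = new Set<(key: string, version: number) => void>();
  fullReads = 0; // counts expensive value reads

  write(key: string, value: string): number {
    const version = (this.rows.get(key)?.version ?? 0) + 1;
    this.rows.set(key, { value, version });
    this.listeners.forEach((fn) => fn(key, version));
    return version;
  }

  currentVersion(key: string): number {
    return this.rows.get(key)?.version ?? 0;
  }

  read(key: string): Entry {
    this.fullReads += 1;
    return this.rows.get(key) ?? { value: "", version: 0 };
  }

  subscribe(fn: (key: string, version: number) => void): void {
    this.listeners.add(fn);
  }
}

class VersionedCache {
  private store: VersionedStore;
  private entries = new Map<string, Entry>();

  constructor(store: VersionedStore) {
    this.store = store;
    // Event-driven invalidation: drop an entry as soon as a newer version lands.
    store.subscribe((key, version) => {
      const cached = this.entries.get(key);
      if (cached && cached.version < version) this.entries.delete(key);
    });
  }

  get(key: string): Entry {
    const cached = this.entries.get(key);
    // Version check on every read covers events that were missed or delayed.
    if (cached && cached.version === this.store.currentVersion(key)) return cached;
    const fresh = this.store.read(key);
    this.entries.set(key, fresh);
    return fresh;
  }
}
```

The two mechanisms are deliberately redundant: the subscription keeps the cache fresh in the common case, and the per-read version check is the backstop when an event is lost.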

The shared memory that’s secretly a single-writer pattern. A team builds an elaborate shared-block design with CAS writes, audit logs, and a CRDT layer. Six months later, the production trace shows that 97% of writes come from a single agent (the supervisor) and the other agents only read. Most of the consistency machinery is unused. The honest fix is to demote the design to Pattern 1 — the shared block becomes a supervisor-owned block, the workers read but don’t write, the CAS and CRDT code paths are removed. The general rule: if your traces show your shared memory is in fact single-writer, simplify. The cost of carrying unused consistency machinery is real — every developer who reads the code has to figure out which parts matter.
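
Checking for this is cheap if the audit log exists. A minimal sketch, assuming audit rows have been loaded into memory with an actor field as in scratchpad_audit; the dominantWriter name and the 0.95 threshold are illustrative choices, not a recommendation from the source.

```typescript
interface AuditRow {
  actor: string;
  op: string;
}

// Returns the actor responsible for at least `threshold` of all writes,
// or null if no single writer dominates.
function dominantWriter(rows: AuditRow[], threshold = 0.95): string | null {
  if (rows.length === 0) return null;
  const counts = new Map<string, number>();
  for (const row of rows) {
    counts.set(row.actor, (counts.get(row.actor) ?? 0) + 1);
  }
  for (const [actor, n] of Array.from(counts)) {
    if (n / rows.length >= threshold) return actor;
  }
  return null;
}
```

A non-null result over a representative window of production traffic is the cue to consider demoting the design to a single-writer block.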

What to read next

Multi-Agent Orchestration — the upstream piece this article extends. Orchestration covered how agents talk; this article covered how they share state. The supervisor/swarm/hierarchical patterns from that piece each have a default shared-memory pattern that follows from them.
Working Memory: Scratchpads, Blackboards, and Agent Notebooks — the single-agent substrate this article generalized. The blackboard section there was a forward reference to today’s piece; the typed-state-object and dataflow-graph patterns there are the building blocks every shared-memory implementation reuses.
Memory Conflict, Forgetting, and Embedding Drift — the single-agent precursor to the consistency questions this article asks. The contradiction-resolver mechanic, the supersession-vs-deletion distinction, and the embedding-drift failure all generalize directly to the multi-writer case; the difference is that in shared memory the contradictions arrive concurrently rather than sequentially.
Memory Privacy, Isolation, and Multi-Tenancy — the next article and the direct follow-on. Where this piece worked the concurrency questions across agents within a tenant (who writes, in what order, with what consistency), the privacy piece works the containment questions across tenants (who is even allowed to see this, how do you prove deletion, how do you defend against MINJA-class memory-injection attacks). The reader-from-a-different-namespace bug and the all-agents-see-everything default both generalize directly there.