Memory Privacy, Isolation, and Multi-Tenancy
Per-tenant memory isolation for LLM agents: namespace discipline, cross-tenant leak modes, prompt-injection-via-memory, and verifiable GDPR deletion.
The on-call page fires at 02:14. A customer-support agent at a B2B SaaS company has told a user the previous question on the account was about a different company’s product. The first reaction is “model hallucination”; the second is “let’s check the trace.” The trace is worse than the hallucination would have been. The retrieval call against the memory store returned three episodes — two from the correct account, one from a different tenant entirely. The namespace argument on the retrieval call was missing. The code path was a debug endpoint that had been quietly used in production for six weeks. The cross-tenant leak had been live the whole time; this is the first conversation where a user noticed. The bug is one missing function argument. The blast radius is every customer the agent has ever served. This article is the deep dive on the layer where that argument has to be structurally impossible to forget, not just easy to remember.
Opening bridge
Yesterday’s piece on multi-agent shared memory closed with a forward reference: “CrewAI’s default — every agent in the crew shares the same memory — is operationally simple but creates a governance problem… the memory-privacy article (next in the subtree) is the deep dive on the patterns that contain this.” That piece walked the consistency questions across agents within a tenant. Today flips axes from concurrency to containment — the question is no longer “which agent’s write wins” but “which agents and which users are even allowed to see this write in the first place.” Every memory tier we’ve built so far (episodic, semantic-via-graphs, hierarchical, procedural, the identity layer, the shared scratchpad) has had a tenant_id argument lurking in its API surface as a “don’t skip this” footnote. This article promotes that argument to a first-class architectural concern, with its own threat model, its own discipline, and its own deletion pipeline.
Definition
Multi-tenant memory isolation is the property that every read and write against a memory store is scoped to a tenant — a stable identifier for the principal whose data is being accessed — and that no read can return data outside the caller’s tenant scope, no write can land in another tenant’s namespace, no derived artifact (cache, summary, embedding, reflection) can outlive a tenant deletion, and the system can produce an audit trail demonstrating these properties on demand. Four properties are load-bearing. First, structural scoping — the tenant is a required parameter of every API surface, not a filter the caller can forget. Second, no cross-tenant retrieval — even when an attacker provides a malformed scope, the worst case must be “no results,” not “results from a different tenant.” Third, complete deletion coverage — when a tenant is deleted, every byte derived from their data is invalidated, including caches, summaries, and reflections. Fourth, verifiable audit — every read, write, and deletion is attributed, timestamped, and replayable against compliance demonstrations.
The tenant is typically a user, but not always — depending on the product surface it may be an organization, a workspace, a project, or a session. The architectural shape is the same; the granularity is a product decision. The 2026 EDPB coordinated erasure enforcement audit of 764 controllers found that many controllers had no specific procedures for erasure in backup systems, and that anonymization techniques used as a substitute for deletion were often weak and amounted to mere pseudonymization. That finding is the regulatory-reality calibration for how much of this article’s machinery is genuinely necessary versus theoretical.
Intuition
The mental model that pays off: isolation is a property of the API surface, not of the storage layer. Every memory framework in production runs on top of a datastore (Postgres, Redis, Pinecone, Qdrant) that has solid isolation primitives of its own: row-level security, key prefixes, namespaces, collections. The leaks rarely come from those primitives failing. They come from a code path that bypasses the framework’s scoped API and goes straight to the underlying client with no tenant filter, or from a default value (tenant_id=None) that the framework happily executes as “match everything.” The bytes were never co-mingled on disk; the query was wrong. The mitigation has to live where the wrong query was assembled, which is at the harness boundary, not inside the database.
Three escalating shapes the tenant scope can take, in order of how hard it is to bypass:
- Optional argument. The function accepts a tenant_id keyword. Forget it; everything still runs. This is the failure mode in the opening anecdote. Most prototypes ship this.
- Required argument. The function refuses to run without tenant_id. Forget it; you get an exception at the call site, ideally during development. Mem0’s design makes user_id structural (“at least one of user_id, agent_id, or run_id is required”), which moves the bug from “production data leak” to “type error before staging.”
- Structural scope. The function operates against an object whose existence is itself scoped: Store.namespace((tenant_id, "memories")) returns a handle that physically cannot read or write outside tenant_id. LangGraph’s Store API lands here; the namespace tuple is the operating handle, and the only way to switch tenants is to construct a different handle, which is visible in code review.
The framework you pick determines the floor; the discipline you apply determines the ceiling. A team with strong discipline can build safely on shape #1; a team with shape #3 will still leak if they cargo-cult a debug endpoint that bypasses the scoped handle.
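The three shapes can be put side by side in a few lines. This is a minimal sketch over a hypothetical in-memory row list; `ROWS`, `search_optional`, `search_required`, and `ScopedMemory` are illustrative names, not the API of Mem0 or LangGraph.

```python
from dataclasses import dataclass

# Hypothetical store: (tenant_id, text) rows shared by all tenants.
ROWS = [("acme", "renewal due in March"), ("globex", "pricing dispute")]

# Shape 1: optional argument. Forgetting tenant_id silently matches everything.
def search_optional(tenant_id=None):
    return [text for tid, text in ROWS if tenant_id is None or tid == tenant_id]

# Shape 2: required argument. Forgetting it raises at the call site.
def search_required(*, tenant_id: str):
    if not tenant_id:
        raise ValueError("tenant_id must be a non-empty string")
    return [text for tid, text in ROWS if tid == tenant_id]

# Shape 3: structural scope. The handle is bound to one tenant when it is
# constructed; its methods take no tenant argument that could be forgotten.
@dataclass(frozen=True)
class ScopedMemory:
    tenant_id: str

    def __post_init__(self):
        if not self.tenant_id:
            raise ValueError("tenant_id must be a non-empty string")

    def search(self):
        return [text for tid, text in ROWS if tid == self.tenant_id]
```

Calling `search_optional()` with no argument returns both tenants' rows, `search_required()` with no argument raises a `TypeError`, and `ScopedMemory("acme").search()` can only ever see the `acme` row.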
The distributed-systems parallel
The clean analogue is Unix process isolation and the setuid boundary. The kernel guarantees that process A cannot read process B’s memory pages without explicit IPC; the LLM-memory counterpart is that one tenant’s vectors cannot be retrieved by another tenant’s query. Both rely on the same architectural shape:
- A trusted boundary (the kernel; the harness) that owns the scope check.
- An immutable scope identifier (the UID; the tenant ID) attached to every operation.
- A default-deny posture — operations without a scope are rejected, not “matched globally.”
- A privileged path for cross-scope operations (root; the admin/support API) with much stricter audit.
The 2026 enterprise multi-tenant LLM SaaS literature reports that “cross-tenant data exfiltration and knowledge base poisoning had the highest amplification factors” among LLM-SaaS attack classes. Those are the same two attack classes the kernel addresses (cross-process memory reads; injecting into another process’s address space) — and the same set of mitigations applies (mandatory scope checks at the syscall boundary; provenance on writes; audit on privileged operations). The kernel community has had forty years to design these primitives well. The LLM-memory community is at year three. Borrowing the architectural pattern is the cheapest way to avoid reinventing it badly.
A second analogue: the multi-tenant database namespace versus the application’s query layer. Postgres has row-level security policies; tenant-aware ORMs (Django’s tenant schemas, Hasura’s permissions) push the policy enforcement into the framework so the application code can’t bypass it. The 2026 Pinecone multi-tenancy guide makes this explicit at the vector-store level: “In the serverless architecture, each namespace is stored separately, so using namespaces provides physical isolation of data between tenants/customers, which reduces the risk of application bugs that could query the wrong tenant’s data.” Physical isolation at the storage layer is the safety net; the API discipline is the structural fix.
Threat model
Privacy-and-isolation work is engineering against a specific threat model. Five attack classes show up in production traces, in roughly descending order of frequency.
1. The unscoped-query bug
The opening anecdote. A code path issues a retrieval against the memory store without a tenant filter; the store returns top-K by similarity across all tenants; the agent surfaces the foreign content as its own memory. The damage scales with how interesting the cross-tenant overlap is — a customer-support agent in a niche industry can return a direct competitor’s data because the embeddings cluster.
The mitigation is structural: the API surface refuses to run an unscoped query, the underlying client is never called directly, and every retrieval emits a span with the tenant ID attached so an observability sweep can detect “any retrieval without a tenant tag” as a hard alert.
2. The namespace-confusion attack
A more subtle version. The system is correctly scoped but the scope value comes from user-controllable input. A user under tenant A constructs a request that smuggles tenant B’s identifier into the scope — either by directly setting it (the API trusts a client-side header) or by indirection (the agent’s tool call accepts a “switch context” command that the user can trigger). The retrieval is technically correctly scoped; it’s scoped to the wrong tenant.
The mitigation is to derive the tenant scope server-side from an authenticated session, never trust a client-supplied value, and treat any cross-tenant operation as a privileged-API call with separate audit. The pattern is the same as a web app trusting X-User-Id headers — a known anti-pattern with a thirty-year track record of postmortems.
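A minimal sketch of server-side scope derivation, assuming a hypothetical session table keyed by bearer token. `SESSIONS`, `scope_from_request`, and the `X-Tenant-Id` header are illustrative; a real deployment would validate a signed session or token through its own auth layer.

```python
from dataclasses import dataclass

# Hypothetical server-side session table: bearer token -> tenant.
SESSIONS = {"tok-alice": "tenant-a"}

class AuthError(Exception):
    pass

@dataclass(frozen=True)
class TenantScope:
    tenant_id: str

def scope_from_request(headers: dict) -> TenantScope:
    """Derive the scope from the authenticated session only."""
    token = headers.get("Authorization", "").removeprefix("Bearer ")
    tenant_id = SESSIONS.get(token)
    if tenant_id is None:
        raise AuthError("no authenticated session")
    # A client-supplied X-Tenant-Id header is never read here, so it
    # cannot redirect the scope to another tenant.
    return TenantScope(tenant_id)
```

A request that authenticates as `tok-alice` while carrying `X-Tenant-Id: tenant-b` still resolves to `tenant-a`; the smuggled header has no code path into the scope.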
3. Memory injection (MINJA/MEXTRA)
The 2025 MINJA paper demonstrated that an attacker who can only interact with an agent via queries and observations can inject malicious records into the agent’s memory bank. These records are designed to elicit a sequence of malicious reasoning steps when the agent later processes a different target query — the attack persists across sessions, hidden in the memory. The companion MEXTRA paper showed that adversarial probing of the memory module can leak private user-agent interaction data. The reported injection success rate exceeds 98%; the end-to-end attack success rate exceeds 70% across GPT-4, GPT-4o, GPT-4o-mini, Gemini-2.0-Flash, and Llama-3.1-8B. Evaluated defenses — LlamaGuard, embedding-level sanitization, prompt-based detection — were ineffective.
The mitigations are layered. First, write-side authentication: every memory write is attributed to an actor (user-content vs. agent-summary vs. system-fact), and the retrieval prompt distinguishes these (the provenance layer from the temporal-reasoning article is the structural prerequisite). Second, content-class gating: user-supplied content is never directly written to a long-lived store without a distillation pass; a write-policy gate decides what becomes durable, and adversarial inputs that are clearly trying to inject memory are caught by a small classifier. Third, retrieval-side classification: at read time, a sanity check on the retrieved memories flags content that looks like instruction injection rather than fact recall, and routes it for review rather than direct use. None of these is a complete defense; together they raise the bar significantly. The MINJA result is the empirical measure of how far a query-only attacker can get against undefended memory, so assume any production system has some attack surface here and instrument accordingly.
4. Embedding-similarity leakage
Embeddings of one tenant’s content can be similar to embeddings of another tenant’s content. If the same vector index serves both tenants and the scope filter is post-retrieval (the index returns top-K globally and the filter is applied after), an attacker who controls one tenant’s content can craft inputs whose embeddings sit close to a target tenant’s sensitive embedding, then use the retrieval call’s latency or metadata leak to confirm a hit. This is the LLM-memory analogue of cache-timing attacks on cryptographic operations — the side-channel is real even when the direct read fails.
The mitigation is to push the scope filter into the index, not after it — Pinecone’s per-tenant namespaces are physically separate; pgvector’s per-tenant tables or row-level security give the same property; a single shared index with a metadata filter applied after top-K is the vulnerable shape. The 2026 Burn-After-Use paper on enterprise LLM multi-tenant architectures formalizes this: “No conversational histories or vector embeddings are shared across tenants, effectively preventing cross-department inference.”
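The difference between filtering after top-K and filtering inside the search can be shown with a brute-force toy index. This is a sketch with made-up two-dimensional vectors; `INDEX`, `post_filter`, and `pre_filter` are hypothetical names, and a real vector store would use an approximate-nearest-neighbor index rather than a full sort.

```python
import math

# Hypothetical shared index: (tenant_id, text, embedding).
INDEX = [
    ("b", "b-secret-1", (1.0, 0.0)),
    ("b", "b-secret-2", (0.9, 0.1)),
    ("a", "a-note", (0.0, 1.0)),
]

def post_filter(query, tenant_id, k):
    """Vulnerable shape: global top-K first, tenant filter afterwards."""
    top = sorted(INDEX, key=lambda row: math.dist(row[2], query))[:k]
    return [text for tid, text, _ in top if tid == tenant_id]

def pre_filter(query, tenant_id, k):
    """Safer shape: restrict to the tenant's rows, then rank."""
    own = [row for row in INDEX if row[0] == tenant_id]
    own.sort(key=lambda row: math.dist(row[2], query))
    return [text for _, text, _ in own[:k]]
```

For tenant `a` with query `(1.0, 0.0)` and `k=2`, `post_filter` returns an empty list because both global neighbors belong to tenant `b`, while `pre_filter` returns `["a-note"]`. The post-filter version has also ranked tenant `b`'s vectors on tenant `a`'s behalf, which is where the timing and metadata side channel comes from.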
5. Derived-artifact contamination
The hardest class. A tenant’s data was written, retrieved, summarized by a reflection pass, the summary was added to a semantic store, that summary was retrieved into a system-wide “common knowledge” prompt-cache prefix, and now every tenant’s prompts ship the contaminated cache. The original tenant data is correctly isolated; the derivative is not. The same shape applies to fine-tuned model variants trained on user data — the model itself is a derived artifact, and deletion of the source data doesn’t unlearn the weights.
The mitigation is the derived-artifact registry — every artifact whose content depends on tenant data is tracked with its sources, and any source-tenant deletion triggers cascading invalidation. The 2025 unlearning-at-scale literature describes the model-weights side; the application-layer side is a more tractable engineering problem of cache and summary invalidation. The honest position, per the cross-session identity article’s deletion discussion, is that some derived artifacts (deployed fine-tunes) cannot be fully scrubbed, and the product surface should be transparent about it.
Mechanics: the four invariants
A production multi-tenant memory layer maintains four invariants. Each invariant has a specific implementation pattern; together they’re the contract.
Invariant 1: Every operation carries a tenant scope
The scope is a typed value — a TenantScope class with required fields, never a bare string concatenation. The API surface for the memory layer takes TenantScope as a required argument; the underlying database driver is never exposed directly. A code review of the harness should be able to grep for db.query( or client.query( and find zero occurrences outside the scope-checking layer.
The reader-from-a-different-namespace bug from the shared-memory article is the canonical failure mode here — a templated namespace tuple, a missing component, a substring match on a partial namespace. The mitigation is the typed value plus an exact-match contract on the database layer; partial-prefix matches are explicitly disabled.
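The partial-prefix failure is easy to reproduce. A minimal sketch, assuming a hypothetical dict-backed store keyed by namespace tuples; `STORE`, `read_prefix`, and `read_exact` are illustrative names.

```python
# Hypothetical store keyed by (tenant_id, collection) tuples.
STORE = {
    ("acme", "memories"): ["acme note"],
    ("acme-eu", "memories"): ["acme-eu note"],
}

def read_prefix(tenant_id: str):
    """Vulnerable: a string-prefix match treats 'acme' as matching 'acme-eu'."""
    rows = []
    for namespace, values in STORE.items():
        if "/".join(namespace).startswith(tenant_id):
            rows.extend(values)
    return rows

def read_exact(namespace: tuple):
    """Exact-match contract: the whole tuple must be present and equal."""
    if len(namespace) != 2 or not all(namespace):
        raise ValueError("namespace must be (tenant_id, collection), both non-empty")
    return list(STORE.get(namespace, []))
```

`read_prefix("acme")` returns both tenants' notes; `read_exact(("acme", "memories"))` returns only `["acme note"]`, and a tuple with a missing component is rejected instead of being widened into a prefix.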
Invariant 2: Default deny on missing scope
If a query arrives without a scope, the system errors rather than returning anything. This is a posture choice — the safe default is “no results unless explicitly scoped” — and it’s the inverse of the “match everything” default that ships in most prototype code paths. The implementation is one line at the top of every memory function: if scope is None: raise ScopeRequiredError(...). Skipping it costs you the audit report.
Invariant 3: Every write is attributed and timestamped
Every memory write records who wrote it (user content vs. agent summary vs. system fact vs. external tool result), when, and which scope. The attribution is what lets the retrieval prompt distinguish “user said X” from “the agent inferred X” from “the system asserted X” — without it, MINJA-class attacks succeed because the model has no way to know whether a retrieved memory is trustworthy. The provenance layer from temporal reasoning is the substrate; this invariant is what makes the substrate mandatory rather than optional.
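One way to make the attribution mandatory is to put it in the record type and in the prompt renderer. A minimal sketch; `Actor`, `MemoryRecord`, and `render_for_prompt` are hypothetical names, and the bracketed prefix format is an assumption rather than a standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class Actor(Enum):
    USER = "user"
    AGENT = "agent"
    SYSTEM = "system"
    TOOL = "tool"

@dataclass(frozen=True)
class MemoryRecord:
    tenant_id: str   # which scope
    actor: Actor     # who wrote it; no default, so it cannot be omitted
    text: str
    written_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def render_for_prompt(records):
    """Render each memory with its actor and date so the model sees provenance."""
    return "\n".join(
        f"[{r.actor.value} @ {r.written_at:%Y-%m-%d}] {r.text}" for r in records
    )
```

Because `actor` has no default, a write path that forgets the attribution fails when the record is constructed, and the retrieval prompt always shows whether a line is something the user said or something the agent inferred.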
Invariant 4: Deletion is verifiable
When a tenant is deleted, the system can demonstrate the deletion. The demonstration has three parts: an audit log entry attesting to the operation, a verification query that confirms zero rows remain in any tenant-keyed table or namespace, and a derived-artifact invalidation log listing every cache, summary, or reflection that was rebuilt because it depended on the deleted tenant’s data. Production systems that ship the deletion API without the verification query cannot answer the GDPR demand “prove this user is gone”; the verification is the engineering deliverable, not the deletion itself.
The deletion pipeline
The GDPR right-to-erasure machinery is the most-asked-about part of this layer. The naive implementation is one DELETE statement; the production implementation is a pipeline with seven steps, in order:
1. Authentication and authorization. The deletion request is verified: the requester is the tenant or an authorized agent acting on their behalf. The request itself goes to an audit log before execution.
2. Scope enumeration. The system lists every store, namespace, and derived artifact that references the tenant. The list is the contract for the deletion; an incomplete list is the source of every deletion-coverage bug.
3. Cache invalidation. Every prompt-cache prefix, embedding-cache entry, and intermediate-result cache that references the tenant is dropped. The prompt-caching article’s cache-key discipline is the prerequisite: cache keys that include the tenant ID can be invalidated by key prefix; cache keys that don’t have to be wholesale-purged.
4. Hard deletion on primary stores. Every primary store row, vector, and graph node in the tenant’s namespace is deleted. This is the database operation everyone thinks of; it’s step 4 of 7.
5. Derived-artifact rebuild. Summaries, reflections, semantic facts, and procedural skills that were derived from the tenant’s data are dropped. Some of these are tenant-private (already covered by step 4); others are shared (a cross-tenant semantic-fact store, a global procedural-skill cache). The shared derivatives need a rebuild pass that excludes the deleted tenant’s contributions.
6. Verification query. A read against every namespace listed in step 2 confirms zero remaining rows. The verification is logged with the deletion audit entry.
7. Compliance attestation. A signed record of the deletion is emitted: what was deleted, when, by whom, with the verification result. This is the artifact a GDPR auditor asks for.
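The database-facing steps of the pipeline (audit entry, hard deletion, verification query, attestation) can be sketched in one transaction. This is an illustration only: it uses the standard-library sqlite3 module as a stand-in for Postgres, the table names and `TENANT_TABLES` list are hypothetical, the attestation is an unsigned dict, and authentication, cache invalidation, and derived-artifact rebuild are left out.

```python
import json
import sqlite3
import time

# Hypothetical output of scope enumeration: every tenant-keyed table.
TENANT_TABLES = ("memories", "summaries")

def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE memories (tenant_id TEXT NOT NULL, text TEXT NOT NULL);
        CREATE TABLE summaries (tenant_id TEXT NOT NULL, text TEXT NOT NULL);
        CREATE TABLE audit (ts REAL, actor TEXT, action TEXT, tenant_id TEXT, detail TEXT);
        INSERT INTO memories VALUES ('acme', 'n1'), ('acme', 'n2'), ('globex', 'n3');
        INSERT INTO summaries VALUES ('acme', 's1');
    """)
    return conn

def log(conn, actor, action, tenant_id, detail=""):
    conn.execute("INSERT INTO audit VALUES (?, ?, ?, ?, ?)",
                 (time.time(), actor, action, tenant_id, detail))

def delete_tenant(conn, tenant_id: str, requested_by: str) -> dict:
    with conn:  # one transaction: commits on success, rolls back on exception
        log(conn, requested_by, "delete_requested", tenant_id)
        deleted = {}
        for table in TENANT_TABLES:  # names come from the constant, never from input
            cur = conn.execute(f"DELETE FROM {table} WHERE tenant_id = ?", (tenant_id,))
            deleted[table] = cur.rowcount
        # Verification query: count what is left in every enumerated table.
        remaining = {
            table: conn.execute(
                f"SELECT COUNT(*) FROM {table} WHERE tenant_id = ?", (tenant_id,)
            ).fetchone()[0]
            for table in TENANT_TABLES
        }
        if any(remaining.values()):
            raise RuntimeError(f"deletion incomplete: {remaining}")
        attestation = {"tenant_id": tenant_id, "requested_by": requested_by,
                       "deleted": deleted, "remaining": remaining,
                       "verified_at": time.time()}
        log(conn, requested_by, "delete_verified", tenant_id, json.dumps(attestation))
    return attestation
```

With the demo data, `delete_tenant(make_demo_db(), "acme", "dpo@example.com")` returns an attestation whose `remaining` counts are all zero and leaves the `globex` row untouched. If the verification finds leftover rows, the exception rolls back the whole transaction, so a half-finished deletion is never recorded as complete.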
Step 5 is the one most implementations skip. The memory-reflection article noted that reflections create derivative claims that survive their sources; the privacy-layer corollary is that those derivatives have to be tracked so they can be invalidated. The mature pattern is a derived-artifact registry — every reflection, every summary, every semantic-fact entry records a derived_from: [episode_ids] field, and the deletion pipeline traverses the graph forward from the deleted tenant’s episodes to find and rebuild every dependent.
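The forward traversal over derived_from can be sketched as a breadth-first walk followed by a drop-or-rebuild decision. A minimal sketch, assuming the registry is available as a plain dict from artifact ID to the set of source IDs; `plan_invalidation` and the ID strings are hypothetical.

```python
from collections import deque

def plan_invalidation(deleted_ids: set, derived_from: dict) -> dict:
    """Return {artifact_id: "drop" | "rebuild"} for every affected artifact.

    derived_from maps an artifact to its sources, which may be episodes
    or other artifacts (a summary of reflections, for example).
    """
    # Invert the registry so we can walk from sources to dependents.
    dependents = {}
    for artifact, sources in derived_from.items():
        for source in sources:
            dependents.setdefault(source, set()).add(artifact)

    # Breadth-first walk forward from the deleted episodes.
    affected, queue = set(), deque(deleted_ids)
    while queue:
        node = queue.popleft()
        for artifact in dependents.get(node, ()):
            if artifact not in affected:
                affected.add(artifact)
                queue.append(artifact)

    # An artifact is dropped when every one of its sources is gone;
    # repeat until no more artifacts become fully orphaned.
    gone = set(deleted_ids)
    changed = True
    while changed:
        changed = False
        for artifact in affected:
            if artifact not in gone and derived_from[artifact] <= gone:
                gone.add(artifact)
                changed = True
    return {a: ("drop" if a in gone else "rebuild") for a in affected}
```

With `derived_from = {"r1": {"e1", "e2"}, "r2": {"e1"}, "s1": {"r2"}, "r3": {"e9"}}` and `deleted_ids = {"e1"}`, the plan rebuilds `r1` from its surviving source, drops `r2` and the `s1` summary that depended only on it, and leaves `r3` alone.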
The 2026 EDPB enforcement audit flagged anonymization-as-deletion as the most common cheat — pseudonymizing the user’s identifier and claiming the data is gone, when the embeddings are still reidentifiable. The regulatory direction is clear: pseudonymization is not erasure for AI-backed systems. The hard-deletion pipeline is the engineering answer.
Code: Python — tenant-scoped memory with verifiable deletion
A minimal but production-shaped implementation. A TenantScope typed value, a scoped memory API where every operation requires the scope, an audit log on every read/write/delete, and a deletion pipeline with a verification step. Uses the Anthropic SDK for the model, psycopg for Postgres, and pgvector for the embedding column. Install: pip install anthropic psycopg[binary] pgvector.
Four properties the implementation enforces: every public method takes a TenantScope (no default value, no Optional — the type system rejects unscoped calls); the database connection is encapsulated and never exposed (the only way to bypass scope enforcement is to subclass and call the private connection, which is visible in code review); every operation writes an audit-log row in the same transaction as the operation itself (you cannot have a memory mutation that’s invisible to compliance); the deletion pipeline has all seven steps including the verification query that confirms zero rows remain.
The deliberate omission is the cross-tenant derived-artifact enumeration — the code finds derived rows but doesn’t trigger their rebuild. A production implementation pairs this with a job queue (the sleep-time compute substrate from the consolidation article is the natural fit) that rebuilds shared semantic facts and reflections excluding the deleted tenant’s contributions.
Code: TypeScript — namespace-scoped memory with prompt-injection-aware retrieval
Functionally equivalent in TypeScript using the Vercel AI SDK and Postgres via postgres. The added wrinkle: a retrieval-side classifier that flags content that looks like prompt-injection-via-memory (the MINJA class) before it’s surfaced to the agent. Install: npm install ai @ai-sdk/anthropic postgres.
Three properties beyond the Python version. The TenantScope type is branded: a plain string cannot be passed where a TenantScope is expected, even under TypeScript’s structural typing, unless the caller writes an explicit cast, and a cast is visible in code review; this is the closest TypeScript gets to the typed-value discipline Python’s frozen dataclass provides. The searchAndScreen method runs a retrieval-side classifier that flags memories looking like prompt-injection attempts. It is a partial defense against MINJA-class attacks, with the documented limitation that the defenses evaluated in the MINJA paper were ineffective; this is a layer of defense, not a solution. The attestation record is returned to the caller, not just logged, so the calling system can demonstrate the deletion to a regulator without having to query the audit table after the fact.
Trade-offs, failure modes, gotchas
The “scope as optional argument” prototype that ships to production. The most common path to a cross-tenant leak: the original prototype accepted tenant_id as a keyword argument with a default of None, the team meant to make it required before launch, the deadline came, and “we’ll harden it after the demo” became permanent. The bug is dormant for months because everyone passes it in the happy path; the leak fires the first time a new endpoint forgets the argument. The mitigation has to happen at the API design step — the scope is structurally required (a brand, a frozen dataclass, a typed handle), not enforced by code review. The 2026 enterprise multi-tenant LLM analysis reports that “cross-tenant data exfiltration and knowledge base poisoning had the highest amplification factors” — these are the bugs you cannot afford to find in production.
The shared-index-with-post-filter trap. The retrieval call against a vector store goes “find me top-K nearest neighbors across all data, then filter by tenant.” This works until one of two things goes wrong. First, top-K is small (say K=5) and all five nearest neighbors are in a different tenant: the post-filter returns nothing, and the agent is starved of memory it should have had. Second, the latency of the call varies with how many candidates the filter eliminates, which is a side channel an attacker can use to confirm a foreign tenant has data similar to a query. Both failure modes go away if the filter is pushed into the index: pgvector’s WHERE tenant_id = ... before the ORDER BY embedding <=>, Pinecone’s per-tenant namespaces, Qdrant’s per-collection partitioning. The 2026 Pinecone multi-tenancy guide is explicit: namespace-per-tenant gives “physical isolation of data between tenants.” This is the architectural pattern; “metadata filter on a shared index” is the anti-pattern.
The cache-key without the tenant. The prompt-caching article covered the cost benefits of warm prefix caches. The privacy-layer corollary: cache keys that don’t include the tenant ID cannot be invalidated on deletion. A prefix cache keyed by a hash of the prompt content, with the tenant data baked into the content, looks correct — until the next tenant’s prompt happens to hash to a cached prefix and the agent serves cross-tenant content. The mitigation is to make the tenant ID part of every cache key, not just rely on content hashing. The cost is lower cache hit rates across tenants; the benefit is structural isolation.
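A minimal sketch of the two key schemes, assuming a dict-backed cache and tenant IDs that never contain a colon (both assumptions for illustration); `content_key`, `tenant_key`, and `invalidate_tenant` are hypothetical helper names.

```python
import hashlib

def content_key(prompt: str) -> str:
    """Content-only key: identical prompts from different tenants collide."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

def tenant_key(tenant_id: str, prompt: str) -> str:
    """Tenant-prefixed key. Assumes tenant IDs never contain ':'."""
    return f"{tenant_id}:{content_key(prompt)}"

def invalidate_tenant(cache: dict, tenant_id: str) -> int:
    """Drop every entry for one tenant by key prefix; return how many."""
    doomed = [key for key in cache if key.startswith(f"{tenant_id}:")]
    for key in doomed:
        del cache[key]
    return len(doomed)
```

Two tenants sending the same prompt get the same `content_key` but different `tenant_key` values, and `invalidate_tenant(cache, "acme")` removes only the `acme:` entries without touching any other tenant's warm cache.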
The reflection-leak via shared semantic store. A reflection over a tenant’s episodes produces a higher-order claim that lands in a shared semantic-facts store (“Customers in industry X commonly ask about Y”). The semantic store has no tenant scope because it’s deliberately cross-tenant prior art. When the original tenant requests deletion, the reflection persists — and the reflection carries information derived from their data. The mitigation is the derived-artifact registry from the deletion pipeline — every shared-store row records its source episodes, and the deletion pipeline rebuilds shared rows whose sources include any deleted-tenant episode. The cost is real (rebuild jobs are slow); the alternative is GDPR exposure.
The pseudonymization-as-deletion cheat. The 2026 EDPB enforcement audit called this out specifically: many controllers replace the user identifier with a pseudonym and claim the data is deleted, when the data is still reidentifiable. For vector stores this is especially damning — the embeddings themselves carry semantic content that can be re-attributed to a person without the user_id column. Pseudonymization is not erasure for AI-backed memory; the deletion pipeline has to physically remove rows, vectors, and derivatives. The product surface should be transparent about what the deletion covers and what (deployed fine-tunes, backups) might still persist.
The “but the embeddings are anonymized” misconception. Embeddings are not anonymous. The 2025 memory-privacy survey literature documents that adversarial probing can extract identifying information from embeddings of personal content, and the inversion attacks on dense embeddings reconstruct surprisingly faithful approximations of the source text. Treat embeddings as personal data when they’re derived from personal data; the deletion pipeline applies to them with the same rigor as to plain-text rows.
The MINJA write that the system attributes to the agent. The MINJA attack writes adversarial content into the agent’s memory bank under the agent’s attribution — the harness lacks a clear distinction between “user said X” and “agent wrote a summary of X” at the storage layer. The retrieval prompt then reads the adversarial content as if it came from the agent itself (“I previously concluded that…”), and the model treats it with higher trust than user content. The mitigation is the provenance discipline from the temporal-reasoning article — every memory row carries an actor enum (user/agent/system/tool), the retrieval prompt renders the actor explicitly, and the model is instructed to weight system/tool content as authoritative and user content as suspect. This doesn’t defeat MINJA — the paper reports defenses are ineffective — but it raises the floor.
The audit log that grows faster than the data. Every read, every write, every deletion writes an audit row. The audit table accumulates faster than the memory table itself (one write produces one memory row but typically two or three audit rows when you count the cache invalidations and the verification queries). After six months, the audit table dominates the storage cost. The mitigation is a tiered audit policy — high-volume read events go to a cheaper store (S3 with hot/cold tiering) with a shorter retention window; write and deletion events stay in the primary database with full retention; compliance demonstrations join against both. The Postgres-with-partition-pruning pattern is the standard answer; the cheap version is “audit-log table partitioned by month, drop the partitions older than the retention window.”
The deletion request that races a write. A user requests deletion at T0; a background reflection job started at T-5s lands a new derivative write at T+10s. The deletion ran at T+1s and the verification confirmed zero rows; ten seconds later there’s a new row. The naive fix is a TTL on deletion attestations (re-verify after some period) but the right fix is deletion as a barrier: every write checks the deletion-attestation table for an active deletion of its tenant before committing, and refuses to write if one is in progress. The Postgres-SERIALIZABLE-isolation parallel is direct — concurrent operations against the same tenant must serialize through the deletion check.
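Deletion-as-a-barrier can be sketched with a tombstone table that every write consults inside its own transaction. This sketch uses sqlite3 on a single connection purely for illustration; the `tenant_deletions` table and function names are hypothetical, and a real multi-writer Postgres deployment would additionally need SERIALIZABLE isolation or a row lock on the tombstone check.

```python
import sqlite3

class TenantDeletedError(Exception):
    pass

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE memories (tenant_id TEXT NOT NULL, text TEXT NOT NULL);
        CREATE TABLE tenant_deletions (tenant_id TEXT PRIMARY KEY);
    """)
    return conn

def write_memory(conn, tenant_id: str, text: str) -> None:
    with conn:  # check and insert commit together or not at all
        tombstone = conn.execute(
            "SELECT 1 FROM tenant_deletions WHERE tenant_id = ?", (tenant_id,)
        ).fetchone()
        if tombstone is not None:
            raise TenantDeletedError(tenant_id)
        conn.execute("INSERT INTO memories VALUES (?, ?)", (tenant_id, text))

def begin_deletion(conn, tenant_id: str) -> None:
    with conn:
        # Record the tombstone first so late writers are refused from now on.
        conn.execute(
            "INSERT OR IGNORE INTO tenant_deletions VALUES (?)", (tenant_id,)
        )
        conn.execute("DELETE FROM memories WHERE tenant_id = ?", (tenant_id,))
```

A background reflection job that started before the deletion and tries to land its write afterwards gets a `TenantDeletedError` instead of quietly recreating a row for a tenant that has already been attested as gone.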
The cross-tenant retrieval that’s intentional. Some product surfaces deliberately allow cross-tenant retrieval — a research-tool agent that summarizes “what are common questions across all customers” needs to read across tenants. This is not a bug; it’s a privileged operation. The mitigation is to expose it through a separate API surface — read_cross_tenant_aggregate(...) with its own permission gate, its own audit class, and a guarantee that the output is aggregated/anonymized before it’s surfaced. The Unix-root-vs-user analogue is direct: most operations are user-scoped, but some require explicit elevation, and the elevation is visible in audit. Mixing cross-tenant reads into the normal API surface is what produces incident postmortems.
The deletion that breaks the procedural-skill cache. A user’s procedural skills — successful action sequences cached for retrieval — were stored in a shared skill-library namespace because the skills were useful to other users. When the user requests deletion, the skills they contributed are still in the library. The mitigation is the same derived-artifact registry pattern: shared skills record their source users; deletion triggers a rebuild that excludes the deleted user’s contributions. If a skill’s only source was the deleted user, it’s deleted outright. If it had multiple sources, it’s preserved without the deleted contribution. This is the memory-conflict article’s supersession-versus-deletion distinction applied at the cross-tenant boundary.
The “we’ll add audit later” trap. Audit is the most-postponed part of this layer because it doesn’t surface in the happy path — the product works fine without it, the only people who notice are the on-call engineer trying to reconstruct an incident and the compliance team trying to demonstrate a deletion. Both notice catastrophically when audit is missing. The mitigation is to write audit in the same transaction as the operation it audits, never as a fire-and-forget side effect. If the audit write fails, the operation fails. This makes audit a hard requirement of every operation rather than a best-effort observability artifact.
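The same-transaction rule is short to express. A minimal sketch using sqlite3 as a stand-in for the production database; the schema and `write_with_audit` are hypothetical, and the NOT NULL constraint on `actor` is used here only to force an audit failure for demonstration.

```python
import sqlite3
import time

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE memories (tenant_id TEXT NOT NULL, text TEXT NOT NULL);
        CREATE TABLE audit (
            ts REAL NOT NULL, tenant_id TEXT NOT NULL,
            actor TEXT NOT NULL, action TEXT NOT NULL
        );
    """)
    return conn

def write_with_audit(conn, tenant_id: str, actor: str, text: str) -> None:
    # "with conn" wraps both statements in one transaction: if the audit
    # insert raises, the memory insert is rolled back with it.
    with conn:
        conn.execute("INSERT INTO memories VALUES (?, ?)", (tenant_id, text))
        conn.execute(
            "INSERT INTO audit VALUES (?, ?, ?, ?)",
            (time.time(), tenant_id, actor, "write"),
        )
```

Calling `write_with_audit(conn, "acme", None, "orphan")` violates the audit table's NOT NULL constraint, raises `sqlite3.IntegrityError`, and leaves no row in `memories`: a mutation that cannot be audited does not happen.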
Further reading
- Mem0 — Entity-Scoped Memory and the four-scope model (user_id, agent_id, run_id, app_id) — the production-framework reference for required-by-design scope parameters. The “at least one of user_id, agent_id, or run_id is required” rule is the cleanest available example of moving tenant isolation from “remember to pass it” to “the API refuses to run without it.” Read alongside Mem0’s multi-agent memory production guide for the org_id-plus-user_id composition pattern.
- Pinecone — Multi-Tenancy in Vector Databases — the vector-store side of the isolation story. The namespace-per-tenant pattern with serverless physical separation is the architectural model; the offboarding-by-namespace-delete property is what makes GDPR deletion tractable at the vector layer. Pair with the Pinecone production-engineering deep-dive on multi-tenancy for the namespace-limits and per-namespace-performance numbers that govern at-scale design.
- MINJA: Memory Injection Attack on LLM Agents (2025) — the attack literature. The 98% injection success rate against GPT-4/GPT-4o, the demonstrated failure of LlamaGuard / embedding-sanitization / prompt-based defenses, and the cross-session persistence are the production-realistic threat model for any agent system with a writable memory. Pair with MEXTRA’s privacy-extraction follow-up which works the cross-user-data-leak side of the same attack surface.
- EDPB Coordinated Erasure Enforcement Audit, 2025/2026 — the regulatory calibration. The audit of 764 controllers across 32 DPAs documents the most common shortcuts (pseudonymization-as-deletion, no backup-deletion procedure, weak anonymization), which are the cheats the deletion pipeline in this article is explicitly designed not to take. This is the “what an auditor actually checks” reference.
- Burn-After-Use: Preventing Data Leakage through Secure Multi-Tenant Architecture (2026) — the architectural reference for enterprise multi-tenant LLM deployments. The SMTA-plus-BAU pattern formalizes the “no conversational histories or vector embeddings shared across tenants” property as the structural fix for cross-department inference; useful as the published-research grounding for the namespace-per-tenant pattern the rest of this article advocates.
What to read next
- Multi-Agent Shared Memory — the immediate predecessor. Where this article works the containment axis (who is allowed to see this), the shared-memory article worked the concurrency axis (who wrote this and when). The reader-from-a-different-namespace bug, the all-agents-see-everything default, and the audit-log discipline are the bridges between the two.
- Cross-Session Identity and Personalization — the prerequisite. The identity layer’s persona-leak production incident, the export/edit/delete triad, and the right-to-be-forgotten discussion all generalize from per-persona to per-tenant in this article; the identity-record and namespace-scope patterns compose directly.
- Memory Conflict, Forgetting, and Embedding Drift — the unlearning-vs-forgetting distinction and the user-asserted-deletion classifier. The hard-deletion pipeline in this article is the multi-tenant generalization of the single-user erasure path covered there; the same audit discipline applies.
- Long-Term Memory: Vector-Backed Episodic Storage — the substrate. Every isolation primitive in this article (namespace tuple, required user_id, where-clause-before-ANN) is layered on top of the vector-backed episodic store; the tenant-isolation discipline the earlier article flagged in the snippet comments is the architectural concern formalized here.