PII Detection and Data Privacy for LLM Systems
PII detection and data residency for LLM systems: Presidio cascades, OpenAI Privacy Filter, GDPR deletion pipelines, EU residency, and on-device inference.
A support agent answers a billing question on a Tuesday. The transcript flows through the production stack: user message → retrieval over the help-desk corpus → a reranker → the model → an output classifier → ship. The transcript also flows into the analytics pipeline that powers the drift detector, into the LangSmith span store, into the seven-day-retention API logs at the model vendor, and into the long-term memory tier that the next session will use to personalize responses for the user. The customer mentioned their order number, which contained their email and the last four digits of their card. None of those derivatives are governed by a single policy. Each storage tier has its own retention clock, its own access control list, and its own deletion API. Six weeks later, the customer files a right-to-erasure request. Under GDPR Article 17, read with the response deadline in Article 12(3), the team has one month (extendable by two further months for complex requests) to act on it and demonstrate that every copy of those values, in every tier, is gone. That is the surface area this article exists to govern.
Opening bridge
Yesterday’s piece on guardrails framed the model boundary as a place where five attack classes converge: direct injection, indirect injection, sensitive-content generation, PII leakage, and tool-call misuse. Today’s article is the depth-first walk on category four — the only one of the five that is also a regulatory liability with statutory deletion timelines, not just a brand-and-trust problem. Guardrails answer “what is the model allowed to do?” The PII layer answers “what data is allowed to flow through the model, where can it be stored, and how do you prove it is gone?” The two layers share infrastructure — both run as pre- and post-call interceptors, both depend on classifier-based detection that fails against adaptive attackers, both rely on architectural defenses when classifiers don’t suffice — but the threat model and the success criteria are different enough that they deserve separate treatment. Today’s companion piece, agent budgets and runaway prevention, closes the Production & Operations subtree with the economic side of the same defense-in-depth story.
Definition
The PII layer is the set of runtime checks, transformations, and architectural decisions that govern which personal data flows through an LLM system, where it is stored, and how it is removed on demand. Three components compose the layer: detection (identifying personal data in inputs, outputs, retrieved context, tool results, and stored memory); transformation (redacting, masking, tokenizing, or de-identifying that data before it reaches a destination that shouldn’t see it); and residency (the architectural choice of where the inference happens, who has jurisdiction over the data at rest and in transit, and how deletion propagates across every derived artifact). A complete production deployment treats all three as inseparable — detection without residency lets the redacted data still cross a jurisdictional boundary; residency without detection routes the personal data correctly but doesn’t bound which fields are exposed to which downstream surfaces.
Three properties of the definition do load-bearing work. First, “personal data” is broader than “PII” in the colloquial sense. Under GDPR Article 4(1), personal data is any information relating to an identified or identifiable natural person — which includes IP addresses, device identifiers, browser fingerprints, location coordinates, and the embeddings derived from any of the above. The memory-privacy article documented that adversarial probing can reconstruct identifying content from dense embeddings of personal text; the legal corollary is that those embeddings are themselves personal data and the deletion pipeline applies to them with the same rigor as to plain-text rows. A PII layer that scrubs names and emails out of the prompt but stores the original message’s embedding without scoping is leaking personal data, regardless of what the detection regex caught.
Second, “data residency” is a separate axis from “data retention.” Residency answers where the data is processed and stored at rest; retention answers how long it persists. A zero-data-retention contract with a US-based vendor doesn’t satisfy EU residency requirements: the data was still processed under US jurisdiction, and even a brief in-flight window is enough for CLOUD Act process to reach it through the vendor. Conversely, an EU-hosted vendor with thirty-day retention satisfies residency but loses on retention. Production systems serving EU users typically need both: residency in-region and zero retention against the vendor.
Third, the PII layer’s hardest problem is not detection — it is deletion. A regex that catches 99% of email addresses on input is shippable today with weekend effort. Tracing every derived artifact downstream of a given user’s data — every chunk in the vector store, every reflection that quoted it, every prompt cache that contains it, every analytics event that hashed it, every embedding in every replica — and verifying that the user can be wiped on demand is six months of pipeline work that most teams under-invest in. The deletion invariant is the engineering deliverable; the detection is the easier prerequisite.
Intuition: the four kinds of PII exposure
Four exposure modes drive almost all production PII deployments. Pin them down before the mechanics, because the right mitigation differs sharply by mode.
1. Inbound exposure. The user types personal data into the prompt — their own SSN, their employer’s customer’s address, a screenshot of an invoice. This is the easiest case to bound: the personal data is known to be present (the user just typed it) and the operator has a clear policy choice — strip it before the API call, store it scoped, route it through an EU endpoint, or never send it at all.
2. Retrieved exposure. A RAG retrieval pulls a chunk of personal data into the prompt that the user is not authorized to see — another customer’s record, a deleted user’s old session, an internal employee directory entry. The user didn’t put the data in; the system did. This is the cross-tenant retrieval failure mode the memory-privacy article covered in depth, viewed from the privacy side rather than the multi-tenancy side.
3. Outbound exposure. The model generates personal data in its response that should not have shipped — an internal email address recalled from training data, a name pattern-matched from a context the user didn’t have access to, a credit card number extracted via a prompt-injection attack. This is what output-side PII scanners catch.
4. Telemetry exposure. The personal data exists only in the plumbing of the system — in observability spans, in eval golden sets, in drift-detection samples, in error logs, in the prompt-cache prefix that the next twenty users share. The user’s data never reached the model in a way the operator considered “exposed,” but it has been copied into half a dozen storage tiers that the deletion pipeline has to reach. This is the mode most teams under-attend; it is also the one a GDPR auditor is most likely to find a gap on, because the engineering team that built the eval pipeline isn’t usually the team that scoped the privacy policy.
The distributed-systems parallel
The cleanest analogue is how a credit-card processor handles cardholder data under PCI DSS. The PCI architecture has been refined for two decades around exactly the same problem: a piece of sensitive data has to flow through a system to do useful work, the system has to record what happened, the auditor has to be able to prove the data wasn’t in any tier it didn’t belong, and the deletion semantics have to be unambiguous. Map it across:
- Cardholder data → personal data. The piece of sensitive content that drives every layer’s design.
- PCI scope reduction → PII scope reduction. PCI environments shrink the set of systems “in scope” for cardholder data; PII pipelines shrink the set of systems that touch un-redacted personal data. Detection-and-redaction at the gateway is exactly the scope-reduction move PCI calls tokenization.
- Tokenization with a vault → reversible redaction with a mapping table. PCI replaces the card number with an opaque token; the original lives in a hardened vault accessible to a tiny number of services. Reversible PII redaction does the same: the model and downstream pipeline see [EMAIL_3], the vault holds the mapping, and only the response-rendering layer (which already had the cleartext from the original request) un-redacts before showing the user.
- Compensating controls → derivative invalidation. PCI requires that any place a card number was logged or cached must be wiped on incident; the GDPR-shaped deletion pipeline requires that any reflection, summary, embedding, or prompt-cache that depended on a deleted user’s data be invalidated and rebuilt.
- Attestation of compliance → deletion verification. PCI demands signed evidence that the controls are working (an annual Attestation of Compliance, backed by quarterly vulnerability scans); GDPR demands signed evidence that the deletion happened. The artifact is the same shape.
The mapping isn’t ornamental: PCI’s architectural lessons predate LLMs by well over a decade, and the patterns transfer almost unchanged. A team designing a PII layer in 2026 that has never built a PCI-scoped system is reinventing two decades of payments-engineering rigor; reading the PCI DSS v4.0 quick-reference for an afternoon is the fastest way to skip half the mistakes.
Mechanics: the detection cascade
The production-grade detector is not a single component. It is a cascade — three layers, each with different precision/recall/latency characteristics, run in series with early exit on confident hits.
Layer 1: regex on every byte. The first pass catches the structured patterns: email addresses, US/UK/EU phone formats, SSNs and national IDs, credit card numbers (with Luhn validation to suppress false positives), IBANs, IPv4/IPv6 addresses, MAC addresses, URLs containing personal subdomains, AWS access keys, Stripe customer IDs. Regex catches roughly 80% of the structured PII at about 0.5–2 ms per kilobyte with negligible memory cost. The failure mode is unstructured personal data: names, addresses spelled out in prose, identifiers that don’t match a known format. Regex alone is what most teams ship as v1 and what every auditor flags as insufficient by v2.
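A minimal sketch of the regex layer, assuming three illustrative patterns (email, US SSN, and 13–19 digit card numbers). The entity labels borrow Presidio’s naming, but the code is plain `re` rather than Presidio’s recognizer API, and `regex_pass` and `luhn_valid` are hypothetical names.

```python
import re

# Illustrative patterns only (assumption): production recognizers cover many
# more formats and locales than these three.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Walk right to left, doubling every second digit and folding values above 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def regex_pass(text: str) -> list[tuple[str, int, int]]:
    """Return (entity_type, start, end) spans found by the structured-pattern layer."""
    spans = [("EMAIL_ADDRESS", m.start(), m.end()) for m in EMAIL_RE.finditer(text)]
    spans += [("US_SSN", m.start(), m.end()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        # The Luhn check drops order numbers and other long digit runs.
        if luhn_valid(m.group()):
            spans.append(("CREDIT_CARD", m.start(), m.end()))
    return sorted(spans, key=lambda span: span[1])
```

The Luhn filter is what keeps a 13-digit order number from being tagged as a card, which is the false-positive suppression the paragraph describes.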
Layer 2: Named-entity recognition. The second pass is a small NER model (typically spaCy under Microsoft Presidio, occasionally a fine-tuned BERT-class model) that tags spans of text with entity types: PERSON, ORG, LOCATION, GPE, DATE. NER catches the unstructured cases regex misses, at a cost of roughly 5–20 ms per kilobyte and a few hundred MB of model weights. The failure mode is contextual: a name that’s also a common noun (“Ms. Booker”), an organization name that’s a person’s name, a date that’s incidental rather than tied to a person. NER alone has higher recall but lower precision than regex; the false-positive rate runs 5–15% in production traffic.
Layer 3: LLM-based classification. The third pass — used sparingly, only on flagged samples or on the highest-risk surfaces — runs a guard model (Llama Guard 4 S7: Privacy category, OpenAI’s Privacy Filter released as open-weight in 2026, or a small dedicated classifier) to disambiguate the cases NER couldn’t. LLM-based classification handles context the previous layers can’t (“this is a fictional name in a story” vs “this is a real person’s name in a customer support ticket”), at the cost of 50–200ms of additional latency and the dollar cost of an extra inference call.
The cascade pattern is the same shape as the reranking cascade — a cheap, high-recall first pass narrows the candidate set; a more expensive, higher-precision later pass refines the verdict. The production calibration: regex on every byte; NER on every byte that regex didn’t already cover with high confidence; LLM-based classification only on flagged samples in eval pipelines or on the highest-risk inputs (uploaded documents, retrieved chunks crossing tenant boundaries). Prediction Guard’s PII pipeline writeup lays out the same three-tier architecture from a regulated-industries angle; the cascade is by now consensus.
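The early-exit cascade can be sketched as follows, assuming each layer is a callable that returns scored spans. `Span`, `Layer`, `run_cascade`, and the 0.85 confidence cut-off are hypothetical. Confident hits are blanked out of the text before the next layer runs, and the expensive tier is skipped unless an earlier layer left something ambiguous.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Span:
    entity: str
    start: int
    end: int
    score: float  # detector confidence in [0, 1]


@dataclass(frozen=True)
class Layer:
    name: str
    detect: Callable[[str], list[Span]]
    only_if_flagged: bool = False  # True for the expensive LLM-based tier


def _overlaps(a: Span, b: Span) -> bool:
    return a.start < b.end and b.start < a.end


def run_cascade(text: str, layers: list[Layer], confident: float = 0.85):
    """Run layers cheapest-first; return (accepted spans, unresolved spans, layers called)."""
    accepted: list[Span] = []
    flagged: list[Span] = []  # low-confidence spans awaiting a more precise layer
    remaining = text
    called: list[str] = []
    for layer in layers:
        if layer.only_if_flagged and not flagged:
            continue  # early exit: nothing ambiguous left for the expensive tier
        called.append(layer.name)
        hits = layer.detect(remaining)
        for span in hits:
            if span.score >= confident:
                accepted.append(span)
                # Blank confidently covered text (same length, so offsets stay valid).
                remaining = remaining[: span.start] + " " * (span.end - span.start) + remaining[span.end :]
        # Keep uncertain spans that no accepted span has resolved yet.
        flagged = [
            s for s in flagged + hits
            if s.score < confident and not any(_overlaps(s, a) for a in accepted)
        ]
    return sorted(accepted, key=lambda s: s.start), flagged, called
```

Wiring in real detectors means adapting Presidio’s analyzer results, or a guard model’s verdicts, into `Span` objects; the control flow stays the same.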
Mechanics: the transformation modes
Once detected, the personal data has to be transformed before it crosses whatever boundary the policy guards. Four modes, in increasing order of how much utility they preserve:
Mode 1: Redaction. Replace the span with a placeholder: [EMAIL], [REDACTED], █████. Discards the value, and with a generic placeholder like [REDACTED] or █████ discards the type as well; nothing distinguishes one redacted span from another. Cheap, deterministic, irreversible. Appropriate for telemetry exports and eval samples where the personal data was never load-bearing.
Mode 2: Masking. Preserve type, lose specifics: [EMAIL_1], [PHONE_2]. Adds an index so the model can distinguish multiple PII spans in the same prompt (“Send the report to [EMAIL_1] and CC [EMAIL_2]”). Cheap, deterministic, reversible if you store the mapping. The reversible variant, which stores [EMAIL_1] → jane.doe@example.com in a tenant-scoped vault and un-masks in the response, is the dominant pattern for production LLM pipelines because it preserves utility without exposing the cleartext to the model or to downstream telemetry.
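A minimal sketch of reversible masking with indexed placeholders, assuming an email-only pattern and an in-process dict where production would use the tenant-scoped vault; `mask` and `unmask` are hypothetical names. Mode 1 redaction is the same substitution with a constant placeholder and no mapping.

```python
import re

EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
PLACEHOLDER_RE = re.compile(r"\[[A-Z_]+_\d+\]")


def mask(text: str, pattern: re.Pattern = EMAIL_RE, label: str = "EMAIL"):
    """Replace each distinct match with an indexed placeholder.

    Returns the masked text and a placeholder -> original mapping. Repeated
    values reuse the same placeholder so the model can tell spans apart.
    """
    mapping: dict[str, str] = {}
    by_value: dict[str, str] = {}

    def _sub(m: re.Match) -> str:
        value = m.group()
        if value not in by_value:
            placeholder = f"[{label}_{len(by_value) + 1}]"
            by_value[value] = placeholder
            mapping[placeholder] = value
        return by_value[value]

    return pattern.sub(_sub, text), mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Restore placeholders found in `mapping`; unknown placeholders stay masked."""
    return PLACEHOLDER_RE.sub(lambda m: mapping.get(m.group(), m.group()), text)
```

Leaving unknown placeholders untouched matters: a placeholder the model invents or copies from elsewhere never resolves to cleartext.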
Mode 3: Synthetic substitution. Replace jane.doe@example.com with a realistic stand-in such as maria.lopez@example.net. Preserves type and locale (a UK postcode stays a UK postcode); the model sees realistic-looking text and produces coherent output; the original lives in the vault and substitutes back on rendering. Presidio’s anonymizer supports this through custom operators (for example, a Faker-backed function); its default operator instead replaces the span with a type placeholder such as <EMAIL_ADDRESS>. Higher utility than masking (the model handles plausible names better than placeholder tokens), at the cost of having to be careful that the substitution doesn’t accidentally introduce a real person’s identity: a synthetic name generator that pulls from a finite pool can collide with a real customer’s name in the next request.
Mode 4: Tokenization with a hardened vault. The full PCI pattern: the personal data lives in a separate system with its own access controls, audit log, and encryption-at-rest with a customer-managed key; the LLM pipeline sees opaque tokens; un-tokenization happens at the rendering boundary by a service that re-authenticates the user. Highest utility (the model’s output can include the original values, indirectly, via the rendering layer un-tokenizing) and highest engineering cost. Appropriate when the data is high-sensitivity (healthcare, financial) or the regulatory regime demands segregated storage.
The choice between modes is policy, not engineering. The engineering question is: does the operation need to be reversible, and which boundary does the cleartext live behind? Once those are answered, the mode follows.
Latency, cost, and where it sits in the request path
The cascade adds latency in the obvious place — between the user request and the model call — and again on the response side. A realistic budget on a typical pipeline:
| Stage | Median latency | Cost adder | Notes |
|---|---|---|---|
| Regex pass | 0.5–2ms/KB | $0 | Pure CPU, easy to parallelize |
| NER pass (Presidio + spaCy) | 5–20ms/KB | $0 (self-hosted) | Single-threaded per request; batch where you can |
| LLM-based classifier (input) | 50–200ms | $0.0001–0.001 | Only on flagged samples or high-risk paths |
| Vault token mint/lookup | 1–5ms | $0 | Local with a Redis-class store |
| Output cascade (mirror of above) | 7–25ms | $0 + classifier sample | Run in parallel with output guard where possible |
| Total typical overhead | ~15–50ms | <$0.001 typical | Dominated by NER; LLM tier is sampled |
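As a sanity check on the table, the overhead can be summed for a given prompt size. The sketch assumes the regex and NER figures scale linearly per kilobyte, treats the vault and output-cascade rows as fixed, and leaves out the sampled LLM classifier; `overhead_ms` is a hypothetical helper.

```python
# (low, high) millisecond ranges copied from the table above.
PER_KB_MS = {"regex": (0.5, 2.0), "ner": (5.0, 20.0)}  # scale with prompt size
FIXED_MS = {"vault": (1.0, 5.0), "output_cascade": (7.0, 25.0)}  # treated as fixed (assumption)


def overhead_ms(prompt_kb: float) -> tuple[float, float]:
    """Low/high PII-layer overhead in ms, excluding the sampled LLM-classifier tier."""
    low = sum(lo * prompt_kb for lo, _ in PER_KB_MS.values()) + sum(lo for lo, _ in FIXED_MS.values())
    high = sum(hi * prompt_kb for _, hi in PER_KB_MS.values()) + sum(hi for _, hi in FIXED_MS.values())
    return low, high
```

For a 1 KB prompt it returns (13.5, 52.0) ms, in line with the table’s ~15–50 ms; at 4 KB the range grows to 30–118 ms, most of it from the NER pass.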
The latency budget interacts with the rest of the production stack. Run the cascade before the prompt reaches the prompt cache: if you tokenize after the cache has stored the prefix, the cache holds cleartext and the deletion pipeline can’t reach it. Run the cascade before the observability tracer sees the prompt: span attributes that capture an unredacted prompt are themselves personal data, and the trace store is a separate retention tier the deletion API has to invalidate. Run the cascade after the user’s authentication is verified: pre-auth requests don’t deserve the latency budget, and unauthenticated traffic that contains personal data is its own incident.
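The ordering rules can be sketched as a request handler, assuming a plain dict standing in for the prompt cache, a list standing in for the trace store, and an email-only redactor; `handle_request` and its parameters are hypothetical. The point is the order: authenticate, redact, then let the cache and the tracer see only the redacted text.

```python
import re
from typing import Callable

EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def handle_request(
    user: dict,
    prompt: str,
    *,
    cache: dict[str, str],
    spans: list[dict],
    call_model: Callable[[str], str],
) -> str:
    # 1. Authenticate first: unauthenticated traffic never enters the PII pipeline.
    if not user.get("authenticated"):
        raise PermissionError("unauthenticated request")
    # 2. Redact before anything can persist the prompt.
    redacted = EMAIL_RE.sub("[EMAIL]", prompt)
    # 3. The cache (a plain dict standing in for a prompt cache) only ever sees redacted text.
    if redacted not in cache:
        cache[redacted] = call_model(redacted)
    # 4. The tracer records the redacted prompt, never the cleartext.
    spans.append({"user": user["id"], "prompt": redacted})
    return cache[redacted]
```

Keeping the user id on each span is deliberate: it is the key the deletion pipeline uses to find that user’s telemetry later.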
The structural alternative, and the one the on-device-inference camp argues is the only honest answer, is to avoid the latency entirely by avoiding the network call. On-device inference shipping in 2026 (Apple Foundation Models, ExecuTorch on Android, MLC Chat) means the personal data never leaves the device, the PII detection cascade becomes optional rather than mandatory, the residency question is trivially answered, and deletion is what the OS already does when the user uninstalls the app. The trade-off is model quality: the 3B-parameter on-device class is materially weaker than frontier cloud models, and the model-routing layer becomes the natural place to send only privacy-sensitive workloads on-device while keeping the rest in the cloud. We return to this in trade-offs.
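A minimal sketch of that routing decision, assuming each workload is already labelled with whether it carries personal data and whether it needs frontier-model quality; `Workload`, `route`, and the route names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    has_personal_data: bool
    needs_frontier_quality: bool


def route(w: Workload) -> str:
    """Hypothetical routing policy: keep personal data on-device when quality allows."""
    if w.has_personal_data and not w.needs_frontier_quality:
        return "on_device"  # data never leaves the device; residency question is moot
    if w.has_personal_data:
        return "cloud_with_pii_cascade"  # frontier model, but only after detect-and-redact
    return "cloud"
```

The middle branch is the expensive one: it is the only path where the full detection cascade, vault, and residency routing still have to run.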
Code: a detect-redact-deidentify pipeline in Python
The example is the v1 production cut: Presidio’s analyzer for detection, a vault for reversible mapping, redaction modes per entity, and a clean interface that the rest of the harness can call. Install: pip install presidio-analyzer presidio-anonymizer openai anthropic. The analyzer needs python -m spacy download en_core_web_lg once.
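A minimal stand-in sketch of the detect-tokenize-restore loop, under stated assumptions: a stdlib email regex replaces Presidio’s analyzer, and an in-memory dict replaces the vault. `Vault`, `protect`, and `restore` are hypothetical names; `purge` mirrors the `vault.purge` deletion call discussed below. Output-side detection would re-run the same detector on the model’s response before `restore`.

```python
import re
import secrets
from dataclasses import dataclass, field

# Stand-in detector (assumption): emails only, in place of Presidio's AnalyzerEngine.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
TOKEN_RE = re.compile(r"\[EMAIL_[0-9a-f]{8}\]")


@dataclass
class Vault:
    """In-memory stand-in for a tenant-scoped token vault (hypothetical)."""

    rows: dict = field(default_factory=dict)  # (tenant, token) -> (user, cleartext)

    def mint(self, tenant: str, user: str, value: str) -> str:
        token = f"[EMAIL_{secrets.token_hex(4)}]"  # random, so tokens carry no information
        self.rows[(tenant, token)] = (user, value)
        return token

    def reveal(self, tenant: str, token: str):
        row = self.rows.get((tenant, token))
        return row[1] if row else None

    def purge(self, tenant: str, user: str) -> int:
        """Deletion API: drop every mapping this user owns; return how many were removed."""
        doomed = [key for key, (owner, _) in self.rows.items() if key[0] == tenant and owner == user]
        for key in doomed:
            del self.rows[key]
        return len(doomed)


def protect(text: str, vault: Vault, tenant: str, user: str):
    """Tokenize detected PII; return the safe text and the set of tokens this request minted."""
    minted: set[str] = set()

    def _sub(m: re.Match) -> str:
        token = vault.mint(tenant, user, m.group())
        minted.add(token)
        return token

    return EMAIL_RE.sub(_sub, text), minted


def restore(text: str, vault: Vault, tenant: str, minted: set[str]) -> str:
    """Reverse only tokens minted by this request; any other token stays opaque."""

    def _sub(m: re.Match) -> str:
        token = m.group()
        value = vault.reveal(tenant, token) if token in minted else None
        return value if value is not None else token

    return TOKEN_RE.sub(_sub, text)
```

A usage pattern: call `protect` on the prompt, send the safe text to the model, then call `restore` with the same `minted` set on the response.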
The shape worth internalizing: detection is symmetric (input and output), tokenization is asymmetric (only the tokens this request minted are eligible to be reversed on the response), and the deletion API (vault.purge) is the artifact a GDPR auditor asks for. The naive failure mode — un-tokenize every token in the output — is exactly how a multi-tenant system leaks across tenants when the model regurgitates a token from a different tenant’s prompt cache.
Code: gateway-side PII handling with LiteLLM in TypeScript
The other production pattern is to push the PII layer to the gateway, so the application code doesn’t reason about it at all. LiteLLM ships a Presidio integration as a built-in callback; the application calls /chat/completions, and the gateway intercepts. Install: npm install openai. The gateway itself is a Python process — the TypeScript here is the client shape, with the gateway URL pointed at localhost:4000:
The gateway pattern is the right shape for organizations with many small applications and a central platform team. The application code shrinks to “call the gateway”; the platform team owns the PII policy, the vault, the deletion pipeline, the residency routing. The trade-off is loss of in-process control — the application can’t peek at the cleartext mid-pipeline, which is sometimes load-bearing for legitimate UX (the rendering layer needs the cleartext to display to the original user). The standard answer is a separate render endpoint on the gateway that re-attaches cleartext for the original requesting user, scoped by tenant + session + token-vintage to keep cross-tenant leakage closed. The LiteLLM Presidio tutorial walks the full configuration; treat it as a v1 starting point and add per-tenant scoping before any production traffic.
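The render-endpoint scoping can be sketched as a predicate the gateway evaluates before un-tokenizing, assuming each token is stored with the tenant, session, and mint time (its “vintage”) that produced it. `TokenGrant`, `may_render`, and the 15-minute limit are hypothetical, not part of LiteLLM’s API.

```python
import time
from dataclasses import dataclass
from typing import Optional

MAX_TOKEN_AGE_S = 15 * 60  # hypothetical policy value


@dataclass(frozen=True)
class TokenGrant:
    """Metadata a gateway could store alongside each minted token (assumption)."""

    tenant: str
    session: str
    minted_at: float  # epoch seconds; the token's "vintage"


def may_render(grant: TokenGrant, tenant: str, session: str, now: Optional[float] = None) -> bool:
    """Allow un-tokenization only for the same tenant and session, within the age limit."""
    now = time.time() if now is None else now
    age = now - grant.minted_at
    return grant.tenant == tenant and grant.session == session and 0 <= age <= MAX_TOKEN_AGE_S
```

All three checks have to pass; dropping any one of them reopens a path for a token minted in one context to be revealed in another.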
The regulatory contour that actually matters
Two regulatory frames dominate 2026’s privacy decisions for LLM systems, and a third frame is the one most teams underestimate.
Frame 1: GDPR’s right to erasure. Article 17 gives EU data subjects the right to have their personal data deleted on request, with a default response window of one month (extensible to three for complex cases). The compliance bar is proof of deletion across every tier, not best-effort. The memory-privacy article walked the deletion pipeline for the memory layer; the privacy-layer corollary is that the pipeline extends to the prompt cache, the observability spans, the eval golden set, the drift-detection sample, the embedding-derived analytics rollups, and any vendor-side log that retained the personal data within its retention window. Production teams that ship the deletion API without a verification query cannot answer the auditor’s “prove this user is gone” question; the verification is the engineering deliverable.
Frame 2: EU AI Act, general application from August 2, 2026. The EU AI Act entered into force in August 2024, with prohibitions applying from February 2025 and general-purpose AI obligations from August 2025; most remaining obligations, including those for the high-risk uses listed in Annex III, apply from August 2, 2026, and high-risk systems embedded in products already covered by EU safety legislation follow in August 2027. Penalties reach €35M or 7% of global annual turnover for prohibited practices and €15M or 3% for most other violations, including breaches of high-risk obligations. The AI Act is not a re-implementation of GDPR: it focuses on risk classification of AI systems and demands documentation, conformity assessment, and post-market monitoring for high-risk uses. For most LLM application teams, the operational impact is the documentation burden (model cards, system cards, intended-purpose statements) and the high-risk classification triggers (employment, credit scoring, education, law enforcement) more than the residency math. But the act leaves GDPR fully in force for any personal data these systems process, which loops back to Frame 1.
Frame 3: CLOUD Act jurisdiction. The US CLOUD Act gives US law enforcement the authority to compel US-headquartered companies to produce data in their possession, custody, or control, regardless of where the servers physically sit. The practical consequence is that an EU-region deployment of OpenAI, Anthropic, or Google still leaves the data subject to US legal process via the parent company. This isn’t a hypothetical: data residency analyses published in the run-up to AI Act enforcement flag that, under a strict reading, EU-region tenants of US hyperscalers do not achieve EU data residency. The honest production answer is one of: (a) accept the CLOUD Act exposure as a documented residual risk; (b) route EU traffic through an EU-domiciled provider with no US corporate parent (Mistral, Cohere via Cohere’s EU subsidiary, Aleph Alpha); or (c) move the inference on-device so jurisdiction is moot. Each has real costs. The first is what most teams ship; the third is what the privacy purists argue for.
The zero-data-retention amendment. OpenAI’s EU residency program and Anthropic’s enterprise tier both ship zero-data-retention amendments that suppress the default 7-day (Anthropic) or 30-day (OpenAI) retention window. ZDR is necessary for the residency-plus-retention target but not sufficient: it bounds retention, not jurisdiction. A request that hits OpenAI’s EU endpoint under ZDR is processed in-region without persistence, but the parent entity is still US-domiciled and still subject to CLOUD Act process during the in-flight window. The combination of EU-region endpoint + ZDR amendment + an EU-only data processing agreement is the closest you can get to “real” EU residency with a US-domiciled vendor in 2026, and is the configuration most enterprise deployments converge on.
Trade-offs, failure modes, gotchas
Detection precision degrades on multilingual and code-switched text. Presidio’s English NER is well-tuned; its Spanish and German recognizers are decent; its Hindi, Tamil, Vietnamese, and Indonesian recognizers are spotty. The Mamezou intro to Presidio documents adding custom recognizers, but the engineering cost is real — a serious multilingual deployment needs language-specific NER models or a multilingual LLM-based detector as the fallback. Audit the precision/recall on a representative sample of your traffic, not the vendor’s benchmark, before believing the published numbers.
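Auditing on your own traffic can start as simply as exact-match span scoring per language over a hand-labelled sample. The sketch assumes spans are `(entity, start, end)` tuples and that a span only counts as correct on an exact match, which is stricter than overlap-based scoring; `audit` is a hypothetical helper.

```python
from collections import defaultdict


def audit(samples):
    """Per-language (precision, recall) from (language, predicted_spans, gold_spans) triples.

    Spans are (entity, start, end) tuples; only exact matches count as true positives.
    An empty denominator is reported as 1.0 by convention.
    """
    tp, n_pred, n_gold = defaultdict(int), defaultdict(int), defaultdict(int)
    for lang, predicted, gold in samples:
        pred_set, gold_set = set(predicted), set(gold)
        tp[lang] += len(pred_set & gold_set)
        n_pred[lang] += len(pred_set)
        n_gold[lang] += len(gold_set)
    report = {}
    for lang in n_pred.keys() | n_gold.keys():
        precision = tp[lang] / n_pred[lang] if n_pred[lang] else 1.0
        recall = tp[lang] / n_gold[lang] if n_gold[lang] else 1.0
        report[lang] = (precision, recall)
    return report
```

Breaking the numbers out per language is the point: a single blended score lets strong English performance hide weak recall elsewhere.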
Reversible tokenization expands the attack surface. The vault is now a high-value target — a credential leak or SQL injection on the vault rebuilds the cleartext for every active session. Mitigations: customer-managed encryption keys, per-tenant key derivation so a single-tenant breach doesn’t expose all tenants, audit logging on every reveal, automatic token expiry, IP allowlists on the vault API. The PCI DSS v4.0 requirements for tokenization vaults transfer almost unchanged.
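Per-tenant key derivation can be sketched with the standard library, assuming HMAC-SHA256 over a purpose label and tenant id as a simple KDF; a production vault would more likely use HKDF or its KMS’s own derivation, and `derive_tenant_key` is a hypothetical name.

```python
import hashlib
import hmac


def derive_tenant_key(master_key: bytes, tenant_id: str, purpose: str = "pii-vault/v1") -> bytes:
    """Derive a 32-byte per-tenant key from a master key.

    HMAC-SHA256 over a purpose label and the tenant id acts as a simple KDF
    (assumption: production systems typically use HKDF or a KMS). Leaking one
    tenant's derived key reveals neither the master key nor other tenants' keys.
    """
    info = f"{purpose}/{tenant_id}".encode()
    return hmac.new(master_key, info, hashlib.sha256).digest()
```

The master key then never touches the per-request path; only derived keys do, which is what bounds a single-tenant breach.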
The “but the embeddings are anonymized” misconception. Embeddings of personal data are not anonymous. Inversion attacks on dense embeddings can reconstruct surprisingly faithful approximations of the source text; the memory-privacy article documents that adversarial probing extracts identifying information from embeddings of personal content. Treat embeddings derived from personal data as personal data themselves; the deletion pipeline applies to them with the same rigor as to plain-text rows. A redaction pipeline that scrubs the prompt but lets the un-redacted source flow through the embedding step is leaking by construction.
Synthetic substitution can leak via collisions. A Faker-generated email might happen to be a real customer’s. A synthetic name might match a customer in the next request’s context. Mitigations: derive substitutions deterministically from the tenant + entity hash so the same input yields the same output (avoiding cross-request leakage), salt the substitution function with a tenant-specific secret (avoiding cross-tenant collisions), check the synthetic value against an exclusion list of known customer identifiers before emitting. None of these are free; the safer default for high-sensitivity data is to skip synthetic substitution entirely and use reversible tokenization.
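The three mitigations can be combined in one function, sketched here under assumptions: tiny hypothetical name pools stand in for a Faker-style generator, HMAC-SHA256 keyed by a tenant secret provides the deterministic, tenant-salted choice, and an exclusion set holds known customer names. `synthetic_name` is hypothetical; it raises rather than emitting an unsafe substitute, which is the point where the fallback to tokenization applies.

```python
import hashlib
import hmac

# Hypothetical name pools; a real generator (e.g. Faker) draws from far larger ones.
FIRST = ["Alex", "Sam", "Robin", "Jordan", "Casey", "Morgan"]
LAST = ["Reed", "Hale", "Vance", "Ortiz", "Quinn", "Lowe"]


def synthetic_name(real_name: str, tenant_secret: bytes, exclusion: set[str], max_tries: int = 16) -> str:
    """Map a real name to a stable synthetic one for this tenant.

    - Deterministic: the same (tenant secret, input) always yields the same output.
    - Tenant-salted: another tenant's secret yields an unrelated mapping.
    - Collision-checked: candidates on the exclusion list of real customer names are skipped.
    """
    for attempt in range(max_tries):
        digest = hmac.new(tenant_secret, f"{attempt}:{real_name}".encode(), hashlib.sha256).digest()
        candidate = f"{FIRST[digest[0] % len(FIRST)]} {LAST[digest[1] % len(LAST)]}"
        if candidate != real_name and candidate not in exclusion:
            return candidate
    # No safe substitute: the caller should fall back to reversible tokenization.
    raise LookupError("no collision-free substitute; use reversible tokenization instead")
```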
Output-side detection has higher consequences than input-side. An undetected PII span on input means the model sees data it shouldn’t have — bad, expensive, but contained inside your own systems. An undetected PII span on output means the model shipped personal data to the user (potentially the wrong user, in a multi-tenant context) or to a downstream system (which then has it in its retention tier). Calibrate output-side detection more conservatively than input-side. The asymmetry is the same shape as output guardrails being the last line of defense for content policy.
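One way to encode the asymmetry is a per-direction policy table, sketched below with hypothetical threshold values: output spans are redacted, or escalated to the LLM-based tier, at lower detector confidence than input spans. `POLICY` and `decide` are illustrative names, and the numbers are placeholders to be calibrated on your own traffic.

```python
# Hypothetical thresholds on detector confidence: the output side acts at
# lower confidence than the input side.
POLICY = {
    "input": {"redact_at": 0.80, "escalate_at": 0.60},
    "output": {"redact_at": 0.50, "escalate_at": 0.30},
}


def decide(direction: str, score: float) -> str:
    """Return 'redact', 'escalate' (send to the LLM-based tier), or 'pass'."""
    policy = POLICY[direction]
    if score >= policy["redact_at"]:
        return "redact"
    if score >= policy["escalate_at"]:
        return "escalate"
    return "pass"
```

The same 0.55-confidence span passes on input but is redacted on output, which is the asymmetry the paragraph argues for.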
Telemetry pipelines are the silent leak. Every observability span that records the prompt records the personal data. Every eval golden set built from production samples freezes a personal-data snapshot. Every drift-detection histogram keyed by token identity preserves the structure of personal data even when the content is hashed. Each of these is a retention tier the deletion API has to reach. The mature pattern is to route the redacted form into the telemetry pipeline, not the cleartext: the trace store sees [EMAIL_3], not jane.doe@example.com, and the trace store’s own retention is governed against the redacted form. Teams that ship cleartext into observability are taking on a deletion-pipeline scope they don’t realize.
Apple’s Foundation Models framework changes the calculus for personal-data-heavy apps on Apple platforms. The framework, introduced with iOS 26 and its sibling OS releases, exposes an approximately 3B-parameter on-device model to Swift applications on Apple Intelligence-enabled devices, with no per-inference charge. For workloads where the personal data lives on the user’s device (notes, messages, photo metadata) and the inference is comfortable at 3B-class quality, the residency question is trivially answered: no data leaves the device, no GDPR retention timer runs on the model vendor, no CLOUD Act jurisdiction attaches. The trade-off is model quality and the Apple-only constraint. ExecuTorch and MLC Chat provide the cross-platform analogue, with the same on-device-inference privacy property and a similar quality envelope. The model-routing layer is the natural place to send only the privacy-sensitive surfaces on-device and keep the rest in the cloud.
Deletion is not just delete-then-prove. The GDPR-shaped pipeline is: identify all derivative artifacts (chunks, embeddings, summaries, reflections, cache prefixes, span records, eval samples, vault entries) → tombstone the originals → rebuild the derivatives that depended on them → verify the deletion by querying every tier → emit a signed attestation. The memory-privacy article’s seven-step deletion pipeline is the most complete account in this curriculum; the privacy-layer corollary is that the same pipeline runs on the same schedule for every personal-data tier, not just the memory tier. If your eval pipeline isn’t on the deletion graph, your deletion is incomplete.
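The five steps can be sketched end to end, assuming each tier is an in-memory table whose rows record an owning user and, for derivatives, the source rows they were built from; `Tier`, `erase_user`, and the `rebuild` callback are hypothetical. The signature is HMAC-SHA256 over the record’s canonical JSON, standing in for whatever signing the audit process actually requires.

```python
import hashlib
import hmac
import json
import time
from typing import Callable, Optional


class Tier:
    """In-memory stand-in for one storage tier (vector store, span store, eval set, ...)."""

    def __init__(self, name: str):
        self.name = name
        # row_id -> {"user": owner, optional "derived_from": [source row ids]}
        # Assumption: row ids are unique across all tiers.
        self.rows: dict[str, dict] = {}

    def find(self, user: str) -> list[str]:
        return [rid for rid, row in self.rows.items() if row["user"] == user]


def _cites(row: dict, deleted: set[str]) -> bool:
    return bool(deleted & set(row.get("derived_from", [])))


def erase_user(
    user: str,
    tiers: list[Tier],
    rebuild: Callable[[Tier, str, set[str]], None],
    signing_key: bytes,
    now: Optional[float] = None,
) -> dict:
    # 1. Identify every row the user owns, in every tier.
    doomed = {tier.name: tier.find(user) for tier in tiers}
    deleted = {rid for rids in doomed.values() for rid in rids}
    # 2. Delete them (a real pipeline might tombstone first and hard-delete later).
    for tier in tiers:
        for rid in doomed[tier.name]:
            del tier.rows[rid]
    # 3. Rebuild every derivative that cited a deleted row.
    rebuilt = []
    for tier in tiers:
        for rid, row in list(tier.rows.items()):
            if _cites(row, deleted):
                rebuild(tier, rid, deleted)
                rebuilt.append(f"{tier.name}/{rid}")
    # 4. Verify by re-querying every tier; refuse to attest to a partial deletion.
    leftovers = [f"{t.name}/{rid}" for t in tiers for rid in t.find(user)]
    leftovers += [f"{t.name}/{rid}" for t in tiers for rid, row in t.rows.items() if _cites(row, deleted)]
    if leftovers:
        raise RuntimeError(f"deletion incomplete: {leftovers}")
    # 5. Emit an attestation signed with HMAC-SHA256 over its canonical JSON form.
    record = {
        "user": user,
        "deleted": {name: len(rids) for name, rids in doomed.items()},
        "rebuilt": sorted(rebuilt),
        "at": time.time() if now is None else now,
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return record
```

Verification covers both owned rows and derivatives that still cite deleted sources, so a rebuild job that silently failed blocks the attestation instead of producing a false one.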
Further reading from the field
- Microsoft Presidio — official documentation — the canonical reference for the open-source detector and anonymizer. Read the analyzer recognizers list and the anonymizer operators before designing your detect-redact pipeline; both are the cleanest description of what a production-grade PII stack actually contains, and the PII Shield blog post shows how Microsoft itself wraps Presidio as a privacy proxy in front of LLM calls.
- OpenAI — Introducing the OpenAI Privacy Filter — the May 2026 release of an open-weight model purpose-built for detecting and redacting PII in text, designed to run locally. This is the highest-quality drop-in alternative to NER-based detectors for the LLM-based third tier of the cascade, and the open-weight release means it can run inside your own infrastructure with no data egress.
- EDPB — Opinion 28/2024 on AI models and personal data processing — the European Data Protection Board’s December 2024 opinion on when AI models can be considered anonymized, the legitimate-interests basis for training and deployment, and the deletion-and-erasure expectations. This is the most authoritative single source on how EU regulators expect LLM systems to be governed in 2026.
- Simon Willison — Prompt injection and AI agents — the running tag index Willison maintains; every post in the tag is relevant to the PII-via-injection failure mode, and his lethal trifecta framing is the cleanest formal statement of when an agent can exfiltrate personal data via injection.
What to read next
- Guardrails: Input and Output Safety Layers — yesterday’s piece in the subtree; the broader safety framing this PII article specializes one slice of, and the source of the input/output cascade architecture the PII layer mirrors.
- Memory Privacy, Isolation, and Multi-Tenancy — the memory-layer companion to this article; the deletion pipeline and namespace-discipline patterns covered there are the structural complement to the runtime detection layer covered here.
- Production Tracing and Observability — the telemetry pipeline that is the silent leak this article calls out. The intersection — telemetry that stores personal data — is where most production deployments accumulate their largest hidden retention liability.
- Agent Budgets and Runaway Prevention — the closing piece of the Production & Operations subtree. The economic-safety counterpart to the data-residency-safety story this article walks: the same defense-in-depth discipline applied to dollars and tokens, with the seven enforcement primitives every agent harness needs to ship.