$ cat ai-engineering/pii-and-privacy.md

PII Detection and Data Privacy for LLM Systems

How to detect, transform, retain, and delete personal data in LLM request and storage paths.

Jatin Bansal@blog:~/ai-engineering$ open pii-and-privacy

Personal data can pass through retrieval, model providers, traces, analytics, caches, and long-term memory during one request. Detection and transformation must happen before each disclosure boundary, while retention and deletion controls must cover every stored derivative.

Detect PII in stages

The production-grade detector is not a single component but a cascade: three layers, each with different precision, recall, and latency characteristics, run in series with early exit on confident hits.

Layer 1: regex on every byte. The first pass catches the structured patterns: email addresses, US/UK/EU phone formats, SSNs and national IDs, credit card numbers (with Luhn validation to suppress false positives), IBANs, IPv4/IPv6 addresses, MAC addresses, URLs containing personal subdomains, AWS access keys, and Stripe customer IDs. Regex catches roughly 80% of structured PII at roughly 0.5–2ms per kilobyte and negligible memory cost. Its failure mode is unstructured personal data: names, addresses spelled out in prose, and identifiers that don’t match a known format. Regex alone is what most teams ship as v1 and what auditors typically flag as insufficient by v2.
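A minimal sketch of the regex tier, assuming a deliberately small, illustrative pattern set (the entity labels borrow Presidio’s naming, but these patterns are not Presidio’s recognizers). It shows the one validation step the paragraph calls out: a Luhn checksum that discards digit runs which merely look like card numbers.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Illustrative subset only; a production pass carries many more formats
# (phone variants, national IDs, IBANs, cloud keys) and tunes each one.
PATTERNS: dict[str, re.Pattern[str]] = {
    "EMAIL_ADDRESS": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
}


@dataclass(frozen=True)
class Hit:
    entity_type: str
    start: int
    end: int


def luhn_ok(number: str) -> bool:
    """Luhn checksum: from the right, double every second digit (subtracting 9
    when the result exceeds 9) and require the total to be divisible by 10."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return len(digits) >= 13 and total % 10 == 0


def regex_pass(text: str) -> list[Hit]:
    hits: list[Hit] = []
    for entity_type, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # Luhn suppresses most random digit runs (order numbers, timestamps).
            if entity_type == "CREDIT_CARD" and not luhn_ok(m.group()):
                continue
            hits.append(Hit(entity_type, m.start(), m.end()))
    return sorted(hits, key=lambda h: h.start)
```

On "Card 4111 1111 1111 1111, order 1234 5678 9012 3456, mail jane@example.com", the first digit run passes Luhn and is reported as a card, while the order number fails the checksum and is dropped.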

Layer 2: Named-entity recognition. The second pass is a small NER model (typically spaCy under Microsoft Presidio, occasionally a fine-tuned BERT-class model) that tags spans of text with entity types such as PERSON, ORG, LOCATION, GPE, and DATE. NER catches the unstructured cases regex misses, at a cost of roughly 5–20ms per kilobyte and a few hundred MB of model weights. Its failure mode is contextual: a name that’s also a common noun (“Ms. Booker”), an organization named after a person, or a date that’s incidental rather than tied to a person. NER has higher recall but lower precision than regex; its false-positive rate runs 5–15% on production traffic.

Layer 3: LLM-based classification. The third pass is used sparingly, only on flagged samples or on the highest-risk surfaces. It runs a guard model (Llama Guard 4’s S7: Privacy category, OpenAI’s Privacy Filter released as open weights in 2026, or a small dedicated classifier) to disambiguate the cases NER couldn’t. LLM-based classification handles context the earlier layers can’t (“this is a fictional name in a story” vs. “this is a real person’s name in a customer support ticket”), at the cost of 50–200ms of additional latency and the price of an extra inference call.

The cascade has the same shape as the reranking cascade: a cheap, high-recall first pass narrows the candidate set, and a more expensive, higher-precision later pass refines the verdict. The production calibration is regex on every byte; NER on every byte that regex didn’t already cover with high confidence; and LLM-based classification only on flagged samples in eval pipelines or on the highest-risk inputs (uploaded documents, retrieved chunks crossing tenant boundaries). Prediction Guard’s PII pipeline writeup lays out the same three-tier architecture from a regulated-industries angle; the cascade is now the consensus design.
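The calibration above can be sketched as plain control flow, assuming each tier is injected as a callable returning (entity_type, start, end, score) tuples and that scores are comparable across tiers. The `confident` and `ambiguous` thresholds are illustrative assumptions. One simplification: rather than skipping bytes, NER runs over the full text and its spans are discarded wherever regex already produced a confident hit, which gives the same early exit on the verdict (though not on NER’s latency).

```python
from __future__ import annotations

from collections.abc import Callable

# (entity_type, start, end, score); detectors and the judge are injected so the
# cascade logic is independent of Presidio, spaCy, or any particular guard model.
Span = tuple[str, int, int, float]
Detector = Callable[[str], list[Span]]
Judge = Callable[[str, Span], bool]  # LLM tier: "is this span real personal data?"


def _overlaps(a: Span, b: Span) -> bool:
    return a[1] < b[2] and b[1] < a[2]


def cascade(
    text: str,
    regex: Detector,
    ner: Detector,
    judge: Judge | None,
    high_risk: bool,
    confident: float = 0.85,
    ambiguous: tuple[float, float] = (0.4, 0.85),
) -> list[Span]:
    # Tier 1: regex on every byte.
    hits = list(regex(text))
    settled = [h for h in hits if h[3] >= confident]
    # Tier 2: keep NER spans only where regex has not already settled the span
    # with high confidence (the early exit).
    hits += [s for s in ner(text) if not any(_overlaps(s, h) for h in settled)]
    # Tier 3: only on high-risk surfaces, and only for spans in the ambiguous band.
    if high_risk and judge is not None:
        lo, hi = ambiguous
        hits = [h for h in hits if not (lo <= h[3] < hi) or judge(text, h)]
    return sorted(hits, key=lambda h: h[1])
```

The expensive judge is called only for the handful of spans that survive the first two tiers with middling scores, which is what keeps the LLM tier affordable.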

Choose a transformation

Once detected, the personal data has to be transformed before it crosses whatever boundary the policy guards. Four modes, in increasing order of how much utility they preserve:

Mode 1: Redaction. Replace the span with a placeholder: [EMAIL], [REDACTED], █████. Loses the value entirely and, with a generic placeholder like [REDACTED] or █████, the type as well. Cheap, deterministic, irreversible. Appropriate for telemetry exports and eval samples where the personal data was never load-bearing.

Mode 2: Masking. Preserve type, lose specifics: [EMAIL_1], [PHONE_2]. The index lets the model distinguish multiple PII spans in the same prompt (“Send the report to [EMAIL_1] and CC [EMAIL_2]”). Cheap, deterministic, and reversible if you store the mapping. The reversible variant (store [EMAIL_1] → jane.doe@example.com in a tenant-scoped vault, un-mask in the response) is the dominant pattern for production LLM pipelines because it preserves utility without exposing the cleartext to the model or to downstream telemetry.
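A minimal sketch of reversible masking, assuming detection has already produced non-overlapping (type, start, end) spans. Identical values reuse the same index, so the model sees one consistent placeholder per address; the returned mapping is what would go into the tenant-scoped vault. The `mask` and `unmask` names are illustrative.

```python
from __future__ import annotations

import re
from collections import defaultdict


def mask(text: str, spans: list[tuple[str, int, int]]) -> tuple[str, dict[str, str]]:
    """Replace each (type, start, end) span with [TYPE_n]. Returns the masked text
    and the placeholder -> cleartext mapping to keep in a vault."""
    counters: dict[str, int] = defaultdict(int)
    by_value: dict[tuple[str, str], str] = {}
    mapping: dict[str, str] = {}
    replacements: list[tuple[int, int, str]] = []
    # Number left-to-right so [EMAIL_1] is the first email the model reads.
    for etype, start, end in sorted(spans, key=lambda s: s[1]):
        key = (etype, text[start:end])
        if key not in by_value:
            counters[etype] += 1
            by_value[key] = f"[{etype}_{counters[etype]}]"
            mapping[by_value[key]] = key[1]
        replacements.append((start, end, by_value[key]))
    # Splice right-to-left so earlier offsets stay valid.
    for start, end, placeholder in reversed(replacements):
        text = text[:start] + placeholder + text[end:]
    return text, mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    # Unknown placeholders are left as-is rather than guessed.
    return re.sub(r"\[[A-Z_]+_\d+\]", lambda m: mapping.get(m.group(0), m.group(0)), text)
```

Masking "Send the report to a@x.com and CC b@y.com" yields "Send the report to [EMAIL_1] and CC [EMAIL_2]", and unmasking with the stored mapping restores the original.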

Mode 3: Synthetic substitution. Replace jane.doe@example.com with a realistic stand-in such as maria.lopez@example.org. This preserves type and locale (a UK postcode stays a UK postcode): the model sees realistic-looking text and produces coherent output, while the original lives in the vault and is substituted back on rendering. Presidio’s anonymizer supports this through a custom operator (for example, one backed by Faker); its default operator is a plain replace with an <ENTITY_TYPE> placeholder. Synthetic substitution has higher utility than masking (the model handles plausible names better than placeholder tokens), at the cost of making sure the substitution doesn’t accidentally introduce a real person’s identity: a synthetic name generator that pulls from a finite pool can collide with a real customer’s name in the next request.
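The collision risk is easy to guard against when the generator can see the identities it must avoid. This sketch assumes a hypothetical `known_real` set (for example, names from the tenant’s own customer directory) and a deliberately tiny name pool so the exhaustion path is visible; production generators such as Faker draw from far larger, locale-aware pools.

```python
from __future__ import annotations

import random

# Deliberately tiny, hypothetical pool so exhaustion is easy to see.
FIRST = ["Alex", "Sam", "Jordan", "Riley", "Casey", "Morgan"]
LAST = ["Reed", "Hale", "Park", "Quinn", "Shaw", "Vale"]


def synthetic_name(
    rng: random.Random, known_real: set[str], used: set[str], max_tries: int = 100
) -> str:
    """Draw a plausible name that is neither a known real identity nor already
    used as a substitute, so rendering can map it back unambiguously."""
    for _ in range(max_tries):
        name = f"{rng.choice(FIRST)} {rng.choice(LAST)}"
        if name not in known_real and name not in used:
            used.add(name)
            return name
    # Safer to degrade to an opaque token than to reuse or collide.
    raise RuntimeError("synthetic pool exhausted; fall back to an opaque token")
```

Tracking `used` matters as much as `known_real`: two cleartexts sharing one synthetic value would make the substitution irreversible.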

Mode 4: Tokenization with a hardened vault. The full PCI pattern: the personal data lives in a separate system with its own access controls, audit log, and encryption-at-rest with a customer-managed key; the LLM pipeline sees opaque tokens; un-tokenization happens at the rendering boundary by a service that re-authenticates the user. Highest utility (the model’s output can include the original values, indirectly, via the rendering layer un-tokenizing) and highest engineering cost. Appropriate when the data is high-sensitivity (healthcare, financial) or the regulatory regime demands segregated storage.
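A sketch of the detokenization side of that pattern, assuming a hypothetical `TokenVault` with an injected authorization check. It shows the two properties the paragraph relies on: tokens are random and carry no cleartext, and every detokenization re-checks the caller and leaves an audit record. Encryption at rest with a customer-managed key is omitted; in a real deployment it belongs to the storage layer behind `_store`.

```python
from __future__ import annotations

import secrets
import time
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class TokenVault:
    """Hypothetical segregated vault: the LLM pipeline only ever holds tokens, and
    detokenize() re-checks the caller and writes an audit record on every call."""
    authorize: Callable[[str, str], bool]  # (user_id, tenant_id) -> allowed?
    _store: dict[str, tuple[str, str]] = field(default_factory=dict)  # token -> (tenant, cleartext)
    audit_log: list[dict] = field(default_factory=list)

    def tokenize(self, cleartext: str, tenant_id: str) -> str:
        # 128 random bits: opaque, unguessable, and derived from nothing in the cleartext.
        token = f"tok_{secrets.token_urlsafe(16)}"
        self._store[token] = (tenant_id, cleartext)
        return token

    def detokenize(self, token: str, user_id: str, tenant_id: str) -> str | None:
        entry = self._store.get(token)
        granted = entry is not None and entry[0] == tenant_id and self.authorize(user_id, tenant_id)
        # Denied attempts are logged too; they are what an incident review looks for.
        self.audit_log.append(
            {"ts": time.time(), "user": user_id, "tenant": tenant_id, "token": token, "granted": granted}
        )
        return entry[1] if granted else None
```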

The choice between modes is policy, not engineering. The engineering question is: does the operation need to be reversible, and which boundary does the cleartext live behind? Once those are answered, the mode follows.
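The two questions can be written down as a tiny, hypothetical policy helper (the function and flag names are illustrative, not from any library). The third flag reflects the utility argument for synthetic substitution over masking; treat the mapping as one reasonable encoding of the mode descriptions, not the only one.

```python
from __future__ import annotations

from typing import Literal

Mode = Literal["redact", "mask", "synthetic", "tokenize"]


def choose_mode(reversible: bool, segregated_vault: bool, needs_realistic_text: bool = False) -> Mode:
    """Hypothetical policy helper: the answers to the two questions pick the mode."""
    if not reversible:
        return "redact"  # nothing to map back, nothing to store
    if segregated_vault:
        return "tokenize"  # cleartext lives behind its own access controls
    # Reversible, tenant-scoped mapping: synthetic values if the model benefits
    # from realistic text, indexed placeholders otherwise.
    return "synthetic" if needs_realistic_text else "mask"
```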

Place privacy checks in the request path

The cascade adds latency in the obvious place, between the user request and the model call, and again on the response side. A realistic budget on a typical pipeline:

Stage	Median latency	Cost adder	Notes
Regex pass	0.5–2ms/KB	$0	Pure CPU, easy to parallelize
NER pass (Presidio + spaCy)	5–20ms/KB	$0 (self-hosted)	Single-threaded per request; batch where you can
LLM-based classifier (input)	50–200ms	$0.0001–0.001	Only on flagged samples or high-risk paths
Vault token mint/lookup	1–5ms	$0	Local with a Redis-class store
Output cascade (mirror of above)	7–25ms	$0 + classifier sample	Run in parallel with output guard where possible
Total typical overhead	~15–50ms	<$0.001 typical	Dominated by NER; LLM tier is sampled
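A back-of-the-envelope check of the total row, assuming the regex and NER rows scale linearly with prompt size (they are quoted per KB), the vault and output-cascade rows are fixed per request, and the sampled LLM tier is excluded. At 1 KB the ranges sum to 13.5–52ms, consistent with the ~15–50ms total; a 4 KB prompt already lands at 30–118ms.

```python
from __future__ import annotations

# Per-stage (low, high) latency in ms from the table. Assumption: regex and NER
# scale linearly with prompt size; vault and output-cascade rows are per request.
PER_KB = {"regex": (0.5, 2.0), "ner": (5.0, 20.0)}
FIXED = {"vault": (1.0, 5.0), "output_cascade": (7.0, 25.0)}


def overhead_ms(prompt_kb: float) -> tuple[float, float]:
    """(low, high) cascade overhead, excluding the sampled LLM-classifier tier."""
    low = sum(lo for lo, _ in PER_KB.values()) * prompt_kb + sum(lo for lo, _ in FIXED.values())
    high = sum(hi for _, hi in PER_KB.values()) * prompt_kb + sum(hi for _, hi in FIXED.values())
    return low, high
```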

The latency budget interacts with the rest of the production stack. Run the cascade before the prompt cache: if you tokenize after the prompt reaches the cache, the cache stores cleartext and the deletion pipeline can’t reach it. Run the cascade before the observability tracer sees the prompt: span attributes that capture the prompt carry whatever personal data it contains, and the trace store is a separate retention tier the deletion API has to invalidate. Run the cascade after the user’s authentication is verified: pre-auth requests don’t deserve the latency budget, and unauthenticated traffic that contains personal data is its own incident.
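A minimal sketch of that ordering as a request handler. Every collaborator is a hypothetical stand-in injected as a parameter: `authenticate` returns a tenant ID or None, `transform` is the PII cascade, a dict stands in for any cache keyed on prompt content, and a list stands in for the trace store. The point is structural: nothing downstream of `transform` ever receives the cleartext.

```python
from __future__ import annotations

from collections.abc import Callable


def handle(
    request: dict,
    authenticate: Callable[[dict], str | None],
    transform: Callable[[str, str], str],
    cache: dict[str, str],
    trace: list[dict],
    call_model: Callable[[str], str],
) -> str:
    """Hypothetical handler: auth -> PII cascade -> tracer -> cache -> model."""
    tenant_id = authenticate(request)
    if tenant_id is None:
        # Unauthenticated: refuse before spending the cascade budget.
        raise PermissionError("unauthenticated request")
    safe_prompt = transform(request["prompt"], tenant_id)  # cleartext stops here
    trace.append({"tenant": tenant_id, "prompt": safe_prompt})  # span attributes see tokens only
    key = f"{tenant_id}:{safe_prompt}"  # tenant-scoped key over transformed text only
    if key not in cache:
        cache[key] = call_model(safe_prompt)
    return cache[key]
```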

The structural alternative, and the one the on-device-inference camp argues is the only honest answer, is to avoid the latency entirely by avoiding the network call. With on-device inference shipping in 2026 (Apple Foundation Models, ExecuTorch on Android, MLC Chat), the personal data never leaves the device, the PII detection cascade becomes optional rather than mandatory, the residency question is trivially answered, and deletion is handled by the OS when the user uninstalls the app. The trade-off is model quality: the 3B-parameter on-device class is materially weaker than frontier cloud models, which makes the model-routing layer the natural place to send only privacy-sensitive workloads on-device while keeping the rest in the cloud. We return to this in trade-offs.
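The routing rule can be sketched as a pure function. The `Route` type and the `privacy_sensitive` flag are hypothetical; how a workload gets labelled privacy-sensitive is a policy decision outside this sketch. The one design choice worth noting is that the cloud path always runs the cascade, because that is the path where the data crosses a disclosure boundary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    target: str  # "on_device" or "cloud"
    run_cascade: bool  # must the PII cascade run before the request leaves the process?


def route(privacy_sensitive: bool, on_device_available: bool) -> Route:
    """Hypothetical routing rule: privacy-sensitive work stays on the device when a
    local model is available; everything else goes to the cloud behind the cascade."""
    if privacy_sensitive and on_device_available:
        # The data never crosses a network boundary, so the cascade is optional.
        return Route("on_device", run_cascade=False)
    return Route("cloud", run_cascade=True)
```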

Code: a detect-redact-deidentify pipeline in Python

The example is the v1 production cut: Presidio’s analyzer for detection, a vault for reversible mapping, a transformation mode chosen per call by policy, and a clean interface that the rest of the harness can call. Install: pip install presidio-analyzer presidio-anonymizer anthropic faker. The analyzer needs python -m spacy download en_core_web_lg once.

python

from __future__ import annotations

import re
import secrets
import time
from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Literal

from anthropic import Anthropic
from faker import Faker
from presidio_analyzer import AnalyzerEngine, RecognizerResult
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig

# Detection: Presidio's analyzer covers the structured (pattern) + NER tiers.
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()
client = Anthropic()
fake = Faker()  # default en_US locale, which also provides ssn()

RedactionMode = Literal["redact", "mask", "synthetic", "tokenize"]

# Opaque vault tokens look like [EMAIL_ADDRESS-1A2B3C4D].
TOKEN_RE = re.compile(r"\[[A-Z_]+-[0-9A-F]{8}\]")

# Type-aware synthetic generators; unlisted entity types fall back to opaque tokens.
SYNTHETIC: dict[str, Callable[[], str]] = {
    "EMAIL_ADDRESS": fake.email,
    "PHONE_NUMBER": fake.phone_number,
    "PERSON": fake.name,
    "LOCATION": fake.city,
    "CREDIT_CARD": fake.credit_card_number,
    "US_SSN": fake.ssn,
    "IBAN_CODE": fake.iban,
}

@dataclass
class VaultEntry:
    token: str
    cleartext: str
    entity_type: str
    tenant_id: str
    created_at: float = field(default_factory=time.time)

class Vault:
    """Token vault for reversible transformation. Tenant-scoped, in-memory here;
    in production, swap for an HSM-backed store with its own audit log."""

    def __init__(self) -> None:
        self._by_token: dict[str, VaultEntry] = {}
        self._by_cleartext: dict[tuple[str, str], str] = {}  # (tenant, cleartext) -> token

    def mint(
        self, cleartext: str, entity_type: str, tenant_id: str, surface: str | None = None
    ) -> str:
        """Return this tenant's stable substitute for `cleartext`, minting one if
        needed. `surface` supplies a synthetic value instead of an opaque token."""
        key = (tenant_id, cleartext)
        if key in self._by_cleartext:
            return self._by_cleartext[key]
        token = surface or None
        # 4 random bytes is only 32 bits, so collisions are realistic at scale; never
        # overwrite an existing entry (it may belong to another tenant).
        while token is None or token in self._by_token:
            # Token format: TYPE-<random>; opaque to the model, type-readable for prompts.
            token = f"[{entity_type}-{secrets.token_hex(4).upper()}]"
        self._by_token[token] = VaultEntry(token, cleartext, entity_type, tenant_id)
        self._by_cleartext[key] = token
        return token

    def reveal(self, token: str, tenant_id: str) -> str | None:
        entry = self._by_token.get(token)
        if entry is None or entry.tenant_id != tenant_id:
            return None
        return entry.cleartext

    def purge(self, tenant_id: str) -> int:
        """Right-to-erasure hook, tenant-scoped here (per-data-subject erasure needs a
        subject ID on each entry). Returns the count purged, for verification."""
        to_drop = [t for t, e in self._by_token.items() if e.tenant_id == tenant_id]
        for t in to_drop:
            entry = self._by_token.pop(t)
            self._by_cleartext.pop((entry.tenant_id, entry.cleartext), None)
        return len(to_drop)

@dataclass
class PiiResult:
    transformed_text: str
    detected: list[dict]  # types, offsets, scores; never the cleartext span
    tokens_minted: list[str]
    cascade_ms: float

def _splice_order(results: list[RecognizerResult]) -> list[RecognizerResult]:
    """Presidio can return overlapping spans (e.g. a URL inside an email address).
    Keep the highest-scoring, then longest, span per overlap, ordered right-to-left
    so earlier offsets stay valid while splicing."""
    kept: list[RecognizerResult] = []
    for r in sorted(results, key=lambda x: (-x.score, x.start - x.end)):
        if all(r.end <= k.start or r.start >= k.end for k in kept):
            kept.append(r)
    return sorted(kept, key=lambda x: x.start, reverse=True)

def detect_and_transform(
    text: str,
    tenant_id: str,
    vault: Vault,
    mode: RedactionMode = "tokenize",
    language: str = "en",
) -> PiiResult:
    """Cascade: regex + NER via Presidio; transform per the policy mode."""
    if mode not in ("redact", "mask", "synthetic", "tokenize"):
        raise ValueError(f"unknown mode: {mode}")
    t0 = time.perf_counter()
    results = analyzer.analyze(text=text, language=language)

    # Ignore hits inside our own vault tokens: an all-digit hex suffix can look like
    # a bank account number, and redacting it would corrupt the token.
    protected = [m.span() for m in TOKEN_RE.finditer(text)]
    results = [r for r in results if all(r.end <= s or r.start >= e for s, e in protected)]

    # Telemetry-safe summary: no cleartext, so it can go to traces and logs.
    detected = [
        {"type": r.entity_type, "start": r.start, "end": r.end, "score": r.score}
        for r in results
    ]

    if mode == "redact":
        # Irreversible; Presidio's anonymizer resolves overlapping spans itself.
        op = OperatorConfig("replace", {"new_value": "[REDACTED]"})
        out = anonymizer.anonymize(text=text, analyzer_results=results, operators={"DEFAULT": op})
        return PiiResult(out.text, detected, [], (time.perf_counter() - t0) * 1000)

    # mask / tokenize / synthetic ("mask" shares the tokenize path here). Presidio
    # applies one operator config per entity type; for per-value substitutes we
    # mint up-front and splice the text ourselves.
    out_text = text
    tokens_minted: list[str] = []
    for r in _splice_order(results):
        cleartext = text[r.start : r.end]
        gen = SYNTHETIC.get(r.entity_type) if mode == "synthetic" else None
        # The vault maps substitute -> cleartext, so synthetic values are as
        # reversible on rendering as opaque tokens.
        token = vault.mint(cleartext, r.entity_type, tenant_id, surface=gen() if gen else None)
        tokens_minted.append(token)
        out_text = out_text[: r.start] + token + out_text[r.end :]
    return PiiResult(out_text, detected, tokens_minted, (time.perf_counter() - t0) * 1000)

def reveal_tokens(text: str, tenant_id: str, vault: Vault, allowed: list[str]) -> str:
    """Un-tokenize on the rendering boundary. Only substitutes in `allowed` (those
    minted on this request's inbound side) are reversed; any other token, such as
    one regurgitated from another request or tenant, stays opaque."""
    if not allowed:
        return text
    # One alternation, longest first, so a longer substitute wins over its prefix.
    alternatives = sorted(set(allowed), key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in alternatives))
    return pattern.sub(lambda m: vault.reveal(m.group(0), tenant_id) or m.group(0), text)

# --- Production path: request -> detect -> call model -> detect on output -> render ---
def answer(user_text: str, tenant_id: str, vault: Vault) -> dict:
    inbound = detect_and_transform(user_text, tenant_id, vault, mode="tokenize")

    # Model never sees the cleartext; the tokens give it enough structure to
    # reason about the request without seeing the personal data.
    system = (
        "You are a helpful assistant. Inputs may contain opaque tokens like "
        "[EMAIL_ADDRESS-1A2B3C4D] standing in for personal data. Quote them verbatim "
        "where you need to reference the underlying value; the rendering "
        "layer will substitute back before display."
    )
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": inbound.transformed_text}],
    )
    # Join the text blocks rather than assuming content[0] is text.
    model_text = "".join(b.text for b in response.content if b.type == "text")

    # Output-side cascade: catch PII the model might have generated that wasn't
    # in the input (training-data leakage, hallucinated identifiers).
    outbound = detect_and_transform(model_text, tenant_id, vault, mode="redact")

    # Un-tokenize ONLY the tokens we minted on this request's inbound side;
    # any newly detected outbound PII stays redacted.
    rendered = reveal_tokens(outbound.transformed_text, tenant_id, vault, inbound.tokens_minted)

    return {
        "rendered": rendered,
        "inbound_detected": inbound.detected,
        "outbound_detected": outbound.detected,
        "cascade_latency_ms": inbound.cascade_ms + outbound.cascade_ms,
    }

The shape worth internalizing: detection is symmetric (input and output), tokenization is asymmetric (only the tokens this request minted are eligible to be reversed on the response), and the deletion API (vault.purge) is the artifact a GDPR auditor asks for. The naive failure mode, un-tokenizing every token in the output, is exactly how a multi-tenant system leaks across tenants when the model regurgitates a token from a different tenant’s prompt cache.
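To make the naive failure mode concrete, here is a self-contained sketch with a plain dict standing in for the vault (the token values and tenants are hypothetical). The naive reveal un-tokenizes everything it recognizes; the scoped reveal only resolves tokens this request minted for this tenant.

```python
from __future__ import annotations

import re

TOKEN = re.compile(r"\[[A-Z_]+-[0-9A-F]{8}\]")

# Stand-in vault: token -> (tenant_id, cleartext). Values are hypothetical.
VAULT = {
    "[EMAIL_ADDRESS-AAAA0001]": ("acme", "ceo@acme.example"),
    "[EMAIL_ADDRESS-BBBB0002]": ("globex", "cfo@globex.example"),
}


def reveal_naive(text: str) -> str:
    # Resolves every token it recognizes, regardless of tenant or request.
    return TOKEN.sub(lambda m: VAULT.get(m.group(0), ("", m.group(0)))[1], text)


def reveal_scoped(text: str, tenant_id: str, minted_this_request: set[str]) -> str:
    def sub(m: re.Match[str]) -> str:
        tok = m.group(0)
        entry = VAULT.get(tok)
        if tok in minted_this_request and entry is not None and entry[0] == tenant_id:
            return entry[1]
        return tok  # anything else stays opaque
    return TOKEN.sub(sub, text)
```

If the model’s output for tenant acme echoes globex’s token, the naive reveal prints globex’s address; the scoped reveal leaves the foreign token opaque.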
