Also known as: PII detection, data redaction, anonymization, de-identification
TL;DR
Detecting and removing personally identifiable information (names, emails, phone numbers, addresses, IDs) from LLM inputs and outputs. A classic small-model task: high-volume, narrow, latency-sensitive, with structured target output.
PII redaction is one of the most underrated production tasks in modern AI systems. It isn't glamorous, but it is non-negotiable in any pipeline that touches healthcare, finance, legal, customer support, or HR data.
What counts as PII
PII categories
Direct identifiers. Full names, email addresses, phone numbers, physical addresses, government IDs (SSN, passport), credit card numbers, IP addresses.
Indirect identifiers. Date of birth + zip code, employer + role, distinctive medical conditions, biometric descriptors. Combinations that re-identify with high probability.
Quasi-PII. Account numbers, internal user IDs, session tokens. Not always PII in the statutory sense, but often just as sensitive in production contexts.
Different jurisdictions (GDPR, HIPAA, CCPA) have different definitions. Production systems usually adopt the strictest superset.
The Sweeney 2000 result: 87% of the US population is uniquely identifiable from the triple (5-digit ZIP code, date of birth, sex) alone. Quasi-identifiers compose multiplicatively. Each one divides the pool of people who match, so three coarse features together can re-identify a specific person in supposedly anonymized data. HIPAA's Safe Harbor method requires removing 18 categories of identifiers, including every element of a date except the year and any ZIP code digits beyond the first three (and even those three are allowed only when the area they cover holds more than 20,000 people), precisely because composition attacks defeat per-field redaction. The practical implication: a PII model that treats fields independently is broken. It has to recognize when a combination of features is identifying even when no single feature is.
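The composition effect is easy to see on tabular data. The sketch below is a k-anonymity style check, which is a dataset-level technique rather than a PII model. It flags records whose combination of quasi-identifiers is shared by fewer than `k` records. The field names, the `risky_records` helper, and the sample rows are all hypothetical.

```python
from collections import Counter


def risky_records(records, quasi_identifiers, k=5):
    """Return the records whose quasi-identifier combination occurs fewer than k times."""

    def combo(record):
        # The tuple of values for the chosen quasi-identifier fields.
        return tuple(record[field] for field in quasi_identifiers)

    counts = Counter(combo(r) for r in records)
    # A combination shared by fewer than k records can single a person out,
    # even when every individual field value is common.
    return [r for r in records if counts[combo(r)] < k]


# Made-up example rows; not real people.
records = [
    {"zip": "12345", "dob": "1971-05-02", "sex": "F"},
    {"zip": "12345", "dob": "1980-11-23", "sex": "M"},
    {"zip": "12345", "dob": "1980-11-23", "sex": "M"},
]

print(risky_records(records, ["zip"], k=2))  # [] : ZIP alone is shared by all three
print(risky_records(records, ["zip", "dob", "sex"], k=2))  # only the first record
```

No single field flags anyone here, but the full triple isolates the first row. A per-field redactor would miss exactly this case.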
Why it’s a small-model task
PII redaction has the classic “specialize and ship” shape:
High volume. Every input passing through your LLM stack needs to be checked.
Latency-sensitive. Sits in the request hot path. Under 50ms budget; under 10ms ideal.
Narrow. Detect spans matching a fixed taxonomy of PII types. No open-ended reasoning needed.
Structured output. Span annotations (start, end, type) per detection. Token classification or sequence-to-sequence with structured output.
Stable target. The notion of “this is a phone number” doesn’t drift the way “what’s a good response” does.
Every property that argues for a specialized fine-tuned small model over a frontier LLM call is present. A 0.5-3B model trained specifically on PII detection can run in single-digit milliseconds per request, at roughly 10⁻⁵ the cost of a frontier-LLM call, and can reach 99%+ recall.
How it gets built
The training pipeline is the standard specialize-and-distill recipe:
Synthetic data generation. A frontier LLM produces (text containing PII, list of PII spans) pairs. Coverage is tuned by prompting for diverse PII types, contexts, and formats.
Augment with adversarial cases. PII embedded in code, in JSON, and in natural prose. PII that looks like ordinary text (a name that is also a common noun), and ordinary text that looks like PII (a product code formatted like a date). Cases that need surrounding context to disambiguate.
Fine-tune a base model. Token classification, or generation with inline tags. Qwen2.5-1.5B / Qwen3-1.7B, Llama-3.2-1B, and Phi-3-mini are common starting points.
Validate on real, held-out PII data under strict access controls. This production validation set is the high-stakes part.
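For the token-classification route, the (text, spans) pairs from the generator have to be turned into per-token labels. Here is a minimal sketch. It assumes character-offset spans of the form (start, end, type) with an exclusive end, and it uses the common BIO labeling scheme. The `bio_labels` helper and its regex tokenizer are illustrative stand-ins. A real pipeline would align labels to the model's own subword tokens; Hugging Face fast tokenizers expose character offsets via `return_offsets_mapping=True`.

```python
import re

# Words, or single punctuation characters, with their character offsets.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def bio_labels(text, spans):
    """Convert character-level PII spans into per-token BIO labels.

    spans: iterable of (start, end, pii_type), end exclusive.
    Raises ValueError for out-of-range or overlapping spans, so malformed
    synthetic examples can be rejected before training.
    """
    spans = sorted(spans)
    prev_end = 0
    for start, end, _ in spans:
        if not (0 <= start < end <= len(text)) or start < prev_end:
            raise ValueError(f"invalid span {(start, end)}")
        prev_end = end

    tokens, labels, prev_owner = [], [], None
    for match in TOKEN_RE.finditer(text):
        tok_start, tok_end = match.span()
        # The span that fully contains this token, if any.
        owner = next((s for s in spans if s[0] <= tok_start and tok_end <= s[1]), None)
        if owner is None:
            labels.append("O")
        elif owner == prev_owner:
            labels.append("I-" + owner[2])  # continues the same entity
        else:
            labels.append("B-" + owner[2])  # first token of a new entity
        tokens.append(match.group())
        prev_owner = owner
    return tokens, labels


text = "Call Ada Lovelace at 555-0100."
tokens, labels = bio_labels(text, [(5, 17, "NAME"), (21, 29, "PHONE")])
for token, label in zip(tokens, labels):
    print(f"{token:10} {label}")
```

Rejecting malformed spans at this step is a cheap filter on generator output. Bad offsets would otherwise become silently wrong labels.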
Where it fits in the pipeline
Two places it usually sits:
Input filter. Before the user message reaches the LLM, redact PII. Logs and traces never see raw PII; the LLM works on placeholders (“[NAME_1] called [PHONE_1] yesterday”).
Output filter. Before the LLM’s response leaves the system, scan for PII the model may have leaked from context. Defense in depth.
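The input filter's placeholder scheme can be sketched as follows. The sketch assumes a detector has already produced non-overlapping (start, end, type) spans, and the `redact` helper is hypothetical. It numbers placeholders per type and reuses a placeholder when the same value reappears, so the LLM can still tell that two mentions refer to one entity.

```python
def redact(text, spans):
    """Replace PII spans with numbered placeholders such as [NAME_1].

    spans: non-overlapping (start, end, pii_type) tuples, end exclusive.
    Returns (redacted_text, mapping from placeholder to original value).
    """
    placeholder_for = {}  # (pii_type, value) -> placeholder
    counts = {}           # pii_type -> placeholders issued so far
    pieces, cursor = [], 0
    for start, end, pii_type in sorted(spans):
        key = (pii_type, text[start:end])
        if key not in placeholder_for:
            # First time this value appears: issue the next number for its type.
            counts[pii_type] = counts.get(pii_type, 0) + 1
            placeholder_for[key] = f"[{pii_type}_{counts[pii_type]}]"
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(placeholder_for[key])
        cursor = end
    pieces.append(text[cursor:])
    mapping = {ph: value for (_, value), ph in placeholder_for.items()}
    return "".join(pieces), mapping


text = "Ada called 555-0100 yesterday; Ada left a note."
redacted, mapping = redact(text, [(0, 3, "NAME"), (11, 19, "PHONE"), (31, 34, "NAME")])
print(redacted)  # [NAME_1] called [PHONE_1] yesterday; [NAME_1] left a note.
```

The returned mapping holds raw PII. A system that keeps it (for example, to restore values later) must keep it out of logs and traces, just like the original text.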
For observability specifically, the input filter is what makes it safe to log full traces — you can capture rich data without violating compliance.
The recall bias
Production PII redaction is asymmetric. Missing PII is a regulatory and reputational disaster, while redacting a non-PII word that merely looks like PII makes one response slightly worse. So PII models are tuned for high recall (99.5%+) at the cost of some precision. Threshold tuning is one of the few cases where “just lower the threshold” is the right answer, and compliance reviewers will tell you the same.
The error costs in PII redaction are wildly asymmetric, so the right operating point sits far toward recall: a missed name is a lawsuit; a redacted non-name is a typo.
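In practice, lowering the threshold means picking it from a labeled validation set. Here is a minimal sketch. It assumes the model emits a PII probability per candidate token or span and that anything scoring at or above the threshold is redacted. `threshold_for_recall` is a hypothetical helper that returns the highest threshold meeting a recall target.

```python
import math


def threshold_for_recall(scores, labels, target_recall=0.995):
    """Highest threshold whose recall on (scores, labels) meets target_recall.

    scores: predicted PII probability per item; labels: 1 if truly PII, else 0.
    Items with score >= threshold are redacted.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("validation set contains no PII examples")
    # Number of true PII items we must catch; the small epsilon keeps float
    # rounding in the product from pushing the count up by one.
    needed = max(1, math.ceil(target_recall * len(positives) - 1e-9))
    # Every positive scoring at or above the needed-th highest positive is caught.
    return positives[needed - 1]


def precision_recall(scores, labels, threshold):
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    true_pos = sum(flagged)
    return true_pos / len(flagged), true_pos / sum(labels)


scores = [0.95, 0.90, 0.80, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0]
for target in (0.75, 1.0):
    t = threshold_for_recall(scores, labels, target)
    p, r = precision_recall(scores, labels, t)
    print(f"target recall {target}: threshold {t}, precision {p:.2f}, recall {r:.2f}")
```

On this toy data, moving from 75% to 100% recall drops precision from 0.75 to 0.67. That is the trade a PII filter should accept.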
Go further
Why use a specialized model instead of regex or a frontier LLM?
Regex misses context-dependent PII: “Apple” is PII when it is Bob's surname, and not when someone is buying an Apple laptop. Frontier LLMs work but cost $0.01-1 per call and add 200-2000 ms of latency to every request, which is unworkable in the input pipeline. A 0.5-3B fine-tuned model hits 99%+ recall at under 50 ms latency on a single GPU.
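Here is a deliberately simple pattern-based detector that makes the regex limitation concrete. The patterns and the `regex_pii` helper are illustrative assumptions, not a production ruleset. The detector catches format-defined PII such as emails and phone numbers. A name like “Bob Apple” has no format to match, though, and no pattern can tell Bob's surname from the brand.

```python
import re

# Illustrative patterns only; real rulesets cover many more formats.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def regex_pii(text):
    """Return sorted (start, end, type) spans for format-defined PII only."""
    spans = []
    for pii_type, pattern in PATTERNS.items():
        spans.extend((m.start(), m.end(), pii_type) for m in pattern.finditer(text))
    return sorted(spans)


text = "Bob Apple (bob@example.com, 617-555-0100) is buying an Apple laptop."
for start, end, pii_type in regex_pii(text):
    print(pii_type, text[start:end])
# The email and phone are found; the name "Bob Apple" is not.
```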
PII redaction biases hard toward recall. Missing PII is a compliance breach, while redacting too aggressively just makes the response a bit less natural. Production systems target 99.5%+ recall, and precision around 90-95% is acceptable. The chosen operating point therefore sits much further toward recall than it would in a balanced classification task.
Training data comes mostly from synthetic generation. A frontier LLM produces (text with PII, redacted text, span annotations) triples at scale, and the small model trains on those. Real customer data is too risky to use directly for training PII detectors; synthetic data sidesteps that meta-compliance problem.
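The shape of one training triple can be sketched with plain template filling standing in for the frontier LLM. This is a deliberate simplification, and the templates, fake values, and `make_triple` helper are all hypothetical. Because the spans are recorded while the text is built, their offsets are correct by construction. An LLM generator gives no such guarantee, so its outputs usually need an offset check.

```python
import random
import string

# Hypothetical templates and obviously fake values.
TEMPLATES = [
    "Please call {NAME} at {PHONE} before Friday.",
    "Ticket from {EMAIL}: {NAME} cannot log in.",
]
FAKE_VALUES = {
    "NAME": ["Maria Okafor", "Chen Wei"],
    "PHONE": ["555-0142", "555-0199"],
    "EMAIL": ["m.okafor@example.com", "chen.w@example.org"],
}


def make_triple(template, rng):
    """Fill a template; return (text_with_pii, redacted_text, spans)."""
    text, redacted, spans = "", "", []
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    for literal, field, _, _ in string.Formatter().parse(template):
        text += literal
        redacted += literal
        if field is None:  # trailing literal with no placeholder after it
            continue
        value = rng.choice(FAKE_VALUES[field])
        spans.append((len(text), len(text) + len(value), field))
        text += value
        redacted += f"[{field}]"
    return text, redacted, spans


rng = random.Random(0)  # seeded so the dataset is reproducible
dataset = [make_triple(rng.choice(TEMPLATES), rng) for _ in range(1000)]
print(dataset[0])
```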