What this policy does
A reasonable default for “I just want to redact common PII without thinking about it too hard.” Catches:
- Personal identity: names (confidence-gated to reduce false positives), passport numbers, driver’s license numbers
- Contact info: phone numbers, email addresses, URLs, IP addresses
- Government identifiers: SSNs
- Financial: credit cards (Luhn-validated), IBANs
Does not cover:
- Healthcare-specific identifiers (MRN, hospital names) — use a healthcare policy instead
- PCI-specific masking (this policy fully redacts cards; for PCI scope reduction with last-4 visible, use pci-dss-scope-reduction.json)
- Court-filing rules (use legal policies)
- Custom identifiers (MRNs, account numbers, internal IDs): add identifiers patterns for your domain
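For readers unfamiliar with the Luhn validation mentioned for credit cards above, here is a minimal Python sketch of the checksum. It illustrates the general algorithm only and is not Philter's own code; the function name luhn_valid is hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Illustrative helper (hypothetical name). Non-digit characters such as
    spaces or dashes are ignored.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # widely used test number
print(luhn_valid("4111 1111 1111 1112"))  # last digit altered
```

A checksum gate like this lets a filter skip 16-digit strings that merely look like card numbers, such as order IDs or timestamps, which cuts false positives.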
When to use this
- Quick starts when you’re evaluating Philter against your data
- Catch-all log scrubbing in non-regulated environments
- Default-deny posture: redact aggressively, then loosen specific entity types as your use case clarifies
When NOT to use this
- Regulated workloads. HIPAA, PCI, GDPR, FERPA, and similar regimes have specific requirements — use a policy designed for that framework.
- Datasets where over-redaction breaks downstream value. This policy is biased toward over-redaction. For research, ML, or analytics use cases, see the date-shifted clinical-notes policy or build a domain-specific one.
When to customize
- Name confidence. Default > 70 is moderately conservative. Lower to > 50 for higher recall (catches more rare/foreign names at the cost of false positives on capitalized common words). Raise to > 85 for higher precision.
- URL and IP redaction. Some applications need to retain these for analytics. Remove the url or ipAddress entries if so.
- Add custom identifiers for any deployment-specific patterns: internal customer IDs, ticket numbers, employee badges, etc.
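The effect of the name confidence threshold can be illustrated with a small Python sketch. The candidate strings and scores below are invented for illustration, and the gate function is hypothetical. It only mimics a "confidence > threshold" condition and does not reflect how Philter evaluates policies internally.

```python
def gate(candidates, threshold):
    """Keep candidates whose confidence is strictly above the threshold."""
    return [text for text, confidence in candidates if confidence > threshold]


# Invented (text, confidence) pairs; not output from a real model.
candidates = [
    ("Maria Alvarez", 93),
    ("Bob", 78),
    ("Nguyen", 62),   # a real name scored lower
    ("Monday", 55),   # a capitalized common word, not a name
]

print(gate(candidates, 85))  # ['Maria Alvarez']
print(gate(candidates, 70))  # ['Maria Alvarez', 'Bob']
print(gate(candidates, 50))  # ['Maria Alvarez', 'Bob', 'Nguyen', 'Monday']
```

In this toy data, lowering the threshold to 50 recovers "Nguyen" but also redacts "Monday", which is the recall-versus-precision trade-off described above.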
Tuning workflow
1. Run this policy against a representative sample of your data.
2. Inspect the redactions. Note any over-redaction (legitimate text caught) or under-redaction (PII missed).
3. Tighten thresholds, add ignored terms, or add custom identifiers patterns based on what you find.
4. Re-evaluate. Repeat until precision and recall meet your bar.
Philter Scope automates step 3: it scores policy changes against a gold-standard test set, so you can measure regressions instead of guessing at them.
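To make "precision and recall" concrete, the following Python sketch scores a set of predicted redaction spans against a hand-labeled gold set using exact span matching. This is an assumption-laden illustration, not how Philter Scope works. The Span type, the score function, and the sample spans are all hypothetical, and a real evaluation might also credit partial overlaps.

```python
from typing import NamedTuple, Set, Tuple


class Span(NamedTuple):
    """A redacted region: character offsets plus an entity label (hypothetical type)."""
    start: int
    end: int
    label: str


def score(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Return (precision, recall) under exact span-and-label matching.

    By convention here, an empty denominator yields 1.0.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall


# Made-up example: the policy found two of three gold spans, plus one false positive.
gold = {Span(0, 10, "name"), Span(20, 31, "ssn"), Span(40, 58, "email")}
predicted = {Span(0, 10, "name"), Span(20, 31, "ssn"), Span(60, 66, "name")}

precision, recall = score(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.67
```

Re-running a score like this after each policy change shows whether a tweak helped or caused a regression.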