Redaction for Insurance: Claims, Customer Data, and the State-by-State Patchwork

Insurance is the vertical most people forget to mention when discussing regulated data. Healthcare gets HIPAA; banking gets GLBA and PCI; tech gets GDPR and CCPA. Insurance can be subject to all of them at once, plus state-level insurance commissioner rules and the NAIC Insurance Data Security Model Law, with HIPAA applying in full to health insurers.

For carriers, third-party administrators, and insurtech platforms, that overlapping regulatory environment combines with a uniquely PII-dense data flow: claims adjusters write free-text notes, customers correspond via email and chat, third-party medical reports arrive in PDFs, agent training data captures live calls. PII is everywhere; the redaction problem is constant.

This post walks through the regulatory landscape, the identifier set, and three architecture patterns where leakage most commonly happens in insurance workflows.

The regulatory landscape

Five frameworks dominate insurance data handling, and multi-line carriers are often subject to all five at once:

GLBA Safeguards Rule. The Gramm-Leach-Bliley Act applies to insurance because insurers are "financial institutions" under the statute. The FTC's amended Safeguards Rule (finalized in 2021, with most new requirements taking effect in June 2023) requires a written information security program with specific technical controls. The categories of nonpublic personal information (NPPI) it protects overlap heavily with what carriers collect for underwriting and claims.
HIPAA (for health, dental, and vision insurers). The same Safe Harbor identifier rules apply as for healthcare providers; we have covered the 18 identifier categories in detail elsewhere. For carriers operating in both health and other lines, the HIPAA-eligible data has to be segregated by entity type.
State insurance commissioner rules. Every state has its own. New York's DFS Cybersecurity Regulation (23 NYCRR 500) is the strictest and most often cited as the de facto national standard; California (CCPA + insurance-specific rules), Texas (TDPSA), and others add their own layers.
NAIC Insurance Data Security Model Law (MDL-668). Adopted in most states; mandates risk assessments, incident response plans, and notification requirements specific to insurance carriers. Effectively raises the floor for what "reasonable security" means.
State medical records privacy laws (where applicable). Some states extend HIPAA-like protections to medical information held by insurers regardless of the federal HIPAA applicability.

The practical implication: a carrier operating in 20 states is subject to ~20 different state regulations on top of the federal floor, and the requirements don't always align cleanly. The right architectural response is to redact to the strictest applicable standard and document it — one policy that satisfies the maximum, not a per-state policy matrix.

The insurance identifier set

Insurance carriers collect a wider variety of identifiers than most financial-services peers because the underwriting and claims processes pull from many adjacent contexts:

Identifier	Source / context	Philterd handling
SSN / TIN	GLBA, underwriting	Built-in SSN/TIN detector
Policy numbers	GLBA / NAIC	Custom identifier filter — carrier-specific format
Claim numbers	GLBA / NAIC	Custom identifier filter
VIN (auto claims)	GLBA, state insurance laws	Built-in VIN detector with checksum validation
Driver's license number	GLBA, state DMV data	Built-in driver's license detector
License plate	State insurance laws	Custom identifier filter — per-state plate formats
Bank account / routing	GLBA (premium payment)	Built-in ABA detector + custom identifier
Medical record numbers	HIPAA (health lines)	Custom identifier filter
Diagnosis / ICD codes	HIPAA	Custom dictionary (full ICD-10 / ICD-11 lists available)
Provider / NPI numbers	HIPAA	Custom identifier filter with 10-digit NPI validation
Property addresses	GLBA, NAIC	NER + address detector
Biometric data (where collected)	State BIPA-like laws	Out of scope for text redaction; handled at ingestion-layer isolation
Date of birth, names, contact info	GLBA + HIPAA + state laws	Built-in detectors for each
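
To make the "checksum validation" entry in the VIN row concrete, here is a minimal Python sketch of the standard VIN check-digit calculation (position 9, weighted sum modulo 11). It illustrates the general technique only and is not Philterd's implementation; the function names and the `[VIN]` replacement token are assumptions made for this example. Note that the check digit is only mandatory in some markets (North America, for instance), so a production detector would likely need a fallback for VINs that fail the check.

```python
import re

# Transliteration values for the VIN check-digit calculation.
# I, O, and Q never appear in a VIN.
_VIN_VALUES = {
    **{str(d): d for d in range(10)},
    **dict(zip("ABCDEFGH", range(1, 9))),
    **dict(zip("JKLMN", range(1, 6))),
    "P": 7,
    "R": 9,
    **dict(zip("STUVWXYZ", range(2, 10))),
}
# Position weights; position 9 (the check digit itself) has weight 0.
_VIN_WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]
# Candidate pattern: 17 characters from the VIN alphabet (uppercase only).
_VIN_CANDIDATE = re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b")


def vin_check_digit_ok(vin: str) -> bool:
    """Return True if the 9th character matches the computed check digit."""
    vin = vin.upper()
    if not _VIN_CANDIDATE.fullmatch(vin):
        return False
    total = sum(_VIN_VALUES[c] * w for c, w in zip(vin, _VIN_WEIGHTS))
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


def redact_vins(text: str, token: str = "[VIN]") -> str:
    """Replace only those 17-character candidates that pass the check digit."""
    return _VIN_CANDIDATE.sub(
        lambda m: token if vin_check_digit_ok(m.group()) else m.group(), text
    )
```

The checksum step is what keeps a 17-character part number or tracking code from being redacted as a VIN: a candidate that merely looks right is left alone unless its check digit agrees.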

The custom identifier categories (policy numbers, claim numbers, plate formats per state) are where carrier-specific configuration earns its keep. The format of a Geico policy number differs from a Progressive policy number; both look different from an Allstate claim ID. Phileas handles each via a regex pattern in the policy file — no model retraining required.
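
As a rough illustration of the regex-per-identifier idea, the Python sketch below redacts two carrier-specific identifiers. The formats (`POL-` plus two letters and eight digits, `CLM` plus ten digits) are invented for this example and do not correspond to any real carrier, and the dictionary is a stand-in for a policy file rather than actual Philter policy syntax.

```python
import re

# Hypothetical identifier formats, for illustration only.
# A real carrier would substitute its own patterns.
CUSTOM_IDENTIFIERS = {
    "policy-number": re.compile(r"\bPOL-[A-Z]{2}\d{8}\b"),
    "claim-number": re.compile(r"\bCLM\d{10}\b"),
}


def redact_custom(text: str) -> str:
    """Replace each match with a bracketed label naming the identifier type."""
    for label, pattern in CUSTOM_IDENTIFIERS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Adding a new carrier format in this style means adding one more pattern entry, which is the point of the paragraph above: configuration changes, not retraining.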

Architecture 1: claims processing pipelines

Claims is where insurance PII piles up fastest. A single auto-claim generates: a First Notice of Loss document, photographs, police reports, repair estimates, adjuster narrative notes, vehicle-history reports, recorded statements (transcribed), and final settlement documentation. Each artifact is dense with PII and each downstream consumer (analytics, fraud detection, ML training for next-gen adjuster-assist tools, archival storage) has its own redaction requirements.

  Claim artifacts ──▶ raw claims store (locked to claims-handling)
        │                          │
        │                          ▼
        │              Philter (NPPI + HIPAA where applicable)
        │                          │
        │                          ▼
        │                  deidentified claims data
        │                          │
        │              ┌───────────┼───────────┬──────────────┐
        │              ▼           ▼           ▼              ▼
        │       Fraud analytics  Reserves   ML training   Quality review
        │
        └──▶ Phinder discovery scan ──▶ governance dashboard

The split between "raw zone for claims handling" (which legitimately needs the full PII) and "de-identified zone for everyone else" is the same separation-of-duties pattern that works in financial services and healthcare.
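
The two-zone split can be sketched in a few lines of Python. This is a toy, in-memory model under stated assumptions: the `ClaimsZones` class, the role names, and the single-regex `redact` stand-in are all hypothetical, and a real deployment would call the redaction service and enforce access in the storage layer rather than in application code.

```python
import re
from dataclasses import dataclass, field

_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    # Stand-in for a call to the redaction service; only handles SSNs here.
    return _SSN.sub("[ssn]", text)


@dataclass
class ClaimsZones:
    raw: dict = field(default_factory=dict)
    deidentified: dict = field(default_factory=dict)
    # Only these roles may read the raw zone (hypothetical role name).
    raw_readers: frozenset = frozenset({"claims-handling"})

    def ingest(self, claim_id: str, text: str) -> None:
        """Write the artifact to both zones, redacting on the way in."""
        self.raw[claim_id] = text
        self.deidentified[claim_id] = redact(text)

    def read(self, claim_id: str, role: str) -> str:
        """Every role other than the raw readers gets the de-identified copy."""
        if role in self.raw_readers:
            return self.raw[claim_id]
        return self.deidentified[claim_id]
```

The design choice worth noting is the default: a role that is not explicitly listed falls through to the de-identified copy, so a newly added consumer cannot see raw PII by accident.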

Architecture 2: customer correspondence (email and chat)

Modern carriers run multichannel customer service: email, in-app chat, agent-mediated chat, SMS, and voice (transcribed). Customers freely paste their full Social Security numbers, policy numbers, bank account numbers, and dates of birth into chat windows because they're used to validating their identity that way. The transcripts then flow into ticket systems, analytics, QA scoring, and ML training.

Two surfaces need attention:

Inbound redaction at the ticket-system boundary. The ticket system should receive only the redacted transcript; the raw transcript should be retained briefly (for compliance / fraud purposes) in a locked-down store and aged out per the retention policy.
Agent-facing redaction. An optional but increasingly common pattern: redact PII in the agent's view of the conversation, so the agent sees masked values unless they explicitly invoke a "show full" action that's audited. Reduces the insider threat surface.
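
A minimal Python sketch of the agent-facing pattern follows. Everything named here is an assumption for illustration: the `masked_view` and `show_full` functions, the in-memory `audit_log` list, and the choice to leave the last four SSN digits visible. A real system would cover more entity types and write to a tamper-resistant audit store.

```python
import re
from datetime import datetime, timezone

_SSN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")

# In-memory stand-in for an audit store.
audit_log = []


def masked_view(message: str) -> str:
    """Default agent view: SSNs masked except for the last four digits."""
    return _SSN.sub(lambda m: "***-**-" + m.group(1), message)


def show_full(message: str, agent_id: str, reason: str) -> str:
    """Explicit reveal: record who asked and why before returning raw text."""
    audit_log.append(
        {
            "agent_id": agent_id,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        }
    )
    return message
```

The reveal path is deliberately a separate function so that the unmasked text cannot be obtained without leaving an audit record.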

The streaming variant of this fits onto Kafka or Kinesis — see "Architecting Privacy in Kafka" for the pipeline pattern that translates directly.

Architecture 3: third-party data exchange

Insurance is unusually federated: carriers exchange data with reinsurers, third-party administrators, independent adjusters, medical examiners, and external fraud-intelligence consortiums. Each exchange is an opportunity for PII to escape its intended scope.

The standard pattern: a per-recipient redaction policy that specifies exactly what the recipient is allowed to see. A reinsurer receiving claim outcomes doesn't need policyholder names. A fraud consortium receiving pattern data doesn't need policy numbers. A medical examiner reviewing a case needs the patient identifiers but not the financial ones.

  Claim data ──▶ Philter ──┬── reinsurer-policy        ──▶ Reinsurer
                           ├── fraud-consortium-policy ──▶ Consortium
                           ├── med-examiner-policy     ──▶ Examiner
                           └── regulator-policy        ──▶ State filing

Each downstream gets a tailored slice. The same Philter instance, the same engine, different policy files per recipient.
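
One way to picture "same engine, different policy per recipient" is to separate detection from policy: detection yields typed spans once, and each recipient's policy decides which span types are removed. The Python sketch below assumes that separation. The `Span` type, the policy contents, and the entity-type names are hypothetical and are not Philter's policy format.

```python
from typing import Iterable, NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    entity_type: str


# Hypothetical policies: entity types each recipient must NOT receive.
RECIPIENT_POLICIES = {
    "reinsurer": {"name", "ssn"},
    "fraud-consortium": {"name", "ssn", "policy-number"},
    "med-examiner": {"ssn", "bank-account", "policy-number"},
}


def apply_policy(text: str, spans: Iterable[Span], recipient: str) -> str:
    """Redact the span types blocked for this recipient.

    An unknown recipient raises KeyError, so nothing is sent by default.
    """
    blocked = RECIPIENT_POLICIES[recipient]
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        if span.entity_type in blocked:
            out = out[: span.start] + f"[{span.entity_type}]" + out[span.end :]
    return out
```

Because detection runs once and only the blocked set varies, adding a new counterparty is a matter of adding one policy entry rather than a new pipeline.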

The audit story

State insurance commissioners and the FTC (under the Safeguards Rule) ask for largely the same artifacts as healthcare and finance regulators:

Discovery inventory: what NPPI / PHI lives where, refreshed on a schedule. Phinder handles this against object storage and shared drives.
Policy regression validation: Philter Scope measures precision and recall against a gold-standard test set on every policy change, run in CI — the compliance-as-code pattern applies identically.
Live monitoring: Phield tracks detection volumes; alerts on entity-type drift that suggests an upstream format change.
Per-recipient logs: for any third-party data exchange, the audit trail showing which policy was applied to which transmission to which counterparty.
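
For the policy regression item above, the underlying arithmetic is simple enough to sketch in Python. This is not Philter Scope's implementation: the exact-match scoring rule (a predicted span counts only if start, end, and type all agree with the gold standard) and the threshold values are assumptions chosen for the example.

```python
def precision_recall(predicted, gold):
    """Score predicted spans against gold spans using exact matching.

    Each span is a (start, end, entity_type) tuple.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall


def passes_gate(precision, recall, min_precision=0.90, min_recall=0.98):
    """CI gate with hypothetical thresholds; recall is weighted more heavily
    because a missed identifier is a leak, while a false positive is only
    lost utility."""
    return precision >= min_precision and recall >= min_recall
```

In a CI job, a policy change that fails `passes_gate` would block the merge, which leaves the kind of per-change evidence trail the audit list describes.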

BIPA and biometric considerations

Several states (Illinois with BIPA, Texas, Washington, plus emerging laws in others) impose specific requirements on biometric data such as voiceprints, face geometry, and fingerprints. Insurance carriers increasingly use voice biometrics for authentication, which can bring them within the scope of these laws, depending on each statute's exemptions. The textual redaction toolkit doesn't address biometrics directly; the architectural response is to isolate biometric storage from text storage and to never extract biometric identifiers into text contexts.

The bottom line

Insurance carriers sit at a regulatory intersection that's stricter than the sum of its parts. The right architectural response isn't to maintain separate pipelines for each regulation — it's to redact to the strictest applicable standard at every internal boundary, then tailor per-recipient policies for external exchanges. One engine, many policy files, one audit story.

The Philterd toolkit covers the textual scope of every framework that applies to insurance (GLBA, HIPAA, state insurance commissioner rules, NAIC MDL-668) with the same building blocks we use for finance and healthcare. If you're at a carrier, TPA, or insurtech and want help mapping your data flows to a compliant architecture, let's talk. Most engagements start with a precision/recall evaluation on representative claims data before any architectural commitments are made.