Medical Chatbot — User Input Redaction

Redact PHI from user messages to a healthcare chatbot before they reach the LLM — preserves clinical meaning while removing identifiers.

v1.0.0 · Updated 2026-05-18 · Philter >=3.0.0 · By Philterd

HIPAA · PHI · chatbot · LLM · conversational AI · RAG

The policy

The full medical-chatbot.json file, identical to what you get by downloading it.

{
  "name": "medical-chatbot",
  "config": {
    "splitting": {
      "enabled": false,
      "threshold": 2000
    }
  },
  "ignored": [],
  "identifiers": {
    "personsName": {
      "personsFilterStrategies": [
        {"strategy": "REPLACE", "redactionFormat": "[PERSON]", "conditions": "confidence > 50"}
      ]
    },
    "age": {
      "ageFilterStrategies": [
        {"strategy": "REDACT", "redactionFormat": "[AGE>89]", "conditions": "context == \"age\" > 89"}
      ]
    },
    "date": {
      "onlyValidDates": true,
      "dateFilterStrategies": [
        {"strategy": "REDACT", "redactionFormat": "[REDACTED-DATE]"}
      ]
    },
    "phoneNumber": {
      "phoneNumberFilterStrategies": [
        {"strategy": "REDACT", "redactionFormat": "[REDACTED-PHONE]"}
      ]
    },
    "emailAddress": {
      "emailAddressFilterStrategies": [
        {"strategy": "REDACT", "redactionFormat": "[REDACTED-EMAIL]"}
      ]
    },
    "ssn": {
      "ssnFilterStrategies": [
        {"strategy": "REDACT", "redactionFormat": "[REDACTED-SSN]"}
      ]
    },
    "address": {
      "addressFilterStrategies": [
        {"strategy": "REDACT", "redactionFormat": "[REDACTED-ADDRESS]"}
      ]
    },
    "zipCode": {
      "zipCodeFilterStrategies": [
        {"strategy": "TRUNCATE", "truncateDigits": 3}
      ]
    },
    "ipAddress": {
      "ipAddressFilterStrategies": [
        {"strategy": "REDACT", "redactionFormat": "[REDACTED-IP]"}
      ]
    },
    "hospital": {
      "hospitalFilterStrategies": [
        {"strategy": "REPLACE", "redactionFormat": "[FACILITY]"}
      ]
    },
    "physicianName": {
      "physicianNameFilterStrategies": [
        {"strategy": "REPLACE", "redactionFormat": "[PROVIDER]"}
      ]
    },
    "identifiers": [
      {
        "id": "mrn",
        "pattern": "\\bMRN[\\s:#]*\\d{5,}\\b",
        "caseSensitive": false,
        "identifierFilterStrategies": [
          {"strategy": "REDACT", "redactionFormat": "[REDACTED-MRN]"}
        ]
      },
      {
        "id": "insurance-id",
        "pattern": "\\b(?:member|policy|insurance|plan)[\\s-]?(?:id|number|#)[\\s:#]*[A-Z0-9-]{6,}\\b",
        "caseSensitive": false,
        "identifierFilterStrategies": [
          {"strategy": "REDACT", "redactionFormat": "[REDACTED-INSURANCE-ID]"}
        ]
      }
    ]
  }
}
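
Before uploading a policy it can help to list what each filter will do. The following is a minimal Python sketch, not part of Philter: `summarize_policy` and `EXCERPT` are hypothetical names, and the code assumes only the JSON layout shown above (built-in filters keep their strategies under a key ending in `FilterStrategies`, custom regex identifiers live in the nested `identifiers` list).

```python
import json


def summarize_policy(policy: dict) -> list[tuple[str, str, str]]:
    """Return (identifier, strategy, redactionFormat) rows for a policy dict."""
    rows = []
    for name, cfg in policy.get("identifiers", {}).items():
        if name == "identifiers":
            # Custom regex identifiers: a list of objects, each with an "id".
            for custom in cfg:
                for strat in custom.get("identifierFilterStrategies", []):
                    rows.append((custom["id"], strat["strategy"],
                                 strat.get("redactionFormat", "")))
            continue
        # Built-in filters: strategies sit under a "...FilterStrategies" key.
        for key, value in cfg.items():
            if key.endswith("FilterStrategies"):
                for strat in value:
                    rows.append((name, strat["strategy"],
                                 strat.get("redactionFormat", "")))
    return rows


# A short excerpt of the policy, inlined so the sketch runs on its own.
EXCERPT = json.loads("""
{
  "name": "medical-chatbot",
  "identifiers": {
    "personsName": {"personsFilterStrategies": [
      {"strategy": "REPLACE", "redactionFormat": "[PERSON]"}]},
    "zipCode": {"zipCodeFilterStrategies": [
      {"strategy": "TRUNCATE", "truncateDigits": 3}]},
    "identifiers": [{"id": "mrn", "identifierFilterStrategies": [
      {"strategy": "REDACT", "redactionFormat": "[REDACTED-MRN]"}]}]
  }
}
""")

if __name__ == "__main__":
    for row in summarize_policy(EXCERPT):
        print(row)
```

To check the real file, replace `EXCERPT` with `json.load(open("medical-chatbot.json"))`.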

Example

Input

Hi, I'm worried about my mom Linda Chen. She's 72, lives at 1234 Oak St in Austin TX 78701, and her chart at Mercy Hospital is MRN 47291. Her blood pressure was 165/95 this morning.

Output

Hi, I'm worried about my mom [PERSON]. She's 72, lives at [REDACTED-ADDRESS] in [REDACTED-ADDRESS] 787, and her chart at [FACILITY] is [REDACTED-MRN]. Her blood pressure was 165/95 this morning.

Entities this policy acts on

PERSON · AGE · DATE · PHONE · EMAIL · SSN · ADDRESS · ZIP · IP · FACILITY · PROVIDER · MRN · INSURANCE_ID

What this policy does

Designed for the specific failure mode of healthcare chatbots: users type conversational messages that mix clinical context (the thing you want the chatbot to act on) with personally identifying details (the thing you don’t want sitting in your LLM provider’s logs, training data, or RAG vector store).

The policy preserves clinical meaning while stripping identifiers:

Personal names → [PERSON] tokens (confidence threshold lowered to > 50 because chat messages are short and context-poor — names are harder to detect with high confidence)
Ages ≤ 89 → preserved (clinically relevant; “she’s 72 with hypertension” is the actual question)
Ages > 89 → [AGE>89] per HIPAA Safe Harbor §164.514(b)(2)(i)(C)
Dates → fully redacted when written as a structured date (relative phrases such as “last Tuesday” are not caught by the date filter)
Addresses → fully redacted; ZIP codes → truncated to 3 digits
Phone, email, IP, SSN → fully redacted
Facility names, provider names → replaced with [FACILITY] / [PROVIDER] (preserves the structural fact that a facility/provider was mentioned, without identifying which one)
MRN, insurance/member IDs → custom regex patterns, fully redacted
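
The age and ZIP rules in the list above can be illustrated in plain Python. This is a rough approximation for reasoning about the intended behavior, not Philter's implementation: the `AGE` pattern, `mask_ages_over_89`, and `truncate_zip` are hypothetical, and the age detector only recognizes a number followed by "years old", "yo", or "y/o".

```python
import re

# Hypothetical age detector: a number followed by "years old", "yo", or "y/o".
AGE = re.compile(r"\b(\d{1,3})(?=\s*(?:years?\s+old|y/?o)\b)", re.IGNORECASE)


def mask_ages_over_89(text: str) -> str:
    """Replace ages above 89 with [AGE>89]; leave ages of 89 or below as-is."""
    def repl(match: re.Match) -> str:
        return "[AGE>89]" if int(match.group(1)) > 89 else match.group(1)
    return AGE.sub(repl, text)


def truncate_zip(zip_code: str, digits: int = 3) -> str:
    """Keep only the leading digits of a ZIP code (3 in this policy)."""
    return zip_code[:digits]


if __name__ == "__main__":
    print(mask_ages_over_89("She is 72 years old with hypertension"))
    print(mask_ages_over_89("My grandfather is 93 years old"))
    print(truncate_zip("78701"))
```

Ages at or below 89 pass through unchanged, which is what keeps a message like "she's 72 with hypertension" clinically useful.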

What’s deliberately preserved:

Clinical observations: “blood pressure was 165/95”, “her A1C is 7.2”
Medications: “she’s on lisinopril and metformin”
Symptoms: “chest pain for 3 days”, “shortness of breath when walking”
Relative time references: “last week”, “for the past 3 months” (only specific dates are redacted)
Conditions and diagnoses: “she has type 2 diabetes”

Without these, the chatbot can’t answer the user’s actual question. The whole point of a healthcare chatbot is to engage with clinical content — just not with the identifying details around it.

When to use this

Healthcare consumer chatbots (symptom checkers, post-discharge follow-up, medication reminders, patient education)
Provider-facing clinical assistants where the user types free-text questions and the system needs to call an external LLM
RAG systems serving healthcare queries where the user query may itself contain PHI
Telemedicine intake flows where free-text fields capture clinical history
Pair with Philter AI Proxy to drop this policy in as transparent middleware between your application and the LLM provider

When NOT to use this

For training a model on user messages. This policy preserves enough clinical detail to be useful in real time, but that detail is also potentially re-identifiable in aggregate. For training data, use llm-training-data-prep.json, which is more aggressive.
For sharing transcripts externally as de-identified. Same reason. Use hipaa-safe-harbor.json for external sharing.
For non-conversational clinical text. For long-form clinical notes, clinical-notes-deid.json (with date-shifting) is better.

When to customize

Name confidence threshold. Default > 50 is loose, reflecting the short-context reality of chat messages. If your chatbot gets a lot of capitalized common words misclassified as names, raise to > 65. If you’re missing names that humans would obviously spot, lower to > 40.
Token vocabulary. Default uses bracketed tokens ([PERSON], [FACILITY]). If your downstream LLM is fine-tuned to expect specific tokens (<patient>, <<NAME>>), adjust the redactionFormat fields.
Insurance-ID regex. The default \b(?:member|policy|insurance|plan)[\s-]?(?:id|number|#)[\s:#]*[A-Z0-9-]{6,}\b is conservative. Update with your network’s actual ID format if known.
MRN regex. Same caveat as other healthcare policies — adjust for your EHR’s format.
Relative dates. This policy doesn’t redact relative time references like “yesterday” or “last week” because Philter’s date filter only catches structured dates. If your messages include date-mentioning patterns Philter doesn’t catch, add custom identifier regex.
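
When tuning the custom patterns, it is worth trying them against sample messages before editing the policy. The sketch below does this with Python's `re` module and case-insensitive matching, on the assumption that these particular patterns behave the same way in Philter's regex engine. The `redact` helper and the `INSURANCE_ID_STRICT` variant (which additionally requires at least one digit in the ID) are hypothetical and not part of the shipped policy.

```python
import re

# Patterns copied from the policy; caseSensitive=false maps to re.IGNORECASE.
MRN = re.compile(r"\bMRN[\s:#]*\d{5,}\b", re.IGNORECASE)
INSURANCE_ID = re.compile(
    r"\b(?:member|policy|insurance|plan)[\s-]?(?:id|number|#)[\s:#]*[A-Z0-9-]{6,}\b",
    re.IGNORECASE,
)
# Hypothetical tighter variant: the ID must contain at least one digit.
INSURANCE_ID_STRICT = re.compile(
    r"\b(?:member|policy|insurance|plan)[\s-]?(?:id|number|#)[\s:#]*"
    r"(?=[A-Z0-9-]*\d)[A-Z0-9-]{6,}\b",
    re.IGNORECASE,
)


def redact(text: str, strict: bool = False) -> str:
    """Apply the MRN pattern, then the chosen insurance-ID pattern."""
    text = MRN.sub("[REDACTED-MRN]", text)
    pattern = INSURANCE_ID_STRICT if strict else INSURANCE_ID
    return pattern.sub("[REDACTED-INSURANCE-ID]", text)


if __name__ == "__main__":
    samples = [
        "MRN 47291",                           # digits directly after MRN
        "her MRN at Mercy Hospital is 47291",  # digits separated from MRN
        "member ID: ABC123456",
        "my plan id changed",                  # ordinary word after "plan id"
    ]
    for sample in samples:
        print(repr(sample), "->", repr(redact(sample)))
```

Two behaviors stand out in this sketch. The MRN pattern only fires when the digits directly follow "MRN" (optionally separated by spaces, ":" or "#"), so a number that appears later in the sentence is missed. And because matching is case-insensitive, the default insurance-ID pattern also accepts an ordinary word of six or more letters after "plan id"; requiring a digit, as in the strict variant, is one way to narrow that.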

Architectural pattern

user message → [Philter (this policy)] → LLM provider → [Philter (output policy)] → user

The output-side Philter policy is usually lighter (the LLM shouldn’t generate PHI, but defense-in-depth is worth the latency). A common pattern is to use a smaller, faster Philter configuration for the output side and the full medical-chatbot policy for the input side.
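
The two-sided flow can be sketched as a small Python function that takes the three stages as callables. Everything here is illustrative: `answer`, `redact_input`, `call_llm`, and `redact_output` are hypothetical names, and in a real deployment the two redactors would call Philter with the input and output policies respectively.

```python
from typing import Callable

Redactor = Callable[[str], str]


def answer(
    message: str,
    redact_input: Redactor,
    call_llm: Callable[[str], str],
    redact_output: Redactor,
) -> str:
    """Redact the user message, query the LLM, then redact the reply."""
    safe_prompt = redact_input(message)   # full medical-chatbot policy
    raw_reply = call_llm(safe_prompt)     # only redacted text leaves the app
    return redact_output(raw_reply)       # lighter output-side policy


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without Philter or an LLM.
    reply = answer(
        "Linda Chen has a cough",
        redact_input=lambda t: t.replace("Linda Chen", "[PERSON]"),
        call_llm=lambda p: f"Noted: {p}",
        redact_output=lambda t: t,
    )
    print(reply)
```

Passing the stages in as functions keeps the ordering explicit and makes the pipeline easy to exercise with stubs before wiring in real services.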

See Building a Privacy-Aware RAG System for a full architecture write-up, and Prompt Engineering for Privacy for the prompt-level patterns that complement input-side redaction.

Compliance notes

This policy is for real-time message redaction, not for de-identifying records under HIPAA Safe Harbor. The output may still constitute PHI under HIPAA (because residual quasi-identifiers exist) — treat the messages and their LLM responses as PHI for the purposes of access controls, audit logging, and BAA scope.
If your chatbot is provided by a covered entity OR a business associate, the LLM provider you call needs a BAA. Major providers (Anthropic, OpenAI, AWS Bedrock, Azure OpenAI) offer BAAs under specific commercial agreements. Verify before sending PHI — even redacted PHI — to the model.
Pair this policy with documented logging redaction. Your application logs of the input messages (pre-redaction) are themselves PHI, so they need to live in HIPAA-eligible storage with the same controls as any other clinical system.

Use this policy

Download and load into your running Philter instance:

# Download the policy
curl -O https://raw.githubusercontent.com/philterd/pii-redaction-policies/main/policies/philterd/healthcare/medical-chatbot.json

# Upload to your Philter instance
curl -X POST http://localhost:8080/api/policies \
     -H "Content-Type: application/json" \
     --data @medical-chatbot.json

# Redact text using the policy
curl "http://localhost:8080/api/filter?p=medical-chatbot" \
     --data "your text here" \
     -H "Content-Type: text/plain"
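
The same filter call can be made from application code. The following Python sketch mirrors the curl command above using only the standard library. The function names are hypothetical, and it assumes the endpoint returns the redacted text as the response body, which should be checked against your Philter version.

```python
import urllib.parse
import urllib.request


def build_filter_request(
    text: str,
    policy: str = "medical-chatbot",
    base_url: str = "http://localhost:8080",
) -> urllib.request.Request:
    """Build the POST request equivalent to the curl filter call."""
    query = urllib.parse.urlencode({"p": policy})
    return urllib.request.Request(
        url=f"{base_url}/api/filter?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def filter_text(text: str, **kwargs) -> str:
    """Send text to Philter and return the response body as a string."""
    request = build_filter_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Requires a running Philter instance with the policy uploaded.
    print(filter_text("your text here"))
```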
