Prompt Engineering for Privacy: Practical Patterns for Not Leaking PII

The prompt-engineering literature is enormous and almost entirely focused on getting better answers. It says very little about a quieter problem: every prompt sent to a hosted LLM is a data egress point. The text leaves your perimeter, lands on a provider's servers, sits in their logs for some retention period, and (depending on the provider and the agreement) may be used to train future models. The model might return your PII to another user. The provider might suffer a breach. Either way, the data is no longer yours.

This post covers six concrete patterns for structuring prompts so PII doesn't leak through them. Each one is practical: small changes to how you assemble prompts, not a wholesale architecture rewrite.

Pattern 1: redact inputs before they ever reach the prompt

The strongest defense is the obvious one: never put PII in the prompt in the first place. For application code that interpolates user input into a system prompt, run the user input through a redaction step first.

import requests

def redact(text: str) -> str:
    r = requests.post(
        "http://philter.internal:8080/api/filter",
        params={"c": "prompt-context", "p": "ai-policy"},
        data=text,
        headers={"Content-Type": "text/plain"},
        timeout=2,
    )
    r.raise_for_status()
    return r.text

# Before: raw user query goes into the prompt
prompt = f"Help the user with this question: {user_query}"

# After: redacted user query
prompt = f"Help the user with this question: {redact(user_query)}"

For complete coverage, route every LLM call through Philter AI Proxy instead of redacting in application code. The proxy redacts all outbound traffic at the network boundary; your application keeps calling OpenAI / Anthropic / Bedrock as before. The general antipattern argument applies here: the moment you ship raw PII outside your perimeter, you've created a problem you can't unmake.
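
One question the snippet above leaves open is what to do when the redaction service is slow or down. Here is a minimal sketch, assuming you would rather fail the request than send unredacted text. The names `build_prompt` and `RedactionUnavailable` are hypothetical, and the redactor is passed in as a plain callable so the sketch does not depend on any particular service:

```python
from typing import Callable


class RedactionUnavailable(RuntimeError):
    """Raised when the prompt cannot be redacted, so it must not be sent."""


def build_prompt(user_query: str, redactor: Callable[[str], str]) -> str:
    """Build the prompt only if redaction succeeds (fail closed)."""
    try:
        safe_query = redactor(user_query)
    except Exception as exc:
        # Never fall back to the raw query: a redaction outage should block
        # the LLM call rather than silently leak the original text.
        raise RedactionUnavailable("redaction failed; prompt not sent") from exc
    return f"Help the user with this question: {safe_query}"
```

Calling `build_prompt(user_query, redact)` with the `redact` function above would turn a timeout or HTTP error into a `RedactionUnavailable` that the caller has to handle explicitly.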

Pattern 2: structure prompts to minimize PII context

Many prompts include user data because the developer assumed the model needs it. Often the model doesn't. Aggressive scoping — sending only what the model strictly requires — reduces the attack surface for free.

# Before: dumps the entire customer record into the prompt
prompt = f"""
Customer: {customer.to_json()}
Their order history: {orders.to_json()}
Question: How should we resolve their issue?
"""

# After: extracted only the fields relevant to the question
prompt = f"""
Customer status: {customer.tier}
Order count last 90 days: {len(recent_orders)}
Open issues: {len(open_issues)}
Question: How should we resolve their issue?
"""

The transformed prompt loses the customer name, email, address, account number, full order history, and itemized list — none of which the model needed to suggest a resolution path. The "minimum necessary" principle from HIPAA generalizes here even outside healthcare: send the LLM only what it would need to answer the question, not the entire context you happen to have available.
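
One way to keep this scoping from eroding over time is to build the context from an explicit allow-list, so a newly added field on the customer record never reaches the prompt by accident. This is a sketch under assumptions: the dictionary-shaped customer record, the `ALLOWED_CUSTOMER_FIELDS` constant, and the `minimal_context` helper are all hypothetical.

```python
from typing import Any, Mapping, Sequence

# Hypothetical allow-list: only these customer fields may reach the prompt.
ALLOWED_CUSTOMER_FIELDS = ("tier",)


def minimal_context(
    customer: Mapping[str, Any],
    recent_orders: Sequence[Any],
    open_issues: Sequence[Any],
) -> str:
    """Render prompt context from an explicit allow-list, not the full record."""
    allowed = {k: customer[k] for k in ALLOWED_CUSTOMER_FIELDS if k in customer}
    return (
        f"Customer status: {allowed.get('tier', 'unknown')}\n"
        f"Order count last 90 days: {len(recent_orders)}\n"
        f"Open issues: {len(open_issues)}\n"
        "Question: How should we resolve their issue?"
    )
```

Only counts are derived from the order and issue lists; their contents never enter the string.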

Pattern 3: parameterize instead of interpolate

Treating prompts like SQL queries — using templates with explicit parameter slots, never string-formatting raw input directly — makes the redaction surface a single place rather than scattered throughout the codebase.

# Before: ad-hoc interpolation; redaction has to happen at every call site
prompt = f"Summarize this email from {sender_name}: {email_body}"

# After: parameterized template; redaction is centralized
TEMPLATE = "Summarize this email from {sender}: {body}"

def render(sender: str, body: str) -> str:
    return TEMPLATE.format(
        sender=redact(sender),
        body=redact(body),
    )

The pattern is the same one that prevented SQL injection: turn untrusted input into a structured value rather than raw string concatenation. Once your prompts go through a single render function, adding new redaction rules is a single-file change.
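
The same idea can be pushed one step further so that a call site cannot forget to redact a slot. The following is a rough sketch, not a reference implementation: `RedactedTemplate` is a hypothetical helper, and the redactor is any callable that takes and returns a string.

```python
import string
from typing import Callable


class RedactedTemplate:
    """Prompt template that redacts every parameter before substitution."""

    def __init__(self, template: str, redactor: Callable[[str], str]) -> None:
        self._template = template
        self._redactor = redactor
        # Record the slot names so missing or unexpected parameters fail loudly.
        self._fields = {
            name for _, name, _, _ in string.Formatter().parse(template) if name
        }

    def render(self, **params: str) -> str:
        if set(params) != self._fields:
            raise ValueError(f"expected parameters {sorted(self._fields)}")
        # Every value passes through the redactor; there is no raw path.
        safe = {name: self._redactor(value) for name, value in params.items()}
        return self._template.format(**safe)
```

A template would be declared once, for example `RedactedTemplate("Summarize this email from {sender}: {body}", redact)`, and then rendered with keyword arguments wherever it is needed.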

Pattern 4: use pseudonyms that the model can reason about

Sometimes the model genuinely needs to know that a person, account, or entity exists — even if it doesn't need to know which one. In those cases, replacing PII with consistent pseudonyms preserves the reasoning while removing the identification.

# A clinical summarization prompt that needs entity coherence
# but not real identities

original = """
The patient John Doe (DOB 1972-03-14) was admitted on 2026-03-11
complaining of chest pain. Dr. Alice Smith ordered an EKG.
The EKG was normal; Mr. Doe was discharged on 2026-03-12.
"""

redacted_with_pseudonyms = """
The patient Patient_A47 (DOB shifted) was admitted on Date_1
complaining of chest pain. Dr. Provider_B12 ordered an EKG.
The EKG was normal; Patient_A47 was discharged on Date_2.
"""

The pseudonymized version is just as useful to the summarization model: the entity relationships ("patient was admitted, provider ordered test, same patient was discharged") are preserved. But the model can't memorize and later regurgitate the real names. Phileas's policy engine supports consistent pseudonymization within a context: the same input always maps to the same pseudonym across mentions.
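
To make "consistent within a context" concrete, here is a toy sketch of the mapping logic. It is not how Phileas implements it. The `Pseudonymizer` class is hypothetical, and it assumes entity detection (including resolving aliases such as "Mr. Doe" to "John Doe") has already happened upstream and is handed in as a dictionary.

```python
from collections import defaultdict
from typing import Dict, Mapping, Tuple


class Pseudonymizer:
    """Maps each (kind, value) pair to a stable pseudonym within one context."""

    def __init__(self) -> None:
        self._mapping: Dict[Tuple[str, str], str] = {}
        self._counters: Dict[str, int] = defaultdict(int)

    def pseudonym(self, kind: str, value: str) -> str:
        key = (kind, value)
        if key not in self._mapping:
            # First sighting: allocate the next number for this entity kind.
            self._counters[kind] += 1
            self._mapping[key] = f"{kind}_{self._counters[kind]}"
        return self._mapping[key]

    def apply(self, text: str, entities: Mapping[str, str]) -> str:
        """Replace known entity strings; `entities` maps value -> kind."""
        # Longest first, so a short entity that is a substring of a longer
        # one does not split it.
        for value in sorted(entities, key=len, reverse=True):
            text = text.replace(value, self.pseudonym(entities[value], value))
        return text
```

One instance corresponds to one context: reusing it across turns of a conversation keeps the mapping stable, while creating a fresh instance per document prevents pseudonyms from being linked across documents.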

Pattern 5: scan outputs, not just inputs

LLMs hallucinate, including hallucinated PII. A model fine-tuned on a dataset containing emails might confidently invent a plausible-looking email address in its response. A model with web-scraped training data might surface a real person's information in response to a generic question. The output side needs scanning too.

import openai

# After receiving an LLM response, re-redact before returning to the user
def safe_completion(prompt: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    # content can be None (for example on a tool call), so fall back to ""
    raw_output = response.choices[0].message.content or ""
    return redact(raw_output)  # last line of defense

The Philter AI Proxy applies bidirectional scanning by default: prompts are redacted before they are forwarded to the provider, and responses are scanned for leakage on the way back. For application code that isn't behind the proxy, the manual pattern above is the equivalent.
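
For code that is not behind a proxy, both directions can be wrapped in one helper. The sketch below assumes the LLM call and the redactor are injected as callables, which also makes the wrapper testable without network access. `guarded_completion` is a hypothetical name, not part of any library.

```python
from typing import Callable


def guarded_completion(
    prompt: str,
    complete: Callable[[str], str],
    redactor: Callable[[str], str],
) -> str:
    """Redact the prompt on the way out and the response on the way back."""
    safe_prompt = redactor(prompt)      # the provider never sees the raw prompt
    raw_output = complete(safe_prompt)  # only redacted text leaves the process
    return redactor(raw_output)         # catches hallucinated or memorized PII
```

In production, `complete` would wrap the provider SDK call and `redactor` would call the redaction service. In tests, both can be simple stubs.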

Pattern 6: keep conversation memory bounded

Chat applications often append the conversation history to each new prompt, so the model can maintain context across turns. The history grows; the PII exposure compounds. By turn 20, a chat that started with one sensitive value has sent that value 20 times.

Three mitigations:

Summarize older turns instead of replaying them verbatim. A short summary of the conversation so far gives the model context without re-sending every original message.
Apply the redaction step to every turn that gets added to history — not just the most recent. The PII that was acceptable at turn 1 (because the user opted in) isn't re-acceptable at turn 20 (because by then the original consent context is gone).
Bound history length explicitly. Drop turns older than N from the prompt, even if the model could technically handle longer context.
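
The three mitigations combine naturally into a single history-building step. This is a minimal sketch under assumptions: messages are plain dictionaries with `role` and `content` keys, `summarize` is a caller-supplied function (it could be another LLM call), and `bounded_history` and the default of six turns are hypothetical choices.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def bounded_history(
    turns: List[Message],
    redactor: Callable[[str], str],
    summarize: Callable[[List[Message]], str],
    max_turns: int = 6,
) -> List[Message]:
    """Return redacted history: a summary of old turns plus the last `max_turns`."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    # Redact every turn, not just the newest one.
    redacted = [{"role": t["role"], "content": redactor(t["content"])} for t in turns]
    older, recent = redacted[:-max_turns], redacted[-max_turns:]
    if not older:
        return recent
    # Summarize only already-redacted text, so the summary cannot reintroduce
    # PII that the redactor caught.
    summary = {"role": "system", "content": summarize(older)}
    return [summary] + recent
```

The prompt size is now bounded by `max_turns` plus one summary message, regardless of how long the conversation runs.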

Anti-patterns to actively avoid

"Just trust the system prompt." Saying "do not include PII in your response" in the system prompt is at best a soft hint. Models violate system-prompt instructions regularly, especially under adversarial input.
Client-side redaction with a hand-rolled regex. Catches SSNs and credit cards; misses everything that requires NLP context. Useful as belt-and-suspenders, useless as a primary defense.
Letting users opt out of redaction "for better answers." Once the data is out, it's out. There is no opt-back-in.
Trusting "we don't train on customer data" terms of service. Read the terms again; the carve-outs are extensive. Even if the terms hold, breach risk doesn't.
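
The regex anti-pattern is easy to see in a few lines. The pattern and sample text below are made up for illustration: the SSN is caught because it has a fixed shape, while the name and street address pass through untouched because nothing about their format marks them as PII.

```python
import re

# Illustrative pattern only: matches the common NNN-NN-NNNN SSN layout.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def regex_only_redact(text: str) -> str:
    """Replace SSN-shaped strings; everything else is left as-is."""
    return SSN_PATTERN.sub("[SSN]", text)


sample = "Jane Roe (SSN 123-45-6789) lives at 12 Elm Street."
print(regex_only_redact(sample))
# → Jane Roe (SSN [SSN]) lives at 12 Elm Street.
```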

The bottom line

Prompt engineering for privacy isn't a separate discipline from prompt engineering for quality — it's the same discipline with one additional question: what's the minimum the model needs to know to answer this well? Send only that. Redact what's left at both ends. Scan outputs as carefully as inputs. Keep history bounded.

The patterns above all assume you have a redaction primitive available to call. Philter's HTTP API and Phileas's embeddable library both fit that primitive cleanly. The Philter AI Proxy applies most of these patterns automatically without requiring application changes — if you're already shipping LLM-powered features and don't want to refactor every call site, that's the lowest-friction starting point.

For a deeper architectural view, see "Building a Privacy-Aware RAG System", which applies the same redaction primitives at the retrieval and inference boundaries instead of just at the prompt.