Your PII Redaction Journey

PII redaction isn’t a single product; it’s a sequence of problems you hit as the work matures. You start by redacting. Then you measure what you redacted, discover the data you didn’t know about, monitor redaction in production, add a human checkpoint when stakes rise, extend to AI workloads, and eventually publish safe analytics.

Most teams live at stages 1–2. Each later stage is a tool for a specific problem, and you adopt it when (and only when) you hit that problem. This page maps each stage of the journey to the product that solves it.

Start at Stage 1 →

Stage 1 · Start

Start redacting

The problem. Sensitive data is flowing through your systems — logs, support tickets, training data, customer correspondence — and you need to stop it from landing where it shouldn’t. Regex isn’t enough; cloud APIs send your text offsite; commercial SaaS is a closed box your security team won’t approve.

The product. Philter (HTTP API, deploys in 5 minutes from AWS/Azure/GCP marketplaces) or Phileas (Java, Python, .NET, or Go library, embedded directly in your application).

When teams adopt. Day 1. The trigger is usually a compliance ask, an upcoming audit, a near-miss leak, or an AI feature that needs guardrails before launch.

Deploy Philter in 5 minutes → Or embed Phileas

Stage 2 · Measure

Measure what you redacted

The problem. Your policy is running. Is it actually working? Without measurement, every policy tweak is a guess and every audit question (“what’s your recall on SSNs?”) has no answer.

The product. Philter Scope — score policies on precision, recall, and F1 against gold-standard test data. Version your policies. Fail the build when a change regresses.
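To make the three metrics concrete, here is a minimal sketch of span-level scoring in Python. It is not Philter Scope's implementation: the `Span` tuple, the `score` function, the sample annotations, and the 0.95 recall threshold are all assumptions for illustration. A detection counts as correct only when it matches a gold-standard span exactly.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    entity_type: str


def score(gold: set, predicted: set) -> dict:
    """Exact-match precision, recall, and F1 for one document."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold-standard annotations and policy output for one document.
gold = {
    Span(0, 11, "ssn"),
    Span(20, 32, "phone"),
    Span(40, 52, "email"),
    Span(60, 70, "name"),
}
predicted = {
    Span(0, 11, "ssn"),
    Span(20, 32, "phone"),
    Span(80, 90, "name"),  # false positive: no matching gold span
}

metrics = score(gold, predicted)

# A build gate (assumed threshold): fail when recall drops below the floor.
MIN_RECALL = 0.95
build_passes = metrics["recall"] >= MIN_RECALL
```

Here two of three detections are correct and two of four gold spans are found, so the gate fails. Exact matching is the strictest choice; a real evaluation might also credit partial overlaps.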

When teams adopt. When the first audit asks “what’s your detection rate?” and the team realizes they don’t have an answer. Or when a policy change unexpectedly breaks production and there was no way to catch it earlier.

View Philter Scope →

Stage 3 · Discover

Map the scope you don’t know about

The problem. You’ve redacted the systems you know contain PII. What about the systems you don’t? The forgotten S3 bucket from a 2019 project. The shared drive with five years of contracts. The data warehouse table the analytics team built without telling security.

The product. Phinder — high-speed discovery scanner that crawls files, object storage, and document repositories to map where sensitive information actually lives.

When teams adopt. When a security review, M&A due diligence, or a privacy-impact assessment requires a sensitive-data inventory. Most teams find significantly more PII than they expected.

View Phinder →

Stage 4 · Monitor

Monitor in production

The problem. Redaction is now critical-path. If the detection rate suddenly drops — model drift, an upstream format change, a misconfigured policy — you need to know within minutes, not weeks. Today you’re flying blind between audits.

The product. Phield — production monitoring with anomaly detection on PII flow, plus alerting that integrates with your existing on-call.
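As an illustration of the kind of check involved, the sketch below flags a detection rate that sits far from its recent baseline using a simple z-score. How Phield actually detects anomalies isn't described here; the `is_anomalous` function, the threshold of 3 standard deviations, and the sample rates are assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history: list, latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies more than `threshold` sample standard
    deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical detections per 1,000 documents, one value per time window.
baseline = [41.0, 39.5, 40.2, 42.1, 40.8, 39.9, 41.3, 40.6]

normal_window = is_anomalous(baseline, 40.9)   # close to the baseline mean
dropped_window = is_anomalous(baseline, 12.0)  # sudden drop in detections
```

A z-score assumes the baseline is roughly stable; traffic with strong daily or weekly cycles would need a seasonal baseline instead.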

When teams adopt. When redaction goes from “nice safety net” to “what the business depends on.” Usually 6–12 months after stage 1, often after a near-miss that should have been caught sooner.

View Phield →

Stage 5 · Human checkpoint

Add the human checkpoint

The problem. Automated redaction is good but not perfect. Some workflows can’t tolerate the false-negative rate — or, more often, can’t tolerate the absence of attribution. Someone, somewhere, needs to put their name on each redaction decision.

The product. Arbiter — human-in-the-loop review interface. Reviewers see every automated detection in context, accept or override with structured exemption codes, and the decisions flow into an audit trail.
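One way to picture an audit-trail entry is as a small immutable record per decision. The schema below is a hypothetical sketch, not Arbiter's data model: the `ReviewDecision` fields, the `PUBLIC-RECORD` exemption code, and the JSON-lines format are assumptions chosen to show attribution (who decided, when, and why).

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

VALID_DECISIONS = {"accept", "override"}


@dataclass(frozen=True)
class ReviewDecision:
    """One reviewer decision on one automated detection (hypothetical schema)."""
    document_id: str
    start: int
    end: int
    entity_type: str
    decision: str
    reviewer: str
    decided_at: str
    exemption_code: Optional[str] = None

    def __post_init__(self) -> None:
        if self.decision not in VALID_DECISIONS:
            raise ValueError(f"unknown decision: {self.decision!r}")
        if self.decision == "override" and not self.exemption_code:
            raise ValueError("an override must carry an exemption code")


def to_audit_line(record: ReviewDecision) -> str:
    """Serialize a decision as one JSON line for an append-only audit log."""
    return json.dumps(asdict(record), sort_keys=True)


record = ReviewDecision(
    document_id="filing-0042",
    start=118,
    end=129,
    entity_type="ssn",
    decision="override",
    reviewer="j.alvarez",
    decided_at=datetime.now(timezone.utc).isoformat(),
    exemption_code="PUBLIC-RECORD",
)
audit_line = to_audit_line(record)
```

Rejecting an override that has no exemption code at construction time means an unattributed decision can never reach the log.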

When teams adopt. When the workflow has explicit human-attestation requirements: court filings, FDA submissions, regulated AI training corpora, sensitive disclosure prep. Or when the auditor asks the question that needs a human answer.

View Arbiter →

Stage 6 · AI workloads

Extend to AI workloads

The problem. Your team starts building LLM features — chatbots, RAG, agent workflows, summarization. Your existing redaction pipeline doesn’t cover the prompt and response traffic to hosted LLM providers, and security says “you can’t send customer data to OpenAI” just as product says “we’re shipping the feature next quarter.”

The product. Philter AI Proxy — drop-in middleware between your application and the LLM provider. Redacts PII from outbound prompts; optionally scans incoming responses. Point your existing SDK at the proxy URL; everything else stays the same.
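Conceptually, the middleware pattern looks like the sketch below: redact the outbound prompt, forward it, and optionally scan the response. This is not the Philter AI Proxy's code or API. The `redact` function is a stand-in that only masks SSN-shaped strings, `fake_provider` replaces a real LLM call, and the `[REDACTED-SSN]` placeholder is an assumed format.

```python
import re
from typing import Callable

# Stand-in detector for illustration only; a real deployment would call a
# full redaction engine rather than a single regular expression.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Mask US SSN-shaped strings with a placeholder."""
    return SSN_PATTERN.sub("[REDACTED-SSN]", text)


def proxied_completion(
    prompt: str,
    send_to_provider: Callable[[str], str],
    scan_response: bool = False,
) -> str:
    """Redact the prompt before it leaves, then optionally scan the reply."""
    safe_prompt = redact(prompt)
    response = send_to_provider(safe_prompt)
    return redact(response) if scan_response else response


def fake_provider(prompt: str) -> str:
    """Hypothetical provider that echoes what it received."""
    return f"echo: {prompt}"


reply = proxied_completion(
    "Customer SSN is 123-45-6789, summarize the case.", fake_provider
)
```

Because the provider only ever receives `safe_prompt`, the raw identifier never leaves your network, which is the property the security review is asking for.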

When teams adopt. When the AI roadmap meets the security review. Often the fastest-moving stage on the journey because the AI rollout has hard deadlines and the security gate is a known blocker.

View Philter AI Proxy →

Stage 7 · Safe analytics

Enable safe analytics

The problem. You want to publish or share aggregate PII statistics without exposing individuals: “how many SSNs flowed through this pipeline last month,” “what percentage of clinical notes contain medication mentions,” “which redaction categories spiked this quarter.” Naive aggregation can re-identify individuals; differential privacy is the principled answer.

The product. Philter Diffuse — differential privacy for PII aggregations. Mathematical bounds on individual-level information leakage; configurable privacy budget (ε) per query.
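To show what a per-query privacy budget means in practice, here is a minimal sketch of the standard Laplace mechanism applied to a count. It is not Philter Diffuse's API: `private_count` is a hypothetical name, and the sketch assumes each individual changes the count by at most 1 (sensitivity 1), which may not hold if one person contributes many records.

```python
import random


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise of scale 1/epsilon.

    Assumes sensitivity 1. The difference of two independent exponential
    draws with rate epsilon is Laplace-distributed with scale 1/epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


rng = random.Random(7)  # seeded only so the example is repeatable

# Hypothetical query: SSNs seen in a pipeline last month, with epsilon = 0.5.
noisy_ssn_count = private_count(1284, epsilon=0.5, rng=rng)

# Smaller epsilon means more noise: compare average error at two budgets.
def mean_abs_error(epsilon: float, trials: int = 5000) -> float:
    errors = [abs(private_count(0, epsilon, rng)) for _ in range(trials)]
    return sum(errors) / trials

error_strict = mean_abs_error(0.1)  # tighter privacy, noisier answers
error_loose = mean_abs_error(1.0)   # looser privacy, more accurate answers
```

The expected absolute error is 1/ε, so halving ε doubles the noise. Floating-point noise sampling has known weaknesses, so treat this as an explanation of the idea rather than something to deploy.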

When teams adopt. When the analytics or research team needs to share aggregate findings externally, or when an internal “privacy AND utility” mandate makes naive aggregation insufficient. This is the last stage on the journey and rarely the first need.

View Philter Diffuse →

Tools used at every stage

Three tools don’t fit a single stage — they support the journey across all seven.

Redaction Policy Editor

Visual policy authoring — click to choose entity types, drag to order conditions, save as a Philter policy file. Used at every stage to build or tune the JSON the engine consumes. Hosted free at policies.philterd.ai.

Policy Library

Pre-built policies for HIPAA Safe Harbor, PCI DSS, GLBA, FERPA, FRBP 9037, medical chatbots, LLM training-data prep, and more. Starting points for stages 1, 5, 6, and 7 — community-contributed and curated.

PhEye

The NLP models and model server underneath Philter and Phileas. Most teams never deploy it separately — but if you’re running at scale, training custom lenses, or serving models to multiple Philter instances, PhEye is what you operate.

Where are you on the journey?

The honest answer for most teams is “stage 1, or about to be.” That’s the right place to be. The whole point of the journey is that you adopt later stages when you hit later problems — not as a checklist.

If you’re evaluating where to start, the simplest path is to deploy Philter, run it against a representative slice of your data, and see what the precision and recall look like. Everything else on the journey is a tool you adopt when the corresponding problem becomes the loudest.

Need help finding your stage?

Tell us what problem you’re trying to solve. Most conversations end with us pointing you at the product or stage that fits — sometimes that means pointing you at someone else’s tool instead.