Self-hosted PII and PHI redaction API

Philter is open source software that redacts PII and PHI from text. Use it to maintain HIPAA compliance, meet industry regulations, and leverage your documents for valuable secondary purposes.
Spin up Philter directly from the marketplace of your choice. One-click deploy into your own VPC; no data ever leaves your account.
See the deploy guide for the full deployment walkthrough: pricing, 5-minute launch steps, and an FAQ.
Redaction is much more than finding words and replacing them with asterisks — but we're guessing you already knew that, because that's why you're here.
Philter can detect and redact many PII and PHI types — names, ages, phone numbers, ID numbers, credit card numbers, VINs, SSNs and ITINs, and more — and you decide which to act on.
For every entity type, choose how it's handled: mask it, encrypt it (including format-preserving encryption), replace it with a synthetic value, drop it, or pass it through.
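The per-entity-type choice works like a lookup from entity type to a handling strategy. The sketch below assumes detected entities arrive as non-overlapping character spans. The Span type, the strategy functions, and the POLICY mapping are hypothetical illustrations, not Philter's API or policy format. Encryption, including format-preserving encryption, is left out because it needs a real, key-managed cipher.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Span:
    start: int        # character offset where the entity begins
    end: int          # character offset just past the entity
    entity_type: str  # e.g. "SSN", "PHONE"

Strategy = Callable[[str], str]

def mask(value: str) -> str:
    return "*" * len(value)  # same length, every character hidden

def drop(value: str) -> str:
    return ""  # remove the entity from the output

def pass_through(value: str) -> str:
    return value  # leave the entity as-is

def synthetic_phone(value: str) -> str:
    return "555-0100"  # hypothetical fixed, fictional replacement number

# Hypothetical policy: entity type -> strategy (not Philter's configuration format).
POLICY: Dict[str, Strategy] = {
    "SSN": mask,
    "PHONE": synthetic_phone,
    "AGE": drop,
    "ZIP": pass_through,
}

def apply_policy(text: str, spans: List[Span], policy: Dict[str, Strategy]) -> str:
    """Apply each span's strategy; assumes spans do not overlap."""
    # Replace right-to-left so earlier offsets stay valid when lengths change.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        strategy = policy.get(span.entity_type, mask)  # unknown types fall back to masking
        text = text[:span.start] + strategy(text[span.start:span.end]) + text[span.end:]
    return text
```

Applied to "SSN 123-45-6789, call 410-555-1234, age 42." with spans covering the SSN, phone number, and age, this masks the SSN with 11 asterisks, swaps in the fictional number, and drops the age. Defaulting unknown types to masking is a fail-closed choice: an unmapped entity is hidden rather than leaked.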
Need to redact only some names, or only zip codes above a population threshold, or only credit cards that pass Luhn validation? Philter's policy engine handles it.
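Luhn validation is the standard checksum used on payment card numbers, and a condition like "only credit cards that pass Luhn validation" keeps random digit runs from being redacted as cards. Here is a minimal, self-contained sketch of the check itself. The function name luhn_valid is hypothetical, and this is not Philter's internal implementation.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]  # ignore spaces and dashes
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0
```

A policy would run this only on candidates the card-number detector has already found, for example keeping [c for c in candidates if luhn_valid(c)] for redaction.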
Philter processes plain text; Microsoft Word and Excel files are handled via the Office add-ins — so the same redaction policies cover the formats your team works in.
Define your own dictionaries (project codenames, internal client IDs) and your own identifier patterns (medical record numbers, transaction IDs) — Philter treats them as first-class entities alongside the built-in detectors.
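Custom dictionaries and identifier patterns both boil down to producing entity spans that downstream policy can act on. The sketch below assumes a made-up medical record number format ("MRN-" plus seven digits) and a made-up codename list. MRN_PATTERN, CODENAMES, and find_custom_entities are hypothetical names for illustration, not Philter configuration.

```python
import re
from typing import List, Tuple

# Hypothetical identifier pattern: medical record numbers shaped like "MRN-1234567".
MRN_PATTERN = re.compile(r"\bMRN-\d{7}\b")

# Hypothetical dictionary of internal project codenames (stored lowercase).
CODENAMES = {"bluebird", "nightjar"}

def find_custom_entities(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, type) for every pattern and dictionary match."""
    found = [(m.start(), m.end(), "MRN") for m in MRN_PATTERN.finditer(text)]
    # Dictionary lookup: whole-word, case-insensitive matches.
    for m in re.finditer(r"\w+", text):
        if m.group().lower() in CODENAMES:
            found.append((m.start(), m.end(), "CODENAME"))
    return sorted(found)  # ordered by start offset
```

Returning plain spans keeps custom matches interchangeable with built-in detections, which is what lets the same redaction strategy apply to both.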
A Philter lens is a specially trained NLP model. Use the General Purpose lens out of the box, or switch to the Healthcare or COVID-19 lenses for higher accuracy on healthcare PHI.
Philter has a wide range of use cases: sensitive information, and the need to redact it, is pervasive across industries. Below are real engagements where Philter was applied: the problem at hand, how we solved it, and what was learned.
Our proven approach. We start every engagement with an exploratory virtual meeting to determine whether Philter is a viable fit. If it is, we request sample data, run it through Philter, and share the metrics. Every dataset is different — statistics from one customer don't translate to another — so we measure performance against your data before you commit.
The problem. A healthcare IT vendor processing plain-text patient narratives needed to efficiently remove PHI so the text could be used for secondary purposes downstream.
Requirements:
Results. We executed a Business Associate Addendum to securely share PHI, then manually annotated a gold-standard dataset. Philter processed each document in milliseconds and identified and redacted 98% of the sensitive information — comfortably exceeding the 95% threshold.
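A figure like "98% of the sensitive information" is a recall-style measure: the share of gold-standard annotations the tool found. The sketch below shows one way to compute precision and recall against annotated spans. It assumes exact start/end matching (real evaluations often also credit partial overlaps), and precision_recall is a hypothetical helper, not part of Philter.

```python
from typing import Set, Tuple

Offsets = Tuple[int, int]  # (start, end) character offsets of one annotated entity

def precision_recall(predicted: Set[Offsets], gold: Set[Offsets]) -> Tuple[float, float]:
    """Exact-match precision and recall of predicted spans against gold annotations."""
    true_positives = len(predicted & gold)
    # Precision: of what was flagged, how much was truly sensitive.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: of what was truly sensitive, how much was flagged.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

For redaction, recall is usually the number to watch, since a missed entity is a leak while a false positive only over-redacts.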
Integration. The vendor ran a streaming pipeline that consumed from Apache Kafka. We provided design documentation showing how Philter slots into the Kafka flow, and the vendor deployed Philter from the AWS Marketplace into their existing AWS environment — streamlining procurement (no payment-info exchange) and putting them in control of spend (per-hour billing).
The problem. A legal firm handling federal bankruptcy filings needed PII removed from Microsoft Word documents to comply with Rule 9037 — Privacy Protection For Filings Made with the Court.
Requirements:
Results. We manually annotated the sample documents, configured Philter's SSN/TIN filter, set up Identifier filters for account numbers, and used the NER filter to find people's names and replace them with initials. Philter identified 100% of the SSNs, TINs, financial account numbers, and birthdates. Minors' names were caught by both a dictionary filter and the NER filter.
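Replacing a detected name with initials is a small transformation applied after the NER filter has found the name. A minimal sketch, assuming the input is the full matched name and that each whitespace-separated part contributes one initial; to_initials is a hypothetical helper, not Philter's replacement strategy.

```python
def to_initials(name: str) -> str:
    """Reduce a person's name to initials, one per whitespace-separated part."""
    # str.split() with no argument drops empty parts and collapses repeated spaces.
    return "".join(part[0].upper() + "." for part in name.split())
```

Hyphenated names such as "Mary-Kate" collapse to a single initial under this rule; a production policy might treat each hyphenated part separately.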
Integration. The firm's small outsourced IT team was mid-migration to AWS. We stood up the AWS resources to host Philter (with encryption at rest and in transit) so the deployment served both on-premises and cloud workloads — kick-starting the broader cloud move. The Philter Toolbox watched a Windows shared drive and processed documents automatically whenever staff saved a new one, keeping redaction invisible to end-users.
The PII redaction landscape has more options than ever — and they're not all built for the same kind of team. Here's a straight-talking look at where Philter fits, and where another tool might serve you better.
|  | Philter | Microsoft Presidio | AWS Comprehend (PII) | Google Cloud DLP | Private AI |
|---|---|---|---|---|---|
| License | Apache 2.0 · open source | MIT · open source | Commercial (AWS) | Commercial (Google) | Commercial |
| Deployment | Self-hosted in your VPC | Self-hosted | Multi-tenant AWS service | Multi-tenant GCP service | SaaS API or container |
| Data residency | Stays in your account | Stays in your account | Sent to AWS regions | Sent to GCP regions | SaaS path leaves perimeter |
| Cloud portability | AWS, GCP, Azure, on-prem, air-gapped | BYO deployment | AWS only | GCP only | SaaS or BYOC |
| Marketplace billing | AWS · GCP · Azure | No | Native AWS billing | Native GCP billing | Vendor billing |
| Domain lenses | General, Healthcare, COVID-19 | General (bring your own models) | General | General | Healthcare, finance |
| Format-preserving encryption | Yes | Basic masking only | No | Yes | Limited |
| LLM proxy mode | Yes · Philter AI Proxy | Custom integration | Not native | Not native | Yes |
| Differential privacy | Yes · Philter Diffuse | No | No | Limited | No |
| SDK languages | Java, .NET, Go (+ Phileas in Java/Python/Go) | Python | AWS SDKs | GCP SDKs | Python, REST |
Vendor capabilities change frequently. The summary above reflects publicly documented behavior at the time of writing. Always read the current docs and run your own evaluation before deciding.
Looking for a side-by-side deep dive? Our dedicated comparison pages cover each alternative in detail — data path, pricing math at production scale, customization depth, and when the other tool is actually the better fit.
Philter and Microsoft Presidio are both open source and self-hosted. Where they differ:
When Presidio is the better fit: Python-only stack, no need for cloud-marketplace billing, willing to assemble and operate the deployment yourself.
AWS Comprehend is a managed PII detection API on AWS. Where they differ:
When Comprehend is the better fit: AWS-only stack, low customization needs, comfort with the multi-tenant data path.
Google Cloud DLP is GCP's managed PII detection and de-identification service. Where they differ:
When Cloud DLP is the better fit: GCP-only stack, fully-managed service is preferred, cross-cloud portability isn't a requirement.
Private AI is a commercial PII redaction service with SaaS and container deployment options. Where they differ:
When Private AI is the better fit: SaaS is acceptable, you prefer commercial vendor support contracts, you don't need broader privacy tooling beyond redaction.
If something here isn’t covered, get in touch — we’ll answer.
A confidence condition in a filter strategy lets you tune detection. For example, confidence > 75 ignores entities the model isn't sure about and redacts only high-confidence matches.

Three ways to get going: deploy the open source yourself, spin it up from a cloud marketplace, or work with our team directly. Pick the path that fits.
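The confidence condition mentioned above behaves like a threshold filter over model detections. A minimal sketch, assuming each detection carries a confidence score on a 0 to 100 scale to match the "confidence > 75" example. Detection and above_threshold are hypothetical names, not Philter's condition syntax or API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    text: str
    entity_type: str
    confidence: float  # assumed 0-100 scale, matching the "confidence > 75" example

def above_threshold(detections: List[Detection], threshold: float = 75) -> List[Detection]:
    """Keep only detections strictly above the threshold; the rest stay unredacted."""
    return [d for d in detections if d.confidence > threshold]
```

Raising the threshold trades recall for precision: fewer false redactions, but more low-confidence true entities slip through.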