Self-hosted PII and PHI redaction API

Philter

Philter is open source software that redacts PII and PHI from text. Use it to maintain HIPAA compliance, meet industry regulations, and leverage your documents for valuable secondary purposes.

Available on the cloud marketplaces

Spin up Philter directly from the marketplace of your choice. One-click deploy into your own VPC; no data ever leaves your account.

The deploy guide has the full deployment walkthrough: pricing, 5-minute launch steps, and an FAQ.

Redaction is much more than *****.

Redaction is much more than finding words and replacing them with asterisks — but we're guessing you already knew that, because that's why you're here.

Choose what to redact

Philter can detect and redact many PII and PHI types — names, ages, phone numbers, ID numbers, credit card numbers, VINs, SSNs and ITINs, and more — and you decide which to act on.

Choose how to redact

For every entity type, choose how it's handled: mask it, encrypt it (including format-preserving encryption), replace it with a synthetic value, drop it, or pass it through.

Redact only certain instances

Need to redact only some names, or only zip codes above a population threshold, or only credit cards that pass Luhn validation? Philter's policy engine handles it.
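To make one of those conditions concrete, here is a minimal sketch of the Luhn checksum that separates plausible card numbers from arbitrary digit strings. It is a generic Python implementation of the public algorithm, not Philter's own code, and the function name `passes_luhn` is hypothetical.

```python
def passes_luhn(candidate: str) -> bool:
    """Return True if the digits in `candidate` satisfy the Luhn checksum."""
    # Ignore spaces, dashes, and any other non-digit separators.
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(passes_luhn("4111 1111 1111 1111"))  # True (a well-known test number)
print(passes_luhn("4111 1111 1111 1112"))  # False
```

A policy that redacts only Luhn-valid matches avoids masking order numbers or other 16-digit strings that merely look like cards.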

Text and Office documents

Philter processes plain text; Microsoft Word and Excel files are handled via the Office add-ins — so the same redaction policies cover the formats your team works in.

Custom entity types

Define your own dictionaries (project codenames, internal client IDs) and your own identifier patterns (medical record numbers, transaction IDs) — Philter treats them as first-class entities alongside the built-in detectors.
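As a rough illustration of the idea, the sketch below finds dictionary terms and pattern-based identifiers in the same pass. Everything in it is an assumption made for the example: the codenames, the `MRN-` plus six digits format, and the function name `find_custom_entities` are hypothetical and do not reflect Philter's configuration syntax.

```python
import re

# Hypothetical dictionary of internal project codenames.
PROJECT_CODENAMES = {"bluebird", "northstar"}
# Hypothetical medical record number format: "MRN-" followed by six digits.
MRN_PATTERN = re.compile(r"\bMRN-\d{6}\b")


def find_custom_entities(text: str):
    """Return sorted (start, end, label) spans for dictionary and pattern matches."""
    spans = [(m.start(), m.end(), "mrn") for m in MRN_PATTERN.finditer(text)]
    for term in PROJECT_CODENAMES:
        # Match whole words only, ignoring case.
        pattern = rf"\b{re.escape(term)}\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), "codename"))
    return sorted(spans)


sample = "Project Bluebird patient MRN-004217 admitted."
for start, end, label in find_custom_entities(sample):
    print(label, sample[start:end])
```

Returning spans with a label, rather than redacting immediately, keeps detection separate from the choice of how each entity type is handled.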

Domain-specific lenses

A Philter lens is a specially-trained NLP model. Use the General Purpose lens out of the box, or switch to the Healthcare or COVID-19 lenses for higher accuracy on healthcare PHI.

Case Studies

Philter has a wide range of use cases, because sensitive information and the need to redact it are pervasive across industries. Below are real engagements where Philter was applied: the problem at hand, how we solved it, and what was learned.

Our proven approach. We start every engagement with an exploratory virtual meeting to determine whether Philter is a viable fit. If it is, we request sample data, run it through Philter, and share the metrics. Every dataset is different — statistics from one customer don't translate to another — so we measure performance against your data before you commit.

Healthcare
98% of sensitive information redacted

Filtering PHI in Patient Text for a Healthcare IT Solutions Provider

The problem. A healthcare IT vendor processing plain-text patient narratives needed to efficiently remove PHI so the text could be used for secondary purposes downstream.

Requirements:

  • No significant timing delays in the processing pipeline.
  • At least 95% of all sensitive information (patient names, ages, ID numbers, and dates) removed.

Results. We executed a Business Associate Addendum to securely share PHI, then manually annotated a gold-standard dataset. Philter processed each document in milliseconds and identified and redacted 98% of the sensitive information — comfortably exceeding the 95% threshold.

Integration. The vendor ran a streaming pipeline that consumed from Apache Kafka. We provided design documentation showing how Philter slots into the Kafka flow, and the vendor deployed Philter from the AWS Marketplace into their existing AWS environment — streamlining procurement (no payment-info exchange) and putting them in control of spend (per-hour billing).

Legal
100% of SSNs, TINs, accounts, and birthdates identified

PII Filtering of Bankruptcy Documents for a Legal Firm

The problem. A legal firm handling federal bankruptcy filings needed PII removed from Microsoft Word documents to comply with Rule 9037 — Privacy Protection For Filings Made with the Court.

Requirements:

  • SSNs and TINs: redact to the last 4 digits.
  • Birthdates: redact to just the 4-digit year.
  • Persons' names: redact minors' names to initials.
  • Financial account numbers: redact to the last 4 digits.

Results. We manually annotated the sample documents, configured Philter's SSN/TIN filter, set up Identifier filters for account numbers, and used the NER filter to find persons' names and replace them with initials. Philter identified 100% of all SSNs, TINs, financial account numbers, and birthdates. Juvenile names were caught via both a dictionary filter and the NER filter.
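The output formats in the requirements above can be sketched in a few lines of Python. This illustrates only the transformations, under the assumption that the entity has already been detected; it is not how Philter's filters are configured, and the helper names and the `X` mask character are hypothetical.

```python
import re


def keep_last_four(identifier: str, mask: str = "X") -> str:
    """Mask every digit except the last four, leaving separators in place."""
    total_digits = sum(ch.isdigit() for ch in identifier)
    digits_seen = 0
    out = []
    for ch in identifier:
        if ch.isdigit():
            digits_seen += 1
            out.append(ch if digits_seen > total_digits - 4 else mask)
        else:
            out.append(ch)
    return "".join(out)


def birthdate_to_year(date_text: str) -> str:
    """Reduce a date to its year; assumes the year is written with four digits."""
    match = re.search(r"\b(\d{4})\b", date_text)
    return match.group(1) if match else "[REDACTED]"


def name_to_initials(name: str) -> str:
    """Replace each part of a name with its initial."""
    return "".join(part[0].upper() + "." for part in name.split())


print(keep_last_four("123-45-6789"))    # XXX-XX-6789
print(birthdate_to_year("03/14/1987"))  # 1987
print(name_to_initials("Jane Doe"))     # J.D.
```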

Integration. The firm's small outsourced IT team was mid-migration to AWS. We stood up the AWS resources to host Philter (with encryption at rest and in transit) so the deployment served both on-premises and cloud workloads — kick-starting the broader cloud move. The Philter Toolbox watched a Windows shared drive and processed documents automatically whenever staff saved a new one, keeping redaction invisible to end-users.

How Philter compares

The PII redaction landscape has more options than ever — and they're not all built for the same kind of team. Here's a straight-talking look at where Philter fits, and where another tool might serve you better.

| | Philter | Microsoft Presidio | AWS Comprehend (PII) | Google Cloud DLP | Private AI |
|---|---|---|---|---|---|
| License | Apache 2.0 · open source | MIT · open source | Commercial (AWS) | Commercial (Google) | Commercial |
| Deployment | Self-hosted in your VPC | Self-hosted | Multi-tenant AWS service | Multi-tenant GCP service | SaaS API or container |
| Data residency | Stays in your account | Stays in your account | Sent to AWS regions | Sent to GCP regions | SaaS path leaves perimeter |
| Cloud portability | AWS, GCP, Azure, on-prem, air-gapped | BYO deployment | AWS only | GCP only | SaaS or BYOC |
| Marketplace billing | AWS · GCP · Azure | No | Native AWS billing | Native GCP billing | Vendor billing |
| Domain lenses | General, Healthcare, COVID-19 | General (bring your own models) | General | General | Healthcare, finance |
| Format-preserving encryption | Yes | Basic masking only | No | Yes | Limited |
| LLM proxy mode | Yes · Philter AI Proxy | Custom integration | Not native | Not native | Yes |
| Differential privacy | Yes · Philter Diffuse | No | No | Limited | No |
| SDK languages | Java, .NET, Go (+ Phileas in Java/Python/Go) | Python | AWS SDKs | GCP SDKs | Python, REST |

Vendor capabilities change frequently. The summary above reflects publicly documented behavior at the time of writing. Always read the current docs and run your own evaluation before deciding.

Philter vs Microsoft Presidio

Both are open source and self-hosted. Where they differ:

  • Language. Philter is JVM-first with first-class Python and Go bindings via Phileas. Presidio is Python-first; non-Python integration is custom work.
  • Models out of the box. Philter ships purpose-built NLP lenses for general, healthcare, and COVID-19 text. Presidio ships generic spaCy/Stanza recognizers and expects you to wire up the rest.
  • Cloud-marketplace presence. Philter has one-click deployments on AWS, GCP, and Azure marketplaces with per-hour billing. Presidio is BYO deployment.
  • Commercial backing. Philter has commercial support and consulting paths without ever becoming a closed product. Presidio is a Microsoft research project with no commercial support tier.

When Presidio is the better fit: Python-only stack, no need for cloud-marketplace billing, willing to assemble and operate the deployment yourself.

Philter vs AWS Comprehend (PII detection)

AWS Comprehend is a managed PII detection API on AWS. Where they differ:

  • Data path. Comprehend sends your text to a multi-tenant AWS service. Philter runs entirely inside your VPC; sensitive data never leaves your account.
  • Cloud lock-in. Comprehend is AWS-only. Philter runs on AWS, GCP, Azure, on-prem, or air-gapped.
  • Customization. Philter exposes a full policy engine with dictionaries, regex patterns, custom identifier rules, and per-entity replacement strategies. Comprehend's customization surface is narrower.
  • Pricing model. Comprehend is consumption-based (per character). Philter on the AWS Marketplace is per-instance-hour — predictable as your volume scales.

When Comprehend is the better fit: AWS-only stack, low customization needs, comfort with the multi-tenant data path.

Philter vs Google Cloud DLP

Google Cloud DLP is GCP's managed PII detection and de-identification service. Where they differ:

  • Data path. DLP processes your text in Google's managed environment. Philter runs in your VPC.
  • Cloud lock-in. DLP is GCP-only. Philter is multi-cloud and on-prem capable.
  • Toolkit breadth. Philter is one of nine tools — discovery (Phinder), monitoring (Phield), policy editing (Redaction Policy Editor), benchmarking (Philter Scope), and more. DLP is the redaction surface; everything else you build or buy separately.

When Cloud DLP is the better fit: GCP-only stack, fully-managed service is preferred, cross-cloud portability isn't a requirement.

Philter vs Private AI

Private AI is a commercial PII redaction service with SaaS and container deployment options. Where they differ:

  • License. Private AI is commercial and proprietary. Philter is released under the permissive, business-friendly Apache license: auditable source, no licensing review, no per-seat fees.
  • Pricing. Private AI's pricing is commercial, volume- or seat-based. Philter's open source tier is free; commercial paths are per-hour marketplace billing or engagement-based consulting.
  • Ecosystem. Philter is part of a 9-tool ecosystem covering the full PII lifecycle (discover → redact → monitor → analyze with differential privacy). Private AI is focused on the redaction API itself.

When Private AI is the better fit: SaaS is acceptable, you prefer commercial vendor support contracts, you don't need broader privacy tooling beyond redaction.

Choose Philter when

  • You need sensitive data to stay inside your perimeter.
  • You want auditable open source, not a vendor black box.
  • You operate across multiple clouds or in air-gapped environments.
  • You want a single set of policies to cover redaction, discovery, monitoring, and LLM traffic.
  • You're in healthcare, finance, legal, or government — where compliance posture matters more than convenience.

Pick another tool when

  • You want a hosted SaaS API and don't care where the data flows.
  • You only need basic email-and-SSN regex matching.
  • You're Python-first with no production NLP roadmap (Presidio's ergonomics may suit you better).

Frequently asked questions

If something here isn’t covered, get in touch — we’ll answer.

What is Philter?
Philter is an application that redacts protected health information (PHI), personally identifiable information (PII), non-public personal information (NPPI), and other sensitive information from text. Philter processes plain text, plus Microsoft Word and Excel files via its Office add-ins. Philter runs in your private cloud so your sensitive data never has to traverse the public internet. Use Philter's API to process text from virtually any system or process. Philter is open source — learn more on GitHub.
Does Philter use ChatGPT or other third-party APIs?
No. Philter never transmits your text or documents to any third-party service. Philter can run in a firewalled (or even air-gapped) environment. For example, in AWS you can deploy Philter to a private subnet and use security groups and network ACLs to prevent any outbound traffic from the instance and its subnet — and we recommend doing so to strengthen your overall security posture.
Is Philter open source?
Yes. Philter is built on Phileas, an open source library for finding and redacting PII and PHI in text and documents. Philter wraps Phileas to make it more user-friendly, provide an HTTP (REST) interface, and ship with NLP models. All other capabilities of Philter are powered by Phileas. Phileas is licensed under the Apache License, version 2. You're welcome to check out the code, file issues, and contribute pull requests.
What types of PII, PHI, and sensitive information can Philter redact?
Philter detects many entity types and we add new ones regularly. Among them: Ages · Bitcoin Addresses · US Cities · US Counties · Credit Card Numbers · Custom Dictionaries · Custom Identifiers (e.g. medical record numbers, transaction numbers) · Dates · US Driver's License Numbers · Email Addresses · Hospital Names · IBAN Codes · IP Addresses (IPv4 and IPv6) · MAC Addresses · Passport Numbers · Persons' Names (fuzzy matching, first/last/whole) · Phone/Fax Numbers · Physician Names · SSNs and TINs · Shipping Tracking Numbers · US States · URLs · VINs · US Zip Codes
How does Philter know what to redact?
You create policies that tell Philter which types of PII and PHI to find and how to handle them. A policy lists the entity types (phone numbers, names, etc.), when to act on them, and how to redact them. You can have as many policies as you need and select which one to apply per request. Policies are documented in Philter's User's Guide.
How does Philter identify sensitive information?
Philter uses a variety of methods, including specially trained machine learning models. A Philter lens is a trained model for a specific kind of text — using a lens that matches your data gives you more accurate results. Philter ships with a General Purpose lens that works across many document types. If you're focused on healthcare PHI, Healthcare or COVID-19 lenses are also available. Contact us for details.
How is Philter deployed?
Philter can be deployed as a container or directly into your cloud from the AWS, Google Cloud, or Microsoft Azure marketplaces in just a few minutes. For container-based deployments, please contact us.
How do I send text to Philter for redaction?
Three options:

  1. Call the HTTP API directly. Philter accepts text and returns the redacted text.
  2. Use the Philter CLI for convenient command-line access.
  3. Use the open source SDKs for Java, .NET, and Go.
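For the direct HTTP route, a call might look like the Python sketch below, which uses only the standard library. The host name, port, `/api/filter` path, and `text/plain` content type are assumptions made for illustration; confirm the actual endpoint and parameters in Philter's User's Guide before relying on them.

```python
import urllib.request

# Assumption: placeholder host, port, and path. Check the User's Guide
# for the real endpoint exposed by your deployment.
PHILTER_URL = "https://philter.internal.example:8080/api/filter"


def build_filter_request(text: str, url: str = PHILTER_URL) -> urllib.request.Request:
    """Build a POST request that carries plain text to be redacted."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def redact(text: str, url: str = PHILTER_URL) -> str:
    """Send the text and return the response body, assumed to be the redacted text."""
    request = build_filter_request(text, url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

Calling `redact("some text")` only works against a reachable Philter instance; because the service runs inside your own network, the request never has to cross the public internet.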
What are Philter's accuracy, precision, and recall metrics?
Precision and recall depend greatly on your data. Every dataset is different, so quoting a single F1 score across users would be meaningless — if a vendor cites accuracy without seeing your documents, be very cautious. We'd rather take some representative text from you and spend a few days gathering precision and recall metrics specific to your data, then send those back to you alongside the redacted output.
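For readers who want to run a similar measurement themselves, the sketch below computes precision, recall, and F1 by comparing detected spans against a hand-annotated gold set. It assumes exact-match scoring on `(start, end, type)` tuples, which is one common convention and not necessarily the one used in our evaluations; the function name is hypothetical.

```python
def precision_recall_f1(gold: set, predicted: set) -> tuple:
    """Score predicted spans against gold spans using exact matches."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative spans only: (start offset, end offset, entity type).
gold = {(0, 8, "NAME"), (20, 31, "SSN"), (40, 50, "DATE"), (60, 62, "AGE")}
predicted = {(0, 8, "NAME"), (20, 31, "SSN"), (40, 50, "DATE"), (70, 75, "NAME")}
print(precision_recall_f1(gold, predicted))  # (0.75, 0.75, 0.75)
```

For redaction, recall usually matters most: a missed entity is a leak, while a false positive only removes text that could have stayed.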
Is Philter guaranteed to find 100% of all sensitive information?
No. Philter uses state-of-the-art NLP technology, which is statistical rather than exact, so no guarantee of 100% detection is possible. Identification accuracy depends on how similar your text is to the training corpus, how the text is formatted, and how long it is. For that reason, it's important to assess Philter's performance on your own data before relying on it in production. Every detected entity has a confidence score between 0 and 100. The confidence condition in a filter strategy lets you tune detection; for example, confidence > 75 ignores entities the model isn't sure about and only redacts high-confidence matches.
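The effect of a confidence condition can be pictured with a small sketch. The `Span` class and `spans_to_redact` function are hypothetical stand-ins for illustration, not Philter's data model; only the 0 to 100 scale and the `confidence > 75` example come from the answer above.

```python
from dataclasses import dataclass


@dataclass
class Span:
    text: str
    entity_type: str
    confidence: float  # 0 to 100


def spans_to_redact(spans, threshold: float = 75.0):
    """Keep only spans whose confidence is strictly greater than the threshold."""
    return [span for span in spans if span.confidence > threshold]


detected = [
    Span("George Washington", "person", 92.0),
    Span("Mount Vernon", "person", 41.0),
    Span("Jordan", "person", 75.0),
]
for span in spans_to_redact(detected):
    print(span.text)  # George Washington
```

Raising the threshold reduces false positives but lets more low-confidence entities through unredacted, so the right value is best chosen from measurements on your own data.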
How does Philter compare with Amazon Comprehend, Google DLP, and similar services?
Direct comparisons are tricky because Philter is designed differently. Philter goes beyond identification — it includes disambiguation, ignore lists, value replacement, and anonymization out of the box. The other services may technically support some of these, but only if you build them yourself on top. Philter is also not a managed SaaS API. You deploy it into your environment and call its API over your own network, so your text never has to leave your perimeter. We think that's substantially more secure than handing sensitive text to a third-party API. Finally, Philter is flexible: you can use your own models and you have full control over the filtering pipeline.
What platforms are supported by Philter?
Philter is available on the major cloud marketplaces:

  • AWS Marketplace
  • Google Cloud Marketplace
  • Microsoft Azure Marketplace

For other platforms or container deployments, please contact us.
What is Philter's license agreement?
Philter's licensing details are available on the project repository.

Ready to use Philter?

Three ways to get going — deploy the open source yourself, spin it up from a cloud marketplace, or work with our team directly. Pick the path that fits.

See your options