Question 1

How long has Philter been around?

Accepted Answer

Phileas (the open source library underneath Philter) was first released in 2017. Philter followed as the API layer on top. That makes the Philterd toolkit one of the oldest continuously maintained open source PII redaction engines available, predating the current wave of LLM-wrapper privacy tools by more than six years. The detection engine is a hybrid of purpose-built NLP models and pattern matching, not a prompt sent to a third-party LLM. You can inspect the full commit history, every model, and every line of policy logic on GitHub. The Apache OpenNLP project (one of the foundational NLP frameworks in the Java ecosystem) underpins the model layer; Philterd’s founder is the PMC Chair of that project.

Question 2

What is Philter?

Accepted Answer

Philter is an application that redacts protected health information (PHI), personally identifiable information (PII), non-public personal information (NPPI), and other sensitive information from text. Philter processes plain text, plus Microsoft Word and Excel files via its Office add-ins. Philter runs in your private cloud so your sensitive data never has to traverse the public internet. Use Philter's API to process text from virtually any system or process. Philter is open source. Learn more on GitHub.

Question 3

Does Philter use ChatGPT or other third-party APIs?

Accepted Answer

No. Philter never transmits your text or documents to any third-party service. Philter can run in a firewalled (or even air-gapped) environment. For example, in AWS you can deploy Philter to a private subnet and use security groups and network ACLs to prevent any outbound traffic from the instance and its subnet, and we recommend doing so to strengthen your overall security posture.

Question 4

Is Philter open source?

Accepted Answer

Philter is built on Phileas, an open source library for finding and redacting PII and PHI in text and documents. Philter wraps Phileas to make it more user-friendly, provide an HTTP (REST) interface, and ship with NLP models. All other capabilities of Philter are powered by Phileas. Phileas is licensed under the Apache License, version 2. You're welcome to check out the code, file issues, and contribute pull requests.

Question 5

What types of PII, PHI, and sensitive information can Philter redact?

Accepted Answer

Philter detects many entity types and we add new ones regularly. Among them:

Ages · Bitcoin Addresses · US Cities · US Counties · Credit Card Numbers · Custom Dictionaries · Custom Identifiers (e.g. medical record numbers, transaction numbers) · Dates · US Driver's License Numbers · Email Addresses · Hospital Names · IBAN Codes · IP Addresses (IPv4 and IPv6) · MAC Addresses · Passport Numbers · Persons' Names (fuzzy matching, first/last/whole) · Phone/Fax Numbers · Physician Names · SSNs and TINs · Shipping Tracking Numbers · US States · URLs · VINs · US Zip Codes

Question 6

How does Philter know what to redact?

Accepted Answer

You create policies that tell Philter which types of PII and PHI to find and how to handle them. A policy lists the entity types (phone numbers, names, etc.), when to act on them, and how to redact them. You can have as many policies as you need and select which one to apply per request. Policies are documented in Philter's User's Guide.
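To make the idea concrete, here is a minimal Python sketch of what a policy does conceptually: it maps entity types to a redaction strategy, and each request picks one named policy. The `Span` and `Policy` classes, the policy names, the `REDACT`/`MASK` labels, and the `[REDACTED-type]` output format are all illustrative assumptions, not Philter's actual policy schema. The User's Guide documents the real format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """A detected entity (hypothetical shape): its type and character offsets."""
    entity_type: str
    start: int
    end: int


@dataclass
class Policy:
    """Hypothetical policy: entity type -> how to redact it ("REDACT" or "MASK")."""
    name: str
    strategies: dict[str, str] = field(default_factory=dict)


def apply_policy(text: str, spans: list[Span], policy: Policy) -> str:
    """Redact the spans whose entity type the policy covers; leave the rest alone."""
    out = text
    # Replace from right to left so earlier offsets stay valid after each edit.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        strategy = policy.strategies.get(span.entity_type)
        if strategy == "REDACT":
            replacement = f"[REDACTED-{span.entity_type}]"
        elif strategy == "MASK":
            replacement = "*" * (span.end - span.start)
        else:
            continue  # this policy does not act on this entity type
        out = out[:span.start] + replacement + out[span.end:]
    return out


# Several named policies; each request chooses one by name.
POLICIES = {
    "phone-only": Policy("phone-only", {"phone-number": "MASK"}),
    "strict": Policy("strict", {"phone-number": "REDACT", "ssn": "REDACT"}),
}

text = "Call 555-123-4567, SSN 123-45-6789."
spans = [Span("phone-number", 5, 17), Span("ssn", 23, 34)]
print(apply_policy(text, spans, POLICIES["phone-only"]))  # Call ************, SSN 123-45-6789.
print(apply_policy(text, spans, POLICIES["strict"]))      # Call [REDACTED-phone-number], SSN [REDACTED-ssn].
```

The same detected spans produce different output depending on which policy the request selects, which is the point of keeping policies separate from detection.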

Question 7

How does Philter identify sensitive information?

Accepted Answer

Philter uses a variety of methods, including specially trained machine learning models. A Philter lens is a trained model for a specific kind of text. Using a lens that matches your data gives you more accurate results. Philter ships with a General Purpose lens that works across many document types. If you're focused on healthcare PHI, Healthcare or COVID-19 lenses are also available. Contact us for details.

Question 8

How is Philter deployed?

Accepted Answer

Philter can be deployed directly into your cloud from the AWS, Google Cloud, or Microsoft Azure marketplaces in just a few minutes. It can also run as a container; for container-based deployments, please contact us.

Question 9

How do I send text to Philter for redaction?

Accepted Answer

There are three options:

1. Call the HTTP API directly. Philter accepts text and returns the redacted text. You can generate a client in any language from the Philter OpenAPI specification.
2. Use the Philter CLI for convenient command-line access.
3. Use the open source Java SDK.
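A minimal sketch of calling the HTTP API directly with only Python's standard library. It assumes the redaction endpoint is `POST /api/filter`, that it takes the text as a `text/plain` body, and that the policy name goes in a `p` query parameter. Treat those details, plus the helper names `build_filter_request` and `redact`, as assumptions and confirm them against the OpenAPI specification for your Philter version.

```python
import urllib.parse
import urllib.request


def build_filter_request(base_url: str, text: str, policy: str = "default") -> urllib.request.Request:
    """Build (but do not send) a request asking Philter to redact `text`.

    ASSUMPTION: endpoint path `/api/filter` and query parameter `p` for the policy name.
    """
    query = urllib.parse.urlencode({"p": policy})
    url = f"{base_url.rstrip('/')}/api/filter?{query}"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain", "Accept": "text/plain"},
        method="POST",
    )


def redact(base_url: str, text: str, policy: str = "default", timeout: float = 30.0) -> str:
    """Send the text to your own Philter instance and return the redacted text."""
    request = build_filter_request(base_url, text, policy)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a Philter instance reachable on your private network, a call would look like `redact("https://philter.internal:8080", "His SSN is 123-45-6789.")`, where the host name is hypothetical. Because the request only ever goes to your own instance, the text does not leave your network.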

Question 10

What are Philter's accuracy, precision, and recall metrics?

Accepted Answer

Precision and recall depend greatly on your data. Every dataset is different, so quoting a single F1 score across users would be meaningless. If a vendor cites accuracy without seeing your documents, be very cautious.

We'd rather take some representative text from you and spend a few days gathering precision and recall metrics specific to your data, then send those back to you alongside the redacted output.
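If you want to run this kind of evaluation yourself, precision and recall come from comparing the spans Philter detects against spans a human annotated in the same text. The sketch below uses exact span matching, which is one common convention (partial-overlap scoring is another); the function name and the span tuples are made-up example data, not Philter output.

```python
# A span is (start, end, entity_type).
Span = tuple[int, int, str]


def precision_recall_f1(predicted: set[Span], gold: set[Span]) -> tuple[float, float, float]:
    """Score detected spans against annotated (gold) spans using exact matching."""
    # True positives: detections that exactly match an annotation.
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up example: 4 annotated entities, 3 detections, all 3 correct.
gold = {(0, 10, "person"), (25, 37, "phone-number"), (50, 61, "ssn"), (70, 80, "date")}
predicted = {(0, 10, "person"), (25, 37, "phone-number"), (50, 61, "ssn")}
print(precision_recall_f1(predicted, gold))  # precision 1.0, recall 0.75, F1 about 0.857
```

The scores depend entirely on which entities appear in the annotated set, so the same detector can score very differently on two datasets.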

Question 11

Is Philter guaranteed to find 100% of all sensitive information?

Accepted Answer

No. Philter uses state-of-the-art NLP technology, which is fundamentally probabilistic, so complete detection cannot be guaranteed. Identification accuracy depends on how similar your text is to the training corpus, how the text is formatted, and how long it is. For that reason, it's important to assess Philter's performance on your own data before relying on it in production.

Every detected entity has a confidence score between 0 and 100, and the confidence condition in a filter strategy lets you tune detection. For example, a condition of confidence > 75 ignores entities the model isn't sure about and redacts only high-confidence matches.
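The effect of a confidence > 75 condition can be sketched as a simple filter over detected entities. This is a conceptual illustration in Python, not Philter's implementation; the `Detection` type, the `above_threshold` helper, and the sample scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical detected entity with a confidence score from 0 to 100."""
    entity_type: str
    text: str
    confidence: float


def above_threshold(detections: list[Detection], threshold: float = 75.0) -> list[Detection]:
    """Mimic a `confidence > threshold` condition: keep only strictly higher scores."""
    return [d for d in detections if d.confidence > threshold]


detections = [
    Detection("person", "John Smith", 92.0),
    Detection("person", "May", 41.5),        # possibly the month, low confidence
    Detection("city", "Springfield", 75.0),  # exactly 75 does not pass "> 75"
]
print([d.text for d in above_threshold(detections)])  # ['John Smith']
```

Raising the threshold trades recall for precision: fewer false positives, but more missed entities. Measuring both on your own data is how you pick a sensible value.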

Question 12

How does Philter compare with Amazon Comprehend, Google DLP, and similar services?

Accepted Answer

Direct comparisons are tricky because Philter is designed differently. Philter goes beyond identification. It includes disambiguation, ignore lists, value replacement, and anonymization out of the box. The other services may technically support some of these, but only if you build them yourself on top. Philter is also not a managed SaaS API. You deploy it into your environment and call its API over your own network, so your text never has to leave your perimeter. We think that's substantially more secure than handing sensitive text to a third-party API. Finally, Philter is flexible: you can use your own models and you have full control over the filtering pipeline.
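Two of those post-identification steps, ignore lists and anonymization with consistent replacement values, can be sketched in a few lines. The example is illustrative only: the `anonymize` function, the `NAME-n` surrogate format, and the ignore-list contents are assumptions, not Philter's behavior or configuration.

```python
def anonymize(values: list[str], ignore: set[str]) -> list[str]:
    """Swap each detected value for a stable surrogate unless it is on the ignore list.

    Ignore-list entries are compared case-insensitively, so store them lowercase.
    Within one call, the same input value always gets the same surrogate, so
    repeated mentions of one person stay linked after anonymization.
    """
    surrogates: dict[str, str] = {}
    result: list[str] = []
    for value in values:
        if value.lower() in ignore:
            result.append(value)  # ignored: not treated as sensitive
            continue
        if value not in surrogates:
            surrogates[value] = f"NAME-{len(surrogates) + 1}"
        result.append(surrogates[value])
    return result


found = ["Alice Jones", "Acme Health", "Alice Jones", "Bob Lee"]
print(anonymize(found, ignore={"acme health"}))
# ['NAME-1', 'Acme Health', 'NAME-1', 'NAME-2']
```

Keeping replacements consistent preserves who-said-what relationships in the redacted text, while the ignore list stops known non-sensitive terms (such as your own organization's name) from being redacted.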

Question 13

What platforms are supported by Philter?

Accepted Answer

Philter is available on the major cloud marketplaces:

- AWS Marketplace
- Google Cloud Marketplace
- Microsoft Azure Marketplace

For other platforms or container deployments, please contact us.

Question 14

What is Philter's license agreement?

Accepted Answer

Philter is open source software under the Apache License, Version 2.0. The full Philter license agreement is published on this site, and the licensing details are also available on the project repository.

	Philter	Microsoft Presidio	AWS Comprehend (PII)	Google Cloud DLP	Private AI
License	Apache 2.0 · open source	MIT · open source	Commercial (AWS)	Commercial (Google)	Commercial
Deployment	Self-hosted in your VPC	Self-hosted	Multi-tenant AWS service	Multi-tenant GCP service	SaaS API or container
Data residency	Stays in your account	Stays in your account	Sent to AWS regions	Sent to GCP regions	SaaS path leaves perimeter
Cloud portability	AWS, GCP, Azure, on-prem, air-gapped	BYO deployment	AWS only	GCP only	SaaS or BYOC
Marketplace billing	AWS · GCP · Azure	No	Native AWS billing	Native GCP billing	Vendor billing
Domain lenses	General, Healthcare, COVID-19	General (bring your own models)	General	General	Healthcare, finance
Format-preserving encryption	Yes	Basic masking only	No	Yes	Limited
LLM proxy mode	Yes · Philter AI Proxy	Custom integration	Not native	Not native	Yes
Differential privacy	Yes · Philter Diffuse	No	No	Limited	No
SDK languages	Java SDK, plus any language via the OpenAPI spec (+ Phileas in Java/Python/.NET)	Python	AWS SDKs	GCP SDKs	Python, REST

Redaction is much more than *****.

- Choose what to redact
- Choose how to redact
- Redact only certain instances
- Text and Office documents
- Custom entity types
- Domain-specific lenses
- Built-in audit trail

Case Studies

- Filtering PHI in Patient Text for a Healthcare IT Solutions Provider
- PII Filtering of Bankruptcy Documents for a Legal Firm
