Phileas — The Open Source PII and PHI redaction engine

I am delighted to announce that the project providing the core PII and PHI redaction capabilities is now open source. Introducing Phileas, the PII and PHI redaction engine — available under the Apache 2.0 license on GitHub.
Both Philter and Phirestream use Phileas to identify and redact sensitive information like PII and PHI. Phileas does all of the heavy lifting, while Philter and Phirestream make its functionality user-friendly and provide the NLP models.
What Phileas is
Phileas is a Java library: the core engine that finds and redacts sensitive information in text. It exposes a clean API for:
- Detection — identifying entities like SSNs, credit card numbers, names, dates, addresses, phone numbers, and more.
- Redaction — transforming detected entities via per-entity strategies: redact with a placeholder, mask, encrypt (including format-preserving), replace with synthetic values, drop, or hash.
- Policy-driven configuration — everything is controlled by a JSON policy file, so behavior changes don't require code changes.
The detection layer combines two complementary approaches: deterministic pattern matching for structured data (SSNs, credit cards, phone numbers) and contextual NER for unstructured text (names, locations, organizations). Each policy decides which detectors to enable and how to handle each entity type.
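To make the deterministic half of that split concrete, here is a minimal sketch in plain Java, independent of Phileas. The class name `StrategySketch`, the simplified SSN regex, and the two strategy functions are all assumptions made for illustration; Phileas's own detectors and strategy implementations are not shown in this post and are likely more thorough.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrategySketch {

    // Simplified SSN shape (3-2-4 digits). An assumption for illustration,
    // not the pattern Phileas uses.
    private static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");

    // Strategy 1: replace the whole match with a fixed placeholder.
    public static final Function<String, String> REDACT = match -> "***********";

    // Strategy 2: mask every digit except the last four.
    public static final Function<String, String> MASK = match -> {
        int keepFrom = match.length() - 4;
        return match.substring(0, keepFrom).replaceAll("\\d", "*") + match.substring(keepFrom);
    };

    /** Finds SSN-shaped spans and rewrites each one with the given strategy. */
    public static String apply(String text, Function<String, String> strategy) {
        return SSN.matcher(text).replaceAll(
                m -> Matcher.quoteReplacement(strategy.apply(m.group())));
    }

    public static void main(String[] args) {
        String input = "Patient John Doe SSN 123-45-6789.";
        System.out.println(apply(input, REDACT)); // Patient John Doe SSN ***********.
        System.out.println(apply(input, MASK));   // Patient John Doe SSN ***-**-6789.
    }
}
```

The name is left untouched in both outputs because a regex has no way to recognize it; that is the part of the job the contextual NER approach exists for.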
A first example
The simplest Phileas use case looks like this:
// build.gradle
implementation 'io.philterd:phileas:2.12.0'

import io.philterd.phileas.model.policy.Policy;
import io.philterd.phileas.services.PhileasFilterService;

public class Example {
    public static void main(String[] args) throws Exception {
        Policy policy = Policy.fromFile("policy.json");
        PhileasFilterService phileas = new PhileasFilterService(policy);
        String input = "Patient John Doe SSN 123-45-6789.";
        String redacted = phileas
                .filter("default", "default", "doc-001", input)
                .getFilteredText();
        System.out.println(redacted);
        // → "Patient *** SSN ***********."
    }
}

The policy file controls everything: which entity types are active, with what confidence threshold, and what replacement strategy is applied to each. A minimal policy covering SSNs and person names:
{
  "name": "default",
  "identifiers": {
    "ssn": {
      "ssnFilterStrategies": [
        { "strategy": "REDACT", "redactionFormat": "***********" }
      ]
    },
    "person": {
      "personFilterStrategies": [
        { "strategy": "REDACT", "redactionFormat": "***" }
      ]
    }
  }
}

The full policy schema covers dictionaries, custom identifier patterns (with regex), conditional logic (confidence > 75), language settings, and more. It is documented in the Phileas User's Guide.
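As an illustration of what a confidence condition might do at redaction time, the sketch below filters hypothetical detections by a threshold before replacing them. `Span`, `redactAbove`, and the 0 to 100 confidence scale are assumptions for this example, not Phileas types or the policy syntax; the User's Guide is the reference for how conditions are actually written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ConfidenceSketch {

    /** Hypothetical detection result: character offsets plus a confidence score. */
    public static final class Span {
        final int start;        // inclusive
        final int end;          // exclusive
        final double confidence;

        public Span(int start, int end, double confidence) {
            this.start = start;
            this.end = end;
            this.confidence = confidence;
        }
    }

    /** Redacts only the spans whose confidence is strictly above the threshold. */
    public static String redactAbove(String text, List<Span> spans,
                                     double threshold, String placeholder) {
        List<Span> kept = new ArrayList<>();
        for (Span span : spans) {
            if (span.confidence > threshold) {
                kept.add(span);
            }
        }
        // Replace from the end of the text backwards so earlier offsets stay valid.
        // Overlapping spans are not handled in this sketch.
        kept.sort(Comparator.comparingInt((Span s) -> s.start).reversed());
        StringBuilder out = new StringBuilder(text);
        for (Span span : kept) {
            out.replace(span.start, span.end, placeholder);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Patient John Doe met Jordan.";
        List<Span> spans = Arrays.asList(
                new Span(8, 16, 92.0),   // "John Doe", high confidence
                new Span(21, 27, 60.0)); // "Jordan", low confidence
        System.out.println(redactAbove(text, spans, 75.0, "***"));
        // Patient *** met Jordan.
    }
}
```

A threshold like this trades recall for precision: the low-confidence "Jordan" survives, which is right if it is a place name and a leak if it is a person.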
How Phileas relates to Philter
The simplest way to describe the relationship: Phileas is the library; Philter is the API.
If you're already on the JVM and want fine-grained, in-process control over redaction, embed Phileas as a dependency in your Maven or Gradle build. No service to deploy, no network hop. Phileas lives inside your JVM and works on the text already in memory.
If you want a turnkey HTTP service that any language can call (Python, Go, Node, .NET), deploy Philter. Philter wraps Phileas behind a REST API, ships with the NLP models pre-loaded, includes SDK clients for Java/.NET/Go, and is available on the AWS, GCP, and Azure marketplaces for one-click deployment.
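For a JVM application that prefers the service over the embedded library, the call could look roughly like the following, using the JDK's built-in `java.net.http` client (Java 11 or later). The host, port, and `/api/filter` path are taken from the curl example in this post; the class and method names are illustrative, and Philter's own SDK clients are probably the better choice in practice.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PhilterHttpSketch {

    /** Builds a plain-text POST to the filter endpoint of a Philter instance. */
    public static HttpRequest buildFilterRequest(String baseUrl, String text) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/filter"))
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString(text))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumes a Philter instance listening locally, as in the curl example.
        String baseUrl = "http://localhost:8080";
        HttpRequest request = buildFilterRequest(baseUrl, "Patient John Doe SSN 123-45-6789.");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            // The response body is the filtered text.
            System.out.println(response.body());
        } catch (IOException e) {
            System.err.println("Could not reach Philter at " + baseUrl + ": " + e.getMessage());
        }
    }
}
```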
Either way, you're using the same engine with the same policy format. The choice is about deployment shape, not capability:
// In-process via Phileas
String redacted = phileas.filter(...).getFilteredText();

# Over HTTP via Philter (any language)
curl -d "Patient John Doe SSN 123-45-6789." \
  -H "Content-type: text/plain" \
  http://localhost:8080/api/filter
# → "Patient *** SSN ***********."

Why we open-sourced it
Two reasons.
First, we believe in open source. Privacy-critical software shouldn't be a black box. The engine that decides what counts as PII in your data is exactly the kind of code your security team should be able to read, audit, and verify.
Second, we want a more open relationship with users. With Phileas on GitHub, anyone can submit issues, propose features, and contribute code. We're migrating our tasks from a private Jira to GitHub issues so the roadmap is visible too.
For five years before this announcement, Phileas was an internal project used by Philter (and Phirestream). It was already battle-tested in production at healthcare, legal, and financial customers. Open-sourcing it doesn't change the engine — it changes who can see how it works.
What hasn't changed
Philter remains on the AWS, Azure, and Google Cloud marketplaces. We continue to provide commercial support, consulting, and the production-ready packaging that turns the library into a deployable service. New versions of Philter will be built on the open source Phileas project.
If you're looking for a self-managed, one-click-deployable redaction API: use Philter. If you're looking for an embeddable Java library or want to read the source: grab Phileas on GitHub.

See the Phileas product page for an overview of features, supported languages (Java, Python, Go), and the rest of the Philterd open source toolkit that builds on top of it.