This guide takes you from an empty file to a working redaction policy in seven short steps. By the end you will have detected an entity, applied the policy with Philter, changed how a value is redacted, added more identifiers, and silenced a false positive. For the field-by-field reference behind every choice here, keep the policy schema guide open alongside this one.
What you will build
A policy that finds email addresses, phone numbers, and Social Security numbers in text and redacts each one. We will start with a single identifier and grow it.
You will need a place to run the policy. The REST examples below assume a Philter instance on localhost:8080. If you are embedding Phileas as a library, the same JSON loads directly in your code.
Step 1: start with an empty policy
A policy is a JSON object. The only part you almost always need is identifiers, the map of things to detect. Start here:
{
  "identifiers": {}
}
This is valid, but it detects nothing. Let us give it something to find.
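Before adding anything, it can help to confirm that the file parses and to see which entities it will look for. The sketch below is a plain-Python sanity check, not part of Philter or Phileas; the function name identifier_keys is made up for this guide.

```python
import json

# The empty policy from this step, as a string.
EMPTY_POLICY = '{"identifiers": {}}'


def identifier_keys(policy_text: str) -> list[str]:
    """Parse a policy and return the entity keys listed under identifiers."""
    policy = json.loads(policy_text)  # raises ValueError on malformed JSON
    identifiers = policy.get("identifiers", {})
    if not isinstance(identifiers, dict):
        raise TypeError("identifiers must be a JSON object")
    return sorted(identifiers)


print(identifier_keys(EMPTY_POLICY))  # prints []
```

An empty list confirms what the text says: the policy is well formed and detects nothing yet.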
Step 2: detect your first entity
Add an emailAddress identifier with one filter strategy. The strategy says what to do when an email address is detected:
{
  "identifiers": {
    "emailAddress": {
      "emailAddressFilterStrategies": [
        {
          "strategy": "REDACT",
          "redactionFormat": "{{{REDACTED-%t}}}"
        }
      ]
    }
  }
}
The pattern is always the same: an entity key (emailAddress), and inside it a strategies array named after the entity (emailAddressFilterStrategies). The REDACT strategy replaces the value with redactionFormat, where %t expands to the filter type. An email address becomes {{{REDACTED-email-address}}}.
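To make the %t substitution concrete, here is a small Python model of what REDACT does to one detected value. It only illustrates the behavior described above and is not Philter's implementation; the function names are hypothetical, and only the %t placeholder is modeled.

```python
def expand_redaction_format(redaction_format: str, filter_type: str) -> str:
    """Replace the %t placeholder with the filter type."""
    return redaction_format.replace("%t", filter_type)


def redact(text: str, value: str, redaction_format: str, filter_type: str) -> str:
    """Swap one detected value for its expanded redaction format."""
    replacement = expand_redaction_format(redaction_format, filter_type)
    return text.replace(value, replacement)


print(
    redact(
        "Reach me at jane@example.com.",
        "jane@example.com",
        "{{{REDACTED-%t}}}",
        "email-address",
    )
)
# prints: Reach me at {{{REDACTED-email-address}}}.
```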
Step 3: apply the policy
Save the file as my-first-policy.json, then upload it to Philter and redact some text:
# 1. Save the policy to your Philter instance
curl -X POST http://localhost:8080/api/policies \
  -H "Content-Type: application/json" \
  --data @my-first-policy.json
# 2. Redact text using the policy, referenced by name
curl "http://localhost:8080/api/filter?p=my-first-policy" \
  --data "Reach me at jane@example.com." \
  -H "Content-Type: text/plain"
The response replaces the email address:
Reach me at {{{REDACTED-email-address}}}.
You now have a working redaction policy. Everything from here is refinement.
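If you would rather script these two calls than type them, the same requests can be built with Python's standard library. This is a minimal sketch that assumes the endpoints and the localhost:8080 instance from the curl examples; the helper names are invented for this guide, and nothing is sent until you call send.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080"  # assumption: the instance used by the curl examples


def build_upload_request(policy: dict) -> urllib.request.Request:
    """Build the POST that saves a policy (mirrors the first curl call)."""
    return urllib.request.Request(
        f"{BASE_URL}/api/policies",
        data=json.dumps(policy).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_filter_request(policy_name: str, text: str) -> urllib.request.Request:
    """Build the POST that redacts text (mirrors the second curl call)."""
    query = urlencode({"p": policy_name})
    return urllib.request.Request(
        f"{BASE_URL}/api/filter?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

With Philter running, send(build_upload_request(policy)) followed by send(build_filter_request("my-first-policy", "Reach me at jane@example.com.")) should return the redacted sentence shown above.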
Step 4: change how a value is redacted
REDACT is one of several strategies. Suppose you would rather mask the address than replace it with a label. Switch the strategy to MASK:
"emailAddressFilterStrategies": [
  {
    "strategy": "MASK",
    "maskCharacter": "*"
  }
]
Each strategy has its own supporting fields. The filter strategies section of the schema guide lists them all, including format-preserving encryption and hashing for when you need the output to stay usable downstream.
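When you manage policies from code, swapping a strategy comes down to replacing one list in the policy document. The sketch below assumes the policy is held as a Python dict and relies only on the naming pattern from Step 2; with_strategy is a hypothetical helper, not a Phileas API.

```python
import copy


def with_strategy(policy: dict, entity_key: str, strategy: dict) -> dict:
    """Return a copy of the policy where entity_key uses only this strategy."""
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    identifiers = updated.setdefault("identifiers", {})
    entity = identifiers.setdefault(entity_key, {})
    # The strategies array is named after the entity, e.g. emailAddressFilterStrategies.
    entity[f"{entity_key}FilterStrategies"] = [strategy]
    return updated


policy = {
    "identifiers": {
        "emailAddress": {
            "emailAddressFilterStrategies": [
                {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}
            ]
        }
    }
}

masked = with_strategy(policy, "emailAddress", {"strategy": "MASK", "maskCharacter": "*"})
print(masked["identifiers"]["emailAddress"]["emailAddressFilterStrategies"])
# prints [{'strategy': 'MASK', 'maskCharacter': '*'}]
```

Working on a copy keeps the original policy available for comparison or rollback.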
Step 5: add more identifiers
Detecting more entity types means adding more keys to identifiers. Here is the policy with phone numbers and SSNs, each with a strategy that suits it. We keep the last four digits of phone numbers and fully redact SSNs:
{
  "identifiers": {
    "emailAddress": {
      "emailAddressFilterStrategies": [
        { "strategy": "MASK", "maskCharacter": "*" }
      ]
    },
    "phoneNumber": {
      "phoneNumberFilterStrategies": [
        { "strategy": "LAST_4" }
      ]
    },
    "ssn": {
      "ssnFilterStrategies": [
        { "strategy": "REDACT", "redactionFormat": "[SSN REMOVED]" }
      ]
    }
  }
}
Different entities can use different strategies in the same policy. That is the whole point: you decide, per type, how aggressive the handling should be.
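As a policy grows, a misspelled strategies key is an easy mistake to make. The following sketch checks each identifier against the naming pattern from Step 2. It is a hypothetical lint written for this guide, not a Philter feature, and it assumes every identifier should carry at least one strategy with a "strategy" field.

```python
def check_identifiers(policy: dict) -> list[str]:
    """Return a list of naming problems found under identifiers."""
    problems = []
    for entity_key, config in policy.get("identifiers", {}).items():
        expected = f"{entity_key}FilterStrategies"
        strategies = config.get(expected)
        if not strategies:
            problems.append(f"{entity_key}: missing or empty {expected}")
            continue
        for index, strategy in enumerate(strategies):
            if "strategy" not in strategy:
                problems.append(f"{entity_key}: entry {index} has no strategy field")
    return problems


policy = {
    "identifiers": {
        "emailAddress": {
            "emailAddressFilterStrategies": [{"strategy": "MASK", "maskCharacter": "*"}]
        },
        "phoneNumber": {"phoneNumberFilterStrategies": [{"strategy": "LAST_4"}]},
        "ssn": {
            "ssnFilterStrategies": [
                {"strategy": "REDACT", "redactionFormat": "[SSN REMOVED]"}
            ]
        },
    }
}

print(check_identifiers(policy))  # prints []
```

An empty list means every identifier follows the pattern; a typo such as ssnStrategies would be reported by name.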
Step 6: silence a false positive
Detection is never perfect. Suppose your text frequently mentions a product code that looks like an SSN but is not. Tell the policy to ignore it. The ignored list applies across the whole policy:
{
  "ignored": [
    {
      "terms": ["TEST-00-0000"]
    }
  ],
  "identifiers": {
    "ssn": {
      "ssnFilterStrategies": [
        { "strategy": "REDACT", "redactionFormat": "[SSN REMOVED]" }
      ]
    }
  }
}
Now that term is never redacted, even though it matches the SSN detector. Use ignoredPatterns instead of ignored when you need to match a regex rather than a fixed list of terms.
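One way to picture the ignored list is as a filter that runs over detected values before any strategy is applied. The sketch below models that idea in Python. It is not Philter's matching logic: it assumes exact, case-sensitive comparison, and the sample values are made up.

```python
def drop_ignored(candidates: list[str], policy: dict) -> list[str]:
    """Remove detected values that appear in any of the policy's ignored terms."""
    ignored_terms = set()
    for entry in policy.get("ignored", []):
        ignored_terms.update(entry.get("terms", []))
    # Assumption: exact, case-sensitive match against each term.
    return [value for value in candidates if value not in ignored_terms]


policy = {"ignored": [{"terms": ["TEST-00-0000"]}]}
detected = ["TEST-00-0000", "123-45-6789"]  # hypothetical detector output

print(drop_ignored(detected, policy))  # prints ['123-45-6789']
```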
Step 7: keep refining
A policy is something you tune over time as you see real text. Two habits help:
- Measure it. Philter Scope scores a policy on precision and recall against gold-standard data, so you can see exactly which detector needs work rather than eyeballing a few examples.
- Review it like code. A policy is a plain file. Keep it in version control, and review changes the way you review any other change.
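If precision and recall are new to you, the sketch below shows how the two scores are computed from a set of predicted spans and a set of gold-standard spans. It is a generic illustration, not how Philter Scope is implemented; the span tuples of (start, end, type) and the convention of returning 0.0 for an empty set are assumptions made here.

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Compute precision and recall for exact span matches."""
    true_positives = len(predicted & gold)
    # Precision: how much of what was redacted should have been redacted.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: how much of what should have been redacted actually was.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


gold = {(12, 28, "email-address"), (40, 51, "ssn")}
predicted = {(12, 28, "email-address"), (60, 72, "phone-number")}

print(precision_recall(predicted, gold))  # prints (0.5, 0.5)
```

Low precision points to false positives, which Step 6 addresses; low recall points to values a detector is missing.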
Where to go next
- The policy schema guide is the full reference for every field and strategy.
- PhiSQL lets you write the same policies in a few readable lines that compile to this JSON.
- The policy library has production-ready HIPAA, PCI DSS, and other policies to adapt instead of starting from scratch.