A redaction policy is the configuration that tells Phileas (and therefore Philter, which embeds it) two things: which types of sensitive information to detect, and how to transform each type when it is found. A policy is a single JSON file. This guide explains how that file is structured so you can read, write, and tune one with confidence.
The schema is the single source of truth for the whole toolkit. The canonical definition is a JSON Schema file in the Phileas repository, and everything else tracks it: the policy library, the Redaction Policy Editor, and the PhiSQL query language all produce or validate against this same shape.
The smallest useful policy
A policy that detects email addresses and redacts them looks like this:
{
  "identifiers": {
    "emailAddress": {
      "emailAddressFilterStrategies": [
        {
          "strategy": "REDACT",
          "redactionFormat": "{{{REDACTED-%t}}}"
        }
      ]
    }
  }
}
When an email address is detected, it is replaced with {{{REDACTED-email-address}}}. The %t placeholder expands to the filter type. That is the entire mental model: an identifiers object naming the things to detect, and for each one, a list of strategies describing what to do.
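To make that mental model concrete, here is a minimal Python sketch that walks a parsed policy and lists the strategies attached to each identifier. It is an illustration only and not part of Phileas: the `summarize` helper is hypothetical, and it assumes that strategy arrays can be recognized by a key ending in `FilterStrategies`.

```python
import json

# The minimal policy from this section, embedded as a string for the sketch.
POLICY_JSON = """
{
  "identifiers": {
    "emailAddress": {
      "emailAddressFilterStrategies": [
        {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}
      ]
    }
  }
}
"""


def summarize(policy: dict) -> list[tuple[str, list[str]]]:
    """Return (identifier, [strategy names]) pairs for a parsed policy.

    Hypothetical helper: it only reads the JSON structure and performs
    no detection or redaction.
    """
    summary = []
    for name, settings in policy.get("identifiers", {}).items():
        strategies = []
        for key, value in settings.items():
            # Assumption: strategy arrays are the keys ending in "FilterStrategies".
            if key.endswith("FilterStrategies"):
                # This guide lists REDACT as the default strategy.
                strategies.extend(s.get("strategy", "REDACT") for s in value)
        summary.append((name, strategies))
    return summary


policy = json.loads(POLICY_JSON)
for identifier, strategies in summarize(policy):
    print(identifier, "->", strategies)
```

Run against the policy above, the loop reports one identifier, `emailAddress`, with a single `REDACT` strategy.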
The anatomy of a policy
The top level of a policy has a small number of optional sections. In practice, most policies use only identifiers.
| Field | What it does |
|---|---|
| identifiers | The core of the policy. Defines which entity types to detect and how to handle each. |
| config | Global settings: text splitting for large inputs, PDF rendering, post-filters, and analysis options. |
| ignored | Lists of terms to ignore globally, so a known-safe word is never redacted. |
| ignoredPatterns | Regex patterns to ignore globally. |
| crypto | AES encryption settings used by the CRYPTO_REPLACE strategy. |
| fpe | Format-preserving encryption settings used by the FPE_ENCRYPT_REPLACE strategy. |
| graphical | Settings for redacting images and PDFs. |
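A quick way to catch a misspelled section name before running full schema validation is to compare a policy's top-level keys against the sections in the table. The sketch below does only that. The `unknown_sections` helper is hypothetical, and the key set is taken from this guide's table, so the canonical schema may permit keys that are not listed here.

```python
# Top-level sections as listed in the table above. Assumption: the canonical
# JSON Schema is authoritative and may allow additional keys.
KNOWN_SECTIONS = {
    "identifiers",
    "config",
    "ignored",
    "ignoredPatterns",
    "crypto",
    "fpe",
    "graphical",
}


def unknown_sections(policy: dict) -> list[str]:
    """Return top-level keys that are not in KNOWN_SECTIONS, sorted by name."""
    return sorted(key for key in policy if key not in KNOWN_SECTIONS)


# "crpyto" is a deliberate typo to show what the check reports.
draft = {"identifiers": {}, "crpyto": {}}
print(unknown_sections(draft))
```

This is a convenience check, not a substitute for validating against the schema.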
Identifiers: what to detect
Each key inside identifiers is an entity type. Phileas ships with a long list, including ssn, creditCard, emailAddress, phoneNumber, date, age, ipAddress, firstName, surname, streetAddress, city, state, zipCode, and url, along with detection driven by AI models (pheyes) and custom dictionaries (dictionaries).
Every identifier shares a set of common controls:
- enabled to turn the filter on or off
- priority to order it relative to other filters
- ignored and ignoredPatterns to suppress specific matches for that one identifier
- windowSize to tune how much surrounding context the detector considers
Alongside those, each identifier carries its own strategies array, named after the identifier: ssnFilterStrategies, creditCardFilterStrategies, and so on. That array is where you say what happens when the entity is found.
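The naming convention can be expressed in a couple of lines of Python. This is a sketch under one assumption: that every identifier follows the `<identifier>FilterStrategies` pattern shown by the two examples above. Both helper names are hypothetical, and the pattern should be checked against the schema for identifiers not shown in this guide.

```python
def strategies_key(identifier: str) -> str:
    """Derive the strategies array name, e.g. "ssn" -> "ssnFilterStrategies".

    Assumption: the pattern seen for ssn and creditCard holds generally.
    """
    return f"{identifier}FilterStrategies"


def get_strategies(policy: dict, identifier: str) -> list[dict]:
    """Return the strategy list for one identifier, or [] if it is absent."""
    settings = policy.get("identifiers", {}).get(identifier, {})
    return settings.get(strategies_key(identifier), [])


policy = {
    "identifiers": {
        "ssn": {"ssnFilterStrategies": [{"strategy": "REDACT"}]},
    }
}
print(strategies_key("creditCard"))
print(get_strategies(policy, "ssn"))
```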
Filter strategies: how to redact
A filter strategy is a single rule for transforming a detected value. The strategy field selects the transformation:
| Strategy | Effect |
|---|---|
| REDACT | Replace the value with a format string (the default). |
| MASK | Replace characters with a mask character such as *. |
| STATIC_REPLACE | Replace with a fixed string you choose. |
| RANDOM_REPLACE | Replace with a realistic random value of the same type. |
| LAST_4 | Keep only the last four characters (common for card numbers). |
| TRUNCATE | Keep a leading portion and drop the rest. |
| ABBREVIATE | Shorten the value. |
| HASH_SHA256_REPLACE | Replace with a SHA-256 hash of the value. |
| CRYPTO_REPLACE | Replace with an AES-encrypted value (uses the policy crypto block). |
| FPE_ENCRYPT_REPLACE | Replace with a format-preserving encrypted value (uses the policy fpe block). |
Strategies also accept supporting fields. The most common are redactionFormat (the template for REDACT, which supports %t for the filter type, %v for the original value, and %l for a label), maskCharacter, staticReplacement, and replacementScope (DOCUMENT or CONTEXT, which sets whether a given value maps to the same replacement throughout a document or throughout a context).
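To see how a redactionFormat template turns into output text, the sketch below substitutes the three placeholders in a single pass. It illustrates the idea and is not Phileas's implementation: the `expand_format` function and its parameters are hypothetical, and the type string `email-address` is taken from the earlier example in this guide.

```python
import re


def expand_format(fmt: str, filter_type: str, value: str, label: str = "") -> str:
    """Expand %t, %v, and %l in a redaction format string.

    Hypothetical helper. A single regex pass is used so that a placeholder
    appearing inside the substituted value is not expanded a second time.
    """
    substitutions = {"%t": filter_type, "%v": value, "%l": label}
    return re.sub(r"%[tvl]", lambda match: substitutions[match.group(0)], fmt)


print(expand_format("{{{REDACTED-%t}}}", "email-address", "ann@example.com"))
print(expand_format("[%l: %t]", "ssn", "123-45-6789", label="PII"))
```

The first call reproduces the `{{{REDACTED-email-address}}}` replacement described for the minimal policy.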
Because each identifier holds an array of strategies, you can apply different handling under different conditions. The first matching strategy wins, so a more specific conditional strategy can sit ahead of a general fallback.
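The first-match rule can be sketched in Python as an ordered scan. In this illustration each strategy is paired with a plain Python predicate that stands in for a policy condition. That pairing is an assumption made for the sketch: real conditions are written inside the policy JSON, and their syntax is not covered in this guide. The `first_matching` helper is hypothetical.

```python
from typing import Callable, Optional

# A rule pairs a predicate (a stand-in for a policy condition) with a
# strategy object. This pairing is illustrative, not the policy format.
Rule = tuple[Callable[[str], bool], dict]


def first_matching(rules: list[Rule], value: str) -> Optional[dict]:
    """Return the strategy of the first rule whose predicate accepts value."""
    for applies, strategy in rules:
        if applies(value):
            return strategy
    return None


rules: list[Rule] = [
    # Specific rule first: internal addresses get a fixed replacement.
    (
        lambda v: v.endswith("@example.com"),
        {"strategy": "STATIC_REPLACE", "staticReplacement": "internal-user"},
    ),
    # General fallback last: everything else is redacted.
    (
        lambda v: True,
        {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"},
    ),
]

print(first_matching(rules, "ann@example.com")["strategy"])
print(first_matching(rules, "bob@elsewhere.org")["strategy"])
```

Swapping the two rules would make the fallback match everything, which is why the specific strategy has to come first.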
A worked example
Here is a small but realistic policy. It keeps the last four digits of credit cards, masks phone numbers, and redacts SSNs with a labeled format:
{
  "identifiers": {
    "creditCard": {
      "creditCardFilterStrategies": [
        { "strategy": "LAST_4" }
      ]
    },
    "phoneNumber": {
      "phoneNumberFilterStrategies": [
        { "strategy": "MASK", "maskCharacter": "#" }
      ]
    },
    "ssn": {
      "ssnFilterStrategies": [
        { "strategy": "REDACT", "redactionFormat": "[SSN REMOVED]" }
      ]
    }
  }
}
The same policy expressed in PhiSQL is a few readable lines that compile to exactly this JSON:
POLICY example;
REDACT CREDIT_CARD WITH LAST_4;
REDACT PHONE_NUMBER WITH MASK(character='#');
REDACT SSN WITH REDACT(format='[SSN REMOVED]');
How to get started
- Start from an existing policy. The policy library has ready-to-use files for HIPAA Safe Harbor, PCI DSS scope reduction, and more. Download one and adjust it.
- Or author it readably. Write PhiSQL and compile it, or build the policy visually in the Redaction Policy Editor.
- Validate against the schema. Point your editor or CI at the canonical redaction-policy-schema.json so malformed policies are caught before they ship.
- Apply it. Save the file and reference it by name from the Philter redaction API, or load it directly in an embedded Phileas instance.
For the exhaustive field-by-field reference and every supported filter strategy option, see the Phileas filter policy documentation.