The Phileas redaction policy schema

A redaction policy is the configuration that tells Phileas (and therefore Philter, which embeds it) two things: which types of sensitive information to detect, and how to transform each type when it is found. A policy is a single JSON file. This guide explains how that file is structured so you can read, write, and tune one with confidence.

The schema is the single source of truth for the whole toolkit. The canonical definition is a JSON Schema file in the Phileas repository, and everything else tracks it: the policy library, the Redaction Policy Editor, and the PhiSQL query language all produce or validate against this same shape.

The smallest useful policy

A policy that detects email addresses and redacts them looks like this:

{
  "identifiers": {
    "emailAddress": {
      "emailAddressFilterStrategies": [
        {
          "strategy": "REDACT",
          "redactionFormat": "{{{REDACTED-%t}}}"
        }
      ]
    }
  }
}

When an email address is detected, it is replaced with {{{REDACTED-email-address}}}. The %t placeholder expands to the filter type. That is the entire mental model: an identifiers object naming the things to detect, and for each one, a list of strategies describing what to do.
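
To make that mental model concrete, here is a minimal Python sketch. It is not how Phileas is implemented. It assumes a deliberately simplistic email regex and a hypothetical `TYPE_LABELS` table for the label that %t expands to; only the policy shape and the resulting replacement text come from the example above.

```python
import json
import re

# The smallest useful policy from the text above.
POLICY = json.loads(r"""
{
  "identifiers": {
    "emailAddress": {
      "emailAddressFilterStrategies": [
        {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}
      ]
    }
  }
}
""")

# Assumptions for illustration only: a toy detector and a type-label table.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
TYPE_LABELS = {"emailAddress": "email-address"}


def redact_emails(text: str, policy: dict) -> str:
    """Replace each detected email using the first emailAddress strategy."""
    identifier = policy["identifiers"].get("emailAddress")
    if identifier is None:
        return text  # the policy does not ask for email detection
    strategy = identifier["emailAddressFilterStrategies"][0]
    replacement = strategy["redactionFormat"].replace(
        "%t", TYPE_LABELS["emailAddress"]
    )
    # A function replacement avoids re.sub interpreting backslashes.
    return EMAIL_RE.sub(lambda _match: replacement, text)


print(redact_emails("Contact jane@example.com today.", POLICY))
# prints: Contact {{{REDACTED-email-address}}} today.
```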

The anatomy of a policy

The top level of a policy has a small number of optional sections. In practice most policies only use identifiers.

Field	What it does
`identifiers`	The core of the policy. Defines which entity types to detect and how to handle each.
`config`	Global settings: text splitting for large inputs, PDF rendering, post-filters, and analysis options.
`ignored`	Lists of terms to ignore globally, so a known-safe word is never redacted.
`ignoredPatterns`	Regex patterns to ignore globally.
`crypto`	AES encryption settings used by the `CRYPTO_REPLACE` strategy.
`fpe`	Format-preserving encryption settings used by the `FPE_ENCRYPT_REPLACE` strategy.
`graphical`	Settings for redacting images and PDFs.

Identifiers: what to detect

Each key inside identifiers is an entity type. Phileas ships with a long list, including ssn, creditCard, emailAddress, phoneNumber, date, age, ipAddress, firstName, surname, streetAddress, city, state, zipCode, and url, along with detection driven by AI models (pheyes) and custom dictionaries (dictionaries).

Every identifier shares a set of common controls:

enabled to turn the filter on or off
priority to order it relative to other filters
ignored and ignoredPatterns to suppress specific matches for that one identifier
windowSize to tune how much surrounding context the detector considers
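
As a rough illustration of where the controls listed above sit, the following Python snippet builds one identifier and prints it as policy JSON. The value types shown (a boolean, a list of strings, an integer) are assumptions made for this sketch, and priority and ignoredPatterns are left out because their value shapes are not described here; the canonical schema is the authority on types and defaults.

```python
import json

# Assumption: the value types below are guesses for illustration only.
ssn_identifier = {
    "enabled": True,             # turn the filter on or off
    "ignored": ["000-00-0000"],  # matches to leave untouched for this identifier
    "windowSize": 5,             # how much surrounding context the detector considers
    "ssnFilterStrategies": [
        {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}
    ],
}

policy = {"identifiers": {"ssn": ssn_identifier}}
print(json.dumps(policy, indent=2))
```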

Alongside those, each identifier carries its own strategies array, named after the identifier: ssnFilterStrategies, creditCardFilterStrategies, and so on. That array is where you say what happens when the entity is found.

Filter strategies: how to redact

A filter strategy is a single rule for transforming a detected value. The strategy field selects the transformation:

Strategy	Effect
`REDACT`	Replace the value with a format string (the default).
`MASK`	Replace characters with a mask character such as `*`.
`STATIC_REPLACE`	Replace with a fixed string you choose.
`RANDOM_REPLACE`	Replace with a realistic random value of the same type.
`LAST_4`	Keep only the last four characters (common for card numbers).
`TRUNCATE`	Keep a leading portion and drop the rest.
`ABBREVIATE`	Shorten the value.
`HASH_SHA256_REPLACE`	Replace with a SHA-256 hash of the value.
`CRYPTO_REPLACE`	Replace with an AES-encrypted value (uses the policy `crypto` block).
`FPE_ENCRYPT_REPLACE`	Replace with a format-preserving encrypted value (uses the policy `fpe` block).

Strategies also accept supporting fields. The most common are redactionFormat (the template for REDACT, which supports %t for the type, %v for the original value, and %l for a label), maskCharacter, staticReplacement, and replacementScope. replacementScope takes DOCUMENT or CONTEXT and determines whether a given original value maps to the same replacement throughout a single document or throughout a whole context.
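
The effect of a few of these strategies and supporting fields can be sketched in Python. This is an illustrative approximation, not the Phileas implementation: the function name `apply_strategy` is hypothetical, and it assumes that MASK replaces every character, that `*` is the fallback mask character, and that the hash is an unsalted hex digest.

```python
import hashlib
import re


def apply_strategy(strategy: dict, value: str, filter_type: str, label: str = "") -> str:
    """Toy transformation covering four strategies from the table above."""
    kind = strategy.get("strategy", "REDACT")  # REDACT is the default
    if kind == "REDACT":
        fields = {"%t": filter_type, "%v": value, "%l": label}
        # Expand placeholders in a single pass so inserted text is not re-expanded.
        return re.sub(r"%[tvl]", lambda m: fields[m.group(0)], strategy["redactionFormat"])
    if kind == "MASK":
        # Assumption: every character is masked and the length is preserved.
        return strategy.get("maskCharacter", "*") * len(value)
    if kind == "STATIC_REPLACE":
        return strategy["staticReplacement"]
    if kind == "HASH_SHA256_REPLACE":
        # Assumption: unsalted SHA-256, hex encoded.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    raise NotImplementedError(f"strategy {kind} is not covered by this sketch")


print(apply_strategy({"strategy": "REDACT", "redactionFormat": "[%t:%l]"},
                     "123-45-6789", "ssn", "patient"))   # prints: [ssn:patient]
print(apply_strategy({"strategy": "MASK", "maskCharacter": "#"},
                     "555-1234", "phone-number"))        # prints: ########
```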

Because each identifier holds an array of strategies, you can apply different handling under different conditions. The first matching strategy wins, so a more specific conditional strategy can sit ahead of a general fallback.
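
The first-match-wins rule can be sketched as a simple loop. In this Python sketch each rule is a hypothetical (condition, strategy) pair, where the condition is a plain Python predicate or None for an unconditional fallback. The real schema expresses conditions inside the JSON strategy object, and that syntax is not reproduced here.

```python
from typing import Callable, Optional


def select_strategy(rules, value: str) -> Optional[dict]:
    """Return the strategy of the first rule whose condition accepts the value."""
    for condition, strategy in rules:
        if condition is None or condition(value):
            return strategy
    return None  # no rule matched


# Hypothetical condition: keep the last four digits only for cards starting with 4.
starts_with_4: Callable[[str], bool] = lambda v: v.startswith("4")

rules = [
    (starts_with_4, {"strategy": "LAST_4"}),           # specific rule first
    (None, {"strategy": "MASK", "maskCharacter": "*"}),  # general fallback last
]

print(select_strategy(rules, "4111111111111111")["strategy"])  # prints: LAST_4
print(select_strategy(rules, "5500000000000004")["strategy"])  # prints: MASK
```

Reversing the two rules would make the fallback match every value, so the conditional rule would never be reached. That is why the more specific strategy goes first.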

A worked example

Here is a small but realistic policy. It keeps the last four digits of credit cards, masks phone numbers, and redacts SSNs with a labeled format:

{
  "identifiers": {
    "creditCard": {
      "creditCardFilterStrategies": [
        { "strategy": "LAST_4" }
      ]
    },
    "phoneNumber": {
      "phoneNumberFilterStrategies": [
        { "strategy": "MASK", "maskCharacter": "#" }
      ]
    },
    "ssn": {
      "ssnFilterStrategies": [
        { "strategy": "REDACT", "redactionFormat": "[SSN REMOVED]" }
      ]
    }
  }
}

The same policy expressed in PhiSQL is a few readable lines that compile to exactly this JSON:

POLICY example;

REDACT CREDIT_CARD WITH LAST_4;
REDACT PHONE_NUMBER WITH MASK(character='#');
REDACT SSN WITH REDACT(format='[SSN REMOVED]');

How to get started

Start from an existing policy. The policy library has ready-to-use files for HIPAA Safe Harbor, PCI DSS scope reduction, and more. Download one and adjust it.
Or author it readably. Write PhiSQL and compile it, or build the policy visually in the Redaction Policy Editor.
Validate against the schema. Point your editor or CI at the canonical redaction-policy-schema.json so malformed policies are caught before they ship.
Apply it. Save the file and reference it by name from the Philter redaction API, or load it directly in an embedded Phileas instance.
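
Before running full schema validation, a quick pre-check can catch common typos. The Python sketch below relies only on two things stated in this guide: the strategies array is named after its identifier, and the strategy names come from the table above. It complements validation against redaction-policy-schema.json and does not replace it. The function name `lint_policy` is hypothetical, and identifiers such as pheyes and dictionaries may be shaped differently from what this check assumes.

```python
KNOWN_STRATEGIES = {
    "REDACT", "MASK", "STATIC_REPLACE", "RANDOM_REPLACE", "LAST_4",
    "TRUNCATE", "ABBREVIATE", "HASH_SHA256_REPLACE", "CRYPTO_REPLACE",
    "FPE_ENCRYPT_REPLACE",
}


def lint_policy(policy: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for name, identifier in policy.get("identifiers", {}).items():
        if not isinstance(identifier, dict):
            continue  # shape not covered by this sketch
        expected = f"{name}FilterStrategies"
        # Flag a strategies array that is named after a different identifier.
        for key in identifier:
            if key.endswith("FilterStrategies") and key != expected:
                problems.append(f"{name}: found {key}, expected {expected}")
        # Flag strategy names that are not in the table above.
        for index, strategy in enumerate(identifier.get(expected, [])):
            kind = strategy.get("strategy", "REDACT")
            if kind not in KNOWN_STRATEGIES:
                problems.append(f"{name}[{index}]: unknown strategy {kind}")
    return problems


good = {"identifiers": {"ssn": {"ssnFilterStrategies": [{"strategy": "REDACT"}]}}}
bad = {"identifiers": {"ssn": {"creditCardFilterStrategies": [{"strategy": "REDCAT"}]}}}
print(lint_policy(good))  # prints: []
print(lint_policy(bad))
```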

For the exhaustive field-by-field reference and every supported filter strategy option, see the Phileas filter policy documentation.

Frequently asked questions

Where does the canonical schema live?

In the Phileas repository at policy-schema/redaction-policy-schema.json (https://github.com/philterd/phileas/blob/main/policy-schema/redaction-policy-schema.json). That JSON Schema file is the single source of truth. Philter, Phileas, the policy library, and PhiSQL all track it.

Do I have to write policies by hand?

No. You can hand-write JSON, generate it from PhiSQL (a readable query language that compiles to this schema), build it visually with the Redaction Policy Editor, or start from a ready-made file in the policy library. Every path produces the same JSON.

What happens if a policy does not match the schema?

Philter and Phileas validate policies on load and reject malformed ones. The policy library also validates every contributed policy against a vendored copy of the schema in CI, so drift is caught before it ships.

How do I control which strategy applies to which entity?

Each identifier carries its own array of filter strategies. The first strategy whose condition matches is applied, so you can, for example, mask most credit cards but keep the last four digits when a condition holds. See the filter strategies section above.