Reproducible benchmarks
Same test set, same metrics, every run. Two engineers comparing two policies see the same numbers — no more debates about whether the new rules are actually better.
Score your redaction policies
Philter Scope is a standalone audit tool that scores redaction policies against gold-standard test data. Stop guessing whether a policy change made the pipeline better — measure it, version it, and fail the build when it regresses.
Annotate a representative sample of your real text once. Philter Scope compares any policy output against that ground truth and reports precision, recall, and F1 per entity type.
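As a rough illustration of what per-entity-type scoring involves, here is a minimal Python sketch. It is not Philter Scope's actual code or API: the function name `score_by_entity_type`, the `(start, end, entity_type)` span format, and the exact-match rule are all assumptions made for the example.

```python
from collections import defaultdict


def score_by_entity_type(gold, predicted):
    """Compare predicted spans to gold spans and score each entity type.

    Both arguments are iterables of (start, end, entity_type) tuples.
    Assumption for this sketch: a prediction counts as correct only when
    its offsets and type match a gold span exactly.
    """
    gold_set, pred_set = set(gold), set(predicted)
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})

    # Every predicted span is either a true positive or a false positive.
    for span in pred_set:
        key = "tp" if span in gold_set else "fp"
        counts[span[2]][key] += 1

    # Gold spans that were never predicted are false negatives.
    for span in gold_set - pred_set:
        counts[span[2]]["fn"] += 1

    report = {}
    for entity_type, c in counts.items():
        found = c["tp"] + c["fp"]
        expected = c["tp"] + c["fn"]
        precision = c["tp"] / found if found else 0.0
        recall = c["tp"] / expected if expected else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[entity_type] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Hypothetical annotations: character offsets plus an entity label.
gold = [(0, 10, "NAME"), (20, 31, "SSN"), (40, 52, "NAME")]
predicted = [(0, 10, "NAME"), (20, 31, "SSN"), (60, 70, "SSN")]

report = score_by_entity_type(gold, predicted)
for entity_type, metrics in sorted(report.items()):
    print(entity_type, metrics)
```

In this made-up sample, the policy misses one NAME span (recall drops to 0.5) and redacts one spurious SSN span (precision drops to 0.5), which is the kind of difference a single aggregate number can blur.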
Aggregate scores hide problems. Philter Scope reports per-entity-type metrics so you can see exactly which detectors are weakest and where the next tuning pass should land.
Run it as a step in your CI pipeline. Fail the build when precision or recall falls below a threshold, and catch policy regressions before they reach production.
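A threshold gate of this kind could look something like the Python sketch below. The report layout, the function names, and the threshold values are hypothetical and chosen only for illustration; Philter Scope's real report format and command-line options are not shown on this page.

```python
import sys


def find_regressions(report, minimums):
    """Return one message per metric that falls below its minimum.

    report:   {entity_type: {"precision": float, "recall": float}}
    minimums: {"precision": float, "recall": float}
    Both shapes are assumptions made for this sketch.
    """
    failures = []
    for entity_type, metrics in sorted(report.items()):
        for metric, floor in minimums.items():
            value = metrics[metric]
            if value < floor:
                failures.append(
                    f"{entity_type}: {metric} {value:.3f} is below {floor:.3f}"
                )
    return failures


def exit_code(report, minimums):
    """Print any regressions to stderr and return a process exit code."""
    failures = find_regressions(report, minimums)
    for line in failures:
        print(line, file=sys.stderr)
    return 1 if failures else 0


# Hypothetical scores for two entity types.
sample_report = {
    "NAME": {"precision": 0.97, "recall": 0.91},
    "SSN": {"precision": 0.99, "recall": 0.99},
}
minimums = {"precision": 0.95, "recall": 0.95}

code = exit_code(sample_report, minimums)
# In a real CI step the script would end with sys.exit(code);
# most CI systems treat a nonzero exit code as a failed step.
```

Checking each entity type separately, instead of one averaged score, means a drop in a single detector is enough to stop the build.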
The evaluation report is the artifact regulators and auditors actually want to see. Demonstrate that your redaction pipeline is verifiably correct, not just "trust us, it works."
Pair it with Phileas and Philter, or run it against any redaction output. The evaluation logic is open: your QA team can read every line of the code that produces the scores.
Three ways to get going — deploy the open source yourself, spin it up from a cloud marketplace, or work with our team directly. Pick the path that fits.