Introducing Arbiter: Human-in-the-Loop PII Redaction
Automated redaction handles most of the volume; humans handle the last few percent that automation can't. Arbiter is the open source review surface that bridges the two — built on Philter, designed for AI training data and regulated everyday workflows.
What is PII? A Practical Guide for Engineers and Compliance Teams
PII is the term everyone uses and few people define the same way. A practitioner's guide to what counts as PII, how to find it in real data, and how to handle it without breaking everything downstream.
The Hidden Difficulties of Redacting PDF Documents
PDFs leak redacted text in ways most people don't anticipate — invisible text layers, embedded files, attached portfolios, metadata, the works. A deep dive into why PDF redaction is harder than it looks, with famous failures and Philter's approach.
Redaction for Legal and E-Discovery: Privilege, Rule 9037, and the In-House Counsel's Pipeline
How automated redaction fits into legal workflows — court filings, e-discovery production, privilege review, and M&A due diligence. With identifier mappings and architectures for in-house counsel and legal-tech teams.
Redaction for Financial Services: PCI DSS, GLBA, and the Real-World Data Pipeline
A practitioner's guide to redacting NPPI and cardholder data in financial workflows — mapping PCI DSS, GLBA, and state requirements to the Philterd toolkit. With architecture patterns for call centers, KYC, and log streams.
PII vs PHI vs NPPI: An Engineer's Guide
Three acronyms that get used interchangeably and shouldn't be. A short, definitional reference for engineers and compliance leads, with the regulatory framework and the architectural implication for each.
Redaction for Insurance: Claims, Customer Data, and the State-by-State Patchwork
Insurance carriers sit at the intersection of GLBA, HIPAA, state insurance rules, and the NAIC Model Law. A practitioner's guide to redacting NPPI and PHI in claims data, adjuster notes, and customer correspondence.
Automating HIPAA Safe Harbor: A Blueprint for Healthcare Data Pipelines
How the Philterd suite maps directly to the 18 HIPAA Safe Harbor identifiers (45 CFR § 164.514(b)(2)) — with a deployment blueprint for patient data lakes, clinical research pipelines, and medical RAG systems.
Privacy Shouldn't Be a Guessing Game: Evaluating Redaction with Philter Scope
Stop hoping your redaction works. Philter Scope turns precision, recall, and F1 into a measurable, auditable health score for any redaction pipeline.
Why API-Based Redaction is a Security Antipattern
Sending sensitive data to a third-party redaction API creates the security holes you're trying to close. Here's why true data sovereignty requires a self-hosted engine — and how Philter delivers it.
Redaction for Education: FERPA, Student Records, and Research Data Pipelines
FERPA governs student records but rarely gets the architectural attention HIPAA does. A practitioner's guide for university IT, edtech vendors, and research-data teams managing student PII at scale.
What is Data Redaction? A Practical Guide
Data redaction is the process of removing sensitive information from documents and datasets — but the term covers more techniques than most people realize. A practical guide to the strategies, the trade-offs, and how to pick the right approach.
Using an LLM or Pattern-based Rules for PII/PHI Redaction
Protecting Personally Identifiable Information (PII) and Protected Health Information (PHI) is imperative. Whether you’re securing customer data, complying with regulations like GDPR or HIPAA, or simply aiming for responsible data handling, you need to redact sensitive information effectively. Today, there are two primary approaches: leveraging the…
Shielding Your Search: Redacting PII and PHI in OpenSearch with Phinder
In today’s data-driven world, safeguarding Personally Identifiable Information (PII) and Protected Health Information (PHI) is paramount. When you use search platforms like OpenSearch, keeping sensitive data confidential is crucial. Enter Phinder, an open-source OpenSearch plugin that uses the Phileas project to redact and de-identify PII and PHI within your search results.…
Automatically Redacting PII and PHI from Files in Amazon S3 using Amazon Macie and Philter
Amazon Macie is “a data security service that discovers sensitive data using machine learning and pattern matching.” With Amazon Macie you can find potentially sensitive information in files in your Amazon S3 buckets, but what do you do when Amazon Macie finds a file that contains an SSN, phone number, or other piece of sensitive information?…
Redacting Text in Amazon Kinesis Data Firehose
Amazon Kinesis Data Firehose is a managed streaming service designed to move large amounts of data from one place to another. For example, you can take data from sources such as Amazon CloudWatch, AWS IoT, and custom applications using the AWS SDK and deliver it to destinations such as Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and other services. In this post…
Phileas — The Open Source PII and PHI redaction engine
I am delighted to announce that the project providing the core PII and PHI redaction capabilities is now open source! Introducing Phileas, the PII and PHI redaction engine! Phileas is now available under the Apache license on GitHub. Both Philter and Phirestream use Phileas to identify and redact sensitive information like PII and PHI. Phileas does all of the heavy lifting,…
What is format-preserving encryption?
Format-preserving encryption (FPE) encrypts a value so the ciphertext looks like the same kind of value — same length, same character set — without breaking downstream systems that expect that shape. A practical guide with credit-card examples for Phileas and Philter.