Apache NiFi and Philter
How to use Philter to redact PII and PHI inside an Apache NiFi data flow, either through Philter's API or with an embedded NiFi processor.
Reference and how-to
Maintained guides to PII redaction with the Philterd toolkit: how redaction policies work, how to author them, and how the pieces fit together.
How to configure a Philter deployment for HIPAA: encryption of data at rest and in motion across AWS, Azure, and Google Cloud.
How to deploy Philter in AWS with a CloudFormation template: finding the Philter AMI, editing the template, and launching the stack.
How to replace Philter's default self-signed SSL certificate with a signed certificate from a trusted authority, using a Java keystore.
How to run an Apache reverse proxy in front of Philter for SSL termination, access control, and access logging.
How to manage Philter's configuration across instances in an auto-scaling environment, using a pre-baked machine image or an external properties file.
How to monitor a Philter deployment in AWS: application logs with CloudWatch Logs, availability via load balancer health checks, and metrics with CloudWatch Metrics.
Embeddings look like 'just numbers,' but recent research shows they're partially invertible. A practical guide for teams running vector stores on defending against PII recovery attacks.
Three acronyms that get used interchangeably and shouldn't be. A short, definitional reference for engineers and compliance leads, with the regulatory framework and the architectural implication for each.
Every prompt sent to an LLM is a data egress point. Six concrete patterns for structuring prompts, redacting inputs, and scanning outputs so PII doesn't leak through the model.
Amazon Kinesis Firehose is a managed streaming service designed to move large amounts of data from one place to another. For example, it can deliver data from sources such as Amazon CloudWatch, AWS IoT, and custom applications using the AWS SDK to destinations such as Amazon S3, Amazon Redshift, Amazon Elasticsearch, and other services. In this post…
How to configure a Valkey cache so Philter maintains referential integrity (consistent replacement values) across documents and contexts in a cluster.
PDFs leak redacted text in ways most people don't anticipate: invisible text layers, embedded files, attached portfolios, metadata, the works. A deep dive into why PDF redaction is harder than it looks, with famous failures and Philter's approach.
What a redaction policy is, how the JSON schema is structured, and how to use it to control exactly which PII is detected and how each type is redacted.
In our data-driven world, protecting Personally Identifiable Information (PII) and Protected Health Information (PHI) is imperative. Whether you’re securing customer data, complying with regulations like GDPR or HIPAA, or simply aiming for responsible data handling, you need an effective way to redact sensitive information. Today, there are two primary approaches: leveraging the…
How to call Philter from a Microsoft Power Automate (Flow) automation to redact PII and PHI from text, using a simple HTTP action.
Data redaction is the process of removing sensitive information from documents and datasets, but the term covers more techniques than most people realize. A practical guide to the strategies, the trade-offs, and how to pick the right approach.
Format-preserving encryption (FPE) encrypts a value so that the ciphertext looks like the same kind of value (same length, same character set), which keeps downstream systems that expect that shape working. A practical guide with credit-card examples for Phileas and Philter.
PII is the term everyone uses and few people define the same way. A practitioner's guide to what counts as PII, how to find it in real data, and how to handle it without breaking everything downstream.
A hands-on walkthrough from an empty file to a working redaction policy: detect an entity, apply it with Philter, change how it redacts, and handle false positives.