Philter
Turnkey, self-hosted PII redaction with a clean API. Drops into any pipeline that needs sensitive data removed from text — and runs entirely inside your cloud.
Open source PII redaction that runs entirely inside your cloud, for healthcare, finance, legal, and government workloads where a leak isn't an option.
In production since 2017.
A complete stack for finding, redacting, monitoring, and auditing sensitive data — from low-level libraries to turnkey services. Each project is released under the permissive and business-friendly Apache license and developed in the open on GitHub.
The engine and API that find and redact PII in text.
The core redaction, anonymization, masking, and replacement library underneath Philter. Available in Java, Python, .NET, and Go.
Trained models and drop-in proxies for AI workloads.
The trained AI and NLP models that find PII and PHI in text, plus the service that hosts them. Designed to plug directly into Phileas and Philter.
Drop-in proxy that redacts PII and PHI before prompts reach LLM providers like OpenAI and Anthropic Claude.
Find where PII lives and watch where it flows.
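The redact-before-send idea behind the proxy can be sketched in a few lines. This is a minimal illustration, not the proxy's actual code: `redact`, `make_redacting_client`, and `fake_provider` are hypothetical names, the redactor only masks email addresses, and the provider is a local stand-in rather than a real LLM API.

```python
import re
from typing import Callable

# Illustrative pattern only; a real redactor covers far more than email addresses.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def redact(prompt: str) -> str:
    """Stand-in redactor (assumption): masks email addresses only."""
    return EMAIL.sub("[REDACTED]", prompt)


def make_redacting_client(send: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a provider call so it only ever receives redacted prompts."""
    def forward(prompt: str) -> str:
        return send(redact(prompt))
    return forward


def fake_provider(prompt: str) -> str:
    """Hypothetical provider that echoes back what it received."""
    return f"received: {prompt}"


client = make_redacting_client(fake_provider)
print(client("Summarize the ticket from bob@example.com"))
# received: Summarize the ticket from [REDACTED]
```

Because the wrapper sits between the caller and the provider, application code does not change; only the function it calls does.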
High-speed discovery scanner that crawls files and storage to map where sensitive information actually lives across your environment.
Intelligent monitoring that tracks PII flow across the organization and alerts on suspicious activity or unusual trends.
Human review, policy authoring, benchmarking, and privacy analytics.
Standalone audit tool that scores redaction policies on precision and recall, so policy changes can be measured rather than guessed at.
Human-in-the-loop PII redaction. Search, review, and override automated detection decisions with structured exemption codes. Built for AI training-data prep and regulated everyday workflows.
Privacy-first analytics that applies differential privacy to PII counts, preserving statistical utility without exposing individuals.
Web console that lets non-technical users build and deploy redaction rules through a visual, no-code interface.
Not sure which one to start with? Walk through the PII redaction journey →
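For readers new to differential privacy on counts, the privacy analytics idea above can be sketched with the Laplace mechanism. The page does not say which mechanism the tool uses, so this is an assumption: a textbook approach with a hypothetical `noisy_count` helper, where a smaller `epsilon` means more noise and stronger privacy.

```python
import random
from typing import Optional


def noisy_count(true_count: int, epsilon: float, sensitivity: float = 1.0,
                rng: Optional[random.Random] = None) -> float:
    """Laplace mechanism: add Laplace(0, sensitivity / epsilon) noise to a count."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    rate = epsilon / sensitivity  # rate of each exponential = 1 / Laplace scale
    # The difference of two i.i.d. exponential draws is Laplace-distributed.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


# Example: report how many documents contained an SSN without exposing the exact figure.
print(noisy_count(120, epsilon=0.5))
```

Averaged over many queries the noise cancels out, which is why aggregate statistics stay useful even though any single reported count is deliberately inexact.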
Client SDKs for Java, .NET, and Go are available alongside the rest of the toolkit at github.com/philterd.
Specialized Large Language Models trained exclusively for high-accuracy PII discovery, classification, and redaction. An effective PII strategy combines pattern matching for structured data with AI for the unstructured text where rigid patterns fall short.
Pattern matching uses predefined character patterns to detect structured data like credit-card numbers, SSNs, and email addresses. It is fast, predictable, and lightweight to run, but rigid in the face of unstructured text.
Trained models read the linguistic context and intent around sensitive data. Highly accurate, adaptable across languages, and effective on the unstructured text where patterns alone fail.
Pattern matching provides a high-speed foundation; LLMs add the intelligent oversight unstructured text demands. The two methods complement each other inside every Philterd deployment.
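To make the pattern-matching half concrete, here is a minimal sketch. The regular expressions, the `redact_structured` helper, and the `{{TYPE}}` placeholder format are all illustrative assumptions, not Philter's actual rules. Note what the patterns miss: the person's name survives, and that is the gap the trained models are meant to close.

```python
import re

# Illustrative patterns only (assumption); production rules need extra validation,
# such as a Luhn check for card numbers.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def redact_structured(text: str) -> str:
    """Replace every pattern match with a {{TYPE}} placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub("{{" + label + "}}", text)
    return text


sample = "Reach Jane Doe at jane.doe@example.com, SSN 123-45-6789."
print(redact_structured(sample))
# Reach Jane Doe at {{EMAIL}}, SSN {{SSN}}.
```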
Curious how the models are trained and benchmarked? Read about our hybrid approach →
Everything we build sits on top of three commitments: data stays with you, the source code stays open, and the AI underneath stays purpose-built for the job.
Philter and the rest of the Philterd toolkit run inside your cloud. Your data never leaves your perimeter, never reaches a third-party API, and never lands in someone else's logs.
Transparency is the only way to verify privacy software. Our core engine is Apache 2.0 licensed — your engineers can read every line, audit every decision, and extend the stack on their own terms.
Generic LLMs make poor privacy filters. We train and ship specialized NLP and deep-learning models built specifically for PII and PHI detection — accurate, tunable, and operationally affordable at scale.



Philterd provides a zero-trust architecture for HIPAA, GDPR, and CCPA compliance. The discovery engine operates entirely within your infrastructure — 100% data sovereignty, no external API dependencies, no third-party data training.
To satisfy HIPAA Safe Harbor requirements, we pair high-speed pattern matching for structured identifiers with specialized AI models for everything else, capturing all 18 protected identifiers listed in 45 CFR § 164.514(b)(2). Healthcare and life-sciences organizations can automate de-identification across massive datasets while preserving the data's utility for research and innovation.
Need help mapping your HIPAA, GDPR, or PCI posture to a Philter deployment? Get an architecture review → · See the full compliance breakdown →
Same redaction engine, three paths. Pick the one that fits your team.
Free forever
$0 · Open source
Run the entire Philterd toolkit yourself. Full source on GitHub — no license keys, no usage caps, no commercial review.
Best for: Engineering-led teams who want to own every layer.
Per-hour billing
From $0.49/hr · ~$360/mo
Deploy Philter — our turnkey redaction API — into your VPC from the AWS, Google Cloud, or Azure marketplace. Production-ready in minutes; billed through your existing cloud account. The other Philterd tools are not yet on the cloud marketplaces.
Best for: Teams that want production-ready Philter without managing builds or ops.
Available on AWS, Google Cloud, and Azure. AWS Marketplace list price shown — see the full TCO comparison →
Engagement-based
Custom
Work directly with the people who built the toolkit. Custom NLP models, privacy architecture, embedded engineering, and production deployment with full handoff.
Best for: Healthcare, finance, and government workloads with custom requirements.
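As a quick check on the per-hour tier, the approximate monthly figure follows from the hourly list price if the instance runs around the clock. The 730 hours per month used here is an assumption (8,760 hours per year divided by 12); actual bills depend on instance uptime and any underlying infrastructure charges.

```python
HOURLY_RATE_USD = 0.49   # list price quoted in the per-hour tier
HOURS_PER_MONTH = 730    # assumption: 8,760 hours per year / 12


def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Estimated monthly cost for an always-on instance, rounded to cents."""
    return round(hourly_rate * hours, 2)


print(monthly_cost(HOURLY_RATE_USD))
# 357.7
```

That lands just under the rounded ~$360/mo shown alongside the hourly price.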
Practical posts on PII redaction, AI privacy, and self-hosted compliance.
Philter, Redaction
Automated redaction handles most of the volume; humans handle the last few percent that automation can't. Arbiter is the open source review surface that bridges the two — built on Philter, designed for AI training data and regulated everyday workflows.
Read post →
AWS, Philter
Per-character SaaS pricing looks cheap at demo scale and gets eye-watering at production scale. A worked-example TCO comparison: AWS Comprehend, Google Cloud DLP, and self-hosted Philter on the marketplace.
Read post →
Redaction, Philter
PII is the term everyone uses and few people define the same way. A practitioner's guide to what counts as PII, how to find it in real data, and how to handle it without breaking everything downstream.
Read post →
For integrators & system builders
Philter is the redaction layer integrators bundle into client deliverables. It deploys in the client's cloud and is operated by you, with no per-seat license and no vendor sub-license to negotiate. Reference architectures for the patterns clients actually buy.
New · Open Source
Arbiter is the newest addition to our open source toolkit for PII privacy — a human-in-the-loop review surface for redaction pipelines. Reviewers see every detection in context, accept or override automated decisions, and apply structured exemption codes that flow into your audit trail. Built on Philter; designed for AI training-data prep and regulated everyday workflows.
Accelerate compliance and reduce leak risk by working directly with the creators of Philter. We design, build, and deploy the privacy infrastructure your team will own — not a black box you have to renew every year.
We design end-to-end PII protection for your cloud and AI workloads — data flows, redaction layers, audit trails, and the guardrails that keep them aligned with HIPAA, GDPR, and CCPA.
Off-the-shelf models miss the entities that matter most in your domain. We train specialized PII/PHI detectors on your data, evaluated against precision and recall you can measure.
Pre-launch privacy review of generative AI and RAG systems. We trace PII through prompts, retrieval context, tool calls, logs, and vector stores — then ship the redaction and policy changes.
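Measuring a detector against precision and recall, as mentioned for custom models above, comes down to comparing predicted entity spans with a hand-labeled gold set. The sketch below assumes exact-match scoring on `(start, end, type)` tuples; the `Span` type and `precision_recall` helper are hypothetical, and real evaluations often also credit partial overlaps.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity type)


def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Exact-match precision and recall over entity spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


gold = {(0, 8, "NAME"), (17, 28, "SSN"), (40, 52, "PHONE"), (70, 80, "EMAIL")}
predicted = {(0, 8, "NAME"), (17, 28, "SSN"), (60, 64, "NAME")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
# precision=0.67 recall=0.50
```

Low recall means PII is slipping through; low precision means harmless text is being redacted. Tracking both makes a policy or model change measurable.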
See all consulting services → · Have a specific project in mind? Schedule a 30-min call →
Tell us about your stack and the privacy problems you're trying to solve. We'll get back to you within one business day.