
Your Cloud. Your Data.
Zero-Trust PII Redaction.

Philterd builds open source, self-hosted privacy software for the workloads where a PII leak isn't an option — healthcare, finance, legal, and government pipelines that need redaction wired in from day one.

Every tool we ship runs inside your cloud. Your data never leaves your perimeter, never reaches a third-party API, and never lands in someone else's logs.

New · Open Source

Introducing Arbiter

Arbiter is the newest addition to our open source toolkit for PII privacy — a human-in-the-loop review surface for redaction pipelines. Reviewers see every detection in context, accept or override automated decisions, and apply structured exemption codes that flow into your audit trail. Built on Philter; designed for AI training-data prep and regulated everyday workflows.
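To make the review flow concrete, here is a minimal sketch of how a reviewer's decision could be recorded as one audit-trail entry. It is an illustration only, not Arbiter's actual data model: the `Detection`, `Review`, and `audit_line` names, the JSON field names, and the `PUBLIC-RECORD` exemption code are all hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Action(Enum):
    ACCEPT = "accept"      # keep the automated redaction
    OVERRIDE = "override"  # reviewer decides the text should stay visible


@dataclass(frozen=True)
class Detection:
    """One automated detection (hypothetical shape)."""
    doc_id: str
    start: int
    end: int
    label: str


@dataclass(frozen=True)
class Review:
    """A reviewer's decision on a single detection (hypothetical shape)."""
    detection: Detection
    action: Action
    reviewer: str
    exemption_code: Optional[str] = None

    def __post_init__(self):
        # Assumption for this sketch: every override must be justified.
        if self.action is Action.OVERRIDE and not self.exemption_code:
            raise ValueError("an override needs an exemption code")


def audit_line(review: Review, now: Optional[datetime] = None) -> str:
    """Serialize a review as one JSON line for an append-only audit log."""
    now = now or datetime.now(timezone.utc)
    d = review.detection
    return json.dumps({
        "ts": now.isoformat(),
        "doc_id": d.doc_id,
        "span": [d.start, d.end],
        "label": d.label,
        "action": review.action.value,
        "reviewer": review.reviewer,
        "exemption_code": review.exemption_code,
    }, sort_keys=True)
```

Rejecting an override that carries no code at construction time means an unexplained decision never reaches the log in the first place.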

Trusted in
  • Healthcare
  • Finance
  • Government
  • Legal
  • Insurance
  • E-commerce
  • Data & AI

The World-Class Open Source Toolkit for PII Privacy

A complete stack for finding, redacting, monitoring, and auditing sensitive data — from low-level libraries to turnkey services. Each project is Apache 2.0 licensed and developed in the open on GitHub.

Redaction API · Java

Philter

Turnkey, self-hosted PII redaction with a clean API. Drops into any pipeline that needs sensitive data removed from text — and runs entirely inside your cloud.

Redaction Library · Java, Python, .NET, Go

Phileas

The core redaction, anonymization, masking, and replacement library underneath Philter. Available in Java, Python, .NET, and Go.

AI Models & Server · Python

PhEye

The trained AI and NLP models that find PII and PHI in text, plus the service that hosts them. Designed to plug directly into Phileas and Philter.

Human Review · Java

Arbiter

Human-in-the-loop PII redaction. Search, review, and override automated detection decisions with structured exemption codes — built for AI training-data prep and regulated everyday workflows.

Client SDKs for Java, .NET, and Go are available alongside the rest of the toolkit at github.com/philterd.

Three ways to get started

Same redaction engine, three paths. Pick the one that fits your team.

Free forever

Open Source

$0 · Apache 2.0

Run the entire Philterd toolkit yourself. Full source on GitHub — no license keys, no usage caps, no commercial review.

  • All 9 tools, full source code
  • User's guides and reference docs
  • Community support via GitHub issues
  • Every update and new release

Best for: Engineering-led teams who want to own every layer.

Engagement-based

Engaged

Custom

Work directly with the people who built the toolkit. Custom NLP models, privacy architecture, embedded engineering, and production deployment with full handoff.

  • Custom NLP model training
  • Privacy architecture review
  • Embedded engineering
  • Deployment + knowledge transfer

Best for: Healthcare, finance, and government workloads with custom requirements.

The Three Pillars of Privacy

Everything we build sits on top of three commitments: data stays with you, the source code stays open, and the AI underneath stays purpose-built for the job.

Data Sovereignty

Philter and the rest of the Philterd toolkit run inside your cloud. Your data never leaves your perimeter, never reaches a third-party API, and never lands in someone else's logs.

Open Source Integrity

Transparency is the only way to verify privacy software. Our core engine is Apache 2.0 licensed — your engineers can read every line, audit every decision, and extend the stack on their own terms.

Purpose-Built AI

Generic LLMs make poor privacy filters. We train and ship specialized NLP and deep-learning models built specifically for PII and PHI detection — accurate, tunable, and operationally affordable at scale.

Privacy AI

Specialized Large Language Models trained exclusively for high-accuracy PII discovery, classification, and redaction. An effective PII strategy combines pattern matching for structured data with AI for the unstructured text where rigid patterns fall short.

Pattern-based identification

Predefined character sequences detect structured data like credit-card numbers, SSNs, and email addresses. Fast, predictable, and lightweight to run — but rigid in the face of unstructured text.
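As a rough illustration of pattern-based identification, the sketch below finds SSNs, email addresses, and card numbers with regular expressions and rejects card-like digit runs that fail the Luhn checksum. The patterns, the `[LABEL]` placeholder format, and the function names are assumptions made for this example; they are not the patterns Philter or Phileas ship.

```python
import re

# Illustrative patterns only (assumptions for this sketch).
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_structured_pii(text: str) -> list:
    """Return sorted (start, end, label) spans for every pattern match."""
    spans = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            if label == "CREDIT_CARD" and not luhn_valid(m.group()):
                continue  # looks like a card number but fails the checksum
            spans.append((m.start(), m.end(), label))
    return sorted(spans)


def redact(text: str, spans: list) -> str:
    """Replace each span with a [LABEL] placeholder, right to left.

    Assumes the spans do not overlap.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + "[" + label + "]" + text[end:]
    return text
```

For example, `redact(s, find_structured_pii(s))` on `"Reach jane.doe@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."` yields `"Reach [EMAIL], SSN [SSN], card [CREDIT_CARD]."`. A name or a diagnosis in free text would pass through untouched, which is the rigidity described above.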

LLM-based identification

Trained models read the linguistic context and intent around sensitive data. Highly accurate, adaptable across languages, and effective on the unstructured text where patterns alone fail.

Hybrid by design

Pattern matching provides a high-speed foundation; LLMs add the intelligent oversight unstructured text demands. The two methods complement each other inside every Philterd deployment.
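One way to picture the hybrid approach is as two detectors whose spans are merged before redaction. The sketch below is an assumption-laden illustration, not Philterd's implementation: `Span`, `merge_spans`, and `detect` are hypothetical names, and `stub_model_detector` is a fixed lookup standing in for a trained model.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    label: str
    source: str   # "pattern" or "model"
    score: float  # detector confidence in [0, 1]


def pattern_detector(text: str) -> list:
    """Structured identifiers: exact format, so confidence is 1.0."""
    ssn = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
    return [Span(m.start(), m.end(), "SSN", "pattern", 1.0)
            for m in ssn.finditer(text)]


def stub_model_detector(text: str) -> list:
    """Stand-in for a trained model (hypothetical): a fixed name lookup."""
    spans = []
    for name in ("Jane Doe",):
        for m in re.finditer(re.escape(name), text):
            spans.append(Span(m.start(), m.end(), "PERSON", "model", 0.93))
    return spans


def merge_spans(spans: list) -> list:
    """Resolve overlaps: highest score wins, ties go to the longer span."""
    chosen = []
    ranked = sorted(spans, key=lambda s: (-s.score, -(s.end - s.start), s.start))
    for s in ranked:
        if all(s.end <= c.start or s.start >= c.end for c in chosen):
            chosen.append(s)
    return sorted(chosen, key=lambda s: s.start)


def detect(text: str, detectors: list) -> list:
    """Run every detector over the text and merge the results."""
    found = [span for detector in detectors for span in detector(text)]
    return merge_spans(found)
```

Calling `detect("Jane Doe's SSN is 123-45-6789.", [pattern_detector, stub_model_detector])` returns a `PERSON` span from the stub model followed by an `SSN` span from the pattern detector. The overlap rule here is only one reasonable policy; a deployment could just as well prefer pattern hits outright.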

Creating PII-focused LLMs

Three deliberate stages turn general-purpose model architectures into specialized PII and PHI detectors — how the data is sourced, how the model is trained, and how the result is measured. Each stage is engineered so accuracy never comes at the cost of privacy.

Data curation and synthetic generation

We assemble high-fidelity datasets spanning global PII and PHI entities — tax identifiers, medical terminology, financial records. Where privacy demands it, we generate millions of realistic synthetic records so models never see real sensitive data during training.
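A toy version of synthetic generation might fill sentence templates with fabricated values and record where each value lands, so every record arrives already labeled. Everything in this sketch is an assumption for illustration: the templates, the name lists, and the function names are invented, and real generators would be far richer. The fake SSNs use area number 000, which the Social Security Administration never issues.

```python
import random
import re

FIRST = ["Ana", "Li", "Omar", "Priya"]
LAST = ["Silva", "Chen", "Haddad", "Nair"]
TEMPLATES = [
    "Patient {NAME} (SSN {SSN}) was admitted on {DATE}.",
    "Please wire the refund to {NAME}; tax ID on file is {SSN}.",
]
SLOT = re.compile(r"\{(\w+)\}")


def fake_ssn(rng: random.Random) -> str:
    """SSN-shaped string that cannot belong to a real person (area 000)."""
    return f"000-{rng.randint(10, 99)}-{rng.randint(1000, 9999)}"


def render(template: str, values: dict) -> tuple:
    """Fill the slots and return (text, [(start, end, label), ...])."""
    parts, spans = [], []
    pos = 0      # read position in the template
    cursor = 0   # length of the text emitted so far
    for m in SLOT.finditer(template):
        literal = template[pos:m.start()]
        parts.append(literal)
        cursor += len(literal)
        value = values[m.group(1)]
        spans.append((cursor, cursor + len(value), m.group(1)))
        parts.append(value)
        cursor += len(value)
        pos = m.end()
    parts.append(template[pos:])
    return "".join(parts), spans


def generate(n: int, seed: int = 0) -> list:
    """Produce n labeled synthetic records, reproducible for a given seed."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        values = {
            "NAME": f"{rng.choice(FIRST)} {rng.choice(LAST)}",
            "SSN": fake_ssn(rng),
            "DATE": f"2024-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}",
        }
        records.append(render(rng.choice(TEMPLATES), values))
    return records
```

Because the labels are produced together with the text, no human ever has to annotate a real record, and seeding the generator keeps a training set reproducible.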

Contextual fine-tuning

Models are fine-tuned to identify entities based on the words surrounding them, learning linguistic intent rather than surface patterns — so they can distinguish sensitive information from harmless data that happens to share a similar structure.

Gold-standard benchmarking

Every model release is evaluated against a gold-standard benchmark suite that measures precision, recall, and F1 on the entity types that matter — so policy and architecture decisions rest on measurable performance, not vendor claims.
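For readers who want the metrics spelled out, the sketch below computes precision, recall, and F1 over exact-match entity spans, overall and per entity type. It shows the standard definitions under one assumed matching rule (a prediction counts only if start, end, and label all agree); it is not the benchmark suite itself, and the function names are hypothetical.

```python
def score(gold: set, predicted: set) -> dict:
    """Precision, recall, and F1 for sets of (start, end, label) spans."""
    tp = len(gold & predicted)   # spans found and correct
    fp = len(predicted - gold)   # spans flagged that are not in the gold set
    fn = len(gold - predicted)   # gold spans the detector missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


def score_by_type(gold: set, predicted: set) -> dict:
    """Break the same metrics down per entity label."""
    labels = {label for _, _, label in gold | predicted}
    return {
        label: score(
            {s for s in gold if s[2] == label},
            {s for s in predicted if s[2] == label},
        )
        for label in sorted(labels)
    }
```

With three gold spans and three predictions of which two match, precision, recall, and F1 all come out to 2/3. The per-type view is what exposes a detector that is perfect on SSNs but misses every email address.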

Curious about the technical details? Read about our hybrid approach →

From the blog

Practical posts on PII redaction, AI privacy, and self-hosted compliance.

Philter, Redaction

Introducing Arbiter: Human-in-the-Loop PII Redaction

Automated redaction handles most of the volume; humans handle the last few percent that automation can't. Arbiter is the open source review surface that bridges the two — built on Philter, designed for AI training data and regulated everyday workflows.


Compliance and Trust

  • HIPAA
  • EU GDPR Compliant
  • CCPA Compliant

Philterd provides a zero-trust architecture for HIPAA, GDPR, and CCPA compliance. The discovery engine operates entirely within your infrastructure — 100% data sovereignty, no external API dependencies, no third-party data training.

To satisfy HIPAA Safe Harbor requirements, we pair high-speed pattern matching for structured identifiers with specialized AI models for everything else, covering all 18 identifier categories listed in 45 CFR § 164.514(b)(2). Healthcare and life-sciences organizations can automate de-identification across massive datasets while preserving the utility the data needs for research and innovation.

The Zero-Trust Architecture

Most redaction solutions require a trade-off between intelligence and privacy, forcing you to send sensitive data to third-party APIs for processing. We remove this risk with a privacy-first architecture designed for zero-trust environments.

Local Execution

Our AI models and processing engines run entirely within your own VPC or on-premise hardware. No sensitive data ever leaves your secure perimeter.

Air-Gapped Ready

Engineered for high-security sectors, the Philterd suite can operate in completely offline environments with no outbound internet dependency.

Zero Data Retention

We do not and cannot see your data. Our tools process information in-memory, ensuring that your raw inputs are never logged, stored, or used to train our models.

Immutable Compliance

By keeping the entire PII lifecycle — from discovery to redaction — local, you maintain a clean chain of custody that satisfies the most stringent global security audits.

Stateless by Design

Every API call is processed independently — no session state, no shared cache, no cross-request memory. One request can't leak information from a prior one, and a restarted instance is functionally identical to a fresh one.
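In code terms, a stateless handler is one whose output depends only on its input. The sketch below illustrates that property under simplifying assumptions; `Request`, `Response`, and `handle` are hypothetical names, not Philter's API, and a single SSN pattern stands in for the full engine.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    text: str


@dataclass(frozen=True)
class Response:
    text: str
    redactions: int


# A compiled pattern is immutable, so sharing it across calls carries no
# request data from one call to the next.
_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def handle(request: Request) -> Response:
    """Redact one request using only what arrives in `request`.

    Nothing is cached, logged, or written to module-level state, so a
    later call cannot observe anything about an earlier one.
    """
    redacted, count = _SSN.subn("[SSN]", request.text)
    return Response(redacted, count)
```

Calling `handle` twice with the same request returns equal responses regardless of what was processed in between, which is the property that makes a restarted instance indistinguishable from a fresh one.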

Open Source Transparency

Every line of the redaction engine is Apache 2.0 licensed and inspectable on GitHub. No black-box AI, no proprietary binaries — your security team can audit the code that touches your data.

Model Integrity & Synthetic Data

We believe the tools used to protect privacy should be built with the highest privacy standards. Our AI model development process is designed to ensure the "brains" of our systems are powerful, ethical, and secure.

Privacy-First Training

We use high-fidelity synthetic data to train our models. By generating millions of realistic data scenarios — from medical records to financial statements — we train our AI models to recognize sensitive entities without ever exposing them to real-world PII.

Zero Leakage Risk

Because our models are trained on synthetic datasets, the training set contains no real PII for a model to memorize, so an LLM cannot accidentally reveal sensitive training data in its output.

Verified Benchmarking

Every model version is rigorously tested against Philter Scope to ensure it meets our strict standards for accuracy, recall, and the reduction of false positives before it is ever released to your environment.

Need help mapping your HIPAA, GDPR, or PCI posture to a Philter deployment? Get an architecture review →

Consulting Services

Accelerate compliance and reduce leak risk by working directly with the creators of Philter. We design, build, and deploy the privacy infrastructure your team will own — not a black box you have to renew every year.

Privacy Architecture

We design end-to-end PII protection for your cloud and AI workloads — data flows, redaction layers, audit trails, and the guardrails that keep them aligned with HIPAA, GDPR, and CCPA.

Custom NLP Models

Off-the-shelf models miss the entities that matter most in your domain. We train specialized PII/PHI detectors on your data, evaluated against precision and recall you can measure.

Compliance Audits

Full-scale evaluation of your existing privacy posture against the regulatory requirements you actually have to meet — and a prioritized remediation roadmap your team can execute.

PII Incident Response

Rapid triage when a privacy incident hits production. We scope exposure, contain the leak, instrument detection, and document the timeline for regulators and counsel.

Embedded Engineering

Work directly with the creators of Philter. We pair with your developers, contribute production-grade code to your repos, and leave behind systems your team owns.

Have a specific project in mind? Schedule a 30-min call →

About Philterd

Philterd was founded in 2017 by Jeff Zemerick on a single principle: your most sensitive data should never leave your control. Years later, we're still the people building the privacy software our clients run in production: every line of code, every model, every release.

Founded in 2017

Philterd was founded by Jeff Zemerick after watching commercial privacy tools turn into proprietary black boxes — APIs that required sending sensitive data to the cloud just to redact it. We believed there was a better way.

Phileas came first

We started by building Phileas as an open source library — auditable, embeddable, and free for anyone to use. It was the proof that privacy software didn't have to be opaque. The library quickly grew into the engine behind Philter, the enterprise-grade redaction API used today by healthcare, legal, and financial organizations.

Built, not bought

Unlike vendors that wrap third-party APIs and resell the result, we own the models, the runtime, and the policy engine. Every component of the Philterd ecosystem is engineered in-house and released under Apache 2.0 — code you can read, audit, and extend.

Maintained by the people who built it

When you email us, you reach the engineers who wrote the line of code in question. No outsourced support tier, no ticket triage gauntlet — just direct access to the maintainers.

Deep NLP roots

Philterd is led by the PMC Chair of Apache OpenNLP, an Apache Software Foundation Member, and backed by 15+ years of production NLP work. The models behind Philterd are built by the people who build the frameworks underneath them.

Privacy by design

Every product we ship runs entirely inside your perimeter. No outbound API calls, no third-party data sharing, no surprise pricing changes. The architecture isn't a marketing choice — it's a structural commitment to the original principle.

Read the full story: From Phileas to Philter →

Ready to lock down your data?

Tell us about your stack and the privacy problems you're trying to solve. We'll get back to you within one business day.