Sensitive data discovery scanner

Phinder

Phinder is a high-speed discovery scanner that crawls files, object storage, and document repositories to map where sensitive information actually lives across your environment. It's the step that comes before redaction — you can't protect what you can't find.


Point Phinder at a bucket. Find every PII entity.

$ phinder scan s3://patient-records/ \
    --policy healthcare.json --format json

[
  { "file": "intake-0942.pdf",   "SSN": 3, "NAME": 7, "DOB": 2 },
  { "file": "lab-report-1247.docx", "SSN": 1, "PHONE": 2, "MRN": 1 }
]


Why Phinder

Built for scale

Designed for terabytes of unstructured data. Parallel workers, streaming I/O, and bounded memory use keep a discovery job from taking down the host it runs on.
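The parallel, streaming, bounded-memory pattern can be illustrated with a small Python sketch. This is not Phinder's implementation: the SSN-shaped regex is a hypothetical stand-in detector, and `count_ssns`, `scan_directory`, `workers`, and `max_in_flight` are illustrative names. Each file is streamed line by line, and the scanner caps how many tasks are pending at once, so memory use is bounded by that cap rather than by the number or size of files (assuming text files without pathologically long lines).

```python
import re
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from pathlib import Path

# Hypothetical stand-in detector: a US SSN-shaped pattern. It only illustrates
# the I/O pattern; it is not Phinder's detection logic.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def count_ssns(path: Path) -> int:
    """Stream a text file line by line so memory does not grow with file size."""
    count = 0
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            count += len(SSN_PATTERN.findall(line))
    return count


def scan_directory(root: Path, workers: int = 4, max_in_flight: int = 16) -> dict[str, int]:
    """Scan every file under root in parallel, capping the number of pending tasks."""
    results: dict[str, int] = {}
    pending: dict[Future, str] = {}  # future -> path relative to root

    def collect(futures) -> None:
        for future in futures:
            results[pending.pop(future)] = future.result()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path in root.rglob("*"):  # rglob is lazy, so the file list is never fully materialized
            if not path.is_file():
                continue
            if len(pending) >= max_in_flight:
                # Back-pressure: wait for at least one task to finish before queueing more.
                done, _ = wait(pending, return_when=FIRST_COMPLETED)
                collect(done)
            future = pool.submit(count_ssns, path)
            pending[future] = str(path.relative_to(root))
        collect(list(pending))  # drain whatever is still running
    return results
```

Calling `scan_directory(Path("/data/exports"), workers=8)` (a hypothetical path) returns a mapping of relative file paths to match counts. Threads suit this sketch because the work is dominated by file I/O; a CPU-heavy detector would likely call for processes instead.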

Storage-aware

Native crawlers for Amazon S3, Google Cloud Storage, Azure Blob, and local filesystems. Same policy, same output format, regardless of where the documents live.

Shared policies with Philter

Define a policy once. Phinder uses it to discover; Philter uses it to redact. The entity types you found are the entity types you redact — no drift between detection and action.

Audit-ready reports

Export findings as JSON, CSV, or human-readable summaries. Inventory entity types per file, per bucket, or per pipeline: the artifacts auditors typically ask for.
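Because the JSON report is a flat list of per-file counts, it is easy to post-process. The following Python sketch assumes the report has exactly the shape shown in the scan example at the top of this page (one object per file, a `file` key, and one integer count per entity type). `entity_totals` and `to_csv` are hypothetical helpers, not part of Phinder, and the long-form CSV layout is an illustrative choice rather than Phinder's own CSV format.

```python
import csv
import io
import json
from collections import Counter

# Sample report copied from the `--format json` example on this page.
REPORT = """[
  { "file": "intake-0942.pdf",   "SSN": 3, "NAME": 7, "DOB": 2 },
  { "file": "lab-report-1247.docx", "SSN": 1, "PHONE": 2, "MRN": 1 }
]"""


def entity_totals(records: list[dict]) -> Counter:
    """Sum each entity type's count across all files in the report."""
    totals: Counter = Counter()
    for record in records:
        for key, value in record.items():
            if key != "file":
                totals[key] += value
    return totals


def to_csv(records: list[dict]) -> str:
    """Flatten per-file counts into long-form CSV rows: file, entity, count."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["file", "entity", "count"])
    for record in records:
        for key, value in sorted(record.items()):
            if key != "file":
                writer.writerow([record["file"], key, value])
    return buffer.getvalue()


if __name__ == "__main__":
    records = json.loads(REPORT)
    print(dict(entity_totals(records)))  # {'SSN': 4, 'NAME': 7, 'DOB': 2, 'PHONE': 2, 'MRN': 1}
    print(to_csv(records))
```

A long-form layout (one row per file and entity type) keeps the CSV schema fixed even when different files contain different entity types, which makes it straightforward to load into a spreadsheet or database for an audit.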

OpenSearch plugin

The companion Phinder PII Plugin for OpenSearch redacts sensitive information from search results before they leave the cluster — same engine, different surface.

Compounds with the rest of the toolkit

Discovery without redaction is just inventory. Pair Phinder with Philter (to remediate what was found) and Phield (to keep watching what was missed) for a complete PII lifecycle.

Ready to use Phinder?

Three ways to get going — deploy the open source yourself, spin it up from a cloud marketplace, or work with our team directly. Pick the path that fits.
