Compliance as Code: Integrating Philter into Your CI/CD Pipeline

Engineering teams shifted security left a decade ago: SAST scanners, dependency audits, and IaC linters all run in CI now, blocking the merge button when something's off. Privacy is the next thing to shift.

Most organizations still treat PII leaks the same way they treat bugs in production — surfaced by an incident, triaged by an SRE, written up in a postmortem. That's the most expensive place to catch them. Every minute of triage, every regulator notification, every customer-trust call is downstream of a failure that should have failed a build.

This post lays out how to wire Philter into the development lifecycle so PII leaks fail a build in dev instead of surfacing in production. Treat it like running tests, because that's exactly what it is.

Why privacy belongs in CI

A few patterns we see at clients before they adopt a "compliance-as-code" stance:

Test fixtures and seed data containing real customer SSNs because someone copy-pasted from a production dump "just to repro the bug."
Documentation strings, log statements, and exception messages that quietly leak email addresses or account numbers.
Synthetic data generators with predictable bugs — a "fake" SSN that matches a real one.
Migration scripts that dump customer text into a "denormalize_temp" table that someone forgot to drop.

Every one of those is catchable in CI, before the code merges. The cost of fixing a leak when the PR is still open is roughly zero. The cost of fixing it after production has run for six weeks, with logs, backups, and downstream copies to clean up, can run to six figures and, for HIPAA-covered data, a breach notification to the HHS Office for Civil Rights (OCR).

The four-layer integration

You don't have to do all of these at once. We typically recommend starting with layer 1 and adding the others as the policy matures.

Pre-commit hook — catches PII in staged files before it ever reaches the remote.
PR check — runs in CI on every pull request; scans the diff plus any test fixtures.
Policy regression test — runs Philter Scope against a gold-standard set and fails if precision/recall drops.
Stage environment scan — nightly job that runs Phinder over stage data stores and reports anything sensitive that shouldn't be there.

Layer 1: pre-commit hook

The fastest signal you can give a developer is "this commit contains a phone number." A small pre-commit hook posts the staged content to a local Philter instance and rejects the commit on a hit:

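#!/usr/bin/env bash
# .git/hooks/pre-commit  (or use https://pre-commit.com)
set -euo pipefail

PHILTER="${PHILTER_URL:-http://localhost:8080}"
FAIL=0

# Read names line by line so paths containing spaces survive.
while IFS= read -r file; do
  case "$file" in
    *.md|*.txt|*.json|*.yaml|*.yml|*.py|*.java|*.go)
      # Scan the staged blob (":$file"), not the working-tree copy.
      # Fail closed: if the scan cannot run, reject the commit.
      if ! result=$(git show ":$file" | curl -sf "$PHILTER/api/find?p=ci-strict" \
                      --data-binary @- \
                      -H "Content-Type: text/plain"); then
        echo "✗ scan failed for $file (is Philter running at $PHILTER?)" >&2
        exit 1
      fi
      count=$(echo "$result" | jq 'length')
      if [ "$count" -gt 0 ]; then
        echo "✗ $file contains PII:"
        echo "$result" | jq -r '.[] | "  - \(.type): \(.text)"'
        FAIL=1
      fi
      ;;
  esac
done < <(git diff --cached --name-only --diff-filter=ACM)

exit $FAIL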

Pair this with a strict policy tuned for CI (ci-strict in the hook above), not your production policy, which may deliberately allow some entity types. The CI policy treats any PII as a failure.

Layer 2: PR check in CI

Pre-commit hooks are advisory: developers can skip them with git commit --no-verify. The PR check is the gate. Here's a GitHub Actions workflow that boots Philter as a service container and scans every file the PR adds or changes:

# .github/workflows/pii-check.yml
name: PII check

on: [pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    services:
      philter:
        image: philterd/philter:latest
        ports: ['8080:8080']
        options: --health-cmd "curl -f http://localhost:8080/api/status"
                 --health-interval 10s --health-retries 5

    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }

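      - name: Wait for Philter
        run: |
          for i in {1..30}; do
            curl -sf http://localhost:8080/api/status && exit 0
            sleep 2
          done
          # Without this, the step would pass even if Philter never came up.
          echo "Philter did not become ready in time" >&2
          exit 1

      - name: Scan changed files
        run: |
          # ACM = added, copied, modified; deleted files have nothing to scan.
          git diff --name-only --diff-filter=ACM origin/${{ github.base_ref }}...HEAD \
            | xargs -d '\n' -I {} bash scripts/scan-file.sh {}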
        env:
          PHILTER_URL: http://localhost:8080

Here scripts/scan-file.sh applies the same logic as the pre-commit hook above to a single path and exits non-zero on a hit. Once the check is marked as required in branch protection, any PR that introduces PII into the repo gets blocked at the merge gate.
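For reference, scripts/scan-file.sh might look like the sketch below. It reuses the endpoint, policy name, and response shape from the pre-commit hook above (a JSON array of findings, each with a type field). The should_scan and scan_file helper names are illustrative, and the exit codes are a convention chosen here, not something Philter prescribes.

```shell
#!/usr/bin/env bash
# scripts/scan-file.sh <path>  (sketch; helper names are illustrative)
# Exit status: 0 = clean or skipped, 1 = PII found, 2 = scan could not run.

PHILTER="${PHILTER_URL:-http://localhost:8080}"

# Same extension filter as the pre-commit hook.
should_scan() {
  case "$1" in
    *.md|*.txt|*.json|*.yaml|*.yml|*.py|*.java|*.go) return 0 ;;
    *) return 1 ;;
  esac
}

scan_file() {
  local file="$1" result count
  should_scan "$file" || return 0
  [ -f "$file" ] || return 0

  # Fail closed: an unreachable Philter must not look like a clean scan.
  if ! result=$(curl -sf "$PHILTER/api/find?p=ci-strict" \
                  --data-binary "@$file" \
                  -H "Content-Type: text/plain"); then
    echo "scan failed for $file (is Philter up at $PHILTER?)" >&2
    return 2
  fi

  count=$(printf '%s' "$result" | jq 'length') || return 2
  if [ "$count" -gt 0 ]; then
    echo "✗ $file contains PII:"
    # Print entity types only, so the matched text stays out of CI logs.
    printf '%s' "$result" | jq -r '.[] | "  - \(.type)"'
    return 1
  fi
  return 0
}

# Scan the path given on the command line, if any.
if [ "$#" -gt 0 ]; then
  scan_file "$1"
fi
```

Because xargs exits non-zero whenever any invocation does, a single hit (or a single failed scan) is enough to fail the workflow step.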

Layer 3: policy regression tests

Once you have a Philter policy in production, the policy itself becomes code that needs tests. A change to a regex pattern, a confidence threshold, or an entity list can quietly tank either precision or recall. Philter Scope turns that risk into a measurable CI check.

name: policy regression

on: [pull_request]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run Philter Scope against gold-standard set
        run: |
          philter-scope evaluate \
            --policy policies/production.json \
            --gold   test/gold-standard.jsonl \
            --output metrics.json

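      - name: Fail if recall or precision regressed
        run: |
          # Assumes metrics.json exposes .precision alongside .recall.
          jq '{recall, precision}' metrics.json
          # jq -e exits non-zero when the expression is false or null, so a
          # missing metric fails the build instead of slipping through.
          jq -e '.recall >= 0.95 and .precision >= 0.85' metrics.json > /dev/null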

Two thresholds are usually enough: recall >= 0.95 (you can't ship a policy that catches less than 95% of known PII) and precision >= 0.85 (you can't ship a policy that over-redacts so much it destroys data utility). Tune to your domain — we wrote separately about how the thresholds change between healthcare, marketing, and research workloads.
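As a reminder of what the two numbers measure, the sketch below works through the arithmetic for a hypothetical gold set and shows a strictly numeric threshold check. The counts are invented for illustration, the pr_metrics and meets_min names are not part of any Philter tool, and how Philter Scope decides that a detected span matches a gold span is not covered here.

```shell
# precision = TP / (TP + FP): of everything the policy flagged, how much was real PII
# recall    = TP / (TP + FN): of all the real PII, how much the policy flagged
pr_metrics() {  # usage: pr_metrics TP FP FN
  awk -v tp="$1" -v fp="$2" -v miss="$3" 'BEGIN {
    if (tp + fp == 0 || tp + miss == 0) { print "undefined"; exit 1 }
    printf "precision=%.3f recall=%.3f\n", tp / (tp + fp), tp / (tp + miss)
  }'
}

# Numeric "value >= minimum" test. Adding 0 forces numeric comparison, so a
# non-numeric value such as "null" is treated as 0 and fails the check.
meets_min() {  # usage: meets_min VALUE MINIMUM
  awk -v v="$1" -v m="$2" 'BEGIN { exit !(v + 0 >= m + 0) }'
}

# Hypothetical gold set: 200 PII spans; the policy flags 210 spans, 190 correctly.
pr_metrics 190 20 10   # prints: precision=0.905 recall=0.950
```

In this example recall sits exactly at the 0.95 floor and precision clears 0.85, so the policy would pass both gates. One more missed span (189 of 200, recall 0.945) would fail the recall gate.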

Layer 4: nightly stage-environment scan

Pre-commit, PR check, and policy regression all guard the code path. None of them guard the data. Data is the part most likely to drift out of compliance — new vendor feeds, schema changes, a developer pulling a "sanitized" extract that turned out to be less sanitized than expected.

Phinder runs as a scheduled job against your stage and dev data stores:

# nightly cron
report="/var/log/phinder/$(date +%F).json"

phinder scan \
  s3://company-stage-uploads/ \
  --policy policies/production.json \
  --output-format json \
  > "$report"

# Alert if tonight's report shows any SSNs in stage. Check only tonight's
# file; globbing every past report would re-alert on old findings forever.
if jq -e 'any(.[]; (.SSN // 0) > 0)' "$report" > /dev/null; then
  ops-alert "Stage bucket contains SSNs"
fi

The result is a single log of what's where, refreshed every night. When a regulator asks "do you know where customer PII lives in non-production?", the answer is "yes, here's the report from last night."

Wiring it into the developer experience

A few patterns to make this stick:

Two policies, not one. A strict CI policy (zero tolerance for PII, broad detector coverage) and a production policy (tuned for the entity types and tolerances of the live workload). Engineering can iterate on the CI policy without touching production behavior.
Annotate, don't just block. When the PR check fails, post a PR comment with the offending file and entity type. A red X with no explanation just teaches developers to bypass the check.
Allow narrow waivers. Sometimes a test fixture intentionally contains a fake-but-realistic SSN to exercise a code path. A .pii-allow file with explicit, reviewed exceptions is better than a global skip.
Track the rate over time. Plot the number of PII-blocked PRs per week. If the number holds steady, the same mistakes keep recurring and the upstream cause (fixtures, tooling, habits) hasn't been addressed. If it's trending toward zero, the culture is shifting.
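The waiver and annotation patterns above can be sketched in a few lines of shell. The .pii-allow format used here (one "path entity-type" pair per line, # for comments) is an assumption for illustration, as are the is_waived and annotate_hit helper names. The ::error line uses GitHub Actions' workflow-command syntax, which attaches the message to the file as an annotation and is a lighter-weight alternative to posting a full PR comment.

```shell
# is_waived FILE TYPE [ALLOWFILE]
# Succeeds only if the allow file lists this exact path and entity type.
# Hypothetical format, one waiver per line:
#   test/fixtures/users.json SSN
# Paths containing spaces are not supported by this simple format.
is_waived() {
  [ -f "${3:-.pii-allow}" ] || return 1
  awk -v f="$1" -v t="$2" '
    /^[[:space:]]*#/ || NF < 2 { next }      # skip comments and blank lines
    $1 == f && $2 == t { found = 1; exit }   # exact match only, no globs
    END { exit !found }
  ' "${3:-.pii-allow}"
}

# annotate_hit FILE TYPE
# Emits a GitHub Actions error annotation. It names the entity type but
# never echoes the matched text, so the PII itself stays out of CI logs.
annotate_hit() {
  printf '::error file=%s,title=PII detected::%s found; remove it or add a reviewed waiver\n' "$1" "$2"
}

# Usage inside a scan script, for each (file, type) finding:
#   is_waived "$file" "$type" || { annotate_hit "$file" "$type"; FAIL=1; }
```

Keeping waivers as exact path and type pairs means every exception shows up in code review as a one-line diff, which is the property that makes them safer than a global skip.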

The bigger picture

Treating compliance as code is the same shift the industry already made with security and infrastructure. The argument is the same too: catch problems where they originate, not after they've propagated through three production deploys.

You don't need a sweeping organizational change to start. A pre-commit hook this week, a PR check next week, a policy regression test the week after. Within a month, "did the build pass?" includes "did privacy pass?" — and the answer is automated.

If you want help wiring this into a specific pipeline (or want a CI policy tuned to your codebase), talk to our consulting team. We've done this for healthcare, financial services, and developer-tools companies — the patterns are the same; the policies and thresholds are what differ.