Redaction for Legal and E-Discovery: Privilege, Rule 9037, and the In-House Counsel's Pipeline
Legal work has more redaction in it than almost any other industry — and far less automation than the volume justifies. Court filings get hand-redacted by paralegals with black markers. Discovery productions get scrubbed in Relativity by associates billing $400/hour to draw rectangles over names. M&A due diligence rooms get sanitized one document at a time. The result is a category that spends enormous sums on a problem that's largely solvable with software.
This post is the third in our vertical series — after the HIPAA Safe Harbor blueprint and the financial-services breakdown. The legal context maps cleanly onto the same architectural patterns: identify what's sensitive, transform it consistently, audit the result. The specifics differ in three ways — the regulatory framework, the entity set, and the consequence of getting it wrong.
The legal redaction landscape
Redaction in legal work is concentrated in four contexts:
- Court filings. Most directly, FRCP 5.2 and the parallel Bankruptcy Rule 9037 (FRBP 9037) mandate redaction of specific identifiers (SSNs and taxpayer IDs to last four digits, financial account numbers to last four digits, birthdates to year only, minors' names to initials) before filing. State-court rules generally mirror this.
- E-discovery production. Under FRCP 26(b) and 34, parties produce documents responsive to discovery requests. Anything privileged or protected has to be withheld or redacted, and a privilege log has to justify each withholding. ESI production multiplies the volume; modern matters routinely involve millions of documents.
- Privilege review. Attorney-client and work-product privilege protections require redacting privileged content before disclosure. Inadvertent waiver is a real risk; FRE 502 provides a clawback safety net but doesn't excuse the original lapse.
- M&A due diligence. Buyers want to see operational data; sellers want to limit exposure pre-close. Data rooms get populated with redacted versions of customer lists, employee records, contracts, and financial statements.
Adjacent to these: settlement-agreement preparation (often public-records-eligible), FOIA response redaction (government agencies), and class-action notice preparation (named plaintiffs only; class members anonymized).
The identifier set, mapped
Legal redaction's identifier set partially overlaps with HIPAA's and GLBA's, with a few legal-specific additions:
| Identifier | Source rule | Philterd handling |
|---|---|---|
| Social Security Numbers (last 4 visible) | FRCP 5.2(a), FRBP 9037(a) | Built-in SSN detector with mask-to-last-4 strategy |
| Taxpayer ID numbers (last 4 visible) | FRCP 5.2(a), FRBP 9037(a) | Built-in TIN detector |
| Financial account numbers (last 4 visible) | FRCP 5.2(a), FRBP 9037(a) | Custom identifier filter with mask strategy |
| Birthdates (year only) | FRCP 5.2(a), FRBP 9037(a) | Date filter with year-only redaction |
| Minors' names (initials only) | FRCP 5.2(a), FRBP 9037(a) | NER + initials-replacement strategy; dictionary-driven for known minors |
| Home addresses | State court rules, privacy orders | NER + address detector with mask or replace |
| Personal email and phone | State court rules, sealed-document orders | Built-in detectors |
| Driver's license / passport | State court rules, protective orders | Built-in driver's license + passport detectors |
| Privileged communications | FRE 501, FRE 502, work-product doctrine | Custom dictionary (attorney names + privilege phrases); confidence-gated review queue |
| Trade secrets / confidential business info | Protective orders, NDAs | Custom dictionary; case-specific |
| Patient health information (in personal-injury cases) | HIPAA + protective order | Healthcare lens on PhEye; Safe Harbor patterns apply |
The structured identifiers (SSN, TIN, account numbers, dates) are essentially "solved" — Phileas's built-in detectors handle them with high accuracy. The harder problem is in the unstructured material: privileged communications, trade secrets, case-specific confidential information. That's where domain-specific tuning (and frequently a custom-trained lens) earns its keep.
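To make the "mask to last four" and "year only" strategies concrete, here is a minimal sketch in plain Python. It is not how Phileas implements its detectors; the patterns, the `XXX-XX-` mask format, and the function names are assumptions chosen for illustration, and the date pattern treats every MM/DD/YYYY date as a birthdate, which a real policy would not do.

```python
import re

# Illustrative patterns only; real filings contain many more formats.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")
# Dates written as MM/DD/YYYY. Assumed here to be birthdates.
DATE_RE = re.compile(
    r"\b(?:0?[1-9]|1[0-2])/(?:0?[1-9]|[12]\d|3[01])/((?:19|20)\d{2})\b"
)


def mask_ssn_to_last4(text: str) -> str:
    """Mask the first five digits of each SSN, keeping the last four."""
    return SSN_RE.sub(lambda m: "XXX-XX-" + m.group(1), text)


def reduce_dates_to_year(text: str) -> str:
    """Replace each matched date with its four-digit year."""
    return DATE_RE.sub(lambda m: m.group(1), text)


def apply_rule_5_2(text: str) -> str:
    """Apply both transformations in sequence."""
    return reduce_dates_to_year(mask_ssn_to_last4(text))
```

Calling `apply_rule_5_2("Debtor SSN 123-45-6789, born 04/17/1982.")` leaves the last four SSN digits and the birth year visible and removes the rest.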
Architecture 1: court-filing redaction
The simplest workflow. An attorney drafts a filing; before submission, it runs through a redaction pass to enforce Rule 5.2 / Rule 9037 compliance. Most law firms today do this manually with Acrobat's redaction tools; the volume is small enough per filing that automation feels like overkill — until you measure the partner time spent reviewing paralegal redactions and realize otherwise.
    Attorney draft ──▶ Philter (Rule 5.2 / 9037 policy)
                              │
                              ▼
                    Redacted draft + report
                              │
                              ▼
                   Paralegal review of report
                              │
                              ▼
                         Final filing

The key non-obvious feature: the report. Philter returns not only the redacted document but a structured list of what was redacted, where, and with what confidence. The paralegal review becomes "verify the 47 things the system caught and flag anything it missed" instead of "read the entire document looking for things to redact." That's an order-of-magnitude time reduction, and the audit trail goes with the filing.
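The shape of such a report can be sketched as follows. This is an assumed structure for illustration, not Philter's actual output format: the `Finding` fields and the `redact_with_report` helper are hypothetical, and the sketch assumes findings do not overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One detected span (hypothetical structure)."""
    entity_type: str
    start: int          # character offset, inclusive
    end: int            # character offset, exclusive
    confidence: float
    replacement: str


def redact_with_report(text, findings):
    """Return (redacted_text, report) for non-overlapping findings."""
    redacted = text
    # Apply replacements right to left so earlier offsets stay valid.
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        redacted = redacted[:f.start] + f.replacement + redacted[f.end:]
    # Lowest confidence first: those are the items a reviewer checks first.
    report = [
        {
            "type": f.entity_type,
            "start": f.start,
            "end": f.end,
            "confidence": f.confidence,
            "replacement": f.replacement,
        }
        for f in sorted(findings, key=lambda f: f.confidence)
    ]
    return redacted, report
```

Sorting the report by ascending confidence is a design choice for the review step: the paralegal's attention goes first to the redactions the system was least sure about.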
For the bankruptcy-specific workflow, this matches the case study already on the Philter product page: a law firm processing Microsoft Word bankruptcy filings, with Philter wired to a Windows shared drive via the Philter Toolbox so that saved documents are processed automatically. In that case study's sample data, Philter identified 100% of the SSNs, TINs, financial account numbers, and birthdates; juvenile names were caught via dictionary plus NER.
Architecture 2: e-discovery production
This is where the volume problem becomes serious. A typical commercial-litigation matter involves a few hundred thousand to tens of millions of documents pulled from custodian email accounts, file shares, chat platforms, and structured systems. After culling and relevance review, what remains has to be produced in a Bates-stamped, redacted form.
Most matters use Relativity, Logikcull, or one of the other e-discovery platforms. These tools handle the workflow (custodian management, search-term review, production tracking) extremely well, but their built-in redaction is shape-based — an associate draws a box on a TIFF page. For documents that need entity-aware redaction (SSNs, patient names in a healthcare matter, account numbers in a financial dispute), an automated pre-processing pass through Philter dramatically reduces what hits human review.
    Custodian sources ──▶ Collection ──▶ Relativity (culling, review)
                                                │
                                                ▼
                                     Responsive document set
                                                │
                                                ▼
                                   Philter pre-redaction pass
                                    (matter-specific policy)
                                                │
                                                ▼
                                Pre-redacted documents + report
                                                │
                                                ▼
                               Associate review of flagged items only
                                       (low-confidence first)
                                                │
                                                ▼
                                  Production set (Bates-stamped)

The matter-specific policy is the part that takes domain expertise. A patent dispute redacts different things than a securities class action, which redacts different things than an employment matter. Most of our legal-vertical consulting work starts here: walk through the protective order, the production agreements, and the privilege framework with the case team, then encode it as a Philter policy file.
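One way to picture "encoding it as a policy" is a baseline of rule-mandated identifiers plus per-matter overrides. The sketch below uses a made-up dictionary schema, not the Philter policy file format; the entity names, strategy names, and matter types are placeholders.

```python
# Baseline drawn from the Rule 5.2 / 9037 identifiers (placeholder names).
BASELINE = {
    "ssn": "mask_last4",
    "tin": "mask_last4",
    "account_number": "mask_last4",
    "birthdate": "year_only",
}

# Hypothetical per-matter additions agreed with the case team.
MATTER_OVERRIDES = {
    "employment": {"home_address": "redact", "health_info": "redact"},
    "patent": {"trade_secret_term": "redact"},
}


def build_policy(matter_type: str) -> dict:
    """Combine the baseline with any overrides for this matter type."""
    policy = dict(BASELINE)  # copy so the baseline is never mutated
    policy.update(MATTER_OVERRIDES.get(matter_type, {}))
    return policy


def strategy_for(policy: dict, entity_type: str) -> str:
    """Entity types the policy does not mention go to human review."""
    return policy.get(entity_type, "review")
```

Defaulting unknown entity types to `"review"` rather than `"keep"` reflects the cost asymmetry in production: a needless review is cheap, a missed redaction is not.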
Architecture 3: privilege review
Privilege is the highest-stakes category in any production. Inadvertent waiver can be catastrophic, and the FRE 502 clawback is a safety net, not an excuse for the original lapse. Traditional privilege review combines keyword-based culling with manual attorney review of every flagged document.
The automated layer here isn't a replacement for attorney judgment — it's a triage. A custom Phileas dictionary loaded with attorney names, law-firm names, and privilege-flag phrases ("attorney work product," "subject to attorney-client privilege," "for purposes of obtaining legal advice") identifies candidate-privileged documents with high recall. The output is a triage queue:
- High-confidence privileged. Multiple flags hit; almost certainly privileged. Withhold; log; brief partner review only.
- Possible privilege. Single flag or ambiguous context. Full associate review.
- No flags. Almost certainly not privileged. Spot-check only.
The point isn't to let the software make the call — it's to make sure no document reaches production without some attention paid to the privilege question. The audit trail (which documents were flagged, why, and who reviewed) is the deliverable when opposing counsel asks how you handled privilege review.
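The three-tier triage can be sketched in a few lines. This is a simplified illustration, not the Phileas dictionary mechanism: the `triage` function is hypothetical, the phrases are the examples quoted above, and the thresholds (two or more hits for the top tier, exactly one for the middle tier) are an assumed reading of "multiple flags" and "single flag".

```python
import re

PRIVILEGE_PHRASES = [
    "attorney work product",
    "subject to attorney-client privilege",
    "for purposes of obtaining legal advice",
]


def triage(text, attorney_names):
    """Return (tier, hits) based on how many dictionary entries match."""
    lowered = text.lower()
    hits = [p for p in PRIVILEGE_PHRASES if p in lowered]
    for name in attorney_names:
        # Whole-word match so a short name does not fire inside other words.
        if re.search(r"\b" + re.escape(name.lower()) + r"\b", lowered):
            hits.append(name)
    if len(hits) >= 2:
        tier = "high_confidence_privileged"
    elif len(hits) == 1:
        tier = "possible_privilege"
    else:
        tier = "no_flags"
    return tier, hits
```

Returning the matched entries alongside the tier matters for the audit trail: the record shows not just that a document was flagged but why.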
Architecture 4: M&A due diligence rooms
Sell-side counsel populates a virtual data room with operational data: customer lists, employee records, vendor contracts, financial statements. The buyer needs enough detail to evaluate; the seller wants to limit exposure of trade secrets and PII pre-close. The redaction pass between "internal documents" and "data-room versions" is exactly what Philter is built for.
Beyond the standard PII categories (employee SSNs in HR files, customer NPPI in CRM exports, patient data in healthcare contracts), the case-specific redaction is the harder part:
- Trade-secret terminology that shouldn't enter the data room until after the LOI
- Named customer references in contracts (sometimes specific marquee customers are the asset being sold; sometimes they're protected by NDA)
- Employee names in compensation data (often retained at director level and above, redacted below)
- Active matter references in litigation logs (privilege concerns)
The pattern is custom-dictionary-driven: per-deal Phileas configurations encode the specific terms, names, and identifiers to remove. The deal closes; the configuration retires. Repeatable across deals; tunable per deal.
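A per-deal dictionary pass can be approximated with a compiled regular expression. The sketch below is a stand-in for a Phileas custom dictionary, not its implementation; the function names and the `[REDACTED]` placeholder are assumptions, and the word-boundary matching only behaves as intended for terms that begin and end with a letter or digit.

```python
import re


def compile_dictionary(terms):
    """Build one case-insensitive pattern from a deal-specific term list."""
    if not terms:
        raise ValueError("dictionary must contain at least one term")
    # Longest terms first so a longer name wins over a prefix of itself.
    ordered = sorted(terms, key=len, reverse=True)
    body = "|".join(re.escape(t) for t in ordered)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


def redact_terms(text, pattern, placeholder="[REDACTED]"):
    """Return (redacted_text, number_of_replacements)."""
    return pattern.subn(placeholder, text)
```

Because the term list is the only deal-specific input, retiring a deal's configuration amounts to archiving that list alongside the rest of the audit record.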
What the audit looks like
For matters that go to trial or that face later scrutiny (post-close M&A disputes, opposing-counsel motions about discovery completeness), the audit story is the artifact that matters:
- The policy file — what entity types were redacted, with what strategy, under what conditions. Version-controlled.
- The discovery report — from Phinder, a per-source inventory of what entity types appeared in the source data.
- The redaction report — per-document, what was redacted and where. Generated automatically by Philter.
- The precision/recall validation — from Philter Scope against a representative gold-standard sample. Proves the policy was operating within documented tolerances.
- The privilege log — for any document withheld in whole, the legally-required justification.
This is the same artifact pattern we recommended for healthcare: continuous evidence of due care, not a moment-of-truth review.
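The precision/recall figures in that validation reduce to simple set arithmetic over redacted spans. The sketch below is a generic exact-match calculation, not Philter Scope's method; representing a span as a `(doc_id, start, end, entity_type)` tuple is an assumption for illustration.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over span tuples.

    Each span is assumed to be (doc_id, start, end, entity_type).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # With nothing predicted (or nothing to find), report 1.0 by convention.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall
```

In a redaction setting recall is usually the number to watch: a false positive costs a reviewer a moment, while a false negative is an identifier that reached the filing or the production set.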
What legal redaction usually doesn't need
One thing that comes up early in many legal-vertical conversations: image and video redaction. The Philterd toolkit is text-focused. Documents that arrive as images (scanned faxes, photographed exhibits) need OCR first; videos with sensitive content need separate tooling. We're happy to integrate with both as upstream steps, but Philter itself isn't an OCR engine or a video processor — it's the text-redaction step that comes after.
Similarly, automated privilege determination remains a human-judgment problem. Software can triage; software cannot decide. Many matters in 2026 use ML-assisted review (predictive coding) for relevance, but the privilege question still requires associate eyes on every borderline document.
The bottom line
Legal redaction is a category where the human time spent is disproportionate to the difficulty of the underlying problem. Rule 5.2 / 9037 identifier redaction is essentially solved by software; e-discovery pre-redaction can reduce associate review time by as much as an order of magnitude when configured well; privilege triage replaces full attorney review with prioritized attorney review. None of these replaces legal judgment; they free it up for the parts that actually require it.
If you're at a firm or in-house team that's running these workflows manually (or with the limited redaction tooling built into your e-discovery platform) and want to talk through the architecture for your specific practice, get in touch. Most engagements start with a precision/recall evaluation on a sample of your real documents — measured, not claimed — before committing to a deployment.