From Phileas to Philter: The Evolution of Our Open Source Engine

In software, there's a saying that "nothing is ever finished — it's just released." Looking back at the trajectory of our privacy engine, that sentiment couldn't be more accurate. What began as a focused open source experiment called Phileas has evolved into Philter — the core of a comprehensive enterprise PII suite used by healthcare providers, government agencies, and global tech firms.

Understanding the history of this engine isn't just a trip down memory lane — it's an explanation of why the software is as stable and context-aware as it is today. Here is how we moved from simple pattern matching to high-velocity, hybrid privacy intelligence.

The Phileas era: solving the "simple" problems

Every story has a beginning, and ours was a project named Phileas. At the time, the industry was struggling with a basic problem: how do we find PII in text without paying a fortune for black-box cloud services?

The early versions of Phileas were built primarily on deterministic logic. We focused on creating the most robust set of regular expressions and lookup dictionaries possible. It was fast and effective for structured data like Social Security Numbers, but we quickly hit the ceiling of what regex can do. While patterns could find a 10-digit number, they couldn't tell us whether that number was a patient's phone number or the serial number of a piece of equipment.
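To make that limitation concrete, here is a minimal sketch using only the standard java.util.regex package. It is not the Phileas API: the class name, pattern names, and sample strings are all hypothetical and chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: NOT the Phileas API. All names here are hypothetical.
final class RegexPiiSketch {

    // U.S. Social Security Number in its common dashed form, e.g. 123-45-6789.
    static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");

    // Any standalone run of exactly ten digits.
    static final Pattern TEN_DIGITS = Pattern.compile("\\b\\d{10}\\b");

    // Returns every substring of text that matches the given pattern.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String phone = "Patient callback number: 3045551234";
        String serial = "Infusion pump serial: 3045551234";
        // The pattern sees the same ten digits in both sentences.
        System.out.println(findAll(TEN_DIGITS, phone));
        System.out.println(findAll(TEN_DIGITS, serial));
    }
}
```

Both calls return the same single match. The pattern alone gives no way to decide which of the two is PII, and that is the gap context-aware models are meant to close.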

Milestone 1: the transition to context-aware NLP

As our users moved into more complex datasets — especially healthcare — we knew Phileas had to grow up. This led to the birth of Philter.

The major engineering shift was moving beyond the pattern and into the sentence. We integrated specialized NLP models that could read text in context, letting the engine differentiate between "Jordan" as a person's name and "Jordan" as a country. By combining our legacy regex patterns with these newer, context-aware models, we created the hybrid approach that Philter is known for today.
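The shape of that hybrid approach can be sketched in a few lines of Java. Everything below is an assumption made for illustration: the ContextModel interface, the cue-word lists, and the {{PERSON}} placeholder are hypothetical, and the toy "model" only looks at the previous word, where a real NLP model would read the whole sentence. It is not how Philter is implemented.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative only: NOT the Philter or Phileas API. All names are hypothetical.
final class HybridSketch {

    enum Label { PERSON, LOCATION, UNKNOWN }

    // Stand-in for a trained NLP model: labels a token using its surroundings.
    interface ContextModel {
        Label classify(String[] tokens, int index);
    }

    static final Set<String> PERSON_CUES =
            new HashSet<>(Arrays.asList("dr.", "mr.", "ms.", "patient"));
    static final Set<String> PLACE_CUES =
            new HashSet<>(Arrays.asList("in", "to", "from"));

    // Toy model: inspects only the previous token. A real model would not be this naive.
    static final ContextModel TOY_MODEL = (tokens, i) -> {
        if (i == 0) {
            return Label.UNKNOWN;
        }
        String prev = tokens[i - 1].toLowerCase(Locale.ROOT);
        if (PERSON_CUES.contains(prev)) {
            return Label.PERSON;
        }
        if (PLACE_CUES.contains(prev)) {
            return Label.LOCATION;
        }
        return Label.UNKNOWN;
    };

    // Step 1: the dictionary proposes candidates (the deterministic layer).
    // Step 2: the model decides whether each candidate is really a person.
    static String redactPeople(String sentence, Set<String> names, ContextModel model) {
        String[] tokens = sentence.split(" ");
        for (int i = 0; i < tokens.length; i++) {
            // Strip punctuation so "Jordan." still matches the dictionary entry.
            String bare = tokens[i].replaceAll("[^\\p{L}]", "");
            if (names.contains(bare) && model.classify(tokens, i) == Label.PERSON) {
                tokens[i] = tokens[i].replace(bare, "{{PERSON}}");
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>(Arrays.asList("Jordan"));
        System.out.println(redactPeople("Dr. Jordan flew to Jordan.", names, TOY_MODEL));
    }
}
```

With the sample sentence, the first "Jordan" follows a person cue and is replaced, while the second follows "to" and is left alone. The design point is the division of labor: the cheap deterministic layer nominates candidates, and the context layer has the final say.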

Milestone 2: optimizing for the enterprise pipeline

As Philter began landing in major data pipelines, we faced a new challenge: performance at scale.

It's one thing to redact a single PDF; it's another to redact 50 million log lines per hour in a streaming Kafka environment. We spent months in a performance-first mindset, optimizing the Java-based engine to minimize memory overhead and maximize throughput. We moved away from heavy, general-purpose libraries and built custom, streamlined components designed for one job: finding and scrubbing PII as fast as possible.
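As a rough illustration of what "performance-first" can mean at the level of a single Java method, the sketch below applies two common habits: compile each pattern once rather than per line, and process input as a stream so memory use does not grow with input size. This is a generic example under those assumptions, not a description of Philter's internals; the class name and the [SSN] replacement token are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.regex.Pattern;

// Illustrative only: a generic streaming pattern, NOT Philter's implementation.
final class StreamingScrubSketch {

    // Compiled once and reused. Pattern instances are immutable and thread-safe.
    private static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");

    // Reads one line at a time, so memory stays flat regardless of input size.
    // Returns the number of lines processed.
    static long scrub(Reader in, Writer out) {
        try {
            BufferedReader reader = new BufferedReader(in);
            BufferedWriter writer = new BufferedWriter(out);
            long lines = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(SSN.matcher(line).replaceAll("[SSN]"));
                writer.newLine();
                lines++;
            }
            writer.flush();
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a Kafka deployment the reader and writer would be replaced by a consumer and a producer, but the same two ideas carry over: no per-record setup work, and no buffering of the whole stream.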

Phileas as a global dependency (the Graylog connection)

While Philter serves as the turnkey solution, the underlying Phileas engine remains a powerful, standalone Java library. Because it is open source, developers can include it as a direct dependency in their own software to build custom privacy features.

A prime example is its integration into the Graylog ecosystem. Security and operations teams use Phileas-powered logic to scrub PII from log messages before they're stored or indexed. By integrating the engine directly into log processing pipelines, organizations can ensure sensitive developer notes or user data never hit long-term storage — solving the privacy problem at the source. We covered this integration in more detail in Phileas in Graylog — Removing PII from Logs.
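The "scrub before storage" idea can be sketched as a small wrapper around whatever component persists log messages. The LogSink interface, the simplified email pattern, and the <email> token below are hypothetical; they are not Graylog's plugin API or the Phileas API, and in a real integration the scrubber function would call into the actual library.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

// Illustrative only: NOT the Graylog or Phileas API. All names are hypothetical.
final class ScrubBeforeStoreSketch {

    // Stands in for an index, archive, or any other long-term store.
    interface LogSink {
        void store(String message);
    }

    // Wraps a sink so every message passes through the scrubber first.
    // Callers only ever see the wrapped sink, so raw text cannot reach storage.
    static LogSink scrubbing(LogSink downstream, UnaryOperator<String> scrubber) {
        return message -> downstream.store(scrubber.apply(message));
    }

    // Deliberately simplified email pattern, good enough for a demo.
    static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    static String scrubEmails(String message) {
        return EMAIL.matcher(message).replaceAll("<email>");
    }

    public static void main(String[] args) {
        LogSink sink = scrubbing(System.out::println, ScrubBeforeStoreSketch::scrubEmails);
        sink.store("login failed for jane.doe@example.com from 10.0.0.5");
    }
}
```

Placing the scrubber in front of the sink, rather than running it as a cleanup job afterward, means there is never a window in which unredacted text sits in storage.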

Looking ahead: a roadmap for both projects

One question we often get: "Now that Philter exists, what happens to Phileas?"

The answer is simple: they both continue to evolve. We have dedicated roadmaps for both projects, and we view them as complementary pieces of the same privacy mission.

Phileas will continue to grow as a high-performance Java library, with a focus on developer flexibility, new developer-centric features, and even deeper integrations for tools like Graylog.
Philter will continue to expand as a comprehensive enterprise suite, with plans for even more advanced AI models, broader regulatory support (like the EU AI Act), and enhanced reporting through tools like Philter Scope for policy benchmarking and Phield for real-time PII flow monitoring.

Whether you're a developer looking for a library dependency or a CISO looking for a zero-trust enterprise suite, you have access to the most advanced, battle-tested privacy tools on the market.

Why our history (and future) matters to you

When you deploy our software today, you aren't using "version 1.0" of a startup's new idea. You're using an engine that has been battle-tested through years of real-world application, millions of lines of data, and rigorous open source auditing.

We've already made the mistakes, found the edge cases, and optimized the bottlenecks. Both Phileas and Philter are products of evolution — shaped by the feedback of the open source community and the demands of enterprise security leads. And we're just getting started.

Curious about the engine powering Graylog and beyond? Check out Phileas on GitHub, or reach out to our team to see how our evolution can secure your data pipeline.