OpenAI Releases Open-Weight Model for Detecting Personal Data in Text
OpenAI on Thursday released an open-weight model called Privacy Filter that detects and redacts personally identifiable information (PII) from text, making the privacy-focused tool freely available to developers and enterprises.
The model is designed to identify PII — including names, addresses, phone numbers, email addresses and financial information — with what the company described as state-of-the-art accuracy. By releasing the model’s weights publicly, OpenAI is allowing organizations to run the tool on their own infrastructure without sending sensitive data to external servers.
The release marks a notable step for OpenAI, which has faced recurring criticism over its data handling practices. Privacy Filter addresses a growing need among enterprises deploying large language models in production environments, where inadvertent exposure of personal data remains a persistent concern.
PII detection and redaction has become a critical requirement as companies integrate AI systems into workflows that process customer records, legal documents, medical files and financial statements. Existing solutions have largely relied on rule-based pattern matching, which can miss identifiers whose meaning depends on context, or on proprietary APIs, which require sending data to an outside provider and raise data sovereignty concerns.
By publishing the model's weights, OpenAI enables organizations to deploy Privacy Filter within their own security perimeters, a key consideration for industries subject to strict data protection regulations such as HIPAA in U.S. healthcare and GDPR in the European Union.
The move also positions OpenAI alongside competitors that have embraced open-weight releases as a strategy for building developer adoption and ecosystem trust. Meta, Mistral and other AI labs have released model weights for various applications, though few have targeted privacy compliance specifically.
Privacy Filter is available immediately through OpenAI’s public repositories. The company said the model can be integrated into existing data pipelines and used as a preprocessing step before text is sent to any language model, including those from other providers.
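The announcement does not describe Privacy Filter's programming interface, so the following is only a minimal sketch of the preprocessing pattern the company describes, assuming a generic detector that returns labelled character spans. The names `Span`, `regex_detector`, `redact` and `preprocess_then_send` are hypothetical, and the regex-based detector is a placeholder standing in for the model, not Privacy Filter itself.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Span:
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # entity type, e.g. "EMAIL" or "PHONE"


# Hypothetical detector signature: text in, labelled spans out.
Detector = Callable[[str], List[Span]]

# Placeholder rule-based detector so the sketch runs. This is NOT Privacy
# Filter; a model-based detector with the same signature would replace it.
_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def regex_detector(text: str) -> List[Span]:
    spans = []
    for label, pattern in _PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append(Span(m.start(), m.end(), label))
    return spans


def redact(text: str, detector: Detector) -> str:
    """Replace each detected span with a [LABEL] placeholder."""
    spans = sorted(detector(text), key=lambda s: s.start)
    out, cursor = [], 0
    for span in spans:
        if span.start < cursor:  # skip spans overlapping one already redacted
            continue
        out.append(text[cursor:span.start])
        out.append(f"[{span.label}]")
        cursor = span.end
    out.append(text[cursor:])
    return "".join(out)


def preprocess_then_send(text: str, detector: Detector,
                         send: Callable[[str], str]) -> str:
    """Redact locally, then pass only the redacted text to a downstream model."""
    return send(redact(text, detector))


if __name__ == "__main__":
    print(redact("Contact jane.doe@example.com or 555-123-4567.", regex_detector))
    # → Contact [EMAIL] or [PHONE].
```

Because redaction runs before `send` is called, the downstream model, whichever provider operates it, only ever receives the placeholder text.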
OpenAI did not disclose the model’s parameter count or the datasets used for training.