Redaction | DSAR, FOI and eDiscovery

Redaction should remove data, not cover it.

Redaction looks simple, which is why it goes wrong so often. The same small set of mistakes can turn a redacted document into a data breach.

Simply Discover is built to prevent those failures across DSAR, FOI and eDiscovery workflows: text is removed, hidden OCR layers are stripped, image pixels are destroyed and redactions stay on target through file conversion.

Data removed, not covered
Hidden text layers stripped
Works on scans and photos
Survives file conversion

Why it matters

Redaction fails invisibly.

A document can look redacted on screen while sensitive content remains recoverable from the raw file, the searchable text layer or the image pixels. The four failures below are real-world ways redaction leaks information. Our approach treats each one as a release-blocking failure mode.

Issue 01

The black box is just a sticker

The original text and image are still in the file, underneath the rectangle.

Prevented
How we address it

We do not draw a box over the content. We rebuild the document without it. Redacted text characters are dropped entirely and image pixels are overwritten in the image data itself. Delete the overlay in a PDF editor and there is nothing beneath it; select the text and there is nothing to copy.

Figure: the original image, and the pixels read back out of the redacted file.

The redacted region is solid black in the raw pixel data while everything outside the box is preserved. The content is overwritten, not visually hidden.
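To make "overwritten in the image data" concrete, here is a minimal sketch. It assumes a hypothetical 8-bit grayscale buffer in row-major order; `Box`, `burn_region` and `region_is_black` are illustrative names, not part of Simply Discover's pipeline. It shows the general technique only: write zeros into the stored pixels, then read them back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Half-open pixel rectangle: left/top inclusive, right/bottom exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def burn_region(pixels: bytearray, width: int, height: int, box: Box) -> None:
    """Overwrite every pixel inside `box` with black (0), in place.

    `pixels` is an assumed 8-bit grayscale buffer, one byte per pixel.
    """
    if len(pixels) != width * height:
        raise ValueError("buffer size does not match width * height")
    # Clamp the box to the image so an overhanging box cannot write out of bounds.
    left, right = max(0, box.left), min(width, box.right)
    top, bottom = max(0, box.top), min(height, box.bottom)
    if left >= right or top >= bottom:
        return  # the box does not overlap the image
    for y in range(top, bottom):
        start = y * width + left
        pixels[start:start + (right - left)] = bytes(right - left)


def region_is_black(pixels: bytearray, width: int, box: Box) -> bool:
    """Read the pixels back and confirm the whole region is 0.

    Assumes `box` lies inside the image.
    """
    return all(
        pixels[y * width + x] == 0
        for y in range(box.top, box.bottom)
        for x in range(box.left, box.right)
    )
```

After `burn_region` runs, the original values no longer exist in the buffer, so there is nothing for a later viewer or editor to uncover. Drawing a rectangle in a separate layer would leave the buffer untouched.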

Issue 02

The hidden text layer survives

Scanned documents can carry invisible, searchable OCR text. Black out the picture and the text can still be copyable.

Prevented
How we address it

We remove both the visible pixels and the invisible searchable text behind them. A redaction that hides the image but leaves OCR text intact is a silent leak. Ours strips both, so the content cannot be searched, selected or copied out.
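The text-layer half of that can be sketched as follows. This is an illustration under stated assumptions, not the product's code: the OCR layer is modelled as a list of words with bounding boxes, and `Rect`, `OcrWord`, `strip_text_layer` and `extract_text` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Rect") -> bool:
        """True if the two rectangles overlap by any positive area."""
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


@dataclass(frozen=True)
class OcrWord:
    text: str
    bounds: Rect


def strip_text_layer(words: list[OcrWord], redaction: Rect) -> list[OcrWord]:
    """Return a new text layer without any word that overlaps the redaction."""
    return [w for w in words if not w.bounds.intersects(redaction)]


def extract_text(words: list[OcrWord]) -> str:
    """Simulate copy/paste or search: join whatever text is still stored."""
    return " ".join(w.text for w in words)
```

The sketch drops a word if any part of it overlaps the redaction, on the assumption that a partially covered word should not stay searchable. A real pipeline would pair this with the pixel overwrite so both layers agree.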

Before - secret visible and copyable

Scanned request record

SECRET-STRING-1439

After - burned out, text layer removed

Scanned request record

OCR layer: marker not found

Can redacted content be recovered from... | Before | After
the raw file bytes | Yes | No
the copy/paste text layer | Yes | No
the visible image pixels | Yes | No - overwritten

Issue 03

Scans and photos only get a cosmetic box

Many tools can only truly redact text-based PDFs. Images and scanned documents get a box that hides nothing in the underlying file.

Prevented
How we address it

Images and scanned documents are handled the same way as everything else: the pixels inside the redaction are destroyed in the image data, not merely hidden. Issue 01 demonstrates the pixel-level burn-in, and Issue 04 shows that the same model survives conversion.

This matters for DSAR and eDiscovery because disclosure packs often contain mixed material: native PDFs, scans, screenshots, photos and converted image uploads. The redaction method has to work across all of them.

Issue 04

The redaction drifts after conversion

Uploaded images are converted to PDF first, which can rescale and reposition them so a box ends up covering the wrong area.

Prevented
How we address it

Our redaction is anchored to the document after conversion, so it tracks the content wherever conversion places it. In the test model, an upload is scaled and placed on a page with margins, yet the redaction still removes the exact target band and nothing else.

Uploaded photo - redact the middle band only: Keep / Secret / Keep
After - middle removed, top and bottom preserved: Keep / Removed / Keep

The sensitive strip is removed to the pixel while the content above and below remains intact. The redaction follows the converted document, not the upload's original coordinate assumptions.
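The geometry behind this can be sketched in a few lines. The example assumes conversion scales the image uniformly and centres it inside the page margins, with a top-left page origin. A real converter may place images differently, and `Placement`, `fit_with_margin` and `page_rect_to_pixels` are hypothetical names. The point is that the redaction is expressed in page coordinates and mapped through the actual placement, not taken from the upload's original coordinates.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Where conversion put the image on the page (page units, top-left origin)."""
    scale: float     # page units per image pixel
    offset_x: float  # page x of the image's left edge
    offset_y: float  # page y of the image's top edge


def fit_with_margin(img_w: int, img_h: int,
                    page_w: float, page_h: float, margin: float) -> Placement:
    """Assumed conversion rule: scale uniformly to fit inside the margins, then centre."""
    scale = min((page_w - 2 * margin) / img_w, (page_h - 2 * margin) / img_h)
    offset_x = (page_w - img_w * scale) / 2
    offset_y = (page_h - img_h * scale) / 2
    return Placement(scale, offset_x, offset_y)


def page_rect_to_pixels(rect: tuple[float, float, float, float],
                        placement: Placement,
                        img_w: int, img_h: int) -> tuple[int, int, int, int]:
    """Map a page-space rectangle (x0, y0, x1, y1) to an image pixel box.

    Rounds outward so a partially covered pixel is included, then clamps
    to the image bounds. Returns (left, top, right, bottom), half-open.
    """
    x0, y0, x1, y1 = rect
    s = placement.scale
    left = max(0, math.floor((x0 - placement.offset_x) / s))
    top = max(0, math.floor((y0 - placement.offset_y) / s))
    right = min(img_w, math.ceil((x1 - placement.offset_x) / s))
    bottom = min(img_h, math.ceil((y1 - placement.offset_y) / s))
    return left, top, right, bottom
```

For a 250 x 300 pixel upload on a 600 x 800 page with a margin of 50, this placement has a scale of 2 and offsets of (50, 100). A page-space band from y = 300 to y = 500 then maps back to image rows 100 to 200, the middle third of the upload. Rounding outward is a design choice in this sketch: where a boundary falls mid-pixel, it favours removing slightly more over leaving a sliver of the target behind.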

How we keep it honest

Every release is held to the same bar.

The assurance model is recovery-based. It is not enough for a black rectangle to appear; the known marker must be unrecoverable from every layer that matters.

Recovery test

Tested by attempted recovery

We plant a known marker, redact it, then try to recover it from the bytes, the text layer and the pixels. The check only passes if every recovery attempt fails.
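The shape of such a check can be sketched as below. This is a toy model, not the automated suite itself: `ToyDocument`, `redact`, `recovery_attempts` and `check_passes` are hypothetical names, and the "file" is a text layer plus a grayscale buffer.

```python
from dataclasses import dataclass

MARKER = "SECRET-STRING-1439"  # synthetic test value


@dataclass
class ToyDocument:
    """Hypothetical stand-in for an exported file: one text layer, one image."""
    text_layer: str
    pixels: bytearray  # 8-bit grayscale, row-major
    width: int

    def to_bytes(self) -> bytes:
        """Serialise the whole document, as a raw-bytes search would see it."""
        return self.text_layer.encode("utf-8") + b"\x00" + bytes(self.pixels)


def redact(doc: ToyDocument, marker: str, rows: range) -> ToyDocument:
    """Rebuild the document without the marker text and with `rows` blacked out."""
    pixels = bytearray(doc.pixels)
    for y in rows:
        pixels[y * doc.width:(y + 1) * doc.width] = bytes(doc.width)
    return ToyDocument(doc.text_layer.replace(marker, ""), pixels, doc.width)


def recovery_attempts(doc: ToyDocument, marker: str, rows: range) -> dict[str, bool]:
    """Report, per layer, whether anything was recovered."""
    region = [doc.pixels[y * doc.width + x] for y in rows for x in range(doc.width)]
    return {
        "raw bytes": marker.encode("utf-8") in doc.to_bytes(),
        "text layer": marker in doc.text_layer,
        "pixels": any(p != 0 for p in region),  # any surviving non-black pixel
    }


def check_passes(doc: ToyDocument, marker: str, rows: range) -> bool:
    """The check passes only if every recovery attempt fails."""
    return not any(recovery_attempts(doc, marker, rows).values())
```

Running `recovery_attempts` on the unredacted toy document returns `True` for all three layers. After `redact`, all three are `False` and `check_passes` returns `True`. The check fails if the marker survives in even one layer.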

Real path

The real pipeline

Tests use the same upload, conversion, redaction and export path our customers use: real services, real storage and real files out, not a simplified stand-in.

Repeatability

Re-checked on every build

The proofs are automated and repeatable, so release confidence is tied to the current product rather than a one-off demo file.

Evidence in the source proof page is produced by Simply Discover's automated redaction test suite for the manual-redaction export path. Markers are synthetic test values. Recovery checks cover overlay removal, hidden text-layer extraction and pixel sampling of the redacted region.

Redaction you can defend in DSAR, FOI and eDiscovery work.

See how redaction decisions, review state, export packs and production controls sit inside the wider Simply Discover workflow.