What makes redaction fail?

Common failures include drawing a cosmetic box over text, leaving hidden OCR text layers behind, failing to destroy pixels in scanned documents and losing coordinate accuracy after file conversion.

How does Simply Discover prevent cosmetic redaction?

Simply Discover rebuilds the document without the redacted content. Text characters are dropped, hidden text layers are removed and image pixels are overwritten rather than merely covered.

Does Simply Discover handle scanned documents and photos?

Yes. Scans, photos and image-based documents are handled by destroying the underlying pixels in the redacted region, not by placing a rectangle over the visible page.

Redaction | DSAR, FOI and eDiscovery

Redaction should remove data, not cover it.

Redaction looks simple, which is why it goes wrong so often. The same small set of mistakes can turn a redacted document into a data breach.

Simply Discover is built to prevent those failures across DSAR, FOI and eDiscovery workflows: text is removed, hidden OCR layers are stripped, image pixels are destroyed and conversion does not move the target.

Data removed, not covered

Hidden text layers stripped

Works on scans and photos

Survives file conversion

[Illustration: a visible page containing SECRET-STRING, alongside a recovery check. Bytes, text layer and pixels are checked after export: no recovery.]

Why it matters

Redaction fails invisibly.

A document can look redacted on screen while sensitive content remains recoverable from the raw file, the searchable text layer or the image pixels. The four failures below are real-world ways redaction leaks information. Our approach treats each one as a release-blocking failure mode.

Issue 01

The black box is just a sticker

The original text and image are still in the file, underneath the rectangle.

Prevented

How we address it

We do not draw a box over the content. We rebuild the document without it. Redacted text characters are dropped entirely and image pixels are overwritten in the image data itself. Delete the overlay in a PDF editor and there is nothing beneath it; select the text and there is nothing to copy.

[Figure: the original image alongside the pixels read back out of the redacted file.]

The redacted region is solid black in the raw pixel data while everything outside the box is preserved. The content is overwritten, not visually hidden.
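The difference between covering and overwriting can be sketched in a few lines. This is an illustrative toy model, not Simply Discover's implementation: the image is assumed to be a plain grid of 8-bit grayscale values, and `burn_in`, `Raster` and `Box` are hypothetical names chosen for the example.

```python
from typing import List, Tuple

Raster = List[List[int]]         # rows of 8-bit grayscale values (assumed model)
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def burn_in(raster: Raster, box: Box, fill: int = 0) -> Raster:
    """Return a copy of the raster with every pixel inside the box set to fill.

    The original values inside the box are not kept anywhere in the result,
    so there is nothing underneath to reveal.
    """
    left, top, right, bottom = box
    out = [row[:] for row in raster]
    # Clamp the box to the image so an oversized box cannot raise IndexError.
    for y in range(max(top, 0), min(bottom, len(out))):
        row = out[y]
        for x in range(max(left, 0), min(right, len(row))):
            row[x] = fill
    return out


# A small synthetic image: 4 rows by 6 columns of distinct non-black values.
original = [[200 + x + y for x in range(6)] for y in range(4)]
redacted = burn_in(original, (1, 1, 4, 3))
```

Reading `redacted` back shows solid black inside the box and unchanged values outside it, which is the property the caption above describes. A cosmetic overlay, by contrast, would leave the underlying values in place.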

Issue 02

The hidden text layer survives

Scanned documents can carry invisible, searchable OCR text. Black out the picture and the text can still be copyable.

Prevented

How we address it

We remove both the visible pixels and the invisible searchable text behind them. A redaction that hides the image but leaves OCR text intact is a silent leak. Ours strips both, so the content cannot be searched, selected or copied out.

Before (secret visible and copyable): scanned request record showing SECRET-STRING-1439.

After (burned out, text layer removed): scanned request record. OCR layer: marker not found.

Can redacted content be recovered from...	Before	After
the raw file bytes	Yes	No
the copy/paste text layer	Yes	No
the visible image pixels	Yes	No - overwritten
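To make the text-layer row of the table concrete, here is a minimal sketch that assumes an OCR layer can be modelled as a list of text spans with bounding boxes. `TextSpan`, `intersects` and `strip_text_layer` are hypothetical names for illustration. The sketch drops whole spans that touch the redaction, which is a simplification; a production tool would likely work at character level.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


@dataclass(frozen=True)
class TextSpan:
    """One run of invisible OCR text and where it sits on the page (assumed model)."""
    text: str
    box: Box


def intersects(a: Box, b: Box) -> bool:
    """True when two boxes overlap by a non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def strip_text_layer(spans: List[TextSpan], redaction: Box) -> List[TextSpan]:
    """Keep only the spans that lie entirely outside the redaction box."""
    return [s for s in spans if not intersects(s.box, redaction)]


ocr_layer = [
    TextSpan("Scanned request record", (40, 30, 300, 50)),
    TextSpan("SECRET-STRING-1439", (40, 80, 220, 100)),
]
kept = strip_text_layer(ocr_layer, (30, 75, 230, 105))

# What a copy/paste or search would see after redaction.
extracted = " ".join(s.text for s in kept)
```

Blacking out the pixels alone would leave `ocr_layer` untouched, so the marker would still be searchable. Stripping the spans as well is what turns the "Yes" into a "No" for the copy/paste row.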

Issue 03

Scans and photos only get a cosmetic box

Many tools can only truly redact text-based PDFs. Images and scanned documents get a box that hides nothing in the underlying file.

Prevented

How we address it

Image and scanned documents are handled the same way as everything else: the pixels inside the redaction are destroyed in the image data, not merely hidden. Issue 01 demonstrates the pixel-level burn-in, and Issue 04 shows that the same approach survives conversion.

This matters for DSAR and eDiscovery because disclosure packs often contain mixed material: native PDFs, scans, screenshots, photos and converted image uploads. The redaction method has to work across all of them.

Issue 04

The redaction drifts after conversion

Uploaded images are converted to PDF first, which can rescale and reposition them so a box ends up covering the wrong area.

Prevented

How we address it

Our redaction is anchored to the document after conversion, so it tracks the content wherever conversion places it. In the test model, an upload is scaled and placed onto a page with margins, yet the redaction still removes the exact target band and nothing else.

Uploaded photo (redact the middle band only): Keep / Secret / Keep

After (middle removed, top and bottom preserved): Keep / Removed / Keep

The sensitive strip is removed to the pixel while the content above and below remains intact. The redaction follows the converted document, not the upload's original coordinate assumptions.
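Why anchoring to the converted document avoids drift can be illustrated with a little coordinate arithmetic. The sketch below assumes conversion applies one uniform scale and centres the image inside page margins, with a top-left origin. `Placement` and `fit` are hypothetical names, and real converters may place images differently.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


@dataclass(frozen=True)
class Placement:
    """Where conversion put the image on the page (assumed: uniform scale plus offset)."""
    scale: float     # page units per image pixel
    offset_x: float  # page x of the image's left edge
    offset_y: float  # page y of the image's top edge

    def image_to_page(self, box: Box) -> Box:
        l, t, r, b = box
        return (self.offset_x + l * self.scale, self.offset_y + t * self.scale,
                self.offset_x + r * self.scale, self.offset_y + b * self.scale)

    def page_to_image(self, box: Box) -> Tuple[int, int, int, int]:
        """Map a page box back to pixels, rounding outwards so no partial pixel survives."""
        l, t, r, b = box
        return (math.floor((l - self.offset_x) / self.scale),
                math.floor((t - self.offset_y) / self.scale),
                math.ceil((r - self.offset_x) / self.scale),
                math.ceil((b - self.offset_y) / self.scale))


def fit(image_w: int, image_h: int, page_w: float, page_h: float, margin: float) -> Placement:
    """Scale the image to fit inside the margins and centre it on the page."""
    scale = min((page_w - 2 * margin) / image_w, (page_h - 2 * margin) / image_h)
    return Placement(scale, (page_w - image_w * scale) / 2, (page_h - image_h * scale) / 2)


placement = fit(800, 600, page_w=500, page_h=700, margin=50)
secret_band = (0, 200, 800, 400)                   # middle band, in upload pixels
page_band = placement.image_to_page(secret_band)   # where that band lands on the page
pixels_to_destroy = placement.page_to_image(page_band)
```

A box drawn at the upload's raw pixel coordinates `(0, 200, 800, 400)` would miss the band on the converted page, which sits at `page_band` instead. Working from the page placement and mapping back to pixels recovers exactly the middle band.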

How we keep it honest

Every release is held to the same bar.

The assurance model is recovery-based. It is not enough for a black rectangle to appear; the known marker must be unrecoverable from every layer that matters.

Recovery test

Tested by attempted recovery

We plant a known marker, redact it, then try to recover it from the bytes, the text layer and the pixels. The check only passes if every recovery attempt fails.
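The shape of such a check can be sketched as follows. This is a toy model under stated assumptions, not the actual test suite: `ExportedDoc`, `recovery_attempts` and `redaction_passes` are hypothetical names, and an exported file is reduced to three fields. A real byte-level check would also need to decode compressed or differently encoded streams before searching.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


@dataclass
class ExportedDoc:
    """Toy stand-in for an exported file (assumed model)."""
    raw_bytes: bytes
    text_layer: str
    pixels: List[List[int]]


def recovery_attempts(doc: ExportedDoc, marker: str, region: Box, fill: int = 0) -> Dict[str, bool]:
    """Try to recover the marker from each layer; True means the attempt succeeded."""
    left, top, right, bottom = region
    return {
        "bytes": marker.encode("utf-8") in doc.raw_bytes,
        "text_layer": marker in doc.text_layer,
        # Any pixel in the region that is not the fill value counts as surviving content.
        "pixels": any(doc.pixels[y][x] != fill
                      for y in range(top, bottom) for x in range(left, right)),
    }


def redaction_passes(doc: ExportedDoc, marker: str, region: Box) -> bool:
    """The check passes only if every recovery attempt fails."""
    return not any(recovery_attempts(doc, marker, region).values())


MARKER = "SECRET-STRING-1439"  # synthetic test value
REGION = (1, 1, 3, 2)

cosmetic = ExportedDoc(
    raw_bytes=b"header " + MARKER.encode("utf-8") + b" trailer",
    text_layer="Scanned request record " + MARKER,
    pixels=[[255] * 4 for _ in range(3)],  # region never overwritten
)
clean = ExportedDoc(
    raw_bytes=b"header trailer",
    text_layer="Scanned request record",
    pixels=[[255, 255, 255, 255], [255, 0, 0, 255], [255, 255, 255, 255]],
)
```

Here `redaction_passes(cosmetic, MARKER, REGION)` is false because all three attempts recover something, while `redaction_passes(clean, MARKER, REGION)` is true. Requiring every attempt to fail means a single surviving layer is enough to block the result.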

Real path

The real pipeline

Tests use the same upload, conversion, redaction and export path our customers use: real services, real storage and real files out, not a simplified stand-in.

Repeatability

Re-checked on every build

The proofs are automated and repeatable, so release confidence is tied to the current product rather than a one-off demo file.

Evidence on the source proof page is produced by Simply Discover's automated redaction test suite for the manual-redaction export path. Markers are synthetic test values. Recovery checks cover overlay removal, hidden text-layer extraction and pixel sampling of the redacted region.

Redaction you can defend in DSAR, FOI and eDiscovery work.

See how redaction decisions, review state, export packs and production controls sit inside the wider Simply Discover workflow.
