twarimitswe-aaron / gatekeeper-cdr
A zero-trust Content Disarm and Reconstruction (CDR) engine for multi-format file sanitisation.
🛡️ Gatekeeper
A zero-trust Content Disarm and Reconstruction (CDR) engine written in pure, memory-safe Rust.
Strip every byte of hidden metadata, embedded exploits, steganographic payloads, and trailing attachments from incoming file streams, then reconstruct a clean output from the raw pixel data up.
Table of Contents
- What is Gatekeeper?
- Why CDR?
- Architecture
- Supported Formats
- Project Structure
- Getting Started
- Using Gatekeeper as a Library
- FFI Bindings (Planned)
- Roadmap
- Contributing
What is Gatekeeper?
Gatekeeper is a static library that accepts multi-format file byte streams, surgically removes all non-pixel content, and reconstructs an immaculate output binary from the raw colour matrix upward. It is designed to be embedded directly into application source repositories via native FFI bindings, with no infrastructure changes required.
It does not scrub files in place. The entire philosophy is:
Decode to naked pixels. Re-encode from zero. Share nothing with the original.
Why CDR?
A file that "looks" clean to a human viewer can carry:
| Threat Vector | Example |
|---|---|
| Steganographic payloads | Data hidden in JPEG DCT coefficient LSBs |
| Exploit shellcode | Embedded in APP0-APP15 markers |
| Personal data leakage | EXIF GPS coordinates, device serial numbers |
| Tracking fingerprints | ICC profile unique identifiers |
| Polyglot containers | Executable bytes after the EOI/IEND marker |
| C2 callbacks | URLs encoded inside COM/XMP marker blocks |
Signature-based AV scanning can miss all of these. CDR eliminates the attack surface entirely by making it structurally impossible for the output to contain anything other than colour values.
Architecture
Memory Model
Gatekeeper enforces a strict zero-copy architecture at the format-detection layer:
caller buffer (&[u8])
        │
        ▼
sniff_format()    ← direct slice equality payload[..N] == MAGIC, zero heap
        │
        ▼
disarm()          ← ZCursor borrows the slice; no copy until decode
        │
        ▼
sanitizer         ← one heap allocation for the decoded pixel buffer
        │
        ▼
SanitizedOutput   ← one heap allocation for the re-encoded PNG output
The sniffer compares magic bytes using direct subslice equality (payload[..2] == JPEG_SOI). No intermediate buffers or Vec are constructed during format detection; the comparison resolves in a single register-level load.
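The subslice comparison described above can be sketched roughly as follows. This is an illustration only, not the crate's source: the `Format` enum, the `sniff` function, and every constant except `JPEG_SOI` are names invented for the example, and the sketch assumes the EOI check looks at the final two bytes of the payload, which the real sniffer may or may not do.

```rust
/// JPEG Start Of Image marker.
const JPEG_SOI: [u8; 2] = [0xFF, 0xD8];
/// JPEG End Of Image marker.
const JPEG_EOI: [u8; 2] = [0xFF, 0xD9];
/// The eight-byte PNG signature.
const PNG_SIG: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Hypothetical format tag (the real crate exposes `FileFormat`).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Format {
    Jpeg,
    Png,
}

/// Classify a borrowed payload by comparing subslices against constants.
/// Nothing is copied or allocated: every check is a slice comparison.
pub fn sniff(payload: &[u8]) -> Option<Format> {
    let n = payload.len();

    // JPEG: SOI at the start and (assumption) EOI as the last two bytes.
    if n >= 4 && payload[..2] == JPEG_SOI && payload[n - 2..] == JPEG_EOI {
        return Some(Format::Jpeg);
    }

    // PNG: signature in bytes 0..8, a 4-byte chunk length in bytes 8..12,
    // then the first chunk type, which must be IHDR, in bytes 12..16.
    if n >= 16 && payload[..8] == PNG_SIG && payload[12..16] == *b"IHDR" {
        return Some(Format::Png);
    }

    None
}
```

The length guards run before any indexing, so a short or empty slice returns `None` instead of panicking.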
Typestate Pipeline
Every sanitizer enforces its stage transitions at compile time using Rust's typestate pattern with newtype tuple structs. Calling stages out of order is a compile error, not a runtime panic. Passing raw bytes to a save routine is also a compile error: only SanitizedOutput is accepted.
RawPayload<'a>(&'a [u8])      ← zero-copy borrow; no data written
        │  .decode()          ← zune-jpeg decodes; all APP/EXIF/COM discarded
        ▼
DisarmedMatrix(PixelMatrix)   ← opaque wrapper; only formal destructuring allowed
        │  .reconstruct()     ← png encoder writes IHDR + IDAT + IEND only
        ▼
PristineStream(Vec<u8>)       ← opaque wrapper; shares zero bytes with input
        │  .into_sanitized()
        ▼
SanitizedOutput(Vec<u8>)      ← public token; only type a save routine may accept
        │  .into_bytes()
        ▼
Vec<u8>                       ← caller-owned, metadata-free PNG
Inside the crate, inner values are always extracted via the formal pattern:
let RawPayload(bytes) = stage;      // not stage.bytes
let DisarmedMatrix(mat) = stage;    // not stage.0 or stage.pixels
let PristineStream(buf) = stage;    // not stage.output
let SanitizedOutput(v) = output;    // not output.0
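To make the typestate idea concrete, here is a minimal sketch that reuses the stage names from the diagram but replaces the real decoders with a toy format. Everything about the toy is an assumption made for illustration: the first input byte is treated as a pixel count, the next bytes as pixels, and anything after them as trailing junk that must not survive. `RawPayload::new` and `save` are hypothetical helpers as well.

```rust
/// Stage 1: borrowed, untrusted input.
pub struct RawPayload<'a>(&'a [u8]);
/// Stage 2: decoded pixel values only.
pub struct DisarmedMatrix(Vec<u8>);
/// Stage 3: freshly encoded output that shares no bytes with the input.
pub struct PristineStream(Vec<u8>);
/// Final token: the only type a save routine accepts.
pub struct SanitizedOutput(Vec<u8>);

impl<'a> RawPayload<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        RawPayload(bytes)
    }

    /// Toy decode: byte 0 is a pixel count, the following bytes are pixels.
    /// Consuming `self` means the raw stage cannot be reused afterwards.
    pub fn decode(self) -> DisarmedMatrix {
        let RawPayload(bytes) = self;
        let count = bytes.first().copied().unwrap_or(0) as usize;
        DisarmedMatrix(bytes.iter().skip(1).take(count).copied().collect())
    }
}

impl DisarmedMatrix {
    /// Toy re-encode: write a new length byte followed by the pixels.
    pub fn reconstruct(self) -> PristineStream {
        let DisarmedMatrix(pixels) = self;
        let mut out = Vec::with_capacity(pixels.len() + 1);
        out.push(pixels.len() as u8); // at most 255, because count came from a u8
        out.extend_from_slice(&pixels);
        PristineStream(out)
    }
}

impl PristineStream {
    pub fn into_sanitized(self) -> SanitizedOutput {
        let PristineStream(buf) = self;
        SanitizedOutput(buf)
    }
}

impl SanitizedOutput {
    pub fn into_bytes(self) -> Vec<u8> {
        let SanitizedOutput(v) = self;
        v
    }
}

/// Hypothetical save routine: a bare Vec<u8> or &[u8] is rejected by the
/// type checker, so unsanitised bytes cannot reach it.
pub fn save(file: SanitizedOutput) -> Vec<u8> {
    file.into_bytes()
}
```

Because each method exists only on its own stage, a call such as `RawPayload::new(&input).reconstruct()` does not compile; the only path to `save` runs through `decode`, `reconstruct`, and `into_sanitized` in that order.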
Error Model
All errors are defined in src/errors.rs as a single CdrError enum backed by thiserror. No String allocations occur at any error variant: every branch carries fixed-size typed data.
pub enum CdrError {
    PayloadTooShort { got: usize },
    PayloadTooLarge { got: usize, limit: usize },
    UnknownFormat { magic: [u8; 4] },
    JpegMissingEoi,
    PngMissingIhdr,
    JpegDecodeFailed { source: zune_jpeg::errors::DecodeErrors },
    PngDecodeFailed { source: png::DecodingError },
    MissingImageInfo,
    DegenerateDimensions { width: u32, height: u32 },
    DimensionTooLarge { dimension: u32, limit: u32 },
    ImageTooLarge { bytes: usize, limit: usize },
    PixelBufferMismatch { expected: usize, got: usize },
    PngEncodeFailed { source: png::EncodingError },
    Unimplemented { format: &'static str }, // stub: fails closed
}
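As an illustration of how the dimension-related variants might be produced, the sketch below re-declares three of them by hand (without thiserror, so that it stands alone) and validates image dimensions with checked arithmetic. The function `checked_buffer_len` and both limits are hypothetical values chosen for the example; the crate's real limits and call sites are not shown in this README.

```rust
use std::fmt;

/// Hand-written stand-in for three `CdrError` variants.
#[derive(Debug, PartialEq, Eq)]
pub enum CdrError {
    DegenerateDimensions { width: u32, height: u32 },
    DimensionTooLarge { dimension: u32, limit: u32 },
    ImageTooLarge { bytes: usize, limit: usize },
}

impl fmt::Display for CdrError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Formatting writes straight into the formatter; the error value
        // itself only ever holds fixed-size integers.
        match self {
            CdrError::DegenerateDimensions { width, height } => {
                write!(f, "degenerate dimensions {}x{}", width, height)
            }
            CdrError::DimensionTooLarge { dimension, limit } => {
                write!(f, "dimension {} exceeds limit {}", dimension, limit)
            }
            CdrError::ImageTooLarge { bytes, limit } => {
                write!(f, "pixel buffer of {} bytes exceeds limit {}", bytes, limit)
            }
        }
    }
}

impl std::error::Error for CdrError {}

/// Hypothetical limits, chosen only for this example.
const MAX_DIMENSION: u32 = 16_384;
const MAX_PIXEL_BYTES: usize = 256 * 1024 * 1024;

/// Compute the decoded buffer size, failing closed on anything suspicious.
pub fn checked_buffer_len(width: u32, height: u32, channels: usize) -> Result<usize, CdrError> {
    if width == 0 || height == 0 {
        return Err(CdrError::DegenerateDimensions { width, height });
    }
    for dimension in [width, height] {
        if dimension > MAX_DIMENSION {
            return Err(CdrError::DimensionTooLarge { dimension, limit: MAX_DIMENSION });
        }
    }
    // checked_mul turns an overflow into None instead of wrapping around;
    // an overflow is reported as the largest possible size.
    let bytes = (width as usize)
        .checked_mul(height as usize)
        .and_then(|pixels| pixels.checked_mul(channels))
        .unwrap_or(usize::MAX);
    if bytes > MAX_PIXEL_BYTES {
        return Err(CdrError::ImageTooLarge { bytes, limit: MAX_PIXEL_BYTES });
    }
    Ok(bytes)
}
```

Running the size check before the decoder allocates anything means a header that declares absurd dimensions is rejected up front.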
Supported Formats
| Format | Detection | Sanitize | Re-encode | Status |
|---|---|---|---|---|
| JPEG | ✅ Magic + EOI check | ✅ zune-jpeg decode | ✅ PNG output | Phase 2: complete |
| PNG | ✅ Magic + IHDR check | ✅ png crate decode | ✅ PNG output | Phase 3: complete |
| GIF | ✅ Magic check | ✅ gif crate decode | ✅ PNG output | Phase 4: complete |
| WebP | ✅ RIFF+WEBP check | ✅ image-webp decode | ✅ PNG output | Phase 4: complete |
| Office | ✅ ZIP magic check | ✅ ZIP unwrap, drop .bin | ✅ ZIP re-encode | Phase 6: complete |
| PDF | ✅ %PDF- check | ✅ lopdf AST load | ✅ AST strip / re-encode | Phase 5: complete |
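The detection column for the non-JPEG/PNG rows could look something like the sketch below. The signatures themselves are the standard ones for each container; the `Container` enum and `sniff_container` function are hypothetical names, and the real sniffer may well apply further checks. In particular, a ZIP signature only shows that the payload is a ZIP archive, not that it is an Office document.

```rust
/// Hypothetical tag for the container formats in the table above.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Container {
    Gif,
    WebP,
    Zip,
    Pdf,
}

/// Match well-known leading signatures without copying the payload.
pub fn sniff_container(payload: &[u8]) -> Option<Container> {
    // GIF: either of the two version strings.
    if payload.starts_with(b"GIF87a") || payload.starts_with(b"GIF89a") {
        return Some(Container::Gif);
    }
    // WebP: "RIFF", a 4-byte little-endian size, then the "WEBP" form type.
    if payload.len() >= 12 && payload[..4] == *b"RIFF" && payload[8..12] == *b"WEBP" {
        return Some(Container::WebP);
    }
    // ZIP local file header, as used by DOCX / XLSX / PPTX packages.
    if payload.starts_with(b"PK\x03\x04") {
        return Some(Container::Zip);
    }
    // PDF header.
    if payload.starts_with(b"%PDF-") {
        return Some(Container::Pdf);
    }
    None
}
```

The WebP branch checks both the RIFF tag and the form type, so other RIFF files such as WAV audio are not misclassified.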
Project Structure
gatekeeper/
├── Cargo.toml              # Manifest: cdylib + rlib targets, dependencies
├── LICENSE                 # AGPLv3
├── CONTRIBUTING.md         # Contribution guide and PR workflow
├── README.md               # You are here
│
├── examples/
│   └── disarm_image.rs     # CLI driver: run CDR against a real file
│
└── src/
    ├── lib.rs              # Public API surface + format sniffer + unit tests
    ├── errors.rs           # CdrError: strongly-typed, zero-alloc error enum
    └── sanitizers/
        ├── mod.rs          # Sanitizer module index
        ├── jpeg.rs         # JPEG → pixel matrix → PNG pipeline
        └── png.rs          # PNG → pixel matrix → PNG pipeline
Getting Started
Prerequisites
- Rust 1.85+ (Edition 2024 requires Rust ≥ 1.85)
rustup update stable
rustc --version
Build
git clone https://github.com/Twarimitswe-Aaron/gatekeeper-cdr.git
cd gatekeeper-cdr
cargo build
This produces:
- target/debug/libgatekeeper.rlib: Rust linkable library
- target/debug/libgatekeeper.so: native shared library (cdylib)
For a release (optimised) build:
cargo build --release
Run Tests
# All unit tests + doc-tests
cargo test
# A specific test by name
cargo test detects_jpeg_format
# With output (useful for debugging)
cargo test -- --nocapture
Expected output:
running 8 tests
test tests::boundary_at_min_sniff_len ... ok
test tests::detects_jpeg_format ... ok
test tests::detects_png_format ... ok
test tests::rejects_empty_slice ... ok
test tests::rejects_jpeg_without_eoi ... ok
test tests::rejects_png_without_ihdr ... ok
test tests::rejects_slice_shorter_than_min ... ok
test tests::rejects_unknown_magic ... ok
test result: ok. 8 passed; 0 failed
Run the CLI Example
The examples/disarm_image.rs driver lets you test the full pipeline against any real file:
# Auto-named output → photo.sanitized.png
cargo run --example disarm_image -- photo.jpg
# Explicit output path
cargo run --example disarm_image -- suspicious.jpg clean.png
# Works on PNG input too (format sniffer validates first)
cargo run --example disarm_image -- image.png stripped.png
Sample output:
▶ Reading : suspicious.jpg
Size : 204800 bytes (200.00 KB)
Format : Jpeg
▶ Disarming...
Output : 187392 bytes (183.00 KB)
▶ Writing : suspicious.sanitized.png
✓ Done. Sanitized PNG written to: suspicious.sanitized.png
Using Gatekeeper as a Library
As a Rust Dependency
Add to your Cargo.toml:
[dependencies]
gatekeeper = { git = "https://github.com/Twarimitswe-Aaron/gatekeeper-cdr.git" }
Or for a local checkout:
[dependencies]
gatekeeper = { path = "../gatekeeper" }
API Reference
gatekeeper::disarm(payload: &[u8]) -> Result<SanitizedOutput, CdrError>
The primary entry point. Detects format, runs the full CDR pipeline, and returns a SanitizedOutput token, a distinct type that can only be produced by a completed pipeline run.
use gatekeeper::disarm;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = std::fs::read("untrusted.jpg")?;
    let clean = disarm(&raw)?; // Returns SanitizedOutput, not Vec<u8>
    std::fs::write("clean.png", clean.into_bytes())?;
    Ok(())
}
To enforce that a save function only ever accepts sanitised data:
use gatekeeper::{disarm, sanitizers::jpeg::SanitizedOutput};

fn save(file: SanitizedOutput) {
    // A raw Vec<u8> cannot be passed here
    std::fs::write("out.png", file.into_bytes()).unwrap();
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = std::fs::read("untrusted.jpg")?;
    save(disarm(&raw)?);
    Ok(())
}
gatekeeper::sniff_format(payload: &[u8]) -> Result<FileFormat, CdrError>
Identify the format of a byte slice without modifying or decoding it. Useful for routing in larger pipelines.
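use gatekeeper::{sniff_format, FileFormat};

match sniff_format(&bytes)? {
    FileFormat::Jpeg => println!("It's a JPEG"),
    FileFormat::Png => println!("It's a PNG"),
    // Assumption: FileFormat may carry further variants (GIF, WebP, PDF, Office)
    _ => println!("It's another supported format"),
}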
gatekeeper::sanitizers::jpeg::sanitize_jpeg(input: &[u8]) -> Result<SanitizedOutput, CdrError>
Call the JPEG sanitizer directly, bypassing the format sniffer.
use gatekeeper::sanitizers::jpeg::sanitize_jpeg;

let output = sanitize_jpeg(&jpeg_bytes)?; // Returns SanitizedOutput
let clean_png = output.into_bytes();
FFI Bindings (Planned)
The cdylib target is already compiled and emits a native shared library (.so / .dll / .dylib).
The sections below show the planned import and usage API for each target language.
These bindings do not exist yet; they are the design target for Phases 7-11.
| Language | Bridge / tool | Install package | Status |
|---|---|---|---|
| Node.js | napi-rs | npm install gatekeeper-cdr | Phase 7: complete |
| Python | PyO3 | pip install gatekeeper-cdr | Phase 8: complete |
| PHP | ext-php-rs | composer require gatekeeper/cdr | Phase 9: complete |
| C / C++ | Raw extern "C" | Link libgatekeeper.so | Phase 9: complete |
| Go | CGo + extern "C" | go get github.com/Twarimitswe-Aaron/gatekeeper-cdr/bindings/go | Phase 10: complete |
| Java | JNI via jni crate | Maven / Gradle dependency | Phase 11: complete |
Node.js (via napi-rs)
// Install:
//   npm install gatekeeper-cdr
//   yarn add gatekeeper-cdr
const { disarm, sniffFormat } = require('gatekeeper-cdr');

// --- Detect format ---
const fs = require('fs');
const raw = fs.readFileSync('suspicious.jpg');
const format = sniffFormat(raw); // Returns 'Jpeg' | 'Png'
console.log('Detected:', format);

// --- Sanitize (returns a Buffer containing a clean PNG) ---
const clean = disarm(raw);
fs.writeFileSync('clean.png', clean);

// --- ES Module import (planned) ---
// import { disarm, sniffFormat } from 'gatekeeper-cdr';
Python (via PyO3)
# Install:
#   pip install gatekeeper-cdr
import gatekeeper_cdr

# --- Detect format ---
with open("suspicious.jpg", "rb") as f:
    raw: bytes = f.read()

fmt: str = gatekeeper_cdr.sniff_format(raw)  # Returns 'Jpeg' or 'Png'
print(f"Detected: {fmt}")

# --- Sanitize (returns bytes containing a clean PNG) ---
clean: bytes = gatekeeper_cdr.disarm(raw)
with open("clean.png", "wb") as f:
    f.write(clean)

# --- Async variant (planned for Phase 10) ---
# clean = await gatekeeper_cdr.disarm_async(raw)
PHP (via ext-php-rs)
<?php
// Install:
//   Add the compiled libgatekeeper.so to your php.ini:
//     extension=/path/to/gatekeeper_cdr.so
//
//   Or via Composer (planned):
//     composer require gatekeeper/cdr

// --- Detect format ---
$raw = file_get_contents('suspicious.jpg');
$format = gatekeeper_sniff_format($raw); // Returns "Jpeg" or "Png"
echo "Detected: $format\n";

// --- Sanitize (returns a string of raw PNG bytes) ---
$clean = gatekeeper_disarm($raw);
file_put_contents('clean.png', $clean);
?>
C / C++ (Raw FFI)
// Link against: -L. -lgatekeeper -Wl,-rpath,.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include "gatekeeper.h"

int main(void) {
    /* Read file into buffer (caller-managed memory) */
    FILE *f = fopen("suspicious.jpg", "rb");
    if (!f) { perror("fopen"); return 1; }
    fseek(f, 0, SEEK_END);
    size_t len = (size_t)ftell(f);
    rewind(f);
    uint8_t *raw = malloc(len);
    if (!raw) { fclose(f); return 1; }
    fread(raw, 1, len, f);
    fclose(f);

    /* Sanitize: returns a heap-allocated CdrResult */
    CdrResult result = gatekeeper_disarm(raw, len);
    if (result.ok) {
        FILE *out = fopen("clean.png", "wb");
        if (out) {
            fwrite(result.data, 1, result.len, out);
            fclose(out);
        }
    } else {
        fprintf(stderr, "CDR error code: %d\n", result.error_code);
    }

    /* Always free the CdrResult buffer through the library */
    gatekeeper_free_result(result);
    free(raw);
    return 0;
}
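On the Rust side, the two functions called from the C example might be exposed along the following lines. This is a speculative sketch, not the project's header or implementation: the `CdrResult` field layout is inferred from the C usage above, the error codes are invented, and `disarm_stub` merely copies its input in place of the real pipeline.

```rust
use std::ptr;
use std::slice;

/// Assumed C-compatible result layout, inferred from the C example.
#[repr(C)]
pub struct CdrResult {
    pub ok: bool,
    pub data: *mut u8,
    pub len: usize,
    pub error_code: i32,
}

/// Stand-in for the real pipeline: copies the input, rejects empty payloads.
fn disarm_stub(payload: &[u8]) -> Result<Vec<u8>, i32> {
    if payload.is_empty() { Err(2) } else { Ok(payload.to_vec()) }
}

fn failure(code: i32) -> CdrResult {
    CdrResult { ok: false, data: ptr::null_mut(), len: 0, error_code: code }
}

// A real export would also carry #[unsafe(no_mangle)] (the Edition 2024
// spelling) so that C can link against the unmangled symbol name.
/// Safety: `raw` must be null or point to `len` readable bytes.
pub unsafe extern "C" fn gatekeeper_disarm(raw: *const u8, len: usize) -> CdrResult {
    if raw.is_null() {
        return failure(1);
    }
    // Borrow the caller's buffer; nothing is copied at this point.
    let payload = unsafe { slice::from_raw_parts(raw, len) };
    match disarm_stub(payload) {
        Ok(bytes) => {
            // Hand ownership of the output buffer to the caller.
            let boxed: Box<[u8]> = bytes.into_boxed_slice();
            let out_len = boxed.len();
            let data = Box::into_raw(boxed) as *mut u8;
            CdrResult { ok: true, data, len: out_len, error_code: 0 }
        }
        Err(code) => failure(code),
    }
}

/// Safety: `result` must come from `gatekeeper_disarm` and be freed once.
pub unsafe extern "C" fn gatekeeper_free_result(result: CdrResult) {
    if !result.data.is_null() {
        // Rebuild the Box so Rust's allocator releases the memory.
        let raw_slice = ptr::slice_from_raw_parts_mut(result.data, result.len);
        drop(unsafe { Box::from_raw(raw_slice) });
    }
}
```

Routing deallocation through `gatekeeper_free_result` matters because the buffer was allocated by Rust's allocator; calling C's `free` on it would be undefined behaviour.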
Go (via CGo)
// Install:
//   go get github.com/Twarimitswe-Aaron/gatekeeper-cdr/bindings/go
package main

import (
	"fmt"
	"os"

	gatekeeper "github.com/Twarimitswe-Aaron/gatekeeper-cdr/bindings/go"
)

func main() {
	raw, err := os.ReadFile("suspicious.jpg")
	if err != nil {
		panic(err)
	}

	// Detect format (does not allocate, stack-only in Rust).
	// The variable is named "format" so that it does not shadow package fmt.
	format, err := gatekeeper.SniffFormat(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("Detected:", format) // "Jpeg" or "Png"

	// Sanitize: returns []byte containing a clean PNG
	clean, err := gatekeeper.Disarm(raw)
	if err != nil {
		panic(err)
	}
	if err := os.WriteFile("clean.png", clean, 0644); err != nil {
		panic(err)
	}
}
Java (via JNI)
<!-- Maven (pom.xml) -->
<dependency>
    <groupId>io.github.twarimitswe-aaron</groupId>
    <artifactId>gatekeeper-cdr</artifactId>
    <version>0.1.0</version>
</dependency>

// Gradle (build.gradle)
implementation 'io.github.twarimitswe-aaron:gatekeeper-cdr:0.1.0'

import io.github.gatekeeper.GatekeeperCdr;
import io.github.gatekeeper.FileFormat;
import java.nio.file.Files;
import java.nio.file.Path;

public class Main {
    public static void main(String[] args) throws Exception {
        byte[] raw = Files.readAllBytes(Path.of("suspicious.jpg"));

        // Detect format
        FileFormat fmt = GatekeeperCdr.sniffFormat(raw);
        System.out.println("Detected: " + fmt); // JPEG or PNG

        // Sanitize: returns byte[] containing a clean PNG
        byte[] clean = GatekeeperCdr.disarm(raw);
        Files.write(Path.of("clean.png"), clean);
    }
}
Roadmap
- Phase 1: Cargo manifest, error model, format sniffer
- Phase 2: JPEG sanitization pipeline (typestate + zune-jpeg + png)
- Phase 3: PNG sanitization pipeline
- Phase 4: GIF and WebP support
- Phase 5: PDF sanitization (remove embedded JavaScript, OLE streams)
- Phase 6: Office format sanitization (DOCX / XLSX / PPTX)
- Phase 7: napi-rs Node.js bindings → publish to npm
- Phase 8: PyO3 Python bindings → publish to PyPI
- Phase 9: ext-php-rs PHP bindings + C/C++ raw header → publish to Packagist
- Phase 10: CGo Go bindings → publish Go module to pkg.go.dev
- Phase 11: JNI Java bindings → publish to Maven Central / Gradle
- Phase 12: Async pipeline via Tokio for streaming large files
- Phase 13: WASM target for browser-side CDR
Contributing
Gatekeeper is open-source under AGPLv3 and actively welcomes contributions. Please read the full guide before opening a PR:
CONTRIBUTING.md
Quick summary:
- Fork the repository
- Create a branch: git checkout -b feat/png-sanitizer
- Write tests: new code must include unit tests
- Check: cargo test && cargo clippy && cargo fmt --check
- Open a PR against main using the PR template
For larger changes (new format support, architectural changes), please open an issue first to discuss the approach before writing code.
License
Gatekeeper is licensed under the GNU Affero General Public License v3.0 (AGPLv3).
This means:
- ✅ You may use, modify, and distribute this code freely
- ✅ You may use it in commercial applications
- ⚠️ If you modify it and run it as a network service, you must publish your modifications under the same license
- ⚠️ All derivative works must carry the AGPLv3 license
See LICENSE for the full text.