opencat/workflow

Pipeline orchestration for the OpenCAT Framework — wires filter, segmentation, TM, MT, QA, and XLIFF into a single WorkflowRunner

Package info

github.com/shaikhammar/opencat-workflow

pkg:composer/opencat/workflow

dev-main 2026-05-09 00:57 UTC

Last update: 2026-05-09 00:58:00 UTC


README

Pipeline orchestration for the OpenCAT Framework.

WorkflowRunner wires filter, segmentation, TM, terminology, MT, QA, and XLIFF output into a single process() call. ProjectWorkflowBuilder constructs a fully configured runner from a ProjectManifest with no manual wiring.

Installation

composer require opencat/workflow

Quick start — from a project manifest

use CatFramework\Project\ProjectLoader;
use CatFramework\Workflow\FileFilterRegistry;
use CatFramework\Workflow\ProjectWorkflowBuilder;
use CatFramework\FilterDocx\DocxFilter;
use CatFramework\FilterPlaintext\PlainTextFilter;

$manifest = ProjectLoader::load('catproject.json');
$registry = new FileFilterRegistry();
$registry->register(new DocxFilter());
$registry->register(new PlainTextFilter());

$runner = (new ProjectWorkflowBuilder($manifest))->build('fr-FR', $registry);
$result = $runner->process('report.docx', 'fr-FR');

echo "Exact TM: {$result->matchStats->exact}" . PHP_EOL;
echo "Fuzzy TM: {$result->matchStats->fuzzy}" . PHP_EOL;
echo "MT filled: {$result->matchStats->mt}"   . PHP_EOL;
echo "XLIFF: {$result->xliffPath}"            . PHP_EOL;

Manual wiring

Build WorkflowRunner directly when you need finer control:

use CatFramework\Workflow\WorkflowRunner;
use CatFramework\Workflow\WorkflowOptions;
use CatFramework\Workflow\FileFilterRegistry;
use CatFramework\Segmentation\SrxSegmentationEngine;
use CatFramework\Xliff\XliffWriter;
use CatFramework\TranslationMemory\SqliteTranslationMemory;
use CatFramework\Mt\DeepL\DeepLAdapter;
use CatFramework\Qa\QualityRunner;

$options = WorkflowOptions::defaults();
$options->mtFillThreshold      = 0.75;     // use MT when best TM match < 75%
$options->autoConfirmThreshold = 1.0;      // auto-lock only exact TM matches
$options->autoWriteToTm        = true;     // write translated pairs back into TM
$options->writeXliff           = true;     // emit XLIFF output
$options->qaFailOnSeverity     = 'error';  // throw WorkflowException on QA errors

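// Assumes $registry, $pdo (a PDO connection), $deepLAdapter, and $qaRunner
// were created earlier; they are not defined in this snippet.
$runner = new WorkflowRunner(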
    fileFilterRegistry: $registry,
    segmentationEngine: new SrxSegmentationEngine(),
    xliffWriter: new XliffWriter(),
    sourceLang: 'en-US',
    translationMemory: new SqliteTranslationMemory($pdo),
    mtAdapter: $deepLAdapter,
    qaRunner: $qaRunner,
    options: $options,
);

$result = $runner->process('report.docx', 'fr-FR');

Pipeline steps

WorkflowRunner::process() executes these steps in order:

| Step | What happens |
|------|--------------|
| 1. Extract | FileFilterRegistry selects the correct filter and calls extract() |
| 2. Segment | SrxSegmentationEngine splits multi-sentence structural units into individual sentences |
| 3a. TM lookup | Looks up each segment; auto-locks exact matches; marks fuzzy matches as Draft |
| 3b. Terminology | Calls TerminologyProvider::recognize() for timing and future highlight data |
| 3c. MT fill | For segments whose best TM match is below $mtFillThreshold, calls the MT adapter |
| 3d. Persist | Stores each SegmentPair to SegmentStore if configured |
| 3e. TM write-back | If $autoWriteToTm, stores each translated pair back into TM |
| 4. QA | Runs all registered QA checks; throws WorkflowException if $qaFailOnSeverity is hit |
| 5. XLIFF output | Writes {source}.xlf + {source}.xlf.skl to $outputDir (if $writeXliff) |
| 6. Skeleton store | Persists skeleton to SkeletonStore if configured |
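
One way to read steps 3a and 3c together is as a per-segment decision. The sketch below is illustrative only and is not taken from the package: the function name classifySegment, the 0.0 to 1.0 score scale, the convention that 0.0 means "no TM hit", and the order of the checks are all assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch, not the library's code: assigns a segment to one of
 * four outcome buckets from its best TM score (assumed 0.0 = no TM hit).
 */
function classifySegment(float $bestTmScore, bool $hasMtAdapter, float $mtFillThreshold): string
{
    if ($bestTmScore >= 1.0) {
        return 'exact';     // step 3a: exact match, auto-locked
    }
    if ($hasMtAdapter && $bestTmScore < $mtFillThreshold) {
        return 'mt';        // step 3c: TM match too weak, MT fills the target
    }
    if ($bestTmScore > 0.0) {
        return 'fuzzy';     // step 3a: fuzzy match kept as Draft
    }
    return 'unmatched';     // no TM hit and no MT fill
}

// Classify a few sample scores with an MT adapter and a 0.75 threshold.
foreach ([1.0, 0.82, 0.40, 0.0] as $score) {
    echo classifySegment($score, true, 0.75) . PHP_EOL;
}
```

With these inputs the four segments land in the exact, fuzzy, mt, and mt buckets respectively; without an MT adapter the last one would be unmatched.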

Progress callback

Get notified after each segment is processed:

$runner->onSegmentProcessed(function ($pair, int $index, int $total) {
    echo "  [{$index}/{$total}] {$pair->source->getPlainText()}" . PHP_EOL;
});

WorkflowResult

process() returns a WorkflowResult:

$result->document;            // BilingualDocument with all segment pairs
$result->qaIssues;            // QualityIssue[]
$result->matchStats->exact;   // count of exact TM matches
$result->matchStats->fuzzy;   // count of fuzzy TM matches
$result->matchStats->mt;      // count of MT-filled segments
$result->matchStats->unmatched; // count with no TM or MT fill
$result->xliffPath;           // path to written XLIFF (null if writeXliff=false)
$result->storeFileId;         // UUID used as key in SegmentStore/SkeletonStore
$result->timings;             // ['extract', 'segment', 'tm', 'terminology', 'mt', 'qa', 'xliff', 'store']
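
The match counters lend themselves to a quick leverage report. The helper below is a minimal sketch and not part of the package: summarizeMatchStats is a hypothetical name, and it assumes the four counters are disjoint and together cover every segment in the document.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turns the four match counters into percentages.
 * Assumes the counters are disjoint and together cover every segment.
 *
 * @return array{total: int, exact: float, fuzzy: float, mt: float, unmatched: float}
 */
function summarizeMatchStats(int $exact, int $fuzzy, int $mt, int $unmatched): array
{
    $total = $exact + $fuzzy + $mt + $unmatched;
    // Guard against division by zero for empty documents.
    $pct = fn (int $count): float => $total === 0 ? 0.0 : round($count / $total * 100, 1);

    return [
        'total'     => $total,
        'exact'     => $pct($exact),
        'fuzzy'     => $pct($fuzzy),
        'mt'        => $pct($mt),
        'unmatched' => $pct($unmatched),
    ];
}

// Sample counts; with a real result, pass $result->matchStats->exact and so on.
$summary = summarizeMatchStats(40, 25, 30, 5);
printf(
    "%d segments: %.1f%% exact, %.1f%% fuzzy, %.1f%% MT, %.1f%% unmatched\n",
    $summary['total'],
    $summary['exact'],
    $summary['fuzzy'],
    $summary['mt'],
    $summary['unmatched']
);
```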

FileFilterRegistry

Filters are selected by calling supports() on each registered filter in registration order. The first filter that returns true is used.

$registry = new FileFilterRegistry();
$registry->register(new DocxFilter());
$registry->register(new HtmlFilter());
$registry->register(new PlainTextFilter());  // fallback for .txt

$filter = $registry->getFilter('report.docx');   // returns DocxFilter

getFilter() throws WorkflowException if no filter supports the file.
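
The first-match rule can be illustrated with a self-contained sketch. The classes below (SketchFilter, ExtensionFilter, SketchRegistry) are hypothetical stand-ins written for this example; the real filter interface and registry may differ, and RuntimeException stands in for WorkflowException.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for illustration only.
interface SketchFilter
{
    public function supports(string $path): bool;
}

final class ExtensionFilter implements SketchFilter
{
    public function __construct(public string $name, private string $extension)
    {
    }

    public function supports(string $path): bool
    {
        // Case-insensitive comparison of the file extension.
        return strtolower(pathinfo($path, PATHINFO_EXTENSION)) === $this->extension;
    }
}

final class SketchRegistry
{
    /** @var SketchFilter[] */
    private array $filters = [];

    public function register(SketchFilter $filter): void
    {
        $this->filters[] = $filter; // registration order is preserved
    }

    public function getFilter(string $path): SketchFilter
    {
        foreach ($this->filters as $filter) {
            if ($filter->supports($path)) {
                return $filter; // first match wins
            }
        }
        throw new RuntimeException("No filter supports {$path}");
    }
}

$registry = new SketchRegistry();
$registry->register(new ExtensionFilter('docx', 'docx'));
$registry->register(new ExtensionFilter('plaintext', 'txt'));

echo $registry->getFilter('report.docx')->name . PHP_EOL; // prints "docx"
```

Because lookup stops at the first filter whose supports() returns true, broad catch-all filters are best registered last.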

MT fill threshold

$mtFillThreshold controls when MT kicks in:

  • 0.0 (default) — MT never runs
  • 0.75 — MT runs when the best TM match is below 75%
  • 1.0 — MT fills any segment without an exact TM match
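
The three settings above follow from a single strict comparison. A minimal sketch, assuming TM scores run from 0.0 to 1.0 and that a segment with no TM hit scores 0.0; shouldFillWithMt is a hypothetical helper, not a function exported by the package:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: decides whether MT should fill a segment,
 * given the best TM match score in the range 0.0 to 1.0.
 */
function shouldFillWithMt(float $bestTmScore, float $mtFillThreshold): bool
{
    // Strictly "below the threshold", as the list above describes.
    return $bestTmScore < $mtFillThreshold;
}

var_dump(shouldFillWithMt(0.60, 0.0));  // false: nothing scores below 0.0
var_dump(shouldFillWithMt(0.60, 0.75)); // true:  60% is below 75%
var_dump(shouldFillWithMt(0.80, 0.75)); // false: the 80% fuzzy match is kept
var_dump(shouldFillWithMt(0.99, 1.0));  // true:  not an exact match
var_dump(shouldFillWithMt(1.00, 1.0));  // false: exact match, no MT
```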

Related packages