opencat / workflow
Pipeline orchestration for the OpenCAT Framework — wires filter, segmentation, TM, MT, QA, and XLIFF into a single WorkflowRunner
Requires
- php: ^8.2
- opencat/core: ^0.1
- opencat/mt: ^0.1
- opencat/project: ^0.1
- opencat/qa: ^0.1
- opencat/segmentation: ^0.1
- opencat/terminology: ^0.1
- opencat/translation-memory: ^0.1
- opencat/xliff: ^0.1
Requires (Dev)
- opencat/filter-plaintext: ^0.1
- phpunit/phpunit: ^11.0
Last update: 2026-05-09 00:58:00 UTC
README
Pipeline orchestration for the OpenCAT Framework.
WorkflowRunner wires filter, segmentation, TM, terminology, MT, QA, and XLIFF output into a single process() call. ProjectWorkflowBuilder constructs a fully configured runner from a ProjectManifest with no manual wiring.
Installation
composer require opencat/workflow
Quick start — from a project manifest
    use CatFramework\Project\ProjectLoader;
    use CatFramework\Workflow\FileFilterRegistry;
    use CatFramework\Workflow\ProjectWorkflowBuilder;
    use CatFramework\FilterDocx\DocxFilter;
    use CatFramework\FilterPlaintext\PlainTextFilter;

    $manifest = ProjectLoader::load('catproject.json');

    $registry = new FileFilterRegistry();
    $registry->register(new DocxFilter());
    $registry->register(new PlainTextFilter());

    $runner = (new ProjectWorkflowBuilder($manifest))->build('fr-FR', $registry);
    $result = $runner->process('report.docx', 'fr-FR');

    echo "Exact TM: {$result->matchStats->exact}" . PHP_EOL;
    echo "Fuzzy TM: {$result->matchStats->fuzzy}" . PHP_EOL;
    echo "MT filled: {$result->matchStats->mt}" . PHP_EOL;
    echo "XLIFF: {$result->xliffPath}" . PHP_EOL;
Manual wiring
Build WorkflowRunner directly when you need finer control:
    use CatFramework\Workflow\WorkflowRunner;
    use CatFramework\Workflow\WorkflowOptions;
    use CatFramework\Workflow\FileFilterRegistry;
    use CatFramework\Segmentation\SrxSegmentationEngine;
    use CatFramework\Xliff\XliffWriter;
    use CatFramework\TranslationMemory\SqliteTranslationMemory;
    use CatFramework\Mt\DeepL\DeepLAdapter;
    use CatFramework\Qa\QualityRunner;

    // $registry (FileFilterRegistry), $pdo (PDO), $deepLAdapter (DeepLAdapter),
    // and $qaRunner (QualityRunner) are assumed to be created elsewhere.

    $options = WorkflowOptions::defaults();
    $options->mtFillThreshold = 0.75;      // use MT when best TM match < 75%
    $options->autoConfirmThreshold = 1.0;  // auto-lock only exact TM matches
    $options->autoWriteToTm = true;        // feed MT output back into TM
    $options->writeXliff = true;
    $options->qaFailOnSeverity = 'error';

    $runner = new WorkflowRunner(
        fileFilterRegistry: $registry,
        segmentationEngine: new SrxSegmentationEngine(),
        xliffWriter: new XliffWriter(),
        sourceLang: 'en-US',
        translationMemory: new SqliteTranslationMemory($pdo),
        mtAdapter: $deepLAdapter,
        qaRunner: $qaRunner,
        options: $options,
    );

    $result = $runner->process('report.docx', 'fr-FR');
Pipeline steps
WorkflowRunner::process() executes these steps in order:
| Step | What happens |
|---|---|
| 1. Extract | FileFilterRegistry selects the correct filter and calls extract() |
| 2. Segment | SrxSegmentationEngine splits multi-sentence structural units into individual sentences |
| 3a. TM lookup | Looks up each segment; auto-locks exact matches; marks fuzzy matches as Draft |
| 3b. Terminology | Calls TerminologyProvider::recognize() for timing and future highlight data |
| 3c. MT fill | For segments whose best TM match scores below $mtFillThreshold, calls the MT adapter |
| 3d. Persist | Stores each SegmentPair to SegmentStore if configured |
| 3e. TM write-back | If $autoWriteToTm, stores each translated pair back into TM |
| 4. QA | Runs all registered QA checks; throws WorkflowException if $qaFailOnSeverity is hit |
| 5. XLIFF output | Writes {source}.xlf + {source}.xlf.skl to $outputDir (if $writeXliff) |
| 6. Skeleton store | Persists skeleton to SkeletonStore if configured |
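The QA gate in step 4 can be pictured with a small standalone sketch. It does not use the opencat/qa classes: the severity names and their ranking, the array shape of an issue, the `assertQaGate()` helper, and the "at or above the configured severity" rule are all assumptions made for illustration, and `RuntimeException` stands in for `WorkflowException`.

```php
<?php
declare(strict_types=1);

// Hypothetical severity ranking; the real opencat/qa levels may differ.
const SEVERITY_RANK = ['info' => 0, 'warning' => 1, 'error' => 2];

/**
 * Throws when any issue is at or above the configured severity (assumed rule).
 *
 * @param array<int, array{severity: string, message: string}> $issues
 */
function assertQaGate(array $issues, string $failOnSeverity): void
{
    $limit = SEVERITY_RANK[$failOnSeverity];

    foreach ($issues as $issue) {
        if (SEVERITY_RANK[$issue['severity']] >= $limit) {
            // Stand-in for WorkflowException in this sketch.
            throw new RuntimeException("QA gate failed: {$issue['message']}");
        }
    }
}

$issues = [
    ['severity' => 'warning', 'message' => 'Trailing whitespace differs'],
];

// Passes silently: the only issue is a warning and the gate is set to 'error'.
assertQaGate($issues, 'error');
```

With the gate lowered to 'warning', the same issue list would make the sketch throw.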
Progress callback
Get notified after each segment is processed:
    $runner->onSegmentProcessed(function ($pair, int $index, int $total) {
        echo " [{$index}/{$total}] {$pair->source->getPlainText()}" . PHP_EOL;
    });
WorkflowResult
process() returns a WorkflowResult:
    $result->document;              // BilingualDocument with all segment pairs
    $result->qaIssues;              // QualityIssue[]
    $result->matchStats->exact;     // count of exact TM matches
    $result->matchStats->fuzzy;     // count of fuzzy TM matches
    $result->matchStats->mt;        // count of MT-filled segments
    $result->matchStats->unmatched; // count with no TM or MT fill
    $result->xliffPath;             // path to written XLIFF (null if writeXliff=false)
    $result->storeFileId;           // UUID used as key in SegmentStore/SkeletonStore
    $result->timings;               // ['extract', 'segment', 'tm', 'terminology', 'mt', 'qa', 'xliff', 'store']
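As one way to use the four match counters, the sketch below turns them into percentages for a leverage report. It works on a plain array rather than the real `matchStats` object, and `matchStatsPercentages()` is a hypothetical helper, not part of the package.

```php
<?php
declare(strict_types=1);

/**
 * Convert raw match counts into percentages of all segments.
 *
 * @param array{exact: int, fuzzy: int, mt: int, unmatched: int} $stats
 * @return array<string, float> percentage per category, rounded to one decimal
 */
function matchStatsPercentages(array $stats): array
{
    $total = array_sum($stats);

    if ($total === 0) {
        // Empty document: report 0% everywhere instead of dividing by zero.
        return array_map(static fn (int $count): float => 0.0, $stats);
    }

    return array_map(
        static fn (int $count): float => round($count / $total * 100, 1),
        $stats,
    );
}

// Counts as they might be copied from $result->matchStats.
$report = matchStatsPercentages([
    'exact' => 12,
    'fuzzy' => 5,
    'mt' => 2,
    'unmatched' => 1,
]);

foreach ($report as $category => $percent) {
    printf("%-10s %5.1f%%\n", $category, $percent);
}
```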
FileFilterRegistry
Filters are selected by calling supports() on each registered filter in registration order. The first filter that returns true is used.
    $registry = new FileFilterRegistry();
    $registry->register(new DocxFilter());
    $registry->register(new HtmlFilter());
    $registry->register(new PlainTextFilter()); // fallback for .txt

    $filter = $registry->getFilter('report.docx'); // returns DocxFilter
getFilter() throws WorkflowException if no filter supports the file.
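The first-match rule can be illustrated with a minimal standalone sketch. `SketchFilter`, `ExtensionFilter`, and `SketchRegistry` are hypothetical stand-ins written for this example, not the framework's classes; only the `supports()` check is modeled, and `RuntimeException` takes the place of `WorkflowException`.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the framework's filter contract.
interface SketchFilter
{
    public function supports(string $filePath): bool;
}

// Matches files by extension; real filters may inspect more than the name.
final class ExtensionFilter implements SketchFilter
{
    /** @param string[] $extensions lower-case extensions without the dot */
    public function __construct(
        public readonly string $name,
        private readonly array $extensions,
    ) {}

    public function supports(string $filePath): bool
    {
        $ext = strtolower(pathinfo($filePath, PATHINFO_EXTENSION));
        return in_array($ext, $this->extensions, true);
    }
}

final class SketchRegistry
{
    /** @var SketchFilter[] kept in registration order */
    private array $filters = [];

    public function register(SketchFilter $filter): void
    {
        $this->filters[] = $filter;
    }

    public function getFilter(string $filePath): SketchFilter
    {
        // The first registered filter that reports support wins.
        foreach ($this->filters as $filter) {
            if ($filter->supports($filePath)) {
                return $filter;
            }
        }
        throw new RuntimeException("No filter supports {$filePath}");
    }
}

$registry = new SketchRegistry();
$registry->register(new ExtensionFilter('docx', ['docx']));
// Also claims .docx, but is registered later, so it never wins for that type.
$registry->register(new ExtensionFilter('fallback', ['txt', 'docx']));
```

Because lookup follows registration order, specific filters should be registered before broad fallbacks.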
MT fill threshold
$mtFillThreshold controls when MT kicks in:
- 0.0 (default): MT never runs
- 0.75: MT runs when the best TM match is below 75%
- 1.0: MT fills any segment without an exact TM match
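The three settings are consistent with a strict less-than comparison between the best TM score and the threshold. The sketch below assumes exactly that, plus scores in the range 0.0 to 1.0 and a score of 0.0 for segments with no TM hit; `shouldFillWithMt()` is a hypothetical helper, not the runner's actual code.

```php
<?php
declare(strict_types=1);

/**
 * Decide whether a segment should be sent to MT.
 * Assumes a strict comparison and TM scores between 0.0 and 1.0,
 * where a segment without any TM hit scores 0.0.
 */
function shouldFillWithMt(float $bestTmScore, float $mtFillThreshold): bool
{
    return $bestTmScore < $mtFillThreshold;
}

// Walk the three documented thresholds against a few sample scores.
foreach ([0.0, 0.75, 1.0] as $threshold) {
    foreach ([0.0, 0.8, 1.0] as $score) {
        printf(
            "threshold=%.2f score=%.2f mt=%s\n",
            $threshold,
            $score,
            shouldFillWithMt($score, $threshold) ? 'yes' : 'no',
        );
    }
}
```

Under these assumptions a threshold of 0.0 can never be undercut, which matches the "MT never runs" default.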
Related packages
- opencat/core: all shared models and contracts
- opencat/project: ProjectManifest, ProjectWorkflowBuilder, segment/skeleton stores
- opencat/segmentation: sentence segmentation
- opencat/translation-memory: TM lookup and storage
- opencat/terminology: term recognition
- opencat/mt: machine translation adapters
- opencat/qa: quality checks
- opencat/xliff: XLIFF output