opencat / tmx
TMX 1.4b parser and writer for the OpenCAT Framework
Requires
- php: ^8.2
- ext-dom: *
- ext-libxml: *
- ext-xmlreader: *
- opencat/core: ^0.1
Requires (Dev)
- phpunit/phpunit: ^11.0
Last update: 2026-05-09 00:57:54 UTC
README
TMX 1.4b parser and writer for the OpenCAT Framework.
Converts between TMX files and TranslationUnit objects. Used by opencat/translation-memory for import and export. You can also use it standalone for TMX manipulation.
Installation
composer require opencat/tmx
Requires ext-dom, ext-libxml, and ext-xmlreader.
Reading TMX files
DOM mode — small files (up to ~10,000 TUs)
    use CatFramework\Tmx\TmxReader;

    $reader = new TmxReader();
    $units = $reader->read('memory.tmx'); // returns TranslationUnit[]

    foreach ($units as $unit) {
        echo $unit->sourceLanguage . ' → ' . $unit->targetLanguage . PHP_EOL;
        echo $unit->source->getPlainText() . PHP_EOL;
        echo $unit->target->getPlainText() . PHP_EOL;
    }
Streaming mode — large files (100k+ TUs)
Uses XMLReader internally; yields one TranslationUnit at a time without loading the full document into memory.
    // $reader is a TmxReader instance; $tm stands for your translation-memory store
    foreach ($reader->stream('large-memory.tmx') as $unit) {
        // only one TU is held in memory at a time
        $tm->store($unit);
    }
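To illustrate why streaming keeps memory flat, here is a minimal sketch of the same idea using plain XMLReader. It is not the package's implementation: the function name `streamTuXml()` is hypothetical, and it yields the raw XML of each `<tu>` instead of a TranslationUnit object.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of opencat/tmx): yields the outer XML
 * of each <tu> element in a TMX file, one at a time.
 *
 * @return Generator<int, string>
 */
function streamTuXml(string $path): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open TMX file: $path");
    }

    try {
        // read() advances node by node, so the whole document is never
        // built in memory; only the current <tu> is materialized.
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'tu') {
                yield $reader->readOuterXml();
            }
        }
    } finally {
        $reader->close();
    }
}
```

Each yielded string could then be parsed with DOM on its own, which keeps the per-unit parsing simple while the file as a whole is still read sequentially.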
Writing TMX files
    use CatFramework\Tmx\TmxWriter;
    use CatFramework\Core\Model\TranslationUnit;

    $writer = new TmxWriter();
    $writer->write($units, 'exported.tmx'); // $units is TranslationUnit[]
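For readers curious about what a writer has to produce, the following is a rough sketch built on DOMDocument. It is an illustration only, not how TmxWriter works internally: `buildTmx()` is a hypothetical function, it takes plain `[language => text]` arrays instead of TranslationUnit objects, it omits the DOCTYPE, and the header attributes are filled with placeholder values.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of opencat/tmx): builds a TMX string from
 * rows shaped like ['en-US' => 'Source text.', 'fr-FR' => 'Texte cible.'].
 */
function buildTmx(array $rows, string $srcLang): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $tmx = $doc->createElement('tmx');
    $tmx->setAttribute('version', '1.4');
    $doc->appendChild($tmx);

    // Header attributes; every value except srclang is a placeholder.
    $header = $doc->createElement('header');
    $attributes = [
        'creationtool' => 'example',
        'creationtoolversion' => '0.0',
        'segtype' => 'sentence',
        'o-tmf' => 'unknown',
        'adminlang' => 'en',
        'srclang' => $srcLang,
        'datatype' => 'plaintext',
    ];
    foreach ($attributes as $name => $value) {
        $header->setAttribute($name, $value);
    }
    $tmx->appendChild($header);

    $body = $doc->createElement('body');
    $tmx->appendChild($body);

    foreach ($rows as $row) {
        $tu = $doc->createElement('tu');
        foreach ($row as $lang => $text) {
            $tuv = $doc->createElement('tuv');
            $tuv->setAttribute('xml:lang', (string) $lang);
            $seg = $doc->createElement('seg');
            // Text nodes are escaped on serialization (& becomes &amp;).
            $seg->appendChild($doc->createTextNode($text));
            $tuv->appendChild($seg);
            $tu->appendChild($tuv);
        }
        $body->appendChild($tu);
    }

    $xml = $doc->saveXML();
    if ($xml === false) {
        throw new RuntimeException('Could not serialize TMX document');
    }
    return $xml;
}
```

Building the tree with DOM rather than string concatenation means special characters in segment text are escaped automatically.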
Inline code support
TMX inline elements are mapped to InlineCode objects:
| TMX element | InlineCode type | Notes |
|---|---|---|
| `<bpt>` | OPENING | Begin paired tag (bold open, link open, etc.) |
| `<ept>` | CLOSING | End paired tag |
| `<ph>` | STANDALONE | Standalone placeholder (line break, etc.) |
| `<it pos="begin">` | OPENING, isolated | Isolated tag (tag boundary crossed a TU) |
| `<it pos="end">` | CLOSING, isolated | Isolated tag |
The i attribute is used as the InlineCode::$id pairing key (equivalent to XLIFF's rid).
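The mapping above can be sketched with a small DOM walk over one `<seg>`. This is an assumption-laden illustration rather than the package's code: `listInlineCodes()` is a hypothetical name, and it returns plain arrays instead of InlineCode objects so that it runs without opencat/core.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of opencat/tmx): lists the inline codes
 * found directly inside one <seg>, in document order.
 *
 * @return list<array{type: string, isolated: bool, id: ?string}>
 */
function listInlineCodes(string $segXml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($segXml)) {
        throw new InvalidArgumentException('Invalid <seg> XML');
    }

    $codes = [];
    foreach ($doc->documentElement->childNodes as $node) {
        if (!$node instanceof DOMElement) {
            continue; // plain text between codes
        }

        // [type, isolated] per the table above; other elements are skipped.
        $entry = match ($node->tagName) {
            'bpt' => ['OPENING', false],
            'ept' => ['CLOSING', false],
            'ph'  => ['STANDALONE', false],
            'it'  => [$node->getAttribute('pos') === 'begin' ? 'OPENING' : 'CLOSING', true],
            default => null,
        };
        if ($entry === null) {
            continue;
        }

        $codes[] = [
            'type' => $entry[0],
            'isolated' => $entry[1],
            // The i attribute is the pairing key; null when absent.
            'id' => $node->hasAttribute('i') ? $node->getAttribute('i') : null,
        ];
    }
    return $codes;
}
```

A `<bpt i="1">` and its `<ept i="1">` come back with the same `id`, which is what lets the opening and closing codes be paired later.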
Language matching
The srclang attribute on the <header> determines which <tuv> is the source. Matching is:
- Exact case-insensitive match ("en-US" == "EN-US")
- Prefix match (srclang="en" matches xml:lang="en-US")
- First <tuv> if no match
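The three rules can be sketched as a single lookup function. The name `pickSourceIndex()` is hypothetical, and two details are assumptions not stated above: that the rules are tried in the listed order, and that a prefix match must end at a subtag boundary (so "en" matches "en-US" but not "eng").

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of opencat/tmx): returns the index of the
 * <tuv> language that should be treated as the source.
 *
 * @param list<string> $tuvLangs xml:lang values in document order
 */
function pickSourceIndex(string $srcLang, array $tuvLangs): int
{
    $wanted = strtolower($srcLang);

    // 1. Exact match, ignoring case.
    foreach ($tuvLangs as $i => $lang) {
        if (strtolower($lang) === $wanted) {
            return $i;
        }
    }

    // 2. Prefix match: srclang "en" matches xml:lang "en-US".
    foreach ($tuvLangs as $i => $lang) {
        if (str_starts_with(strtolower($lang), $wanted . '-')) {
            return $i;
        }
    }

    // 3. Fallback: the first <tuv>.
    return 0;
}
```

Running the exact pass over all candidates before the prefix pass means a `<tuv xml:lang="en">` wins over `<tuv xml:lang="en-GB">` when srclang is "en", regardless of their order in the file.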
TMX file structure
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE tmx SYSTEM "tmx14.dtd">
    <tmx version="1.4">
      <header srclang="en-US" .../>
      <body>
        <tu tuid="1" creationid="translator@example.com" creationdate="20240101T120000Z">
          <prop type="x-domain">Legal</prop>
          <tuv xml:lang="en-US"><seg>Source text.</seg></tuv>
          <tuv xml:lang="fr-FR"><seg>Texte cible.</seg></tuv>
        </tu>
      </body>
    </tmx>
<prop> elements are stored in TranslationUnit::$metadata. The creationid attribute maps to TranslationUnit::$createdBy.
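As a rough illustration of that mapping, the sketch below pulls the same fields out of a single `<tu>` with DOM. `readTuMetadata()` is a hypothetical name, the result is a plain array rather than a TranslationUnit, and keying the metadata by the `type` attribute is an assumption about how the values are organized.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of opencat/tmx): extracts unit-level
 * metadata from the XML of one <tu> element.
 *
 * @return array{createdBy: ?string, metadata: array<string, string>}
 */
function readTuMetadata(string $tuXml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($tuXml)) {
        throw new InvalidArgumentException('Invalid <tu> XML');
    }
    $tu = $doc->documentElement;

    $metadata = [];
    foreach ($tu->childNodes as $child) {
        // Only direct <prop> children of <tu>; any <prop> nested in a
        // <tuv> is ignored in this sketch.
        if ($child instanceof DOMElement && $child->tagName === 'prop') {
            $metadata[$child->getAttribute('type')] = trim($child->textContent);
        }
    }

    return [
        'createdBy' => $tu->hasAttribute('creationid') ? $tu->getAttribute('creationid') : null,
        'metadata' => $metadata,
    ];
}
```

For the `<tu>` in the sample file above, this would give `createdBy` as `translator@example.com` and a `metadata` array with the single entry `x-domain` mapped to `Legal`.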
Related packages
- opencat/core: TranslationUnit, Segment, InlineCode
- opencat/translation-memory: uses TmxReader::stream() for import() and TmxWriter for export()