goaop/dissect

A set of tools for lexical and syntactical analysis written in pure PHP

Package info

github.com/goaop/dissect

pkg:composer/goaop/dissect

Fund package maintenance!

lisachenko

Statistics

Installs: 74 342

Dependents: 1

Suggesters: 0

Stars: 2

3.0.0 2024-02-19 21:35 UTC

README

A pure-PHP toolkit for building custom lexers and LALR(1) parsers — fast, type-safe, and dependency-free.

✨ What is Dissect?

Dissect is a pure-PHP library for lexical and syntactical analysis — the foundational building blocks for any language tooling: expression evaluators, template engines, DSL interpreters, query parsers, and more.

It powers the GoAOP framework, where it parses pointcut DSL expressions into an AST for aspect-oriented programming.

Data flow

Input String
    │
    ▼
┌─────────┐      ┌──────────────┐      ┌──────────────────┐
│  Lexer  │ ───▶ │ TokenStream  │ ───▶ │  LALR(1) Parser  │ ───▶ Result / AST
└─────────┘      └──────────────┘      └──────────────────┘
                                               ▲
                                         Grammar (rules
                                          + callbacks)
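
To make the Lexer to TokenStream step concrete, here is a toy lexer in plain PHP. It is a hypothetical sketch, not Dissect's implementation. The `tokenize()` function and the `[type, value, offset]` triple format are assumptions made for illustration, whereas Dissect's lexers produce token objects (the Quick Example reads them with `getValue()`). Token names follow the Quick Example.

```php
<?php

/**
 * Hypothetical toy lexer (not part of Dissect): converts an input string
 * into a token stream of [type, value, offset] triples.
 *
 * @return list<array{0: string, 1: string, 2: int}>
 */
function tokenize(string $input): array
{
    // One combined pattern; each alternative is a named group.
    // The /A modifier anchors every attempt at the current offset.
    $pattern = '/(?<INT>[0-9]+)|(?<PLUS>\+)|(?<MINUS>-)|(?<WS>\s+)/A';
    $types = ['INT', 'PLUS', 'MINUS', 'WS'];
    $tokens = [];
    $offset = 0;

    while ($offset < strlen($input)) {
        // PREG_UNMATCHED_AS_NULL makes groups that did not take part in the match null.
        if (preg_match($pattern, $input, $m, PREG_UNMATCHED_AS_NULL, $offset) !== 1) {
            throw new RuntimeException("Unexpected input at offset $offset");
        }
        foreach ($types as $type) {
            if (isset($m[$type])) {
                // Whitespace is consumed but not emitted, like a skipped token.
                if ($type !== 'WS') {
                    $tokens[] = [$type, $m[$type], $offset];
                }
                break;
            }
        }
        $offset += strlen($m[0]);
    }

    return $tokens;
}

$stream = tokenize('3 + 5 - 2');
// [['INT','3',0], ['PLUS','+',2], ['INT','5',4], ['MINUS','-',6], ['INT','2',8]]
```

Because every match is anchored at the current offset, an unknown character raises an error instead of being silently skipped.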

🚀 Key Features

🔤 Flexible Lexers

| Lexer | Description |
|---|---|
| SimpleLexer | Fluent builder API: define tokens with literal strings or regexes and mark tokens to skip |
| StatefulLexer | Context-aware tokenization with explicit state transitions (e.g. for string interpolation) |
| RegexLexer | Abstract base class adapted from Doctrine's lexer; tokenizes the whole input in a single pass with one combined regular expression |
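
The StatefulLexer idea is easiest to picture with string interpolation, where the same character means different things inside and outside a string literal. The following is a hypothetical plain-PHP sketch of that idea, not Dissect's StatefulLexer API. The `lexInterpolated()` function, the state names, the rule layout and the push/pop actions are all assumptions for illustration.

```php
<?php

/**
 * Hypothetical toy stateful lexer, NOT Dissect's StatefulLexer API.
 * Each state owns an ordered rule list; a rule can push a new state
 * or pop back to the previous one.
 *
 * @return list<array{0: string, 1: string}>
 */
function lexInterpolated(string $input): array
{
    // state => list of [anchored pattern, token type (null = skip), action]
    $rules = [
        'code' => [
            ['/"/A', 'QUOTE', 'push:string'],
            ['/[a-z]+/A', 'IDENT', null],
            ['/\}/A', 'RBRACE', 'pop'],
            ['/\s+/A', null, null],
        ],
        'string' => [
            ['/"/A', 'QUOTE', 'pop'],
            ['/\{/A', 'LBRACE', 'push:code'],
            ['/[^"{]+/A', 'TEXT', null],
        ],
    ];

    $stack = ['code'];
    $tokens = [];
    $offset = 0;

    while ($offset < strlen($input)) {
        $state = $stack[count($stack) - 1]; // the current state is the top of the stack
        $matched = false;
        foreach ($rules[$state] as [$pattern, $type, $action]) {
            if (preg_match($pattern, $input, $m, 0, $offset) !== 1) {
                continue; // try the next rule of this state
            }
            if ($type !== null) {
                $tokens[] = [$type, $m[0]];
            }
            if ($action === 'pop') {
                if (count($stack) === 1) {
                    throw new RuntimeException("Unbalanced '{$m[0]}' at offset $offset");
                }
                array_pop($stack); // return to the enclosing state
            } elseif ($action !== null) {
                $stack[] = substr($action, strlen('push:')); // enter the new state
            }
            $offset += strlen($m[0]);
            $matched = true;
            break;
        }
        if (!$matched) {
            throw new RuntimeException("No rule in state '$state' matches at offset $offset");
        }
    }

    if (count($stack) !== 1) {
        throw new RuntimeException('Unterminated state: ' . $stack[count($stack) - 1]);
    }

    return $tokens;
}

$tokens = lexInterpolated('greet "Hi {name}!"');
// IDENT greet, QUOTE, TEXT "Hi ", LBRACE, IDENT name, RBRACE, TEXT "!", QUOTE
```

A stack rather than a single "current state" variable lets interpolations nest: `{` inside a string re-enters code mode, and the matching `}` returns to the string.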

📐 LALR(1) Parser

  • Full LALR(1) grammar support — handles the vast majority of real-world grammars
  • Fluent grammar API — define productions and semantic actions with readable PHP closures
  • Operator precedence & associativity — built-in left(), right(), nonassoc() declarations
  • Conflict resolution — configurable strategies: shift-wins, longer-reduce, earlier-reduce
  • Precomputed parse tables — analyze once, serialize to PHP file, load instantly in production
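
The "serialize to PHP file, load instantly" idea rests on a general PHP technique: `var_export()` writes an array as a PHP literal, and `require` reads it back without redoing any analysis. The sketch below illustrates only that technique. It is not Dissect's code; `buildTable()`, `loadTable()` and the table layout (including the `s1`/`r1` action strings) are hypothetical placeholders.

```php
<?php

/**
 * Hypothetical stand-in for the expensive grammar-analysis step.
 * The table layout here is invented for illustration.
 *
 * @return array<string, mixed>
 */
function buildTable(): array
{
    return [
        'action' => [0 => ['INT' => 's1'], 1 => ['$eof' => 'r1']],
        'goto'   => [0 => ['Expr' => 2]],
    ];
}

/**
 * Returns the cached table, building it and writing the cache on the first call.
 *
 * @return array<string, mixed>
 */
function loadTable(string $cacheFile): array
{
    if (is_file($cacheFile)) {
        // Cheap path: the cache is plain PHP that returns an array literal.
        return require $cacheFile;
    }

    $table = buildTable();
    // var_export(..., true) returns valid PHP source for the array.
    file_put_contents($cacheFile, '<?php return ' . var_export($table, true) . ';' . PHP_EOL);

    return $table;
}

$file = sys_get_temp_dir() . '/dissect_table_sketch_' . getmypid() . '.php';
$first = loadTable($file);  // analyzes and writes the cache
$second = loadTable($file); // loads the generated PHP file
unlink($file);
```

Because the cache is an ordinary PHP file, an enabled OPcache can keep it compiled in memory between requests.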

🌳 AST Construction

  • CommonNode — ready-to-use tree node with named children and arbitrary attributes
  • Countable & iterable: use count() and foreach on a node to inspect and traverse its children
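
Implementing PHP's `Countable` and `IteratorAggregate` interfaces is what lets `count()` and `foreach` operate directly on a node. The class below is a hypothetical sketch in that spirit, not CommonNode's actual API; the `Node` name and its `child()`/`attr()` methods are invented for this example and need PHP 8.0+.

```php
<?php

/**
 * Hypothetical AST node in the spirit of CommonNode (not its real API):
 * named child nodes plus free-form attributes.
 *
 * @implements IteratorAggregate<string, Node>
 */
final class Node implements Countable, IteratorAggregate
{
    /**
     * @param array<string, Node>  $children
     * @param array<string, mixed> $attributes
     */
    public function __construct(
        private array $children = [],
        private array $attributes = [],
    ) {
    }

    public function child(string $name): Node
    {
        return $this->children[$name] ?? throw new OutOfBoundsException("No child '$name'");
    }

    public function attr(string $name): mixed
    {
        return $this->attributes[$name] ?? null;
    }

    // count($node) returns the number of direct children.
    public function count(): int
    {
        return count($this->children);
    }

    // foreach ($node as $name => $child) walks direct children in insertion order.
    public function getIterator(): ArrayIterator
    {
        return new ArrayIterator($this->children);
    }
}

// Depth-first traversal using nothing but foreach.
function countNodes(Node $node): int
{
    $total = 1;
    foreach ($node as $child) {
        $total += countNodes($child);
    }
    return $total;
}

// AST for "3 + 5": a binary node with two integer leaves.
$ast = new Node(
    ['left' => new Node([], ['value' => 3]), 'right' => new Node([], ['value' => 5])],
    ['op' => '+'],
);
```

Here `count($ast)` counts only direct children (2), while `countNodes($ast)` walks the whole subtree (3).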

🛠 Developer Experience

  • Zero runtime dependencies: Symfony Console is needed only for the optional CLI tool
  • PHPStan level 10 — fully typed with generics, array shapes, and readonly properties
  • CLI tool — dump parse tables and visualize automaton states as Graphviz graphs

📦 Installation

composer require goaop/dissect

⚡ Quick Example

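<?php

use Dissect\Lexer\SimpleLexer;
use Dissect\Parser\Grammar;
use Dissect\Parser\LALR1\Parser;

// Composer's autoloader (assumes the script sits next to vendor/)
require __DIR__ . '/vendor/autoload.php';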

// 1. Define a lexer
$lexer = new SimpleLexer();
$lexer->regex('INT',   '/[0-9]+/')
      ->token('PLUS',  '+')
      ->token('MINUS', '-')
      ->regex('WS',    '/\s+/')
      ->skip('WS');

// 2. Define a grammar
$grammar = new Grammar();
$grammar('Expr')
    ->is('Expr', 'PLUS', 'Expr')
    ->call(fn($l, $_, $r) => $l + $r)

    ->is('Expr', 'MINUS', 'Expr')
    ->call(fn($l, $_, $r) => $l - $r)

    ->is('INT')
    ->call(fn($t) => (int) $t->getValue());

$grammar->operators('PLUS', 'MINUS')->left()->prec(1);
$grammar->start('Expr');

// 3. Parse!
$parser = new Parser($grammar);
$result = $parser->parse($lexer->lex('3 + 5 - 2')); // → 6
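
The `->left()->prec(1)` declaration controls how operators of equal precedence group. The example input yields 6 under either associativity, but `10 - 3 - 2` does not: left associativity gives `(10 - 3) - 2 = 5`, right associativity gives `10 - (3 - 2) = 9`. The sketch below is a plain operator-precedence evaluator, a different and much simpler technique than Dissect's LALR(1) parser, written only to show the conventional yacc-style shift/reduce rule that precedence and associativity declarations express. `evaluate()` and its operator-table format are hypothetical.

```php
<?php

/**
 * Hypothetical operator-precedence evaluator (not Dissect's LALR(1) parser).
 * When an operator arrives, reduce while the operator on the stack binds
 * tighter, or equally tight and left-associative; otherwise shift.
 *
 * @param list<int|string>                  $tokens e.g. [10, '-', 3, '-', 2]
 * @param array<string, array{int, string}> $ops    operator => [precedence, 'left'|'right']
 */
function evaluate(array $tokens, array $ops): int
{
    $values = [];
    $stack = [];

    // Pop one operator and two operands, push the result.
    $reduce = function () use (&$values, &$stack): void {
        $op = array_pop($stack);
        $r = array_pop($values);
        $l = array_pop($values);
        $values[] = match ($op) {
            '+' => $l + $r,
            '-' => $l - $r,
            '*' => $l * $r,
        };
    };

    foreach ($tokens as $tok) {
        if (is_int($tok)) {
            $values[] = $tok; // shift an operand
            continue;
        }
        [$prec, $assoc] = $ops[$tok];
        while ($stack !== []) {
            $top = $ops[$stack[count($stack) - 1]][0];
            if ($top > $prec || ($top === $prec && $assoc === 'left')) {
                $reduce();
            } else {
                break;
            }
        }
        $stack[] = $tok; // shift the operator
    }
    while ($stack !== []) {
        $reduce();
    }

    return $values[0];
}

$left = ['+' => [1, 'left'], '-' => [1, 'left'], '*' => [2, 'left']];
$rightAssoc = ['+' => [1, 'right'], '-' => [1, 'right'], '*' => [2, 'left']];

echo evaluate([10, '-', 3, '-', 2], $left), PHP_EOL;       // 5, i.e. (10 - 3) - 2
echo evaluate([10, '-', 3, '-', 2], $rightAssoc), PHP_EOL; // 9, i.e. 10 - (3 - 2)
echo evaluate([2, '+', 3, '*', 4], $left), PHP_EOL;        // 14: '*' (prec 2) binds tighter than '+'
```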

📖 Documentation

| Topic | Description |
|---|---|
| Lexical analysis | SimpleLexer, StatefulLexer, RegexLexer, performance tips |
| Writing a grammar | Productions, callbacks, operator precedence, conflict resolution |
| Building an AST | CommonNode, tree traversal |
| Common patterns | Lists, comma-separated sequences, expression grammars |
| CLI tool | Precomputing parse tables, exporting automaton graphs |

🧪 Testing & Quality

# Run tests
composer test

# Run tests with coverage
composer test-coverage

# Static analysis (PHPStan level 10)
composer phpstan

🙏 Credits

Originally created by @jakubledl, extended by @WalterWoshid, maintained by the GoAOP team.

Give a ⭐ if Dissect saved you from writing a parser by hand!