ptondereau/parquet-php

PHP extension for reading and writing Apache Parquet files, powered by arrow-rs

Maintainers

Package info

github.com/ptondereau/php-arrow-rs-ext

Language:Rust

Type:php-ext

Ext name:ext-parquet

pkg:composer/ptondereau/parquet-php

Statistics

Installs: 0

Dependents: 0

Suggesters: 0

Stars: 0

Open Issues: 0

v0.1.0 2026-03-20 16:14 UTC

This package is auto-updated.

Last update: 2026-03-20 17:46:58 UTC


README

CI License: MIT

A PHP extension for reading and writing Apache Parquet files, powered by Rust and arrow-rs.

Supports all primitive types, nested types (lists, structs, maps), 6 compression codecs, column projection, and in-memory I/O. Requires PHP 8.2+.

Installation

PIE

pie install ptondereau/parquet-php

PIE downloads a prebuilt binary for your platform. If none is available, it builds from source (requires the Rust toolchain).

Prebuilt binaries

Prebuilt binaries are published on GitHub Releases for PHP 8.2–8.5 (NTS and ZTS):

OS Arch Libc
Linux x64 glibc, musl
Linux arm64 glibc, musl
macOS x64
macOS arm64

Build from source

# Requires Rust toolchain (https://rustup.rs)
cargo build --release
cp target/release/libparquet.so "$(php-config --extension-dir)/parquet.so"

Loading

; php.ini
extension=parquet.so

Usage

Defining a schema

use Parquet\Column;
use Parquet\Schema;

$schema = Schema::create([
    Column::int64('id')->required(),
    Column::string('name'),
    Column::double('price'),
    Column::boolean('active'),
    Column::date('created_at'),
    Column::decimal('amount', 10, 2),
]);

Writing a file

use Parquet\Writer;
use Parquet\WriterOptions;
use Parquet\Compression;

$writer = new Writer(
    WriterOptions::create()->withCompression(Compression::Snappy)
);

$writer->write('data.parquet', $schema, [
    ['id' => 1, 'name' => 'Alice', 'price' => 9.99, 'active' => true, 'created_at' => '2024-01-15', 'amount' => '123.45'],
    ['id' => 2, 'name' => 'Bob',   'price' => 4.50, 'active' => false, 'created_at' => '2024-02-20', 'amount' => '67.89'],
]);

Incremental writing

$writer = new Writer();
$writer->open('large.parquet', $schema);

foreach (array_chunk($rows, 10_000) as $batch) {
    $writer->writeBatch($batch);
}

$writer->close();

Reading a file

use Parquet\Reader;
use Parquet\ReaderOptions;

$reader = new Reader(
    ReaderOptions::create()
        ->withBatchSize(5000)
        ->withColumns(['id', 'name'])
);

$file = $reader->open('data.parquet');

echo $file->metadata()->rowCount(); // total rows

while ($batch = $file->readBatch()) {
    foreach ($batch as $row) {
        echo $row['name'];
    }
}

Nested types

$schema = Schema::create([
    Column::string('name')->required(),
    Column::list('tags', Column::string('element')),
    Column::struct('address', [
        Column::string('street'),
        Column::string('city'),
        Column::string('zip'),
    ]),
    Column::map('metadata', Column::string('key'), Column::int32('value')),
]);

$writer = new Writer();
$writer->write('nested.parquet', $schema, [
    [
        'name' => 'Alice',
        'tags' => ['admin', 'active'],
        'address' => ['street' => '123 Main St', 'city' => 'Paris', 'zip' => '75001'],
        'metadata' => ['login_count' => 42, 'score' => 100],
    ],
]);

In-memory I/O

$writer = new Writer();
$buffer = $writer->writeToString($schema, $rows);

$reader = new Reader();
$file = $reader->openString($buffer);

while ($batch = $file->readBatch()) {
    // ...
}

Compression

Six codecs are supported: Uncompressed, Snappy, Gzip, Brotli, Lz4Raw, Zstd.

use Parquet\Compression;
use Parquet\Encoding;

$options = WriterOptions::create()
    ->withCompression(Compression::Zstd)
    ->withColumnCompression('payload', Compression::Snappy)
    ->withColumnEncoding('id', Encoding::DeltaBinaryPacked);

Supported types

Type Column factory PHP representation
Boolean Column::boolean() bool
Int32 Column::int32() int
Int64 Column::int64() int
Float Column::float() float
Double Column::double() float
String Column::string() string
Date Column::date() string (Y-m-d)
DateTime Column::dateTime() string (ISO 8601)
Time Column::time() string (H:i:s.u)
Decimal Column::decimal() string
JSON Column::json() string
Enum Column::enum() string
UUID Column::uuid() string
Binary Column::binary() string
List Column::list() array
Struct Column::struct() array (assoc)
Map Column::map() array (assoc)

All columns are nullable by default. Call ->required() to enforce non-null.

For the full API, see parquet.stubs.php.

Contributing

See CONTRIBUTING.md.

License

MIT