jotaelesalinas/php-simple-mapreduce

A simple in-memory map/reduce engine for PHP iterables, without workers or configuration.

Package info

github.com/jotaelesalinas/php-simple-mapreduce

pkg:composer/jotaelesalinas/php-simple-mapreduce

v3.0.0 2026-06-10 08:22 UTC

This package is auto-updated.

Last update: 2026-06-10 11:31:14 UTC


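[!IMPORTANT] This is a breaking v3 release. If you are upgrading from v2.x, read the Updating from v2.x section before changing your Composer constraint.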

Simple in-memory map/reduce for PHP iterables.

This library is for local data processing when you want a small, readable API and do not need distributed workers, external storage, or tuning knobs. It is a lightweight alternative to distributed MapReduce-style systems.

Why this exists

  • Works with any iterable, including arrays, generators, and custom iterators.
  • Keeps all work inside one PHP process.
  • Exposes a small fluent API that is easy to test.
  • Lets you observe progress without coupling to a logger.

Install

composer require jotaelesalinas/php-simple-mapreduce

Quickstart

<?php

declare(strict_types=1);

use JLSalinas\SimpleMapReduce\MapReduce;

$result = MapReduce::create()
    ->input([1, 2, 3, 4, 5])
    ->map(static fn (mixed $item): mixed => $item * 2)
    ->reduce(static fn (mixed $carry, mixed $item): mixed => ($carry ?? 0) + $item)
    ->run();

var_dump($result);

The same pipeline accepts any callable that matches the expected signature, including built-in functions, named functions, static methods, closures, and invokable objects. For example, with a reusable closure and a static method:

$doublerFn = static fn (mixed $item): mixed => $item * 2;

final class Stats
{
    public static function max(?int $carry, int $item): int
    {
        return $carry === null
            ? $item
            : max($carry, $item);
    }
}

$result = MapReduce::create()
    ->input([1, 2, 3, 4, 5])
    ->map($doublerFn)
    ->reduce([Stats::class, 'max'])
    ->run();
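
The example above covers a closure and a static method. As a minimal sketch of the remaining callable forms, the snippet below uses only PHP's own array_map() and array_reduce() rather than this library, so it runs without Composer. The names triple and Adder are hypothetical and exist only for illustration; whether the library treats each form identically is an assumption based on the sentence above.

```php
<?php

declare(strict_types=1);

// Hypothetical helpers for illustration; none of these names come from the library.
function triple(int $item): int
{
    return $item * 3;
}

final class Adder
{
    // An invokable object: instances can be called like functions.
    public function __invoke(?int $carry, int $item): int
    {
        return ($carry ?? 0) + $item;
    }
}

$mappers = [
    'named function' => 'triple',
    'closure'        => static fn (int $item): int => $item * 3,
    'built-in'       => 'abs',
];

foreach ($mappers as $label => $mapper) {
    // array_map() accepts any PHP callable, so each form is interchangeable here.
    $mapped = array_map($mapper, [1, 2, 3]);
    echo $label, ': ', implode(',', $mapped), PHP_EOL;
}

// array_reduce() calls the invokable object as ($carry, $item), starting from null.
$sum = array_reduce([1, 2, 3], new Adder());
echo 'invokable sum: ', $sum, PHP_EOL;
```

The reducer accepts a null carry on the first call, which mirrors the ?int $carry parameter of Stats::max() above.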

Semantics

  • input() accepts one or more iterable sources.
  • The pipeline runs in this order: input, input filter, mapper, group key, mapped filter, reducer.
  • filterInput() receives the raw input item and decides whether it enters the mapper.
  • map() transforms each input item before reduction.
  • groupBy() can group by array key, object property, or callback.
  • filterMapped() receives the mapped item and, when grouping is enabled, the computed group key.
  • reduce() receives the previous carry value and the mapped item.
  • progress() receives the processed count, original item, and mapped item.
  • output() can write reduced results to one or more Writer instances.
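
The stage order listed above can be sketched in plain PHP. This is not the library's implementation: sketchPipeline() is a hypothetical stand-in that only mirrors the documented order, and it assumes that the group key is computed from the mapped item and that results are returned as an array keyed by group.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the documented stage order. Not the library's code.
 */
function sketchPipeline(
    iterable $input,
    callable $inputFilter,
    callable $mapper,
    callable $groupKey,
    callable $mappedFilter,
    callable $reducer,
): array {
    $carries = [];

    foreach ($input as $item) {
        // 1. The input filter sees the raw item and can reject it.
        if (!$inputFilter($item)) {
            continue;
        }

        // 2. The mapper transforms the accepted item.
        $mapped = $mapper($item);

        // 3. The group key is computed (from the mapped item: an assumption).
        $key = $groupKey($mapped);

        // 4. The mapped filter sees the mapped item and its group key.
        if (!$mappedFilter($mapped, $key)) {
            continue;
        }

        // 5. The reducer folds the mapped item into that group's carry.
        $carries[$key] = $reducer($carries[$key] ?? null, $mapped);
    }

    return $carries;
}

$totals = sketchPipeline(
    input: [1, 2, 3, 4, 5, 6],
    inputFilter: static fn (int $n): bool => $n !== 6,
    mapper: static fn (int $n): array => [
        'parity' => $n % 2 === 0 ? 'even' : 'odd',
        'value'  => $n * 10,
    ],
    groupKey: static fn (array $row): string => $row['parity'],
    mappedFilter: static fn (array $row, string $key): bool => $row['value'] > 10,
    reducer: static fn (?int $carry, array $row): int => ($carry ?? 0) + $row['value'],
);

print_r($totals); // even => 60, odd => 80
```

Here the input filter drops 6 before mapping, and the mapped filter drops the row with value 10 after mapping, so the two filters act on different shapes of the same item.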

Fluent API

$result = MapReduce::create()
    ->input($items)
    ->filterInput($inputFilter)
    ->map($mapper)
    ->groupBy($groupBy)
    ->filterMapped($mappedFilter)
    ->reduce($reducer)
    ->progress($progressCallback)
    ->output($writer)
    ->run();
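
Semantics states that groupBy() can group by array key, object property, or callback. As a rough sketch of what those three forms mean for a single item, the hypothetical resolveGroupKey() helper below picks a key in plain PHP. It is not part of the library, and the library's actual argument types and lookup rules may differ.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical resolver, for illustration only: turns a groupBy argument
 * into a group key for one item. The library's real rules may differ.
 */
function resolveGroupKey(mixed $item, string|Closure $groupBy): mixed
{
    // Callback form: the closure computes the key from the item.
    if ($groupBy instanceof Closure) {
        return $groupBy($item);
    }

    // Array key form: read the named key from an array item.
    if (is_array($item) && array_key_exists($groupBy, $item)) {
        return $item[$groupBy];
    }

    // Object property form: read the named public property.
    if (is_object($item) && isset($item->{$groupBy})) {
        return $item->{$groupBy};
    }

    throw new InvalidArgumentException("Cannot resolve group key '{$groupBy}'.");
}

$row = ['species' => 'cat', 'name' => 'Tom'];

$pet = new stdClass();
$pet->species = 'dog';

$byArrayKey = resolveGroupKey($row, 'species');
$byProperty = resolveGroupKey($pet, 'species');
$byCallback = resolveGroupKey($row, static fn (array $r): string => strtoupper($r['name'][0]));

echo $byArrayKey, ' ', $byProperty, ' ', $byCallback, PHP_EOL; // cat dog T
```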

When to use this

  • Use this library when you need a local, readable aggregation pipeline.
  • Use a distributed engine when you need parallel workers or external storage.
  • Use php-data-streams when you need specialized streaming readers and writers for formats such as CSV, JSON, XML, or XLSX.

Examples

To run the examples locally from the repository root:

composer install
php examples/pets.php
php examples/insurance.php
php examples/benchmark-big-dataset.php

Development

composer install
composer test
composer analyse
composer format

Updating from v2.x

If you are coming from v2.x, these are the main changes to review:

  • The namespace is now JLSalinas\SimpleMapReduce. If your code still has imports like:

    use JLSalinas\MapReduce\MapReduce;

    change them to:

    use JLSalinas\SimpleMapReduce\MapReduce;

  • The public API has been modernized around a fluent pipeline. If your v2 code used setters or explicit configuration methods, replace them with the current chainable methods. For example, code that used setInput(), setMapper(), and setReducer() should now use:

    MapReduce::create()
        ->input($items)
        ->map($mapper)
        ->reduce($reducer)
        ->run();

  • If your code used setPreFilter(), setPostFilter(), or setGroupBy(), note that the current pipeline provides filterInput(), filterMapped(), and groupBy() for these stages. Re-check both the method names and the execution order in Semantics before porting, because the position of each callback in the pipeline may have changed.

  • progress() and output() still exist, but you should re-test any code that depends on side effects, callback order, or the exact shape of the reduced output.

  • If you are still installing the old package name, switch Composer to the new package:

    composer remove jotaelesalinas/php-mapreduce
    composer require jotaelesalinas/php-simple-mapreduce