Transducers in PHP

Rich Hickey recently announced that transducers are going to be added to Clojure, and it prompted quite a bit of discussion. After a somewhat brief announcement, Hickey followed up with a couple videos that describe transducers in much more detail: Transducers and Inside Transducers + more.async

Transducers are a very powerful concept that can be utilized in almost any language. In fact, they have been ported to various other languages including JavaScript (1 and 2), Python, Ruby, Java, and Go.

And now…transducers are available in PHP via transducers.php!

So, erm… What’s a transducer?

From the Clojure documentation:

Transducers are composable algorithmic transformations. They are independent from the context of their input and output sources and specify only the essence of the transformation in terms of an individual element. Because transducers are decoupled from input or output sources, they can be used in many different processes - collections, streams, channels, observables, etc. Transducers compose directly, without awareness of input or creation of intermediate aggregates.

This is a fancy way of saying that you can use functions like map and filter on basically any type of data source (not just sequences), and by using a step function, transducers can directly output to any type of data structure like iterators, filling an array, writing to a stream, or even invoking a custom function.

Eager evaluation

Transducers can be applied eagerly.

<?php
use Transducers as t;

$xf = t\map(function ($x) { return $x + 1; };
$result = t\xform([1, 2, 3], $xf);
//> [2, 3, 4]

The above example creates a transformation function in $xf that increments each value passed through the transducer by 1. $xf is then applied to an array [1, 2, 3] using the transducers\xform() function. This returns the same type of supplied data structure with the transducer applied to it (so given an array it returns an array, given a string it returns a string, given a stream it returns a stream with an applied filter, etc.).

Eager transformations are kinda like this (you consume and emit all of the data):

Lazy evaluation

We can also use transducers to lazily transform data.

<?php
$forever = function () {
    $i = 0;
    while (1) yield $i++;
};

// Create a transformation function that multiplies each
// value by two and takes ony 100 values.
$xf = t\comp(
    t\map(function ($value) { return $value * 2; }),
    t\take(100)
);

// Lazily transform the given generator using t\xform()
foreach (t\xform($forever(), $xf) as $value) {
    echo $value;
}

The above example lazily applied a transducer to a generator by creating another generator that applies a reducing step to each value yielded by the generator.

Transducers are composable

Transducers are composable, allowing you to create a transformation pipeline of data. Take the following example:

<?php
use Transducers as t;

$xf = t\comp(
    t\drop(2),
    t\map(function ($x) { return $x + 1; },
    t\filter(function ($x) { return $x % 2; },
    t\take(3)
);

$data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
$result = t\xform($data, $xf); //> [5, 7, 9]

We just used the transducers\comp() to compose multiple transducers together into a single function. This composed function applies each transducer function, in the order they are defined, to a series of data. Each transducer performs its operation before deciding whether and how many times to call the next transducer in the pipeline. This means that the t\drop() transducer skips the first two elements of the input series and does not bother calling the subsequent transducers for the elements it skips.

To be clear, this means the pipeline order is as follows: drop -> map -> filter -> take

Available transducers

transducers.php comes with a number of built-in transducer functions.

function description
map(callable $f) Applies a map function $f to each value in a collection.
filter(callable $pred) Filters values that do not satisfy the predicate function $pred .
remove(callable $pred) Removes anything from a sequence that satisfied $pred .
cat() Transducer that concatenates items from nested lists.
mapcat(callable $f) Applies a map function to a collection and concats them into one less level of nesting.
flatten() Takes any nested combination of sequential things and returns their contents as a single, flat sequence.
partition($size) Partitions the source into arrays of size $size .
partition_by(callable $pred) Split inputs into lists by starting a new list each time the predicate passed in evaluates to a different condition (true/false) than what holds for the present list.
take($n) Takes $n number of values from a collection.
take_while(callable $pred) Takes from a collection while the predicate function $pred returns true.
take_nth($nth) Takes every nth item from a sequence of values.
drop($n) Drops $n items from the beginning of the input sequence.
drop_while(callable $pred) Drops values from a sequence so long as the predicate function $pred returns true.
replace(array $smap) Given a map of replacement pairs and a collection, returns a sequence where any elements equal to a key in $smap are replaced with the corresponding $smap value.
keep(callable $f) Keeps $f items for which $f does not return null.
keep_indexed(callable $f) Returns a sequence of the non-null results of $f($index, $input) .
dedupe() Removes duplicates that occur in order (keeping the first in a sequence of duplicate values).
interpose($separator) Adds a separator between each item in the sequence.
tap(callable $interceptor) Invokes interceptor with each result and item, and then steps through unchanged.
compact() Trim out all falsey values.
words($maxBuffer = 4096) Splits the input by words.
lines($maxBuffer = 10240000) Splits the input by lines.

Please consult the README for a full list of transducers and their descriptions.

Applying transducers

transducers.php offers several built-in ways of applying transducers to common data structures.

  • function transduce(callable $xf, array $step, $coll, $init = null) - Transform and reduce $coll by applying $xf($step)['step'] to each value.
  • function into($target, $coll, callable $xf) - Transduces items from $coll into the given $target, in essence “pouring” transformed data from one source into another data type.
  • function to_iter($coll, callable $xf) - Creates an iterator that lazily applies the transducer $xf to the $input iterator.
  • function to_array($iterable, callable $xf) - Converts a value to an array and applies a transducer function. $iterable is passed through vec() in order to convert the input value into an array.
  • function to_assoc($iterable, callable $xf) - Creates an associative array using the provided input while applying $xf to each value. Values are converted to arrays that contain the array key in the first element and the array value in the second.
  • function to_string($iterable, callable $xf) - Converts a value to a string and applies a transducer function to each character. $iterable is passed through vec() in order to convert the input value into an array.
  • function xform($coll, callable $xf) - Returns the same data type passed in as $coll with $xf applied.

Each of these functions apply transducers and provide results in different ways. You can find more information and examples in the README.

Applying to streams of data

Remember when we were talking about how transducers don’t know anything about the data source or the output destination?

Here’s a great example of where this comes into play: transducers can be applied to PHP streams using stream filters.

<?php
use transducers as t;

$f = fopen('php://temp', 'w+');
fwrite($f, 'testing. Can you hear me?');
rewind($f);

$xf = t\comp(
    // Split by words
    t\words(),
    // Uppercase/lowercase every other word.
    t\keep_indexed(function ($i, $v) {
        return $i % 2 ? strtoupper($v) : strtolower($v);
    }),
    // Combine words back together into a string separated by ' '.
    t\interpose(' ')
);

// Apply a transducer stream filter.
$filter = t\append_stream_filter($f, $xf, STREAM_FILTER_READ);
echo stream_get_contents($f);

// Be sure to remove the filter to flush out any buffers.
stream_filter_remove($filter);
echo stream_get_contents($f);
fclose($f);

// Outputs: "testing. CAN you HEAR me?"

Because transducers don’t know anything about the sequence of data being transformed, you can apply transducers in any way you want.

Creating transducers

To create a transducer, you need to create a function that returns the transducer (a transformation function). This function then returns another function that is the actual transducer. Let’s take a look at how map() is implemented.

<?php

function map(callable $f)
{
    return function (array $xf) use ($f) {
        return [
            'init'   => $xf['init'],
            'result' => $xf['result'],
            'step'   => function ($result, $input) use ($xf, $f) {
                return $xf['step']($result, $f($input));
            }
        ];
    };
}

Notice that map is a function that returns a transducer. A transducer is a function that accepts a reducing function array and returns a new reducing function array with added behavior. The above map function decorates the provided reducing function array step function by calling a map function on each value as it passes through.

Instead of using a 3 arity function like Clojure, transducers.php uses an associative array of functions for speed and clarity.

Reducing function array

Reducing function arrays are PHP associative arrays that contain a ‘init’, ‘step’, and ‘result’ key that maps to a function.

key arguments Description
init none Invoked to initialize a transducer. This function should call the ‘init’ function on the nested reducing function array $xf , which will eventually call out to the transducing process. This function is only called when an initial value is not provided while transducing.
step $result , $input This is a standard reduction function but it is expected to call the $xf['step'] function 0 or more times as appropriate in the transducer. For example, filter will choose (based on the predicate) whether to call $xf or not. map will always call it exactly once. cat may call it many times depending on the inputs.
result $result Some processes will not end, but for those that do (like transduce), the ‘result’ function is used to produce a final value and/or flush state. This function must call the $xf['result'] function exactly once.

Transducers vs generators

You might be thinking that generators in PHP offer similar functionality to transducers– and you’d be right! However, transducers offer a number of benefits over just decorating iterators with generators.

Transducers are more composable than generators

When composing generators, you need to wrap iterators from the inside out, which can lead to code that’s hard to read and deeply nested. Let’s compare transducer composition vs generator compositon (using to nikic’s iter library).

The following composed transducer is a function that creates a pipeline for transforming data: it skips the first two elements of a collection, adds 1 to each value, filters out even numbers, then takes 3 elements from the collection. This new transformation function can be used with various transducer application functions, including to_iter.

Assume both examples have access to a $data variable that is traversable.

<?php
$iter = t\to_iter($data, t\comp(
    t\drop(2),
    t\map(function ($x) { return $x + 1; },
    t\filter(function ($x) { return $x % 2; },
    t\take(3)
));

Note: to_iter() uses a generator under the hood to lazily apply the transducer

Here’s the same transformation using generators (notice it’s awkwardly composed from the inside out):

<?php
$iter = iter\take(3, iter\filter(
    function($x) { return $x % 2; },
    iter\map(
        function ($x) { return $x + 1; },
        iter\drop(2, $data)
    )
));

Transducers are more generic than generators

Using generators to transform a sequence of data requires that your input data is iterable and that your output is iterable. Generators have knowledge of the traversal, transformation, and building up of a result. Transducers on the other hand are only concerned with transforming data and as such can transform any type of data source including but not limited to iterable data. Transducers can perform an action on each item in a series of data (like writing to a stream) without creating an intermediate sequence whereas a generator would require you to foreach over an iterator and then perform the action. This flexibility is demonstrated in the transducer stream filter.

Try it out!

Transducers are an exciting new innovation from the Clojure community that is now available to the PHP community. Give it a try and let me know what you think.