mtdowling

A Case for Higher Level PHP Streams in PSR-7

There’s been a lot of talk lately about the PSR HTTP message proposal, PSR-7. The purpose of the proposal is to create a shared interface that can be used by projects to interact with HTTP messages for both clients and servers.

When I created the proposal, my intent was not to require projects that use HTTP messages to make breaking changes in order to adopt the proposed interfaces, but rather to give projects an interface for which they can create an adapter. For example, if the proposal changes substantially before it is accepted, then it is very unlikely that Guzzle (a very popular PHP HTTP client that I created) would use the interfaces directly. I also very much doubt that projects like Symfony or Zend Framework would update their HTTP message interfaces to match (at least in the near future).

Message Bodies

The biggest point of contention with the proposal so far has been deciding on how the body of an HTTP message will be represented. In the current proposal, the body of an HTTP message is exposed using a StreamInterface that provides the following methods:

<?php

namespace Psr\Http\Message;

interface StreamInterface
{
    public function __toString();
    public function close();
    public function detach();
    public function getSize();
    public function tell();
    public function eof();
    public function isSeekable();
    public function seek($offset, $whence = SEEK_SET);
    public function isWritable();
    public function write($string);
    public function isReadable();
    public function read($length);
    public function getContents($maxLength = -1);
}

As you can see, the StreamInterface provides methods that describe the stream’s capabilities (isReadable(), isWritable(), isSeekable()), can be cast to a string, and provides methods that allow you to read and write to the stream without having to load the entire stream into memory. Using the StreamInterface also consolidates all of the functions you need to interact with the data source in one easy to find place.
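As a rough sketch of what reading without loading the whole body into memory looks like, the snippet below hashes a body in fixed-size chunks. ResourceStream and hashBody() are hypothetical names made up for this illustration; they only rely on the read() and eof() methods listed in the interface above and are not part of the proposal or of Guzzle.

```php
<?php
// Hypothetical minimal stream: wraps a PHP resource and exposes only the
// read() and eof() methods named in the interface above.
class ResourceStream
{
    private $resource;

    public function __construct($resource)
    {
        $this->resource = $resource;
    }

    public function eof()
    {
        return feof($this->resource);
    }

    public function read($length)
    {
        // fread() may return false on failure; normalize to a string.
        return (string) fread($this->resource, $length);
    }
}

// Consume a body chunk by chunk so that at most $chunkSize bytes of the
// body are held in a PHP variable at any one time.
function hashBody($body, $chunkSize = 8192)
{
    $ctx = hash_init('sha256');
    while (!$body->eof()) {
        hash_update($ctx, $body->read($chunkSize));
    }

    return hash_final($ctx);
}

// Usage: a 100,000 byte body stored in a temporary stream.
$fp = fopen('php://temp', 'r+');
fwrite($fp, str_repeat('a', 100000));
rewind($fp);

$digest = hashBody(new ResourceStream($fp));
```

The same loop works for any object that offers read() and eof(), regardless of whether the data comes from a file, a socket, or a string.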

There are several other options that could have been utilized to represent an HTTP message body:

  • string
  • iterators
  • PHP streams

Utilizing a string would have required that the entire contents of the message be loaded into memory. This won’t work when you’re interacting with web services like Amazon S3 where it is common to download objects storing gigabytes of data.

Utilizing iterators could work, but it would likely cause a significant performance penalty due to the fact that each call to next() would return only a single byte, resulting in a huge number of method calls to download large files. Furthermore, it would provide a read-only representation of a message body.

PHP Streams

PHP streams offer a pretty powerful abstraction over streams of data. They allow you to implement custom stream protocols so that you can interact with various data sources, and they provide stream filters that let you customize how data is read from or written to streams at runtime. In theory, they are the obvious choice for representing HTTP message bodies. In practice, they suffer from various problems that make them an impractical choice as the data source for PSR-7.

No Auto-registering of stream protocols and filters

One of the great things about the StreamInterface approach is that you can easily implement StreamInterface to create stream decorators that add behavior to streams at runtime. This is an approach that I’ve utilized heavily in Guzzle.

To add behavior like this to PHP streams, you’d need to implement custom PHP stream wrappers and/or stream filters for each of these decorators. The problem here is that you would need some kind of bootstrap script to register your named stream wrappers and filters before they can be utilized. There’s no way in PHP to automatically register stream filters or wrappers before they are used. However, PHP has long offered autoloading of classes, and classes are how StreamInterface decorators would be implemented.
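To make the decorator idea concrete, here is a minimal sketch. MiniStream is a hypothetical, trimmed-down subset of the proposed StreamInterface, and StringStream and NoSeekStream are names invented for this example rather than the actual Guzzle classes. The point is that a decorator is just a class: nothing has to be registered before it is used, so a class autoloader can find it on demand.

```php
<?php
// Hypothetical, trimmed-down subset of the proposed StreamInterface.
interface MiniStream
{
    public function read($length);
    public function isSeekable();
    public function seek($offset, $whence = SEEK_SET);
}

// Simple in-memory stream backed by a string (illustration only).
class StringStream implements MiniStream
{
    private $data;
    private $pos = 0;

    public function __construct($data)
    {
        $this->data = $data;
    }

    public function read($length)
    {
        $chunk = (string) substr($this->data, $this->pos, $length);
        $this->pos += strlen($chunk);

        return $chunk;
    }

    public function isSeekable()
    {
        return true;
    }

    public function seek($offset, $whence = SEEK_SET)
    {
        // Only SEEK_SET is handled to keep the sketch short.
        if ($whence !== SEEK_SET || $offset < 0 || $offset > strlen($this->data)) {
            return false;
        }
        $this->pos = $offset;

        return true;
    }
}

// Decorator: forwards reads to the wrapped stream but refuses to seek.
class NoSeekStream implements MiniStream
{
    private $stream;

    public function __construct(MiniStream $stream)
    {
        $this->stream = $stream;
    }

    public function read($length)
    {
        return $this->stream->read($length);
    }

    public function isSeekable()
    {
        return false;
    }

    public function seek($offset, $whence = SEEK_SET)
    {
        return false;
    }
}

$stream = new NoSeekStream(new StringStream('foobar'));
$first = $stream->read(10);  // "foobar"
$moved = $stream->seek(0);   // false: the decorator blocks the seek
$second = $stream->read(10); // "": still positioned at the end
```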

Exceptions cause warnings in stream wrappers and filters

I’ve been thinking lately that creating PHP resource stream decorators like I’ve listed above would be possible, and could be implemented relatively painlessly using PHP stream wrappers. It turns out it is!

I started a GitHub repository that makes it easy to create PHP stream wrapper decorators that can implement what I’ve listed above: https://github.com/mtdowling/streamer. It acts as a decorator over an existing PHP stream resource.

<?php
use GuzzleHttp\Streamer\Utils;
use GuzzleHttp\Streamer\BaseWrapper;

// Prevents an underlying stream from being seeked
class NoSeek extends BaseWrapper
{
    public function stream_seek($offset, $whence)
    {
        return false;
    }
}

$base = Utils::create('foobar');
$f = NoSeek::wrap($base);
echo fread($f, 10); // Outputs "foobar"
fseek($f, 0);
echo fread($f, 10); // Outputs ""

That’s awesome!

The problem with this approach is that PHP streams suffer from PHP’s legacy of emitting errors and warnings. Let’s say you wanted to implement a message integrity check that creates a rolling checksum as data is read from a stream. When the last byte of data is read, the checksum is computed and validated against an expected value. If the checksums do not match, then you’d throw an exception.

<?php
use GuzzleHttp\Streamer\Utils;

class ThrowDecorator extends \GuzzleHttp\Streamer\BaseWrapper
{
    public function stream_read($count)
    {
        // Assume it's the last byte and there is a mismatch
        throw new \RuntimeException('Checksum mismatch!');
    }
}

$base = Utils::create('foobar');
$f = ThrowDecorator::wrap($base);
fread($f, 10);

Running the above example will always emit the following PHP warning followed by throwing the exception:

PHP Warning:  fread(): ThrowDecorator::stream_eof is not implemented! Assuming EOF in test.php

So even though stream_eof() is implemented, throwing an exception inside of stream_read() will always emit the above warning.

“This should be implemented as a stream filter!” you might say. Ok, let’s try that:

<?php

class ThrowFilter extends php_user_filter
{
    function filter($in, $out, &$consumed, $closing)
    {
        // Just throw immediately as an example
        throw new \RuntimeException('Checksum mismatch!');
    }
}

stream_filter_register('throws', 'ThrowFilter');
$fp = fopen('/tmp/foo', 'r');
stream_filter_append($fp, 'throws');
fread($fp, 10); // the second argument is a length in bytes

When the above example is run it emits a PHP warning before throwing the exception:

PHP Warning:  fread(): Unprocessed filter buckets remaining on input brigade in test.php on line 20

Because PHP streams do not let you throw exceptions cleanly from within these abstractions, you cannot decorate stream behavior at runtime and report failures with exceptions without also emitting PHP warnings and errors. This is a huge usability issue that basically renders them useless for this purpose.
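For contrast, here is a sketch of the same integrity check written as a plain object that wraps a resource. ChecksumStream is a hypothetical class invented for this example (it is not part of Guzzle or of the proposal), and it assumes an MD5 digest is the expected checksum. Because read() is an ordinary method call rather than a stream wrapper or filter callback, the exception propagates to the caller like any other exception.

```php
<?php
// Hypothetical object-level decorator: hashes data as it is read from a PHP
// resource and throws once the end is reached if the digest is wrong.
class ChecksumStream
{
    private $resource;
    private $expected;
    private $ctx;
    private $done = false;

    public function __construct($resource, $expectedMd5)
    {
        $this->resource = $resource;
        $this->expected = $expectedMd5;
        $this->ctx = hash_init('md5');
    }

    public function eof()
    {
        return feof($this->resource);
    }

    public function read($length)
    {
        $data = (string) fread($this->resource, $length);
        if ($this->done) {
            // The digest was already finalized; nothing more to verify.
            return $data;
        }

        hash_update($this->ctx, $data);
        if (feof($this->resource)) {
            $this->done = true;
            if (hash_final($this->ctx) !== $this->expected) {
                throw new \RuntimeException('Checksum mismatch!');
            }
        }

        return $data;
    }
}

// Usage: the expected digest is deliberately wrong.
$fp = fopen('php://temp', 'r+');
fwrite($fp, 'foobar');
rewind($fp);

$stream = new ChecksumStream($fp, md5('not foobar'));
$caught = null;
try {
    while (!$stream->eof()) {
        $stream->read(4);
    }
} catch (\RuntimeException $e) {
    $caught = $e->getMessage();
}
```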

Cannot be cast to a string

Because it implements __toString(), the StreamInterface approach allows the convenience of being able to treat a stream of data as a string while still allowing for the flexibility of not loading a bunch of data all into memory. This feature will make libraries implementing this proposal much more accessible to new users without making any tradeoffs in terms of encapsulation or flexibility.

<?php
// Outputs the string representation
echo $message->getBody();

On the other hand, using PHP streams directly would require boilerplate code each time you want to convert a message body to a string:

<?php
$resource = $message->getBody();
rewind($resource);
echo stream_get_contents($resource);

Functionality is spread over many functions

To interact with PHP streams and get the same level of usability as the StreamInterface, you’d need to interact with many different functions that aren’t always easy to find (especially for a new user).

What if you wanted to know the amount of data in a PHP stream? When using the StreamInterface, it’s a simple call to getSize(). When using PHP streams, it requires the following code:

<?php
$stats = fstat($resource);

if (isset($stats['size'])) {
    $size = $stats['size'];
} else {
    $size = null;
}

What if you wanted to know whether or not a stream is seekable? Using the StreamInterface, it’s a simple call to isSeekable(). When using PHP streams, you need the following code:

<?php
$meta = stream_get_meta_data($resource);
$seekable = $meta['seekable'];

That wasn’t too bad. But what if you wanted to know whether or not a stream is readable? Using the StreamInterface: isReadable(). Using PHP streams:

<?php
$meta = stream_get_meta_data($resource);
$mode = $meta['mode'];
$readableModes = [
    'r', 'w+', 'r+', 'x+', 'c+', 'rb', 'w+b', 'r+b', 
    'x+b', 'c+b', 'rt', 'w+t', 'r+t', 'x+t', 'c+t', 'a+'
];
$isReadable = in_array($mode, $readableModes);

A similar approach would need to be taken to know if a stream is writable. As you can see, attempting to get the same functionality as a StreamInterface using native PHP streams would require quite a bit of boilerplate code.
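For completeness, a writable check might look like the sketch below. isWritableResource() is a hypothetical helper name, and instead of listing every mode it assumes that any mode containing w, a, x, c, or + permits writing, which holds for the standard fopen() modes. Custom wrappers may report their mode differently, so treat this as an illustration rather than a complete solution.

```php
<?php
// Returns true when the resource was opened in a mode that permits writing.
// Assumption: only plain "r" (optionally followed by "b" or "t") is
// read-only; w, a, x, c, and every "+" variant allow writes.
function isWritableResource($resource)
{
    $meta = stream_get_meta_data($resource);

    return strpbrk($meta['mode'], 'waxc+') !== false;
}

// Usage against a real temporary file opened in two different modes.
$path = tempnam(sys_get_temp_dir(), 'psr');
$readOnly = fopen($path, 'r');
$appendable = fopen($path, 'a+');

$readOnlyWritable = isWritableResource($readOnly);
$appendWritable = isWritableResource($appendable);

fclose($readOnly);
fclose($appendable);
unlink($path);
```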

Just for reference, the PHP manual documents both the stream-related functions and the filesystem functions that can be used with PHP streams.

StreamInterface concerns

Even with all of the problems that come with using PHP streams directly, various members of the mailing list have raised concerns about the StreamInterface approach. The most common concern is that there is no way to get a native PHP stream from a StreamInterface.

The biggest thing I'm missing from the stream api, is to get to the
underlying stream.

--

We have to ensure that the common case is performant, and that it's
possible to take advantage of PHP's (fairly robust) stream handling
capabilities when feasible.  I totally get the portability and
simplicity benefits of abstracting it away from stream API, but there
needs to be a way to use, well, PHP, and do so performantly.

The underlying request is for a stream abstraction that does not necessarily wrap a PHP stream but can still be used like one. This can be achieved in a number of different ways.

Creating a PHP stream from a StreamInterface

When using Guzzle streams (a 1:1 implementation of the proposed StreamInterface), you can convert any StreamInterface instance to a PHP stream using a custom PHP stream wrapper (https://github.com/guzzle/streams/blob/master/src/GuzzleStreamWrapper.php):

<?php
use GuzzleHttp\Stream;
use GuzzleHttp\Stream\GuzzleStreamWrapper;

// Create a Guzzle stream
$stream = Stream\create('this is a string body');

// Create a PHP stream from the Guzzle stream that just wraps the Guzzle stream
$resource = GuzzleStreamWrapper::getResource($stream);
echo fread($resource, 4);
// Outputs "this"

// Notice that the original stream is still in a consistent state
echo $stream->tell(); // 4
echo ftell($resource); // 4

The great thing about this approach is that it does not break the abstraction of StreamInterface. For example, if you’ve decorated the stream to add various capabilities (e.g., checksum validation when the last byte is read), those same capabilities will still be utilized when using the native PHP stream resource that wraps the Guzzle stream. This would be the ideal approach to utilize when trying to use a StreamInterface as a native PHP stream.

Getting the underlying PHP stream from a StreamInterface

While there’s no requirement that a StreamInterface actually wraps a PHP stream resource (e.g., reading from strings, reading mocked data, etc.), it will often be the case when interacting with things like files on disks or remote sockets. In these cases, you might want to get the actual underlying stream resource from the StreamInterface.

If the StreamInterface were able to return an underlying resource that can be mutated, then you would be breaking the abstraction by abandoning any decorators, and you would be leaving the StreamInterface in an inconsistent state. Because of this, the only time the StreamInterface should return an actual underlying PHP stream resource is when the detach() method is called (a method that already promises to leave a StreamInterface in an inconsistent state). Calling StreamInterface::detach() would return null if the StreamInterface doesn’t actually wrap an underlying resource, or would return a PHP stream resource if one is utilized.
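The detach() contract described above can be sketched as follows. DetachableStream is a hypothetical class written for this illustration and is not the Guzzle implementation. It hands over the resource once, returns null when there is nothing to hand over, and refuses further reads afterwards.

```php
<?php
// Hypothetical stream that may or may not wrap a PHP resource.
class DetachableStream
{
    private $resource;

    public function __construct($resource = null)
    {
        $this->resource = $resource;
    }

    public function read($length)
    {
        if ($this->resource === null) {
            throw new \RuntimeException('Stream is detached');
        }

        return (string) fread($this->resource, $length);
    }

    // Hands the underlying resource to the caller (or null if there is
    // none) and leaves this object unusable for further I/O.
    public function detach()
    {
        $resource = $this->resource;
        $this->resource = null;

        return $resource;
    }
}

// Usage
$fp = fopen('php://temp', 'r+');
fwrite($fp, 'hello');
rewind($fp);

$stream = new DetachableStream($fp);
$raw = $stream->detach();   // the original PHP resource
$again = $stream->detach(); // null: nothing left to detach
$data = fread($raw, 5);     // "hello", read with native stream functions
```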

As a matter of fact, I recently pushed a change to Guzzle’s streams API that implements this change: https://github.com/guzzle/streams/commit/368ee042ef5d88ffbc19632e0c114a54a3ac45b2. While detaching the underlying resource will not utilize any decorators that have been attached to the StreamInterface, it will provide a native PHP stream resource that does not need to utilize a custom Guzzle stream wrapper.

<?php
use GuzzleHttp\Stream;

$stream = Stream\create(fopen('/tmp/foo', 'r'));
$resource = $stream->detach();

I think that a similar change should be made to the StreamInterface proposed in PSR-7.

Summary

I’ve outlined the different approaches that can be taken to represent HTTP message bodies in PSR-7 and provided more motivation as to why I proposed the StreamInterface solution. Using PHP streams directly does not allow for a robust stream decoration strategy because PHP streams suffer from legacy PHP warnings and errors. By showing how you can use a custom stream wrapper to convert a StreamInterface to a PHP stream and by returning the underlying PHP stream resource when the StreamInterface::detach() method is called, I’ve addressed the concern that you cannot utilize native PHP streams when using a StreamInterface abstraction.
