Michael Dowling

Requiring cURL in Your PHP Library

I sometimes hear that people don’t want to use Guzzle (a PHP HTTP client) because it requires cURL and they want their library to be “portable”. In this post, I’ll attempt to convince you that cURL is the best option for sending HTTP requests in PHP, compare cURL against more “portable” PHP alternatives, and prove that your users will probably already have cURL installed on their systems.

“Portable” cURL alternatives

First off, let’s define “portable”. Most of the people that throw this word around imply that if it isn’t part of PHP’s core then it isn’t portable. Ok, so what are the different ways to send HTTP requests using only things provided in PHP’s core distribution?

HTTP stream wrapper

The most common alternative to requiring cURL in a PHP application is to rely on PHP’s HTTP stream wrapper. If your requirements are very limited, then this might be an OK alternative for you. There are some drawbacks to using the PHP HTTP stream wrapper that you should know about before ditching cURL for it:

  • Does not support HTTP 1.1
  • Does not support streaming uploads. Uploads for POST, PUT, etc. must be from a string loaded into memory. Try uploading a 2GB file with fopen().
  • Does not support persistent HTTP connections. Opening and closing TCP connections over and over can be a massive performance penalty to an application that makes several requests to the same server.
  • Does not support the fine-grained timeout and speed limit options of cURL
  • Does not maintain cookies between requests. You would need to implement cookie management manually.
  • Does not support sending requests in parallel. Sending requests in parallel can provide significant performance gains to applications that need to send many requests at once.
  • It’s slower than cURL
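To make the upload limitation concrete, here is a minimal sketch of a POST prepared for the HTTP stream wrapper. The helper name buildPostContext() and the example.com URL are placeholders made up for illustration. The point is that the 'content' option takes a string, so the whole request body has to sit in memory before the request starts.

```php
<?php

// Hypothetical helper (not from any library): builds a stream context
// for a POST request sent through PHP's HTTP stream wrapper.
function buildPostContext($body, $timeoutSeconds = 5.0)
{
    return stream_context_create(array(
        'http' => array(
            'method'  => 'POST',
            'header'  => "Content-Type: application/octet-stream\r\n",
            'content' => $body,           // must be a string held in memory
            'timeout' => $timeoutSeconds, // a single read timeout, in seconds
        ),
    ));
}

$context = buildPostContext(str_repeat('x', 1024));

// Sending the request would look like this (placeholder URL, left commented out):
// $response = file_get_contents('http://example.com/upload', false, $context);
```

A 1 KB body is harmless, but the same code with a multi-gigabyte payload would need that much memory just to build the context.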

Sockets

Creating a PHP HTTP client using sockets is another alternative. When I hear someone is writing a socket based HTTP client from scratch, my first thought is, “have you seen RFC 2616?!” There’s an awful lot of context, state transitions, and edge cases to consider when implementing a socket based HTTP client. Because of the complexity, developers either rarely get it right or omit swaths of HTTP/1.1 features because they are hard to implement (e.g. Expect: 100-Continue, trailing headers, multi-part messages, etc). The one PHP HTTP/1.1 socket based client that actually does a decent job sadly lacks sending requests in parallel.

There are a great many edge cases to consider when implementing a socket based HTTP client:

  • What if the remote server is not listening on the specified port?
  • What if the remote server takes too long to respond?
  • What if the HTTP response message is not a valid HTTP response?
  • What if the HTTP response states that it will send more data than it actually sends?
  • What if the connection is severed in the middle of the request?
  • What if a request with an Expect: 100-Continue header never receives a 100 Continue response?
  • What if you need to implement persistent connections? Did the remote server send back a Connection: close header? Did the connection close unannounced?
  • What if the remote server responds with chunked Transfer-Encoding?
  • Will you implement 300 level redirects? Will you gracefully handle Location headers that use relative URLs?
  • What if you need to maintain a cookie session between requests?
  • What if you need to use Digest authentication?
  • How will you implement persistent connections?
  • Will you support sending requests in parallel?
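As a rough illustration of just two of these questions (chunked Transfer-Encoding and a response that promises more data than it delivers), here is a sketch of a decoder for a chunked body that has already been read into a string. The function name decodeChunkedBody() is made up for this example, and the sketch deliberately ignores trailers and works on a complete string rather than a live socket, so a real client would need considerably more than this.

```php
<?php

// Hypothetical helper: decodes a chunked message body held in a string.
// Chunk extensions are skipped and trailing headers are ignored.
function decodeChunkedBody($raw)
{
    $decoded = '';
    $offset = 0;
    $length = strlen($raw);

    while ($offset < $length) {
        // Each chunk starts with a hexadecimal size line ending in CRLF
        $lineEnd = strpos($raw, "\r\n", $offset);
        if ($lineEnd === false) {
            throw new UnexpectedValueException('Missing chunk-size line');
        }

        // Drop optional chunk extensions that follow a semicolon
        list($sizeHex) = explode(';', substr($raw, $offset, $lineEnd - $offset), 2);
        $sizeHex = trim($sizeHex);
        if (!preg_match('/^[0-9a-fA-F]+$/', $sizeHex)) {
            throw new UnexpectedValueException('Invalid chunk size');
        }

        $size = hexdec($sizeHex);
        $offset = $lineEnd + 2;

        // A zero-length chunk marks the end of the body
        if ($size == 0) {
            break;
        }

        // The server promised more data than it actually sent
        if ($offset + $size + 2 > $length) {
            throw new UnexpectedValueException('Chunk is shorter than its declared size');
        }

        $decoded .= substr($raw, $offset, $size);
        $offset += $size + 2; // skip the chunk data and its trailing CRLF
    }

    return $decoded;
}

echo decodeChunkedBody("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"), PHP_EOL;
```

Even this small piece needs three separate error paths, and it covers only one item from the list.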

Need more examples that prove HTTP/1.1 is complicated? Check this out.

Trying to implement a socket based HTTP client in PHP that has a comparable feature set to cURL is a monumental undertaking. If you choose to go down this path, then Godspeed.

How ubiquitous is cURL?

Daniel Stenberg, the author of libcurl, recently wrote about this very topic. Daniel estimates that there are probably around 550,000,000 direct or indirect libcurl users. His first number was around 300M, but he realized libcurl is installed on all iOS devices. That number is just an estimate of the number of unique cURL users, so let’s narrow this down to more PHP specific data.

Shared hosts that provide cURL by default

Creating a library that requires cURL expects that your users either already have cURL installed or can install PHP’s cURL extension. The scariest part about requiring users to install cURL is that some of your users will be on a shared server without shell or root access.

With that in mind, I quickly compiled a list of shared hosting companies that include PHP’s cURL extension by default on all of their servers.

Let me know in the comments if you can think of other shared hosting providers that provide PHP’s cURL extension by default.

Who uses cURL?

PHP’s cURL extension is utilized by countless PHP libraries. Each and every one of these library authors decided that requiring cURL for their library was an acceptable requirement.

API clients that require cURL

Lots of large companies offer PHP SDKs for their web services that require cURL. Any developer who uses any of the following libraries has installed cURL on their system:

Who else uses cURL?

Some really popular frameworks and libraries require cURL. Developers that use or will use any of the following libraries will likely have cURL installed on their system:

Installing php-curl

Installing cURL is usually really, really easy.

  • Ubuntu: apt-get install php5-curl
  • Fedora / Amazon Linux: yum install php-curl
  • WAMP
    1. Left-click on the WAMP server icon in the bottom right of the screen
    2. PHP -> PHP Extensions -> php_curl

Conclusion

cURL is everywhere. Extremely popular PHP libraries already require cURL. Most shared PHP hosts support cURL by default. Requiring cURL in your PHP library will not deter a statistically significant number of users, and certainly not enough to justify resorting to the underpowered PHP HTTP stream wrapper or the frivolous wheel-reinvention that is creating a socket based HTTP client.

Guzzle 3.0: Better Service Descriptions and More Modular

Guzzle 3.0 was released tonight, bringing with it an enormous number of improvements to an already feature-rich PHP HTTP client and web service framework.

Advanced service descriptions

Hands down, the biggest changes in Guzzle are found in Guzzle’s service descriptions. Guzzle’s new service description format is heavily inspired by Swagger, and now supports complex request bodies and modeled responses.

Guzzle 2.x only supported top-level parameters, so if a web service operation needed a complex request body, you had to create a concrete class. Concrete classes are still supported, but you can now model nested XML and JSON input structures in service descriptions without needing to write a line of PHP. Simply create the necessary parameters of your web service using JSON schema syntax.

Guzzle 3.0 now introduces the concept of response models. Any operation with a responseClass attribute that matches the name of a model defined in a service description will return a Guzzle\Service\Resource\Model object when executed. This object looks like an array and contains the structural definition of the model. Using JSON schema definitions, you can now define the hash-like structure of a model and provide an easier to use abstraction over SimpleXMLElements or raw arrays.

Let’s work through an example. Take the following imaginary web service:

GET/POST   /users
GET/DELETE /users/:id

This web service can be implemented using the following service description:

{
    "name": "Foo",
    "apiVersion": "2012-10-14",
    "baseUrl": "http://api.foo.com",
    "description": "Foo is an API that allows you to Baz Bar",
    "operations": {
        "GetUsers": {
            "httpMethod": "GET",
            "uri": "/users",
            "summary": "Gets a list of users",
            "responseClass": "GetUsersOutput"
        },
        "CreateUser": {
            "httpMethod": "POST",
            "uri": "/users",
            "summary": "Creates a new user",
            "responseClass": "CreateUserOutput",
            "parameters": {
                "name": {
                    "location": "json",
                    "type": "string"
                },
                "age": {
                    "location": "json",
                    "type": "integer"
                },
                "tags": {
                    "type": "array",
                    "location": "json",
                    "items": {
                        "type": "string"
                    }
                }
            }
        },
        "GetUser": {
            "httpMethod": "GET",
            "uri": "/users/{id}",
            "summary": "Retrieves a single user",
            "responseClass": "GetUserOutput",
            "parameters": {
                "id": {
                    "location": "uri",
                    "description": "User to retrieve by ID",
                    "required": true
                }
            }
        },
        "DeleteUser": {
            "httpMethod": "DELETE",
            "uri": "/users/{id}",
            "summary": "Deletes a user",
            "responseClass": "DeleteUserOutput",
            "parameters": {
                "id": {
                    "location": "uri",
                    "description": "User to delete by ID",
                    "required": true
                }
            }
        }
    },
    "models": {
        "User": {
            "type": "object",
            "properties": {
                "name": {
                    "location": "json",
                    "type": "string"
                },
                "age": {
                    "location": "json",
                    "type": "integer"
                },
                "tags": {
                    "type": "array",
                    "location": "json",
                    "items": {
                        "type": "string"
                    }
                }
            }
        },
        "GetUsersOutput": {
            "type": "array",
            "items": {
                "$ref": "User"
            }
        },
        "CreateUserOutput": {
            "type": "object",
            "properties": {
                "id": {
                    "location": "json",
                    "type": "string"
                },
                "location": {
                    "location": "header",
                    "sentAs": "Location",
                    "type": "string"
                }
            }
        },
        "GetUserOutput": {
            "$ref": "User"
        },
        "DeleteUserOutput": {
            "type": "object",
            "properties": {
                "status": {
                    "location": "statusCode",
                    "type": "integer"
                }
            }
        }
    }
}

You can attach the above service description to any Guzzle\Service\Client object and the client will then be completely configured to use the service:

<?php

require __DIR__ . '/vendor/autoload.php';

use Guzzle\Service\Description\ServiceDescription;
use Guzzle\Http\Message\Response;
use Guzzle\Service\Client;
use Guzzle\Plugin\Mock\MockPlugin;

$client = new Client();
$client->setDescription(ServiceDescription::factory('/path/to/service.json'));

// Because this web service does not exist, let's mock the response
$mock = new MockPlugin();
$mock->addResponse(new Response(200, array(
    'Location'     => 'http://foo.com/user/123',
    'Content-Type' => 'application/json'
), '{"id": "123"}'));
$client->addSubscriber($mock);

// Create the command and supply the input
$command = $client->getCommand('CreateUser', array(
    'name' => 'Michael',
    'age'  => 27,
    'tags' => array('PHP', 'HTTP')
));

// Execute the command and retrieve the model object
$result = $command->execute();

echo "\n# Sent the following request: \n" . $command->getRequest() . "\n\n";
echo "# Command result: \n";
var_export($result->toArray());

The above script will output the following:

# Sent the following request:
POST /users HTTP/1.1
Host: api.foo.com
User-Agent: Guzzle/3.0.0 curl/7.21.4 PHP/5.3.15
Content-Length: 49

{"name":"Michael","age":27,"tags":["PHP","HTTP"]}

# Command result:
array (
  'id' => '123',
  'location' => 'http://foo.com/user/123',
)

There are so many features in service descriptions that it’s a daunting task to attempt to completely document them. I definitely owe you much more documentation around these features.

Broken into components

In response to feedback from various developers and interested third-parties, I’ve reorganized Guzzle’s namespaces into a more modular and shallow structure. This allows consumers of Guzzle to require only the specific parts of Guzzle needed by their project. Guzzle is already fairly large for an HTTP client, and I plan on adding more features to make it more awesome-r… It makes a lot of sense to eliminate the worry of “does adding this feature make the library too big for project X”? Project X can now configure its requirements to be as granular as needed. Guzzle 3.0 now provides the following components hosted on Packagist, with PEAR subpackages soon to follow (thanks, Clay Loveless!):

  • guzzle/guzzle: Guzzle is a PHP HTTP client library and framework for building RESTful web service clients
  • guzzle/batch: Guzzle batch component for batching requests, commands, or custom transfers
  • guzzle/cache: Guzzle cache adapter component
  • guzzle/common: Common libraries used by Guzzle
  • guzzle/http: HTTP libraries used by Guzzle
  • guzzle/inflection: Guzzle inflection component
  • guzzle/iterator: Provides helpful iterators and iterator decorators
  • guzzle/log: Guzzle log adapter component
  • guzzle/parser: Interchangeable parsers used by Guzzle
  • guzzle/plugin: Guzzle plugin component containing all Guzzle HTTP plugins (replaces the following, more granular plugins)
    • guzzle/plugin-async: Guzzle async request plugin
    • guzzle/plugin-backoff: Guzzle backoff retry plugins
    • guzzle/plugin-cache: Guzzle HTTP cache plugin
    • guzzle/plugin-cookie: Guzzle cookie plugin
    • guzzle/plugin-curlauth: Guzzle cURL authorization plugin
    • guzzle/plugin-history: Guzzle history plugin
    • guzzle/plugin-log: Guzzle log plugin for over the wire logging
    • guzzle/plugin-md5: Guzzle MD5 plugins
    • guzzle/plugin-mock: Guzzle Mock plugin
    • guzzle/plugin-oauth: Guzzle OAuth plugin
  • guzzle/service: Guzzle service component for abstracting RESTful web services
  • guzzle/stream: Guzzle stream wrapper component

A better HTTP client

A number of improvements were made in Guzzle 3 that make Guzzle a better HTTP client:

  • No longer sending an Accept or Accept-Encoding header by default
  • Only sending an Expect: 100-Continue header when the payload of a request is greater than 1 MB
  • Guzzle now ships with and uses cURL’s CA certs extracted from mozilla.org
  • Guzzle\Http\Client::setSslVerification() now makes it easier to fine-tune the SSL behavior of a client

Better plugins

Plugins are now generally more awesome because the directory structure is more modular, plugins have their own Packagist packages, and we no longer need to worry about the overall size of the library. The plugins that ship with Guzzle itself will still be fairly selective, but the plugins themselves are now free to be as robust as needed. Three plugins specifically got a lot of love in the 3.0 release:

  • CachePlugin
  • BackoffPlugin
  • LogPlugin

CachePlugin

The CachePlugin has been mostly rewritten to be much more flexible. The default implementation of the plugin is basically the same, but allows you to diverge from a basic RFC 2616 compliant cache and now allows you to cache any request that is acceptable according to a custom Guzzle\Plugin\Cache\CanCacheInterface object. After you know whether or not an object should be cached, you can create a custom cache key by implementing a custom Guzzle\Plugin\Cache\CacheKeyProviderInterface. Custom cache key providers are useful when caching web service requests that might include things like signatures and constantly changing date-based headers.

BackoffPlugin

The BackoffPlugin now replaces the old Guzzle\Http\Plugin\ExponentialBackoffPlugin. The BackoffPlugin can be used to implement any sort of retry logic while still utilizing common retry strategies provided by Guzzle. If you still want to use a basic exponential backoff plugin, just use the static factory method, getExponentialBackoff():

<?php
use Guzzle\Http\Client;
use Guzzle\Plugin\Backoff\BackoffPlugin;

$client = new Client('http://www.test.com/');
// Use a static factory method to get a backoff plugin using the exponential backoff strategy
$backoffPlugin = BackoffPlugin::getExponentialBackoff();

// Add the backoff plugin to the client object
$client->addSubscriber($backoffPlugin);

Actually, the getExponentialBackoff() method is a good demonstration of how to create a custom retry strategy. Here’s a modified version that makes for a good example:

<?php

return new BackoffPlugin(new HttpBackoffStrategy($httpCodes,
    new TruncatedBackoffStrategy($maxRetries,
        new CurlBackoffStrategy($curlCodes,
            new ExponentialBackoffStrategy()
        )
    )
));
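To show how a nested chain like this can reach a decision, here is a self-contained sketch. The Sketch* classes are hypothetical stand-ins written for this example, not Guzzle's classes, and the rule that null means "ask the next strategy" is an assumption of the sketch. The cURL error code link is left out to keep it short.

```php
<?php

// Hypothetical stand-ins that mimic the nesting above; NOT Guzzle's classes.
abstract class SketchBackoffStrategy
{
    protected $next;

    public function __construct($next = null)
    {
        $this->next = $next;
    }

    // Returns a delay in seconds, or false when the request should not be retried
    public function getDelay($retries, $statusCode)
    {
        $decision = $this->decide($retries, $statusCode);
        if ($decision === null) {
            // This link has no objection, so ask the next link in the chain
            return $this->next ? $this->next->getDelay($retries, $statusCode) : false;
        }

        return $decision;
    }

    // Return null to defer, false to stop retrying, or an integer delay
    abstract protected function decide($retries, $statusCode);
}

class SketchHttpCodes extends SketchBackoffStrategy
{
    private $codes;

    public function __construct(array $codes, $next = null)
    {
        parent::__construct($next);
        $this->codes = $codes;
    }

    protected function decide($retries, $statusCode)
    {
        // Only retry the listed status codes
        return in_array($statusCode, $this->codes, true) ? null : false;
    }
}

class SketchTruncated extends SketchBackoffStrategy
{
    private $max;

    public function __construct($max, $next = null)
    {
        parent::__construct($next);
        $this->max = $max;
    }

    protected function decide($retries, $statusCode)
    {
        // Give up once the retry budget is used
        return $retries < $this->max ? null : false;
    }
}

class SketchExponential extends SketchBackoffStrategy
{
    protected function decide($retries, $statusCode)
    {
        // 1, 2, 4, 8, ... seconds
        return (int) pow(2, $retries);
    }
}

$strategy = new SketchHttpCodes(array(500, 503),
    new SketchTruncated(3, new SketchExponential())
);

var_dump($strategy->getDelay(2, 503)); // retryable status, third attempt
var_dump($strategy->getDelay(0, 404)); // status code is not in the retry list
```

Each outer strategy acts as a filter, and only requests that pass every filter reach the innermost strategy that picks the delay.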

LogPlugin

The LogPlugin was simplified a bit and now relies on a Guzzle\Log\MessageFormatter to format log messages. The MessageFormatter object uses a variable substitution template that allows for very customized log messages. You can find a complete list of variables in the LogPlugin’s documentation.

Use the getDebugPlugin() static factory method of the LogPlugin if you want to attach a plugin to a client that sends wire logs to STDERR. This plugin uses the following template to write the HTTP request, response, and any cURL errors to STDERR:

# Request:
{request}

# Response:
{response}

# Errors: {curl_code} {curl_error}
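If variable substitution templates are new to you, the idea can be sketched in a few lines of plain PHP. The renderLogTemplate() function below is a hypothetical helper written for this example; it is not how Guzzle's MessageFormatter is implemented, and the sample values are invented.

```php
<?php

// Hypothetical helper: replaces {name} placeholders with supplied values.
// Placeholders without a matching value are left untouched.
function renderLogTemplate($template, array $values)
{
    $replacements = array();
    foreach ($values as $name => $value) {
        $replacements['{' . $name . '}'] = (string) $value;
    }

    // strtr() swaps every known placeholder in a single pass
    return strtr($template, $replacements);
}

$template = "# Request:\n{request}\n\n# Response:\n{response}\n\n# Errors: {curl_code} {curl_error}";

$message = renderLogTemplate($template, array(
    'request'    => 'GET / HTTP/1.1',
    'response'   => 'HTTP/1.1 200 OK',
    'curl_code'  => 0,
    'curl_error' => '',
));

echo $message, PHP_EOL;
```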

Migrating from Guzzle 2.0

With Guzzle 3.0 comes a number of breaking changes. I plan on releasing a fairly in-depth guide that will describe the process of migrating from Guzzle 2.0 to 3.0. In the meantime, the following bash script should help get you started. Run this script on your project’s src/ directory to rename old class paths to the new modular structure (this is probably missing some easy ones – let me know in the comments!):

#!/bin/bash
# Note: `sed -i ''` is the BSD/macOS form; with GNU sed, use `sed -i` without the empty argument.
find ./ -type f -exec sed -i '' 's/Guzzle\\Common\\Cache/Guzzle\\Cache/g' {} \;
find ./ -type f -exec sed -i '' 's/Guzzle.Common.Cache/Guzzle.Cache/g' {} \;
find ./ -type f -exec sed -i '' 's/Guzzle\\Common\\Log/Guzzle\\Log/g' {} \;
find ./ -type f -exec sed -i '' 's/Guzzle\\Common\\Inflection/Guzzle\\Inflection/g' {} \;
find ./ -type f -exec sed -i '' 's/Guzzle\\Common\\Validation/Guzzle\\Validation/g' {} \;
find ./ -type f -exec sed -i '' 's/Guzzle\\Common\\Batch/Guzzle\\Batch/g' {} \;
find ./ -type f -exec sed -i '' 's/Guzzle\\Common\\Exception\\BatchTransferException/Guzzle\\Batch\\Exception\\BatchTransferException/g' {} \;
find ./ -type f -exec sed -i '' 's/Guzzle\\Http\\Plugin\\AsyncPlugin/Guzzle\\Plugin\\Async\\AsyncPlugin/g' {} \;
find ./ -type f -exec sed -i '' 's/Guzzle\\Http\\Plugin\\CachePlugin/Guzzle\\Plugin\\Cache\\CachePlugin/g' {} \;
find ./ -type f -exec sed -i '' 's/Guzzle\\Http\\Plugin\\CookiePlugin/Guzzle\\Plugin\\Cookie\\CookiePlugin/g' {} \;
find ./ -type f -exec sed -i '' 's/Guzzle\\Http\\Plugin\\CurlAuthPlugin/Guzzle\\Plugin\\CurlAuth\\CurlAuthPlugin/g' {} \;
find ./ -type f -exec sed -i '' 's/Guzzle\\Http\\Plugin\\ExponentialBackoffLogger/Guzzle\\Plugin\\Backoff\\BackoffLogger/g' {} \;
find ./ -type f -exec sed -i '' 's/Guzzle\\Http\\Plugin\\ExponentialBackoffPlugin/Guzzle\\Plugin\\Backoff\\BackoffPlugin/g' {} \;
find ./ -type f -exec sed -i '' 's/Guzzle\\Http\\Plugin\\HistoryPlugin/Guzzle\\Plugin\\History\\HistoryPlugin/g' {} \;
find ./ -type f -exec sed -i '' 's/Guzzle\\Http\\Plugin\\LogPlugin/Guzzle\\Plugin\\Log\\LogPlugin/g' {} \;
find ./ -type f -exec sed -i '' 's/Guzzle\\Http\\Plugin\\Md5ValidatorPlugin/Guzzle\\Plugin\\Md5\\Md5ValidatorPlugin/g' {} \;
find ./ -type f -exec sed -i '' 's/Guzzle\\Http\\Plugin\\MockPlugin/Guzzle\\Plugin\\Mock\\MockPlugin/g' {} \;
find ./ -type f -exec sed -i '' 's/Guzzle\\Http\\Plugin\\OauthPlugin/Guzzle\\Plugin\\Oauth\\OauthPlugin/g' {} \;

Apologies for the breaking changes, but they were necessary to ensure the future of the project. You should expect the API going forward to be much more stable.

That about wraps it up for 3.0! Let me know what you think in the comments.

Cron Expression Parsing in PHP

As a PHP developer, I’ve often been faced with the task of ensuring something happens on a recurring schedule or determining the next date in time an event will occur. At my previous job, we needed to run scheduled Gearman jobs on a recurring basis. We chose to use cron as the serialization format of our schedules, and implemented a database driven system for storing these schedules. Storing the cron schedules for these recurring jobs in the database allowed us to have an easy to maintain and durable data store for our recurring jobs, and it allowed us to deploy a very simple crontab to our job servers. The crontab contained a single cron job that ran a Gearman job every minute that checked if any of the recurring cron schedules matched the current time. When a schedule matched the current time, the job would run. This proved to be a very easy to maintain solution and allowed us to easily enable and disable recurring jobs if we were in a maintenance window, a job was failing, or if a job was producing erroneous results.

When faced with the task of creating the cron expression parsing part of this system, I searched high and low for an existing implementation in PHP that implemented the full feature set of a modern cron expression. Based on the context of this article, you probably guessed that I didn’t find one. I posted the original code I came up with to StackOverflow and eventually open sourced the project.

Cron-Expression, a cron expression library for PHP

The PHP cron expression parser I wrote can parse a CRON expression, determine if it is due to run, calculate the next run date of the expression, and calculate the previous run date of the expression. You can calculate dates far into the future or past by skipping n number of matching dates.

The parser can handle increments of ranges (e.g. */12, 2-59/3), intervals (e.g. 0-9), lists (e.g. 1,2,3), W to find the nearest weekday for a given day of the month, L to find the last day of the month, L to find the last given weekday of a month, and hash (#) to find the nth weekday of a given month.
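To give a feel for what "increments of ranges" means, here is a small sketch that expands a field such as */12 or 2-59/3 into the list of values it matches. The expandStepField() function is a hypothetical helper for illustration and is not part of the cron-expression library; it handles only `*`, `a`, and `a-b`, each with an optional `/n` step.

```php
<?php

// Hypothetical helper: expands a cron field with an optional step
// into the list of integer values it matches.
function expandStepField($field, $min, $max)
{
    $parts = explode('/', $field, 2);
    $hasStep = isset($parts[1]);
    $step = $hasStep ? (int) $parts[1] : 1;
    if ($step < 1) {
        throw new InvalidArgumentException('Step must be a positive integer');
    }

    if ($parts[0] === '*') {
        $start = $min;
        $end = $max;
    } else {
        $bounds = explode('-', $parts[0], 2);
        $start = (int) $bounds[0];
        if (isset($bounds[1])) {
            $end = (int) $bounds[1];
        } else {
            // "5/15" runs from 5 to the field maximum; a bare "5" is a single value
            $end = $hasStep ? $max : $start;
        }
    }

    if ($start < $min || $end > $max || $start > $end) {
        throw new InvalidArgumentException('Range is outside the allowed bounds');
    }

    return range($start, $end, $step);
}

// Minute field examples
print_r(expandStepField('*/12', 0, 59));
print_r(expandStepField('2-59/3', 0, 59));
```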

You can clone the cron-expression library from the github page, install it with composer, or simply download the phar file and include it in your scripts.

Brief introduction to cron expressions

Cron utilizes cron expressions for representing recurring schedules. Cron expressions are made up of several fields, and each field represents a measurement of time. The fields in a cron expression are as follows: minute, hour, day of month, month, day of week, and an optional year. Here’s an example cron expression that runs every minute, and below the expression are the positional fields.

*    *    *    *    *    *
-    -    -    -    -    -
|    |    |    |    |    |
|    |    |    |    |    + year [optional]
|    |    |    |    +----- day of week (0 - 7) (Sunday=0 or 7)
|    |    |    +---------- month (1 - 12)
|    |    +--------------- day of month (1 - 31)
|    +-------------------- hour (0 - 23)
+------------------------- min (0 - 59)

There are several special characters that modify the schedule of a cron expression, and some modifiers behave differently in different fields. You can find a list of all of the available special characters on cron’s Wikipedia page.
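The positional layout above can be sketched as a small function that splits an expression into named fields. The splitCronFields() name and the array keys are made up for this example and do not come from the cron-expression library; the sketch assumes the five required fields plus the optional year described above.

```php
<?php

// Hypothetical helper: maps the positional fields of a cron expression
// onto descriptive names.
function splitCronFields($expression)
{
    $names = array('minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek', 'year');
    $parts = preg_split('/\s+/', trim($expression));

    if (count($parts) < 5 || count($parts) > 6) {
        throw new InvalidArgumentException('Expected 5 fields plus an optional year');
    }

    // The year is optional; treat a missing year as "every year"
    if (count($parts) === 5) {
        $parts[] = '*';
    }

    return array_combine($names, $parts);
}

print_r(splitCronFields('*/5 0 1 * 1'));
```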

Cron expression use cases

Let’s say that you’re building a special promotion system into your e-commerce website. You want a special promotion to occur on a schedule. For the sake of this example, let’s say you want the promotion to occur every second Friday of every other month. This cron expression can be represented using 0 0 0 ? 1/2 FRI#2 *.
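To see which dates "the second Friday of every other month" describes, independent of any cron parser, here is a sketch that uses only PHP's built-in DateTime relative formats. The secondFridays() function is a made-up helper, and it assumes that "every other month" starts with January.

```php
<?php

// Hypothetical helper: lists the second Friday of every other month
// (January, March, ...) for a given year, using only DateTime.
function secondFridays($year)
{
    $utc = new DateTimeZone('UTC');
    $months = array('January', 'March', 'May', 'July', 'September', 'November');

    $dates = array();
    foreach ($months as $month) {
        $date = new DateTime("second friday of $month $year", $utc);
        $dates[] = $date->format('Y-m-d');
    }

    return $dates;
}

print_r(secondFridays(2012));
```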

Check whether a cron expression is due

So now that we’ve determined the schedule of our promotion, let’s write a snippet of code to check and see if the promotion should be in effect for the current date. This example assumes that you are using a phar file to include the library.

<?php

require 'cron.phar';

$cron = Cron\CronExpression::factory('0 0 0 ? 1/2 FRI#2 *');

if ($cron->isDue()) {
    // The promotion should be enabled!
}

Awesome! Now we know when the promotion should be enabled. But now our buyers are complaining that they have no idea when the promotion will run next. They suggest that you build an admin page that will show them the next 5 dates that the promotion will run.

Calculate the next X run dates of a cron expression

You can calculate the next run date of a cron expression with the cron-expression library’s getNextRunDate() method:

<?php

require 'cron.phar';

$cron = Cron\CronExpression::factory('0 0 0 ? 1/2 FRI#2 *');

// The getNextRunDate() method returns a \DateTime object
echo $cron->getNextRunDate()->format('Y-m-d H:i:s');

This will show the buyers the next date that the promotion will be enabled. But our buyers want to be able to plan a little in advance, so they need to know the next 5 dates that the promotion will run. You can get multiple next run dates using the getMultipleRunDates() method.

<?php

require 'cron.phar';

$cron = Cron\CronExpression::factory('0 0 0 ? 1/2 FRI#2 *');

foreach ($cron->getMultipleRunDates(5) as $date) {
    echo $date->format('Y-m-d H:i:s') . PHP_EOL;
}

Great! Now we know if the promotion should run on the current date and the next 5 times the promotion will run. But now our buyers are complaining that they need to know the previous dates that the promotion ran so that they can figure out all the fancy number projections they do when determining the sell-through of a product.

Calculate the last run date of a cron expression

You can get the last run date of a cron expression using the getPreviousRunDate() method.

<?php

require 'cron.phar';

$cron = Cron\CronExpression::factory('0 0 0 ? 1/2 FRI#2 *');

// Remember, most methods return a DateTime object
echo $cron->getPreviousRunDate()->format('Y-m-d H:i:s');

Awesome! Our buyers still need to know the last 5 times the promotion ran.

If you want to know the last 5 run dates, you can use the getMultipleRunDates() method and set the $invert argument to true:

<?php

require 'cron.phar';

$cron = Cron\CronExpression::factory('0 0 0 ? 1/2 FRI#2 *');

foreach ($cron->getMultipleRunDates(5, 'now', true) as $date) {
    echo $date->format('Y-m-d H:i:s') . PHP_EOL;
}

That will display a list of 5 previous run dates, each going further back in time.

Now our buyers are asking if the promotion ran or will run on a specific day. Instead of having to field their emails every day and tell them whether or not the promotion ran, you decide to create an admin page so that they can enter a date and the page will tell the buyer if the promotion ran that day.

Check if the cron expression matches a specific date

You can see if a cron expression matches a specific date by calling isDue() with a specific date.

<?php

require 'cron.phar';

$cron = Cron\CronExpression::factory('0 0 0 ? 1/2 FRI#2 *');

if ($cron->isDue('January 5, 2012')) {
    echo 'The cron expression ran on this date :)';
} else {
    echo 'The cron expression did not run on this date :(';
}

Celebrate!

You’ve now successfully implemented all of the scheduling needs of your promotion! You can determine whether or not the promotion should be running, buyers can see when the promotion last ran, determine if the promotion will run on a specific date, and they can plan out their buying strategies based on when the promotion will run next.

Conclusion

As you can see, cron expressions are very useful for scheduling events. With the PHP Cron-Expression parser library, PHP developers now have access to the advanced scheduling capabilities of cron.

Chunked Transfer-Encoding in PHP With Guzzle

The problem with Content-Length

HTTP/1.0 requires a client to specify a Content-Length header before sending a request to a server. This means that requests cannot be sent with a dynamically created entity body until the entire length of the entity body is known. A lot of HTTP clients that support HTTP/1.1 still require that the data sent over the wire is sent through a string or includes a Content-Length header. Why is this important? Streaming.

The example

Let’s imagine that you are building a website that allows users to submit an image URL to use as the background on their profile page (this is a very simple example, but it’s the first one I came up with). At a high level, after a URL is submitted, the application will need to download the image from the URL, upload the image to the application’s file server (whether it be FTP, S3, whatever), and then perform any required image processing. You’ll often see this scenario implemented like this:

<?php

$entireImage = file_get_contents($url);
file_put_contents($fileServer, $entireImage);

Success! The above code sample will download the image and then upload the image to the file server. However, there are a couple problems with this:

  • You have to store the entire image in a string which consumes a great deal of memory.
  • The entire image needs to be downloaded before it can be uploaded to the file server.

Chunked Transfer-Encoding to the rescue!

Chunked transfer encoding allows a client or server to begin transmitting a message before the Content-Length of the entity body is known. At a high level, when using chunked transfer encoding, a client sends the content length of a small chunk of the entity body followed by the small chunk. A server would then continue to read from the client request’s entity body until a 0 length chunk is received. You can find out more about how chunked transfer encoding works by reading RFC2616 section 3.6.1.
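The wire format is easier to picture with a small sketch. The chunkEncode() function below is a hypothetical helper that reads a stream in small pieces and frames each piece as a chunk. It returns the framed data as a string purely for demonstration; a real client would write each chunk to the connection as soon as it is read.

```php
<?php

// Hypothetical helper: frames the contents of a stream as a chunked body.
function chunkEncode($stream, $chunkSize = 8192)
{
    $encoded = '';

    while (($data = fread($stream, $chunkSize)) !== false && $data !== '') {
        // Hexadecimal chunk length, CRLF, chunk data, CRLF
        $encoded .= dechex(strlen($data)) . "\r\n" . $data . "\r\n";
    }

    // A zero-length chunk tells the receiver that the body is complete
    return $encoded . "0\r\n\r\n";
}

// An in-memory stream stands in for the image download
$source = fopen('php://memory', 'r+');
fwrite($source, 'Wikipedia');
rewind($source);

$wire = chunkEncode($source, 4);
fclose($source);

echo str_replace("\r\n", "\\r\\n", $wire), PHP_EOL;
```

Notice that the total length never has to be known up front: the sender only needs the length of the piece it is about to send.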

So now that we know about chunked encoding, let’s send a request. Unfortunately, PHP doesn’t have built in support to send a PUT or POST entity body using anything other than a string (see docs). PHP added support for chunked encoding after the 5.3 release, but that doesn’t help much if our data has to be sent from a string that holds the entire entity body in memory.

You can use sockets to send your HTTP request, but you’ll need to account for all of the nuances of HTTP: redirects, 100-Continue responses, parsing and reading responses, etc. It’s kind of fun to write though; I once wrote a simple HTTP/1.1 client using sockets and a finite state machine. But that takes quite a bit of time and testing, and you have an application to write!

Lucky for us, PHP has support for curl, and curl can do anything when it comes to HTTP.

cURL to the rescue!

Let’s implement the above application using a PHP stream to download the image and curl to upload the data to the file server:

<?php

// Open a stream so that we stream the image download
$stream = fopen($url, 'r');

// Create a curl handle to upload to the file server
$ch = curl_init($fileServer);
// Send a PUT request
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'PUT');
// Let curl know that we are sending an entity body
curl_setopt($ch, CURLOPT_UPLOAD, true);
// Let curl know that we are using a chunked transfer encoding
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Transfer-Encoding: chunked'));
// Use a callback to provide curl with data to transmit from the stream
curl_setopt($ch, CURLOPT_READFUNCTION, function($ch, $fd, $length) use ($stream) {
    // fread() returns false on failure; give curl an empty string to signal the end of the data
    return fread($stream, $length) ?: '';
});

curl_exec($ch);

The above code will upload the image to the file server by reading a small amount of the image into a string, transferring the data to the server, then repeating until the entire image has been downloaded and uploaded to the file server. This is accomplished using a PHP stream to download the image and a curl read callback to provide curl with a custom data source for the entity body of the PUT request.
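The read callback contract can be seen in isolation without touching the network. The sketch below is only an illustration: `createReadCallback()` is a made-up helper, a `php://memory` stream stands in for the remote image, and a plain loop plays the role of curl asking for data.

```php
<?php

/**
 * Build a callback with the signature curl expects for CURLOPT_READFUNCTION.
 */
function createReadCallback($stream)
{
    return function ($ch, $fd, $length) use ($stream) {
        $data = fread($stream, $length);
        // fread() returns false on failure; curl expects a string
        return $data === false ? '' : $data;
    };
}

// Stand-in for the remote image: 20 bytes held in memory
$source = fopen('php://memory', 'r+');
fwrite($source, str_repeat('x', 20));
rewind($source);

$callback = createReadCallback($source);

// Simulate curl asking for up to 8 bytes at a time until the
// callback returns an empty string (end of the entity body)
$chunkSizes = array();
while (($chunk = $callback(null, null, 8)) !== '') {
    $chunkSizes[] = strlen($chunk);
}
// $chunkSizes is now array(8, 8, 4)
```

At no point does more than one small chunk of the body live in a PHP string.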

The curl example above is a bit verbose. If you need to send HTTP requests in other parts of your application, then it might be a good idea to use a layer on top of curl to make it easier to get curl to do your bidding.

Guzzle to the rescue!

Guzzle is an HTTP client for PHP that makes this exercise extremely easy.

<?php

$client = new Guzzle\Http\Client($fileServer);
$client->put('/', null, fopen($url, 'r'))->send();

The above code will do exactly the same thing as our curl example, except it does it in just two lines of code.

Guzzle always uses streams

Straight from the Guzzle documentation:

Entity body is the term used for the body of an HTTP message. The entity body of requests and responses is inherently a PHP stream in Guzzle. The body of the request can be either a string or a PHP stream which are converted into a Guzzle\Http\EntityBody object using its factory method. When using a string, the entity body is stored in a temp PHP stream. The use of temp PHP streams helps to protect your application from running out of memory when sending or receiving large entity bodies in your messages. When more than 2MB of data is stored in a temp stream, it automatically stores the data on disk rather than in memory.
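The spill-to-disk behavior described here comes from PHP's own `php://temp` stream, whose default in-memory limit is 2MB. A minimal sketch using plain PHP (no Guzzle involved) with the limit spelled out explicitly:

```php
<?php

// Keep at most 2MB in memory; anything beyond that is moved to a temporary file
$temp = fopen('php://temp/maxmemory:' . (2 * 1024 * 1024), 'r+');

// Write 3MB, which pushes the stream past the in-memory limit
fwrite($temp, str_repeat('a', 3 * 1024 * 1024));
rewind($temp);

// The stream is read the same way whether it lives in memory or on disk
$firstBytes = fread($temp, 5);
$stats = fstat($temp);
$size = $stats['size'];

fclose($temp);
```

The calling code never has to care where the bytes are stored, which is what makes temp streams a convenient container for entity bodies of unknown size.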

Guzzle will automatically detect if the Content-Length of a stream can be easily determined based on the stream wrapper. If the stream is a local stream (file, php://temp, etc) and seekable, then Guzzle will always send a Content-Length with the request. If a stream is not seekable or is a remote stream (FTP, HTTP, sockets), then Guzzle will not attempt to determine the Content-Length and will transfer the request using Transfer-Encoding: chunked.
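The decision described above could be approximated with PHP's stream functions as follows. This is a rough sketch, not Guzzle's actual implementation; `determineContentLength()` and `chooseBodyHeader()` are hypothetical names.

```php
<?php

/**
 * Return the size of a stream in bytes when it can be determined cheaply,
 * or null when the body should be sent with Transfer-Encoding: chunked.
 */
function determineContentLength($stream)
{
    $meta = stream_get_meta_data($stream);

    // Remote or non-seekable streams: the size is not known up front
    if (!$meta['seekable'] || !stream_is_local($stream)) {
        return null;
    }

    $stats = fstat($stream);

    return isset($stats['size']) ? $stats['size'] : null;
}

/**
 * Pick the header that describes how the body will be delimited.
 */
function chooseBodyHeader($stream)
{
    $length = determineContentLength($stream);

    return $length === null
        ? 'Transfer-Encoding: chunked'
        : 'Content-Length: ' . $length;
}

$local = fopen('php://temp', 'r+');
fwrite($local, 'hello');
$header = chooseBodyHeader($local);
// $header is 'Content-Length: 5'
```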

Content-Length is sometimes required

If you are interacting with a web service that requires a Content-Length header, then you will need to determine the Content-Length of the remote resource and explicitly set the Content-Length header in your request. Alternatively, you can download the remote resource to a temp stream, have Guzzle automatically determine the Content-Length, do any sort of required validation, then upload the image to the file server:

<?php

$client = new Guzzle\Http\Client();

// Download the image to a PHP temp stream
$imageResponse = $client->get($url)->send();
// Be sure to seek to the beginning of the entity body
$imageResponse->getBody()->seek(0);

// Upload the image to the file server using the previously
// downloaded image and the Content-Length of the image
$uploadRequest = $client->put($fileServer, null, $imageResponse->getBody());
$uploadResponse = $uploadRequest->send();

A better way of going about this might be to send a HEAD request to get the Content-Length of the remote resource, and then send a PUT request using a PHP stream and the known Content-Length:

<?php

$client = new Guzzle\Http\Client();

// Get information about the image without downloading the entity body
$imageInfo = $client->head($url)->send();
// Create an entity body using a stream and the Content-Length header from the HEAD response
$body = Guzzle\Http\EntityBody::factory(fopen($url, 'r'), $imageInfo->getContentLength());

$uploadRequest = $client->put($fileServer, null, $body);
$uploadResponse = $uploadRequest->send();

After retrieving the image information using a HEAD request, this example will perform the following steps:

  1. Open a connection to the file server.
  2. Send the headers of the data you will be uploading using an HTTP PUT request.
  3. Open a connection to the image URL using a PHP stream.
  4. Initiate a GET request.
  5. Curl will read small chunks from the image stream while simultaneously writing the chunks to the file server until the entire image stream has been read.

Start using Guzzle

That wraps up our look at chunked transfer encoding. You can find out more about Guzzle at http://guzzlephp.org.

What’s New in Guzzle 2.1

There were some major improvements added to Guzzle in the last week. Guzzle is now more flexible, easier to use, and more powerful than ever. Here’s a list of the major features introduced in the 2.x series:

  • Guzzle now uses the Symfony2 EventDispatcher component
  • Guzzle now uses the Symfony2 Validator component
  • Persistent connections are now shared between single requests and requests sent in parallel
  • Added an OAuth 1.0 plugin
  • Uses Composer for dependency management
  • Added a BatchQueuePlugin for sending a large number of requests in parallel
  • It’s now easier to build HTTP requests dynamically and still implement complex response parsing (extend Guzzle\Service\Command\DynamicCommand and override the process() method)
  • Dynamically generated HTTP requests now support parameter filters
  • application/json responses are automatically converted into arrays for command results
  • Service descriptions can now be written in JSON and support including other files
  • Can use Zend Framework 1.0 or 2.0 cache adapters

What is Guzzle?

Guzzle is a PHP HTTP client and framework for building RESTful web service clients. Guzzle allows you to truly reap the benefits of the HTTP/1.1 spec by providing managed persistent connections and the ability to easily send requests in parallel. In addition to taking the pain out of HTTP, Guzzle provides a lightweight framework for creating web service clients. With Guzzle’s built in error handling, OAuth support, and dynamically generated HTTP requests, building your next web service client on top of Guzzle will save you a ton of time.

Symfony2 EventDispatcher

The event system is the foundation of Guzzle’s flexibility. Using the Symfony2 EventDispatcher ensures that a well-tested and broadly adopted event framework powers the most critical aspect of Guzzle. Implementing the Symfony2 EventDispatcher helps to make the intent of event subscribers more explicit; instead of a single callback receiving all events dispatched from a subject, callbacks are registered individually for each event they subscribe to.
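The difference between one catch-all callback and per-event registration can be shown with a tiny stand-in dispatcher. `TinyDispatcher` below is a hypothetical class written for this illustration; it is not Symfony2's EventDispatcher, and the event names other than request.error are only examples.

```php
<?php

/**
 * Tiny stand-in dispatcher (not Symfony's class) that keeps
 * a separate list of listeners for each event name.
 */
class TinyDispatcher
{
    private $listeners = array();

    public function addListener($eventName, $listener)
    {
        $this->listeners[$eventName][] = $listener;
    }

    public function dispatch($eventName, array $context = array())
    {
        if (empty($this->listeners[$eventName])) {
            return;
        }

        foreach ($this->listeners[$eventName] as $listener) {
            call_user_func($listener, $context);
        }
    }
}

$log = array();
$dispatcher = new TinyDispatcher();

// Each callback is registered only for the event it cares about
$dispatcher->addListener('request.error', function (array $context) use (&$log) {
    $log[] = 'error: ' . $context['status'];
});
$dispatcher->addListener('request.complete', function (array $context) use (&$log) {
    $log[] = 'complete: ' . $context['status'];
});

$dispatcher->dispatch('request.error', array('status' => 401));
$dispatcher->dispatch('request.sent', array('status' => 200)); // no listeners, nothing runs
// $log is now array('error: 401')
```

Because listeners are keyed by event name, a callback never has to inspect an event just to discover that it should ignore it.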

Listening to events

All classes in Guzzle that emit events implement the Guzzle\Common\HasDispatcherInterface. Any object that implements this interface has a getEventDispatcher() method to retrieve the EventDispatcher for that object.

In the following example, we are transparently retrying all 401 responses with an updated authorization token:

<?php

use Guzzle\Common\Event;

$client = new Guzzle\Http\Client('http://www.example.com/api/v1');

// Add custom error handling to any request created by this client
$client->getEventDispatcher()->addListener('request.error', function(Event $event) {

    if ($event['response']->getStatusCode() == 401) {

        $newRequest = clone $event['request'];
        $newRequest->setHeader('X-Auth-Header', MyApplication::getNewAuthToken());
        $newResponse = $newRequest->send();

        // Set the response object of the request without firing more events
        $event['response'] = $newResponse;

        // You can also change the response and fire the normal chain of
        // events by calling:
        // $event['request']->setResponse($newResponse);

        // Stop other events from firing when you override 401 responses
        $event->stopPropagation();
    }
});

$response = $client->get('restricted-resource.json')->send();
echo $response;

Building plugins

Guzzle ships with quite a few plugins out of the box: over-the-wire logging, caching forward proxy, truncated exponential backoff, OAuth, cookies, MD5 hash validation, mock response queue, history, basic authorization, and batch queue.

If you need to extend Guzzle to support your web service, you can create a Symfony2 event subscriber. You’ll need to subscribe to specific events in the request cycle to extend Guzzle’s behavior.

Let’s say you’re building a client for a web service that requires a custom authorization header for every request. This fictional authorization plugin could be implemented like so:

<?php

namespace Guzzle\Foo;

use Guzzle\Common\Event;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

class FooAuthPlugin implements EventSubscriberInterface
{
    private $secret;

    public function __construct($secret)
    {
        $this->secret = $secret;
    }

    public static function getSubscribedEvents()
    {
        return array('client.create_request' => 'onRequestCreate');
    }

    public function onRequestCreate(Event $event)
    {
        $request = $event['request'];

        $timestamp = time();
        $signature = hash_hmac('sha1', $request->getMethod() . '&'
            . rawurlencode($request->getResourceUri()) . '&' . $timestamp, $this->secret);

        $request->setHeader('Authorization', "FOO signature=\"{$signature}\", timestamp=\"{$timestamp}\"");
    }
}
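The signing step of this fictional scheme can be pulled out into a plain function so that it is easy to test without a request object. `buildFooAuthorization()` is a hypothetical helper; taking the timestamp as an argument (instead of calling time() inside) is an assumption made here to keep the output deterministic.

```php
<?php

/**
 * Build the fictional "FOO" Authorization header value from the
 * HTTP method, the resource URI, a timestamp, and a shared secret.
 */
function buildFooAuthorization($method, $resourceUri, $timestamp, $secret)
{
    $stringToSign = $method . '&' . rawurlencode($resourceUri) . '&' . $timestamp;
    $signature = hash_hmac('sha1', $stringToSign, $secret);

    return "FOO signature=\"{$signature}\", timestamp=\"{$timestamp}\"";
}

$header = buildFooAuthorization('GET', '/users?page=1', 1325376000, 's3cret');
```

The same inputs always produce the same header, while changing the secret, method, URI, or timestamp changes the signature.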

Symfony2 Validator

Validating user data is a problem that has been solved by many other developers. Guzzle adopted the Symfony2 Validator component to leverage Symfony2’s robust validation system.

Guzzle uses the Symfony2 Validator component when executing web service commands. Web service commands in Guzzle are basically collections of parameters that are turned into HTTP requests. You can enforce that a parameter is of a certain type before sending an HTTP request by utilizing a type attribute.

After generating a project skeleton and creating a client, you can start creating commands. The following example shows how you might create a command to create a new user.

<?php

namespace Guzzle\Foo\Command;

/**
 * Create a new user for the Foo web service
 *
 * @guzzle username   type="regex:/^[a-zA-Z0-9]{3,10}$/" required="true" doc="Username"
 * @guzzle password   type="string" required="true" min_length="6" doc="Password"
 * @guzzle newsletter type="boolean" default="true" doc="Is the user subscribed to the newsletter?"
 */
class CreateUser extends \Guzzle\Service\Command\AbstractCommand
{
    protected function build()
    {
        $this->request = $this->client->post('/users');
        $this->request->setHeader('Accept', 'application/json');
        $this->request->setBody(json_encode(array(
            'username' => $this->get('username'),
            'password' => $this->get('password'),
            'newsletter' => (bool) $this->get('newsletter')
        )), 'application/json');
    }
}

Guzzle uses DocBlock annotations to make creating commands easier. A number of default types are registered with the Guzzle\Service\Inspector to ensure that values passed into a command validate with the associated Symfony2 validator constraints. You can implement your own Symfony\Component\Validator\Constraint class to add custom validation to your web service client.
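To give a feel for what the type, required, default, and min_length attributes mean, here is a rough standalone sketch of that kind of check. It does not use the Symfony2 Validator or Guzzle's Inspector; `validateParams()` and the rule array layout are assumptions made purely for illustration.

```php
<?php

/**
 * Check input against rules that mirror the annotation attributes.
 * Returns a list of error messages (empty when everything is valid).
 */
function validateParams(array $rules, array $input)
{
    $errors = array();

    foreach ($rules as $name => $rule) {
        if (isset($input[$name])) {
            $value = $input[$name];
        } elseif (isset($rule['default'])) {
            $value = $rule['default'];
        } else {
            if (!empty($rule['required'])) {
                $errors[] = "{$name} is required";
            }
            continue;
        }

        $type = isset($rule['type']) ? $rule['type'] : '';

        if ($type === 'string' && !is_string($value)) {
            $errors[] = "{$name} must be a string";
        } elseif ($type === 'boolean' && !is_bool($value)) {
            $errors[] = "{$name} must be a boolean";
        } elseif (strpos($type, 'regex:') === 0
            && (!is_string($value) || !preg_match(substr($type, 6), $value))
        ) {
            $errors[] = "{$name} has an invalid format";
        }

        if (isset($rule['min_length']) && is_string($value)
            && strlen($value) < $rule['min_length']
        ) {
            $errors[] = "{$name} must be at least {$rule['min_length']} characters";
        }
    }

    return $errors;
}

$rules = array(
    'username'   => array('type' => 'regex:/^[a-zA-Z0-9]{3,10}$/', 'required' => true),
    'password'   => array('type' => 'string', 'required' => true, 'min_length' => 6),
    'newsletter' => array('type' => 'boolean', 'default' => true),
);

$errors = validateParams($rules, array('username' => 'michael', 'password' => 'foobar'));
// $errors is an empty array
```

Running the checks before the request is built means a bad parameter fails fast, without a round trip to the web service.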

Persistent connections

Persistent HTTP connections are an extremely important aspect of the HTTP/1.1 protocol that is often overlooked by PHP HTTP clients. Persistent connections allow data to be transferred between a client and server without the need to reconnect each time a subsequent request is sent, providing a significant performance boost to applications that need to send many HTTP requests to the same host. Guzzle implicitly manages persistent connections for all requests.

All HTTP requests sent through Guzzle are sent using the same cURL multi handle. cURL will maintain a cache of persistent connections on a multi handle. As long as you do not override the default Guzzle\Http\Curl\CurlMulti object in your clients, you will benefit from application-wide persistent connections. More information about cURL’s internal design and persistent connection handling can be found at http://curl.haxx.se/dev/internals.html.

OAuth plugin

Guzzle now supports OAuth out of the box. Quit worrying about signing OAuth requests and start building your web service client with Guzzle!

<?php

$client = new Guzzle\Http\Client('http://api.twitter.com/1');
$oauth = new Guzzle\Http\Plugin\OauthPlugin(array(
    'consumer_key'    => 'my_key',
    'consumer_secret' => 'my_secret',
    'token'           => 'my_token',
    'token_secret'    => 'my_token_secret'
));
$client->getEventDispatcher()->addSubscriber($oauth);

$response = $client->get('statuses/public_timeline.json')->send();
var_export(json_decode((string) $response->getBody()));

Composer for dependency management

Guzzle has switched from git submodules to Composer for dependency management. You can add Guzzle to your Composer-enabled project by adding the following to your composer.json file:

{
    "require": {
        "guzzle/guzzle": "*"
    }
}

You then just call php composer.phar install, and you’re done! You can learn more about installing Guzzle using Composer, PHAR, PEAR, or GIT by reading the installation instructions.

Easier to dynamically generate HTTP requests

You can leverage Guzzle’s dynamically generated HTTP requests if the web service you are interacting with has a ton of very similar commands. All you need to do is create a Guzzle service description that describes the different commands supported by the web service.

You can implement the previously created CreateUser command using a JSON service description:

{
    "commands": {
        // Define an abstract command and extend it for each command
        "abstract": {
            // You would need to create this default command that would convert
            // all of the JSON parameters into a JSON entity body for requests
            "class": "Guzzle\\Foo\\JsonFooCommand",
            "params": {
                "headers": {
                    "Accept": "application/json"
                }
            }
        },
        "create_user": {
            // Extend the abstract command defined above
            "extends": "abstract",
            // Use a path relative to the base URL of the client
            "path": "users",
            "method": "POST",
            "params": {
                "username": {
                    "type": "regex:/^[a-zA-Z0-9]{3,10}$/",
                    "required": true,
                    "filter": "trim",
                    "location": "json"
                },
                "password": {
                    "type": "string",
                    "required": true,
                    "min_length": 6,
                    "location": "json"
                },
                "newsletter": {
                    "type": "boolean",
                    "default": true,
                    "location": "json"
                }
            }
        }
    }
}

The above JSON describes the commands that can be executed on the Foo web service. This JSON is roughly equivalent to the concrete command class we created earlier. I threw in a filter parameter that lets you pass a parameter’s input through a series of functions, each of which accepts a variable and returns the filtered variable. In the above example, we are running anything entered into the username parameter through the trim() function. You can specify multiple filters by separating them with a comma.
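A comma-separated filter list boils down to calling each named function in order. The sketch below shows the idea in plain PHP; `applyFilters()` is a hypothetical helper and not how Guzzle implements filters internally.

```php
<?php

/**
 * Run a value through a comma-separated list of function names,
 * feeding the output of each filter into the next one.
 */
function applyFilters($value, $filters)
{
    foreach (explode(',', $filters) as $filter) {
        $filter = trim($filter);

        if ($filter === '') {
            continue;
        }

        if (!is_callable($filter)) {
            throw new InvalidArgumentException("Unknown filter: {$filter}");
        }

        $value = call_user_func($filter, $value);
    }

    return $value;
}

$username = applyFilters('  Michael ', 'trim,strtolower');
// $username is 'michael'
```

Filters run left to right, so the order in the list matters when one filter depends on the output of another.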

Assuming you saved the above JSON as foo.json, you can send validated HTTP requests to the web service by attaching a service description to your client:

<?php

use Guzzle\Service\Client;
use Guzzle\Service\Description\JsonDescriptionBuilder;

$description = JsonDescriptionBuilder::build('foo.json');
$client = new Client('http://foo.com/api/v1');
$client->setDescription($description);

$command = $client->getCommand('create_user', array(
    'username' => 'michael',
    'password' => 'foobar'
));

$jsonResult = $client->execute($command);
$request = $command->getRequest();
$response = $command->getResponse();

Going forward

The release of Guzzle 2.0 is a huge milestone for the project. As the project continues to mature, we hope to make it even easier to build web service clients. Have a suggestion on how we can make Guzzle better? Submit an issue to Guzzle on GitHub.