20 Years of PHP

Ben Ramsey shared how he got started with PHP and had the great idea of asking others to write about their stories and tagging it as #20yearsofphp. This is my story.

When I graduated from college in 2000, I began looking for a job without a clear idea of what I wanted to do. In grad school I had done some projects using HTML, ASP, and ShockWave for various professors and figured I could get a job building web sites until I decided on something. I replied to a job posting (I think it was on hotjobs.com) and in September 2000 I started working as a web developer at Forum One. Thanks to that job, I spent a week working in San Francisco after meeting my (future) wife on a previous trip to California. We’d get married in 2004.

At the time, PHP 4 had just been released. I worked on projects which still used PHP 3, or interfaced via Perl CGI scripts to save data in a custom-built in-house CMS. I think my first actual PHP project was for a local Jewish Temple. Like other junior devs at that job, I took a shot at replacing the Perl scripts with my own PHP versions. Luckily, I never inflicted them on my colleagues.

From there, PHP was a gateway to learning about Linux, web servers, databases & SQL, and so much more. Thanks to PHP (and Drupal) I worked for my favorite soccer team, D.C. United. Today I’m grateful that, through running php[architect], I get to work not only with Eli, Kevin, Sandy, and Heather on a daily basis but also with the wider PHP community through php[architect]’s magazine, books, and conferences.

I don’t think I could have planned the last 15 years better. Here’s to the next 20!


Extract images from an HTML snippet

The function here will take an HTML fragment and return an array of useful images it finds.

<?php
/**
 * Extract useful images from an HTML fragment.
 *
 * @param string $text HTML fragment to scan for images
 * @return array|bool Array of image attributes, or false if none found
 */
function extractImages($text)
{
    // prepend a meta tag so DOMDocument treats the fragment as UTF-8
    $header = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />';
    $text = $header . $text;
    $dom = new DOMDocument();
    // suppress warnings from malformed HTML
    if (@$dom->loadHTML($text)) {
        $xpath = new DOMXpath($dom);
        if ($images = $xpath->evaluate("//img")) {
            $result = array();
            foreach ($images as $i => $img) {
                $ht = $img->getAttribute('height');
                $wd = $img->getAttribute('width');
                // a 1x1 image is a tracking pixel, ignore it
                if (1 === (int)$ht && 1 === (int)$wd) {
                    continue;
                }
                // ignore anything that doesn't end in an image file extension
                $src = $img->getAttribute('src');
                if (!preg_match('/\.(png|jpg|gif)$/i', $src)) {
                    continue;
                }
                // skip relative URLs; resolving them would require the base URL
                if (!preg_match('~^https?://~', $src)) {
                    continue;
                }
                $alt = $img->getAttribute('alt');
                $result[$i] = array('src' => $src, 'alt' => $alt, 'height' => $ht, 'width' => $wd);
            }
            if (!empty($result)) {
                return $result;
            }
        }
    }
    return false;
}
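
For instance, feeding it a fragment that mixes a tracking pixel with a real image returns only the useful one (the markup below is invented for illustration):

$html = '<p><img src="http://example.com/pixel.gif" height="1" width="1" />'
      . '<img src="http://example.com/photo.jpg" alt="A photo" height="200" width="300" /></p>';

$images = extractImages($html);
// array(1 => array('src' => 'http://example.com/photo.jpg', 'alt' => 'A photo',
//                  'height' => '200', 'width' => '300'))
var_dump($images);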

Smelly PHP code

Adam Culp posted the 3rd article in his Clean Development Series this week, Dirty Code (how to spot/smell it). When you read it, you should keep in mind that he is pointing out practices that correlate with poorly written code, not prescribing a list of things to avoid. It’s a good list of things to look for, and it engendered quite a discussion in our internal Musketeers IRC.

Comments are valuable

Using good names for variables, functions, and methods does make your code self-commenting, but oftentimes that is not sufficient. Writing good comments is an art: too many comments get in the way, but a lack of comments is just as bad. Code can be dense to parse, and a comment will help you out. Comments also let you quickly scan through a longer code block, just skimming the comments, to find EXACTLY the bit you need to change/debug/fix/etc. Of course, the latter you can also get by breaking up large blocks of code into functions.

Comments should not explain what the code does; they should capture the “why” of how you are solving a problem. For example, if you’re looping over something, a bad comment is “// loop through results” and a good comment is “// loop through results and extract any image tags”.

Using Switch Statements

You definitely should not take this item in his list to mean that “switch statements are evil.” You could have equally bad code with a long block of if/elseif statements. If you’re using them within a class, you’re better off using polymorphism, as he suggests, or maybe look at coding to an interface instead of coding around multiple implementations; a minimal sketch follows.
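
As a hedged illustration (the notifier classes here are invented for this example, not taken from Adam’s article), this is how a switch on a type flag can give way to polymorphism behind an interface:

interface Notifier
{
    public function send($message);
}

class EmailNotifier implements Notifier
{
    public function send($message) { /* deliver via email */ }
}

class SmsNotifier implements Notifier
{
    public function send($message) { /* deliver via SMS */ }
}

// Instead of switch ($type) { case 'email': ...; case 'sms': ...; }
// the caller codes to the interface and never inspects a type flag.
function notify(Notifier $notifier, $message)
{
    $notifier->send($message);
}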

Other code smells

In reviewing the article, I thought of other smells that indicate bad code. Some are minor, but if frequent, you know you’re dealing with someone who knows little more than to copy-and-paste code from the Interwebs. These include:

  • Error suppression with @. There are very, very, very few cases where it’s OK to suppress an error instead of handling the error or preventing it in the first place.
  • Using globals directly. Anything in $_GET, $_POST, $_REQUEST, $_COOKIE should be filtered and validated before you use it. ‘Nuff said.
  • Deep class hierarchy. A deep class hierarchy likely means you should be using composition instead of inheritance to change class behaviors.
  • Lack of Prepared DB Statements. Building SQL queries as strings instead of using PDO or the mysqli extension’s prepared statements can open up SQL injection vulnerabilities (see the sketch after this list).
  • Antiquated PHP Practices. A catch-all for things we all did nearly a decade ago: depending on register_globals being on, using “or die()” to catch errors, and using the mysql_* functions. PHP has evolved; there’s no reason for you not to evolve with it.
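
To make the prepared-statement point concrete, here is a minimal sketch using PDO (the table and input names are invented for the example):

// Vulnerable: user input concatenated straight into the SQL string
// $db->query("SELECT * FROM users WHERE email = '" . $_GET['email'] . "'");

// Safer: validate the input, then let the driver bind it
$email = filter_var($_GET['email'], FILTER_VALIDATE_EMAIL);
$stmt = $db->prepare('SELECT * FROM users WHERE email = ?');
$stmt->execute(array($email));
$user = $stmt->fetch(PDO::FETCH_ASSOC);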

That’s generally what I look for when evaluating code quality. What are some things I missed?

Building CandiData

This past weekend, my colleague and friend Sandy Smith participated in Election Hackathon 2012 (read his take on the hackathon). We built our first public Musketeers.me product, Candidata.me. This was my first hackathon, and it was exciting and exhausting to bring something to life in little more than 24 hours. Our idea combined a number of APIs to produce a profile for every candidate running for President or Congress in the United States. The seed of the idea was good enough that we were chosen as one of 10 projects to present to the group at large on Sunday afternoon.

Under the Hood and Hooking Up with APIs

We used our own PHP framework, Treb, as our foundation. It provides routing by convention, controllers, db access, caching, and a view layer. Along the way, we discovered a small bug in our db helper function that failed because of the nuances of autoloading.

I quickly wrote up a base class for making HTTP GET requests to REST APIs. The client uses PHP’s native stream functions for making the HTTP requests, which I’ve found easier to work with than the cURL extension; the latter is a cumbersome wrapper around cURL’s functionality. A sketch of the approach is below.
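
Here is a rough sketch of that kind of client (the class name and options are illustrative, not the actual Treb code):

class RestClient
{
    /**
     * Make an HTTP GET request and return the response body, or false on failure.
     */
    public function get($url, array $query = array())
    {
        if ($query) {
            $url .= '?' . http_build_query($query);
        }
        // the stream context holds the request options cURL would need a handle for
        $context = stream_context_create(array(
            'http' => array(
                'method'  => 'GET',
                'timeout' => 10,
                'header'  => "Accept: application/json\r\n",
            ),
        ));
        return file_get_contents($url, false, $context);
    }
}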

To be good API clients, we cached the request responses in Memcached for between an hour and a month, depending on how often we anticipated the API response to change.
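
The caching wrapper worked roughly like this sketch (the function name, key scheme, and TTLs are illustrative):

function cached_get(RestClient $client, Memcached $cache, $url, $ttl = 3600)
{
    $key = 'api:' . md5($url);
    $response = $cache->get($key);
    if ($response === false) {
        $response = $client->get($url);
        if ($response !== false) {
            // TTL varies per API: e.g. an hour for polls, a month for bios
            $cache->set($key, $response, $ttl);
        }
    }
    return $response;
}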

Sandy also took on the tedious – but not thankless – task of creating a list of all the candidates, which we imported into a simple MySQL table. For each candidate, we could then pull in information such as:

  • Polling data from Huffington Post’s Pollster API, which we then plotted using jqplot. Polls weren’t available for every race, so we had to manually match available polls to candidates.
  • Basic Biographical information from govtrack.us
  • Campaign Finance and Fact Checked statements from Washington Post’s APIs.
  • Latest News courtesy of search queries to NPR’s Story API.
  • A simple GeoIP lookup on the homepage to populate the Congressional candidates when a user loads the page

Bootstrap for UI goodness

I used this opportunity to check out Twitter’s Bootstrap framework. It let us get a clean design from the start, and we were able to use its classes and responsive grid to make the site look really nice on tablets and smartphones too. I found it a lot more feature-filled than Skeleton, which is just a responsive CSS framework and lacks the advanced UI elements, like navigation, drop-downs, and modals, found in Bootstrap.

Improvements that we could make

We’ve already talked about a number of features we could add or rework to make the site better. Of course, given the shelf life this app will have after November 6th, we may not get to some of these.

  • Re-work the state navigation on the homepage so that it plays nice with the browser’s history. We did a simple Ajax query on load, but a better way would be to change the hash to contain the state, e.g. “http://candidata.us/#VA”, and then pull in the list of candidates. This would also initiate the GeoIP lookup only if the hash is missing.
  • Add a simple way to navigate to opponents from a candidate’s page.
  • Allow users to navigate to other state races from a candidate’s page.
  • Get more candidate information, ideally something that can provide us photos of each candidate. Other apps at the hackathon had this, but we didn’t find the API in time. Sunlight provides photos for Members of Congress.
  • Pull in statements made by a candidate via WaPo’s Issue API, maybe running it through the Trove API to pull out categories, people, and places mentioned in the statement.
  • Use the Trove API to organize or at least tag latest news stories and fact checks by Category.

Overall, I’m very happy with what we were able to build in 24 hours. The hackathon also exposed me to some cool ideas and approaches, particularly the visualizations done by some teams. I wish I’d spent a little more time meeting other people, but my energy was really focused on coding most of the time.

Please check out CandiData.me and let me know what you think either via email or in the comments below.

Using bcrypt to store passwords

The LinkedIn password breach highlighted once again the risks associated with storing user passwords. I hope you are not still storing passwords in the clear and are using a one-way salted hash before storing them. But the algorithm you choose to use is also important. If you don’t know why, go read You’re Probably Storing Passwords Incorrectly.

The choice, at the moment, seems to come down to SHA-512 versus bcrypt. There’s a StackOverflow Q&A discussing the merits of each. Bcrypt gets the nod since its goal is to be slow enough that brute-force attacks would take too much time to be feasible, but not so slow that honest users would really notice and be inconvenienced [1].

I wanted to switch one of my personal apps to use bcrypt, which in PHP means using Blowfish hashing via the crypt() function. There’s no shortage of classes and examples for using bcrypt to hash a string. But I didn’t find anything that outlined how to set up a database table to store usernames and passwords, salt and store the passwords, and then verify a login request.

Storing passwords in MySQL

To store passwords in a MySQL database, all we need is a CHAR field of length 60. And you don’t need a separate column for the salt, as it will be stored as part of the password. The SQL for a minimal Users table is shown below.

CREATE TABLE `users` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `username` varchar(30) NOT NULL,
  `password` char(60) NOT NULL,
  PRIMARY KEY (`id`)
);

When a user registers providing a username and password, you have to generate a salt and hash the password before saving it. This gist helped me figure out how to salt and hash them.

function save_user($username, $password, PDO $db)
{
    // create a random 22-character salt
    $salt = substr(str_replace('+', '.', base64_encode(sha1(microtime(true), true))), 0, 22);

    // hash incoming password with Blowfish at cost 12 - this works on PHP 5.3 and up
    $hash = crypt($password, '$2a$12$' . $salt);

    // store username and hashed password
    $insert = $db->prepare("INSERT INTO users (username, password) VALUES (?, ?)");
    $insert->execute(array($username, $hash));
}

Authenticating Users

When a user comes back to your site and tries to log in, you retrieve their credentials and then compare the expected password to the supplied password. Remember we were clever and stored the salt as part of our hash in the password field? Now we can reuse our stored password as the salt for hashing the incoming password. If it’s the right password, we’ll have two identical hashes. Magic!

function validate_user($username, $password, PDO $db)
{
    // attempt to look up the user's information
    $query = $db->prepare('SELECT * FROM users WHERE username = ?');
    $query->execute(array($username));

    if (0 == $query->rowCount()) {
        // user not found
        return false;
    }

    $user = $query->fetch();
    // compare the password to the expected hash
    if (crypt($password, $user['password']) == $user['password']) {
        // let them in
        return $user;
    }

    // wrong password
    return false;
}
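
Putting the two functions together looks something like this (the DSN and credentials are placeholders):

$db = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'secret');

// at registration
save_user('alice', 'correct horse battery staple', $db);

// later, at login
if ($user = validate_user('alice', $_POST['password'], $db)) {
    // start the session, etc.
}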

Those are the basics for using bcrypt to store passwords with PHP and MySQL. The main difference I found was that the hashing and comparison of hashes now happens in PHP. With the MD5 and SHA algorithms, you could invoke them using the database functions provided by MySQL; as far as I could find, MySQL doesn’t have a native Blowfish/bcrypt function. If your system provides a crypt() call, you may be able to use Blowfish encryption, but it won’t be an option on Windows systems.

Fix SSL timeouts with the Facebook PHP-SDK

I ran into SSL timeouts in my local development setup when I was re-factoring some integration code with Facebook and using their SDK. It was tricky to diagnose; I was sure that my changes couldn’t be the cause, and I finally confirmed it by running our production codebase. Since it was having the same timeout error, I knew the bug had to be in an underlying layer.

For the record, I’m running this version of curl on my Arch Linux box:

curl 7.25.0 (x86_64-unknown-linux-gnu) libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6 libssh2/1.4.0

I also got the error from the command line with

<code>curl "https://graph.facebook.com/oauth/access_token"
</code>

But it is fixed with

curl --sslv3 "https://graph.facebook.com/oauth/access_token"

Debian Server

On a Debian squeeze server, with the latest (4/3/2011) version of curl:

curl 7.21.0 (x86_64-pc-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.15 libssh2/1.2.6

The timeout does not happen with either of the following commands:

<code>curl "https://graph.facebook.com/oauth/access_token"
curl --sslv3 "https://graph.facebook.com/oauth/access_token"
</code>

OS X

The timeout does not happen on OS X, which runs curl 7.21.4.

Thoughts

So, this timeout only seems to affect users with very new versions of curl. Fixing it requires adding a line to the Facebook PHP SDK, which, while minor, you have to remember if you ever upgrade it. At the same time, this bug could come back and bite you down the road if your operating system sneaks in a newer version of curl. You can see a fork of the PHP SDK with this fix on github.
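
For reference, the fix amounts to forcing SSLv3 in the SDK’s default curl options, roughly like this (the $CURL_OPTS array mirrors the SDK of that era, but treat the exact contents and location as an approximation):

// in base_facebook.php, add CURLOPT_SSLVERSION to the default options
public static $CURL_OPTS = array(
    CURLOPT_CONNECTTIMEOUT => 10,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_TIMEOUT        => 60,
    CURLOPT_USERAGENT      => 'facebook-php-3.1',
    CURLOPT_SSLVERSION     => 3, // force SSLv3 to avoid the handshake timeout
);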

Other references:

  1. Facebook bug ticket
  2. Maybe related PHP bug

Drupal finally using OOP

Finally. This is a huge step forward for Drupal. After eschewing OOP practices for a long time, it’s finally winning over core developers, which will make working with Drupal as a framework easier in many ways. I copied the announcement below, but you can see the patch and discussion here.

This patch is about to be committed. It is the foundation to change for example $comment from an stdClass to a comment specific class allowing for $comment->save(). This is a monumental change and everyone is invited to review and familiarize with the new system even before it is committed.

What is possible when you use proper classes? The first thing I envision is a plugin/pluggable system in the same way that Zend Framework allows you to use Controller Plugins and View Helpers from a pluggable object, without the need to inherit or compose an object. For example, in a Zend View, you can call a partial like this:

<?php echo $this->partial('my-partial.html'); ?>

Now, the View class doesn’t have a method named partial; instead, the magic __call method intercepts the call, gets the Partial view helper, and calls its invoke method.
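
Schematically, that dispatch works something like the sketch below (simplified, not Zend Framework’s actual implementation):

class View
{
    public function __call($name, $args)
    {
        // load the helper class by convention, e.g. "partial" -> Helper_Partial
        $class = 'Helper_' . ucfirst($name);
        $helper = new $class($this);

        // delegate to the helper's method of the same name
        return call_user_func_array(array($helper, $name), $args);
    }
}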

Frustration with Drupal core growing

When a prominent developer and contributor lashes out that Drupal is in dire straits, you had better listen. You ought to read his critique of how Drupal core development is stalling, or at least stuck in the mud. That can’t be good news for anyone looking to upgrade to Drupal 7. My thoughts after the quote.

In addition to the half-baked, single-purpose product features mentioned above, Drupal core still carries around very old cruft from earlier days, which no one cares for. All of these features are not core functionality of a flexible, modular, and extensible system Drupal pretends to be. They are poor and inflexible product features being based on APIs and concepts that Drupal core allowed for, five and more years ago.

Where would Drupal be if they had worked more closely with the PHP community early on? I have no idea, but a lot of PHP programmers have looked down on Drupal because much of the codebase can be messy, with poor API design decisions, overuse of globals, and leaky separation of concerns. Along with Drupal eschewing object-oriented programming and the resulting best practices, it’s no wonder that talented developers would choose a framework like Zend, Symfony, or Cake to build a complicated website. It sounds like a lot of shortcuts and idiosyncrasies are now baked deep into Drupal core, and ripping them out is too much work for core developers.

I’ve always thought that Drupal’s greatest strength is certainly not the great design of its codebase, but the Drupal community and ecosystem. A contrib module usually exists for many common website needs, like managing redirects, creating useful URLs for content, integrating with analytics, and plugging in 3rd party commenting systems. On top of that, there are super-modules like Views, Panels, and Context, which let you prototype and build parts of a website without having to write any code at all. The Drupal community has solved a lot of problems through determination and individual brilliance, but that model can’t be sustainable in the long run.

Is there a solution?

Drupal core should cater to programmers’ needs via coherent APIs and pluggable subsystems. A complete rewrite of core, or even big parts of core, would be a waste of time; Drupal would stagnate while other frameworks kept improving. I think Drupal 8 should seriously consider using a framework like Symfony2 as the foundation for core. I mention Symfony because it has an EventDispatcher component that can replace most of Drupal’s magical hooks system, as sketched below. The next release of the Zend Framework will have a similar component. A tested framework, used by more than just content management applications, would expose developers to a wider range of best practices, particularly around configuration management, deployment, and unit testing.
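
To make that concrete, here is a rough sketch of a Drupal-style hook recast as a Symfony2 EventDispatcher listener (the event name and payload are invented for illustration, and assume the component is autoloadable):

use Symfony\Component\EventDispatcher\EventDispatcher;
use Symfony\Component\EventDispatcher\Event;

$dispatcher = new EventDispatcher();

// roughly equivalent to a module implementing hook_node_insert()
$dispatcher->addListener('node.insert', function (Event $event) {
    // react to the newly saved node here
});

// core fires the event instead of scanning modules for magic function names
$dispatcher->dispatch('node.insert', new Event());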

Contrib should cater to site builders’ needs and focus on adding the features on top of core that they want. Modules in contrib can improve faster to meet user needs, fix bugs, and innovate. This is an idea proposed in the discussion linked above. Moving as many modules as possible out of core also makes Drupal leaner.