Smelly PHP code

Adam Culp posted the 3rd article in his Clean Development Series this week, Dirty Code (how to spot/smell it). When you read it, keep in mind that he is pointing out practices which correlate with poorly written code, not prescribing a list of things to avoid. It’s a good list of things to look for, and it engendered quite a discussion in our internal Musketeers IRC.

Comments are valuable

Using good names for variables, functions, and methods does make your code self-commenting, but oftentimes that is not sufficient. Writing good comments is an art: too many comments get in the way, but a lack of comments is just as bad. Code can be dense to parse, and a comment will help you out. Comments also let you quickly scan through a longer code block, just skimming them, to find EXACTLY the bit you need to change/debug/fix/etc. Of course, the latter benefit you can also get by breaking up large blocks of code into functions.

Comments should not explain what the code does, but should capture the “why” of how you are solving a problem. For example, if you’re looping over something, a bad comment is “// loop through results”; a good comment is “// loop through results and extract any image tags”.

Using Switch Statements

You definitely should not take this item in his list to mean that “switch statements are evil.” You could have equally bad code with a long block of if/then/elseif statements. If you’re using them within a class, you’re better off using polymorphism, as he suggests, or maybe coding to an interface instead of coding around multiple implementations. A sketch of that refactoring follows.
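As an illustration, here’s a minimal sketch of that refactoring, with invented class names (this is not code from Adam’s article):

<?php
// Smelly: a switch that must grow every time we add an output format
function render($format, array $data)
{
    switch ($format) {
        case 'json':
            return json_encode($data);
        case 'csv':
            return implode(',', $data);
        default:
            throw new InvalidArgumentException("Unknown format: $format");
    }
}

// Cleaner: code to an interface so each format handles itself
interface Renderer
{
    public function render(array $data);
}

class JsonRenderer implements Renderer
{
    public function render(array $data)
    {
        return json_encode($data);
    }
}

class CsvRenderer implements Renderer
{
    public function render(array $data)
    {
        return implode(',', $data);
    }
}

// The caller no longer branches on a type flag
function output(Renderer $renderer, array $data)
{
    echo $renderer->render($data);
}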

Other code smells

In reviewing the article, I thought of other smells that indicate bad code. Some are minor, but if frequent, you know you’re dealing with someone who knows little more than to copy-and-paste code from the Interwebs. These include:

  • Error suppression with @. There are very, very, very few cases where it’s OK to suppress an error instead of handling it or preventing it in the first place.
  • Using globals directly. Anything in $_GET, $_POST, $_REQUEST, or $_COOKIE should be filtered and validated before you use it (see the sketch after this list). ‘Nuff said.
  • Deep class hierarchy. A deep class hierarchy likely means you should be using composition instead of inheritance to change class behaviors.
  • Lack of Prepared DB Statements. Building SQL queries as strings instead of using PDO or the mysqli extension’s prepared statements can open up SQL injection vulnerabilities.
  • Antiquated PHP Practices. A catch-all for things we all did nearly a decade ago: depending on register_globals being on, using “or die()” to catch errors, using the mysql_* functions. PHP has evolved; there’s no reason for you not to evolve with it.
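To make the globals and prepared statements points concrete, here’s a minimal sketch (the articles table and id parameter are invented for the example):

<?php
// Validate request input instead of using $_GET directly
$id = filter_input(INPUT_GET, 'id', FILTER_VALIDATE_INT);
if (null === $id || false === $id) {
    // handle the bad input instead of suppressing errors with @
    header('HTTP/1.0 400 Bad Request');
    exit;
}

// Bind the value with a placeholder instead of concatenating it into the SQL
$db = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass');
$stmt = $db->prepare('SELECT * FROM articles WHERE id = ?');
$stmt->execute(array($id));
$article = $stmt->fetch(PDO::FETCH_ASSOC);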

That’s generally what I look for when evaluating code quality. What are some things I missed?

Building CandiData

This past weekend, my colleague and friend Sandy Smith participated in Election Hackathon 2012 (read his take on the hackathon). We built our first public Musketeers.me product, Candidata.me. This was my first hackathon, and it was exciting and exhausting to bring something to life in little more than 24 hours. Our idea combined a number of APIs to produce a profile for every candidate running for President or Congress in the United States. The seed of the idea was good enough that we were chosen as one of 10 projects to present to the group at large on Sunday afternoon.

Under the Hood and Hooking Up with APIs

We used our own PHP framework, Treb, as our foundation. It provides routing by convention, controllers, db access, caching, and a view layer. Along the way, we discovered a small bug in our db helper function that failed because of the nuances of autoloading.

I quickly wrote up a base class for making HTTP GET requests to REST APIs. The client uses PHP’s native stream functions for making the HTTP requests, which I’ve found easier to work with than the cURL extension, a cumbersome wrapper around the underlying cURL functionality.
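A minimal sketch of that kind of client (simplified; the function name and options here are illustrative, not the actual base class):

<?php
// Make an HTTP GET request to a REST API using PHP's stream functions
function api_get($url, array $params = array())
{
    if ($params) {
        $url .= '?' . http_build_query($params);
    }

    $context = stream_context_create(array(
        'http' => array(
            'method'  => 'GET',
            'timeout' => 10,
        ),
    ));

    // file_get_contents() returns false on failure
    $body = file_get_contents($url, false, $context);

    return (false === $body) ? null : $body;
}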

To be good API clients, we cached the request responses in Memcached for between an hour and a month, depending on how often we expected the API response to change.
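The caching amounted to something like this sketch (the key scheme is invented, and it reuses the hypothetical api_get() helper above):

<?php
// Return a cached API response, fetching and caching it on a miss
function cached_api_get(Memcached $cache, $url, $ttl = 3600)
{
    $key = 'api_' . md5($url);

    $body = $cache->get($key);
    if (false !== $body) {
        return $body; // cache hit
    }

    $body = api_get($url);
    if (null !== $body) {
        // TTLs ranged from an hour to a month, depending on the API
        $cache->set($key, $body, $ttl);
    }

    return $body;
}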

Sandy also took on the tedious – but not thankless – task of creating a list of all the candidates, which we imported into a simple MySQL table. For each candidate, we could then pull in information such as:

  • Polling data from Huffington Post’s Pollster API, which we then plotted using jqplot. Polls weren’t available for every race, so we had to manually match available polls to candidates.
  • Basic biographical information from govtrack.us.
  • Campaign finance and fact-checked statements from the Washington Post’s APIs.
  • Latest news courtesy of search queries to NPR’s Story API.
  • A simple GeoIP lookup on the homepage to populate the Congressional candidates when a user loads the page.

Bootstrap for UI goodness

I used this opportunity to check out Twitter’s Bootstrap framework. It let us get a clean design from the start, and we were able to use its classes and responsive grid to make the site look really nice on tablets and smartphones too. I found it a lot more feature-filled than Skeleton, which is just a responsive CSS framework and lacks the advanced UI elements found in Bootstrap, such as navigation, drop-downs, and modals.

Improvements that we could make

We’ve already talked about a number of features we could add or rework to make the site better. Of course, given the shelf life this app will have after November 6th, we may not get to some of these.

  • Re-work the state navigation on the homepage so that it plays nice with the browser’s history. We did a simple ajax query on load, but a better way would be to put the state in the hash, e.g. “http://candidata.us/#VA”, and then pull in the list of candidates. The GeoIP lookup would then run only when the hash is missing.
  • Add a simple way to navigate to opponents from a candidate’s page.
  • Allow users to navigate to other state races from a candidate’s page.
  • Get more candidate information, ideally something that can provide us photos of each candidate. Other apps at the hackathon had this, but we didn’t find the API in time. Sunlight provides photos for Members of Congress.
  • Pull in statements made by a candidate via WaPo’s Issue API, maybe running it through the Trove API to pull out categories, people, and places mentioned in the statement.
  • Use the Trove API to organize or at least tag latest news stories and fact checks by Category.

Overall, I’m very happy with what we were able to build in 24 hours. The hackathon also exposed me to some cool ideas and approaches, particularly the visualizations done by some teams. I wish I’d spent a little more time meeting other people, but my energy was really focused on coding most of the time.

Please check out CandiData.me and let me know what you think either via email or in the comments below.

Drupal Workflow and Access control

Large organizations want, or may actually need, strict access control and content review workflows to manage the publishing process on their website. Drupal’s default roles and permissions system is designed to handle only the simplest of setups. However, there is a wide variety of modules that address these needs. Many of them overlap, some complement each other, while others are abandoned and haven’t been ported to Drupal 7. In this article, I’ll look at some active modules for these requirements.

Revisioning for Basic Workflow

Unless your web publishing workflow is very complicated – in which case, I think you have other problems – the Revisioning module should suit a basic author-editor review setup. You’ll need at least two roles: an author role that can create and update but not publish nodes, and an editor role that reviews and publishes changes made by authors.

Authors write content that must be reviewed (and possibly edited) by moderators before it is made publicly visible. Once the moderators have published the content, authors should be prevented from modifying it while it is “live”, but they should be able to submit new revisions to their moderators.

Once installed, the module provides a view to show revision status at /content-summary.

Content Access for write privileges

The Content Access module allows more fine-grained control over view, edit, and delete permissions. Furthermore, it allows you to set access controls on a per-node basis. This lets you restrict the set of pages, stories, or other content types that a single role can change.

This module allows you to manage permissions for content types by role and author. It allows you to specify custom view, edit, and delete permissions for each content type. Optionally, you can enable per-node access settings to customize the access for each content node.

Related Modules

If you need “serious” workflow in your Drupal site, there is the aptly named Workflow module, plus the newer State Machine and Workbench Moderation modules. These modules seem to be actively developed and, with enough time, tweaking, and experimentation, should help solve many workflow issues.

A new venture: musketeers.me

This has been in the works for a bit now, and I’m excited to announce that along with Eli White, Kevin Bruce, and Sandy Smith, we’ve formed musketeers.me. We worked so well together at our previous venture that we wanted to keep it going. The new venture will do the usual web consulting, from strategy and planning, through systems architecture with an eye on scaling, to design and development. We’ve also kicked around our own product ideas to work on, which we can share sooner rather than later.

Automating FTP uploads

I needed to automate copying files for a website that I was building. Since this site was hosted on an inexpensive shared hosting plan, I didn’t have the luxury of shell or rsync access to automate copying files from my local development environment to the host. The only option was FTP, and after wasting too much time manually tracking down which files I needed to update, I knew I needed an automated solution. Some googling led me to lftp, a command-line and scriptable FTP client. It should be available via your distribution’s repository. Once installed, you can use a script like the one below to automatically copy files.

{syntaxhighlighter BASH}
#!/bin/sh
# paths
HERE=`pwd`
SRC="$HERE/web"
DEST="~/www"
# login credentials
HOST="ftp.commodity-hosting.com"
USER="mysiteusername"
PASS="supersecretpassword"
# FTP files to remote host
lftp -c "open $HOST
user $USER $PASS
mirror -X img/* -X .htaccess -X error_log --only-newer --reverse --delete --verbose $SRC $DEST"
{/syntaxhighlighter}

The script does the following:

  • Copies files from the local ./web directory to the remote ~/www directory.
  • Uses $HOST, $USER, and $PASS to log in, so make sure your script is readable, writeable, and executable only by you and trusted users.
  • The lftp command connects and copies the files. The -c switch specifies the commands to issue, one per line. The magic happens with the mirror command, which copies the files. Since we added the --only-newer and --reverse switches, it uploads only files which have changed.
  • You could be a little safer by removing the --delete switch, which deletes files from the destination that are not on your local machine.
  • You can use the -X switch to give it glob patterns to ignore. In this case, it won’t touch the img/ directory or the .htaccess file.

If you’re still moving files over FTP manually, even with a good GUI, it’ll be worth your time to automate it and make it a quicker, less error-prone process.

Using bcrypt to store passwords

The LinkedIn password breach highlighted once again the risks associated with storing user passwords. I hope you are not still storing passwords in the clear and are using a one-way salted hash before storing them. But the algorithm you choose is also important. If you don’t know why, go read You’re Probably Storing Passwords Incorrectly.

The choice, at the moment, seems to come down to SHA512 versus bcrypt. There’s a StackOverflow Q&A discussing the merits of each. Bcrypt gets the nod since its goal is to be slow enough that brute force attacks would take too much time to be feasible, but not so slow that honest users would really notice and be inconvenienced [1].

I wanted to switch one of my personal apps to use bcrypt, which in PHP means using Blowfish hashing via the crypt() function. There’s no shortage of classes and examples for using bcrypt to hash a string. But I didn’t find anything that outlined how to set up a database table to store usernames and passwords, salt and store passwords, and then verify a login request.

Storing passwords in MySQL

To store passwords in a MySQL database, all we need is a CHAR field of length 60. And you don’t need a separate column for the salt, as it will be stored as part of the password. The SQL for a minimal Users table is shown below.

CREATE TABLE `users` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `username` varchar(30) NOT NULL,
  `password` char(60) NOT NULL,
  PRIMARY KEY (`id`)
);

When a user registers, providing a username and password, you have to generate a salt and hash the password before saving it. This gist helped me figure out how to salt and hash them.

function save_user($username, $password, PDO $db)
{
    // create a random 22-character salt
    $salt = substr(str_replace('+', '.', base64_encode(sha1(microtime(true), true))), 0, 22);

    // hash incoming password - this works on PHP 5.3 and up
    $hash = crypt($password, '$2a$12$' . $salt);

    // store username and hashed password
    $insert = $db->prepare("INSERT INTO users (username, password) VALUES (?, ?)");
    $insert->execute(array($username, $hash));
}

Authenticating Users

When a user comes back to your site and tries to log in, you retrieve their credentials and then compare the expected password to the supplied password. Remember we were clever and stored the salt as part of the hash in the password field? Now we can reuse the stored password as the salt for hashing the incoming password. If it’s the right password, we’ll have two identical hashes. Magic!

function validate_user($username, $password, PDO $db)
{
    // attempt to lookup user's information
    $query = $db->prepare('SELECT * FROM users WHERE username = ?');
    $query->execute(array($username));

    if (0 == $query->rowCount()) {
        // user not found
        return false;
    }

    $user = $query->fetch();
    // compare the password to the expected hash
    if (crypt($password, $user['password']) == $user['password']) {
        // let them in
        return $user;
    }

    // wrong password
    return false;
}

Those are the basics for using bcrypt to store passwords with PHP and MySQL. The main difference I found was that the hashing and comparison of hashes now happen in PHP. With the MD5 and SHA algorithms, you could invoke them using the database functions provided by MySQL. As far as I could find, MySQL doesn’t have a native Blowfish/bcrypt function. If your system provides a crypt() call, you may be able to use Blowfish hashing through it, but that won’t be an option on Windows systems.

Back to SQL it is

An honest write-up with firsthand details of the shortcomings of CouchDB in production. There’s a reason to stick with proven technologies and not simply chase the latest shiny thing. Not saying Sauce Labs did that, just sayin’.

This post describes our experience using CouchDB, and where we ran into trouble. I’ll also talk about how this experience has affected our outlook on NoSQL overall, and how we designed our MySQL setup based on our familiarity with the positive tradeoffs that came with using a NoSQL database.

From: Goodbye, CouchDB | Selenium Testing? Do Cross Browser Testing with Sauce Labs

Are social referrals killing SEO?

I think this is an improvement in how we write for the web. Instead of optimizing for a machine algorithm that can bring traffic to an article, writers must concentrate on making the article interesting so that readers will share it with their friends. Curation is the key, which is also why I hated the “social reader” apps that want to post everything my friends read for me to see. I want to know what they recommend I read.

“Sixteen months ago we received the same number of monthly referrals from search as social. Now 40% of traffic comes from social media,” Scott Havens, senior vice president of finance and digital operations at The Atlantic Media Company, said in a phone conversation ahead of his on-stage interview at our Mashable Connect conference in Orlando, Fla. last weekend. “Truly [our writers] are not really thinking about SEO anymore. Now it’s about how we can spin a story so that it goes viral.”

From: Why ‘The Atlantic’ No Longer Cares About SEO

Apache: Allowing access to subpaths

Seems this is always a tricky one that I run into when setting up a new test site. If you want to lock down the entire site behind a password, but need to allow some unauthenticated requests for external integration, have a look at the solution below. It uses SetEnvIf to open up access to selected paths. This should work too if you route most requests through a front controller script using a rewrite rule.
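Here’s a sketch of what that looks like in an .htaccess file under Apache 2.2 (the /api/callback path and the AuthUserFile location are placeholders):

# password-protect the whole site...
AuthType Basic
AuthName "Test Site"
AuthUserFile /path/to/.htpasswd
Require valid-user

# ...but flag selected paths and let them through without authentication
SetEnvIf Request_URI "^/api/callback" allow_path
Order allow,deny
Allow from env=allow_path
Satisfy any

Satisfy any is the key directive: a request passes if either the Allow rule or the basic auth succeeds.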

In this brief tutorial, we are going to enable users to access any file or directory of a site that is password-protected via htaccess. There are many reasons for wanting to employ this technique, including:

From: Stupid htaccess Trick: Enable File or Directory Access to Your Password-Protected Site : Perishable Press

Fix SSL timeouts with the Facebook PHP-SDK

I ran into SSL timeouts in my local development setup while refactoring some integration code with Facebook using their SDK. It was tricky to diagnose; I was sure that my changes couldn’t be the cause, and I finally confirmed that by running our production codebase. Since it produced the same timeout error, I knew the bug had to be in an underlying layer.

For the record, I’m running this version of curl on my Archlinux box:

<code>curl 7.25.0 (x86_64-unknown-linux-gnu) libcurl/7.25.0
OpenSSL/1.0.1 zlib/1.2.6 libssh2/1.4.0
</code>

I also got the error from the command line with:

<code>curl "https://graph.facebook.com/oauth/access_token"
</code>

But it is fixed with:

<code>curl --sslv3 "https://graph.facebook.com/oauth/access_token"
</code>

Debian Server

On a Debian Squeeze server, with the latest (as of 4/3/2011) version of curl:

<code>curl 7.21.0 (x86_64-pc-linux-gnu) libcurl/7.21.0
OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.15 libssh2/1.2.6
</code>

The timeout does not happen with either of the following commands:

<code>curl "https://graph.facebook.com/oauth/access_token"
curl --sslv3 "https://graph.facebook.com/oauth/access_token"
</code>

OS X

The timeout does not happen on OS X, which runs curl 7.21.4.

Thoughts

So, this timeout only seems to affect users with a very new version of curl. Fixing it requires adding a line to the Facebook PHP SDK, which, while minor, you have to remember if you ever upgrade the SDK. At the same time, this bug could come back and bite you down the road if your operating system sneaks in a newer version of curl. You can see a fork of the PHP SDK with this fix on github.
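In PHP terms, the added line sets CURLOPT_SSLVERSION on the curl handle the SDK uses; here’s a standalone sketch of the equivalent request outside the SDK:

<?php
// Reproduce the workaround: force SSLv3, like `curl --sslv3`
$ch = curl_init('https://graph.facebook.com/oauth/access_token');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSLVERSION, 3); // 3 = SSLv3
$response = curl_exec($ch);
curl_close($ch);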

Other references:

  1. Facebook bug ticket
  2. Maybe related PHP bug