Measuring developer productivity

My friend Sandy shared a link to a fascinating natural experiment comparing the productivity of two similarly tasked developer teams. If you haven’t read it already, take a minute to check it out. I’ve seen this need for visibility throughout my career.

The cable company was a rare laboratory, you could observe a direct comparison between the effects of good and bad software design and team behaviour. Most organisations don’t provide such a comparison. It’s very hard to tell if that guy sweating away, working late nights and weekends, constantly fire-fighting, is showing great commitment to making a really really complex system work, or is just failing. Unless you can afford to have two or more competing teams solving the same problem, and c’mon, who would do that, you will never know. Conversely, what about the guy sitting in the corner who works 9 to 5, and seems to spend a lot of time reading the internet? Is he just very proficient at writing stable reliable code, or is his job just easier than everyone else’s? To the casual observer, the first chap is working really hard, the second one isn’t. Hard work is good, laziness is bad, surely?

In agency work, you tend to track hours worked on a project. I’d bristle when each quarter the list of “most billable” employees. Great! If those folks are junior developers, chances are their also creating a lot of “billable” work that pulls in other people. A list of people who were the busiest in the last 3 months, when you should be encouraging people to get the most done in the least amount a time. A better metric, though harder to calculate and report, would be to figure out revenue per hour. That’s not so difficult to do per project, but it gets hairy in trying to tie it directly to people but it can be done.

If you’re part of an internal development team, upper management may use “seeing butts in seats” as a proxy for people getting work done. This encourages people to hang around just to look busy, and discourages using  remote workers. In this case, metrics you’d like to look at are more tied to business outcomes, things like site uptime, conversion rates, sales, etc.

Still, if you want to measure actual productivity, in terms of what tasks your development team what can you do? This is where I think a good habit of issue+SCM tracking, rigorous testing , and continuous integration can really shine.

  • Issue tracking can let you report the number of issues you’ve addressed.
  • Unit testing can report on the health of your code base by looking at test coverage, number of tests added/created, etc.
  • Continuous integration can then give you ongoing performance metrics. How often are we producing successful builds? How often are we deploying code to production?

I’m sure that just scratches the surface of what you can do. How do you measure developer productivity?

Colorized Word Diffs

I’ve been finding myself doing a lot for copy and tech editting. I needed a way to annotate a PDF based on the changes I’d made to our markdown source. Trying to eyeball this was difficult, and I checked if there was a way to do word-by-word diffs based on SVN output. Remember, SVN’s diff command will show line-by-line differences. But there is a Perl script wdiff that will show you word by word diffs. With a bash alias, you can pipe svn output to it and see what’s been added/removed more clearly.  Once you install wdiff, either from source or via your distributions package manager, create an alias (I use bash) that will colorize its output:

alias cwdiff="wdiff -n -w $'\033[30;41m' -x $'\033[0m' -y $'\033[30;42m' -z $'\033[0m'"

wdiff compares two files by default but in can also parse the output of a diff file. By piping svn diff to it, you can see the changes word for word. In the example below, I’m reviewing the changes made by a specific commit and using less to page the output, the -r flag keeps the colors intact.

svn diff -r 267 -x -w 05.chapter.markdown | cwdiff -d | less -r

Words that were deleted will be in red, additions are in green.

color-diff

 Update:

Git has this behavior built in using:

git diff --color-words

Also, if you need to share your changes with a designer (who maybe needs to update a PDF…), with this ansi2html script from pixelbeat.org, you can generate HTML output showing your changes. I found that its useful to run the generated HTML through fold so that the lines wrap if you email it to someone.

svn diff foo.markdown | cwdiff -d | ~/bin/ansi2html.sh | fold -s > foo-updates.html

Finally, you can wrap all this functionality into a simple bash script that you can save anywhere. Pipe it the output of a svn diff or diff command, and get nice HTML as a result. It assumes ansi2html is somewhere in your executables $PATH.

#!/bin/bash
# take diff input (from svn diff too), colorize it, convert it to html

cat - | wdiff -n -w $'\033[30;41m' -x $'\033[0m' -y $'\033[30;42m' -z $'\033[0m' -d \
| ansi2html.sh | fold -s

Highest attended soccer matches in the USA

This started as a reply to a reddit poster claiming a USA-Turkey match in 2010 was “the highest attended soccer match ever”

According to this the attendance was 55,407. Nice, but not the highest ever for soccer.
http://www.ussoccer.com/news/mens-national-team/2010/05/turkey-game-report.aspx

But not the larget for soccer that I can find. Portugal played the USA at RFK during the 1996 olympics, attendance was 58,012. 
http://www.dcconvention.com/Venues/RFKStadium/UniqueSpaces.aspx

Also MLS Cup 1997 at RFK featuring home side D.C. United was attended by 57,431 people.
http://en.wikipedia.org/wiki/MLS_Cup_’97

Also, the LA Coliseum would sell out for soccer matches, albeit ones not featuring the USA. Capacity is 92k
http://articles.latimes.com/1999/apr/29/news/mn-32450

Turns out the USSF has a page with attendance records, and the USA-Turkey game, or the others mentioned by me above, would not make it, as the minimum cutoff is around 78,000. Maybe the US turkey game was the best attended USMNT during the previous world cup cycle?
http://www.ussoccer.com/teams/us-men/records/attendance-records/largest-crowds-in-us.aspx

Quick mysqldump snapshot script

I find myself needing this often on different servers, you may find it useful too. This script creates a timestamped dump of a mysql database to the current directory. Assumes it runs as a user who can connect to the database. You can set those credentials using the -u and -p command line switches for mysqldump

#!/bin/bash
# retrieve a database and gzip it

if [ $# -ne 1]
then
  echo "Usage: `basename $0` {database_name}"
  exit $E_BADARGS
fi

DB="$1"

DATEZ=`date +%Y-%m-%d-%H%M`
OUTFILE="$DB.$DATEZ.sql.gz";

echo "mysqldump for $DB to $OUTFILE"
sudo mysqldump --opt -e -C -Q $1 | gzip -c > $OUTFILE

Measuring Site Speed localy

This is a cool utility that uses YSlow! and PhantomJS to measure your site’s speed across many pages. Should be good for identifying slow individual pages as well as practices that impact multiple pages.

This is how it works: You feed it with a start URL and how deep you want it to crawl. The pages are fetched and all links within the same domain from that page are analyzed, and a couple of HTML pages are created with the result. Sounds simple?

From: Performance Calendar » Do you sitespeed?

Building CandiData

This past weekend, my colleague and friend Sandy Smith participated in Election Hackathon 2012 (read his take of the hackathon). We built our first public Musketeers.me product, Candidata.me. This was my first hackathon, and it was exciting and exhausting to bring something to life in little more than 24 hours. Our idea combined a number of APIs to produce a profile for every candidate running for President or Congress in the United States. The seed of the idea was good enough that we were chosen among 10 projects to present it to the group at large on Sunday afternoon.

Under the Hood and Hooking Up with APIs

We used our own PHP framework, Treb, as our foundation. It provides routing by convention, controllers, db acccess, caching, and a view layer. Along the way, we discovered a small bug in our db helper function that failed because of the nuances of autoloading.

I quickly wrote up a base class for making HTTP Get requests to REST APIs. The client uses PHPs native stream functions for making the HTTP requests, which I’ve found easier to work with than the cURL extension. The latter is a cubmersome wrapper to the cURL fucntionality.  

To be good API clients, we cached the request responses in Memcached between an hour to a month, depending on how often we anticipated the API response to change.

Sandy also took on the tedious – but not thankless – task of creating a list of all the candidates that we imported into a simpl Mysql table. For each candidate, we could then pull in information such as

  • Polling data from Huffington Post’s Pollster API, which we then plotted using jqplot. Polls weren’t available for every race, so we had to manually match available polls to candidates.
  • Basic Biographical information from govtrack.us
  • Campaign Finance and Fact Checked statements from Washington Post’s APIs.
  • Latest News courtesy of search queries to NPR’s Story Api.
  • A simple GeoIP lookup on the homepage to populate the Congressional candidates when a user loads the page

Bootstrap for UI goodness.

I used this opportunity to check out Twitter’s Bootstrap framework. It let us get a clean design from the start, and we were able to use its classes and responsive grid to make the site look really nice on tablets and smartphones too. I found it a lot more feature filled than Skeleton, which is just a responsive CSS framework and lacks the advanced UI elements like navigation, drop downs, modals found in Bootstrap.

Improvements that we could make

We’ve already talked about a number of features we could add or rework to make the site better. Of course, given the shelf life this app will have after November 6th, we may not get to some of these.

  • Re-work the state navigation on the homepage so that it plays nice with the browser’s history. We did a simple ajax query on load, but a better way to do it would be to change the hash to contain the state “http://candidata.us/#VA”, and then pull in the list of candidates. This would also only initiate the geoip lookup if the hash is missing.
  • Add a simple way to navigate to opponents from a candidate’s page.
  • Allow users to navigate to other state races from a candidate’s page.
  • Get more candidate information, ideally something that can provide us Photos of each candidate. Other apps at the hackathon had this, but we didn’t find the API in time. Sunlight provides photos for Members of Congress.
  • Pull in statements made by a candidate via WaPo’s Issue API, maybe running it through the Trove API to pull out categories, people, and places mentioned in the statement.
  • Use the Trove API to organize or at least tag latest news stories and fact checks by Category.

Overall, I’m very happy with what we were able to build in 24 hours. The hackathon also exposed me to some cool ideas and approaches, particularly the visualizations done by some teams. I wish I’d had spent a little more time meeting other people, but my energy was really focused on coding most of the time.

Please check out CandiData.me and let me know what you think either via email or in the comments below.

Drupal Workflow and Access control

Large organizations want to, or may actually need, strict access control and content review workflows to manage the publishing process on their website. Drupal’s default roles and permissions system is designed to handle the simplest of setups. However, there is a wide variety of modules that each address these needs. Many of them overlap, some complement each other, while others are abandonded and haven’t been ported to Drupal 7. In this article, I’ll look at some active modules for these requirements.

Revisioning for Basic Workflow

Unless your web publishing workflow is very complicated -= in which case, I think you have other problems – the Revisioning module should suit a basic author-editor review setup. You’ll need at least two roles, an author role that can create and update but not publish new nodes, and an editor role that reviews and publishes changes made by authors.

Authors write content that prior to being made publicly visible must be reviewed (and possibly edited) by moderators. Once the moderators have published the content, authors should be prevented from modifying it while “live”, but they should be able to submit new revisions to their moderators.

Once installed, the module provides a view to show revision status at /content-summary.

Content Access for write privileges

The Content Access module allows more fine grained control over view, edit, and delete permissions. Furthermore, it allows you to set access controls on a per node basis. This lets you restrict the set of pages, stories, or other content types that a single role can change.

This module allows you to manage permissions for content types by role and author. It allows you to specifiy custom view, edit and delete permissions for each content type. Optionally you can enable per content access settings, so you can customize the access for each content node.

Related Modules

If you need “serious” workflow in your Drupal, there is the aptly named Workflow module, and the newer State Machine and Workbench Moderation modules. These modules seem to be actively developed, and with enough time, tweaking and experimentation should help solve many workflow issues.

Using bcrypt to store passwords

The linkedin password breach highlighted once again the risks associated with storing user passwords. I hope you are not still storing passwords in the clear and are using a one-way salted hash before storing them. But, the algorithm you choose to use is also important. If you don’t know why, go read You’re Probably Storing Passwords Incorrectly.

The choice, at the moment, seems to come down to SHA512 versus Bcrypt encryption. There’s a StackOverflow Q&A discussing the merits of each. Bcrypt gets the nod since its goal is to be slow enough that brute force attacks would take too much time to be feasible, but not so slow that honest users would really notice and be inconvenienced [1].

I wanted to switch one of my personal apps to use bcrypt, which on php means using Blowfish encryption via the crypt() function. There’s no shortage of classes and examples for using bcrypts to hash a string. But I didn’t find anything that outlined how to setup a database table to store usernames and passwords, salt and store passwords, and then verify a login request.

Storing passwords in Mysql

To store passwords in a MySQL database, all we need is a CHAR field of length 60. And you don’t need a separate column for the salt, as it will be stored as part of the password. The SQL for a minimal Users table is shown below.

CREATE TABLE `users` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `username` varchar(30) NOT NULL,
  `password` char(60) NOT NULL,
  PRIMARY KEY (`id`),
);

When a user registers providing a username and password, you have to generate a salt and hash the password, before saving it. This gist helped me figure out how to salt and hash them.

function save_user($username, $password, PDO $db)
{
    // create a random salt
    $salt = substr(str_replace('+', '.', base64_encode(sha1(microtime(true), true))), 0, 22);

    // hash incoming password - this works on PHP 5.3 and up
    $hash = crypt($password, '$2a$12$' . $salt);

    // store username and hashed password
    $insert = $db->prepare("INSERT INTO users (username, password) VALUES (?, ?)");
    $insert->execute($username, $hash)
}

Authenticating Users

When a user comes back to your site and tries to login, you retrieve their credentials and then compare the expected password to the supplied password. Remember we were clever and stored the salt as part of our hash in the password field? Now, we can reuse our stored password as the salt for hashing the incoming password. If its the right password, we’ll have two identical hashes. Magic!

function validate_user($username, $password, PDO $db)
{
    // attempt to lookup user's information
    $query= $db->prepare('SELECT * FROM users WHERE username=?';
    $query->execute(array($username));

    if (0 == $query->rowCount()) {
        // user not found
        return false;
    }

    $user = $query->fetch();
    // compare the password to the expected hash
    if (crypt($password, $user['password']) == $user['password']) {
        // let them in
        return $user;
    }

    // wrong password
    return false;
}

Those are the basics for using bcrypt to store passwords with PHP and MySQL. The main difference I found, was that the hashing and comparison of hashes now happens in PHP. With MD5 and SHA algorithms, you could invoke them using the database functions provided by MySQL. As far as I could find, it doesn’t have a native Blowfish/bcrypt function. If your system provides a crypt() call, you maybe be able to use Blowfish encryption, but it won’t be an option on Windows systems.