Building CandiData

This past weekend, my colleague and friend Sandy Smith and I participated in Election Hackathon 2012 (read his take on the hackathon). We built our first public Musketeers.me product, Candidata.me. This was my first hackathon, and it was exciting and exhausting to bring something to life in little more than 24 hours. Our idea combined a number of APIs to produce a profile for every candidate running for President or Congress in the United States. The seed of the idea was good enough that we were chosen as one of 10 projects to present to the group at large on Sunday afternoon.

Under the Hood and Hooking Up with APIs

We used our own PHP framework, Treb, as our foundation. It provides routing by convention, controllers, database access, caching, and a view layer. Along the way, we discovered a small bug in our db helper function that failed because of the nuances of autoloading.

I quickly wrote up a base class for making HTTP GET requests to REST APIs. The client uses PHP's native stream functions for making the HTTP requests, which I've found easier to work with than the cURL extension, a cumbersome wrapper around the underlying cURL functionality.
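
As a rough illustration – the function below is a hypothetical stand-in, not the actual Treb client – the heart of a stream-based GET request looks something like this:

function api_get($url, array $params = array())
{
    if ($params) {
        $url .= '?' . http_build_query($params);
    }

    // a stream context gives us a timeout and custom headers without touching cURL
    $context = stream_context_create(array(
        'http' => array(
            'method'  => 'GET',
            'timeout' => 5,
            'header'  => "Accept: application/json\r\n",
        ),
    ));

    $response = file_get_contents($url, false, $context);

    return ($response === false) ? null : json_decode($response, true);
}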

To be good API clients, we cached the API responses in Memcached for anywhere from an hour to a month, depending on how often we expected each response to change.
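
The caching itself was little more than a get-or-fetch wrapper. Here's a sketch of the idea; the $memcached handle, key scheme, and TTL values are illustrative rather than our exact setup:

// return a cached API response, or fetch it and cache it for $ttl seconds
function cached_api_get(Memcached $memcached, $url, $ttl = 3600)
{
    $key = 'api_' . md5($url);

    $cached = $memcached->get($key);
    if ($cached !== false) {
        return $cached;
    }

    $fresh = api_get($url); // the stream-based client sketched above
    if ($fresh !== null) {
        $memcached->set($key, $fresh, $ttl); // TTL in seconds; we varied this per API
    }

    return $fresh;
}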

Sandy also took on the tedious – but not thankless – task of creating a list of all the candidates, which we imported into a simple MySQL table. For each candidate, we could then pull in information such as:

  • Polling data from Huffington Post’s Pollster API, which we then plotted using jqplot. Polls weren’t available for every race, so we had to manually match available polls to candidates.
  • Basic biographical information from govtrack.us.
  • Campaign finance data and fact-checked statements from the Washington Post’s APIs.
  • Latest news, courtesy of search queries to NPR’s Story API.
  • A simple GeoIP lookup on the homepage to populate the Congressional candidates when a user loads the page (a rough sketch of that lookup follows this list).
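
The GeoIP piece simply maps the visitor's IP address to a US state and then queries the candidates for that state. A minimal sketch, assuming the PECL geoip extension (our actual lookup may have used a different library):

// Hypothetical sketch: resolve the visitor's IP to a state code such as "VA"
// so the homepage can show that state's Congressional races.
function detect_state($ip)
{
    $record = @geoip_region_by_name($ip);

    if ($record && $record['country_code'] === 'US' && $record['region'] !== '') {
        return $record['region'];
    }

    return null; // unknown – fall back to asking the user to pick a state
}

$state = detect_state($_SERVER['REMOTE_ADDR']);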

Bootstrap for UI goodness.

I used this opportunity to check out Twitter’s Bootstrap framework. It let us get a clean design from the start, and we were able to use its classes and responsive grid to make the site look really nice on tablets and smartphones too. I found it a lot more feature-filled than Skeleton, which is just a responsive CSS framework and lacks the advanced UI elements – navigation, dropdowns, modals – found in Bootstrap.

Improvements that we could make

We’ve already talked about a number of features we could add or rework to make the site better. Of course, given the shelf life this app will have after November 6th, we may not get to some of these.

  • Re-work the state navigation on the homepage so that it plays nicely with the browser’s history. We did a simple Ajax query on load, but a better approach would be to put the state in the hash (e.g. “http://candidata.us/#VA”) and then pull in the list of candidates. This would also mean only initiating the GeoIP lookup when the hash is missing.
  • Add a simple way to navigate to opponents from a candidate’s page.
  • Allow users to navigate to other state races from a candidate’s page.
  • Get more candidate information, ideally something that can provide us photos of each candidate. Other apps at the hackathon had this, but we didn’t find the API in time. Sunlight provides photos for Members of Congress.
  • Pull in statements made by a candidate via WaPo’s Issue API, maybe running it through the Trove API to pull out categories, people, and places mentioned in the statement.
  • Use the Trove API to organize, or at least tag, the latest news stories and fact checks by category.

Overall, I’m very happy with what we were able to build in 24 hours. The hackathon also exposed me to some cool ideas and approaches, particularly the visualizations done by some teams. I wish I’d spent a little more time meeting other people, but my energy was really focused on coding most of the time.

Please check out CandiData.me and let me know what you think either via email or in the comments below.

Drupal Workflow and Access control

Large organizations want – or may actually need – strict access control and content review workflows to manage the publishing process on their websites. Drupal’s default roles and permissions system is designed to handle only the simplest of setups. However, there is a wide variety of modules that address these needs. Many of them overlap, some complement each other, while others are abandoned and haven’t been ported to Drupal 7. In this article, I’ll look at some active modules for these requirements.

Revisioning for Basic Workflow

Unless your web publishing workflow is very complicated – in which case, I think you have other problems – the Revisioning module should suit a basic author-editor review setup. You’ll need at least two roles: an author role that can create and update nodes but not publish them, and an editor role that reviews and publishes changes made by authors.

Authors write content that prior to being made publicly visible must be reviewed (and possibly edited) by moderators. Once the moderators have published the content, authors should be prevented from modifying it while “live”, but they should be able to submit new revisions to their moderators.

Once installed, the module provides a view to show revision status at /content-summary.

Content Access for write privileges

The Content Access module allows finer-grained control over view, edit, and delete permissions. Furthermore, it allows you to set access controls on a per-node basis. This lets you restrict the set of pages, stories, or other content types that a single role can change.

This module allows you to manage permissions for content types by role and author. It allows you to specify custom view, edit, and delete permissions for each content type. Optionally, you can enable per-content-node access settings, so you can customize the access for each individual node.

Related Modules

If you need “serious” workflow in your Drupal site, there is the aptly named Workflow module, along with the newer State Machine and Workbench Moderation modules. These modules seem to be actively developed and, with enough time, tweaking, and experimentation, should help solve many workflow issues.

A new venture: musketeers.me

This has been in the works for a bit now, and I’m excited to announce that along with Eli White, Kevin Bruce, and Sandy Smith, we’ve formed Musketeers.me. We worked so well together at our previous venture that we wanted to keep it going. The new venture will do the usual web consulting, from strategy and planning, through systems architecture with an eye on scaling, to design and development. We’ve also kicked around some product ideas of our own, which we hope to share sooner rather than later.

Automating FTP uploads

I needed to automate copying files for a website that I was building. Since this site was hosted on an inexpensive shared hosting plan, I didn’t have the luxury of shell or rsync access to automate copying files from my local development environment to the host. The only option was FTP, and after wasting too much time manually tracking down which files I needed to update, I knew I needed an automated solution. Some googling led me to lftp, a command-line and scriptable FTP client. It should be available via your distribution’s repository. Once installed, you can use a script like the one below to automatically copy files.

{syntaxhighlighter BASH}
#!/bin/sh
# paths
HERE=`pwd`
SRC="$HERE/web"
DEST="~/www"
# login credentials
HOST="ftp.commodity-hosting.com"
USER="mysiteusername"
PASS="supersecretpassword"
# FTP files to remote host
lftp -c "open $HOST
user $USER $PASS
mirror -X img/* -X .htaccess -X error_log --only-newer --reverse --delete --verbose $SRC $DEST"
{/syntaxhighlighter}

The script does the following:

  • Copies files from the local ./web directory to the remote ~/www directory.
  • Uses $HOST, $USER, and $PASS to log in, so make sure the script is readable, writable, and executable only by you and trusted users.
  • The lftp command connects and copies the files. The -c switch specifies the commands to issue, one per line. The magic happens with the mirror command, which copies the files. Since we added the --only-newer and --reverse switches, it uploads only the files that have changed.
  • You could be a little safer and remove the --delete switch, which removes files from the destination that are not on your local machine.
  • You can use the -X switch to give it glob patterns to ignore. In this case, it won’t touch the img/ directory or the .htaccess file.

If you’re still moving files over FTP manually, even with a good GUI, it’ll be worth your time to automate it and make it a quicker, less error-prone process.

Using bcrypt to store passwords

The LinkedIn password breach highlighted once again the risks associated with storing user passwords. I hope you are not still storing passwords in the clear and are using a one-way salted hash before storing them. But the algorithm you choose is also important. If you don’t know why, go read You’re Probably Storing Passwords Incorrectly.

The choice, at the moment, seems to come down to SHA-512 versus bcrypt. There’s a StackOverflow Q&A discussing the merits of each. Bcrypt gets the nod since its goal is to be slow enough that brute-force attacks would take too much time to be feasible, but not so slow that honest users would really notice and be inconvenienced [1].

I wanted to switch one of my personal apps to use bcrypt, which in PHP means using Blowfish hashing via the crypt() function. There’s no shortage of classes and examples for using bcrypt to hash a string. But I didn’t find anything that outlined how to set up a database table to store usernames and passwords, how to salt and store the passwords, and how to verify a login request.

Storing passwords in MySQL

To store passwords in a MySQL database, all we need is a CHAR field of length 60. And you don’t need a separate column for the salt, as it will be stored as part of the password. The SQL for a minimal Users table is shown below.

CREATE TABLE `users` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `username` varchar(30) NOT NULL,
  `password` char(60) NOT NULL,
  PRIMARY KEY (`id`)
);

When a user registers providing a username and password, you have to generate a salt and hash the password, before saving it. This gist helped me figure out how to salt and hash them.

function save_user($username, $password, PDO $db)
{
    // create a random salt
    $salt = substr(str_replace('+', '.', base64_encode(sha1(microtime(true), true))), 0, 22);

    // hash incoming password - this works on PHP 5.3 and up
    $hash = crypt($password, '$2a$12$' . $salt);

    // store username and hashed password
    $insert = $db->prepare("INSERT INTO users (username, password) VALUES (?, ?)");
    $insert->execute(array($username, $hash));
}

Authenticating Users

When a user comes back to your site and tries to log in, you retrieve their credentials and then compare the expected password to the supplied password. Remember we were clever and stored the salt as part of our hash in the password field? Now we can reuse the stored password as the salt for hashing the incoming password. If it’s the right password, we’ll have two identical hashes. Magic!

function validate_user($username, $password, PDO $db)
{
    // attempt to lookup user's information
    $query = $db->prepare('SELECT * FROM users WHERE username = ?');
    $query->execute(array($username));

    if (0 == $query->rowCount()) {
        // user not found
        return false;
    }

    $user = $query->fetch();
    // compare the password to the expected hash
    if (crypt($password, $user['password']) == $user['password']) {
        // let them in
        return $user;
    }

    // wrong password
    return false;
}

Those are the basics for using bcrypt to store passwords with PHP and MySQL. The main difference I found was that the hashing and comparison of hashes now happen in PHP. With MD5 and the SHA algorithms, you could invoke them using the functions MySQL provides. As far as I could find, MySQL doesn’t have a native Blowfish/bcrypt function. If your system provides a crypt() call, you may be able to use Blowfish hashing through it, but that won’t be an option on Windows systems.

Back to SQL it is

An honest write-up with first-hand details of the shortcomings of CouchDB in production. There’s a reason to stick with proven technologies and not simply chase the latest shiny thing. Not saying Sauce Labs did that, just sayin’.

This post describes our experience using CouchDB, and where we ran into trouble. I’ll also talk about how this experience has affected our outlook on NoSQL overall, and how we designed our MySQL setup based on our familiarity with the positive tradeoffs that came with using a NoSQL database.

From: Goodbye, CouchDB | Selenium Testing? Do Cross Browser Testing with Sauce Labs

Are social referrals killing SEO?

I think this is an improvement in how we write for the web. Instead of optimizing for a machine algorithm that can bring traffic to an article, writers must concentrate on making the article interesting enough that readers will share it with their friends. Curation is the key, which is also why I hated the “social reader” apps that broadcast everything my friends read. I want to know what they recommend I read.

“Sixteen months ago we received the same number of monthly referrals from search as social. Now 40% of traffic comes from social media,” Scott Havens, senior vice president of finance and digital operations at The Atlantic Media Company, said in a phone conversation ahead of his on-stage interview at our Mashable Connect conference in Orlando, Fla. last weekend. “Truly [our writers] are not really thinking about SEO anymore. Now it’s about how we can spin a story so that it goes viral.”

From: Why ‘The Atlantic’ No Longer Cares About SEO

Apache: Allowing access to subpaths

Seems this is always a tricky one that I run into when setting up a new test site. If you want to lock down the entire site behind a password but need to allow some unauthenticated requests for external integrations, have a look at the solution below. It uses SetEnvIf to open up access to selected paths. This should work too if you route most requests through a front controller script using a rewrite rule.
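
Here’s roughly what that looks like in an .htaccess file. The paths, realm name, and htpasswd location are placeholders; the Perishable Press article quoted below walks through the same pattern:

# Apache 2.2-style config: password-protect the site, but let requests to
# selected paths through without authentication (paths are placeholders).
SetEnvIf Request_URI "^/api/" allow_unauthenticated
SetEnvIf Request_URI "^/callbacks/payment\.php$" allow_unauthenticated

AuthType Basic
AuthName "Restricted Test Site"
AuthUserFile /path/to/.htpasswd
Require valid-user

# grant access when either the password OR the environment variable matches
Order allow,deny
Allow from env=allow_unauthenticated
Satisfy any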

In this brief tutorial, we are going to enable users to access any file or directory of a site that is password-protected via htaccess. There are many reasons for wanting to employ this technique, including:

From: Stupid htaccess Trick: Enable File or Directory Access to Your Password-Protected Site : Perishable Press

Fix SSL timeouts with the Facebook PHP-SDK

I ran into SSL timeouts in my local development setup while refactoring some Facebook integration code that uses their SDK. It was tricky to diagnose; I was sure my changes couldn’t be the cause, and I finally confirmed that by running our production codebase. Since it hit the same timeout error, I knew the bug had to be in an underlying layer.

For the record, I’m running this version of curl on my Arch Linux box:

<code>curl 7.25.0 (x86_64-unknown-linux-gnu) libcurl/7.25.0
OpenSSL/1.0.1 zlib/1.2.6 libssh2/1.4.0
</code>

I also got the error from the command line with

<code>curl "https://graph.facebook.com/oauth/access_token"
</code>

But it is fixed with

<code>curl --sslv3 "https://graph.facebook.com/oauth/access_token"
</code>

Debian Server

On a Debian Squeeze server, with the latest (4/3/2011) version of curl:

<code>curl 7.21.0 (x86_64-pc-linux-gnu) libcurl/7.21.0
OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.15 libssh2/1.2.6
</code>

The timeout does not happen with either of the following commands:

<code>curl "https://graph.facebook.com/oauth/access_token"
curl --sslv3 "https://graph.facebook.com/oauth/access_token"
</code>

OS X

The timeout does not happen on OS X, which runs curl 7.21.4.

Thoughts

So, this timeout only seems to affect users with a very new version of curl. Fixing it requires adding a line to the Facebook PHP SDK, which, while minor, is something you have to remember whenever you upgrade the SDK. At the same time, this bug could come back and bite you down the road if your operating system sneaks in a newer version of curl. You can see a fork of the PHP SDK with this fix on github.
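
For reference, the change itself is a single cURL option. Assuming the 3.x SDK layout, where the default cURL options live in BaseFacebook::$CURL_OPTS in base_facebook.php (check your copy before editing), it looks like this:

// force SSLv3 for every Graph API call, mirroring `curl --sslv3`
public static $CURL_OPTS = array(
    CURLOPT_CONNECTTIMEOUT => 10,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_TIMEOUT        => 60,
    CURLOPT_USERAGENT      => 'facebook-php-3.1',
    CURLOPT_SSLVERSION     => 3, // CURL_SSLVERSION_SSLv3
);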

Other references:

  1. Facebook bug ticket
  2. Maybe related PHP bug

Start ups are for everybody

I came across Dan Crow’s insights into startups and older workers this morning, and I couldn’t stop myself from nodding in agreement throughout the article. Part of it, surely, is that I am now closer to 40 than 30. But everything he says about the value of spending time with family, the pointlessness of working grueling hours, and the skills that come from experience had an air of “I’ve been there”.

Many startups, especially in Silicon Valley, have a macho culture of working extremely long hours. I vividly recall a long stretch of consecutive 100+ hour weeks at Apple early in my career — which came on top of a 3 hour commute to San Francisco. The quality of my work noticeably declined, and it took me months to get my focus and energy back afterwards.

It seems that both corporate America and Silicon Valley startups, while vastly different cultures in almost every regard, still see people as expendable resources that can be used up and replaced. Sure, if you’re working non-stop for a startup, you can tell yourself that there’s a huge payoff at the end, or at least the chance of one. But the risk is that you spend your 20s and early 30s working forever without much to show for it. That was never something I wanted to do, and I’m lucky that I didn’t have to, either.

From: Why startups shouldn’t just be for the young