Quick mysqldump snapshot script

I find myself needing this often on different servers, and you may find it useful too. This script creates a timestamped dump of a MySQL database in the current directory. It assumes it runs as a user who can connect to the database; you can also set credentials using the -u and -p command-line switches for mysqldump.

#!/bin/bash
# dump a database and gzip it

E_BADARGS=65

if [ $# -ne 1 ]
then
  echo "Usage: `basename $0` {database_name}"
  exit $E_BADARGS
fi

DB="$1"

DATEZ=`date +%Y-%m-%d-%H%M`
OUTFILE="$DB.$DATEZ.sql.gz"

echo "mysqldump for $DB to $OUTFILE"
sudo mysqldump --opt -e -C -Q "$DB" | gzip -c > "$OUTFILE"
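
For example, assuming you save the script as snapshot.sh (the name is my choice) and make it executable, a run looks like this:

$ ./snapshot.sh my_database
mysqldump for my_database to my_database.2012-11-05-2130.sql.gz

The timestamp in the filename will, of course, reflect when you run it.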

Smelly PHP code

Adam Culp posted the third article in his Clean Development Series this week, Dirty Code (how to spot/smell it). When you read it, keep in mind that he is pointing out practices that correlate with poorly written code, not prescribing a list of things to avoid. It’s a good list of things to look for, and it engendered quite a discussion in our internal Musketeers IRC.

Comments are valuable

Using good names for variables, functions, and methods does make your code self-commenting, but oftentimes that is not sufficient. Writing good comments is an art: too many comments get in the way, but a lack of comments is just as bad. Code can be dense to parse, and a comment will help you out. Comments also let you quickly scan through a longer code block, just skimming them, to find EXACTLY the bit you need to change/debug/fix/etc. Of course, you can also get that last benefit by breaking up large blocks of code into functions.

Comments should not explain what the code does, but should capture the “why” of how you are solving a problem. For example, if you’re looping over something, a bad comment is “// loop through results”, while a good comment is “// loop through results and extract any image tags”.
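
A contrived PHP sketch of the difference (process() and the image extraction are hypothetical placeholders):

// Bad: merely restates what the next line does
// loop through results
foreach ($results as $row) {
    process($row); // process() is a placeholder
}

// Good: captures why we are looping at all
// loop through results and extract any image tags
$images = array();
foreach ($results as $row) {
    preg_match_all('/<img[^>]+>/i', $row['body'], $matches);
    $images = array_merge($images, $matches[0]);
}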

Using Switch Statements

You definitely should not take this item in his list to mean that “switch statements are evil.” You could have equally bad code if you use a long block of if/elseif statements. If you’re using them within a class, you’re better off using polymorphism, as he suggests, or maybe coding to an interface instead of coding around multiple implementations.
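
A minimal sketch of the idea, with hypothetical formatter classes standing in for whatever your switch was dispatching on:

// Switching on a type flag scatters this decision everywhere it's needed...
function render_switch($type, array $data)
{
    switch ($type) {
        case 'json': return json_encode($data);
        case 'csv':  return implode(',', $data);
    }
}

// ...whereas coding to an interface lets each implementation decide
interface Formatter
{
    public function format(array $data);
}

class JsonFormatter implements Formatter
{
    public function format(array $data) { return json_encode($data); }
}

class CsvFormatter implements Formatter
{
    public function format(array $data) { return implode(',', $data); }
}

function render(Formatter $formatter, array $data)
{
    return $formatter->format($data);
}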

Other code smells

In reviewing the article, I thought of other smells that indicate bad code. Some are minor, but if they show up frequently, you know you’re dealing with someone who knows little more than how to copy-and-paste code from the Interwebs. These include:

  • Error suppression with @. There are very, very, very few cases where it’s OK to suppress an error instead of handling it or preventing it in the first place.
  • Using globals directly. Anything in $_GET, $_POST, $_REQUEST, or $_COOKIE should be filtered and validated before you use it. ‘Nuff said.
  • Deep class hierarchies. A deep class hierarchy likely means you should be using composition instead of inheritance to change class behaviors.
  • Lack of prepared DB statements. Building SQL queries as strings instead of using PDO or the mysqli extension’s prepared statements can open up SQL injection vulnerabilities (see the sketch after this list).
  • Antiquated PHP practices. A catch-all for things we all did nearly a decade ago: depending on register_globals being on, using “or die()” to catch errors, and using the mysql_* functions. PHP has evolved; there’s no reason for you not to evolve with it.
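
Here is the promised sketch of the prepared-statement smell, assuming $db is a connected PDO instance:

// Smelly: user input concatenated straight into the SQL string
$rows = $db->query("SELECT * FROM users WHERE name = '" . $_GET['name'] . "'");

// Better: a prepared statement keeps the data separate from the query
$stmt = $db->prepare('SELECT * FROM users WHERE name = ?');
$stmt->execute(array($_GET['name']));
$rows = $stmt->fetchAll();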

That’s generally what I look for when evaluating code quality. What are some things I missed?

Building CandiData

This past weekend, my colleague and friend Sandy Smith and I participated in Election Hackathon 2012 (read his take on the hackathon). We built our first public Musketeers.me product, Candidata.me. This was my first hackathon, and it was exciting and exhausting to bring something to life in little more than 24 hours. Our idea combined a number of APIs to produce a profile for every candidate running for President or Congress in the United States. The seed of the idea was good enough that we were chosen from among 10 projects to present it to the group at large on Sunday afternoon.

Under the Hood and Hooking Up with APIs

We used our own PHP framework, Treb, as our foundation. It provides routing by convention, controllers, db access, caching, and a view layer. Along the way, we discovered a small bug in our db helper function that failed because of the nuances of autoloading.

I quickly wrote up a base class for making HTTP GET requests to REST APIs. The client uses PHP’s native stream functions for making the HTTP requests, which I’ve found easier to work with than the cURL extension. The latter is a cumbersome wrapper around the underlying cURL functionality.
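
Not our actual client, but a minimal sketch of the stream-based approach:

// Minimal GET request using PHP's native stream functions
function http_get($url)
{
    $context = stream_context_create(array(
        'http' => array(
            'method'  => 'GET',
            'timeout' => 10,
        ),
    ));

    // file_get_contents() accepts a stream context for HTTP options
    $body = file_get_contents($url, false, $context);

    return ($body === false) ? null : $body;
}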

To be good API clients, we cached the request responses in Memcached for between an hour and a month, depending on how often we expected the API response to change.
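
A sketch of that caching layer, assuming $memcached is a connected Memcached instance and http_get() is the helper sketched above:

// Fetch an API response through the cache; $ttl defaults to an hour
function cached_get($url, Memcached $memcached, $ttl = 3600)
{
    $key  = 'api:' . md5($url);
    $body = $memcached->get($key);

    if ($body === false) {
        // cache miss: hit the API and remember the response
        $body = http_get($url);
        $memcached->set($key, $body, $ttl);
    }

    return $body;
}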

Sandy also took on the tedious – but not thankless – task of creating a list of all the candidates, which we imported into a simple MySQL table. For each candidate, we could then pull in information such as:

  • Polling data from Huffington Post’s Pollster API, which we then plotted using jqPlot. Polls weren’t available for every race, so we had to manually match available polls to candidates.
  • Basic biographical information from govtrack.us.
  • Campaign finance data and fact-checked statements from the Washington Post’s APIs.
  • Latest news, courtesy of search queries to NPR’s Story API.
  • A simple GeoIP lookup on the homepage to populate the Congressional candidates when a user loads the page (a sketch of the idea follows this list).
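
That lookup might look something like this minimal sketch, assuming the PECL geoip extension (not necessarily what we shipped):

// Map the visitor's IP address to a US state code, with a fallback default
$record = geoip_record_by_name($_SERVER['REMOTE_ADDR']);
$state  = ($record && !empty($record['region'])) ? $record['region'] : 'VA';

// $state can now be used to load that state's Congressional candidates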

Bootstrap for UI goodness.

I used this opportunity to check out Twitter’s Bootstrap framework. It let us get a clean design from the start, and we were able to use its classes and responsive grid to make the site look really nice on tablets and smartphones too. I found it a lot more feature-filled than Skeleton, which is just a responsive CSS framework and lacks the advanced UI elements – navigation, dropdowns, modals – found in Bootstrap.

Improvements that we could make

We’ve already talked about a number of features we could add or rework to make the site better. Of course, given the shelf life this app will have after November 6th, we may not get to some of these.

  • Re-work the state navigation on the homepage so that it plays nice with the browser’s history. We did a simple Ajax query on load, but a better way would be to change the hash to contain the state, as in “http://candidata.us/#VA”, and then pull in the list of candidates. This would also let us initiate the GeoIP lookup only when the hash is missing.
  • Add a simple way to navigate to opponents from a candidate’s page.
  • Allow users to navigate to other state races from a candidate’s page.
  • Get more candidate information, ideally something that can provide us photos of each candidate. Other apps at the hackathon had this, but we didn’t find the API in time. Sunlight provides photos for Members of Congress.
  • Pull in statements made by a candidate via WaPo’s Issue API, maybe running it through the Trove API to pull out categories, people, and places mentioned in the statement.
  • Use the Trove API to organize or at least tag latest news stories and fact checks by Category.

Overall, I’m very happy with what we were able to build in 24 hours. The hackathon also exposed me to some cool ideas and approaches, particularly the visualizations done by some teams. I wish I’d spent a little more time meeting other people, but my energy was really focused on coding most of the time.

Please check out CandiData.me and let me know what you think either via email or in the comments below.

Using bcrypt to store passwords

The LinkedIn password breach highlighted once again the risks associated with storing user passwords. I hope you are not still storing passwords in the clear and are using a one-way salted hash before storing them. But the algorithm you choose to use is also important. If you don’t know why, go read You’re Probably Storing Passwords Incorrectly.

The choice, at the moment, seems to come down to SHA-512 versus bcrypt. There’s a StackOverflow Q&A discussing the merits of each. Bcrypt gets the nod since its goal is to be slow enough that brute-force attacks would take too much time to be feasible, but not so slow that honest users would really notice and be inconvenienced [1].

I wanted to switch one of my personal apps to use bcrypt, which in PHP means using Blowfish hashing via the crypt() function. There’s no shortage of classes and examples for using bcrypt to hash a string. But I didn’t find anything that outlined how to set up a database table to store usernames and passwords, salt and store the passwords, and then verify a login request.

Storing passwords in MySQL

To store passwords in a MySQL database, all we need is a CHAR column of length 60. We don’t need a separate column for the salt, as it is stored as part of the password hash. The SQL for a minimal users table is shown below.

CREATE TABLE `users` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `username` varchar(30) NOT NULL,
  `password` char(60) NOT NULL,
  PRIMARY KEY (`id`)
);

When a user registers, providing a username and password, you have to generate a salt and hash the password before saving them. This gist helped me figure out how to salt and hash passwords.

function save_user($username, $password, PDO $db)
{
    // create a random 22-character salt
    $salt = substr(str_replace('+', '.', base64_encode(sha1(microtime(true), true))), 0, 22);

    // hash the incoming password with Blowfish at cost 12 - works on PHP 5.3 and up
    $hash = crypt($password, '$2a$12$' . $salt);

    // store the username and hashed password
    $insert = $db->prepare("INSERT INTO users (username, password) VALUES (?, ?)");
    $insert->execute(array($username, $hash));
}

Authenticating Users

When a user comes back to your site and tries to log in, you retrieve their credentials and then compare the expected password to the supplied password. Remember we were clever and stored the salt as part of our hash in the password field? Now we can reuse our stored password as the salt for hashing the incoming password. If it’s the right password, we’ll have two identical hashes. Magic!

function validate_user($username, $password, PDO $db)
{
    // attempt to look up the user's information
    $query = $db->prepare('SELECT * FROM users WHERE username = ?');
    $query->execute(array($username));

    if (0 == $query->rowCount()) {
        // user not found
        return false;
    }

    $user = $query->fetch();
    // compare the password to the expected hash,
    // reusing the stored hash as the salt
    if (crypt($password, $user['password']) == $user['password']) {
        // let them in
        return $user;
    }

    // wrong password
    return false;
}

Those are the basics for using bcrypt to store passwords with PHP and MySQL. The main difference I found was that the hashing and comparison of hashes now happen in PHP. With the MD5 and SHA algorithms, you could invoke the hash functions provided by MySQL. As far as I could find, MySQL doesn’t have a native Blowfish/bcrypt function. If your system provides a crypt() call, you may be able to use Blowfish hashing from MySQL, but it won’t be an option on Windows systems.
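
To tie it together, here is a hypothetical usage of the two functions above (the connection details and credentials are placeholders):

// assumes a PDO connection to the database holding our users table
$db = new PDO('mysql:host=localhost;dbname=myapp', 'dbuser', 'dbpass');

// registration
save_user('alice', 'correct horse battery staple', $db);

// login attempt
$user = validate_user('alice', 'correct horse battery staple', $db);
if ($user) {
    echo 'Welcome back, ' . $user['username'];
}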

What’s in your Project Management toolbox?

Matthew at DogStar describes his PM toolbox today in The Project Management Tool Box | Opensource, Nonprofits, and Web 2.0. It’s a detailed and well-organized list, and I think it reflects a very practical approach. The first thing that strikes me is the overwhelming number of tools available to the would-be PM. Certainly, there is no lack of tools out there.

You see, the general feeling is that there is no silver bullet. There is no grail of a tool that does everything a single Web Producer, Project Manager, Product Manager, or Content Manager might need or want. There is clearly a gap that is filled by a series of different products. This walked hand in hand with a desire to review processes at work and engage in course corrections. It is an excellent habit to follow – look at what you are doing with a critical eye, analyze why you are doing it, and make changes as needed. I have worked across four different shops with a wide variety of ways of practicing project management. I have used these methodologies and tools across ~50 different Drupal projects and another 25 or so custom PHP/MySQL projects.

I could not agree more that it’s important not to be seduced into picking the one right tool for every situation. It is a difficult temptation to resist, especially when you have tool makers pushing to sell you a solution. The best tool for the job isn’t the one that has the most features; it’s the one that you, and your team, end up using the most.

As I read the article, a thought struck me: sometimes you don’t need ONE tool, you just need to make sure everyone has the right tools (and skills) to be productive and responsible. At work, we’re a tiny team of three who deal with day-to-day management of our Drupal site, unexpected requests on tight deadlines, and long-term projects to build new features. Here’s a secret – we don’t have a central bug/ticket tracking tool. We can be productive simply with email, IM, code bearing, and face-to-face conversations. For big projects we use a whiteboard to wireframe, capture tasks, and track progress. This works better than a more sophisticated technical solution that would impose a greater burden on our time.

What’s your experience with tools, and with grappling to find the perfect one?

MySQL client more powerful than I thought

So far, the article I’ve learned the most from in this year’s edition of PHP Advent is PHP Advent 2008 / MySQL Client Tips. Did you know you can use your favorite text editor from the mysql command-line client? Or get help on the syntax for SQL commands? I didn’t before reading this article, and I prefer using the command-line interface whenever possible. Every developer should be familiar with it; one day you’ll have to diagnose a poorly executing query without phpMyAdmin.

The mysql client is an interactive command-line client that works directly with your MySQL server. Many PHP developers prefer to use phpMyAdmin, a web-based client, when working with MySQL. It is easy and rather intuitive to use, doesn’t require you to know much SQL to fetch information, and is a whole lot less intimidating than a blank command prompt blinking at you. It is an application that sits between you and the database, providing a layer of abstraction. Sometimes that extra layer gets in the way. Sometimes, you need information that is not readily available from phpMyAdmin. Enter the mysql client.
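
For instance, a few of the client’s built-in commands (my picks, not necessarily the article’s full list):

mysql> \e          (opens the current statement in your $EDITOR)
mysql> help show   (built-in syntax help, here for the SHOW command)
mysql> pager less  (pages long result sets through less)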

HOWTO: Convert MySQL data to UTF-8 quickly

I’ve been revamping my Bolivian soccer blog, which I finally updated and made live tonight. In doing so, I needed to convert the data from ISO-8859-1 (back when I didn’t know better about how to set up a database) to UTF-8. I was fearing the process would be complicated and tedious, involving some script to read the data and use PHP’s mbstring or iconv extensions to do the work. Thanks to a post on a MySQL message board (I wish I’d saved the link so I could give full credit), it turned out to be a drop-dead simple – or is it quick-and-dirty? – three-step process at the command line using the iconv tool.

$ mysqldump --opt --quote-names my_database > my_database.sql
$ iconv -f ISO_8859-1 -t UTF-8 my_database.sql > my_database-utf-8.sql
$ mysql my_database < my_database-utf-8.sql

Server Upgrade: OS basics

Debian AMD64 Setup

My Linux distribution of choice is Debian, although for the desktop I’d recommend Ubuntu. The server was first set up using the unofficial debian64 repositories. Since the amd64 architecture is now officially part of Debian, we moved to an official repository (see Google Groups). An unofficial repository contains /debian-amd64/ in its URL, like so:

#deb http://mirror.espri.arizona.edu/debian-amd64/debian/ sarge main contrib
#deb-src http://mirror.espri.arizona.edu/debian-amd64/debian/ sarge main contrib

Since packages make it to stable at a fairly slow pace, we also want to use the testing branch. This is fairly straightforward, first update your /etc/apt/sources.list to use etch/testing by removing (or commenting out) other repositories and adding the following lines. If you are outside the USA, replace the .us. part with your country code.

deb http://ftp.us.debian.org/debian testing main contrib
deb-src http://ftp.us.debian.org/debian testing main contrib

Next, run apt-get to bring the system up to date: first clean the local package cache, then update the package lists, and finally upgrade.

apt-get clean
apt-get update
apt-get dist-upgrade

Follow the onscreen instructions; when in doubt, pick the default option or switch to Google and read up. If apt installs a new kernel, you’ll have to restart your server. Finally, to get up-to-date LAMP packages, you can use the dotdeb repositories by adding the following lines to your sources.list. We will need these later to install Apache2, MySQL, and PHP5.

deb http://dotdeb.netmirror.org/ stable all
deb-src http://dotdeb.netmirror.org/ stable all

Kernel Parameters

Since our last server crashed because of a very low max open files limit, it’s worth checking that this setting is not too low on the new server.

cat /proc/sys/fs/file-max

Currently this returns 100905, which should be more than enough open files for our expected traffic. See Debian kernel tuning for more info.
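
If the value were too low, one way to raise it (a sketch, assuming root; the number is just an example) would be:

# set it immediately
echo 200000 > /proc/sys/fs/file-max

# persist it across reboots
echo "fs.file-max = 200000" >> /etc/sysctl.conf
sysctl -p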

Enabling Hyperthreading

Finally, because the CPU is an Intel P4 with hyperthreading, you can try using an SMP kernel to enable the two “virtual” processors. While there seems to be some debate on the benefits of Intel’s hyperthreading, IBM produced some benchmarks on the 2.6.15 kernel that show some gains. You can install a new kernel with:

apt-get install linux-image-2.6.16-2-em64t-p4-smp

Make sure you install the correct one for your CPU, since there are a generic 64-bit kernel, AMD-specific kernels, and Intel-specific kernels, as well as single- and multi-processor versions of each.
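
Before picking one, you can confirm the CPU actually advertises hyperthreading by looking for the “ht” flag (a quick check of my own, not from the benchmarks above):

# prints "ht" if the first CPU's feature flags include hyperthreading
grep '^flags' /proc/cpuinfo | head -1 | grep -ow ht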

That’s it for this part; if you restart, you should still have a working server (I do). Still to do: basic system security and setting up a LAMP environment.

Fun Saturday night

Bizarre. My self-signed SSL certificate, which I use to access my email via IMAP over SSL, expired last week. The real significance is that we’ve been on this server for one year now. Compared to how painful working on the old Red Hat 7.2 install was back then, working with a Debian system has been a breeze.

So, I figured I’d just generate a new certificate – the instructions are readily googleable – and I’d be playing a little Warcraft in no time. But no: after following the instructions, Cyrus’s IMAP server wouldn’t start up. Eventually I figured out it was complaining about a missing library related to the net-snmp package. A year ago, Cyrus 2.2 wasn’t in Debian testing yet, but it is now, so it ended up being time to upgrade to the official packages.

The install was easy as usual, except for some disconcerting messages about my Cyrus databases needing an upgrade from Berkeley DB 3.2 to 4.2. To get the upgrade tool, I installed the db4.2-utils package and ran the databases through the utility. I had my fingers crossed, with the spectre of data loss looming nearby (although I do have nightly backups that, in theory, I can recover from). The conversion worked well too, and all my email is in its place.

Feeling invincible, at this point I decided it was time to upgrade from the old stock 2.4 kernel to a newer 2.6 Linux kernel. The only real reason is that I’d read MySQL performs noticeably better on the 2.6 kernels. Can you tell a difference? As far as I can tell, nothing is massively broken. If you notice anything odd, though, please let me know!