Remove unapproved comments from WordPress exports

Recently, I needed to migrate some WordPress blogs to another system. WordPress provides a handy way to export content in its WXR format. However, it exports all comments, whether approved or not. This is good from a data backup standpoint, but I didn’t need to import them. They were also bloating the XML file and slowing down my import. I needed a way to remove unapproved comments. The following code does that, using PHP’s DOMDocument class (part of the DOM extension) to walk an input file. The cleaned-up content is sent to STDOUT, so you can redirect it to a file to save it.

<?php
if (!isset($_SERVER['argv'][1])) {
    // Report the problem on STDERR so it never ends up in redirected output
    fwrite(STDERR, "\nSpecify input file\n");
    exit(1);
}

$infile = $_SERVER['argv'][1];

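$doc = new DOMDocument();
$doc->recover = true;
// Must be set before load() for formatOutput to re-indent the result
$doc->preserveWhiteSpace = false;
if (!$doc->load($infile)) {
    fwrite(STDERR, "Could not load $infile\n");
    exit(1);
}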

$comments = $doc->getElementsByTagName('comment');
$to_remove = array();

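// Collect matches first: the node list is live, so nodes can't be removed while looping
foreach ($comments as $comment) {
    $approved = $comment->getElementsByTagName('comment_approved');
    if ($approved->length > 0) {
        // '1' means approved; '0', 'spam' and 'trash' all count as unapproved.
        // A loose 0 == 'spam' test is true before PHP 8 but false from PHP 8 on.
        if ('1' !== trim($approved->item(0)->nodeValue)) {
            $to_remove[] = $comment;
        }
    }
}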

foreach ($to_remove as $elt) {
    $elt->parentNode->removeChild($elt);
}

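// Re-indenting only takes full effect if whitespace was discarded when the file was loaded;
// setting preserveWhiteSpace after load() does nothing
$doc->formatOutput = true;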
echo $doc->saveXML();
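
To sanity-check the filtering logic without a full export at hand, here is a minimal sketch that runs the same idea against a tiny inline document. The XML fragment is a made-up, heavily trimmed stand-in for a real WXR file, and the namespace URI is an assumption (real exports declare it according to their WXR version). It uses DOMXPath with local-name() tests instead of getElementsByTagName, which is a swapped-in alternative rather than what the script above does, so the query works regardless of the namespace prefix or version.

```php
<?php
// Hypothetical, trimmed-down WXR-like fragment; a real export has many more elements.
$xml = <<<XML
<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item>
      <title>Hello</title>
      <wp:comment>
        <wp:comment_id>1</wp:comment_id>
        <wp:comment_approved>1</wp:comment_approved>
      </wp:comment>
      <wp:comment>
        <wp:comment_id>2</wp:comment_id>
        <wp:comment_approved>0</wp:comment_approved>
      </wp:comment>
      <wp:comment>
        <wp:comment_id>3</wp:comment_id>
        <wp:comment_approved>spam</wp:comment_approved>
      </wp:comment>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Select comments that carry an approval flag whose value is anything other than "1".
// Comments with no flag at all are left alone, as in the script above.
$unapproved = $xpath->query(
    '//*[local-name()="comment"]' .
    '[*[local-name()="comment_approved"][normalize-space(.) != "1"]]'
);

// Copy the matches into a plain array before removing anything.
foreach (iterator_to_array($unapproved) as $comment) {
    $comment->parentNode->removeChild($comment);
}

// Count the comments that survived the filter.
$remaining = $xpath->query('//*[local-name()="comment"]');
echo $remaining->length, "\n"; // prints 1
```

Only the comment with ID 1 should remain. The same XPath expression could replace the collection loop in the full script if you prefer a single query over nested getElementsByTagName calls.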