From WordPress to Sculpin

Posted on

I moved this blog from a WordPress site to a static site generated by Sculpin. Here's why and how I did it.

Auto-updates made keeping the WordPress install up to date a low-effort affair. I logged in occasionally to check on things, but other than that, this blog has been dormant. Then came two years of WordPress drama and the release of WordPress 7.0. That release adds hooks for AI integration, a feature I don't need, and it made me rethink why I was running WordPress. I've kept this site pretty basic, and I realized I don't need a MySQL-backed CMS for it. I used to have a number of PHP apps on this server that used the database; this blog was the only one left. I prefer not to run a database server for personal apps. The main uses for this VPS are a notes wiki running DokuWiki, which is completely file-based, and Foundry for my online TTRPG games.

Static site generators are a go-to solution for hosting content-heavy websites today. The last two Drupal projects at work used the Tome module to build a static website for public consumption. If it's good enough for them, a static site for my blog should be good enough for me.

Why Sculpin? Familiarity with PHP. I know how to follow the setup instructions, it's all Composer-driven, and it uses Twig for templating. Blog posts are written in Markdown, which I've been using regularly for years and which has become a de facto format for raw content. Knowing the maintainers always helps, and there must be some benefit to having played D&D with them, no?

I wrote a basic Symfony CLI script when I could steal some time away to code over a weekend. The script has two main components. The first is a streaming XML reader that pulls blog posts out of a WordPress export XML file one at a time, which keeps memory usage low while converting my posts going back to 2003 (!!). The second uses a PHP wrapper around Pandoc to handle the actual HTML-to-Markdown conversion of each post, plus some cleanup of the result.
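To give a feel for the second step, here is a minimal shell sketch of the kind of Pandoc call a wrapper like that ends up making. It is not the actual script: the function name html_to_markdown is hypothetical, and the output format and the --wrap option are assumptions, since nothing above says which options were used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert one post's HTML (read from stdin) to
# Markdown on stdout by shelling out to the pandoc CLI.
# Assumptions: pandoc is on PATH; "markdown" is used as the target format
# and --wrap=none keeps each paragraph on a single line.
html_to_markdown() {
  pandoc --from=html --to=markdown --wrap=none
}
```

It could be used as html_to_markdown < post.html > post.md, where both file names are placeholders.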

It worked well enough. Some broken HTML markup would cause Sculpin to hit a fatal error at steps that were not obvious from the error output until I ran it with vendor/bin/sculpin generate -vvv. Once all my posts were cleaned up, the site would build. I still had a bit of manual cleanup to do: fixing some old code samples, removing some extra Markdown for HTML attributes that Sculpin couldn't handle, and things like that. It took an evening to tidy those things up. I know I have broken images from my earliest posts, which I may be able to get from an old backup somewhere, but for now I'm calling it good enough.

Let me know on the fediverse if anyone's interested in using my conversion script.

Migration script

Tags: WordPress

─── ✧ ─── ✦ ─── ✧ ───

Comparing directories with HTML output

Posted on

If you have two copies of static sites and want to compare the output of the two, here are some Linux command line utilities I've used for that task on a recent project.

HTML Tidy

I used HTML Tidy to normalize the HTML output of the two directories. This step formats the HTML files consistently, which helped reduce false positives when comparing them.

I was happy to see that HTML Tidy has been resurrected. On Ubuntu, I installed it with:

apt-get install tidy

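I ran tidy against all the HTML files in each output directory. The default configuration worked fine for me, apart from turning off warnings about proprietary attributes. The -m flag modifies each file in place and -q keeps the output quiet: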

find . -name '*.html' -type f -print -exec tidy --warn-proprietary-attributes false -mq '{}' \;

Diff

You can compare two directories at the command line and redirect the output to a text file for review. Once you're familiar with reading diff output, you can make sense of the lines that are different. I ended up using the --exclude and --ignore-matching-lines flags to get rid of differences caused by cache-busting flags or machine-generated CSS and JavaScript filenames. Doing so helped me focus on the changes that matter.

diff -burw html/ html-new-nav/ > html-diff.txt
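As a sketch of how those filtering flags fit in, the function below adds --exclude and --ignore-matching-lines to the same diff call. It assumes GNU diff. The function name compare_sites and both patterns are hypothetical examples, not the ones used on the project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recursively compare two output directories while
# filtering out known noise. Assumes GNU diff.
compare_sites() {
  local old_dir="$1" new_dir="$2"
  # -b and -w ignore whitespace changes, -u gives unified output, -r recurses.
  # --exclude skips files whose base name matches the glob
  #   (example pattern: generated source maps).
  # --ignore-matching-lines hides a hunk only when every changed line in it
  #   matches the regex (example pattern: a "?v=123" cache-busting parameter).
  diff -burw \
    --exclude='*.map' \
    --ignore-matching-lines='[?]v=[0-9]' \
    "$old_dir" "$new_dir"
}
```

Calling compare_sites html/ html-new-nav/ > html-diff.txt would write the filtered result to the same file as before. diff exits with status 1 when it finds differences, which is worth remembering if this runs inside a script that stops on errors.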

This diff output was useful for identifying which files were actually different. To see the changes in specific pairs of files, I like using a visual diff utility like WinMerge, Meld, or the diff built into PhpStorm.
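If all you need is the list of files that differ, for example to decide which pairs to open in a visual diff tool, diff's brief mode (-q) is one option. This is a minimal sketch assuming GNU diff. The function name list_changed is hypothetical, and this isn't necessarily what was used on the project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one line per file that differs between two
# directories, or that exists on only one side. Assumes GNU diff.
list_changed() {
  local old_dir="$1" new_dir="$2"
  # -r recurses, -q reports only whether files differ,
  # -b and -w keep whitespace-only changes out of the list.
  diff -rqbw "$old_dir" "$new_dir"
}
```

Running list_changed html/ html-new-nav/ prints lines of the form "Files X and Y differ" or "Only in DIR: NAME".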

─── ✧ ─── ✦ ─── ✧ ───