If you have two copies of a static site and want to compare their output, here are some Linux command-line utilities I used for that task on a recent project.
HTML Tidy
I used HTML Tidy to normalize the HTML output of the two directories. This step formats the HTML files consistently, which helped reduce false positives when comparing them.
I was happy to see that HTML Tidy has been resurrected. On Ubuntu I installed it with:
apt-get install tidy
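I ran tidy against all the HTML files in each output directory. Apart from turning off warnings about proprietary attributes, I used the default configuration, and it worked fine for me. The -m flag modifies each file in place, and -q suppresses nonessential output: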
find . -name '*.html' -type f -print -exec tidy --warn-proprietary-attributes false -mq '{}' \;
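Because -m rewrites files in place, one variation on the command above is to tidy a copy of each directory so the original build output stays untouched. This is a minimal sketch, not the exact workflow described here: tidy_copy is a hypothetical helper name, and the destination directory is assumed not to exist yet.

```shell
# Hypothetical helper (not part of tidy): tidy the HTML in a copy of a directory.
tidy_copy() {
  src=$1
  dest=$2   # assumed not to exist yet; otherwise cp -R nests src inside it
  cp -R "$src" "$dest" || return 1
  # Same tidy invocation as above, applied to the copy only.
  # tidy exits with 1 when it reports warnings; find -exec ... \; does not
  # pass that status on, so warnings do not make the helper fail.
  find "$dest" -name '*.html' -type f \
    -exec tidy --warn-proprietary-attributes false -mq '{}' \;
}
```

For example, `tidy_copy html/ html-tidied/` would leave html/ as generated and write the normalized files to html-tidied/ (both names are placeholders).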
Diff
You can compare two directories at the command line and redirect the output to a text file for review. Once you’re familiar with reading diff output, you can make sense of the lines that are different. I ended up using the --exclude and --ignore-matching-lines options to get rid of lines that differed only because of cache-busting flags or machine-generated CSS and JavaScript filenames. Doing so helped me focus on the changes that matter.
diff -burw html/ html-new-nav/ > html-diff.txt
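The --exclude and --ignore-matching-lines options mentioned above can be added to the same command. The sketch below assumes GNU diff. The helper name compare_sites and both patterns are hypothetical placeholders: '*.map' stands in for generated files you want to skip, and '[?]v=[0-9a-f]' stands in for a cache-busting query string such as ?v=abc123. Substitute whatever your own build produces.

```shell
# Hypothetical helper: recursive diff of two site directories with noise filtered out.
compare_sites() {
  old_dir=$1
  new_dir=$2
  # --exclude skips files whose base name matches the shell pattern.
  # --ignore-matching-lines hides a hunk only when every changed line in it
  # matches the (grep-style) regular expression.
  diff -burw \
    --exclude='*.map' \
    --ignore-matching-lines='[?]v=[0-9a-f]' \
    "$old_dir" "$new_dir"
}
```

Usage would look like `compare_sites html/ html-new-nav/ > html-diff.txt`. Note that a cache-busting change sitting right next to a real change still shows up, because diff treats the whole hunk as one unit.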
This diff output was useful for identifying which files were actually different. To see the changes in specific pairs of files, I like using a visual diff utility like WinMerge, Meld, or the diff built into PhpStorm.
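If all you need is the list of files that differ, for example to decide which pairs to open in a visual tool, diff's -q flag reports file names without the line-level changes. A small sketch, where list_changed_files is a hypothetical helper name:

```shell
# Hypothetical helper: print only which files differ or exist on one side.
# LC_ALL=C keeps diff's messages in English so they are easy to grep.
list_changed_files() {
  LC_ALL=C diff -rq "$1" "$2"
}
```

Running `list_changed_files html/ html-new-nav/` prints one "Files ... differ" line per changed file and one "Only in ..." line per file that exists in just one directory.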