Example Theme for Simplified Saaze: Lemire
This is another theme for Simplified Saaze, called "Lemire". You can inspect it here. This theme is modeled after the blog of Daniel Lemire. That blog is powered by WordPress, hosted on SiteGround, and has been performance-enhanced by Cloudflare since 2019. Prof. Lemire started blogging in 2004. The number of posts and comments per year is given in the table below. Year 2023 is not complete.
Year | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
#posts | 118 | 267 | 224 | 217 | 196 | 104 | 67 | 63 | 53 | 64 | 55 | 59 | 81 | 132 | 123 | 112 | 85 | 66 | 58 | 80 |
#comments | 223 | 458 | 215 | 361 | 647 | 836 | 892 | 743 | 888 | 903 | 744 | 656 | 1340 | 1165 | 1005 | 1269 | 832 | 560 | 501 | 671 |
The post counts per year are produced by:
for i in `seq 2004 2023`; do grep 'h2 class="entry-title"' b*.html | grep -c me/blog/$i/; done
In total there are 2,224 blog posts over 20 years of continuous blogging. The table shows that the blog is updated regularly and that many readers interact with the content.
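The total can be cross-checked against the table by summing the per-year post counts. A minimal shell sketch, with the counts copied from the `#posts` row above; the variable names are only illustrative:

```shell
# Per-year post counts for 2004..2023, copied from the "#posts" row of the table
posts="118 267 224 217 196 104 67 63 53 64 55 59 81 132 123 112 85 66 58 80"

total=0
years=0
for n in $posts; do
    total=$((total + n))
    years=$((years + 1))
done

echo "$total posts in $years years"
```

This prints `2224 posts in 20 years`, in line with the total stated above.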
Prof. Lemire values having control over his blog and therefore doesn't use Medium or similar services. Some key characteristics of his blog:
- Allows WordPress comments
- Informs e-mail subscribers about new posts; he has over 12,500 e-mail subscribers
- Provides search functionality on his blog
- Doesn't show any advertisements
- Provides an Atom RSS feed
- Blog posts are all in English
- Doesn't use categories or tags
- Doesn't use the `<!--more-->` tag
- WordPress theme is based on "Twenty-Fifteen"
- There is no regular sitemap.xml for the blog posts
1. Converting WordPress blog. Download all blog posts via the Perl script `bloglemirecurl`. This script downloads the so-called "pages", each of which contains 20 blog posts. Each such HTML file is then converted to Markdown via the Perl script `bloglemiremd`:
bloglemiremd b*.html
The Markdown files are placed in `/tmp/lemire`. As usual, you might need a few rounds to eliminate obvious conversion errors. Finally, copy the Markdown files from `/tmp/lemire` to your final destination.
There are 14 blog posts that reside at the top of the directory and are not part of the timeline. These posts are accessed via the left navigation bar (in blue). To convert these posts, use
bloglemiremd -t *-*.html pred*.html
Again, the converted files are stored under `/tmp/lemire` for inspection. Once you are fine with them, copy them to the final destination. Go to `.../content/blog` and run the loop below, which uses `blogdate` to create an `index.md` for each year:
for i in `seq 2004 2023`; do blogdate -p/lemire/blog/ -y$i $i/*.md > $i/index.md; done
Embedding icon in head-template file:
- Download icon:
curl https://lemire.me/blog/wp-content/uploads/2015/10/profile2011_152-150x150.jpg -o pr.jpg
- Converting to 32x32 size:
convert -resize 32x32 pr.jpg pr32x32.jpg
- Base64-encoding file:
base64 -w0 pr32x32.jpg
Size comparison for this icon: original JPG is 6,699 bytes, converted image is 934 bytes, base64-encoded is finally 1,248 bytes.
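The base64 size follows from the encoding itself: every 3 input bytes become 4 output characters, with the last group padded. The sketch below checks that arithmetic and shows one way the encoded icon could be embedded in the head template as a data URI. The function names `b64_len` and `icon_link_tag` are made up for illustration, and the exact `<link>` markup is an assumption, not taken from the theme.

```shell
# Length of the base64 encoding of n bytes: 4 characters per (padded) 3-byte group
b64_len() {
    echo $(( ($1 + 2) / 3 * 4 ))
}

# Print a favicon <link> tag with the image inlined as a data URI.
# Assumption: the head template accepts a JPEG icon given this way.
icon_link_tag() {
    printf '<link rel="icon" type="image/jpeg" href="data:image/jpeg;base64,%s">\n' \
        "$(base64 < "$1" | tr -d '\n')"
}

b64_len 934    # 934 bytes is the size of the resized 32x32 icon
```

For 934 bytes this gives 312 groups of 4 characters, i.e., the 1,248 bytes reported above. Calling `icon_link_tag pr32x32.jpg` would print the tag for the resized icon.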
2. Installation. The entire theme including content and Simplified Saaze is installed via composer.
$ time composer create-project eklausme/saaze-lemire
Creating a "eklausme/saaze-lemire" project at "./saaze-lemire"
Installing eklausme/saaze-lemire (v1.0)
- Downloading eklausme/saaze-lemire (v1.0)
- Installing eklausme/saaze-lemire (v1.0): Extracting archive
Created project in /tmp/saaze-lemire
Loading composer repositories with package information
Updating dependencies
Lock file operations: 1 install, 0 updates, 0 removals
- Locking eklausme/saaze (v1.34)
Writing lock file
Installing dependencies from lock file (including require-dev)
Package operations: 1 install, 0 updates, 0 removals
- Downloading eklausme/saaze (v1.34)
- Installing eklausme/saaze (v1.34): Extracting archive
Generating optimized autoload files
No security vulnerability advisories found.
real 1.85s
user 0.27s
sys 0
swapped 0
total space 0
You need to compile a single C file once:
cd vendor/eklausme/saaze
cc -fPIC -Wall -O2 -shared php_md4c_toHtml.c -o php_md4c_toHtml.so -lmd4c-html
Now you can run `php saaze`.
As mentioned, Simplified Saaze is already installed by the composer command above. If you want to look at the Simplified Saaze source code separately, see saaze.
3. Building static site. Running Simplified Saaze on all 2,224 blog posts:
saaze-lemire: time php saaze -rb /tmp/build
Building static site in /tmp/build...
execute(): filePath=/home/klm/php/saaze-lemire/content/blog.yml, nentries=2224, totalPages=112, entries_per_page=20
Finished creating 1 collections, 1 with index, and 2259 entries (0.39 secs / 22.55MB)
#collections=1, YamlParser=0.0314/2260-1, md2html=0.0362, MathParser=0.0167/2259, renderEntry=2259, content=2259/0, excerpt=0/0
real 0.41s
user 0.26s
sys 0
swapped 0
total space 0
In less than half a second the generation of all static files is completed. Machine in question: CPU is Ryzen 7 5700G, max clock 4.6 GHz, running on Arch Linux with kernel 6.6.8.
A screenshot of the theme is here:
The screenshot shows the results of a search, here for "WordPress".
The theme also features Pagefind. I have written about Pagefind in Searching in Static Sites. The Pagefind index is created like this:
/tmp/build: time pagefind -s . --exclude-selectors aside --exclude-selectors footer
Running Pagefind v1.0.4
Running from: "/tmp/build"
Source: ""
Output: "pagefind"
[Walking source directory]
Found 2372 files matching **/*.{html}
[Parsing files]
Did not find a data-pagefind-body element on the site.
↳ Indexing all <body> elements on the site.
[Reading languages]
Discovered 1 language: en
[Building search indexes]
Total:
Indexed 1 language
Indexed 2372 pages
Indexed 29164 words
Indexed 0 filters
Indexed 0 sorts
Finished in 5.325 seconds
real 5.43s
user 4.50s
sys 0
swapped 0
total space 0
Creating the index (5.43 seconds) is more than ten times slower than generating all static pages (0.41 seconds).
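To put the two timings side by side, the ratio can be computed from the `real` values in the logs above. A small sketch that uses `awk` for the floating-point division; the function name `ratio` is illustrative only:

```shell
# Divide two timings (in seconds) and print the ratio with one decimal place
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Pagefind indexing (5.43 s) vs. static site generation (0.41 s)
ratio 5.43 0.41
```

With these values the indexing step takes roughly 13 times as long as the site generation.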
4. Webserver rewrite rules. The conversion from WordPress to Markdown placed all blog posts from one year into a single directory at the same level. For example, the post `https://lemire.me/blog/2006/01/03/are-debuggers-obselete/` is in directory `.../content/blog/2006` and in file `01-03-are-debuggers-obselete.md`.
On my webserver the URL can take either form; watch out for dash vs. slash:
- https://eklausmeier.goip.de/lemire/blog/2006/01-03-are-debuggers-obselete
- https://eklausmeier.goip.de/lemire/blog/2006/01/03/are-debuggers-obselete
Watch out for the `/` slashes.
This is accomplished by the following rewrite rule in the NGINX configuration file:
rewrite "^/lemire/blog/(\d\d\d\d)/(\d\d)/(\d\d)/(.*)" "/lemire/blog/$1/$2-$3-$4";
Instead of using this rewrite rule, one could place the Markdown file in the following directory
.../content/blog/2006/01/03
But this would create a lot of directories, almost all of which would contain only a single file.
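The effect of the rewrite rule can be tried out without a webserver. The sketch below applies the same pattern with `sed -E`; it only imitates the NGINX rule for illustration, and the function name `flatten_url` is made up:

```shell
# Map /lemire/blog/yyyy/mm/dd/title to /lemire/blog/yyyy/mm-dd-title,
# mirroring the NGINX rewrite rule; paths that do not match pass through unchanged
flatten_url() {
    printf '%s\n' "$1" |
        sed -E 's#^/lemire/blog/([0-9]{4})/([0-9]{2})/([0-9]{2})/(.*)#/lemire/blog/\1/\2-\3-\4#'
}

flatten_url /lemire/blog/2006/01/03/are-debuggers-obselete
flatten_url /lemire/blog/2006/01-03-are-debuggers-obselete
```

Both calls print `/lemire/blog/2006/01-03-are-debuggers-obselete`: the slash form is rewritten, while the dash form does not match the pattern and is left alone.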
5. Fetching comments from WordPress. The Perl script `bloglemirecurlcomment` scans through the "pages" above, i.e., the collections of 20 blog posts each. Each page contains 20 URLs, which are fetched via `curl`. Essentially, this duplicates the blog posts, but at least we now have the comments for each post as well.
for i in `seq 1 112`; do bloglemirecurlcomment ../b$i.html; done
These HTML files are then processed by `bloglemirecomment`, which scans for `<h2 class="comments-title">` and writes out the comment file. The name of each comment file is derived from the name of the original blog post file by adding the word `-comment-` after the day.
Type | File name |
---|---|
Blog post | /blog/yyyy/mm/dd/title.html |
Comment file | /blog/yyyy/mm-dd-comment-title.md |
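The naming scheme in the table can be expressed as a single substitution. This is only a sketch of the mapping, not the code of `bloglemirecomment`; the function name `comment_file` is hypothetical:

```shell
# Map /blog/yyyy/mm/dd/title.html to /blog/yyyy/mm-dd-comment-title.md
comment_file() {
    printf '%s\n' "$1" |
        sed -E 's#/([0-9]{4})/([0-9]{2})/([0-9]{2})/([^/]+)\.html$#/\1/\2-\3-comment-\4.md#'
}

comment_file /blog/2006/01/03/are-debuggers-obselete.html
```

For the example post this prints `/blog/2006/01-03-comment-are-debuggers-obselete.md`.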
Each comment file has `index: false`, i.e., it will not show up in the index. All content is still fully searchable, though. In addition, the Perl script `blogdate` adds a link to each comment file. It is called like this:
for i in `seq 2004 2023`; do ( cd $i; ~/php/saaze-lemire/bin/blogdate -y$i *.md > index.md ) done
The number of comments per year is counted with the following Perl script:
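#!/bin/perl -W
# Count comments per year

use strict;
my ($year,%H) = (0,());

while (<>) {
    # The canonical link of each post carries its year
    $year = $1 if (/<link rel="canonical" href="https:\/\/lemire\.me\/blog\/(\d\d\d\d)\/(\d\d)\/(\d\d)\//);
    # Comment heading: "One thought on ..." or "<n> thoughts on ..."
    if (/(\w+) thought(|s) on “/) {
        my $cnt = $1;
        $cnt = 1 if ($cnt eq 'One');
        $H{$year} += $cnt;
    }
}

# Print one line per year: year, tab, number of comments
for (sort keys %H) {
    printf("%04d\t%d\n",$_,$H{$_});
}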
6. Building static site with separate comment pages. Generating all static pages for the entire blog including comments is:
saaze-lemire: time php saaze -rb /tmp/build
Building static site in /tmp/build...
execute(): filePath=/home/klm/php/saaze-lemire/content/blog.yml, nentries=2224, totalPages=112, entries_per_page=20
Finished creating 1 collections, 1 with index, and 3935 entries (0.89 secs / 66.49MB)
#collections=1, YamlParser=0.0630/3936-1, md2html=0.0895, MathParser=0.0575/3935, renderEntry=3935, content=3935/0, excerpt=0/0
real 0.91s
user 0.56s
sys 0
swapped 0
total space 0
This time can be reduced to 0.46 seconds, see Parallelizing the Output of Simplified Saaze.
Generating the Pagefind index for 4,048 files takes roughly 12 seconds:
/tmp/build: time pagefind -s . --exclude-selectors aside --exclude-selectors footer
Running Pagefind v1.0.4
Running from: "/tmp/build"
Source: ""
Output: "pagefind"
[Walking source directory]
Found 4048 files matching **/*.{html}
[Parsing files]
Did not find a data-pagefind-body element on the site.
↳ Indexing all <body> elements on the site.
[Reading languages]
Discovered 1 language: en
[Building search indexes]
Total:
Indexed 1 language
Indexed 4048 pages
Indexed 60783 words
Indexed 0 filters
Indexed 0 sorts
Finished in 11.412 seconds
real 11.59s
user 10.22s
sys 0
swapped 0
total space 0
Simplified Saaze can also generate single files, i.e., process only a single blog post; see Single file generation. This can be used to significantly reduce the generation time.
7. HTML validation. The original site lemire.me contains more than 90 warnings and errors. See W3 Nu Html Checker.
The new site contains no errors or warnings.
8. Recap. Prof. Lemire is quite hesitant to move everything to a static site:
Several commenters pointed out that I could just drop WordPress and use something else. I fear that they greatly underestimate how hard this would be. Yes, I know about things like Hugo. My relatively simple home page is built using Hugo… and it took me nearly took weeks of hacking to get it to be how I want. Porting my blog to something like Hugo would be a major disruption, might imply moving to disqus (see point above) and so forth.
Porting Prof. Lemire's blog started on 12-Dec-2023 and was "finished" on 14-Jan-2024, including porting all comments to HashOver. Of course, I did not work on this full-time.
There are still some open issues pending regarding conversion and functionality:
- Some pages have wrong formatting, e.g., there is bold printing in the converted site not present in the original.
- Left and right double quotes have been converted to HTML codes. Entering those is not very convenient. We clearly want SmartyPants.
- Five URLs were not correctly mapped as they contain special characters.
- E-mail subscriptions are absent. I doubt that there really are 12,500 active subscribers, but there are probably many who want to be notified when something new arrives. One possible approach is to use Buttondown. For example, Buttondown can send e-mails based on RSS, see the screenshot below from the "Settings" dialog in Buttondown.
Tool | Purpose | Technology |
---|---|---|
Simplified Saaze | Static site generator | PHP, C |
HashOver | Commenting system | PHP, XML/JSON/SQLite |
Pagefind | Static search | JavaScript, Rust, WebAssembly |