All in the <head>

– Ponderings & code by Drew McLellan –

– Live from The Internets since 2003 –


Ideas of March

143 days ago

In between the first and the second time I re-pledged my commitment to the medium of blogging, I posted just three times. This year, it’s four times, which represents a strong upward trend. Let’s say it represents a strong upward trend.

Last year, I wrote about the permanence of ideas, and the trend towards short-form fire-and-forget tweets serving as the only written expression of important thoughts and ideas. How 140 characters can so vastly over-distill an expression that perhaps all that is left is a bitter syrupy remnant of an otherwise complex and nuanced thought. Worse still, sometimes the distillation never occurs; the idea overflows and escapes, leaving nothing but a curious smell and a slight unease around naked flames.

This year, my thoughts are turned to something much more fundamental. Chris writes about the shutdown of Google Reader and with it, the importance of not only capturing and expressing your thoughts and ideas, but continuing to own the means by which they are published. Ever since the halcyon days of Web 2.0, we’ve been netting our butterflies and pinning them to someone else’s board. The more time that passes, the more we contribute and the more we become invested in platforms that are becoming less and less relevant to current market conditions and trends.

Will it end well? It will not.

If content is important to you, keep it close. If your content is important to others, keep it close and well backed up. Hope that what you’ve created never has to die. Make sure that if something has to die, it’s you that makes that decision. Own your own data, friends, and keep it safe.

Well, this has been weird.

- Drew McLellan


Don't Parse Markdown at Runtime

214 days ago

I’m really pleased to see the popularity of Markdown growing over the last few years. Helped, no doubt, by its adoption by major forces in the developer world like GitHub and Stack Overflow; when developers like to use something, they put it in their own projects, and so it grows. I’ve always personally preferred Textile over Markdown, but either way I’m of the opinion that a neutral, simple text-based language that can be easily transformed into any number of other formats is the most responsible way to author and store content.

We have both Textile and Markdown available in Perch in preference to HTML-based WYSIWYG editors, and it’s really positive to see other content management systems taking the same approach.

From a developer point of view, using either of these languages is pretty straightforward. The user inputs the content in e.g. Markdown, and you then store that directly in Markdown format in order to facilitate later editing. Obviously you can’t just output Markdown to the browser, so at some point that needs to be converted into HTML. The question that is sometimes debated is when this should happen.

If you’ve ever looked at the source code for a parser of this nature, it should be clear that transcoding from text to HTML is a fair amount of work. The PHP version of Markdown is about 1500 lines of mostly regular expressions and string manipulation. What other single component of your application is comparable?

I’m always of the opinion that if the outcome of a task is known then it shouldn’t be performed more than once. For a given Markdown input, we know the output will always be the same, so in my own applications I transform the text to HTML once and store it in the database alongside the original. That just seems like the smart thing to do. However, I see lots of CMSs these days (especially those purporting to be ‘lightweight’) that parse Markdown at runtime and don’t appear to suffer from it.
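
For what it’s worth, the pattern is tiny. Here’s a minimal sketch of parsing at edit time – the table and column names are invented for illustration – keeping the author’s Markdown for future edits and the HTML for output:

<?php
    require('markdown.php');

    // On save: store the editable Markdown and the pre-parsed HTML together.
    function save_content(PDO $dbh, $id, $markdown) {
        $parser = new Markdown_Parser();
        $html   = $parser->transform($markdown);
        $stmt   = $dbh->prepare(
            'UPDATE content SET markdown = ?, html = ? WHERE id = ?');
        $stmt->execute(array($markdown, $html, $id));
    }
?>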

But which is better, parsing Markdown at runtime, or parsing at edit time and retrieving? There’s only one way to find out…

FIGHT!

Ok, perhaps not a fight, but I thought it would be interesting to run some highly unscientific, finger-in-the-air benchmarks to get an idea of whether parsing Markdown really does impact page performance compared to fetching HTML from a file or database. Is it really that slow?

Using the PHP version of Markdown, I took the jQuery github README.md file as an example document. I figured it wasn’t too long or short, contained a few different features of the language, and was pretty much a typical example.

My methodology was simply to write a PHP script to perform the task being tested, and then hit it with apachebench a few times to get the number of requests per second. Under such unscientific conditions, I expected my results to be useful only for comparison – the conditions weren’t perfect, but they were consistent across tests.

In the most basic terms, measuring requests per second tells you how many visitors your site can support at once. The faster the code, the higher the number, the better.

Test 1: Runtime parsing

Below is the script I used. Pretty much no-nonsense, reading in the source Markdown file, instantiating the parser and parsing the text.

<?php
    // Read the Markdown source and transform it to HTML on every request.
    require('markdown.php');
    $text = file_get_contents('jquery.md');
    $Markdown_Parser = new Markdown_Parser();
    $html = $Markdown_Parser->transform($text);
    unset($Markdown_Parser);
?>

I blasted this with apachebench for 10,000 requests with a concurrency of 100.
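
For anyone wanting to reproduce this, the invocation was along these lines (the URL is just wherever your test script happens to be served from):

ab -n 10000 -c 100 http://localhost/test1.php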

Result: around 155 requests per second.

Test 2: Retrieving HTML from a database

I created a very simple database with one table containing one row. I pasted in the HTML result of the parsed Markdown (created using the same method as above). I then took some boilerplate PHP PDO database connection code from the PHP manual.

<?php
    // Connect and fetch the pre-parsed HTML for the single content row.
    $dbh = new PDO(
        'mysql:host=localhost;dbname=markdown-test',
        'username', 'password');
    foreach ($dbh->query('SELECT html
        FROM content WHERE id=1') as $row) {
        $text = $row['html'];
    }
    // Setting the handle to null closes the connection.
    $dbh = null;
?>
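
For completeness, the one-off setup for that table might look something like the following sketch. The table and column names are inferred from the query above, and it assumes the parsed HTML has already been written out to jquery.html.

<?php
    // Create the single-row table and insert the pre-parsed HTML once.
    $dbh = new PDO(
        'mysql:host=localhost;dbname=markdown-test',
        'username', 'password');
    $dbh->exec('CREATE TABLE content (
        id INT UNSIGNED NOT NULL PRIMARY KEY,
        html MEDIUMTEXT NOT NULL)');
    $stmt = $dbh->prepare('INSERT INTO content (id, html) VALUES (1, ?)');
    $stmt->execute(array(file_get_contents('jquery.html')));
?>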

I restarted the server, and then hit this script with the same ab settings.

Result: around 3,575 requests per second.

Test 3: Retrieving HTML from a file

For comparison, I thought it would be interesting to look at a file-based approach. For this test, I parsed the Markdown on the first request, and then reused the result on subsequent requests. A very basic form of runtime parsing and caching, if you will.

<?php
    // Serve the cached HTML if present; otherwise parse once and cache it.
    if (file_exists('jquery.html')) {
        $html = file_get_contents('jquery.html');
    } else {
        require('markdown.php');
        $text = file_get_contents('jquery.md');
        $Markdown_Parser = new Markdown_Parser();
        $html = $Markdown_Parser->transform($text);
        file_put_contents('jquery.html', $html);
        unset($Markdown_Parser);
    }
?>

In theory, this should be very fast, as it’s basically just stat()ing a file and then fetching it. I hit it with the same settings again.

Result: around 12,425 requests per second.

Conclusion

It would be improper to draw a formal conclusion from such rough tests, but I think we can get an idea of the overall work involved with each method, and the numbers tally with common sense.

Parsing Markdown is slow. Here it was around 23 times slower than fetching pre-transformed HTML from the database (roughly 155 versus 3,575 requests per second). Considering you’re likely already fetching your Markdown text from the database, you’re effectively doing the work of Test 2 and then Test 1 on top.

It would be interesting to compare the third test with caching the output to something like Redis. Depending on your traffic profile, that could be quite an effective approach if you really didn’t want to store the HTML permanently, although I’m not sure why that would be an issue. It would also be interesting to compare these rough results with some properly conducted ones, if anyone’s set up to do those and has the time.
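
To sketch what that Redis variant might look like with the phpredis extension – the key name and the one-hour TTL here are arbitrary choices:

<?php
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    $html = $redis->get('html:jquery');
    if ($html === false) {
        // Cache miss: parse once, then cache the result for an hour.
        require('markdown.php');
        $Markdown_Parser = new Markdown_Parser();
        $html = $Markdown_Parser->transform(file_get_contents('jquery.md'));
        $redis->setex('html:jquery', 3600, $html);
    }
?>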

All applications and situations are different, and therefore everyone has their own considerations and allowances to make. Operating at different scales, on different platforms, can affect your choices. Perhaps you have CPU in abundance, but are bottlenecking on I/O.

However, for the typical scenario of a basic content managed website and for any given web hosting, parsing Markdown at runtime can vastly reduce the number of visitors your site can support at once. It could make the difference between surviving a Fireballing and not. For my own work, I will continue to parse at edit time and store HTML in the database.

- Drew McLellan

24 ways and Perch 2.1

249 days ago

When I launched 24 ways in 2005, it was pretty much a last-minute project. To get the site live quickly, I just reached for the blogging system Textpattern, as I was familiar with it. Textpattern did a good job managing articles, comments, RSS feeds and so on, but for one reason or another development stagnated.

Textpattern’s flexibility enabled me to implement some custom features as plugins (like the day-by-day navigation down the side of the article pages) but basic features like comment spam detection were causing problems. Replacing a CMS isn’t usually a fun job, so I carried on with Textpattern for longer than perhaps I should have.

Fast forward to 2012, and we now have our own CMS, Perch. I’m currently working on Perch 2.1, and so it made sense to rebuild 24 ways using it and at the same time test the new features I’ve been working on.

Rebuilding the site with Perch took about a day, and migrating the content took another day on top of that.

Improving Comments

The design hasn’t changed, but we’ve changed the comments functionality a little. Comments can be a real challenge – very often they don’t add anything of value to a post. We’ve all seen cases of people rushing to post the first comment, just posting something useless, trolling or getting into pointless arguments about things only tangentially related to the topic of the post. I’d thought about removing comments from the site altogether.

We do get lots of useful comments, but they can get lost in the noise. To try and combat this, I set about creating a system which I hope will surface the good comments, and bury the less useful ones. I’ve done that by adding a simple voting system (a helpful or unhelpful vote on each comment) with the list sorted from most helpful to least.
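
The sort itself is simple. A minimal sketch – the field names are hypothetical, as the real schema isn’t shown here:

<?php
    // Most helpful first; ties fall back to oldest first.
    usort($comments, function ($a, $b) {
        $scoreA = $a['helpful'] - $a['unhelpful'];
        $scoreB = $b['helpful'] - $b['unhelpful'];
        if ($scoreA != $scoreB) {
            return $scoreB - $scoreA;
        }
        return strcmp($a['posted_at'], $b['posted_at']);
    });
?>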

This has the obvious effect of putting the most helpful comments at the top, but sorting comments by something other than time has a useful secondary effect. ‘First’ comments are no longer relevant – what appears at the top is the ‘best’ comment, not the first. The fact that the comments are not sorted by time also makes it hard to have an argument with another commenter, which helps solve another problem.

Having the site built in Perch has enabled me to immediately implement those improvements to comments, and to implement Akismet spam filtering. Using the Perch API, I built an app (most of it extracted directly from the existing Blog app) to handle comments on any sort of content. I’ll be packaging it up and making it available on grabaperch.com once 2.1 is done.

Eating my own dog food

Despite leading development on the core of Perch, I don’t spend a lot of time building sites using it. Rebuilding 24 ways alongside Perch 2.1 has proved to be incredibly useful in both finding bugs in my new code and in identifying features that are needed.

Querying across multiple pages

I’ve implemented each year as a page containing a region with 24 items. This meant that wherever I needed to display a list of articles from multiple years (such as on the author detail page), I needed to be able to filter articles across multiple pages. So one big improvement in 2.1 is that the page option in perch_content_custom() will now accept a list of pages, or even a wildcard string. You could, for instance, use a value of /products/* to display content from any product pages a client had dynamically added. That will be useful.
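
As a sketch of how that might be used – the page option comes from above, while the region and template names are invented:

<?php
    // List content from any dynamically added page under /products/.
    perch_content_custom('Products', array(
        'page'     => '/products/*',
        'template' => 'product_listing.html',
    ));
?>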

Dataselect improvements

The 24 ways Authors page is a region containing multiple ‘author’ items. An article is then associated with an author by using a dataselect to list the authors. As I needed to display the author’s first and last name as the label in the select box, the dataselect label option can now take multiple field IDs, which are concatenated to form the label.
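
For illustration only – the post doesn’t show the template tag, and the attribute names below are my guess at how the dataselect might be marked up with the new space-separated label fields:

<perch:content id="article_author" type="dataselect" page="/authors.php" region="Authors" options="first_name last_name" label="Author" />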

Increasing performance

When displaying an individual article, I needed to get the article title to use in the HTML title tag. That’s easy enough using the skip-template option, but then I also needed the templated HTML output for the rest of the page. Needing to query for the data twice seemed like the sort of thing a lesser CMS would do, so I added a return-html option for use alongside skip-template. This gives you the usual raw data output, but also returns the templated HTML so there’s no need to re-query.
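
The two option names come from the post; the exact shape of the returned array below is my assumption (data items, plus the rendered markup under an html key), and the field ID is invented:

<?php
    $result = perch_content_custom('Article', array(
        'skip-template' => true,  // return raw data rather than output
        'return-html'   => true,  // ...but include the rendered HTML too
    ));
    $title = $result[0]['article_title']; // raw field for the <title> tag
    $body  = $result['html'];             // templated HTML, no re-query
?>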

Multiple filters

One thing we knew we wanted to add to 2.1 was the ability to filter a region by multiple fields. 24 ways helped test this, as we need to do things like list all articles that are by a given author and are set to be live on the site. The multiple filters can be AND or OR filters and are really quite powerful. They enable you to do things like filter a list of properties that are houses, with two bedrooms or more, cost more than £100,000 and less than £500,000, for example.
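
As a sketch of that property example – the option syntax below is my best guess at how the multiple filters might be expressed, and the field IDs are invented:

<?php
    perch_content_custom('Properties', array(
        'template' => 'property_listing.html',
        'match'    => 'and',  // combine the filters with AND
        'filter'   => array(
            array('filter' => 'type',     'match' => 'eq',  'value' => 'house'),
            array('filter' => 'bedrooms', 'match' => 'gte', 'value' => '2'),
            array('filter' => 'price',    'match' => 'gt',  'value' => '100000'),
            array('filter' => 'price',    'match' => 'lt',  'value' => '500000'),
        ),
    ));
?>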

Images

The basic image resizing in Perch has been the same for a couple of years, so I thought it was time for some improvements. The first thing I added was image sharpening. Scaling images down tends to make them a bit softer, which is usually undesirable. A new sharpen attribute on image tags lets you set a value from zero to 10 for the amount of sharpening you want to apply. It defaults to 4, which is usually about right to correct the natural softening that occurs, and you can tweak that up or down or turn it off.
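
In a template, that might look something like this (the field ID and values are just examples):

<perch:content id="photo" type="image" width="600" sharpen="6" />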

The other big feature for images is a density template tag attribute. This is for producing sites that work well with HiDPI screens like Apple’s Retina displays. Density defaults to 1. If you set it to 2 and set a resize width of, say, 200 pixels, Perch will actually scale the image to 400 pixels, but then display it at 200 pixels. The density is doubled, and it all happens automatically. Of course, it doesn’t need to be 2; you can set the density to 1.4 or whatever value makes sense for your project.
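
So a 200-pixel image destined for a 2x display might be marked up like this (again, the field ID is an example):

<perch:content id="photo" type="image" width="200" density="2" />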

This change makes Perch ready and able to serve any of the proposed responsive image <picture> or srcset solutions.

Almost there

Perch 2.1 isn’t done yet, as there are more features and improvements we need to add. We’ll be announcing what those are once they’re ready. 24 ways is running a beta of 2.1 and the new Comments app, and it should be available soon. It’s shaping up to be a really useful release.

- Drew McLellan


About Drew McLellan


Drew McLellan (@drewm) has been hacking on the web since around 1996 following an unfortunate incident with a margarine tub. Since then he’s spread himself between both front- and back-end development projects, and now is Director and Senior Web Developer at edgeofmyseat.com in Maidenhead, UK (GEO: 51.5217, -0.7177). Prior to this, Drew was a Web Developer for Yahoo!, and before that primarily worked as a technical lead within design and branding agencies for clients such as Nissan, Goodyear Dunlop, Siemens/Bosch, Cadburys, ICI Dulux and Virgin.net. Somewhere along the way, Drew managed to get himself embroiled with Dreamweaver and was made an early Macromedia Evangelist for that product. This led to book deals, public appearances, fame, glory, and his eventual downfall.

Picking himself up again, Drew is now a strong advocate for best practices, and served as Group Lead for The Web Standards Project from 2006 to 2008. He has had articles published by A List Apart, Adobe, and O’Reilly Media’s XML.com, mostly due to mistaken identity. Drew is a proponent of the lower-case semantic web, and is currently expending energies in the direction of the microformats movement, with particular interests in making parsers an off-the-shelf commodity and developing simple UI conventions. He writes here at all in the head and, with a little help from his friends, at 24 ways.