When reading this 800-word Guardian story (about half a page of text), our web browser loads the equivalent of 55 pages of HTML code, almost half a million characters. To be precise, an article of 757 words (4,667 characters, spaces included) requires 485,527 characters of code.
Put another way, “useful” text (the human-readable article) weighs less than one percent (0.96%) of the underlying browser code. The rest consists of links (more than 600) and scripts of all types (120 references), related to trackers, advertising objects, analytics, etc.
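For readers who want to run the same kind of measurement on a page of their own, here is a minimal sketch using only Python's standard library. It is not the method used for the figures above: the `PageWeight` class and `measure` function are hypothetical names, "links" is assumed to mean `<a href>` anchors, and "readable text" is approximated as everything outside `<script>`, `<style>`, `<noscript>`, and `<template>` elements, so menus and footers count as text too.

```python
from html.parser import HTMLParser


class PageWeight(HTMLParser):
    """Tally visible text, links, and script tags in an HTML string."""

    # Content inside these elements is treated as code, not readable text.
    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_parts = []
        self.links = 0
        self.scripts = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Assumption: a "link" is an anchor that carries an href attribute.
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1
        if tag == "script":
            self.scripts += 1
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_parts.append(data)


def measure(html):
    """Return character counts and the text-to-code percentage for one page."""
    parser = PageWeight()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so indentation does not inflate the count.
    text = " ".join(" ".join(parser.text_parts).split())
    pct = round(100 * len(text) / len(html), 2) if html else 0.0
    return {
        "code_chars": len(html),
        "text_chars": len(text),
        "links": parser.links,
        "scripts": parser.scripts,
        "usefulness_pct": pct,
    }
```

Feeding it the saved source of a page, for example `measure(open("page.html", encoding="utf-8").read())`, gives a rough equivalent of the ratio discussed here. Because it counts all visible text rather than the article body alone, it will tend to overstate the useful share.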
As I wrote in this previous Monday Note, the Chartbeat analytics tracker alone requires 29,000 characters of code!
It would be useful to know how the amount of code used on their website correlates with the Guardian’s abysmal financial losses. (Sad humor, I’m a big fan of The Guardian.)
The Guardian is an extreme case when it comes to bloated HTML. In all fairness, this cataract of code loads very fast on a normal connection. The Guardian technical team was also the first to devise a solid implementation of Google’s new Accelerated Mobile Pages (AMP) format. In doing so, it eliminated more than 80% of the original code, making the page blazingly fast on a mobile device.
As an admittedly biased reference point, I took one of the first texts, the “World Wide Web Summary,” written in HTML by its inventor Tim Berners-Lee. Published in 1991, it is probably one of the purest, most barebones forms of hypertext markup language: 4,200 characters of readable text for fewer than 4,600 characters of code. That’s a usefulness rate of roughly 90%, as shown in the table below (you can also refer to my original Google Sheet here to get precise numbers, stories, URLs, and formulae).
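The two usefulness rates quoted so far are simple divisions, and a few lines of Python make the arithmetic explicit. This is only a check on the figures given in the text; the `usefulness` helper is a hypothetical name, and the Berners-Lee numbers are the rounded ones above, so the second result is approximate.

```python
def usefulness(text_chars, code_chars):
    """Readable characters as a percentage of total HTML characters."""
    return 100 * text_chars / code_chars


# Figures quoted in the article.
guardian = usefulness(4667, 485527)
berners_lee = usefulness(4200, 4600)  # rounded inputs, so approximate

print(f"Guardian: {guardian:.2f}%")        # Guardian: 0.96%
print(f"Berners-Lee: {berners_lee:.1f}%")  # Berners-Lee: 91.3%
```

Since the 1991 page uses fewer than 4,600 characters of code, 91.3% is a floor for its rate, which is consistent with the roughly 90% figure.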
This selection is arbitrary but nonetheless interesting to look at. Aside from the original Berners-Lee text, it includes, on line two, a Washington Post article coded in the experimental Progressive Web App format (more on this in a moment), a classic HTML Politico story, a piece from the official AMP Blog (hopefully coded in AMP), a past Monday Note column published on Medium (which tweaked the W3C standards), a NYT piece, the Guardian piece coded in AMP, a short piece from the MailOnline embedded in a 40-scroll webpage (remarkably optimized), a WaPo story, and the original Guardian one.
- If we set aside the two extreme cases, the “pure” Berners-Lee page and the heavy Guardian one, the ratio of readable text to underlying HTML ranges from 4.55% to 9%. The least optimized is the Washington Post, which doesn’t mind a large HTML file for desktop reading because it offers alternative formats. The lightest (relatively speaking) is Politico, which maintains a simple page structure.
- The big surprise (at least for me) comes from the Progressive Web App implemented by the Washington Post. The plain HTML page offers roughly the same content as the PWA version, but at a far larger HTML size.
Google is just starting to promote the PWA on a large scale, and the tools are already available. While it has already been implemented by the giant Indian retailer Flipkart, the Post is the first news publisher to experiment with it (the pages are still a little buggy and don’t support ads yet). Because it supports push notifications and other features until now reserved for native apps, PWA has great potential for publishers, as long as it doesn’t end up in Google’s graveyard of innovations lost in some murky internal quagmire…
Regardless of promising innovations, the war to reduce HTML’s bloat remains to be won. Many forces—advertising technologies, user profiling, endless analytics, trackers—might conspire to eat all the benefits promised by the proposed new standard.
This post originally appeared at Monday Note.