The PDF versus HTML argument
Wendy Bradley investigates why the Government Digital Service (GDS) has suggested that gov.uk content should be published in HTML and not as PDF.
"People read differently on the web," says a gov.uk blog post in July entitled "why gov.uk content should be published in HTML and not PDF". The words link to this document advising how to write for the site, where the same words recur but this time linking to a 1997 study into how people used websites.
Really, GDS? I'm sorry to be rude, but do you really want to cite a study from the last millennium as an authoritative source on how people use the internet today?
Back in 1997, I was using a dial-up connection to Compuserve and Geocities to create my first website, and PDFs were the complicated and difficult things people sent you if they didn't trust this newfangled technology.
The ensuing storm on Twitter and AccountingWEB’s Any Answers led to a "clarification" by GDS that PDFs were not to be banned outright:
“To clarify, we are not suggesting there is no place for PDFs on gov.uk. There are some cases where a PDF might be required to meet the needs of the user. For example, when there’s a need for a static document to show what was said at a particular point in time.”
Where is the problem?
As a regular reader of HMRC's consultation documents, I’ve noticed that they often say that the condoc is "not compatible with assistive technology", a lapse from best practice that, it seems, will be regulated out of existence later in the year when new accessibility standards become mandatory.
As far as I understand it, PDF documents that are not properly prepared can be perceived by screen readers and other assistive technology as single images rather than text. If you are visually impaired and rely on software that reads out web pages to you, the whole page becomes invisible.
There are different kinds of PDF documents: for assistive tech, a PDF needs to be readable as text (not as a single image) and “tagged” to show reading order if it has columns, tables etc. There is information readily available on the web on how to do this, although it is not aimed at the kind of writer who is generally tasked with producing civil service documentation.
Perhaps the impetus for the blog post was the relative ease of mandating HTML webpages compared to the relative difficulty of training a cadre of less-than-tech-savvy civil servants in how to produce accessible PDFs.
Why the controversy?
If HTML pages are better for users with disabilities, would it not be better to move towards HTML? This is where we come back to the out-of-date research that suggests people do not read web pages but scan them for keywords, and the resulting design of short pages with bullet-pointed lists.
This is a misunderstanding of why and how people use web pages today, particularly authoritative websites like gov.uk, which has replaced HMSO as the repository of draft legislation, regulations, guidance and other reference material.
Process of seeking information
The gov.uk guidance imagines that an individual’s process of finding and absorbing information on the web should follow these steps:
- I have a question
- I can find the page with the answer easily – I can see it’s the right page from the search results listing
- I have understood the information
- I have my answer…
This process might be helpful if you have enough background knowledge to formulate a specific question, but what if you don't know what you don't know?
Imagine an individual starting a new business. I spoke to a group of hopeful writers at a science fiction convention earlier this year who were eager to know what they needed to do once they sold their first story or novel.
I was able to tell them about the £1,000 trading income exemption and not to worry if they were just starting out and making a sale on top of their wages from their day job. I was able to signpost them to this web page – because I knew the allowance existed and the page must be there somewhere.
What I did not tell them was my frustration the night before my presentation, trying to find this page using search terms like "starting in business", "casual earnings", "first sale" (the signposting is, admittedly, better now). If you don't know enough to formulate the question, it is impossible to find the correct answer.
Experts and others
There are several kinds of users of the gov.uk website. Some will be members of the public wanting a simple answer to a simple question, eg: “How do I apply for a passport?”
Many people will be "expert" users, who know the background and context of a simple question but need to have an authoritative statement of the position at the time they give advice on a question.
Expert users may need to screen capture dozens of screens of HTML to provide an audit trail of where they did their research. It’s just not practical to take dozens of URLs and go back to check through the change statement for each one to find any updates that have changed the information since you gave the advice.
This is a misunderstanding of the nature of an audit trail, and the suggestion that the National Archives "scraping" of government websites gives a full audit trail is, frankly, laughable. Have you ever tried searching the archive for a document you know was there because you used to give advice based on it? I have, and I still couldn't find it.
A PDF is a useful tool for the user to read the material offline or to add to a client’s file (electronic or paper). A PDF doesn’t necessarily need to be printed out, but can also be stored on a laptop or tablet for reading on the commute, when a lack of wifi can prevent jumping from HTML page to page.
From the point of view of the expert user, PDF vs HTML is not solely a question of convenience vs accessibility. For the GDS, I presume it is not solely a question of compliance with accessibility legislation.
PDF and HTML both have a place on the gov.uk website, but the GDS really needs to conduct some more relevant research into how its customers actually use the site. Perhaps it should undertake a consultation on how GDS will meet the new accessibility standards and meet the needs of all of its users.