The PDF versus HTML argument

Wendy Bradley investigates why the Government Digital Service (GDS) has suggested that gov.uk content should be published in HTML and not as PDF.

Why?

"People read differently on the web," says a gov.uk blog post in July entitled "why gov.uk content should be published in HTML and not PDF". The words link to gov.uk's guidance on how to write for the site, where the same words recur, but this time linking to a 1997 study into how people used websites.

Really, GDS? I'm sorry to be rude, but do you really want to cite a study from the last millennium as an authoritative source on how people use the internet today?

Back in 1997, I was using a dial-up connection to CompuServe and GeoCities to create my first website, and PDFs were the complicated and difficult things people sent you if they didn't trust this newfangled technology.

Clarification

The ensuing storm on Twitter and AccountingWEB’s Any Answers led to a "clarification" by GDS that PDFs were not to be banned outright:

“To clarify, we are not suggesting there is no place for PDFs on gov.uk. There are some cases where a PDF might be required to meet the needs of the user. For example, when there’s a need for a static document to show what was said at a particular point in time.”

Where is the problem?

As a regular reader of HMRC's consultation documents, I’ve noticed that they often state that the document is "not compatible with assistive technology". It seems this lapse from best practice will be regulated out of existence later in the year, when new accessibility standards become mandatory.

As far as I understand it, PDF documents that are not properly prepared can be perceived by screen readers and other assistive technology as single images rather than text. If you are visually impaired and rely on software that reads out web pages to you, the whole document effectively becomes invisible.

Potential solutions

There are different kinds of PDF documents: for assistive tech, a PDF needs to be readable as text (not as a single image) and “tagged” to show reading order if it has columns, tables etc. There is information readily available on the web on how to do this, although it is not aimed at the kind of writer who is generally tasked with producing civil service documentation.

Perhaps the impetus for the blog post was the relative ease of mandating HTML webpages compared to the relative difficulty of training a cadre of less-than-tech-savvy civil servants in how to produce accessible PDFs.

Why the controversy?

If HTML pages are better for users with disabilities, would it not be better to move towards HTML? This is where we come back to the out-of-date research suggesting that people do not read web pages but scan them for keywords, and the resulting design of short pages with bullet-pointed lists.

This is a misunderstanding of why and how people use web pages today, particularly authoritative websites like gov.uk, which has replaced HMSO as the repository of draft legislation, regulations, guidance and other reference material.

Process of seeking information

The gov.uk guidance imagines that an individual’s process of finding and absorbing information on the web should follow these steps:

  1. I have a question
  2. I can find the page with the answer easily – I can see it’s the right page from the search results listing
  3. I have understood the information
  4. I have my answer…

This process might be helpful if you have enough background knowledge to formulate a specific question, but what if you don't know what you don't know?

Example

Imagine an individual starting a new business. I spoke to a group of hopeful writers at a science fiction convention earlier this year who were eager to know what they needed to do once they sold their first story or novel.

I was able to tell them about the £1,000 trading income exemption, and that they need not worry if they were just starting out and making a sale on top of their wages from their day job. I could signpost them to the relevant gov.uk page only because I knew the allowance existed and that the page must be there somewhere.

What I did not tell them about was my frustration the night before my presentation, trying to find this page using search terms like "starting in business", "casual earnings" and "first sale" (the signposting is, admittedly, better now). If you don't know enough to formulate the question, it is impossible to find the correct answer.

Experts and others

There are several kinds of users of the gov.uk website. Some will be members of the public wanting a simple answer to a simple question, eg: “How do I apply for a passport?”

Many people will be "expert" users, who know the background and context of a simple question but need to have an authoritative statement of the position at the time they give advice on a question.

Audit trail

Expert users may need to screen capture dozens of screens of HTML to provide an audit trail of where they did their research. It’s just not practical to take dozens of URLs and go back to check through the change statement for each one to find any updates that have changed the information since you gave the advice.

This is a misunderstanding of the nature of an audit trail, and the suggestion that The National Archives' "scraping" of government websites gives a full audit trail is, frankly, laughable. Have you ever tried searching the archive for a document you know was there because you used to give advice based on it? I have, and I still couldn't find it.

Read offline

A PDF is a useful tool for the user to read the material offline or to add to a client’s file (electronic or paper). A PDF doesn’t necessarily need to be printed out, but can also be stored on a laptop or tablet for reading on the commute, when a lack of wifi can prevent jumping from HTML page to page.

Conclusions

From the point of view of the expert user, PDF vs HTML is not solely a question of convenience vs accessibility. For the GDS, I presume it is not solely a question of compliance with accessibility legislation.

PDF and HTML both have a place on the gov.uk website, but the GDS really needs to conduct some more relevant research into how its customers actually use the site. Perhaps it should undertake a consultation on how GDS will meet the new accessibility standards while also meeting the needs of all of its users.

About Wendy Bradley

Wendy Bradley is a retired tax inspector, now working as a freelance journalist.

Replies

14th Aug 2018 09:21

I use both and would be sad to see PDFs go. They are useful for downloading particularly long, detailed documents. Sometimes, as allowed under Government copyright law, they are an annex to conference notes, and then a PDF is much easier to email to conference providers.

I also fear we are losing great and needed detail with some Government websites.

15th Aug 2018 09:43

This is a very personal view, but I cannot absorb information from reading on a screen; I have never taken to book readers, for example.
When needing to review government documents I will search for relevant material but then I nearly always print off the relevant section for further reading. I will also save key documents to client folders or my own CPD folders.
PDFs are the only easy way to do this. I am already frustrated by many VAT notices, for example, no longer being in PDF form.
Printing from HTML is random in my experience and produces only a mess.

15th Aug 2018 10:17

The article is great and hits the nail on the head. There is no one standard user of government information. I regularly store PDFs, print out the important passages that solve a client enquiry and annotate them to support the advice I give. If the document changes, I then have the original on file to justify the advice I gave. Consumers are very different from professional users.

15th Aug 2018 10:40

Oh for goodness sake. This is the second accountingweb article I've seen on this, and again based on a misunderstanding - and misrepresentation - of the GDS advice.

The advice is about content *production*, not content *delivery*.

Accessible PDFs are not just about being "single images rather than text". There are a whole host of issues that make PDFs difficult to use - for disabled and non-disabled users - in various circumstances.

Even simple things like internal cross-references and tables of contents are a problem. Very often I find myself in a long (sometimes government-produced) PDF, scrolling through pages to find "see section 2" or "page 19". That, of course, isn't actually page 19 of the PDF, because the pages are numbered for print and the numbering doesn't include the cover plus an arbitrary number of introductory front pages, which vary from document to document. In some of the worst examples the numbering even restarts for each section of the PDF, so the only way to get to Section 5, Page 15 is to physically scroll till your hand gets sore.

And scrolling is a nightmare: trying to follow the text as it jumps from the bottom of one column to the top of the next on a screen that isn't an A4 piece of paper, moving from side to side and trying not to get lost. It gets worse when the document was designed as a leaflet or booklet, with content beautifully laid out across the full width of each spread, but now arbitrarily split at the edges of bits of paper most people will never have seen, with each panel stacked one below the other.

Or you google a term and get a promising excerpt. So you click through, only to find it's somewhere in the unending pages of the PDF that's just opened, and you have to try to remember the exact wording of the search result you saw so you can use the PDF viewer's much-less-intelligent "find in document" tool, because PDFs always open on page 1 and there's no way of deep-linking even to the page number of the result, much less the paragraph.

So yes, producing properly accessible, user-friendly PDFs is very hard work, and generally requires specialist software and a fair bit of training, meaning most of the time it doesn't happen. And even when it does, they're fundamentally broken for some use cases because, like the research you mention, they're a product of a different age, created only as a way to get accurate artwork from a designer to a printing press.

If you *produce* content as a PDF (even an accessible one, certainly one beautifully designed for printing out), that information is locked away. It is virtually impossible for anyone to reliably convert it to another format, HTML or anything else, that might better suit their use case. And so either you only publish the PDF, or someone attempts to manually publish HTML excerpts, which inevitably end up getting out of step with the actual text of the detailed guidance/handbook/whatever.

But, great news! If you *produce* content as HTML, that can easily, reliably, automatically, and rapidly be converted into an accessible PDF complete with hyperlinks, tagged text and all the rest.

That could be a PDF of a single page (which by the way most browsers can produce, so there's no need to be "capturing dozens of screens of HTML") or a single PDF containing all the pages from a whole section of the site.

There's plenty of software that e.g. HMRC could use to generate an accessible PDF version of sections of their guidance for people to download - even before GDS adds such a tool directly into gov.uk, which their blog specifically states is part of their future plan.

Indeed, if content is *produced* as HTML, it can be reliably and automatically converted into virtually any format - and that can be done by the publisher, the end user or by third parties.

*Producing* content as HTML allows end-users to *consume* it however best suits them, which is why GDS are encouraging content producers to do just that unless there's a specific reason that particular content actually needs to be designed and planned for print-first.

to andyscotland
15th Aug 2018 11:56

Interesting: do you have links to any sites where that’s actually happening? Sounds good to me - but then so did the “paperless office”...

15th Aug 2018 11:00

Really? Here's a good example of how laughable this is and how ignorant GDS is when it comes to the psychology of giving advice. Gave a client a print out from webpage and a printed document from same webpage. Print out ignored. Document read. I have even given an "idiot's guide in ultra simplistic language" versus a complex .gov document, and the document is preferred.

Good grief. Heaven help us if this is the result of the thought processes in our civil service. (Let alone Wendy's very comments!)

to reconynge
15th Aug 2018 17:51

What is the difference between "a print out from webpage" and "a printed document from same webpage"? Do you mean that one is HTML and the other PDF, or do you mean that one was shown on screen and one printed onto paper, or do you mean something else? If we are not specific in our terminology, how can we persuade HMRC to use the options we prefer?

15th Aug 2018 11:04

I've had several occasions where I have had to challenge HMRC's changes in policy. HTML versions mean that I can't easily access the previous versions. It also makes it difficult to print just parts of guidance if you need to send it to a client (or that may be my lack of IT knowledge!)

15th Aug 2018 12:33

As a retired user, but still using HMRC sites for my trusteeship responsibilities and my own affairs, I'm not sure how useful my comment will be, but here goes: HMRC used to have their own website. It was possible to reach an item in the published manuals, or statutes, by searching for it. As Wendy says, you have to know what you are looking for. But assuming you do, search for an item now and you are taken through everything the Government does: probate, planning permission etc. This is not without its frustration factor. If they are hacking it about anyway, can this be addressed?
With most aspects of tax, it's often quicker just to google it.

to Ken of Chester le Street
15th Aug 2018 22:20

Yes, it was a Coalition decision to stop individual department websites, and it has been awful for lawyers too, because instead of going to just that part of Government, eg competition law or public procurement, you have an almost unsearchable, huge morass of stuff. It was a really bad move.

to Ken of Chester le Street
16th Aug 2018 10:13

I hardly think this is relevant to the topic but, nonetheless, I have to agree about the "quicker to Google it" point. Mind you the search facility on the old HMRC web site was pretty hopeless too. It never came up with what I wanted.

15th Aug 2018 13:03

We send most of our documents to clients as PDFs, so that we know exactly what we have sent, and they can't change them!

Surely, if you wanted to keep an HTML web page you could either print to PDF or save to PDF?

15th Aug 2018 14:21

Agree with Wendy: reading long, complicated stuff is more easily undertaken via a PDF offline, which also has the benefit that you are not paying for the access (ONS stats indicate that the majority access the internet on a smartphone or tablet). Also, printing a web page is hit and miss as it depends on your printer setup, usually resulting in lots of dead space and the risk of losing words at the edges. The simplest solution is to provide a PDF download as an option.

15th Aug 2018 15:28

I really don't understand what the article is proposing...

Almost every website is displayed in HTML. It is the standard method of navigating and displaying information on the internet. PDFs have their purpose for storing reports, long data, etc which you need to transfer between parties, but they are not accessible or flexible. If you open a PDF on a mobile it can often be difficult to read the text, plus the file size is huge compared to HTML.

I'm assuming the writer isn't suggesting the Government switch to PDFs just to ensure the file doesn't change? Consider that they can replace PDFs very easily.

Need to save information or vital text for offline use? Print to PDF. Otherwise, it has no place for displaying information on the web.

We honestly can't start talking to GDS as if we are web development experts, lest we risk them pretending to be tax experts (the content is produced by HMRC, before anyone suggests they already think they are).

to Harrison88
17th Aug 2018 16:46

“I really don't understand what the article is proposing...” I was suggesting the gov.uk developers consult with their actual users and producers, and quit relying on/linking to twenty-year-old research.

15th Aug 2018 20:10

Maybe they are talking about those awful PDF forms they make you fill in that never work? If so they actually read one of my feedback rants!

14th Sep 2018 10:50

So it's PDF versus HTML? What about all of the others: XPS, DIF, the list goes on.
