AI chatbot falls just short on accounting exam
An accountant ran the much-hyped AI tool ChatGPT through a sample first-stage ACA paper, and while the chatbot came up short, the fact it narrowly missed the pass mark should make the profession sit up and take note.
For those of us who’ve struggled to extract even the most basic information from chatbot gatekeepers stationed on the websites of our banks, delivery companies or utility providers, the thought that a distant cousin of theirs could nearly pass an Institute of Chartered Accountants in England and Wales (ICAEW) assurance exam seems a little implausible.
However, that’s just what happened when chartered accountant Stuart Cobbe decided to put AI-powered chatbot ChatGPT to the test.
“I’d heard the noise around ChatGPT and wanted to see where the cutting edge was,” Cobbe told AccountingWEB. “The established way of evaluating language models like this is to set a test, and the sample papers on the ICAEW website seemed as good as any.”
He spent just under an hour entering questions from a sample exam into the chatbot, then transferring its answers back into the online test. Overall, ChatGPT scored 42% – under the pass mark of 55% set by the accounting institute, but a reasonable attempt nonetheless.
A consultant in the use of artificial intelligence (AI) in accounting and audit, Cobbe previously worked as global head of analytics and industry insights at AI leader MindBridge, co-founded a cloud financial statements platform and worked in data and analytics at top-20 firm Moore Kingston Smith.
What is ChatGPT?
ChatGPT is part of a subset of artificial intelligence tools known as “generative AI”, which are able to generate responses based on prompts. In ChatGPT’s case, the prompts can be text or code, and it is able to both converse and answer questions posed by users. Like other “large language models”, ChatGPT is trained on a large collection of text collected from both the internet and, it is believed, from books.
The latest version of the chatbot (dubbed version 3.5), released in November 2022, caught the public imagination thanks to its ability to generate jokes, essays and even computer code from a short writing prompt. It surpassed a million users in just five days and flooded social media feeds with potential use cases, including accountants spotting how they could benefit from the tool – with examples ranging from client correspondence to Excel gruntwork, with one firm even “recruiting” it as a full-time member of staff.
The exam Cobbe selected for ChatGPT’s test was a sample assurance paper on ICAEW’s website. “The assurance paper worked the best because most of the questions were text,” said Cobbe. “Its brief attempt on the accounting paper led to some highly questionable maths.”
While the exam is multiple choice, the test-taker is often required to tick multiple boxes to demonstrate a depth of understanding. By and large, ChatGPT seemed to understand what it needed to do, successfully selecting responses out of the given options, although it also accompanied each answer with a long-winded explanation.
When it came to understanding the subject matter, Cobbe was also impressed. “In the question on rights and obligations [screengrabbed below], it clearly understood what the question meant by rights and obligations over an asset and identified how you’d gain assurance – and gave an explanation to back it up,” he said.
However, the chatbot clearly struggled with questions that required a more nuanced understanding, such as in the second rights and obligations question below. The chatbot incorrectly selected answers A and C, when the correct answers were B and D.
“For A in particular, which tests understanding of rights and obligations, it doesn’t quite seem to understand what the question means when it talks about ‘inspecting’ the asset,” said Cobbe. “It does not seem aware that we as auditors have a narrow definition of ‘inspection’ and the assurance that inspection of an asset provides.”
Cobbe flagged another example of the chatbot’s lack of accounting nuance in the question below, which asked about the specific accounting treatment for a type of asset.
“This is the type of question a technical team within a firm might receive. It broadly describes the accounting treatment correctly, but there are specific parts included in its response that are not really relevant and can be confusing to the reader. The paragraph references are also totally wrong. When asked for the accounting treatment of more complex areas, it clearly struggles and often returns the IFRS treatment when specifically asked for the FRS102 treatment.”
Like having a fresh-faced junior who’s convinced they’re always right
While the machines may not be ready to rise up and take accountants’ jobs just yet, the exercise was an informative one for Cobbe, with the chatbot delivering the performance he expected from the media hype.
“I’m guessing we’ll see a large language model (LLM) capable of passing the first-stage assurance paper without any specific fine-tuning pretty soon,” he said. “The accounting papers are more difficult. I’m guessing it’s the lack of detailed training on technical accounting content, but also the combination of text, mathematics and layout present in financial statements makes them hard to feed into LLMs as they are currently designed and given the data they are currently being trained on.”
While there is limited information on the exact nature of the billions of words in study materials fed into the chatbot, in Cobbe’s view it’s obvious that accounting content has formed part of its reading list.
However, according to Cobbe, ChatGPT has a way to go before it adds an ACA to its email signature. “On this assurance topic, I’d compare ChatGPT to a very recent joiner at an accounting firm – someone in the first few weeks of their contract,” he said. “Unlike a new joiner, however, ChatGPT gives answers with an air of confidence even when it’s completely wrong. It’s not afraid to give a garbage answer and back it up with garbage. It’s like having a fresh-faced junior who’s always convinced they’re right, so users need to approach it with a degree of caution.”
Cobbe suggested that very soon, AI tools such as ChatGPT could play an excellent support role for accountants in a range of different ways – from reviewing correspondence to providing training materials for clients and staff. “My original post on LinkedIn was about ChatGPT failing, but really it was about how well it had done.”