
AI chatbot falls just short on accounting exam

An accountant ran the much-hyped AI tool ChatGPT through a sample first-stage ACA paper, and while the chatbot came up short, the fact it narrowly missed the pass mark should make the profession sit up and take note.

10th Jan 2023

For those of us who’ve struggled to extract even the most basic information from chatbot gatekeepers stationed on the websites of our banks, delivery companies or utility providers, the thought that a distant cousin of theirs could nearly pass an Institute of Chartered Accountants in England and Wales (ICAEW) assurance exam seems a little implausible.

However, that’s just what happened when chartered accountant Stuart Cobbe decided to put AI-powered chatbot ChatGPT to the test. 

“I’d heard the noise around ChatGPT and wanted to see where the cutting edge was,” Cobbe told AccountingWEB. “The established way of evaluating language models like this is to set a test, and the sample papers on the ICAEW website seemed as good as any.”

He spent just under an hour entering questions from a sample exam into the chatbot, then transferring its answers back into the online test. Overall, ChatGPT scored 42% – under the pass mark of 55% set by the accounting institute, but a reasonable attempt nonetheless.

[Screenshot: ChatGPT pass grade]

A consultant in the use of artificial intelligence (AI) in accounting and audit, Cobbe previously worked as global head of analytics and industry insights at AI leader MindBridge, co-founded a cloud financial statements platform and worked in data and analytics at top-20 firm Moore Kingston Smith.

 

What is ChatGPT?

ChatGPT is part of a subset of artificial intelligence tools known as “generative AI”, which are able to generate responses based on prompts. In ChatGPT’s case, the prompts can be text or code, and it is able to both converse and answer questions posed by users. Like other “large language models”, ChatGPT is trained on a large body of text gathered from the internet and, it is believed, from books.

The latest version of the chatbot (dubbed version 3.5), released in November 2022, caught the public imagination thanks to its ability to generate jokes, essays and even computer code from a short writing prompt. It surpassed a million users in just five days, and social media feeds were soon flooded with potential use cases. Accountants were among those spotting how they could benefit from the tool, with examples ranging from client correspondence to Excel gruntwork, and one firm even “recruiting” it as a full-time member of staff.

The experiment

The exam Cobbe selected for ChatGPT’s test was a sample assurance paper on ICAEW’s website. “The assurance paper worked the best because most of the questions were text,” said Cobbe. “Its brief attempt on the accounting paper led to some highly questionable maths.”

While the exam is multiple choice, the test-taker is often required to tick multiple boxes to demonstrate a depth of understanding. By and large, ChatGPT seemed to understand what it needed to do, successfully selecting responses out of the given options, although it also accompanied each answer with a long-winded explanation.

When it came to understanding the subject matter, Cobbe was also impressed. “In the question on rights and obligations [screengrabbed below], it clearly understood what the question meant by rights and obligations over an asset and identified how you’d gain assurance – and gave an explanation to back it up,” he said.

[Screenshot: ChatGPT correct answer]

However, the chatbot clearly struggled with questions that required a more nuanced understanding, such as in the second rights and obligations question below. The chatbot incorrectly selected answers A and C, when the correct answers were B and D. 

[Screenshot: ChatGPT lack of nuance]

“For A in particular, which tests understanding of rights and obligations, it doesn’t quite seem to understand what the question means when it talks about ‘inspecting’ the asset,” said Cobbe. “It does not seem aware that we as auditors have a narrow definition of ‘inspection’ and the assurance that inspection of an asset provides.”

Cobbe flagged another example of the chatbot’s lack of accounting nuance in the question below, which asked about the specific accounting treatment for a type of asset.

[Screenshot: ChatGPT incorrect answer 2]

“This is the type of question a technical team within a firm might receive. It broadly describes the accounting treatment correctly, but there are specific parts included in its response that are not really relevant and can be confusing to the reader. The paragraph references are also totally wrong. When asked for the accounting treatment of more complex areas, it clearly struggles and often returns the IFRS treatment when specifically asked for the FRS 102 treatment.”

Like having a fresh-faced junior who’s convinced they’re always right

While the machines may not be ready to rise up and take accountants’ jobs just yet, the exercise was an informative one for Cobbe, with the chatbot delivering the performance he expected from the media hype.

“I’m guessing we’ll see a large language model (LLM) capable of passing the first-stage assurance paper without any specific fine-tuning pretty soon,” he said. “The accounting papers are more difficult. I’m guessing it’s the lack of detailed training on technical accounting content, but also the combination of text, mathematics and layout present in financial statements makes them hard to feed into LLMs as they are currently designed and given the data they are currently being trained on.”

While there is limited information on the exact nature of the billions of words in study materials fed into the chatbot, in Cobbe’s view it’s obvious that accounting content has formed part of its reading list.

However, according to Cobbe, ChatGPT has a way to go before it adds an ACA to its email signature. “On this assurance topic, I’d compare ChatGPT to a very recent joiner at an accounting firm – someone in the first few weeks of their contract,” he said. “Unlike a new joiner, however, ChatGPT gives answers with an air of confidence even when it’s completely wrong. It’s not afraid to give a garbage answer and back it up with garbage. It’s like having a fresh-faced junior who’s always convinced they’re right, so users need to approach it with a degree of caution.”

Cobbe suggested that, very soon, AI tools such as ChatGPT could play an excellent support role for accountants in a range of different ways – from reviewing correspondence to providing training materials for clients and staff. “My original post on LinkedIn was about ChatGPT failing, but really it was about how well it had done.”

Replies (30)

By Hugo Fair
11th Jan 2023 01:01

“.. however, ChatGPT gives answers with an air of confidence even when it’s completely wrong.
It’s not afraid to give a garbage answer and back it up with garbage. It’s like having a fresh-faced junior who’s always convinced they’re right, so users need to approach it with a degree of caution.”

That is the only valid takeaway here ... the rest is misunderstood interpretation by those who should know better (I'm referring to Stuart Cobbe, not you Tom)!

ChatGPT does 3 things well - it can: parse a question to work out its search parameters, perform those searches remarkably quickly, and extract the results as grammatically correct text.

But it has no 'understanding' of the original question or indeed of the answer it creates (hence the quote I've repeated at the start of my reply) - and so, like a 5 year-old, will quickly fall back on self-justification (including famously making up false citations) that takes the answer ever further from the truth initially sought.

So my conclusions from the test/results in this experiment are somewhat different to those published by Cobbe.
It's not that ChatGPT has almost caught up with trained accountants ... more that the wrong type of questions are being asked in the exam. They are designed for ease of marking (which means they are geared towards the strengths of ChatGPT - as set out above - rather than testing understanding of the topic that should be expected of a trained accountant).

Thanks (9)
By ireallyshouldknowthisbut
11th Jan 2023 09:05

I like the summary as a puffed up junior who is always right.

I had a play with it last week, and it has a dangerous veneer of confidence, with little behind it.

It was like talking to a 'nice but dim' public school educated person with bags of confidence who speaks very nicely, and writes the perfect sentence, but has no ability, or a politician trading on bravado over substance.

Good tax people know what they don't know. Chat Bot blusters through.

Still I imagine ChatBot will do better than many of the posters in any answers, it has at least read a bit about tax.

Thanks (1)
Replying to ireallyshouldknowthisbut:
By SteveHa
11th Jan 2023 10:07

ireallyshouldknowthisbut wrote:

Still I imagine ChatBot will do better than many of the posters in any answers, it has at least read a bit about tax.

And it read the questions.

Thanks (2)
By Postingcomments
11th Jan 2023 11:12

I hate to think what ChatBot would return if fed a diet of Any Answers. GIGO/CICO principle etc.

I'm tempted to feed it all of Justin's contributions then ask it to analyse a Tribunal decision or, indeed anything.............the judge is an idiot, the law is wrong, here is a deliberate misreading of some leg or a paragraph in the decision...........something about Rangers...........

The Bot wouldn't so much be the keen confident junior, but would have the arrogant confidence of someone who thinks they are always right.

Thanks (2)
Replying to Postingcomments:
By jonbryce
16th Jan 2023 14:30

It probably was fed a diet of such material.

Also, would it understand that answers I gave maybe 15 years ago may have been correct at the time, but almost certainly aren't correct now?

Thanks (0)
By Nichola Ross Martin
11th Jan 2023 11:54

The risk to professional firms is that staff will use ChatGPT to write advice for clients etc., and as the bot is so convincing in its responses, staff may easily fail to check that what splurges out is correct. Love your comments on 'fresh-faced juniors'. Firms should have a policy for staff about its use (it also wastes hours of anyone's time). It's noted that Stack Overflow (a leading developers' forum) has banned its answers from its forum, although it's quite clear that ChatGPT has been 'trained' by the answers from that very forum. There's no reason why you can't train it from, say, this one too. If you have ever used a boilerplate clause in a contract, it's likely to have come from an old equivalent to this bot (there is various fast drafting software out there or, failing that, template contracts and clauses), so in that respect this can be yet another useful tool when you are drafting contracts etc. It would be most useful for, say, your firm's LOEs, as it can replace the need for software.
I have fed it several coding problems as well as various tax queries and its answers were not good. It's not yet been trained on tax. It's only a matter of time though.

Thanks (1)
Replying to Nichola Ross Martin:
By NotAnAccountant2
11th Jan 2023 13:04

Nichola Ross Martin wrote:

The risk to professional firms is that staff will use ChatGPT to write advice for clients etc., and as the bot is so convincing in its responses, staff may easily fail to check that what splurges out is correct.

You might get the opposite problem too: HMRC staff relying on ChatGPT "advice" when doing investigations.

This might become a huge problem for professionals in any job, medicine, computing, accountancy, civil engineering... The layman being able to very quickly produce very convincing arguments and the professional having to spend large amounts of time rebutting them.

Quote:

when was --force-with-lease added to git push?

The --force-with-lease option for the git push command was added in git version 2.18.0, released on 2018-03-28.

But:
"--force-with-lease" was added in git 1.8.5 [https://github.com/git/git/blob/master/Documentation/RelNotes/1.8.5.txt], in 2013.

ChatGPT can't even get it right with a bit of help...

Quote:

Do you mean these release notes for version 1.8.5?
https://github.com/git/git/blob/master/Documentation/RelNotes/1.8.5.txt

I apologize for any confusion. I see that the release notes that you provided are for version 1.8.5 and not 2.18.0 and there is no such option called --force-with-lease on git push command in that version, my previous statement on it is incorrect. I apologize for any confusion or misinformation that this may have caused. It's important to always refer to the official documentation and consult experts in the field for the most accurate and up-to-date information.

Of course --force-with-lease is available in 2.18 - and in every version of git since 1.8.5.

Thanks (3)
Replying to NotAnAccountant2:
By NotAnAccountant2
11th Jan 2023 16:07

Oh, and here's an equivalent sort of question on tax where I did know the answer before I started:

Quote:

>when did the 3% SDLT surcharge apply to transfers between spouses?

The 3% stamp duty land tax (SDLT) surcharge on transfers of residential property between spouses or civil partners came into effect in England and Northern Ireland on July 1, 2020.

>and when was it repealed?
The 3% SDLT surcharge on transfers of residential property between spouses or civil partners was repealed on the budget of March 3rd, 2020, which means it never came into force. The measure was scrapped by Chancellor Rishi Sunak.

>what about the changes in the autumn budget in November 2017?

In the Autumn Budget 2017, it was announced that from 22 November 2017, an additional 3% stamp duty land tax (SDLT) surcharge will be applied to purchases of additional residential properties, such as second homes or buy-to-let properties, in England, Wales and Northern Ireland. The surcharge does not apply to transfers of properties between spouses or civil partners. The changes were effective immediately, but it is not repealed.

As Pauli might have said "that's not even wrong"

I also asked it a question about the natural production of C-14 and it gave a "not quite right" answer talking about thermal neutron capture by C-13 (Without more research I'm not sure whether that's plausible in the atmosphere rather than in a reactor) while failing to mention the main N-14(n,p)C-14 pathway.

Thanks (1)
By SteveHa
11th Jan 2023 12:56

I had ChatGPT re-write Baa Baa Black Sheep, Iron Maiden style. I think it did a good job:

"Verse 1:
With fleece as black as the night
Baa baa black sheep, won't you come into sight?
With a voice that's strong and true
Baa baa black sheep, singing out to you

Chorus:
Baa baa black sheep, have you any wool?
Yes sir, yes sir, for the metals cool
Baa baa black sheep, have you any more?
Yes sir, yes sir, Metal's what we adore

Verse 2:
In the fields we'll roam and graze
Baa baa black sheep, with metal in our gaze
With thundering drums and screaming guitar
Baa baa black sheep, a metal superstar

Chorus:
Baa baa black sheep, have you any wool?
Yes sir, yes sir, for the metals cool
Baa baa black sheep, have you any more?
Yes sir, yes sir, Metal's what we adore

Bridge:
With a headbangin' rhythm and a screamin' lead
Baa baa black sheep, our metal's all we need
With a denim jacket and a studded wrist
Baa baa black sheep, we'll never be dismissed

Chorus:
Baa baa black sheep, have you any wool?
Yes sir, yes sir, for the metals cool
Baa baa black sheep, have you any more?
Yes sir, yes sir, Metal's what we adore"

Thanks (1)
Replying to SteveHa:
By Tom Herbert
11th Jan 2023 14:24

Very good SteveHa. ChatGP-Trooper?

Thanks (0)
Replying to TomHerbert:
By johnjenkins
12th Jan 2023 09:39

Is that an ABBA joke?

Thanks (0)
Replying to johnjenkins:
By Dib
12th Jan 2023 14:52

No, it's an Iron Maiden* one.

See also "beer"

*My Chatbot insists on calling the band Press Virgin!

Thanks (0)
Replying to Dib:
By johnjenkins
12th Jan 2023 15:25

I would "stamp" that out.

Thanks (0)
By mkowl
12th Jan 2023 09:32

Unplug it before it's too late

Thanks (1)
Replying to mkowl:
By Dib
13th Jan 2023 15:03

"I'm sorry Dave, I'm afraid I can't do that"

Thanks (1)
Replying to Dib:
By johnjenkins
13th Jan 2023 15:31

Another brilliant entertainer who has gone.

Thanks (0)
By Duggimon
12th Jan 2023 09:34

ChatGPT is a great tool to use for answering these questions if you don't care whether the answer is right. If you do care whether the answer is right then it's useless until it's 100% accurate; a pass mark in an exam isn't going to cut it.

What would really make this example more interesting is to know what sort of score someone picking completely at random would get in this multiple choice paper. I appreciate a level of nuance might be required with multiple correct answers so it's not just 25%, but if a random average is around 20%, ChatGPT got 42% and the average student gets 70%, then it's fairly poor, especially considering it has all the source material to check first and effectively infinite time to read it.

If you gave a human person unfamiliar with accountancy three weeks and all the relevant books properly indexed and asked them to answer this paper, would they beat 42%? AI is not at all smart, just fast enough to trick us that it is; it's like a quicker Google with built-in conversational-style answers.

Thanks (2)
Replying to Duggimon:
By jonbryce
16th Jan 2023 15:24

If my calculations are correct, someone who selects answers at random, but makes sure to provide the correct number of random answers for the questions where you, for example, have to pick 2 of 4, should expect to score 40%. There are a lot of True/False questions, which you should get right 50% of the time. These make up 43 out of the 89 actual answer blocks within the 50 questions.

So, 42% is not actually that much better than random chance.
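
As a rough check on that arithmetic, here is a minimal Python sketch of the expected score from pure guessing. Only the 43 true/false blocks out of 89 come from the figures above; the split of the other 46 blocks, the equal weighting of every block, and the helper names `p_random_correct` and `expected_score` are assumptions made for illustration, so the output is indicative rather than a statement about the real paper.

```python
from fractions import Fraction
from math import comb


def p_random_correct(options: int, picks: int) -> Fraction:
    """Chance of guessing one answer block right when exactly `picks` of the
    `options` are correct and the guesser selects `picks` of them at random."""
    return Fraction(1, comb(options, picks))


def expected_score(blocks) -> Fraction:
    """Expected fraction of blocks guessed correctly.

    `blocks` is a list of (count, options, picks) tuples.
    Assumption: every answer block carries the same weight.
    """
    total = sum(count for count, _, _ in blocks)
    hits = sum(count * p_random_correct(options, picks)
               for count, options, picks in blocks)
    return hits / total


blocks = [
    (43, 2, 1),  # true/false blocks (count taken from the comment)
    (30, 4, 1),  # hypothetical: pick 1 of 4
    (16, 4, 2),  # hypothetical: pick 2 of 4
]

print(f"Expected random-guess score: {float(expected_score(blocks)):.1%}")
```

Changing the hypothetical rows shows how sensitive the baseline is to the mix of question types, and marking per question rather than per block would change it again.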

Thanks (0)
By Mr J Andrews
12th Jan 2023 10:05

''Came up short''.......''Narrowly missed''
So the experiment proved it a failure.

Thanks (1)
Replying to Mr J Andrews:
By Nick Graves
12th Jan 2023 13:31

Mr J Andrews wrote:

''Came up short''.......''Narrowly missed''
So the experiment proved it a failure.

Reminds me of that vehicle safety engineer who explained that after their structural improvements, the crash test dummy was now only just dead.

Thanks (2)
By moneymanager
12th Jan 2023 11:47

The DVLA-issued V5C most emphatically does NOT prove ownership; it says so on the document.

Thanks (0)
Replying to moneymanager:
By Hugo Fair
12th Jan 2023 12:20

... and this is relevant to what?

Thanks (1)
Replying to Hugo Fair:
By moneymanager
12th Jan 2023 15:35

See the first question and answer; the answer is incorrect.

Thanks (1)
Replying to moneymanager:
By Hugo Fair
12th Jan 2023 15:56

Ah, indeed, I found it hard to assign your comment to the correct location in such a lengthy article!
And, of course it's another good example of the style in which ChatGPT generates a 'believable' (but factually incorrect) answer in a confident manner.

Thanks (2)
By carnmores
12th Jan 2023 15:19

It is irrelevant for exam purposes if you do not have access to it in an exam room. Nichola RM's answer is more relevant, but one would hope that this is unlikely at present. The danger is someone without knowledge relying on the chatbot for advice.

Thanks (0)
Replying to carnmores:
By jonbryce
16th Jan 2023 15:25

The question is not whether it is a useful cheating tool in an exam. The question is whether it could replace a human accountant.

Thanks (0)
By moneymanager
14th Jan 2023 15:43

This could all be no more than a distraction. If in a few years' time we "own nothing but will be happy", there will be no more earnings, no more possibility of capital gain, no more pensions, but simply a "distribution" of CBDC tokens with a tightly controlled range of usage (see Agustín Carstens's (of the BIS) speech on YT), all subject to our maintaining our "social credit" status by continued good behavior. Ergo, no need for a tax code and thus no need for accountants.
Discuss.

Thanks (0)
Replying to moneymanager:
By johnjenkins
15th Jan 2023 14:49

I've been saying for some time that the Government should just make us work and give us "pocket money" cos that's the way it's going. Paying 40%+NI on a salary of £50001K is absolutely disgusting.

Thanks (0)
Replying to johnjenkins:
By moneymanager
16th Jan 2023 17:29

"Government pay" is equal to rations, which can be cut at any time and, with digital ID and no cash, at the most granular of levels.

Thanks (0)
Replying to moneymanager:
By carnmores
16th Jan 2023 12:11

God help us one and all!

Thanks (0)