AI chatbot falls just short on accounting exam
An accountant ran the much-hyped AI tool ChatGPT through a sample first-stage ACA paper, and while the chatbot came up short, the fact it narrowly missed the pass mark should make the profession sit up and take note.
Replies (30)
“.. however, ChatGPT gives answers with an air of confidence even when it’s completely wrong.
It’s not afraid to give a garbage answer and back it up with garbage. It’s like having a fresh-faced junior who’s always convinced they’re right, so users need to approach it with a degree of caution.”
That is the only valid takeaway here ... the rest is misunderstood interpretation by those who should know better (I'm referring to Stuart Cobbe, not you Tom)!
ChatGPT does 3 things well - it can: parse a question to work out its search parameters, perform those searches remarkably quickly, and extract the results as grammatically correct text.
But it has no 'understanding' of the original question or indeed of the answer it creates (hence the quote I've repeated at the start of my reply) - and so, like a 5 year-old, will quickly fall back on self-justification (including famously making up false citations) that takes the answer ever further from the truth initially sought.
So my conclusions from the test results in this experiment are somewhat different from those published by Cobbe.
It's not that ChatGPT has almost caught up with trained accountants ... more that the wrong type of questions are being asked in the exam. They are designed for ease of marking (which means they are geared towards the strengths of ChatGPT - as set out above - rather than testing understanding of the topic that should be expected of a trained accountant).
I like the summary as a puffed up junior who is always right.
I had a play with it last week, and it has a dangerous veneer of confidence, with little behind it.
It was like talking to a 'nice but dim' public school educated person with bags of confidence who speaks very nicely, and writes the perfect sentence, but has no ability, or a politician trading on bravado over substance.
Good tax people know what they don't know. Chat Bot blusters through.
Still I imagine ChatBot will do better than many of the posters in any answers, it has at least read a bit about tax.
Still I imagine ChatBot will do better than many of the posters in any answers, it has at least read a bit about tax.
And it read the questions.
I hate to think what ChatBot would return if fed a diet of Any Answers. GIGO/CICO principle etc.
I'm tempted to feed it all of Justin's contributions then ask it to analyse a Tribunal decision or, indeed anything.............the judge is an idiot, the law is wrong, here is a deliberate misreading of some leg or a paragraph in the decision...........something about Rangers...........
The Bot wouldn't so much be the keen confident junior, but would have the arrogant confidence of someone who thinks they are always right.
It probably was fed a diet of such material.
Also, would it understand that answers I gave maybe 15 years ago may have been correct at the time, but almost certainly aren't correct now?
The risk to professional firms is that staff will use ChatGPT to write advice for clients etc., and as the bot is so convincing in its responses, staff may easily fail to check that what splurges out is correct. Love your comments on 'fresh-faced juniors'. Firms should have a policy for staff about its use (it also wastes hours of anyone's time). It's noted that Stack Overflow (a leading developers' forum) has banned its answers from its forum, although it's quite clear that ChatGPT has been 'trained' on the answers from that very forum. There's no reason why you can't train it on, say, this one too. If you have ever used a boilerplate clause in a contract, it's likely to have come from an older equivalent of this bot (there is various fast drafting software out there or, failing that, template contracts and clauses), so in that respect this can be yet another useful tool when you are drafting contracts etc. It would be most useful for, say, your firm's LOEs, as it can replace the need for software.
I have fed it several coding problems as well as various tax queries and its answers were not good. It's not yet been trained on tax. It's only a matter of time though.
The risk to professional firms is that staff will use ChatGPT to write advice for clients etc. and as the bot is so convincing in its responses that staff may easily fail to check what splurges out is correct.
You might get the opposite problem too, with HMRC staff relying on ChatGPT "advice" when doing investigations.
This might become a huge problem for professionals in any job, medicine, computing, accountancy, civil engineering... The layman being able to very quickly produce very convincing arguments and the professional having to spend large amounts of time rebutting them.
when was --force-with-lease added to git push?
The --force-with-lease option for the git push command was added in git version 2.18.0, released on 2018-03-28.
But:
"--force-with-lease" was added in git 1.8.5 [https://github.com/git/git/blob/master/Documentation/RelNotes/1.8.5.txt], in 2013.
ChatGPT can't even get it right with a bit of help...
Do you mean these release notes for version 1.8.5?
https://github.com/git/git/blob/master/Documentation/RelNotes/1.8.5.txt
I apologize for any confusion. I see that the release notes that you provided are for version 1.8.5 and not 2.18.0 and there is no such option called --force-with-lease on git push command in that version, my previous statement on it is incorrect. I apologize for any confusion or misinformation that this may have caused. It's important to always refer to the official documentation and consult experts in the field for the most accurate and up-to-date information.
Of course --force-with-lease is available in 2.18 - and in every version of git since 1.8.5.
Oh, and here's an equivalent sort of question on tax where I did know the answer before I started:
>when did the 3% SDLT surcharge apply to transfers between spouses?
The 3% stamp duty land tax (SDLT) surcharge on transfers of residential property between spouses or civil partners came into effect in England and Northern Ireland on July 1, 2020.
>and when was it repealed?
The 3% SDLT surcharge on transfers of residential property between spouses or civil partners was repealed on the budget of March 3rd, 2020, which means it never came into force. The measure was scrapped by Chancellor Rishi Sunak.
>what about the changes in the autumn budget in November 2017?
In the Autumn Budget 2017, it was announced that from 22 November 2017, an additional 3% stamp duty land tax (SDLT) surcharge will be applied to purchases of additional residential properties, such as second homes or buy-to-let properties, in England, Wales and Northern Ireland. The surcharge does not apply to transfers of properties between spouses or civil partners. The changes were effective immediately, but it is not repealed.
As Pauli might have said "that's not even wrong"
I also asked it a question about the natural production of C-14 and it gave a "not quite right" answer talking about thermal neutron capture by C-13 (Without more research I'm not sure whether that's plausible in the atmosphere rather than in a reactor) while failing to mention the main N-14(n,p)C-14 pathway.
I had ChatGPT re-write Baa Baa Black Sheep, Iron Maiden style. I think it did a good job:
"Verse 1:
With fleece as black as the night
Baa baa black sheep, won't you come into sight?
With a voice that's strong and true
Baa baa black sheep, singing out to you
Chorus:
Baa baa black sheep, have you any wool?
Yes sir, yes sir, for the metals cool
Baa baa black sheep, have you any more?
Yes sir, yes sir, Metal's what we adore
Verse 2:
In the fields we'll roam and graze
Baa baa black sheep, with metal in our gaze
With thundering drums and screaming guitar
Baa baa black sheep, a metal superstar
Chorus:
Baa baa black sheep, have you any wool?
Yes sir, yes sir, for the metals cool
Baa baa black sheep, have you any more?
Yes sir, yes sir, Metal's what we adore
Bridge:
With a headbangin' rhythm and a screamin' lead
Baa baa black sheep, our metal's all we need
With a denim jacket and a studded wrist
Baa baa black sheep, we'll never be dismissed
Chorus:
Baa baa black sheep, have you any wool?
Yes sir, yes sir, for the metals cool
Baa baa black sheep, have you any more?
Yes sir, yes sir, Metal's what we adore"
No, it's an Iron Maiden* one.
See also "beer"
*My Chatbot insists on calling the band Press Virgin!
ChatGPT is a great tool to use for answering these questions if you don't care whether the answer is right. If you do care whether the answer is right then it's useless until it's 100% accurate; a pass mark in an exam isn't going to cut it.
What would really make this example more interesting is to know what sort of score someone picking completely at random would get in this multiple choice paper. I appreciate a level of nuance might be required with multiple correct answers so it's not just 25%, but if a random average is around 20%, ChatGPT got 42% and the average student gets 70%, then it's fairly poor, especially considering it has all the source material to check first and effectively infinite time to read it.
If you gave a person unfamiliar with accountancy three weeks and all the relevant books properly indexed and asked them to answer this paper, would they beat 42%? AI is not at all smart, just fast enough to trick us that it is. It's like a quicker Google with built-in conversational-style answers.
If my calculations are correct, someone who selects answers at random, but makes sure to provide the correct number of random answers for the questions where you, for example, have to pick 2 of 4, should expect to score 40%. There are a lot of True/False questions, which you should get right 50% of the time. These make up 43 out of the 89 actual answer blocks within the 50 questions.
So, 45% is not actually that much better than random chance.
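For anyone who wants to check the random-guess baseline themselves, here is a rough sketch of the arithmetic. Only the 43 True/False blocks out of 89 come from the comment above; the split of the other 46 blocks is not given, so treating them all as single-answer, four-option questions is purely an assumption, and the function name is made up for illustration. Under that assumption the baseline comes out a little under the 40% quoted, so the real paper presumably has a different mix.

```python
from fractions import Fraction


def expected_random_score(blocks):
    """Expected fraction of answer blocks guessed correctly at random.

    blocks: list of (number_of_blocks, chance_of_guessing_one_block_right).
    """
    total_blocks = sum(count for count, _ in blocks)
    expected_correct = sum(count * chance for count, chance in blocks)
    return expected_correct / total_blocks


# 43 True/False blocks: figure taken from the comment above.
# 46 remaining blocks: ASSUMED to be pick-1-of-4 (not stated in the thread).
blocks = [
    (43, Fraction(1, 2)),
    (46, Fraction(1, 4)),
]

baseline = expected_random_score(blocks)
print(f"Expected random score: {float(baseline):.1%}")  # Expected random score: 37.1%
```

Swapping in the paper's real question mix (for example, a different chance for the pick-2-of-4 blocks, depending on how they are marked) only means changing the entries in `blocks`.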
''Came up short''.......''Narrowly missed''
So the experiment proved it a failure.
Reminds me of that vehicle safety engineer who explained that after their structural improvements, the crash test dummy was now only just dead.
The DVLA-issued V5C most emphatically does NOT prove ownership; it says so on the document.
Ah, indeed, I found it hard to assign your comment to the correct location in such a lengthy article!
And, of course it's another good example of the style in which ChatGPT generates a 'believable' (but factually incorrect) answer in a confident manner.
It is irrelevant for exam purposes if you do not have access to it in an exam room. Nichola RM's answer is more relevant, but one would hope that this is unlikely at present. The danger is someone without knowledge relying on the chatbot for advice.
The question is not whether it is a useful cheating tool in an exam. The question is whether it could replace a human accountant.
This could all be no more than a distraction. If in a few years' time we "own nothing but will be happy", there will be no more earnings, no more possibility of capital gain, no more pensions, but simply a "distribution" of CBDC tokens with a tightly controlled range of usage (see the speech by Agustín Carstens of the BIS on YT), all subject to our maintaining our "social credit" status by continued good behaviour. Ergo, no need for a tax code and thus no need for accountants.
Discuss.
I've been saying for some time that the Government should just make us work and give us "pocket money" 'cos that's the way it's going. Paying 40% + NI on a salary of £50,001 is absolutely disgusting.
"Government pay" is equal to rations, that can be cut at any time and with digital ID and no cash, at the most granular of levels,