Any suggestions on extracting and parsing information from iXBRLs?
I have a bunch of companies whose financials I need to monitor. I need some way of getting iXBRL files from Companies House and extracting the data from them to put into a spreadsheet / database. I'm aware of the CH API, and there are one or two parsing projects in development (examples: https://github.com/ONSBigData/parsing_company_accounts and https://www.codeproject.com/Articles/1227765/Parsing-XBRL-with-Python). Most use the Python BeautifulSoup library, but by the coders' own honest admission, their XBRL parsers are a bit flaky.
And I don't want to get into OCR-ing PDFs.
Does anyone have any other suggestions?
There needs to be a way of monitoring CH for changes to the companies I'm tracking. When there's a new iXBRL, I would need it downloaded, parsed and the data added to my spreadsheet / database so I can make graphs showing, for example, how the company's net asset figure has changed over the years. (I have multiple VPS accounts and can set up a cron job if that would make the monitoring bit easier.)
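For the "add it to a database and graph net assets over the years" part, a minimal Python sketch using the standard-library sqlite3 module might look like this. It assumes the figures have already been extracted as (concept, period_end, value) tuples; the table layout, function names, company number and concept name are all hypothetical. It also assumes filings are loaded in date order, so that INSERT OR REPLACE lets a later filing's figure (for example a restated comparative) overwrite an earlier one.

```python
import sqlite3
from decimal import Decimal

def open_db(path=":memory:"):
    """Open (or create) a hypothetical table: one row per company, concept and period end."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS facts (
               company_number TEXT NOT NULL,
               concept        TEXT NOT NULL,
               period_end     TEXT NOT NULL,  -- ISO date, so text order is date order
               value          TEXT NOT NULL,  -- Decimal stored as text to keep it exact
               PRIMARY KEY (company_number, concept, period_end))"""
    )
    return db

def save_facts(db, company_number, facts):
    """Store (concept, period_end, value) tuples; a later row replaces an earlier one."""
    rows = [(company_number, c, p, str(v)) for c, p, v in facts if p]  # skip undated facts
    db.executemany("INSERT OR REPLACE INTO facts VALUES (?, ?, ?, ?)", rows)
    db.commit()

def history(db, company_number, concept):
    """Return [(period_end, Decimal value), ...] in date order, e.g. for a chart."""
    cur = db.execute(
        "SELECT period_end, value FROM facts "
        "WHERE company_number = ? AND concept = ? ORDER BY period_end",
        (company_number, concept),
    )
    return [(p, Decimal(v)) for p, v in cur.fetchall()]

if __name__ == "__main__":
    db = open_db()
    # Hypothetical company number and concept name, purely for illustration.
    save_facts(db, "00000001", [("ex:NetAssets", "2019-12-31", Decimal("100")),
                                ("ex:NetAssets", "2020-12-31", Decimal("150"))])
    save_facts(db, "00000001", [("ex:NetAssets", "2020-12-31", Decimal("160"))])
    print(history(db, "00000001", "ex:NetAssets"))
```

history() returns (period_end, value) pairs in date order, which can be pasted into a spreadsheet or passed to a charting library.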
The daily zip file from Companies House is not that useful in this context (http://download.companieshouse.gov.uk/en_accountsdata.html).
My ideal solution would visit individual company pages at CH and download the iXBRLs. I tried sites like Freelancer to find someone to do this, but nothing turns up when I search for XBRL, so there's no easy way to find the right talent.
https://www.bizdb.co.uk/ does collate and publish some data, but it doesn't seem to do a great job. For several companies I checked in their database, there were gaps in the tables where numbers should have been. Besides, bizdb don't offer a licence for what I want, so I'd have to scrape their data page by page, which isn't worth doing given the poor quality of the data.
Help!
There is a facility at Companies House to follow companies. It involves looking up each company you're interested in and giving your email address and password.
You will then be notified of any future changes and can input them into your system/spreadsheets as required.
The simplest answer from me is to create an account with companiesmadesimple and add the company to your account, then set up notifications for when there are changes.
No, you're quite right. I thought you just wanted notification of changes to the company, which it does provide.
Have you looked at XML coding? You write a simple XML file which pulls the data from the iXBRL into a readable format.
I think what you need to start with is one of these services from Companies House
http://download.companieshouse.gov.uk/en_accountsdata.html
They also have other bulk download systems.
You then need someone to write code to analyse the iXBRL and slot it into a database.
It depends really on what you intend doing with the data as you can simply use the iXBRL as a data storage format.
Edit: I have now had a glance at the links in your OP. iXBRL is XHTML with XBRL tags embedded in it, so it is a special case of XML and any XML parser should do the job. You then need to extract the particular data as defined in each taxonomy.
Try to find someone who does XML. You could see what is happening by putting raw iXBRL into an online XML parser.
Such as this one:
https://countwordsfree.com/xmlviewer
I tried it myself on that page just to be certain and it did work.
You need to make sure you use the raw iXBRL source (the actual characters of the file), not the rendered page.
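As a rough illustration of the "any XML parser" approach, here is a Python sketch using only the standard library's xml.etree.ElementTree. It reads the numeric ix:nonFraction facts and the period dates from the xbrli:context elements. Assumptions: the file is well-formed XHTML using the Inline XBRL 1.1 namespace (older filings may use a different ix namespace), numbers use UK-style formatting such as "1,234", and dimensional contexts, ix:continuation and text facts are ignored. The sample fragment and the concept name ex:NetAssets are made up.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Inline XBRL 1.1 and XBRL 2.1 instance namespaces.
NS = {
    "ix": "http://www.xbrl.org/2013/inlineXBRL",
    "xbrli": "http://www.xbrl.org/2003/instance",
}

def context_dates(root):
    """Map each xbrli:context id to its period end (instant or endDate)."""
    dates = {}
    for ctx in root.iter(f"{{{NS['xbrli']}}}context"):
        end = ctx.find("xbrli:period/xbrli:instant", NS)
        if end is None:
            end = ctx.find("xbrli:period/xbrli:endDate", NS)
        if end is not None and end.text:
            dates[ctx.get("id")] = end.text.strip()
    return dates

def numeric_facts(xhtml):
    """Return (concept, period_end, value) for each ix:nonFraction element."""
    root = ET.fromstring(xhtml)
    dates = context_dates(root)
    facts = []
    for el in root.iter(f"{{{NS['ix']}}}nonFraction"):
        text = "".join(el.itertext()).replace(",", "").strip()
        try:
            value = Decimal(text)
        except InvalidOperation:
            continue  # e.g. a dash used for zero; not handled in this sketch
        value *= Decimal(10) ** int(el.get("scale", "0"))  # scale="3" means thousands
        if el.get("sign") == "-":  # sign="-" marks a negative value shown without a minus
            value = -value
        facts.append((el.get("name"), dates.get(el.get("contextRef")), value))
    return facts

# A made-up, heavily trimmed fragment; real filings carry far more.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"
      xmlns:xbrli="http://www.xbrl.org/2003/instance">
<body>
  <ix:header><ix:resources>
    <xbrli:context id="c1"><xbrli:period>
      <xbrli:instant>2020-12-31</xbrli:instant>
    </xbrli:period></xbrli:context>
  </ix:resources></ix:header>
  <p>Net assets: <ix:nonFraction name="ex:NetAssets" contextRef="c1"
     unitRef="GBP" decimals="0">12,345</ix:nonFraction></p>
</body></html>"""

if __name__ == "__main__":
    print(numeric_facts(SAMPLE))
```

For a real file, passing the raw bytes (for example open(path, "rb").read()) to numeric_facts() lets the parser honour any encoding declaration in the file.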
I am busy at the moment, but this sort of thing is a doddle. Perhaps you should email me at [email protected] and we should have a short phone call about what you are trying to do.
Sounds like this could be done within your spreadsheet using Excel VBA, at a push!
Excel VBA has a Microsoft XML library that may help with this parsing process.
An example of possible code that could be hacked around to do this is available online at https://www.accessforums.net/showthread.php?t=28974&highlight=Timegenie+...
Whether it is worth paying someone to code this up depends on how much data is being pulled down.
Your spreadsheet could contain a record of each company's latest accounts year end and then the VBA could step through these and use the Co's House API to compare if there are newer accounts available.
These new accounts could be pulled down as iXBRL and then processed using code like above.
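The check described above (compare the year end held in the spreadsheet with what Companies House reports) might be sketched in Python like this. It assumes the Companies House public data API's company profile resource, which reports the latest accounts date under accounts.last_accounts.made_up_to, and HTTP Basic authentication with the API key as the username and a blank password; please verify both against the current API documentation. Downloading the iXBRL document itself is not shown.

```python
import base64
import json
import urllib.request
from datetime import date

API_BASE = "https://api.company-information.service.gov.uk"

def fetch_profile(company_number, api_key):
    """Fetch the company profile JSON (assumed auth: key as Basic username, blank password)."""
    req = urllib.request.Request(f"{API_BASE}/company/{company_number}")
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

def has_newer_accounts(profile, stored_year_end):
    """True if the profile's last accounts are made up to a later date than the one held."""
    made_up_to = profile.get("accounts", {}).get("last_accounts", {}).get("made_up_to")
    if made_up_to is None:
        return False  # no accounts filed yet, or field missing
    if stored_year_end is None:
        return True  # nothing stored for this company yet
    return date.fromisoformat(made_up_to) > date.fromisoformat(stored_year_end)

if __name__ == "__main__":
    # Offline demo with a hand-written profile fragment (no network call).
    profile = {"accounts": {"last_accounts": {"made_up_to": "2021-03-31"}}}
    print(has_newer_accounts(profile, "2020-03-31"))  # True
```

In a cron job you would loop over the company numbers in your spreadsheet, call fetch_profile() for each (keeping within the API's rate limit), and only download and parse accounts where has_newer_accounts() returns True.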
A more elegant and robust solution would be to code a full application for this (e.g. C++), but that would presumably be more expensive and perhaps overkill?
VT do a free viewer from which you can copy and paste a list of the tags and their values. See https://www.vtsoftware.co.uk/factviewer/
XML coding is so simple. After a quick five-minute lesson you could easily write an XML script to pull the data and make it readable.
I wouldn't suggest doing it that way. The proposal above to use an XML parser in VBA is one option. Making it work with the Companies House API is quite a bit more work. It really depends on what people are experienced with, because an important part of this is working out a database structure that makes the data searchable and fast.
Pretty well every modern language, however, has some form of XML parser, so what should drive the choice of tool is the developer's experience and which database the data needs to end up in.
Hello,
It's a bit late, sorry. But it doesn't look like anyone has posted a solution. So...
We've been tackling this exact problem recently - picking up a few bits of information for each company from their accounts. We can upscale our solution to grab more information if required. So far we've just been pulling out a few key fields.
Let me know if you still have a requirement.
I just stumbled on this question. We developed a simple scraper; see https://lamatuk.com/2021/07/07/companies-house-and-heaven/
For turnover, we have the live app in https://run.worksheet.systems/app/lamat/Turnover - you can check if the companies you follow appear there.
Hi Clinton
I've got the exact opposite problem to yours. I am a developer and, as a passion project, have developed an automated iXBRL parsing solution which, while in beta, can currently parse the several million files available as zip downloads on the CH website, build a database and run various queries across the collected data.
I am not an accountant, so although my solution can query these millions of files for data points in seconds (query examples include "look up all companies with 100K+ revenue in 2019" and "look up companies with 200K+ cash in hand"), I don't know what information is actually useful to accountants like yourself. I'm looking for someone to tell me how an accountant would USE this data.
I'd be open to giving you (or anyone else here) a demo of my data search service in return for advice on what features it needs to have. I'm also open to adding people as beta testers of the search service.
(By the way, I created the FastUKCompanySearch app, currently the highest-rated Companies House search app on both iOS and the Google Play Store.)
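For anyone wondering what a query like "all companies with 100K+ revenue in 2019" looks like once the facts are in a database, here is a small sketch with Python's sqlite3 module. The schema, index name, company numbers and concept name ex:Turnover are hypothetical, and "in 2019" is taken to mean an accounting period ending in 2019. The composite index on (concept, period_end, value) is intended to keep this kind of screen fast as the table grows.

```python
import sqlite3

def build_db(rows):
    """Load (company_number, concept, period_end, value) rows into an in-memory table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE facts (company_number TEXT, concept TEXT, "
        "period_end TEXT, value REAL)"
    )
    # Composite index: equality on concept, range on period_end, then value.
    db.execute("CREATE INDEX facts_screen ON facts (concept, period_end, value)")
    db.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", rows)
    return db

def companies_over(db, concept, year, threshold):
    """Company numbers with a value >= threshold for a period ending in `year`."""
    cur = db.execute(
        "SELECT DISTINCT company_number FROM facts "
        "WHERE concept = ? AND period_end BETWEEN ? AND ? AND value >= ? "
        "ORDER BY company_number",
        (concept, f"{year}-01-01", f"{year}-12-31", threshold),
    )
    return [row[0] for row in cur]

if __name__ == "__main__":
    # Hypothetical companies and concept name, for illustration only.
    db = build_db([
        ("00000001", "ex:Turnover", "2019-06-30", 150000.0),
        ("00000002", "ex:Turnover", "2019-12-31", 90000.0),
        ("00000003", "ex:Turnover", "2020-03-31", 200000.0),
    ])
    print(companies_over(db, "ex:Turnover", 2019, 100000))  # ['00000001']
```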