Companies House, iXBRL, extracting data. Help!

Does anyone have any suggestions on extracting data from iXBRL files?

Any suggestions on extracting and parsing information from iXBRLs?

I have a bunch of companies whose financials I need to monitor. I need some way of getting iXBRL files from Companies House and extracting the data from them to put into a spreadsheet / database. I'm aware of the CH API and there are one or two parsing projects in development (examples: https://github.com/ONSBigData/parsing_company_accounts and https://www.codeproject.com/Articles/1227765/Parsing-XBRL-with-Python). Most use the Python BeautifulSoup library. But by the coders' honest admission, their XBRL parsers are a bit flaky.

And I don't want to get into OCR-ing PDFs.

Does anyone have any other suggestions?

There needs to be a way of monitoring CH for changes to the companies I'm tracking. When there's a new iXBRL, I would need it downloaded, parsed and the data added to my spreadsheet / database so I can make graphs showing, for example, how the company's net asset figure has changed over the years. (I have multiple VPS accounts and can set up a cron job if that would make the monitoring bit easier.)

The daily zip file from Companies House is not that useful in this context (http://download.companieshouse.gov.uk/en_accountsdata.html)

My ideal solution would visit individual company pages at CH and download the iXBRLs. I tried places like Freelancer to find someone to do this, but nothing turns up in searches for XBRL, so there's no easy way to find the right talent.

https://www.bizdb.co.uk/ does collate and publish some data, but it doesn't seem to do that great a job. For several companies I checked in their database there were gaps in the tables where numbers needed to be. And, besides, bizdb don't offer a licence to get what I want, so I'd have to scrape their data, page by page. That's not worth doing given the poor quality of the data there.

Help!

Replies (15)

By bernard michael
30th Dec 2019 09:58

There is a facility at Companies House to follow companies. It involves calling up each company you're interested in and giving your e-mail and password.
You will then be notified of any future changes and can input them into your system/spreadsheets as required.

Thanks (0)
Replying to bernard michael:
By Clinton Lee
30th Dec 2019 10:31

Thank you for your reply. I'm subscribed to that service for a select few companies and I get email notifications.

Unfortunately, that's not scalable. I need to track several hundred companies.

I appreciate that a dedicated email address could be set up for these notifications and an IFTTT rule or other automation could trigger, on receipt of a notification email, the software going to CH and downloading the file (if it's an iXBRL rather than a PDF about a director resignation). That bit's easy.

My big issue is with the parsing.

Thanks (0)
By SXGuy
30th Dec 2019 14:22

Simplest answer from me is to create an account with companiesmadesimple and add the company to your account. Then set up notifications for when there are changes.

Thanks (0)
Replying to SXGuy:
By Clinton Lee
30th Dec 2019 16:58

Thanks, but that looks like a company formation service, nothing to do with what I described. I've no idea how that will get me balance sheet data imported into a spreadsheet. Am I missing something?

Thanks (0)
Replying to Clinton:
By SXGuy
31st Dec 2019 09:42

No, you're quite right. I thought you just wanted notification of changes to the company, which it does.

Have you looked at XML coding? You write a simple XML file which pulls the data from the iXBRL into a readable format.

Thanks (0)
By johnhemming
30th Dec 2019 16:35

I think what you need to start with is one of these services from Companies House
http://download.companieshouse.gov.uk/en_accountsdata.html

They also have other bulk download systems.

You then need someone to write code to analyse the iXBRL and slot it into a database.

It depends really on what you intend doing with the data as you can simply use the iXBRL as a data storage format.

Edit: I have now had a glance at the links in your OP. iXBRL is a special case of XML so any XML parser should do the job. You then need to extract the particular data as defined in each taxonomy.
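To illustrate the point about any XML parser doing the job, here is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`. It walks a document for `ix:nonFraction` elements (the inline XBRL tag that wraps numeric facts) and applies the `scale` and `sign` attributes. The sample document, the `uk:` prefix and its namespace URI, and the tag name `NetAssetsLiabilities` are made up for illustration; real filings use whichever taxonomy the accounts were prepared under and may need extra handling (text facts, unusual number formats, HTML entities).

```python
import xml.etree.ElementTree as ET

# Inline XBRL namespaces: 2013 is iXBRL 1.1, 2008 is iXBRL 1.0.
IX_NAMESPACES = (
    "http://www.xbrl.org/2013/inlineXBRL",
    "http://www.xbrl.org/2008/inlineXBRL",
)


def extract_numeric_facts(ixbrl_text):
    """Return one dict per ix:nonFraction element found in the document."""
    root = ET.fromstring(ixbrl_text)
    facts = []
    for ns in IX_NAMESPACES:
        for el in root.iter("{%s}nonFraction" % ns):
            raw = "".join(el.itertext()).strip()
            digits = raw.replace(",", "").replace(" ", "")
            try:
                value = float(digits)
            except ValueError:
                value = None  # e.g. "-" for nil, or a format not handled here
            if value is not None:
                # scale="3" means the displayed figure is in thousands.
                value *= 10 ** int(el.get("scale", "0"))
                if el.get("sign") == "-":
                    value = -value
            facts.append({
                "name": el.get("name"),
                "context": el.get("contextRef"),
                "unit": el.get("unitRef"),
                "value": value,
            })
    return facts


# Hypothetical sample, not taken from a real filing.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"
      xmlns:uk="http://example.com/hypothetical-taxonomy">
  <body>
    <p>Net assets:
      <ix:nonFraction name="uk:NetAssetsLiabilities" contextRef="c2019"
                      unitRef="GBP" decimals="0" scale="3">1,234</ix:nonFraction>
    </p>
  </body>
</html>"""

facts = extract_numeric_facts(SAMPLE)
```

The `contextRef` value only becomes a date once it is looked up against the matching `xbrli:context` element in the document header, which this sketch leaves out.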

Thanks (1)
Replying to johnhemming:
By Clinton Lee
30th Dec 2019 16:57

Thanks, John.

I did start off thinking I'll hire someone to write the code to extract data from iXBRLs and put it into a spreadsheet. But looking at places like Freelancer etc there doesn't seem to be anyone with the requisite skills! Forget iXBRL - just search for XBRL and you'll get nothing.

I intend to use the data to create graphs / charts for publication on a niche website. There'll be about 500-1000 Ltd companies and LLPs I'll be tracking.

Thanks (0)
Replying to Clinton:
By johnhemming
30th Dec 2019 17:57

Try to find someone who does XML. You could see what is happening by putting raw iXBRL into an online XML parser.

Such as this one:
https://countwordsfree.com/xmlviewer

Thanks (0)
Replying to johnhemming:
By Clinton Lee
30th Dec 2019 18:35

Er, you can't put an iXBRL into an XML parser, you need to convert it to XML first. I tried copy pasting from an iXBRL file into that page you provided but it doesn't work for me.

Thanks (0)
Replying to Clinton:
By johnhemming
30th Dec 2019 20:47

I tried it myself on that page just to be certain and it did work.

You need to make sure you use the source iXBRL (individual characters)

I am busy at the moment, but this sort of thing is a doddle. Perhaps you should email me at [email protected] and we should have a short phone call about what you are trying to do.

Thanks (0)
By alan.rolfe
30th Dec 2019 15:49

Sounds like this could be done within your spreadsheet using Excel VBA, at a push!

Excel VBA has a Microsoft XML library that may help with this parsing process.

An example of possible code that could be hacked around to do this is available online at https://www.accessforums.net/showthread.php?t=28974&highlight=Timegenie+...

Whether it is worth paying someone to code this up depends on how much data is being pulled down.

Your spreadsheet could contain a record of each company's latest accounts year end and then the VBA could step through these and use the Co's House API to compare if there are newer accounts available.

These new accounts could be pulled down as iXBRL and then processed using code like above.
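The comparison step described above could look roughly like this, sketched in Python rather than VBA. It assumes the Companies House API's filing-history endpoint (`/company/{number}/filing-history`, authenticated with HTTP Basic using the API key as the username) and assumes each returned item carries `category` and `date` fields; check both against the current API documentation before relying on them. Note that `date` here is taken to be the filing date, not the accounts year end. The sample data is invented and the network function is defined but not called.

```python
import base64
import json
import urllib.request

API_BASE = "https://api.company-information.service.gov.uk"


def fetch_filing_history(company_number, api_key):
    """Fetch accounts filings for one company (network call, not run here)."""
    url = "%s/company/%s/filing-history?category=accounts" % (
        API_BASE, company_number)
    # HTTP Basic auth: API key as username, empty password (assumed scheme).
    token = base64.b64encode((api_key + ":").encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def newer_accounts(filing_history, last_seen_date):
    """Return accounts filings dated after last_seen_date ('YYYY-MM-DD')."""
    return [
        item for item in filing_history.get("items", [])
        if item.get("category") == "accounts"
        and item.get("date", "") > last_seen_date  # ISO dates sort as text
    ]


# Hypothetical response shape, for illustration only.
SAMPLE_HISTORY = {"items": [
    {"category": "accounts", "date": "2019-09-30", "type": "AA"},
    {"category": "officers", "date": "2019-11-01", "type": "TM01"},
    {"category": "accounts", "date": "2018-09-28", "type": "AA"},
]}

new_items = newer_accounts(SAMPLE_HISTORY, "2019-01-01")
```

Storing the latest date seen per company (in the spreadsheet or a database) and running this from a scheduled job would cover the monitoring side; downloading the document itself is a separate step.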

A more elegant and robust solution would be to code a full application for this (e.g. in C++), but that would presumably be more expensive and perhaps overkill?

Thanks (0)
By vtsoftware
31st Dec 2019 14:50

VT do a free viewer from which you can copy and paste a list of the tags and their values. See https://www.vtsoftware.co.uk/factviewer/

Thanks (0)
By SXGuy
31st Dec 2019 18:08

XML coding is so simple. After a quick five-minute lesson you could easily write an XML script to pull the data and make it readable.

Thanks (0)
Replying to SXGuy:
By johnhemming
31st Dec 2019 19:20

I wouldn't suggest doing it that way. The proposal above to use an XML parser in VBA is one way. Making it work with the Companies House API is quite a bit more work. It depends really on what people are experienced with as an important part of this is to work out what the structure of the database is to make this searchable and fast.

Pretty well every modern language, however, has some form of XML parser. Hence what should drive the choice of tool is what someone's experience is and what database they want the data to end up in.
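As one possible answer to the database-structure question, here is a small sketch using Python's built-in `sqlite3`. It stores one row per company, period end and tag, which makes "net assets over the years" a single indexed query. The table layout, the company number `00000000`, the tag name and the figures are all hypothetical; a real schema would likely also record the filing it came from, the unit and the taxonomy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent store
conn.execute("""
    CREATE TABLE facts (
        company_number TEXT NOT NULL,
        period_end     TEXT NOT NULL,   -- ISO date, e.g. 2019-12-31
        tag            TEXT NOT NULL,   -- taxonomy element name
        value          REAL,
        PRIMARY KEY (company_number, period_end, tag)
    )
""")

# Invented figures for a made-up company.
rows = [
    ("00000000", "2019-12-31", "NetAssetsLiabilities", 150.0),
    ("00000000", "2017-12-31", "NetAssetsLiabilities", 100.0),
    ("00000000", "2018-12-31", "NetAssetsLiabilities", 120.0),
]
# INSERT OR REPLACE lets a re-parsed filing overwrite its earlier rows.
conn.executemany("INSERT OR REPLACE INTO facts VALUES (?, ?, ?, ?)", rows)


def series(conn, company_number, tag):
    """Return (period_end, value) pairs for one tag, oldest first."""
    cursor = conn.execute(
        "SELECT period_end, value FROM facts "
        "WHERE company_number = ? AND tag = ? ORDER BY period_end",
        (company_number, tag),
    )
    return cursor.fetchall()


net_assets = series(conn, "00000000", "NetAssetsLiabilities")
```

The composite primary key doubles as the index for this lookup, so the same shape should stay quick for a few hundred to a thousand companies.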

Thanks (0)
By oblonguk
06th Oct 2020 18:43

Hello,

It's a bit late, sorry. But it doesn't look like anyone has posted a solution. So...

We've been tackling this exact problem recently, picking up a few bits of information for each company from their accounts. We can scale up our solution to grab more information if required; so far we've just been pulling out a few key fields.

Let me know if you still have a requirement.

Thanks (0)