Any suggestions for extracting and parsing data from iXBRL files?
I have a bunch of companies whose financials I need to monitor. I need some way of getting their iXBRL files from Companies House (CH) and extracting the data into a spreadsheet / database. I'm aware of the CH API, and there are one or two parsing projects in development (examples: https://github.com/ONSBigData/parsing_company_accounts and https://www.codeproject.com/Articles/1227765/Parsing-XBRL-with-Python). Most use the Python BeautifulSoup library, but by their authors' own admission, their XBRL parsers are a bit flaky.
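For reference, here is a minimal sketch of what the parsing step involves, using only the Python standard library instead of BeautifulSoup. It assumes the file is well-formed XHTML (which the Inline XBRL spec requires), reads every `ix:nonFraction` numeric fact, applies the `scale` and `sign` attributes, and looks up the period end date from the matching `xbrli:context`. It deliberately ignores most `ixt:` number formats, so treat it as a starting point rather than a complete parser.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Namespace URIs from the Inline XBRL 1.1 and XBRL 2.1 specifications.
IX = "{http://www.xbrl.org/2013/inlineXBRL}"
XBRLI = "{http://www.xbrl.org/2003/instance}"


def context_dates(root):
    """Map each xbrli:context id to its period end (instant or endDate)."""
    dates = {}
    for ctx in root.iter(XBRLI + "context"):
        period = ctx.find(XBRLI + "period")
        if period is None:
            continue
        end = period.find(XBRLI + "instant")
        if end is None:
            end = period.find(XBRLI + "endDate")
        if end is not None and end.text:
            dates[ctx.get("id")] = end.text.strip()
    return dates


def numeric_facts(xhtml):
    """Yield (concept, period_end, value) for every ix:nonFraction fact."""
    root = ET.fromstring(xhtml)
    dates = context_dates(root)
    for el in root.iter(IX + "nonFraction"):
        text = "".join(el.itertext()).strip().replace(",", "")
        if not text:
            continue  # nil facts (xsi:nil="true") carry no value
        # Simplification: "-" is read as zero (ixt:zerodash); other ixt
        # formats such as numcommadecimal are NOT handled here.
        value = Decimal(0) if text == "-" else Decimal(text)
        value = value.scaleb(int(el.get("scale", "0")))  # e.g. scale="3" -> thousands
        if el.get("sign") == "-":
            value = -value
        yield el.get("name"), dates.get(el.get("contextRef")), value
```

Pass the raw bytes of a downloaded file, e.g. `numeric_facts(open(path, "rb").read())`, and filter on the concept you want to chart. The concept name for net assets depends on the taxonomy the filing uses (`core:NetAssetsLiabilities` in the tests is an assumed example), so check a few of your own filings first.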
And I don't want to get into OCR-ing PDFs.
Does anyone have any other suggestions?
There needs to be a way of monitoring CH for changes to the companies I'm tracking. When a new iXBRL file appears, I need it downloaded, parsed and the data added to my spreadsheet / database so I can make graphs showing, for example, how a company's net assets figure has changed over the years. (I have multiple VPS accounts and can set up a cron job if that would make the monitoring easier.)
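The "only process new filings" part of a cron job can be sketched with SQLite from the Python standard library. This assumes each filing has some unique ID (for example the transaction ID from the CH filing history; that choice is an assumption) and that the parser produces `(concept, period_end, value)` tuples. A filing already seen is skipped, so the job can safely re-run every night.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (and if needed create) a small SQLite store for filings and facts."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS filings (
            transaction_id TEXT PRIMARY KEY,
            company_number TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS facts (
            company_number TEXT NOT NULL,
            period_end     TEXT NOT NULL,
            concept        TEXT NOT NULL,
            value          REAL NOT NULL,
            PRIMARY KEY (company_number, period_end, concept)
        );
        """
    )
    return conn


def record_filing(conn, company_number, transaction_id, facts):
    """Store one filing's facts; return False if this filing was seen before.

    `facts` is an iterable of (concept, period_end, value) tuples.
    """
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO filings VALUES (?, ?)",
            (transaction_id, company_number),
        )
        if cur.rowcount == 0:
            return False  # already processed on an earlier cron run
        conn.executemany(
            # A later filing's figure for the same period replaces the earlier
            # one, so restatements win if filings are processed oldest first.
            "INSERT OR REPLACE INTO facts VALUES (?, ?, ?, ?)",
            [
                (company_number, end, concept, float(value))
                for concept, end, value in facts
                if end is not None
            ],
        )
    return True


def series(conn, company_number, concept):
    """Return [(period_end, value)] for one concept, oldest first, for charting."""
    rows = conn.execute(
        "SELECT period_end, value FROM facts "
        "WHERE company_number = ? AND concept = ? ORDER BY period_end",
        (company_number, concept),
    )
    return list(rows)
```

Keeping one row per company, period and concept means the prior-year comparatives in each set of accounts fill in history automatically, and `series()` gives a ready-made time series for a net assets graph or a CSV export.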
The daily zip file from Companies House (http://download.companieshouse.gov.uk/en_accountsdata.html) is not that useful in this context.
My ideal solution would visit individual company pages at CH and download the iXBRL files. I tried sites like Freelancer to find someone to build this, but searching for XBRL turns up nothing, so there's no easy way to find the right talent.
https://www.bizdb.co.uk/ does collate and publish some data, but it doesn't seem to do a great job. For several companies I checked in their database, there were gaps in the tables where numbers should have been. Besides, bizdb don't offer a licence for what I want, so I'd have to scrape their data page by page, which isn't worth doing given the poor quality of the data.