Jahan Zahid moves on from plotting data with Python to creating a stacked bar chart and then finish up with creating an aged debtors report with a pivot table.
In last week’s coding for accountants we learned how to plot our first chart and group data. This week we will take the debtor invoices data and create a stacked bar chart and pivot tables.
To get started login to your Azure Notebooks account from their homepage.
Stacked bar chart
We’ll carry on using the debtor invoices data, by first loading it into our jupyter notebook:
For the next chart we want to show a visualisation of the paid and unpaid invoice amounts by month. To do this we’ll need two extra columns - one showing the paid amount and the other with the month the invoice is due. From this we’ll be able to aggregate the paid and unpaid amounts by month and plot a stacked bar chart.
Let’s start by adding a column for the “paid_amount”
Content seriesView full content series
Next we’ll add a column with the month the invoice is due. This requires running a function to parse the due_date column to a month format.
We do this by calling “.map” on the due_date column and passing in the function we want to apply. In this case we’re not going to give our function a name (as we did with “overdue_days” in part two), instead we can use the lambda operator to define a function which takes “x” (i.e. the due_date) as the argument and calls “.strftime” on it to return the parsed month.
Don’t worry if that was a lot to take in - the more you see this being applied in other contexts the easier it will be to fully understand.
Now that we have all we need we can go ahead and group our data by the newly created column due_month.
Plotting this is quite similar to how we’ve been previously plotting data:
We’ll finish up by covering how to create a pivot table and do a “vlookup” using the “merge” operation in pandas. This will allow us to create an aged debtors report.
To start we’ll need to import a couple of libraries:
Note that we’ve also set a date for today as 1 January 2018, as we’re working with static dummy data. To get the actual date today you would instead run: datetime.datetime.today()
Next, we’ll add a due_by column which will indicate which week the invoice is due by - this will form the columns for our pivot table. To do this we’ll create a function which we can apply to the debtors data, again similar to how we did for “overdue_days”.
Try and follow the logic in the function if you can.
We’ve also filtered out invoices that have been paid already, and storing that table in the variable “outstanding”.
We’re now in a position to create our aged debtors report using the pandas pivot table:
Hopefully, some of the arguments of “.pivot_table” will sound familiar when compared to creating a pivot table in Excel. We’ve also included a “.fillna(0)” as pandas doesn’t automatically set cells with no value to 0. There is also a “.reset_index” in there which will make doing our merge (i.e. vlookup) easier.
One thing you might notice is missing from the above table is the total amount due by customer. To add this in we’ll first create a table with the amount due by customer and then merge this with our “aged_debtors” pivot table.
First we’ll create the amount due by customer table
Note we’ve added “[['customer', 'amount_due']]” at the end of the groupby operation to ensure we only return those two columns.
Finally we’ll merge (ie vlookup) this with our “aged_debtors” table to get the result we want
Next week we’ll start the first part of how to implement a machine learning algorithm. Click here to see the code we covered today.
About Jahan Zahid
Jahan is the founder and CEO of Indigo Cashflow, an app created to take the pain out of cash flow forecasting. Prior to this he ran a successful software consultancy, developed FX trading models at Bank of America Merrill Lynch using machine learning techniques, and worked as a Mathematics Lecturer at Bristol University. He holds a PhD in Mathematics from Oxford University.