Jahan Zahid explains how to group and plot data with Python on a jupyter notebook in the latest edition of his step-by-step coding for accountants series.
Over the course of this series, we have learned how coding can empower accountants to be in full control of their data, as spreadsheets become a concern of the past. In part two of the series, we learned how to filter and sort data. This week we will look at different ways we can plot and visualise this data.
It’s worth mentioning that there are a few decent Python plotting libraries out there such as Matplotlib and Bokeh. But for the purpose of this series we’ll be using Plotly, which is particularly easy to use in my opinion.
As before, log in to your Azure Notebooks account from their homepage to get started.
Bar and line charts
We’ll be using a dummy profit and loss dataset for our first plot. Click pnl.xlsx to download this file. To start let’s import the libraries we’ll need:
After uploading the pnl.xlsx data to our library, load the data in using the read_excel function, as follows:
Content seriesView full content series
Note that by default this loads the first sheet in the spreadsheet. If your spreadsheet has multiple tabs you can specify which tab you want by passing in a “sheet_name” argument. More information on this can be found in the pandas documentation.
Let’s add the columns expenses and net_profit to this table:
Now we’re ready to plot our first chart. There’s quite a bit of code here which we’ll explain in a moment:
The data variable in the code above contains a list of each separate plot we want to show on the same chart. For each plot we specify its type - in this case “go.Bar” or “go.Scatter” for bar and line charts respectively. (Note the “.go” prefix here comes from the import: import plotly.graph_objs as go)
We’ve gone a bit further and added a couple of optional features to our plot:
- Custom colours for each data source using Hex Code format
- Including a title and axis labels for the chart
Also note that we simply passed in the column we want plotted in either the x or y coordinates. In this case month was the common x coordinate which allows us to plot everything on the same chart.
Grouping data and donut charts
For this plot we’ll be using the debtor invoices data which we’ve used before:
One aspect which we’ll cover in more detail in the next article will be on grouping data. For now, we’ll do a simple group operation summing the amount_due and total_amount columns by customer - similar to what one would do when creating a pivot table.
This grouped data is then stored in the variable totals. We can now go ahead and plot a simple donut chart showing the total amount invoiced to each customer as follows:
The “hole=0.4” is optional here and can be dropped if you want a pie chart instead.
Next week we’ll cover charts and grouping operations in more depth. In the meantime it would be worth taking a look at some examples of what one can do with the plotly library here.
There are lots of possibilities here; for example, plotting a heatmap of clients comparing revenue earned with their industry sector.
See the code we covered today here.
How are you getting on? Next week we will continue our look at plotting data before concluding the series the week after with implementing a machine learning algorithm.
About Jahan Zahid
Jahan is the founder and CEO of Indigo Cashflow, an app created to take the pain out of cash flow forecasting. Prior to this he ran a successful software consultancy, developed FX trading models at Bank of America Merrill Lynch using machine learning techniques, and worked as a Mathematics Lecturer at Bristol University. He holds a PhD in Mathematics from Oxford University.