One of the things I’ve been doing a lot more of over the past year at my University of Lincoln job has been looking at datasets, specifically how to harvest, store and analyse data from third party sources over time. This is a fairly exciting project but it’s one that’s also massively daunting and hasn’t really had a focus until this week, when I was asked to produce a report on the Library’s eBook usage for the last three academic years. EBooks are a huge part of the student experience these days and a growing area that we’re continuing to invest in at the University, so being able to look at the uptick in usage is essential to understanding which services are proving popular. Each service we subscribe to provides its own reporting back end for usage and each of those provides its own set of data at differing levels of granularity – so, the option open to us is to either do the Excel fandango and manipulate a bunch of exported CSV files, some of which could contain hundreds of thousands of rows of data, or figure out a way of automating the process – hello Power BI!

Okay, so Power BI, a Microsoft application, isn’t the only tool around – I also took some time to look into the possibility of using open source BI (business intelligence) application, Metabase; while it was fine to get up and running, Power BI offered more options for data imports, as well as tighter integration with our Microsoft infrastructure at the Uni. It’s a REALLY powerfull tool which is easy to get started with but has quite a steep learning curve so a lot of the last few months has been feeling my way around the software and understanding what I can ‘make it do.

Power BI comes in two parts – a desktop application which is free to download and use, and a cloud element which can be used to share completed reports. The basic idea is that you suck data into a Power BI dashboard from various different sources, build graphs and tables using the desktop app and publish it to the cloud space. Imports can range from flat files like csv, Excel spreadsheets to more complex interactions like database connections and web page scrapes. It’s really cool being able to just take a bunch of data and quickly pull something together from it!

For our purposes, we’re looking at using a centralised database (MySQL for testing purposes, moving to a SQL Server database that’s more tightly wound into our infrastructure for long term data storage) that we can push monthly data dumps into and have a number of reports that we maintain for management to review. For the eBook report I was able to draw in all of our sources and create visualisations for each of them, however what I wanted to be able to do was create a top down view of ALL the services we subscribe to. That needed a little extra work.

Reporting by date is fine, but what I needed was to be able to join all of the different datasets on that date. That needed a lookup table but I wasn’t sure on how to produce that. Well, thanks to YouTube I found this little nugget which solved the problem for me:

It’s literally as straightforward as that. On creating the lookup table I hooked it into all my other datasets with a relationship on their specific date table and, voila – I used the lookup date in all of my reports as well as to create the high level view. It might not seem it, but this is a very exciting bit of progress for this project and one which is going to open a lot more doors in the coming months!