EDIT 2: Okay, I’ve fixed the issue with the script filling the available memory. It now reads and writes on a per-line basis instead of holding the whole file at once. I’ve updated the repository with the latest code.
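For anyone curious what “per-line” means in practice, here’s a minimal sketch. The real script is PHP; this is Python purely to illustrate the idea, and the function name `stream_extract` is hypothetical. It assumes the BEGIN_[DATANAME]/END_[DATANAME] layout described in the article below, and it assumes data rows are whitespace-separated. Each row is written to its section’s CSV as soon as it is read, so memory use stays roughly flat however big the input file gets.

```python
import csv
import os


def stream_extract(input_path, output_dir="outputs"):
    """Write each AWStats section straight to its own CSV, one line at a time.

    Hypothetical helper: only the current input line and one open CSV
    writer are held in memory at any moment.
    """
    os.makedirs(output_dir, exist_ok=True)
    written = []      # section names, in the order they were opened
    out_file = None   # currently open CSV file (None when outside a section)
    writer = None
    section = None

    with open(input_path, encoding="utf-8", errors="replace") as src:
        for raw in src:                  # file objects are read lazily, line by line
            fields = raw.split()         # split on whitespace, dropping the newline
            if not fields:
                continue
            token = fields[0]
            if token.startswith("BEGIN_"):
                if out_file is not None:  # previous section never closed; close it
                    out_file.close()
                # Marker lines may carry extra tokens after the name; keep only the name.
                section = token[len("BEGIN_"):]
                out_file = open(os.path.join(output_dir, section + ".csv"),
                                "w", newline="", encoding="utf-8")
                writer = csv.writer(out_file)
                written.append(section)
            elif section is not None and token == "END_" + section:
                out_file.close()
                out_file, writer, section = None, None, None
            elif writer is not None:
                writer.writerow(fields)  # row goes to disk immediately

    if out_file is not None:             # input ended mid-section; close anyway
        out_file.close()
    return written
```

The trade-off is that you can’t look back at earlier rows (for sorting or totals, say) without re-reading the CSV, but for a straight export that doesn’t matter.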
EDIT: Talk about posting too soon! It turns out that some of our AWStats files are pretty darn big (upwards of 50MB), which causes the script to run out of memory when it tries to read them. I’m looking into this, but any thoughts are appreciated! Original article follows…
Time for another PHP script on GitHub!
Okay, so I’m quite pleased with this, despite its simplicity. For the last four years I’ve been working with AWStats to process web log files from one of our services at work. This has basically involved ingesting the log files into AWStats, processing the data into HTML pages and publishing them to a shared area on the University’s SharePoint instance. Recently, however, there has been a call to analyse the files in a little more depth. So, this week I started exploring how we could get that data out in a more “interesting” and more usable format, such as CSV, JSON or XML.
Well, it turns out it’s not that straightforward. There aren’t any native options within AWStats to export data to anything other than HTML reports. I did come across a great article providing a template for getting the data into Excel and using that to generate slightly richer reports. If you’re interested, you can find it here: https://www.timelesswebsolutions.com/ms-excel-import-awstats/
Now, that article got me both looking at the data within the AWStats files and thinking that perhaps I could write a script to step through a file and pull the relevant data out. So I wrote one. You can find it here, on GitHub: https://github.com/abeeken/awstats_extractor
So, what’s it doing? Well, AWStats stores all its data in flat .txt files, which makes them fairly easy to read into memory and “do stuff” to. The script takes a path to a file, reads it line by line into an array and steps through it, looking for key flags within the data. The data is fairly well organised, which makes it good to work with in this way. Each chunk of data always starts with BEGIN_[DATANAME] and ends with END_[DATANAME], so the script looks for those flags and sets a marker accordingly. While a marker is set, each subsequent line is split into values on space characters and added to a data array, using the marker as the key. Once the whole file has been read in, the script outputs a CSV file for each data marker into an “outputs” folder, where it can be retrieved and worked with.
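To make the marker logic a bit more concrete, here’s a rough sketch of the same steps. It’s in Python rather than the PHP the repository uses, and `parse_sections` and `write_csvs` are hypothetical names, not functions from the script. It assumes values are whitespace-separated and that a BEGIN_ line may have extra tokens after the section name.

```python
import csv
import os


def parse_sections(lines):
    """Group data rows by their BEGIN_/END_ section name.

    `lines` can be any iterable of strings: a list already read into
    memory (as described above) or an open file.
    """
    data = {}      # section name -> list of rows (each row a list of values)
    marker = None  # section we are currently inside, if any
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("BEGIN_"):
            marker = fields[0][len("BEGIN_"):]  # ignore anything after the name
            data.setdefault(marker, [])
        elif marker is not None and fields[0] == "END_" + marker:
            marker = None
        elif marker is not None:
            data[marker].append(fields)         # the marker is the key
    return data


def write_csvs(data, output_dir="outputs"):
    """Write one <SECTION>.csv per section into output_dir."""
    os.makedirs(output_dir, exist_ok=True)
    for name, rows in data.items():
        path = os.path.join(output_dir, name + ".csv")
        with open(path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
```

As an example, the made-up lines `BEGIN_OS 2`, `keyA 120`, `keyB 30` and `END_OS` parse to `{'OS': [['keyA', '120'], ['keyB', '30']]}`, and `write_csvs` then produces a single `OS.csv` with two rows.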
It’s fairly simple, but that’s the point. It simplifies the whole process so that we can now, with relative ease, take these complex files and digest them into something more usable. The next steps are to include column headings in the CSVs (a little more difficult, given how AWStats stores some of the data) and to allow a batch of files to be processed (although this will largely be down to whatever control script is calling the function; it’s been designed to be flexible).
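As a rough idea of what a batch control script could look like, here’s a sketch that runs any extractor over a folder of files. It’s Python, not the repository’s PHP. `process_batch` is a hypothetical name, and the `awstats*.txt` default pattern is an assumption about how your data files are named. The extractor is passed in as a function so the loop doesn’t care how the extraction is done.

```python
import glob
import os


def process_batch(input_dir, output_root, extract, pattern="awstats*.txt"):
    """Run extract(input_path, output_dir) on every file matching pattern.

    Each input gets its own sub-folder under output_root, named after the
    file, so CSVs from different months don't overwrite each other.
    """
    results = {}
    for path in sorted(glob.glob(os.path.join(input_dir, pattern))):
        stem = os.path.splitext(os.path.basename(path))[0]
        out_dir = os.path.join(output_root, stem)
        os.makedirs(out_dir, exist_ok=True)
        results[stem] = extract(path, out_dir)  # whatever the extractor returns
    return results
```

Keeping the loop separate from the extractor matches the “let the control script decide” approach: the same function can be called from a cron job, a web form or a one-off command.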
So, is this of use to you? Let me know down below or throw me a message on GitHub!