WikiRT

SLQ Wiki Reporting Tool

WikiRT is a cross-platform, open-source application developed in-house to analyse, filter and clean up the wiki's access log file, producing accurate data for reporting activity on the State Library of Queensland’s wiki.

 

Background.

Our team at the State Library of Queensland (SLQ) manages a wiki for sharing workshops, ideas, programs and information about the maker space and its equipment. The library’s main website uses Google Analytics to analyse visitor data, but it wasn’t giving accurate results for our wiki (DokuWiki), so we had to come up with a better way to get the most accurate numbers possible. DokuWiki produces an Apache-formatted log, which we initially parsed by hand from the Linux command line. When I noticed my colleague torturing himself with this every quarter, I decided to automate the process, adding a few improvements along the way.

Development.

My time developing this was very limited due to the nature of my work, which generally doesn’t involve any coding. There were also a few speed bumps along the way in getting approvals from different departments and stakeholders.

The program purges unwanted or invalid data, outputs a clean file and calculates the number of sessions. One of the challenges was how to mimic sessions: a session is a specific timeframe that counts as a single visit, and it is how the library calculates online visitation.
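To illustrate the idea of mimicking sessions, here is a minimal pandas sketch. It is not WikiRT's actual code. It assumes a session is defined by inactivity (a visitor's next request after more than 30 minutes of silence starts a new session), that visitors are identified by IP address, and that the column names `ip` and `timestamp` and the function name `count_sessions` are hypothetical.

```python
import pandas as pd

# Assumed inactivity window; 30 minutes matches the figure quoted for SLQ analytics.
SESSION_GAP = pd.Timedelta(minutes=30)


def count_sessions(df, gap=SESSION_GAP):
    """Count sessions, starting a new one whenever a visitor is idle longer than `gap`."""
    df = df.sort_values(["ip", "timestamp"])
    # Time since the same visitor's previous request (NaT for their first request).
    idle = df.groupby("ip")["timestamp"].diff()
    # A first request, or a request after a long pause, opens a new session.
    new_session = idle.isna() | (idle > gap)
    return int(new_session.sum())


# Made-up sample data using documentation-range IP addresses.
hits = pd.DataFrame({
    "ip": ["192.0.2.1", "192.0.2.1", "192.0.2.1", "192.0.2.2"],
    "timestamp": pd.to_datetime([
        "2021-03-01 09:00", "2021-03-01 09:10",
        "2021-03-01 10:00", "2021-03-01 09:05",
    ]),
})
print(count_sessions(hits))  # prints 3
```

The first visitor's 10:00 request comes 50 minutes after their previous one, so it counts as a second visit. Together with the second visitor's single visit, that makes three sessions.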

Status.

WikiRT has been approved across the organisation by all involved departments and stakeholders. It is being used to report monthly and quarterly visitation figures.

I plan to create a white-label version that is not specific to the wiki and works on any Apache log file. The user interface could also use more features, such as adding filters and specifying file download formats.

Read more.

The SLQ Wiki has a page on reporting which includes additional information about the tool.

 

Features

  • Analyses raw Apache access logs

  • Filters out unwanted keywords such as requests made by bots and web crawlers

  • Runs on macOS, Windows and Linux (requires Python)

  • User interface

  • Removes entries that are missing crucial information

  • Splits user activity into sessions (currently 30 minutes, to reflect SLQ website analytics)

  • Outputs a clean CSV file with results for further analysis
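The parsing, filtering and clean-up features above could be sketched roughly as follows. This is an illustration under stated assumptions rather than WikiRT's implementation: it assumes the log uses Apache's "combined" format, and the keyword list `BOT_KEYWORDS`, the pattern `LOG_PATTERN` and the function `clean_log` are hypothetical names chosen for the example.

```python
import re

import pandas as pd

# Apache "combined" log format (assumed; the wiki's real log format may differ).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
# Hypothetical keyword list for spotting bots and crawlers in the user agent.
BOT_KEYWORDS = ("bot", "crawler", "spider")
COLUMNS = ["ip", "time", "request", "status", "referrer", "agent"]


def clean_log(lines):
    """Parse raw log lines, drop incomplete entries and filter out bot traffic."""
    # Lines that do not match the pattern are missing fields and are skipped.
    rows = [m.groupdict() for m in map(LOG_PATTERN.match, lines) if m]
    df = pd.DataFrame(rows, columns=COLUMNS)
    # Parse timestamps to UTC; unparseable values become NaT and are dropped.
    df["timestamp"] = pd.to_datetime(
        df["time"], format="%d/%b/%Y:%H:%M:%S %z", utc=True, errors="coerce"
    )
    df = df.dropna(subset=["timestamp"])
    is_bot = df["agent"].str.lower().str.contains("|".join(BOT_KEYWORDS), na=False)
    return df[~is_bot].drop(columns="time").reset_index(drop=True)


# Made-up sample lines: one normal visit, one crawler, one malformed entry.
sample = [
    '192.0.2.1 - - [01/Mar/2021:09:00:00 +1000] "GET /doku.php?id=start HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.2 - - [01/Mar/2021:09:01:00 +1000] "GET /doku.php HTTP/1.1" 200 1024 "-" "Googlebot/2.1"',
    'malformed line',
]
cleaned = clean_log(sample)
print(cleaned.to_csv(index=False))  # CSV text, ready for further analysis
```

Only the first sample line survives: the second is dropped because its user agent contains "bot", and the third because it does not match the log pattern.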

Tools / Libraries

  • Pandas: a fast, powerful, flexible, open-source data analysis and manipulation tool

  • PyQt5: allows the creation of cross-platform graphical user interfaces

  • Helper libraries: numpy, datetime, pytz
