ClickStream Analysis Tool

For those of you who want a free, easy clickstream analysis tool, have a look at StatViz. If you're running Apache and using the standard log format, plugging in this tool is very easy.

  1. Download and install GraphViz. There's an RPM for Linux...
  2. Download and install StatViz in a directory. It's basically one PHP file. The README file will tell you how to customize the configuration file and run it.
  3. I don't have many PHP apps running, so there are a couple of other things you may need to do. First, you'll need PEAR's Config package. Once you have it, uncompress/untar it; the easiest thing to do is move Config.php and the Config dir to /usr/share/pear. Second, StatViz takes up a lot of memory, so you may need to increase the memory_limit configuration parameter in your /etc/php.ini.
That's pretty much it...
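Step 3 can be sketched as a pair of shell helpers. This is only a rough sketch: the function names, the unpacked directory name (it varies by Config version), and the 128M value are assumptions for illustration, not anything StatViz or PEAR prescribes. The sed rewrite only touches an uncommented memory_limit line.

```shell
#!/bin/sh
# Sketch of step 3. Function names and example values are illustrative.

# Copy the unpacked PEAR Config files into PHP's include path.
# $1 = directory the archive unpacked into, $2 = target (default from the post).
install_pear_config() {
    src=$1
    dest=${2:-/usr/share/pear}
    cp -R "$src/Config.php" "$src/Config" "$dest/"
}

# Rewrite an existing, uncommented memory_limit line in a php.ini file.
# $1 = path to php.ini, $2 = new value (for example 128M).
set_memory_limit() {
    sed "s/^[[:space:]]*memory_limit[[:space:]]*=.*/memory_limit = $2/" "$1" > "$1.tmp" \
      && mv "$1.tmp" "$1"
}

# Usage (run as root; the directory name and 128M are placeholders):
#   install_pear_config ./Config-x.y.z
#   set_memory_limit /etc/php.ini 128M
```

You can confirm the value PHP actually picked up with php -r 'echo ini_get("memory_limit"), "\n";'.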

Basic Running

You can run it using

./statviz.php --config configfile

and then create a gif file of the output by doing something like

dot -Tgif -oOutputGifFileName InputDotFile

If you put the output GIF file in a web-accessible directory, you'll be able to see it from your browser.

Things To Look For

There are a number of things you'll need to consider if you want accurate results:
  • Make sure you look at the bot extensions and make a best attempt to get these filtered out.
  • Make sure you have all non-pages (graphics, js, css) filtered out.
  • If possible, try to filter out requests from internal users. StatViz doesn't have a filter for this, so I just scrubbed them out of the logs myself using grep -v.
  • If your site has long URLs, you will most certainly want to clean them up before processing. The tool allows you to create an alias file, but you may need/want to do some log scrubbing on your own.
  • Play around with the GraphNReferrerPairs parameter. You can get a lot more detail on site activity with higher numbers, but the graph then becomes a lot more complex to digest. If you decide on a large graph, you may need to modify the source and change the size of the graph. It defaults to 10, 8 and there isn't a parameter to configure this. I changed it to 20, 16 for most of my small graphs (GraphNReferrerPairs <>) and to 40, 32 for larger graphs.
  • Very long URLs are going to be a hassle, especially if they come from external referrers and are out of your control. I added some checks in the code to clip the very long URLs.
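The log scrubbing mentioned in the list above can be done in one pass with a few chained grep -v filters before the log goes to StatViz. Here's a minimal sketch, assuming the standard Apache log format with the client address at the start of each line. The internal address prefixes, the asset extensions, and the bot keywords are placeholders made up for illustration, so swap in whatever fits your site.

```shell
#!/bin/sh
# Sketch: scrub an Apache access log before handing it to StatViz.
# Every pattern below is an illustrative assumption, not a StatViz setting.

INTERNAL_IPS='^(10\.0\.|192\.168\.)'                          # hypothetical internal ranges
STATIC_ASSETS='\.(gif|jpe?g|png|ico|css|js)(\?[^ ]*)? HTTP/'  # non-page requests
BOTS='(bot|crawler|spider|slurp)'                             # crude match, case-insensitive

scrub_log() {
    # Reads a log on stdin and writes the filtered log to stdout.
    grep -v -E "$INTERNAL_IPS" \
      | grep -v -E "$STATIC_ASSETS" \
      | grep -v -i -E "$BOTS"
}

# Usage (file names are placeholders):
#   scrub_log < access_log > access_log.clean
```

The bot pattern is deliberately blunt: it drops any line containing one of the keywords anywhere, including in the URL, so tighten it if your own pages happen to match.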


I've automated a couple of things on my site:
- A report that updates hourly on today's activity.
- I archive a daily GIF file. (I will add weekly and monthly in the future.)
- I have a 'full report' that shows activity for the last 30 days. I update this daily.
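
Automation along these lines can be sketched as a small script driven by cron. This is not my exact setup, just a rough outline: every path, file name, and function name below is a placeholder, and it assumes your StatViz config tells the tool which log to read and that the DOT output ends up at DOT_FILE (check the README for how your config handles both).

```shell
#!/bin/sh
# Sketch of report automation. All paths and names here are hypothetical.

STATVIZ_DIR=/opt/statviz                  # where statviz.php lives (placeholder)
CONFIG=daily.conf                         # config file for this report (placeholder)
DOT_FILE=/tmp/statviz-daily.dot           # wherever your run leaves its DOT output
ARCHIVE_DIR=/var/www/html/clickstream     # web-accessible archive directory

# Keep only the log lines for one day, given a stamp like 10/Oct/2006.
lines_for_day() {
    grep -F "[$1:"
}

# Build the archive file name for a date string such as 2006-10-10.
archive_name() {
    printf '%s/clickstream-%s.gif\n' "$ARCHIVE_DIR" "$1"
}

# Run StatViz, then render the DOT file into a dated GIF.
run_daily() {
    cd "$STATVIZ_DIR" || return 1
    ./statviz.php --config "$CONFIG" || return 1
    dot -Tgif -o "$(archive_name "$(date +%Y-%m-%d)")" "$DOT_FILE"
}

# Example crontab entries (script paths are placeholders):
#   0 * * * *  /path/to/hourly-report.sh    # refresh today's report every hour
#   5 0 * * *  /path/to/daily-archive.sh    # archive a dated GIF shortly after midnight
```

For the hourly report, something like lines_for_day "$(LC_ALL=C date +%d/%b/%Y)" < access_log gives you just today's requests to feed into the run.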

I'll put out another entry with a quick 101 on interpreting the results.


About Isaac Sacolick

Isaac Sacolick is President of StarCIO, a technology leadership company that guides organizations on building digital transformation core competencies. He is the author of Digital Trailblazer and the Amazon bestseller Driving Digital and speaks about agile planning, devops, data science, product management, and other digital transformation best practices. Sacolick is a recognized top social CIO, a digital transformation influencer, and has over 900 articles published at InfoWorld, his blog Social, Agile, and Transformation, and other sites. You can find him sharing new insights @NYIke on Twitter, his Driving Digital Standup YouTube channel, or during the Coffee with Digital Trailblazers.