Data visualization tools are a key component of Self Service Business Intelligence (BI), and one of the better overviews of these tools I have found is Gartner's "Business Value of Self Service BI." My personal favorite slide is #11, where they classify data dumps in Microsoft Excel and Access as "The Dark Side" of self service BI. See my previous posts on the perils of abusing these tools: "Please Stop Creating Microsoft Access Databases," "The Problems with Siloed Databases," and "Dear Spreadsheet Jockey, Welcome to BigData." The problem with these tools is that they focus too much on data manipulation and too little on providing understanding and insight, and when they are abused they can lead to a data management nightmare.
The Gartner report provides a chart of many of the top tools in this category, and I have hands-on experience with only a few of them. If you're looking for a tool, I suggest reading up on the candidates, downloading them, and experimenting.
Where I can offer some insight is in how Data Scientists should use these tools. Harvard Business just published "Elements of Successful Visualizations," covering what makes a visualization succeed. The other aspect is recognizing the type of problem the visualization solves, its audience, and its life expectancy. Some examples:
- Data Discovery visualizations are created by Data Scientists to explore the data, understand the various dimensions, and search for patterns. Scientists use a variety of visualizations, filters, calculations, and other tools to look for correlations and determine what the data says and what may be of interest.
- Data Quality visuals allow the Data Scientist to explore single dimensions or groups of dimensions. They look for values that should be normalized, such as "New York", "NY", and "N.Y.", or grouped together to form hierarchies or segments. They investigate sparse dimensions (ones with many blank or null values) or develop logic to merge common dimensions aggregated from multiple data sources.
- Storytelling visualizations often filter down to a specific data set and use color, size, and other tools to highlight key points or provide insight to the reader. A scientist may create storytelling visualizations to show correlations or patterns in the data or to identify outliers. Storytelling often requires some narrative, such as text, video, or a presentation, to help the audience understand the visualization.
- Dashboards and Tools are developed by Data Scientists as decision-making tools for a selected audience. It may be a dashboard for Sales to better understand their pipeline, or a set of reports for Operations to better understand quality and productivity factors. These tools often require the Scientist to develop some documentation or training materials so that the intended audience understands the data and knows how to use the tool.
- Trends and Predictive visualizations can be used for Storytelling or may be deployed as Dashboards, but they often have broader audiences. They demonstrate the collective results of decisions and activities, some of which the audience cannot directly control. For that reason, the Scientist must use the available screen real estate and visual tools to display as much direct and related data as possible so that the audience can fully understand the trends and predictions.
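The two Data Quality checks described above, normalizing variant spellings and measuring how sparse a dimension is, can be sketched in a few lines of Python. This is a minimal illustration only, assuming records arrive as a list of dictionaries; the `CANONICAL_STATES` mapping, the helper names, and the sample rows are hypothetical and not taken from any particular BI tool.

```python
from collections import Counter

# Hypothetical lookup of known spelling variants -> canonical value.
CANONICAL_STATES = {
    "new york": "NY",
    "ny": "NY",
    "n.y.": "NY",
}


def normalize_state(value):
    """Map a raw state value to its canonical form; None or blank becomes None."""
    if value is None:
        return None
    key = value.strip().lower()
    if not key:
        return None
    # Unknown values are kept (trimmed) so they can be reviewed later.
    return CANONICAL_STATES.get(key, value.strip())


def null_ratio(rows, column):
    """Fraction of rows where `column` is missing, None, or blank (sparsity)."""
    if not rows:
        return 0.0
    missing = sum(
        1
        for row in rows
        if row.get(column) is None or str(row.get(column)).strip() == ""
    )
    return missing / len(rows)


# Hypothetical sample data with inconsistent spellings and sparse regions.
rows = [
    {"state": "New York", "region": "East"},
    {"state": "NY", "region": None},
    {"state": "N.Y.", "region": ""},
    {"state": "ny ", "region": "East"},
    {"state": "California", "region": None},
]

counts = Counter(normalize_state(r["state"]) for r in rows)
print(counts)                      # Counter({'NY': 4, 'California': 1})
print(null_ratio(rows, "region"))  # 0.6
```

A hand-maintained lookup table keeps the normalization rules explicit and easy to review; a dimension with a high null ratio, like `region` here, is a candidate for backfilling from another source or for exclusion from the visualization.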
Once the Data Scientist understands the audience and the type of problem the visualization will solve, they can then select and use the best visual (and often visuals), whether bar graphs, trees, maps, or others. There are many articles covering types of visuals, including this Introduction to Data Visualization, but Data Scientists should first recognize their intended audiences and their needs before settling on a visual approach.