Data Visualization Examples

In my last post, I reviewed Five Types of Data Visualizations and broke them down into discovery, quality, storytelling, dashboards/tools, and trends/predictive. In this post, I will share some examples of Discovery, Quality, and Dashboard visualizations. There are too many good examples of Trends/Predictive and Storytelling visualizations, and it was too hard to select one of each for this post.

Discovery

My Network on LinkedIn

 

LinkedIn provides this fun tool to help its users visualize and navigate their networks. Data discovery tools have to provide mechanisms to visualize large data sets and help identify relationships. This particular tool leverages color, distance relationships, and zoom in/out to help the user find clusters and potential connections.

Quality

There are many tools on the market to help Data Stewards and Data Scientists identify and address data quality issues. My favorite tools help users identify quality issues and drill down into the data by its dimensions to identify causes and fixes. The tool below is featured in Profiler: Integrated Statistical Analysis and Visualization for Data Quality Assessment.

Quality issues highlighted in this dashboard

Dashboards

Tableau Public has numerous examples of interesting visualizations. The one below is a good example of a Dashboard that lets the user see the visualization, review the detailed data, and use filters to drill into results.


Five Types of Data Visualizations

Data visualization tools help Data Scientists explore data, find patterns, and give organizations tools to leverage data for making decisions. Data visualization tools stand at the top of my Top Five Tools of Big Data Analytics because other tools such as infrastructure, data quality, semantic engines, predictive analytics, and data mining all require good visualizations to demonstrate results. Data visualization is also a key skill by which Data Scientists can differentiate themselves.

Data visualization tools are a key component of Self Service Business Intelligence (BI) tools, and I found one of the better overviews of these tools in Gartner's Business Value of Self Service BI. My personal favorite slide is #11, where they classify data dumps in Microsoft Excel and Access as "The Dark Side" of self service BI. See my previous posts on the perils of abusing these tools: Please Stop Creating Microsoft Access Databases, The Problems with Siloed Databases, and Dear Spreadsheet Jockey, Welcome to BigData. The problem with these tools is that they focus too much on data manipulation and too little on providing understanding and insight. If they are abused, they can lead to a data management nightmare.

The Gartner report provides a chart of many of the top tools in this category and I only have experience using several of them. If you're looking for a tool, I would suggest reading, downloading, and experimenting.

Where I can provide some insight is in how Data Scientists should use these tools. Harvard Business just published their Elements of Successful Visualizations covering what makes a successful visualization. The other aspect is to recognize the type of problem being solved with the visualization, the audience, and its life expectancy. Some examples:


  • Data Discovery visualizations are created by Data Scientists to explore the data, understand the various dimensions, and search for patterns. Scientists use a variety of visualizations, filters, calculations, and other tools to look for correlations and determine what the data says and what may be of interest.

  • Data Quality visuals allow the Data Scientist to explore single dimensions or groups of dimensions. They will look for data that should be normalized, such as "New York", "NY" and "N.Y.", or grouped together to form hierarchies or segments. They will investigate sparse dimensions (ones with many blank or null values) or develop logic to merge common dimensions aggregated from multiple data sources.

  • Storytelling visualizations often filter into a specific data set and utilize color, size, and other tools to highlight or provide insight to the reader. A scientist may create storytelling visualizations to show correlations or patterns in the data or to identify outliers. Storytelling often requires some narrative such as text, video, or presentation to help the audience understand the visualization.

  • Dashboards and Tools are developed by Data Scientists as decision making tools for a selected audience. It may be a dashboard for Sales to better understand their pipeline, or a set of reports for Operations to better understand quality and productivity factors. These tools often require the Scientist to develop some documentation or training materials so that the intended audience understands the data and knows how to use the tool.

  • Trends and Predictive visualizations can be used for Storytelling or may be deployed as Dashboards, but often have broader audiences. They demonstrate the collective results of decisions and activities, some of which the audience cannot directly control. In that regard, the Scientist must use the real estate and visual tools to display as much direct and related data as possible so that the audience can have a complete understanding of the trends and predictions.
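
As a minimal sketch of the Data Quality checks described in the list above, the Python below normalizes spelling variants of one value and measures how sparse a dimension is. The alias table, the field names, and the sample records are all hypothetical and exist only for illustration; a real profiling tool would do far more.

```python
from collections import Counter

# Hypothetical alias table; in practice a data steward would maintain this.
CITY_ALIASES = {"ny": "New York", "n.y.": "New York", "new york": "New York"}


def normalize_city(value):
    """Map known spelling variants to one canonical label; pass others through."""
    if value is None:
        return None
    cleaned = value.strip()
    if not cleaned:
        return None
    return CITY_ALIASES.get(cleaned.lower(), cleaned)


def sparsity(records, field):
    """Return the fraction of records whose field is missing, None, or blank."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if not (r.get(field) or "").strip())
    return missing / len(records)


# Made-up sample records used only to exercise the two checks.
records = [
    {"city": "New York", "segment": "Retail"},
    {"city": "NY", "segment": ""},
    {"city": "N.Y.", "segment": None},
    {"city": "Boston", "segment": None},
]

# Count values after normalization so the variants collapse into one bucket.
city_counts = Counter(normalize_city(r["city"]) for r in records)
segment_sparsity = sparsity(records, "segment")
```

Counting values after normalization is what makes the three New York spellings show up as one bar in a quality chart rather than three, and the sparsity ratio is the kind of number a quality visual would flag for a mostly empty dimension.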



Once the Data Scientist understands the audience and the type of problem the visualization will solve, they can then select and utilize the best visual (and often visuals), whether it includes bar graphs, trees, maps, or others. There are many articles covering types of visuals, including this Introduction to Data Visualization, but Data Scientists should first recognize their intended audiences and needs before diving into a visual approach.



Dark Data - A Business Definition

I mentioned dark data in one of my recent posts covering the issues with creating Microsoft Access Databases and with siloed databases in general. My definition of dark data is broader than those in some posts that I've found:


All of these definitions are true, but somewhat limited. In What Is Big Data? The Real Challenges Beyond Volume, Velocity and Variety, I provided my definition of Big Data:

Big Data Defined - Big Data is not defined by its data management challenges, but by the organization's capabilities in analyzing the data, deriving intelligence from it, and leveraging it to make forward looking decisions. It should also be defined by the organization's capability in creating new data streams and aggregating them into its data warehouses.

As such, my definition of Dark Data is fairly expansive:

Dark Data Defined - Dark data is data and content that exists and is stored, but is not leveraged and analyzed for intelligence or used in forward looking decisions. It includes data that is in physical locations or formats that make analysis complex or too costly, or data that has significant data quality issues. It also includes data that is currently stored and can be connected to other data sources for analysis, but the Business has not dedicated sufficient resources to analyze and leverage. Finally (and this may be debatable), dark data also includes data that currently isn't captured by the enterprise, or data that exists outside of the boundary of the enterprise.

This basically demonstrates three conditions where data could be, but is not, leveraged in Big Data analytics: (i) it exists in a sufficient format, but the Business hasn't leveraged it yet, (ii) it exists, but it is too costly to clean or process, or (iii) it doesn't exist and it needs to be captured or acquired.

In some ways, Dark Data is the opposite of Big Data.




The Problems with Siloed Databases Part 2

I received several comments on my last post, Please, Stop Creating Microsoft Access Databases, and thought I'd use today's post to respond to some of them.

  • "It's more often poor planning, lack of knowledge transfer and/or changes of use over time." - I completely agree that this is what makes the ongoing database and application support complex. I accept all of these as reality, so the question is: are non-developers or business users developing databases with structures and documentation that simplify changes? If they are performing one-time data analysis, then maybe this isn't a concern, but databases that will be used and updated over time should be managed by database developers and DBAs who are trained to support enhancements and changes.

  • "The underlying problem here isn't MS Access." - MS Access is unique in that it is widely available to business users, it easily allows saving databases to desktop hard drives that are difficult to administer, and has application functionality such as forms and reports. So yes, one of the underlying problems is MS Access because of its capabilities and how it is deployed.

  • "Can we arrange a process of promotion, where ad hoc DBs get promoted to proper data in due course?" - Yes, this is possible and ideal, but hard to govern and sometimes difficult to staff. It depends on how much database development is in practice and the size of the organization. My policy would read something like:
    • Register all non-IT database development in a directory.
    • Allow databases to be created for one-time data analysis, but archive them in three months or less.
    • Prototype databases for single user use, but if multiple users need access or if forms/reports are needed, then the prototype should be transferred to IT so that it can be properly developed and managed.

  • "Non-developers create Access databases because they need to get some work done" - I agree, and relying on IT isn't always the answer. However, most non-developers don't have an objective to create a database - they are usually looking to develop a workflow or to perform some analytics or reporting and realize they need a database to store the data. To that end, I think it is better for IT organizations to provide "self-service" tools to manage departmental workflows (see my post: In my CIO toolkit), or tools for self-service analytics/reporting.

  • "It turns out it is about dark data and how organizations should better consider their enterprise data handling." "Dark data can be a problem." - Indeed, that is really what the post is about. My definition of dark data is "Data that isn't documented or easily understood, data that can't easily be connected to other data sources, or data that can't easily be used in analytics." So when you have poorly planned, siloed databases, you have a dark data issue.
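
The three-part promotion policy from the third response above (register, archive one-time databases, transfer shared ones to IT) can be sketched as a simple rule check. This is only an illustration under stated assumptions: the `RegisteredDatabase` fields, the 90-day reading of "three months or less", the action labels, and the sample directory entries are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "three months or less" is approximated as 90 days.
ARCHIVE_AFTER = timedelta(days=90)


@dataclass
class RegisteredDatabase:
    """One entry in a hypothetical directory of non-IT databases."""
    name: str
    owner: str
    created: date
    one_time_analysis: bool
    user_count: int = 1
    has_forms_or_reports: bool = False


def recommended_action(db, today):
    """Apply the policy rules in order of severity."""
    # Multiple users or forms/reports mean the prototype should move to IT.
    if db.user_count > 1 or db.has_forms_or_reports:
        return "transfer to IT"
    # One-time analysis databases are archived once they pass the age limit.
    if db.one_time_analysis and today - db.created >= ARCHIVE_AFTER:
        return "archive"
    return "keep as prototype"


# Made-up directory entries for illustration only.
directory = [
    RegisteredDatabase("q1_pricing_study", "analyst_a", date(2024, 1, 10),
                       one_time_analysis=True),
    RegisteredDatabase("lead_tracker", "analyst_b", date(2024, 3, 1),
                       one_time_analysis=False, user_count=4),
    RegisteredDatabase("scratch_model", "analyst_c", date(2024, 5, 20),
                       one_time_analysis=False),
]

today = date(2024, 6, 1)
actions = {db.name: recommended_action(db, today) for db in directory}
```

The check only works if the first rule of the policy is followed: a database that was never registered in the directory never gets evaluated.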

Please, Stop Creating Microsoft Access Databases!

It all starts very simply and innocently with someone needing a place to store data that is a little bit more than what is convenient to store in Microsoft Excel. She thinks, "It's just a couple of tables and I already have MS Access on my desktop, so this shouldn't be too hard." The bad news is that if this database is "successful" it will likely draw others to it, forcing the SadBA (self appointed database business analyst) to consider granting access to her desktop stored database, developing forms, and producing reports. Even worse is when new opportunities present themselves and she decides to create additional MS Access databases. She only calls in IT if she needs something scripted, such as more advanced forms or jobs that can load and transform new data.

Flash forward a few years, imagine this behavior repeated across multiple organizations and locations, and you have a classic database mess. IT will probably be asked to perform heroics when a desktop fails and there isn't a sufficient backup, or when an MS Office upgrade is being planned and these databases need testing, or when the SadBA is leaving the company and no one understands how to support these databases.

As big a database mess as this is, the underlying data mess can be a daunting maze to unwind. Consider even a single database: a trained DBA would need to understand the underlying data model, document any scripts or procedures loading data, and itemize reporting needs. If any forms were developed, and especially if multiple people are using the database as part of a workflow, then you'll need a Business Analyst and possibly an Application Developer to consider how these business processes are accomplished.

Perhaps you've never had to read someone else's code?


Rebuilding a database when it likely has poor naming conventions, missing data relationships, and a complete lack of referential integrity requires a DBA with the skills of a linguistic anthropologist. Now tell this DBA that there are multiple databases that contain duplicate and related data, and they'll need some special software tools to normalize the data model, load in data from multiple sources, and match, merge, and de-duplicate records before even considering how to replicate existing functionality.
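
To give a rough sense of what "match, merge, and de-duplicate" involves, here is a deliberately small Python sketch. It assumes records from two hypothetical databases can be matched on a trimmed, lowercased email address, which is a strong simplification; real matching tools use fuzzier rules, and the field names and sample rows below are made up.

```python
def match_key(record):
    """Build a crude match key: the email address, trimmed and lowercased."""
    return (record.get("email") or "").strip().lower()


def merge_sources(sources):
    """Merge records that share a match key, keeping the first non-blank
    value seen for each field. Records with no usable key are returned
    separately so they can be reviewed by hand rather than silently dropped."""
    merged = {}
    unmatched = []
    for source in sources:
        for record in source:
            key = match_key(record)
            if not key:
                unmatched.append(record)
                continue
            target = merged.setdefault(key, {})
            for field, value in record.items():
                # Fill a field only if it is still blank in the merged record.
                if value and not target.get(field):
                    target[field] = value
    return list(merged.values()), unmatched


# Hypothetical extracts from two siloed databases.
sales_db = [{"email": "Ann@Example.com", "name": "Ann Lee", "phone": ""}]
marketing_db = [
    {"email": "ann@example.com ", "name": "", "phone": "555-0100"},
    {"email": "bob@example.com", "name": "Bob Ray", "phone": None},
]

customers, needs_review = merge_sources([sales_db, marketing_db])
```

Even this toy version has to make policy decisions (which source wins, what counts as the same person), and those decisions are exactly what is undocumented when the databases were built in isolation.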


Why is this a Big Concern?


Even smaller companies are recognizing the benefits of analytics and Big Data processing. It's relatively easy for a business user to perform analysis on a single data source, or even a handful if the data relationships are understood. This can easily be done in MS Excel or, even better, by selecting and correctly leveraging a self service BI tool. But if there are numerous databases stored all over the place with undocumented data dictionaries, unknown data quality, and little understanding of how to relate data sources, then it is virtually impossible to perform broad analytics on them. They are part of the company's dark data - data that exists but can't easily be analyzed for intelligence or insight.

Is this your company's sales data, customer data, marketing data, or financial data? Most likely, the answer is yes, because it's this data that business users work with the most. If the business user needed to perform a quick analysis and IT wasn't accessible or available, or lacked the agility to deliver a solution, then it is likely that a SpreadSheet Jockey or a SadBA established one.

What is the first step to solving this issue? Please, stop creating MS Access Databases!


About Isaac Sacolick

Isaac Sacolick is President of StarCIO, a technology leadership company that guides organizations on building digital transformation core competencies. He is the author of Digital Trailblazer and the Amazon bestseller Driving Digital and speaks about agile planning, devops, data science, product management, and other digital transformation best practices. Sacolick is a recognized top social CIO, a digital transformation influencer, and has over 900 articles published at InfoWorld, CIO.com, his blog Social, Agile, and Transformation, and other sites. You can find him sharing new insights @NYIke on Twitter, his Driving Digital Standup YouTube channel, or during the Coffee with Digital Trailblazers.