Wednesday, April 10, 2013

Dark Data - A Business Definition

I mentioned dark data in one of my recent posts covering the issues with creating Microsoft Access databases and with siloed databases in general. My definition of dark data is broad, and more expansive than those in some posts that I've found:


All of these definitions are true, but somewhat limited. In "What Is Big Data? The Real Challenges Beyond Volume, Velocity and Variety," I provided my definition of Big Data:

Big Data Defined - Big Data is not defined by its data management challenges, but by the organization's capabilities in analyzing the data, deriving intelligence from it, and leveraging it to make forward-looking decisions. It should also be defined by the organization's capability in creating new data streams and aggregating them into its data warehouses.
As such, my definition of Dark Data is fairly expansive:
Dark Data Defined - Dark data is data and content that exists and is stored, but is not leveraged and analyzed for intelligence or used in forward-looking decisions. It includes data that is in physical locations or formats that make analysis complex or too costly, or data that has significant data quality issues. It also includes data that is currently stored and can be connected to other data sources for analysis, but that the Business has not dedicated sufficient resources to analyze and leverage. Finally (and this may be debatable), dark data also includes data that currently isn't captured by the enterprise, or data that exists outside of the boundary of the enterprise.
This definition covers three conditions in which data could be, but is not, leveraged in Big Data analytics: (i) it exists in a usable format, but the Business hasn't leveraged it yet; (ii) it exists, but it is too costly to clean or process; or (iii) it doesn't exist yet and needs to be captured or acquired.
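
For readers who think in code, the three conditions above can be sketched as a simple classification rule. This is only an illustration of the definition, not a tool I use: the `DataSource` fields, the `DarkDataCondition` names, and the `classify` function are all hypothetical, and in practice deciding whether a source is "usable" or "too costly" is a business judgment rather than a boolean flag.

```python
from dataclasses import dataclass
from enum import Enum


class DarkDataCondition(Enum):
    """Hypothetical labels for the three conditions, plus the non-dark case."""
    NOT_DARK = "leveraged in analytics"
    UNLEVERAGED = "(i) usable format, but not yet leveraged"
    TOO_COSTLY = "(ii) exists, but too costly to clean or process"
    NOT_CAPTURED = "(iii) not captured, or outside the enterprise"


@dataclass
class DataSource:
    """Illustrative description of a data source (all fields are assumptions)."""
    name: str
    captured: bool          # is the data stored inside the enterprise today?
    usable_format: bool     # is it clean and accessible enough to analyze?
    used_in_analysis: bool  # does the Business actually analyze it?


def classify(source: DataSource) -> DarkDataCondition:
    """Map a data source to one of the three dark data conditions."""
    if not source.captured:
        return DarkDataCondition.NOT_CAPTURED
    if not source.usable_format:
        return DarkDataCondition.TOO_COSTLY
    if not source.used_in_analysis:
        return DarkDataCondition.UNLEVERAGED
    return DarkDataCondition.NOT_DARK


# Made-up examples, one per condition.
sources = [
    DataSource("siloed Access database", True, False, False),
    DataSource("clean but unused transaction log", True, True, False),
    DataSource("external data not yet acquired", False, False, False),
]
for s in sources:
    print(f"{s.name}: {classify(s).value}")
```

The order of the checks mirrors the definition: data that isn't captured can't be cleaned, and data that can't be cleaned affordably can't be leveraged.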

In some ways, Dark Data is the opposite of Big Data.


