- Dark Data is like that furniture you have in that Dark Cupboard - Dark data is the cute name given to all that data an organization gathers that is not part of their day-to-day operations.
- What Is Dark Data? - It’s that neglected data that accumulates in log files and archives that nobody knows what to do with.
- Of Dark Data, Beware You Must - Dark data is usually defined as data that is kept “just in case” but hasn’t (so far) found a proper usage, or can be harvested and leveraged beyond its primary (intended) usage.
All of these definitions are accurate, but somewhat limited. In "What Is Big Data? The Real Challenges Beyond Volume, Velocity and Variety", I provided my definition of Big Data:
Big Data Defined - Big Data is not defined by its data management challenges, but by the organization's capabilities in analyzing the data, deriving intelligence from it, and leveraging it to make forward-looking decisions. It should also be defined by the organization's capability in creating new data streams and aggregating them into its data warehouses.
As such, my definition of Dark Data is fairly expansive:
Dark Data Defined - Dark data is data and content that exists and is stored, but is not leveraged or analyzed for intelligence or used in forward-looking decisions. It includes data that is in physical locations or formats that make analysis complex or too costly, or data that has significant data quality issues. It also includes data that is currently stored and can be connected to other data sources for analysis, but that the Business has not dedicated sufficient resources to analyze and leverage. Finally (and this may be debatable), dark data also includes data that currently isn't captured by the enterprise, or data that exists outside of the boundary of the enterprise.
This basically describes three conditions in which data could be, but is not, leveraged in Big Data analytics: (i) it exists in a usable format, but the Business hasn't leveraged it yet, (ii) it exists, but it is too costly to clean or process, or (iii) it doesn't exist yet and needs to be captured or acquired.
In some ways, Dark Data is the opposite of Big Data.