It all starts very simply and innocently with someone needing a place to store data that is a little bit more than what is convenient to store in Microsoft Excel. She thinks, "It's just a couple of tables and I already have MS Access on my desktop", so this shouldn't be too hard. The bad news is that if this database is "successful" it will likely draw others to it forcing the
SadBA (self appointed database business analyst) to consider granting access to her desktop stored database, developing forms, and producing reports. Even worse is when new opportunities present themselves and she decides to create additional MS Access databases. She only calls in IT if she needs something scripted such as more advanced forms or jobs that can load and transform new data.
Flash forward a few years and consider if this behavior is repeated across multiple organizations and locations and you have a classic database mess. IT will probably be asked to perform heroics when a desktop fails and there isn't a sufficient backup, or when there is an MS Office upgrade being planned and these databases need testing, or when the SadBA is leaving the company and no one understands how to support these databases.
As big of a database mess this is, the underlying
data mess can be a daunting maze to unwind. Consider even a single database, a trained DBA would need to understand the underlying data model, document any scripts or procedures loading data, and itemize reporting needs. If any forms were developed and especially if multiple people are using the database as part of a workflow, then you'll need a Business Analyst and possibly an Application Developer to consider how these business processes are accomplished.
Perhaps you've never had to read someone else's code?
Rebuilding a database when it likely has poor naming conventions, missing data relationships, and a complete lack of referential integrity requires a DBA with the skills of a linguistic anthropologist. Now tell this DBA that there are multiple databases that contain duplicate and related data and they'll need some special software tools to normalize the data model, load in data from multiple sources, and match, merge and de-duplicate records, before even considering how to replicate existing functionality.
Why is this a Big Concern?
Even
smaller companies are recognizing the benefits of analytics and Big Data processing. It's relatively easy for a business user to perform analysis on a single data source, or even a handful if the data relationships are understood. This can easily be done in MS Excel or even better, by selecting and correctly leveraging a
self service BI tool. But if there are numerous databases stored all over the place with undocumented data dictionaries, unknown data quality, and little understanding of how to relate data sources, then it is virtually impossible to perform broad analytics on it. It is part of the
company's dark data - data that exists but can't easily be analyzed for intelligence or insight.
Is this your company's sales data, customer data, marketing data, or financial data? More likely, the answer is yes because it's this data that business users work with the most. If the business user needed to perform a quick analysis and IT wasn't accessible, available, or had the necessary agility to solution, then it is likely that a
SpreadSheet Jockey or a SadBA established a solution.
What is the first step to solving this issue? Please, stop creating MS Access Databases!