Last week, I posted 5 Agile Leadership Practices Where CIO Can Help Data Scientists to provide solutions to messy data, data landfills, data silos, and other outdated data practices that lead to data issues. That post covered cultural practices and organizational principles that make agile teams and organizations successful and how to transfer them to a data science or analytics practice.
This post follows up on the technology side: which data management practices and services IT needs to provide in order to establish or scale a big data analytics program.
My focus is on scaling the organization's ability to train or hire new data scientists, introduce more analytical capabilities, improve data quality, and aggregate new data.
- Provide collaboration tools and change management practices - Almost nothing frustrates me more than seeing a complex spreadsheet emailed between colleagues who intend to edit it collaboratively. The sender gets back multiple versions of the original spreadsheet and then has the arduous task of merging them. There are far better ways to work on documents together or to share access to them, including Office 365, Google Drive, SharePoint, Jive, or Box. This isn't a technology issue today - it's a training issue. Getting business users to phase out behaviors that create duplicate data and cumbersome (email driven) workflows requires IT's ongoing participation in select business processes to help foster change.
- Proactively monitor database performance - No one is happy when a query is slow, a dashboard takes too long to load, or there is a delay in processing a data feed. "Not happy" is putting it mildly; furious and frustrated is more like it. What can IT do? Be proactive! Monitor and track database query performance and dashboard load times so that you know performance is degrading before users do. Track data load processes and define operational practices to address processes that are running behind. Better yet, leverage cloud instances and automate adding or shrinking capacity based on user activity and performance measures.
- Document databases - Relational diagrams, data flows, defined calculations, data dictionaries, database connection parameters - how many of these do you have documented across critical databases in simple formats that business users - not DBAs - can consume? How many of these are in formats that are easy to update and maintain? Is the practice defined so that documentation is updated proactively? Organizations that aim to increase the number of data scientists or add other analytical capabilities need documentation, and ideally tools for documentation, in order to scale the practice.
- Provide Data Warehouse and ETL Services - Users can open a help desk ticket to procure new software, get help with remote access, or get support on using an enterprise application. Are database services as well defined? If a user just received a large spreadsheet, can they request support to load it into a database? If a new dashboard is running slow, can they get help tuning the data model, reviewing the query, or getting indexes built? If there is a new prospect list Marketing wants to leverage, is there a defined practice to connect to the source and load the data? To be competitive, IT departments need to take steps to transform commonly requested data practices into business-as-usual (BAU) services.
- Define data quality measurement practices - The analysts and data scientists working with data have to make the best of the data's quality. Sometimes that means ignoring issues; other times they will create complex formulas and other operations to cleanse data. Vocal analytical teams will highlight data issues so that there is a better chance they can be addressed earlier, where the data is collected or processed.
What can IT do? IT can help automate queries and publish data quality metrics. How many bad emails have come in from different marketing lists? Which salespeople are entering the least prospect metadata? What are the primary sources of duplicate records? The IT team can also review and recommend data quality tools that enable data stewards to develop cleansing rules and handle exceptions that require manual corrections.
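To make the idea of automated data quality metrics concrete, here is a minimal sketch in Python. It is an illustration, not a recommended tool: the record layout (dictionaries with `source` and `email` keys), the function name `data_quality_metrics`, and the deliberately loose email pattern are all assumptions of mine, and a real program would more likely run equivalent queries against the database on a schedule.

```python
import re
from collections import Counter

# Deliberately loose pattern (an assumption for illustration): something@something.tld
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def data_quality_metrics(records):
    """Count bad emails and duplicate emails per source list.

    `records` is an iterable of dicts with hypothetical keys
    "source" (e.g. the marketing list name) and "email".
    """
    totals = Counter()
    bad_emails = Counter()
    duplicates = Counter()
    seen = set()  # normalized emails already encountered

    for rec in records:
        source = rec["source"]
        email = (rec.get("email") or "").strip().lower()
        totals[source] += 1

        if not EMAIL_PATTERN.match(email):
            bad_emails[source] += 1
            continue

        # A repeat of an email seen earlier counts against the later source.
        if email in seen:
            duplicates[source] += 1
        else:
            seen.add(email)

    return {
        source: {
            "records": totals[source],
            "bad_emails": bad_emails[source],
            "bad_email_rate": bad_emails[source] / totals[source],
            "duplicates": duplicates[source],
        }
        for source in totals
    }


if __name__ == "__main__":
    sample = [
        {"source": "webinar", "email": "a@example.com"},
        {"source": "webinar", "email": "not-an-email"},
        {"source": "tradeshow", "email": "A@Example.com "},
        {"source": "tradeshow", "email": "b@example.com"},
    ]
    for source, metrics in data_quality_metrics(sample).items():
        print(source, metrics)
```

Publishing numbers like these per source on a regular schedule gives data stewards a starting point for deciding which lists or entry points need cleansing rules first.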
Again, these data management practices are the primary ones to invest in when scaling a data science or analytics practice. I can discuss data platforms, architecture, infrastructure/cloud scaling, data security, and other technology areas in future posts.