What Technologies Work Best for Decentralized Data Scientists?


If data scientists, analysts, quants, or BI specialists are in a centralized department, then that group can staff and train its members to support one or more technologies based on business need. Technologies such as data processing, analytics, statistics, visualization, or data mining are good examples. 

But what happens when these resources are scattered across multiple departments? One department may have an expert data scientist, another may have a small group doing internal reporting, and a third group might have outsourced its analytic function. If data scientists in the organization are decentralized with different goals, skills, and operating models, can IT still provide a common set of Big Data and analytic tools and services to the organization and support these different functions?

The answer is yes, but decentralization leads to a different set of technologies and IT services. Since different users will have different goals, capability needs, and skills, IT needs a Swiss army knife of data management and analytic technologies and related services.

Self Service BI - The Analytic Swiss Army Knife?


That Swiss army knife has come with new technologies branded as "self service" BI that aim to enable business users - and not IT - to solve many data processing, analytics, or visualization tasks. The software companies developing these technologies recognize that IT can be a bottleneck to solving data challenges and have developed products that take the coding out of data tasks. With these tools, you can aggregate data sets, perform joins, cleanse data, map data, perform analytical calculations, identify trends, seek outliers, and develop dashboards - all with minimal coding!

Data scientists working in different departments can make great use of these tools. Imagine a marketing analyst who blends the marketing database with a social networking feed to develop insights on prospects. Consider someone in sales ops who develops dashboards for sales directors, making it easier to understand and act on the sales pipeline. A financial analyst can develop common reporting dashboards and department-specific reports.

But these tools, deployed without defined practices and governance, will create a new generation of potential data silos, bury analytical calculations, create another form of spreadsheet jockey, or produce too many dashboards. Users will create workarounds to performance issues or duplicate data in order to make today's analysis more convenient. They might expose sensitive data to too many people in the organization or violate privacy or compliance constraints when moving or storing data.

The Role of IT in Self Service BI Programs


So with these great tools come even greater responsibilities. For brave technologists and CIOs embracing a decentralized data strategy, the task does not end with identifying talent, selecting and implementing "self service" technologies, and training. IT must define new data practices and governance, clearly identifying the responsibilities of business users and demonstrating the value of IT by providing a matching set of data services.

Where are workbooks versioned? How are analytical calculations published, validated, and tested? How does one request assistance integrating new data or help solving a query performance issue? How are new tools evaluated and upgrades tested? What types of documentation are required, where is it published, and how often is it updated? How is security enabled? How does the organization measure data quality? What visualization standards will make it easier for enterprise users to leverage data and dashboards in their decision making?

These questions need technology solutions and service definitions. The CIO needs to define a new set of data management practices and lead the organization to be more data driven.

I suspect that as organizations become more data driven, they will need more data science skills, those skills are more likely to be deployed across the organization, and self-service BI programs are therefore more likely to be established.


Five Data Management Practices IT Needs to Better Support Data Driven Organizations


Last week, I posted 5 Agile Leadership Practices Where CIOs Can Help Data Scientists to provide solutions to messy data, data landfills, data silos, and other outdated data practices that lead to data issues. That post covered cultural practices and organizational principles that make agile teams and organizations successful and how to transfer them to a data science or analytics practice.

This post follows up on the technology side and what data management IT practices and services are essential to establishing or scaling a big data analytics program.

My focus is on scaling the organization's ability to train or hire new data scientists, introduce more analytical capabilities, improve data quality, and aggregate new data.

  • Provide collaboration tools and change management practices - Almost nothing frustrates me more than seeing a complex spreadsheet emailed between colleagues who intend to edit it collaboratively. The sender gets multiple versions of the original spreadsheet emailed back and has the arduous task of merging them. There are far better ways to work on documents together or to share access to them, including Office 365, Google Drive, SharePoint, Jive, or Box. This isn't a technology issue today - it's a training issue. Getting business users to phase out behaviors that create duplicate data and cumbersome (email driven) workflows requires IT's ongoing participation in select business processes to help foster change.

  • Proactively monitor database performance - No one is happy when a query is slow, a dashboard takes too long to load, or there is a delay in processing a data feed. Not happy is putting it mildly; furious and frustrated is more like it. What can IT do? Be proactive! Monitor and track database query performance and dashboard load times so you know that performance is degrading before users do. Track data load processes and define operational practices to address processes that are running behind. Better yet, leverage cloud instances and automate adding or shrinking capacity based on user activity and performance measures.

  • Document databases - Relational diagrams, data flows, defined calculations, data dictionaries, database connection parameters - how many of these do you have documented across critical databases in simple formats that business users - not DBAs - can consume? How many of these are in formats that make it easy to update and maintain? Is the practice defined so that documentation is updated proactively? Organizations that aim to increase the number of data scientists or other analytical capabilities need documentation, and ideally tools for documentation, in order to scale the practice.

  • Provide Data Warehouse and ETL Services - Users can open a help desk ticket to procure new software, get help with remote access, or get support on using an enterprise application. Are database services as well defined? If a user just received a large spreadsheet, can they request support to load it into a database? If a new dashboard is running slow, can they get help tuning the data model, reviewing the query, or getting indexes built? If there is a new prospect list Marketing wants to leverage, is there a defined practice to connect to the source and load data in? To be competitive, IT departments need to take steps to transform commonly requested data practices into business-as-usual (BAU) services.

  • Define data quality measurement practices - The analysts and data scientists working with data have to make the best of the data's quality. Sometimes that means ignoring issues; other times they will create complex formulas and other operations to cleanse data. Vocal analytical teams will highlight data issues so that there is a better chance they can be addressed earlier, where the data is collected or processed.

    What can IT do? IT can help automate queries and publish data quality metrics. How many bad emails have come in from different marketing lists? Which salespeople are entering the least prospect metadata? What are the primary sources of duplicate records? The IT team can also review and recommend data quality tools that enable data stewards to develop cleansing rules and handle exceptions that require manual corrections.
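To make the data quality questions above concrete, here is a minimal sketch in Python of how they could be turned into metrics that IT publishes on a schedule. Everything in it is an assumption for illustration: the sample records, the field names (email, source, owner, title, phone), the simple email pattern, and the function names are all hypothetical, and a real implementation would more likely run as queries against the CRM or marketing database.

```python
import re
from collections import Counter, defaultdict

# Hypothetical prospect records; all field names and values are illustrative.
PROSPECTS = [
    {"email": "ann@example.com", "source": "webinar", "owner": "lee",
     "title": "CTO", "phone": "555-0100"},
    {"email": "bob@example", "source": "purchased_list", "owner": "kim",
     "title": "", "phone": ""},
    {"email": "ANN@example.com", "source": "purchased_list", "owner": "kim",
     "title": "VP", "phone": ""},
    {"email": "", "source": "tradeshow", "owner": "lee",
     "title": "Director", "phone": "555-0101"},
]

# Deliberately simple pattern (an assumption): something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def bad_emails_by_source(records):
    """Count missing or malformed emails per marketing list (source)."""
    counts = Counter()
    for rec in records:
        if not EMAIL_RE.match(rec.get("email") or ""):
            counts[rec["source"]] += 1
    return dict(counts)


def duplicates_by_source(records):
    """Count records whose normalized email was already seen, per source."""
    seen = set()
    counts = Counter()
    for rec in records:
        key = (rec.get("email") or "").strip().lower()
        if not key:
            continue  # blank emails are reported by bad_emails_by_source
        if key in seen:
            counts[rec["source"]] += 1
        else:
            seen.add(key)
    return dict(counts)


def metadata_fill_rate_by_owner(records, fields=("title", "phone")):
    """Share of the given metadata fields each salesperson has filled in."""
    filled = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        for field in fields:
            total[rec["owner"]] += 1
            if rec.get(field):
                filled[rec["owner"]] += 1
    return {owner: filled[owner] / total[owner] for owner in total}


if __name__ == "__main__":
    print("Bad emails by source:", bad_emails_by_source(PROSPECTS))
    print("Duplicates by source:", duplicates_by_source(PROSPECTS))
    print("Metadata fill rate by owner:", metadata_fill_rate_by_owner(PROSPECTS))
```

Publishing numbers like these on a regular basis gives data stewards and business leaders a shared view of where quality problems originate, which is the point of defining the practice rather than leaving each analyst to cleanse data privately.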

Again, these data management practices are primary ones to invest in if scaling a data science or analytics practice. I can discuss data platforms, architecture, infrastructure/cloud scaling, data security and other technology areas in future posts.


5 Agile Leadership Practices Where CIOs Can Help Data Scientists

I've compiled a number of posts on data landfills and other bad data practices and have made a commitment, at least on Social, Agile, and Transformation, to begin providing solutions.

I've always felt that other disciplines would benefit from well established technology practices. Agile practices have enabled software development teams to find sponsors, prioritize work, change the culture, ensure work gets done, and market their accomplishments. I think data scientists face similar challenges and should benefit from many agile practices that have helped transform software development organizations.

I say many, not all, because while software development is often a collaborative practice performed by teams, that isn't always the case in data analytics work. Data scientists may not be in the same organization (team) and are often working individually or in pairs on different analytics. So while many agile practices are relevant to data science work, they have to be adapted to the nature of how this work gets done.

Also, in this post I've started with the leadership practices and might cover management practices in a follow up post. Key agile leadership practices are below:

  • Sponsor work - Data scientists, data geeks, quants, data analysts, BI specialists - all go by different names in different organizations, but business leaders don't always know how to best engage their capabilities or services. The CIO can lead the way by sponsoring analytics projects or drawing attention to a team or individual's capabilities. The CIO also has access to the organization and can help network departments that have high value data analytics work and are ready to partner with or hire data scientists. Sponsoring the work begins to establish an "Owner" role, similar to an agile product owner role, that can define a vision, articulate business value, and prioritize work.

  • Address the culture - Becoming a data driven organization is not just about having data scientists; it requires a commitment top down and bottom up to leverage data in decision making. This is often a culture change that requires leaders to educate the organization and find ways to align on simple practices. One of them is to educate the organization to ask questions. Agile is not just a process - it's a culture change that requires teams and organizations to think agile.

  • Establish practices for prioritization - Prioritization is a key practice for agile technology teams that have to align their efforts on a product release or development sprint to features and fixes that provide the highest business value. Data scientists face the same challenge in determining what questions to answer or analytics to prioritize. Leveraging agile practices and tools to help make the data scientists' workload transparent and establishing practices to prioritize work is a good place for CIOs to add value. 

  • Review results and ask questions - Agile development teams will demo their work after the sprint and answer questions from sponsors. Data scientists would benefit from adopting a similar practice by scheduling analytics reviews where they can showcase a visualization, tell a story, and suggest follow up work. CIOs can help by promoting these sessions, attending, participating, and asking good questions.

  • Get out of the way - Agile has its self-organizing principles, enabling teams to have some authority around how they organize work to get things done. Data scientists also need a little bit of freedom to be who they are - scientists. Sometimes that means blazing a trail in new areas - new technologies or new data sources, for example - to determine if they are useful to get a job done. Sometimes that means creating some workarounds, or creating "data processing debt" (more on this in another post, but this is the data analogy to technical debt) in order to get a job done on time.

While I don't think these are new concepts, I haven't heard too many data scientists and their managers describe their culture or work with these practices. Similarly, while CIOs are more often consumed by Big Data platforms, I rarely hear them talk about aiding data scientists with basic practices. Common ground?

Killing Bad Data Practices - Acknowledging The Problem is Half The Battle

I posted on LinkedIn earlier this week The Big Data Challenges All Organizations Face, summarizing some symptoms of and solutions to siloed databases and ungoverned data practices. If you work on data, manage the databases, or rely on data to make decisions, then you probably understand bad data issues and can relate to some of the symptoms:

  • Do you email spreadsheets between coworkers to edit and review?

  • Are there only a select few people in the organization capable of pulling or interpreting data out of core systems because of different data quality issues?

  • Do presentations provide insights backed by data sources and assumptions?

  • Does it seem like you have hundreds of reports and dozens of dashboards but none of them suit your day to day needs?

  • Do you start a new analysis by cutting a new data set, or are you able to leverage defined tools to connect to predefined data repositories?

My solutions involve better governed self service BI programs, partnering with departments that have critical data needs like the CMO and marketing, and getting alignment on the data platforms needed for growth.

Need some examples? I'm sure you can cite some of your own, but these Excel horror stories collected by the European Spreadsheet Risk Interest Group should frighten any data scientist, data architect, or data driven business executive.

My next posts on this subject will be solutions focused. Remember, acknowledging there is a problem and recognizing its impact is half the battle.





About Isaac Sacolick

Isaac Sacolick is President of StarCIO, a technology leadership company that guides organizations on building digital transformation core competencies. He is the author of Digital Trailblazer and the Amazon bestseller Driving Digital and speaks about agile planning, devops, data science, product management, and other digital transformation best practices. Sacolick is a recognized top social CIO, a digital transformation influencer, and has over 900 articles published at InfoWorld, CIO.com, his blog Social, Agile, and Transformation, and other sites. You can find him sharing new insights @NYIke on Twitter, his Driving Digital Standup YouTube channel, or during the Coffee with Digital Trailblazers.