a data science skill shortage, so it's unlikely most organizations have enough data scientists to perform all the analytics. Many organizations simply can't afford data scientists or lack the cachet to recruit them, and enabling citizen development programs is one way CIOs can address the technical skill gap. To be successful, most organizations need to consider training and outfitting "citizen" data scientists who can develop analytics and mentor colleagues to use them in both strategic and tactical decision making.
Kicking off Citizen Data Science Programs
I've already blogged on how to kick off a citizen data science program. Read this post to see how to find early adopters for the program, get buy-in to support the program, and start developing standards and practices. I've also shared what services citizen data scientists need to be successful, and how to assign data roles and responsibilities between data and technology teams. I've also suggested best practices for developing dashboards and laid out an agile process for finding value in dark data.
But I haven't spoken about technologies and platforms for citizen data scientists. Selecting platforms is a very important consideration if the program is to succeed in both the short and the long term. Keep in mind that all organizations already have some tools for business analysts to process data, including Excel and other legacy BI platforms. Organizations should look beyond these tools if they are serious about citizen programs. From my post on data governance challenges around Microsoft Excel, "The issue is, that Excel always made it too easy for business users to create splintered and derivative data sets." This is in addition to the very long list of Excel horror stories aggregated by the European Spreadsheet Interest Group. The other issue CIOs fear is that empowering citizens will lead to a proliferation of single-purpose reports and dashboards, similar to what many organizations implemented in their legacy BI solutions.
Data Visualization Tools Selection Criteria
So if you're going to outfit citizen data scientists, you have to consider some traditional business requirements and some newer, "big data" driven ones in order to pick tools that are appropriate for the size, scale, skill, and complexity of the organization, the underlying data, and the analytics required.
For starters, since becoming a data driven organization is key to successful digital transformation programs, consider reading six critical strategies for selecting breakthrough digital transformation platforms and my other post on six digital criteria to evaluate superior technology. These posts highlight a number of generic considerations when selecting technologies, such as (i) align on vision, strategic opportunity, and short term needs, (ii) use experts to define solution sets, (iii) perform detailed reviews of the user experiences, (iv) evaluate documentation and the health of the tool's ecosystem (data, integration, and developers), and (v) consider the organizational impact of the tool.
When selecting data visualization tools for data scientists, a number of more specific criteria emerge based on the people, data, analytics, and other constraints.
1. People, Skills, and Organizational Impact
These criteria require you to understand the needs of three types of users: (i) citizen data scientists who will be the primary developers of dashboards and analytics, (ii) data scientists, quants, and statisticians who may also use this tool but may have additional integration requirements, and (iii) end users of the completed dashboards and analytics.
|Number of citizen data scientists||More training and governance will be required for larger teams.|
|Skill levels of the citizen data scientists||If low skill, less sophisticated tools with easy user experiences will yield faster results. Tools that heavily rely on programming models may be difficult for novice groups.|
|Organization also has skilled data scientists, quants, or statisticians||Decide on whether they are in scope and if yes, consider their integration needs. Advanced data scientists may prefer data visualization tools with programming models that offer more flexibility and integration capabilities.|
|Number of departments that will leverage completed dashboards||More departments imply disparate use cases. Consider tools that have programming models or have mechanisms that enable reusing visuals.|
|Number of users that will access completed dashboards||If audiences are large, then the user experience of the dashboards and visuals should be a top criterion.|
Bottom line: These criteria should help you decide whether you need a simple and easy tool for a small, less sophisticated group or a more comprehensive tool aimed at higher skilled developers and greater organizational needs.
2. Data Management Considerations
You can't complete data discovery work, perform analytics or create dashboards without some consideration of the underlying data sources and their complexities.
|Number of data sources||Quantity is important, but more important is whether citizen data scientists will be incorporating new data sources on a regular basis.|
|Big data considerations?||Are you handling larger volumes, higher velocity, or greater variety of data sources and types?|
|Real time data?||Does your organization require processing data in real time?|
|Data quality, transformation, or master data considerations?||Are you connecting to relatively clean data sources, or do you expect that significant data processing and preparation will be needed? If the latter, you may need a data preparation tool such as those offered by Informatica, Talend, Alteryx, and Trifacta.|
|Enterprise data sources?||Most organizations will look to secure and automate data connections from enterprise data sources.|
|SaaS data sources?||Many SaaS providers have APIs for pulling data. Review whether the data visualization tool offers a direct connection to your SaaS platforms or if one is available through platforms such as IFTTT or Zapier.|
|IoT data sources?||Sensors often produce a large volume and velocity of data. You'll likely need data storage and stream processing technologies to handle IoT sources before connecting a data visualization tool.|
|Confidential, privacy considerations||Will you need to consider mechanisms to secure data and manage entitlements? If yes, then you need to review the security capabilities of the data visualization tool and also consider adding tools that mask and encrypt data elements.|
Bottom line: These criteria all speak to whether you require additional data integration, preparation, processing, or management tools in addition to any data visualization tools. Many of the data visualization tools come with some data preparation capabilities, and some market themselves as end-to-end data management tools. A few will try to sell you on the concept that all you need is them, with no other databases, ETL, or data integration tools, because they come with all the required capabilities.
So these criteria should help you determine whether this is a realistic proposition. More data sources, bigger data, more enterprise sources, and more complex data preparation work are all indicators that you will likely need additional data management tools. On the other hand, if you're working with relatively few, less complex data sources, then make sure to evaluate the data preparation capabilities of the data visualization tools and see if they are "good" and "easy" enough.
3. Other Selection Constraints
Before getting to the heart of the analysis, you'll want to consider other selection constraints.
|Legacy tools||Does your scope include phasing out any legacy BI or reporting tools? If yes, you'll want to consider what dashboards, reports, or analyses are in scope for conversion and where there is flexibility to modify output formats.|
|Business model||In addition to overall cost, you'll want to consider how the vendor's pricing scales with usage and whether that will create higher than expected costs as usage increases. This is a very important criterion for customer facing analytics, especially if customers will receive access to the data visualization tool.|
|Costs and budget||Pricing models may box out smaller organizations from selecting the more sophisticated tools. Can you afford it?|
|Regulation||Regulations may impose requirements on how and where data is stored and accessed. They may also require auditing, analytics lifecycle, documentation, and other data governance capabilities.|
|Hosting options||SaaS? Cloud? Data center? What options are available, and what are your organization's requirements?|
Bottom line: Technology selections need to consider financial, legal, logistical and other constraints. It's best practice to identify these up front to help limit the scope of the review.
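To make the business model question concrete, here is a minimal sketch of how you might project cost as usage grows. It assumes a hypothetical "base fee plus per-user fee" pricing model; the function name, the fee amounts, and the user counts are all illustrative assumptions, not any vendor's actual pricing.

```python
def projected_annual_cost(users, base_fee, per_user_fee, included_users=0):
    """Annual cost under a hypothetical 'base fee + per-user fee' model.

    All parameters are illustrative assumptions; substitute the terms
    from the vendor quotes you are actually comparing.
    """
    # Only users beyond the bundled allowance are billed per seat.
    billable_users = max(0, users - included_users)
    return base_fee + billable_users * per_user_fee


# Hypothetical growth scenarios: pilot, internal rollout, customer facing.
scenarios = [50, 200, 1000]
costs = {
    n: projected_annual_cost(n, base_fee=10_000, per_user_fee=300, included_users=25)
    for n in scenarios
}

for n, cost in costs.items():
    print(f"{n:>5} users -> ${cost:,.0f} per year")
```

Running the same projection for each short listed vendor's pricing terms can show early whether a tool that looks affordable for a pilot becomes expensive once customers receive access.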
4. Data Visualization and Analytic Capabilities
You'll spend most of your time evaluating data visualization tools based on their visualization capabilities, ease of use, and sophistication of analysis.
|Chart types available||Every visualization tool comes with a toolkit of chart types. All will have bar charts, pie charts, data tables, etc. but some will include geo mapping, heat maps, node graphs and other more sophisticated visuals. What's required versus nice to have?|
|One time or ongoing analysis||If you're conducting more one time discovery work, then you'll want to consider how easy it is to use "out of the box" analytics and review the tool's story telling capabilities. (Some good examples of story telling are here and here.)|
|Internal or customer facing||If you intend to develop customer facing analytics, then this has implications for the type of delivery expected (direct access versus PDF outputs, for example), whether there are style or branding considerations for the final product, security considerations (how to enable data entitlements), and performance considerations (speed becomes more critical).|
|Analytics needs||Aggregations? Trends? Modeling? Machine learning? You'll want to consider not only whether the tool has the capability, but how easy it is to use and whether you'll need to integrate with programming environments such as R or Python to implement these algorithms.|
|Visual configuration needs?||It's one thing to have the chart types desired, but then you should consider how easy they are to configure and the overall configuration capabilities. If you're doing customer facing visuals, then reviewing the visual configuration capabilities is important to ensure that the output meets minimal customer expectations.|
|Reusability? Standards?||If you plan to develop a large number of dashboards or analyses, you'll want to consider how to reuse and standardize elements such as dashboard layouts, chart configurations, calculations, expressions, and other elements that are programmed.|
Bottom line: These criteria all address the core capabilities of the tools and separate less sophisticated needs from those requiring more flexibility and analytical capability. You'll want to invest considerable effort investigating these capabilities, but be prepared to make compromises. Most tools can't be all things to all people, but many vendors will try to sell you on the idea that they can handle all your requirements. The best way to evaluate these tools is to run proofs of concept.
Data Visualization Tool Selection Process
- Define a tool selection committee and have them propose a charter - Keep this team small, but empower them to make decisions to avoid stakeholder conflicts.
- Use primary selection criteria to short list the tool set - There are a large number of data visualization tools on the market today, so use the criteria from people/organization, data management, and constraints to help narrow down the list.
- Commission proofs of concept to evaluate the visualizations and analytics - This is better than doing a paper evaluation. Have a small group of your proposed data scientists use the short listed tools against some of the short term needs and evaluate the output, effort, performance, and end user satisfaction.
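One way a selection committee might carry out the short listing step is with a simple weighted scoring matrix. The sketch below is only an illustration under stated assumptions: the weights, the 1 to 5 scores, and the tool names ("Tool A", "Tool B", "Tool C") are hypothetical placeholders, and a weighted sum is just one of several reasonable ways to combine criteria.

```python
# Hypothetical weights for the three primary criteria groups; a committee
# would set these to reflect its own priorities. They sum to 1.0.
WEIGHTS = {"people": 0.3, "data_management": 0.3, "constraints": 0.4}


def weighted_score(scores, weights=WEIGHTS):
    """Combine per-criterion scores (1 = poor fit, 5 = strong fit)."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def short_list(candidates, top_n=2):
    """Return the names of the top_n candidates by weighted score."""
    ranked = sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name]),
        reverse=True,
    )
    return ranked[:top_n]


# Illustrative scores only; these are not assessments of real products.
candidates = {
    "Tool A": {"people": 4, "data_management": 3, "constraints": 5},
    "Tool B": {"people": 5, "data_management": 2, "constraints": 2},
    "Tool C": {"people": 3, "data_management": 5, "constraints": 4},
}

finalists = short_list(candidates, top_n=2)
print(finalists)
```

The finalists would then go into the proofs of concept, where visualization and analytic capabilities are judged hands-on instead of on paper.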