Becoming Data-Driven and Improving Data Health with Talend’s CTO Krishna Tammana

Many people are experts in processing data, managing big data sets, performing data analytics, and storytelling with data. And then, there are true data experts who have experience working across industries, technologies, and different data complexities and provide lessons and wisdom on how organizations can become more data-driven.

By Isaac Sacolick

One of those data experts is Krishna Tammana, Talend’s CTO, who also has experience working at Splunk, Dun & Bradstreet, E*Trade, Sybase, and several startups.  So when I was given the opportunity to interview Krishna on becoming a data-driven organization and improving data health, I was excited to learn from one of the best in the industry.

You should watch the full interview here as there’s too much to cover in a single blog post. But here are some key learnings for organizations getting started on their data journeys.

Data-Driven Organizations Continuously Monitor Data Health

Krishna describes a data journey that starts with improving data quality, but because data is changing all the time, it requires organizations to trust data on an ongoing basis. Much like IT has network operations centers (NOCs) to monitor infrastructure, networks, and applications, and infosec has security operations centers (SOCs) to monitor and respond to security threats, organizations also need data operations centers to monitor data health and address dataops issues.
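To make the data operations center idea concrete, here is a minimal sketch of the kind of automated checks such a team might run continuously. The dataset shape (a list of dicts), the field names, and the thresholds are illustrative assumptions, not a reference implementation:

```python
from datetime import datetime, timezone

def null_rate(records, field):
    """Fraction of records where the field is missing or None."""
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records) if records else 1.0

def freshness_hours(records, ts_field):
    """Hours since the most recent record timestamp (ISO 8601, tz-aware)."""
    latest = max(datetime.fromisoformat(r[ts_field]) for r in records)
    return (datetime.now(timezone.utc) - latest).total_seconds() / 3600

def health_report(records, required_fields, ts_field, max_null_rate=0.05):
    """Summarize data health: row count, staleness, fields failing the null check."""
    failing = [f for f in required_fields
               if null_rate(records, f) > max_null_rate]
    return {"row_count": len(records),
            "stale_hours": freshness_hours(records, ts_field),
            "failing_fields": failing}
```

A real data operations center would run checks like these on a schedule against production pipelines and alert data stewards when a report fails.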

Recognizing the need to monitor data health is a key first step for scaling data-driven organizations and practices.

Another step organizations must take to become data-driven is to increase data literacy. Data is created all over the organization, and knowledge of what fields mean and how data analysts should use them often resides with one or a few subject matter experts. Data catalogs are important tools for centralizing data activity, sharing knowledge, and governing data policies.

How do data catalogs work? They become the hub of data activity in the organization where subject matter experts create data dictionaries and other essential documentation, while knowledge workers learn how to tap into the data they need to do their jobs. Data catalogs are thus a collaboration platform between experts, analysts, and decision-makers. They are the backbone for data-driven organizations, especially when role-based permissions give every employee access to data relevant to their jobs.
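As a simple illustration of that collaboration model (and not any vendor's actual catalog schema), imagine each catalog entry pairing a subject matter expert's data dictionary with role-based visibility:

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str                 # dataset name, e.g. "customers"
    owner: str                # subject matter expert responsible for it
    dictionary: dict          # field name -> plain-language meaning
    allowed_roles: set = field(default_factory=set)

    def visible_to(self, role):
        """Role-based permission check for catalog consumers."""
        return role in self.allowed_roles

# A subject matter expert documents the dataset once...
entry = CatalogEntry(
    name="customers",
    owner="crm-team",
    dictionary={"ltv": "Projected lifetime value in USD, refreshed nightly"},
    allowed_roles={"analyst", "marketer"},
)
```

...and analysts and decision-makers then look up what fields mean before using them, instead of pinging the expert each time.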

A third step is to assign clear roles and responsibilities for monitoring data health, managing the data operations center, and improving data catalogs. Krishna and I agree that this is one of the primary responsibilities of chief data officers, who often manage a team of data stewards with the skills, tools, and mandate to monitor data health.

This shift in responsibility is key for larger enterprises that seek to scale their data operations, because data analysts and subject matter experts asked to address data health often treat it as a secondary responsibility. But it is equally important for small and midsize businesses (SMBs) that seek to use data as a competitive differentiator.

Krishna states that one of his goals is to “enable knowledge workers to participate in the data journey seamlessly as opposed to creating silos. In our vision, we just call that self-service.”

Should You ETL or ELT the Data?

Krishna and I jumped into the data weeds of how, when, and where to address data health and transformations. Should you fix the data at the source or implement cleansing rules downstream in dataops? How should organizations leverage data lakes when data is created from IoT and other real-time data sources, stored in multiple clouds, and leveraged by data scientists in various machine learning experiments? How can marketers use a trust score to improve customer 360 data rather than just fixing CRM workflows or using a customer data platform’s limited data processing capabilities?

Krishna offers very practical advice on these questions as it’s not a one-size-fits-all architecture, solution, or data operation.  Krishna believes most organizations need to support “ETLT” because some transformations are more efficient to do upfront before the data is stored (ETL), while app developers and data scientists often need downstream transformations (ELT) specific to their analytics, machine learning algorithms, or customer experiences.
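Krishna's "ETLT" pattern can be sketched with a toy example. The functions and the in-memory "warehouse" below are hypothetical; the point is that universal cleanup happens once before load, while use-case-specific transformations run downstream against the stored data:

```python
# Upfront "ET": cheap, universal cleanup applied once before storage.
def extract_transform(raw_rows):
    cleaned = []
    for row in raw_rows:
        cleaned.append({
            "email": row["email"].strip().lower(),  # normalize once for everyone
            "amount": float(row["amount"]),         # enforce types at the source
        })
    return cleaned

# Downstream "LT": a use-case-specific transform run against stored data,
# so a marketer and a data scientist can each shape the same rows differently.
def high_value_customers(warehouse, threshold):
    return sorted({r["email"] for r in warehouse if r["amount"] > threshold})

raw = [{"email": " Ada@Example.COM ", "amount": "120.50"},
       {"email": "bob@example.com", "amount": "30"}]
warehouse = extract_transform(raw)  # "load" step: persist the cleaned rows
```

Putting the normalization upfront means every consumer benefits from it, while each downstream team keeps its own transformations close to its analytics.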

During the interview, I point out the importance of a versatile platform that lets engineers shift where and when different transformations run in data operations. Unfortunately, we often label data integration processes as data pipelines. The term connotes a rigid, build-once structure that, like the pipes in your house, is unlikely to change. The reality is that as data changes, analytics use cases grow, and regulations evolve, organizations must continuously develop and support their data pipelines.

How Machine Learning Is Simplifying Data Health

Data quality has its origins in rule-based and statistical methods that help data stewards normalize data sets and manage exceptions. But these approaches often don't scale for organizations that add new data sets regularly or whose data changes frequently. I wanted to know from Krishna how and where Talend is using machine learning to simplify and scale data health.

Krishna replied, “I call it DQ with IQ. It’s data quality intelligence by using machine learning to find more data quality issues easier and then also make suggestions on how to correct them.”

Machine learning can also help data scientists reduce their time in data wrangling and provide new feature engineering capabilities.
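The interview doesn't describe Talend's implementation, so as a stand-in, here is a simple statistical sketch of the detect-and-suggest pattern behind "DQ with IQ": flag anomalous values, then propose a correction. A real machine learning approach would learn per-column patterns rather than apply a fixed z-score rule:

```python
from statistics import mean, stdev

def flag_outliers(values, z_threshold=3.0):
    """Flag numeric values whose z-score exceeds the threshold and
    suggest the column mean as a naive correction. Illustrative only;
    a learned model would make far better suggestions."""
    mu, sigma = mean(values), stdev(values)
    suggestions = []
    for i, v in enumerate(values):
        if sigma and abs(v - mu) / sigma > z_threshold:
            suggestions.append({"index": i, "value": v,
                                "suggested": round(mu, 2)})
    return suggestions
```

The value of the approach is less in any single rule and more in surfacing likely issues, with suggested fixes, so data stewards review exceptions instead of scanning whole data sets.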

Requirements for Trusting Data and Becoming Data-Driven

So becoming data-driven and trusting data involves several requirements and implementation factors:

  • Improving data literacy by centralizing knowledge in data catalogs
  • Enabling data operations to monitor and correct data health issues continuously
  • Providing simple-to-use self-service data processing capabilities to scale utilization
  • Establishing nimble, multicloud data architectures as data pipelines evolve
  • Simplifying and automating data operations with machine learning capabilities

Krishna shares a wealth of additional information and insight, and I hope you will watch the full interview.

This post is brought to you by Talend.

The views and opinions expressed herein are those of the author and do not necessarily represent the views and opinions of Talend.

 
