Many people are experts in processing data, managing big data sets, performing data analytics, and storytelling with data. And then, there are true data experts who have experience working across industries, technologies, and different data complexities and provide lessons and wisdom on how organizations can become more data-driven.
One of those data experts is Krishna Tammana, Talend’s CTO, who also has
experience working at Splunk, Dun & Bradstreet, E*Trade, Sybase, and
several startups. So when I was given the opportunity to interview
Krishna on becoming a data-driven organization and improving data health, I
was excited to learn from one of the best in the industry.
You should watch the full interview here as there’s too much to cover in a single blog post. But here are some key learnings for organizations getting started on their data journeys.
Data-Driven Organizations Continuously Monitor Data Health
Krishna describes a data journey that starts with improving data quality. But
because data changes all the time, organizations need to be able to trust their
data on an ongoing basis. Much like IT has network operations centers (NOCs) to
monitor infrastructure, networks, and applications, and infosec has security
operations centers (SOCs) to monitor and respond to security threats,
organizations also need data operations centers to monitor data health and
address dataops issues.
Recognizing the need to monitor data health is a key first step for scaling
data-driven organizations and practices.
Another step organizations must take to become data-driven is to increase data
literacy. Data is created all over the organization, and knowledge of what
fields mean and how data analysts should use them often resides with one or a
few subject matter experts. Data catalogs are important tools for centralizing
data activity, sharing knowledge, and governing data policies.
How do data catalogs work? They become the hub of data activity in the
organization where subject matter experts create data dictionaries and other
essential documentation, while knowledge workers learn how to tap into the
data they need to do their jobs. Data catalogs are thus a collaboration
platform between experts, analysts, and decision-makers. They are the backbone
for data-driven organizations, especially when role-based permissions give
every employee access to data relevant to their jobs.
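To make the idea concrete, here is a minimal sketch of a catalog that pairs a data dictionary with role-based access. Everything in it (the `CatalogEntry` and `DataCatalog` classes, the `crm_accounts` data set, and the role names) is hypothetical and for illustration only; it is not how Talend or any particular catalog product works.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data set in the catalog, documented by a subject matter expert."""
    name: str
    owner: str
    dictionary: dict  # field name -> plain-language meaning
    allowed_roles: set = field(default_factory=set)


class DataCatalog:
    """Hypothetical in-memory catalog: a hub for documentation and access rules."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        """A subject matter expert publishes a data set and its dictionary."""
        self._entries[entry.name] = entry

    def describe(self, dataset, field_name, role):
        """A knowledge worker looks up what a field means, if their role allows it."""
        entry = self._entries[dataset]
        if role not in entry.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {dataset!r}")
        return entry.dictionary[field_name]


catalog = DataCatalog()
catalog.register(
    CatalogEntry(
        name="crm_accounts",
        owner="sales-ops",
        dictionary={"arr": "Annual recurring revenue in USD"},
        allowed_roles={"analyst", "sales"},
    )
)
meaning = catalog.describe("crm_accounts", "arr", role="analyst")
```

An analyst can look up what `arr` means without tracking down the expert who defined it, while a role outside `allowed_roles` gets a `PermissionError`.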
A third step is to assign clear roles and responsibilities for monitoring data
health, managing the data operations center, and improving data catalogs.
Krishna and I agree that this is one of the primary responsibilities of chief
data officers, who often manage a team of data stewards with the skills, tools,
and responsibility to monitor data health.
This shift in responsibility is key for larger enterprises that seek to scale
their data operations because data analysts and subject matter experts often
treat data health as a secondary responsibility.
But it is equally important for medium-sized businesses and SMBs that seek to
use data as a competitive differentiator.
Krishna states that one of his goals is to “enable knowledge workers to
participate in the data journey seamlessly as opposed to creating silos. In
our vision, we just call that self-service.”
Should You ETL or ELT the Data?
Krishna and I jumped into the data weeds of how, when, and where to address
data health and transformations. Should you fix the data at the source or
implement cleansing rules downstream in dataops? How should organizations
leverage data lakes when data is created from IoT and other real-time data
sources, stored in multiple clouds, and leveraged by data scientists in
various machine learning experiments? How can marketers use
a trust score to improve customer 360 data
rather than just fixing CRM workflows or using a customer data platform’s
limited data processing capabilities?
Krishna offers very practical advice on these questions as it’s not a
one-size-fits-all architecture, solution, or data operation. Krishna
believes most organizations need to support “ETLT” because some
transformations are more efficient to do upfront before the data is stored
(ETL), while app developers and data scientists often need downstream
transformations (ELT) specific to their analytics, machine learning
algorithms, or customer experiences.
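As a rough illustration of the "ETLT" idea, the sketch below cleans records once before storing them and then applies a second, use-case-specific transformation downstream. The record layout, function names, and the list standing in for a warehouse are all assumptions made for this example, not anything described in the interview.

```python
from collections import defaultdict

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    {"customer": " Alice ", "amount": "19.99", "country": "us"},
    {"customer": "bob", "amount": "5.00", "country": "US"},
    {"customer": "Alice", "amount": "30.01", "country": "Us"},
]


def transform_before_load(record):
    """Upfront (ETL-style) cleanup applied once, before the data is stored."""
    return {
        "customer": record["customer"].strip().title(),
        "amount_cents": round(float(record["amount"]) * 100),
        "country": record["country"].upper(),
    }


def load(records, store):
    """Append cleaned records to a list standing in for a warehouse table."""
    store.extend(records)


def revenue_by_customer(store):
    """Downstream (ELT-style) transformation specific to one analytics use case."""
    totals = defaultdict(int)
    for row in store:
        totals[row["customer"]] += row["amount_cents"]
    return dict(totals)


warehouse = []
load([transform_before_load(r) for r in RAW_ORDERS], warehouse)
report = revenue_by_customer(warehouse)
```

The normalization that every consumer needs happens once, upfront, while the revenue rollup stays downstream where a different team could swap in its own aggregation without touching the load step.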
During the interview, I point out the importance of having a versatile
platform that allows engineers to shift where and when to implement different
transformations in the data operations. Unfortunately, we often label data
integration processes as data pipelines. The term connotes a rigid, build-once
structure that is unlikely to change, like the pipes in your house. The reality
is that as the data changes, analytics use cases grow, and regulations evolve,
organizations must continuously develop and support their data pipelines.
How Machine Learning Is Simplifying Data Health
Data quality has its origins in rule-based and statistical methods that help
data stewards normalize data sets and manage exceptions. But these approaches
often don't scale well for organizations that regularly add new data sets or
whose data changes frequently. I wanted to know from Krishna
how and where Talend is using machine learning to simplify and scale data
health.
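For readers unfamiliar with the traditional approaches, here is a minimal sketch of one rule-based check and one statistical check. The field names, the rules, and the outlier threshold are assumptions chosen for illustration. This shows the hand-written baseline that is hard to scale, not Talend's machine learning approach.

```python
import statistics


def rule_violations(rows):
    """Rule-based check: flag rows whose fields break explicit, hand-written rules."""
    problems = []
    for i, row in enumerate(rows):
        if not row.get("email") or "@" not in row["email"]:
            problems.append((i, "invalid email"))
        if row.get("age") is None or not 0 <= row["age"] <= 120:
            problems.append((i, "age out of range"))
    return problems


def statistical_outliers(values, threshold=3.5):
    """Statistical check: flag indices whose modified z-score exceeds the threshold.

    Uses the median and the median absolute deviation (MAD), which are less
    distorted by the outliers themselves than the mean and standard deviation.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - median) / mad) > threshold
    ]


rows = [
    {"email": "a@example.com", "age": 34},
    {"email": "not-an-email", "age": 34},
    {"email": "c@example.com", "age": 140},
]
issues = rule_violations(rows)
outliers = statistical_outliers([100, 102, 98, 101, 99, 500])
```

Each new data set needs its own rules and thresholds, and someone has to maintain them as the data changes. That maintenance burden is the scaling problem described above.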
Krishna replied, “I call it DQ with IQ. It’s data quality intelligence by
using machine learning to find more data quality issues easier and then also
make suggestions on how to correct them.”
Machine learning can also help data scientists reduce their time in data wrangling and provide new feature engineering capabilities.
Requirements for Trusting Data and Becoming Data-Driven
So becoming data-driven and trusting data have several requirements and implementation factors:
- Improving data literacy by centralizing knowledge in data catalogs
- Enabling data operations to monitor and correct data health issues continuously
- Providing simple-to-use self-service data processing capabilities to scale utilization
- Establishing nimble, multicloud data architectures as data pipelines evolve
- Simplifying and automating data operations with machine learning capabilities
There’s a wealth of additional information and insights from Krishna, and I
hope you will watch the full interview.
This post is brought to you by Talend.
The views and opinions expressed herein are those of the author and do
not necessarily represent the views and opinions of Talend.