When someone says that a data integration process is automated, I suggest asking questions to clarify what they mean by "automated." You'll probably conclude that the process is anything but automated, let alone reliable, scalable, secure, or configurable.
To some, automation implies efficient and reliable, even if the process still includes manual steps, so long as those steps are performed quickly and easily. Others assume that if the process can be completed without IT's direct involvement, then it is automated. Still others don't care whether it's automated but are angered when the process breaks or can't magically scale as more data is piped in. Lastly, there is an assumption that a daily process running on one volume of data will somehow scale when existing data has to be reprocessed because of a change in business logic or storage schema.
As hard as it is to modify software, modifying a semi-automated (in other words, partially manual) data integration process can be even more daunting, even if the steps are documented. For example, fixing data quality issues or handling boundary conditions tend to be undocumented steps performed by subject matter experts.
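To make the point concrete, here is a hypothetical sketch (all field names and the "refund" rule are invented for illustration) of the kind of data-quality fix that often lives in a subject matter expert's head, codified instead as an explicit, testable step in the pipeline:

```python
# Hypothetical sketch: a data-quality rule that a subject matter expert
# might otherwise apply by hand, made explicit in the pipeline.
# All field names and the 'refund' convention are invented.

def clean_record(record):
    """Codify the tribal-knowledge fix as a documented, testable step."""
    # Reject records with no amount rather than silently passing them along.
    if record.get("amount") is None:
        return None
    # Boundary condition the SME used to patch manually: negative amounts
    # from a legacy source mean 'refund' and should be flagged, not dropped.
    if record["amount"] < 0:
        record = {**record, "type": "refund", "amount": abs(record["amount"])}
    return record

records = [{"amount": -25.0}, {"amount": 100.0}, {"amount": None}]
cleaned = [r for r in (clean_record(r) for r in records) if r is not None]
print(cleaned)
```

Once the rule is code, it can be reviewed, versioned, and tested like any other part of the integration, instead of disappearing when the expert is unavailable.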
Business Leaders are Clueless About Data Integration

So now you want to fix data integration. Take out the manual steps. Make the process more nimble, agile, and reliable. Why is it so hard to get business leaders on board with the investment needed?
Data processing and data integration technologies like ETL, Hadoop, Spark, Pig, Hive, and IFTTT are difficult enough for technologists to fully understand and apply to the right data problems, but the jargon just frustrates business leaders. Many are clueless about the technologies (beyond the over-hyped term Big Data) and are surprised by the need to adopt them and to invest time building expertise in them. "Information technology" and data processing have been around a long time, so there is an underlying assumption, even with Big Data, that integration is cheap and easy.
Unless you are doing very basic, point-A-to-point-B plumbing, data integration becomes quite complex as new data sources are added, logic is changed, and new applications are developed. Data integration may start off simple, but over time legacy data flows become difficult to understand, extend, or modify. That complexity slows down IT, and if data and analytics are strategic to the business, it frustrates business leaders that they can't simply add a new data source, modify calculations, improve data validation, or strategically change the downstream analytics.
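A hypothetical before-and-after sketch of why "point A to B plumbing" stops being simple: each new source arrives with its own schema and units that the same flow must reconcile (all source names and field mappings below are invented):

```python
# Hypothetical illustration: 'point A to B' integration is trivial with one
# source, but each new source brings its own conventions to reconcile.
# All source names, fields, and unit conventions are invented.

# Day 1: one source, trivial mapping.
def from_source_a(row):
    return {"customer_id": row["id"], "revenue": row["rev"]}

# Six months later: two more sources, each with different conventions.
def from_source_b(row):
    # Source B reports revenue in cents.
    return {"customer_id": row["cust"], "revenue": row["amount_cents"] / 100}

def from_source_c(row):
    # Source C reports revenue per quarter; annualize to match the others.
    return {"customer_id": row["acct"], "revenue": row["q_rev"] * 4}

rows = [
    ("a", {"id": 1, "rev": 10.0}),
    ("b", {"cust": 2, "amount_cents": 2500}),
    ("c", {"acct": 3, "q_rev": 5.0}),
]
mappers = {"a": from_source_a, "b": from_source_b, "c": from_source_c}
unified = [mappers[src](row) for src, row in rows]
print(unified)
```

Multiply this by dozens of sources, evolving business logic, and downstream consumers that depend on the unified shape, and the "simple plumbing" becomes a system in its own right.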
So whatever technologies IT selects, and however they are presented to the business, they come across as plumbing. All that time and investment just to have a stable data flow? Analytics, visualization, applications, and for the most part anything useful done with the data will cost extra?
Data Integration = Core Big Data Infrastructure

The simple answer is that data integration is a key foundational technology capability if data is strategic to the business. It's not just a technology; it is a competency. Unfortunately, ranked against other data technologies like data warehousing (in RDBMS, NoSQL, etc.) and delivery (including analytics, data visualization, mobile application development, etc.), data integration capabilities are often a distant third in priority. So it's no surprise that many processes are not fully automated and that business leaders don't "get" the importance of this capability.
Knowing you have a problem is the first step to solving it... More in my next post!
Great article Isaac! Any thoughts about how Segment's raw data access (S3, Redshift) change this?
It's an old problem, getting worse as the data sources and quantity explode. Like a lot of IT, things are done to schedules or customer requirements that lend to a cobbled-together solution.