Why is Data Sooo Messy and How to Avoid Data Landfills

I was surprised this morning to see an article about the "janitorial" work data scientists have to perform to be able to find "nuggets" in big data. Actually, my only surprise is that the story is in the NY Times and that they are covering the least glamorous side of the "sexiest" job.



Why is data so messy?


Let's start with the past. The history of data science starts with complicated data warehouses, expensive BI tools, hundreds if not thousands of ETLs moving data all over the place, and bloated expectations. At the other extreme, many organizations have siloed databases, DBAs largely skilled at keeping the lights on (future post?), and spreadsheet jockeys performing analytics. The janitorial work data scientists are performing exists partly because of the mess of databases and derivative data sources that previous generations left behind.

And I'm not sure this generation will do any better. As I reported just a couple of months ago, with great power comes even greater responsibility. All the technologies and tools data scientists have at their fingertips also have the power to create a new set of data stashes - informal places where data is aggregated - or buried data mines - places where analytics are performed, but not automated or transparent to future scientists.

If data scientists, DBAs, and CIOs are not careful, the data stashes and buried data mines can slowly transform into full-blown data landfills.

DBAs know what I'm talking about. It's a combination of data warehouses, reports, dashboards, and ETLs that no one wants to touch. No one understands who is using what reports or dashboards in what business process for what purpose or benefit. ETLs look like a maze of buried unlabeled pipes developed using a myriad of materials (programming approaches) and with no standards to help future workers separate out plumbing from filters and valves.

Build Foundations, Not Landfills!


Data scientists and their partners (data stewards, DBAs, business analysts, developers, and testers) need to instill some discipline - dare I say data governance - and balance their time mining for nuggets with practices that establish data and analytics foundations. For an upcoming post... Remember, big data is a journey.

Until then, here are a few things one can learn about data science from a fourth grade class and think twice about creating another data source!

Three Advanced Practices for Agile Development Organizations

Some time ago I saw a question posted on a social media site, "Why is Agile hard to adopt?" My initial gut response was surprise. The basic practices of agile are relatively straightforward, so what's hard? Driving an agile culture, or getting an organization to "operate with agility", yes, that's not trivial and it takes time for teams and individuals to "think agile". My conclusion is that agile is not hard to adopt, but maturing a disciplined agile practice that leads to an agile culture, operating with agility, or achieving a truly agile organization takes significant discipline and practice maturity.

Let me illustrate this with a few examples. Below are three key practice areas that agile organizations need to mature in order to operate with agility.
  1. Getting stories and requirements written and reviewed on time - Teams new to agile will often cite changing stories mid-sprint, or having ill-defined stories committed to by the team, as a key barrier to getting stories done and accepted at the end of a sprint. This is particularly detrimental if an organization has multiple agile teams working collaboratively, or if the organization has distributed agile teams with members at different locations and time zones. The simple and obvious answer is to get stories "locked" at the beginning of the sprint, but this is easier said than done. It implies the team has an agile planning practice with a cadence aimed to finish writing stories, define acceptance criteria, ensure the product owner accepts the story, and provide sufficient time for the team to review, ask questions, and size the story. If you're writing stories up to the last possible day before commitment, or worse, committing to placeholder stories that get better defined during the sprint, then shifting to a more disciplined agile planning practice takes work.

  2. Ensuring QA is part of the team - Agile teams commit together and get things done together with quality best defined by acceptance criteria and organizational standards. But teams saying things like, "The story is done, we just need QA to review it after the sprint" are missing a key discipline in agile software development - quality is the criterion for defining done. It implies that QA members are part of the team reviewing stories, asking questions, and sizing them. Why? Because some stories have more significant testing implications than development tasks. It also means that teams have disciplined practices to ensure QA members can do their jobs during the sprint, such as developing unit tests, finishing stories early in the sprint, ensuring frequent code check-ins and pushes to QA environments, and investing in QA automation.

  3. Partnering with the Product Owner - Partnering may be an elusive, difficult-to-achieve relationship depending on the organizational structure, the pressures that the product owner feels on delivery timelines and scope, individual perceptions of business value, and whether there is a shared understanding of what is most important for the business. Partnering implies balance, for example, that technology teams can respond to questions on why a story is expensive or why addressing an identified area of technical debt is important and important now. It also implies that the product owner invests time to express their vision, that they can respond to questions on why a particular feature is important and prioritized, that there is an open and healthy dialogue to explore multiple solutions to a prioritized feature so that tradeoffs are considered, and that there is reasonable priority applied to technical and support needs. Does the product owner think in terms of markets and know how to target a minimum viable product? If not, agile technology leaders have some teaching and mentoring to consider.
Hope this helps!


About Isaac Sacolick

Isaac Sacolick is President of StarCIO, a technology leadership company that guides organizations on building digital transformation core competencies. He is the author of Digital Trailblazer and the Amazon bestseller Driving Digital and speaks about agile planning, devops, data science, product management, and other digital transformation best practices. Sacolick is a recognized top social CIO, a digital transformation influencer, and has over 900 articles published at InfoWorld, CIO.com, his blog Social, Agile, and Transformation, and other sites. You can find him sharing new insights @NYIke on Twitter, his Driving Digital Standup YouTube channel, or during the Coffee with Digital Trailblazers.