Tuesday, August 19, 2014

Why is Data Sooo Messy and How to Avoid Data Landfills

I was surprised this morning to see an article about the "janitorial" work data scientists have to perform to be able to find "nuggets" in big data. Actually, my only surprise is that the story is in the NY Times and that they are covering the least glamorous side of the "sexiest" job.

Why is data so messy?

Let's start with the past. The history of data science starts with complicated data warehouses, expensive BI tools, hundreds if not thousands of ETLs moving data all over the place, and bloated expectations. At the other extreme, many organizations have siloed databases, DBAs largely skilled at keeping the lights on (future post?), and spreadsheet jockeys performing analytics. The janitorial work data scientists are performing partially exists because of the mess of databases and derivative data sources previous generations left behind.

And I'm not sure this generation will do any better. As I reported just a couple of months ago, with great power comes even greater responsibility. All the technologies and tools data scientists have at their fingertips also have the power to create a new set of data stashes - informal places where data is aggregated - or buried data mines - places where analytics are performed, but not automated or transparent to future scientists.

If data scientists, DBAs, and CIOs are not careful, the data stashes and buried data mines can slowly transform into full-blown data landfills.

DBAs know what I'm talking about. It's a combination of data warehouses, reports, dashboards, and ETLs that no one wants to touch. No one understands who is using what reports or dashboards in what business process for what purpose or benefit. ETLs look like a maze of buried unlabeled pipes developed using a myriad of materials (programming approaches) and with no standards to help future workers separate out plumbing from filters and valves.
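To make the contrast concrete, here is a minimal sketch (job and field names are hypothetical, not from any real pipeline) of what a labeled ETL step can look like - a named job that logs what it filters, so a future maintainer can tell the plumbing from the filters and valves:

```python
import csv
import io
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl.customer_load")  # hypothetical job name

def filter_missing_email(raw_csv):
    """Filter step: drop rows with no email address.

    Naming each step and logging row counts is what separates a
    maintainable pipeline from a maze of unlabeled pipes.
    """
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    kept = [r for r in rows if r.get("email", "").strip()]
    log.info("filter_missing_email: kept %d of %d rows", len(kept), len(rows))
    return kept

sample = "name,email\nAda,ada@example.com\nBob,\n"
print(len(filter_missing_email(sample)))  # -> 1
```

Nothing about the logic is clever - the point is that the step has a name, a documented purpose, and a logged result, which is exactly what a buried, unlabeled ETL lacks.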

Build Foundations, Not Landfills!

Data scientists and their partners - data stewards, DBAs, business analysts, developers, and testers - need to instill some discipline - dare I say data governance - and balance their time mining for nuggets with practices that establish data and analytics foundations. More on that in an upcoming post. Remember, big data is a journey.

Until then, here are a few things one can learn about data science from a fourth grade class - and think twice before creating another data source!

Wednesday, August 06, 2014

Three Advanced Practices for Agile Development Organizations

Some time ago I saw a question posted on a social media site: "Why is Agile hard to adopt?" My initial gut response was surprise. The basic practices of agile are relatively straightforward, so what's hard? Driving an agile culture, or getting an organization to "operate with agility" - yes, that's not trivial, and it takes time for teams and individuals to "think agile". My conclusion is that agile is not hard to adopt, but maturing a disciplined agile practice that leads to an agile culture, operating with agility, or a truly agile organization takes significant discipline and practice maturity.

Let me illustrate this with a few examples. Below are three key practice areas that agile organizations need to mature in order to operate with agility.
  1. Getting stories and requirements written and reviewed on time - Teams new to agile will often cite that changing stories mid-sprint, or committing to ill-defined stories, is a key barrier to getting stories done and accepted at the end of a sprint. This is particularly detrimental if an organization has multiple agile teams working collaboratively, or if the organization has distributed agile teams with members in different locations and time zones. The simple and obvious answer is to get stories "locked" at the beginning of the sprint, but this is easier said than done. It implies the team has an agile planning practice with a cadence aimed to finish writing stories, complete their acceptance criteria, ensure the product owner accepts each story, and provide sufficient time for the team to review, ask questions, and size the story. If you're writing stories up to the last possible day before commitment, or worse, committing to placeholder stories that get better defined during the sprint, then shifting to a more disciplined agile planning practice takes work.

  2. Ensuring QA is part of the team - Agile teams commit together and get things done together, with quality best defined by acceptance criteria and organizational standards. Teams saying things like, "The story is done, we just need QA to review it after the sprint" are missing a key discipline in agile software development - quality is the criterion for defining done. It implies that QA members are part of the team reviewing stories, asking questions, and sizing them. Why? Because some stories have more significant testing implications than development tasks. It also means that teams have disciplined practices to ensure QA members can do their jobs during the sprint, such as developing unit tests, finishing stories early in the sprint, ensuring frequent code check-ins and pushes to QA environments, and investing in QA automation.

  3. Partnering with the Product Owner - Partnering may be an elusive, difficult-to-achieve relationship depending on the organizational structure, the pressures the product owner feels on delivery timelines and scope, individual perceptions of business value, and whether there is a shared understanding of what is most important for the business. Partnering implies balance - for example, that technology teams can respond to questions on why a story is expensive, or why addressing an identified area of technical debt is important, and important now. It also implies that the product owner invests time to express her vision, that she can respond to questions on why a particular feature is important and prioritized, that there is an open and healthy dialogue to explore multiple solutions to a prioritized feature so that tradeoffs are considered, and that reasonable priority is applied to technical and support needs. Does the product owner think about markets, or does she need help targeting a minimal viable product? If not, agile technology leaders have some teaching and mentoring to consider.
Hope this helps!


Tuesday, July 22, 2014

The Best Line of Code is the One You Didn't Have to Write!

This post isn't about code reuse or developing web services. Surely, you and your development organization understand the benefits of developing modular code, packaging it in libraries, developing APIs and web services, ensuring that test cases are automated, and hopefully starting to enable continuous delivery. Hopefully you have the architectural practices that recognize "build once, leverage multiple times" is critical to a software development organization's success in scaling to support multiple applications at higher levels of quality.

This is also not a post about code readability. Again, hopefully your organization has some basic methodologies on naming conventions, coding standards, code reviews, tools, and metrics to ensure that code developed by one developer can be understood and maintained by other developers and teams.

How PaaS can Accelerate IT at Lower Investments

This post is about PaaS platforms. Whereas Infrastructure as a Service gives IT low-level infrastructure such as a Windows or Linux environment, Platform as a Service offerings represent a computing container that sits above the raw infrastructure. Virtually all the major software vendors and cloud providers have PaaS offerings, and Gartner has an extensive taxonomy of the different service types.

The PaaS services that excite me as a CIO enable lightweight, ideally code-less applications. These are higher-level environments that sit above programmable database, application, or integration PaaS. The higher-level PaaS platforms are often suited to specific types of applications such as workflow, analytics, data integration, or document management. The platforms then provide tools to configure, develop business rules, or customize user interfaces with no or very minimal coding. The most advanced PaaS platforms enable a wide array of applications and can be developed with a minimal technology skill set. In fact, some of them fit the "self-service" category and can be developed by business users with proper training and IT-provided governance practices. Mature platforms also include capabilities such as user directory integration, APIs, mobile and tablet views, and standard methods to automate data integration. The most promising PaaS platforms demonstrate significant customer adoption, have proven scalability and performance records, and have costs low enough to let IT teams develop products on top of them.

No Code = Speed to Market

In my experience, the best of these platforms accelerate time to market significantly because they enable teams to develop applications without a lot of software development (code) and testing. The very best ones are so lightweight that they enable teams to be experimental and change implementations with minimal cost to unwind and rebuild.

Too good to be true? It's not, and the benefits are real, but it isn't trivial to achieve. The real issue is selecting the right platforms that offer the most flexibility for the expected needs with minimal functionality constraints and technology implementation complexities. You can't easily evaluate this by listening to salespeople, reviewing analyst reports, or even doing a few proofs of concept. I might have to develop a sequel to my top ten attributes of agile platforms to help identify strong contenders.

But for now, software developers should think beyond "good code" or even "great architecture" and think more about "smart solutions" that enable more capabilities with little or minimal code.


Tuesday, July 15, 2014

Breakthrough Innovation: From Senseless Sensors to the Internet of Things and Everything

It's hard to escape the media hype on the Internet of Things, the Internet of Everything, or the Industrial Internet. Whether it's valuations of the size of the industry ($7.1 trillion by 2020 according to IDC), estimates of the number of sensors (50 billion by 2020 according to Cisco), or futurists predicting the disruption and opportunities coming from IoT technologies that connect the physical and digital worlds, the media will leave you little doubt that this will be the next big, very big technology paradigm.

IoT Today

But today, what you are mostly seeing is more devices, more innovation in application context specific sensors, or very small networks of connected devices. Examples include home automation networks, wearable fitness devices, transit optimization, smart cities addressing growing energy needs, and beacons in retail stores. The costs of sensors have dropped, and the skills to develop the device and applications are more accessible, so entrepreneurs and corporations already in the device or sensor market can experiment and potentially break through with a market leading product.

Some of these devices will succeed, and the scale will create new technology challenges. Will network infrastructure keep up with the added bandwidth requirements for these devices? Will new security and privacy challenges created by these devices get addressed sufficiently before vulnerabilities are exposed?

Internet Enabled "Senseless Sensors" or Greater Intelligence?

Still, I think this is the world of senseless sensors. The software in most of these devices largely has local context. Wearable gadgets largely benefit a single user and the parent corporation that gets access to the aggregation of all the data collected. Home automation connects devices in a home with no broader context around the neighborhood. Beacons will enhance the experience in the store you are visiting, and benefit its parent company, but do not yet provide broader context.

Now imagine a shopping mall that finds an intelligent way to pool data between retail outlets, aiming to keep shoppers spending time and money in the mall longer. What happens when cars communicate with neighboring ones to help avoid collisions? What happens when health monitoring sensors can be programmed to share selected data, depending on context, with family members or physicians?

Now, think what happens when these same sensors also have logic to respond to their environment. Your car alerts you to slow down, or your physician adjusts your medication levels. The devices not only measure and respond to local conditions; the rules they implement also include variables of greater context.
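As a rough illustration (the signals and thresholds here are invented, not drawn from any real device), the difference is a rule that mixes a local reading with context shared by other devices:

```python
def speed_advice(speed_kph, local_grip, upstream_hazard):
    """Return driving advice from one local measurement (tire grip)
    plus broader context (a hazard reported by vehicles ahead)."""
    if upstream_hazard or local_grip < 0.4:  # shared context can override local data
        return "slow down"
    if speed_kph > 120:
        return "ease off"
    return "ok"

# Good local grip, but a collision reported by cars ahead:
print(speed_advice(90, 0.9, upstream_hazard=True))  # -> slow down
```

A "senseless" sensor would only ever see the first two arguments; the third is what connecting the physical and digital worlds adds.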

This, to me, is the beginning of the promise of the Internet of Things.

Breakthrough Success Requires Partnership and Standards

Technology companies are now forming coalitions to pave the way to this future. There is the Open Interconnect Consortium led by Intel, Dell, and Samsung that is focused on sensor interoperability. Then there is the Industrial Internet Consortium led by IBM, AT&T, Cisco, and GE aimed at accelerating the growth of the industrial internet. In digital health there is an emerging battle between Google Fit, Apple HealthKit, and the Samsung Digital Health Initiative, all aiming to help control or share fitness data or "create a healthier world".

These coalitions and future partnerships will either accelerate IoT breakthrough capabilities or create new barriers, or both. Only time will tell if device manufacturers and software developers will have multiple competing standards to contend with, or if these partnerships will establish an IoT data and integration backbone.


Saturday, June 28, 2014

Friend or Foe? How Microsoft Excel 2013 Creates New Data Governance Challenges

How important is data quality to your Big Data strategy? I suspect that once you have business leaders asking good data driven questions and have the beginnings of big data tools implemented, data quality and governance will become a bigger business priority.

Microsoft has the most successful Big Data tool - it's called Excel. Most business users will jump into Excel first to do any kind of data discovery, analysis, or visualization. Along the way, they may blend in multiple data sources, create formulas, cut/paste data into multiple worksheets, format the worksheet for presentation purposes, create pivots, or save the file to a private repository. Many of these steps make auditing the analysis or reusing the results in larger contexts a challenge for business managers, data scientists, and technologists.

Recently, I had the opportunity to ask a Microsoft representative some questions about the Microsoft Excel 2013 and Office 365 products to see whether their new functionality helps or hinders data quality practices.

Excel 2013 - With great power comes great responsibility

Excel 2013 comes with some new functions that make data scientists more self-sufficient. Tools that were formerly part of Microsoft Access are now finding their way into Excel, including the ability to create Excel data models. Pro versions of Excel also include Power Query to easily discover and connect to data from public and corporate data sources. Excel 2013 also bundles Power Pivot to perform analysis and visualizations, as well as some new data quality functions. Power Query additionally comes with a set of tools to perform some basic data merging and quality steps.

So with all these new functions, what's not to like? So much power in the hands of sophisticated data scientists who are able to do advanced analytics without IT's help.

The issue is that Excel has always made it too easy for business users to create splintered and derivative data sets. Pull data from enterprise databases, integrate other data sources, complete the analysis, and present findings - but nowhere in this workflow is there an easy way to incorporate the data, formulas, and other rules back into a central repository to be shared with others. This is a big issue for companies that are trying to leverage Big Data to transform their business or to become a more data-driven organization. It is why I asked developers to Please Stop Creating Microsoft Access Databases - the work to integrate or clean up a siloed database can be considerable.
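For contrast, here is a hypothetical sketch of the same blend-and-compare workflow done as a script instead of worksheet cut/paste (using an in-memory SQLite database; the table names and figures are made up). Because every step is code, the analysis can be rerun, audited, and shared rather than trapped in one person's spreadsheet:

```python
import sqlite3

# Stand-in for two data sources a spreadsheet jockey might cut/paste together.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('east', 100), ('west', 250);
    CREATE TABLE targets (region TEXT, target REAL);
    INSERT INTO targets VALUES ('east', 120), ('west', 200);
""")

# The blend step is an explicit, repeatable join rather than a manual paste.
rows = db.execute("""
    SELECT s.region, s.amount, t.target, s.amount - t.target AS gap
    FROM sales s JOIN targets t USING (region)
    ORDER BY s.region
""").fetchall()

for region, amount, target, gap in rows:
    print(region, amount, target, gap)
```

The point is not the tool choice - it is that the data sources, the join, and the derived "gap" column are all written down, so nothing has to be reverse engineered later.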

The Unofficial Microsoft Response

So I asked the Microsoft rep for some answers: "How can businesses implement basic governance when the most popular business analytics tool makes it so easy to create derivative data sources?" Here were his answers:

  • IT can reverse engineer the Excel worksheets - I stopped him quickly, almost laughing at this answer, because it's completely unrealistic. Maybe your business users create simple spreadsheets, but if there are complex formulas, pivot tables, or copy/pasted data, then good luck. Also, most organizations have many more spreadsheet jockeys than DBAs, so it's highly unlikely that IT can keep up.
  • Saved spreadsheets to Office 365 are discoverable - That's a better answer. At least IT knows where files exist and how frequently they are used. In theory, better defined spreadsheets can get reused by others in the organization.
  • It's coming in Reporting Services - Well, I'm not sure what is coming... 

The new capabilities in Excel, if used without some discipline, pose new data quality challenges. So why isn't Microsoft doing even more in their tools to ensure analytics can be reproducible, auditable, and centralized? My sense is that this may not be in Microsoft's best interests. Microsoft has many competitors in the database, BI, Big Data, reporting, and visualization technology spaces, and less (no?) competition for analytic tools that compete directly with Excel. So why make it easier to move the analytics outside of this tool?


Monday, June 02, 2014

Five Tips on How To Manage Disruptive Projects in an Agile Organization without going Crazy!

Agile teams are best organized when team responsibilities are clearly defined and separated from other teams', when business deliverables can be accomplished with a minimal number of teams, and when there is a clear process to handle the communication and dependencies between teams. Anyone who has led or participated in an agile organization knows that the easiest deliverables are ones that can be fulfilled by a single team. When multiple teams are needed, you need well-defined responsibilities and coordination practices, which add overhead and risk.

A well-designed agile organization defines teams to minimize cross-team dependencies. But what happens when a project, epic, or other business need requires coordination of teams that is outside the norm? What happens if these projects have dependencies on vendors or other teams that are not part of your agile organization?

Disruptive Projects in Agile Organizations

I call these Disruptive Projects because they disrupt the natural cadence that agile organizations target. These projects often have blocks, stop-and-go rhythms, unclear lines of ownership, and probably ill-defined technical "nonfunctional" requirements. Examples include infrastructure upgrades, multi-organization workflow changes, ERP-driven projects, and security investments - basically, horizontal projects that cut across multiple departments and technology platforms. These projects often require relatively small but critical contributions from individual teams, and they force teams to collaborate differently than business-as-usual programs do.

I have seen many of these projects, and they can drive teams crazy! So if you are leading or on one of these projects, here are some tips that I pass on to my teams.

  1. Break Dependencies - When you hear people say, "I'm waiting for another team to finish their deliverable" - or that other teams are running behind - my response is, "So what!" Are you fully prepared to run things once another team completes their work? More often, teams are ill prepared on their own responsibilities and use scheduling unknowns or risks to divert responsibility. This behavior is usually unintentional and more often reflects the team's difficulty in breaking dependencies and in conceiving their own responsibilities, requirements, and solutions before receiving complete direction. I tell teams that to improve agile velocity, they should have at least two sprints of fully defined stories in the backlog - one tactic to help with the blocks or stop/go rhythms of disruptive projects.

  2. Focus On What You Control - This is related to (1) and sounds academic, but teams often forget it when faced with uncertainty or ill-defined requirements. Teams should ask themselves: what can we do today, this sprint, or for the next milestone that will improve the health of the project, remove a risk, or realize a new opportunity? When individual teams think this way on a continual basis, they will either improve collaboration (by looking internally first), improve the project's execution (by identifying risks early and working on solutions), or even innovate (by establishing a new tool or capability that is beyond or tangential to the project scope).

  3. Don't Panic! - Faced with a complex project, unknowns, and risks, team members sometimes resort to bad behavior, including blaming, backtalking, passive aggressiveness, or just expressing their stress. This can be magnified if the project has significant business implications, the team is under stress to hit a deadline, there is a difficult set of requirements, or a combination of these factors. Despite the pressure, it is important for leaders, especially in IT, to stay calm, logical, and collaborative. I often quote the famous Douglas Adams line, "Don't Panic!" - the planet is not about to blow up. Teams in this situation have usually failed at (1) and/or (2), and bad behaviors materialize instead. This can be brought under control by relieving the pressure (don't panic!) and addressing the fundamental issues (1 and 2).

  4. Ask "Thinking" Questions - Thinking questions should be designed to help teams or individuals recognize gaps in plans or in testing. "Have you thought about..." or "How are you testing for..." or "What will happen if..." type questions force teams to think through their solutions, processes, and testing strategies for completeness.

  5. In the 3rd Period, Change Up Your Lines - In hockey, you'll sometimes see coaches move players from one line to another in the hope that a different combination will surprise the opponent and score goals. This strategy sometimes works for horizontal projects, especially toward the end (the 3rd period), when the amount of coordination across teams exceeds the amount of work that remains. Sometimes, it's better to regroup individuals from across multiple teams with the goal of getting the project done.
Hope you find this helpful!


Tuesday, May 20, 2014

Succeeding in Big Data Transformation - It is a Journey, Not a Destination!

Charting the Big Data Journey
Last week I participated in an Executive Boardroom with other CIOs discussing their challenges formulating, partnering on, and executing their Big Data and next-generation analytics strategies. There were some common themes challenging the CIOs - so if you're a CIO, you're not alone. Big Data can be transformational if you focus on the right questions, develop the talent, adopt practices that are still evolving, and deliver value through selective application of new technologies. It's not an easy path - effectively like driving a car while you're still building it, learning how to operate it efficiently, and contemplating what parts of the engine to upgrade.

One area we debated was whether to define an end state. I took a position on this question and reminded CIOs that getting value from Big Data is a journey, not a destination. It requires practices similar to innovation and ideation - specifically, an agile approach to identifying the questions of greatest value, data science experimentation to find answers, and retrospective analysis to decide the next step in the journey.

I caught some nods from participants; others prefer getting the governance defined first, and still others would rather see well-defined business goals. You decide.

Other Big Data Discussions

  • What's Driving Big Data - Many CIOs are helping Marketing and other business partners develop a 360-degree customer view. Some are working with Operations and hope that they can better forecast issues and opportunities in their supply chains. Others are still "trying to get the governance right" and are starting to formally define the business owners and data stewards for their existing data repositories.

  • Demystifying Governance - CIOs have the challenge of addressing governance issues without making these efforts complete prerequisites to moving the business forward. CIOs recognize that they have to train business leaders on new terminology (data steward), technologies (data visualization, data quality), and practices (change management, master data) while still demonstrating business value.

  • Pragmatic Challenges - CIOs debated whether investing in one-size-fits-all dashboards was a better approach than supporting too many customized reports. Even better, many are trying to determine how to move from historical reporting to more predictive applications. Some are still trying to determine "What to do with all the data" and "How do we find the nuggets".

All good questions.... It's a journey. Special thanks to Evanta for hosting a great event.
