7 Lessons from IT Leaders on Operating at Digital Speeds with AIOps

IT leaders have been managing a tremendous amount of change over the past several years, and significantly more over the last year because of COVID. On one hand are the business-driven changes: improving customer experiences, enabling machine learning capabilities, and improving workflow efficiencies with automation. On the other hand, IT is trying to accelerate its skills, processes, and culture to support cloud migrations, DevOps automation, SRE functions, and AIOps to improve system reliability and performance.



There was a lot to learn from BigPanda’s Resolve ‘21 and Pandapalooza event, and my first post on it covered 3 AIOps secrets that boost quick business impacts. That post shares where to find quick wins in automation, incident management, and growing business stakeholder involvement in IT operations.

These leaders also shared many lessons as they’ve adjusted to digital speeds and leveraged AIOps. Here are some takeaways.

1. Technology is Changing the IT Operating Model

Sean Mack, CIO/CISO at Wiley Publishing, kicks off his session by acknowledging that some of the past’s tried-and-true IT operating principles require reinvention. He states in his opening remarks, “Technology continues to evolve, and as leaders, we must too. If we don’t continue to evolve as leaders, we’re sure to stifle the progress of our teams and our businesses.”

He shares examples such as how the shift from unique infrastructures to ephemeral and disposable cloud environments changes how IT manages and monitors environments.

Most importantly, technology is now a core business capability, and Sean states that business and technology are inseparable. Understanding how to deliver small, incremental capability updates and become more customer-focused are table stakes for today’s IT organizations.

2. Digital Requires Driving Fast; Technical Debt is the Friction

Nag Vaidyanathan is CTO of OneMain Financial, America’s largest personal installment loan company, with fifteen-hundred branches, six contact centers, and ninety-five hundred employees. Nag acknowledged that “Many people think that because we are an old company, it is very hard for us to make changes.”

But he goes on to share many examples of how the company needed to adapt its loan origination, call routing, pricing, and other practices to adjust to customer needs during COVID. He confesses, “When I reflect back, it looks like, how in the world did we do all these things?”


Part of their success included automating CI/CD pipelines, building loosely-coupled business services, and migrating to cloud-native datastores. But Nag acknowledges, “You never realize the impact technical debt can have when you need to accelerate.”

All race cars have to pit to fill the tank, change tires, and check the engine. Driving a fast digital engine requires IT leaders to prioritize which areas of technical debt create the most friction and to decide how IT should address them proactively.

3. Agile Practices and Culture Enable Digital’s Velocity

IT leaders recognize that agile practices and culture must extend beyond application development teams into business functions and IT operations.

Scott Johnson, SVP of Infrastructure as a Service at Equifax, shared many insights during his panel on becoming an analytics company. “We’ve moved to a full agile-driven engineering and operations organization and embraced a product mindset for our products that we deliver to the organization, such as our certified pipeline.”

Scott’s colleague, Dan Grace, Global Technology Operations Leader at Equifax, acknowledged the transformation’s scope. “It’s a huge culture shift going from waterfall to an agile mindset at a one-hundred-year-old company with the people, technology, and the partners.”

Sean Mack also disclosed how Wiley realigned to an agile organization. “We moved from rival teams to collaboration and teams of teams. The cross-functional delivery team includes developers and QA, but also SREs, and database reliability engineers.”

Part of the realignment requires elevating how people in IT understand customers and products. Sean recommends that “People can be deeply skilled, but need a broad sense of the context of their work around the product and customer.”

4. The Impact of Speed and Always-On IT Operations

It’s probably time to retire IT Ops terms and practices like scheduled downtime, blackout periods, and manual failovers. If digital transformations didn’t change the IT operating model, then COVID has surely underscored how important reliable, secure, and high-performance IT systems are to business operations.


Dan Grace from Equifax shared some of the changes and impacts. They pushed the gas pedal and decreased MTTR from the hour-long recoveries accepted a decade ago to minutes. Doing so required getting more people certified on their public cloud technologies and shifting everyone’s mindset to the idea that environments are always on.

Dan states a clear objective, “We have to drive automation into everything that we do.”

5. Seek Single-Pane Tools to Tame Hybrid Cloud’s Complexities

Hybrid and multicloud may sound sexy, but they add significant complexities to IT operations. Tools optimized for one cloud technology may offer productivity and innovation, but the aggregate burden of supporting multiple cloud-specific technologies can become a nightmare for IT operations.

The complexities are most pronounced with monitoring tools. It’s where AIOps can significantly impact ITSM teams that must respond to and resolve a growing variety of incidents faster and more reliably.

Scott Johnson of Equifax shared the realities of operating the hybrid cloud. “Running an always-on cloud-native paradigm as well as running on-prem is an extremely tough environment to be in. Troubleshooting, event correlation, did a change you made on the on-prem side blow up something in the cloud? Being able to manage in that hybrid state is tough.”

Organizations may have different cloud strategies, but one commonality is the growing number of monitoring tools used to capture data and alert on problems. AIOps with open-box machine learning capabilities helps IT correlate alerts into manageable incidents.
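To make the idea of correlating alerts into incidents concrete, here is a minimal Python sketch. It is not BigPanda's method and it is not machine learning: it is a simple rule that groups alerts for the same service when they arrive within a time window of each other. The `Alert` and `Incident` types, the `correlate` function, the five-minute window, and the sample tool and service names are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    source: str       # monitoring tool that raised the alert (hypothetical names)
    service: str      # affected service
    timestamp: float  # seconds since some reference point


@dataclass
class Incident:
    service: str
    alerts: list = field(default_factory=list)


def correlate(alerts, window_seconds=300):
    """Group alerts into incidents: same service, each alert within
    window_seconds of the previous alert in that incident."""
    incidents = []
    latest_by_service = {}  # most recent incident seen for each service
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        incident = latest_by_service.get(alert.service)
        if (incident is not None
                and alert.timestamp - incident.alerts[-1].timestamp <= window_seconds):
            incident.alerts.append(alert)
        else:
            incident = Incident(service=alert.service, alerts=[alert])
            incidents.append(incident)
            latest_by_service[alert.service] = incident
    return incidents


# Illustrative sample data: four alerts from three tools.
sample_alerts = [
    Alert("tool_a", "checkout", 0),
    Alert("tool_b", "checkout", 120),
    Alert("tool_c", "payments", 130),
    Alert("tool_a", "checkout", 1000),
]
sample_incidents = correlate(sample_alerts)
for inc in sample_incidents:
    print(inc.service, [a.source for a in inc.alerts])
```

In this sample, the two checkout alerts two minutes apart collapse into one incident, while the later checkout alert opens a new one. A real AIOps platform would correlate on richer signals such as topology and change data; the sketch only shows why fewer, grouped incidents are easier to manage than a raw alert stream.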

6. The Importance of Emphasizing a Blameless Culture

I’m going to call a spade a spade. We’ve all seen how IT operations get all the punches thrown at them when there’s an outage, when resolutions take too long, or when communications during a major incident’s bridge call miss expectations.


Of all the principles tied to DevOps cultures and SRE practices, many presenters at BigPanda’s Pandapalooza emphasized the importance of spearheading a blameless culture. Not only does it promote more positive behaviors in IT, but it’s doubly important to encourage this behavior with business stakeholders and leaders.

Sean Mack of Wiley explained the importance of a blameless culture. “The focus is on learning and less about preventing mistakes.”

Learnings help IT prioritize fast and longer-term remediations, while behaviors aimed at preventing mistakes are overly defensive given today’s importance of system reliability and performance.

7. Simplify to Fewer and Straightforward ITSM KPIs

I know too many CIOs who work hard defining KPIs that are meaningful by functional areas and disciplines. It’s a tall order instrumenting all the metrics and processes, scheduling time to review them, and ensuring that priorities target meaningful improvements.

Sometimes, less is more, and easier can be more meaningful. Nag Vaidyanathan takes this approach and applies three straightforward ITSM KPIs to measure his organization’s operational performance: system availability, mean time to recovery (MTTR), and change success rate.
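As a rough illustration of how little it takes to compute these three KPIs, here is a minimal Python sketch. The definitions are common ones and are my assumptions, not necessarily how OneMain calculates them: availability as the share of a period without outage, MTTR as the average outage duration, and change success rate as successful changes divided by total changes. The `Outage` type, the function names, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    started: datetime
    resolved: datetime

    @property
    def duration(self) -> timedelta:
        return self.resolved - self.started


def system_availability(outages, period_start, period_end):
    """Fraction of the period the system was up.
    Assumes outages fall inside the period and do not overlap."""
    period = (period_end - period_start).total_seconds()
    downtime = sum(o.duration.total_seconds() for o in outages)
    return 1.0 - downtime / period


def mttr_minutes(outages):
    """Mean time to recovery in minutes; 0.0 when there were no outages."""
    if not outages:
        return 0.0
    total = sum(o.duration.total_seconds() for o in outages)
    return total / len(outages) / 60.0


def change_success_rate(successful_changes, total_changes):
    """Share of changes deployed without failure or rollback."""
    if total_changes == 0:
        raise ValueError("no changes recorded in this period")
    return successful_changes / total_changes


# Illustrative sample: a 30-day period with a 30-minute and a 90-minute outage.
period_start = datetime(2021, 4, 1)
period_end = datetime(2021, 5, 1)
sample_outages = [
    Outage(datetime(2021, 4, 5, 2, 0), datetime(2021, 4, 5, 2, 30)),
    Outage(datetime(2021, 4, 19, 14, 0), datetime(2021, 4, 19, 15, 30)),
]

print(f"availability: {system_availability(sample_outages, period_start, period_end):.4%}")
print(f"MTTR (minutes): {mttr_minutes(sample_outages)}")
print(f"change success rate: {change_success_rate(47, 50):.0%}")
```

All three numbers come from data most ITSM tools already hold (incident start and resolution times, and change records), which is part of what makes this short list practical to review regularly.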

If only driving digital were like driving a car with a few simple dials and AI to handle complexities. We’re not there yet in IT, but these progressive leaders are heading in the right direction.

This post is brought to you by BigPanda

The views and opinions expressed herein are those of the author and do not necessarily represent the views and opinions of BigPanda.
