3 AIOps Secrets that Boost Quick Business Impacts

AIOps can help IT Ops manage today’s considerable challenges of scaling operational practices and exceeding business expectations.

Digital transformation is a strategic priority, and that almost always means launching new cloud-native applications, scaling data platforms, and improving employee experiences. At the same time, IT must address more complexities such as supporting legacy enterprise systems, establishing multicloud capabilities, addressing increasing security issues, and responding to global regulations.


AIOps Business Impacts by Isaac Sacolick

In other words, business leaders expect IT operations to deliver more capabilities faster, more reliably, and without dramatically adding costs, technical debt, or complicated operating procedures.

At BigPanda’s Resolve ’21 and Pandapalooza, I heard from several leaders paving a new IT operating model based on automation, machine learning, and integrated ITSM workflows while empowering people to make smarter and faster decisions. These are cornerstone AIOps capabilities, and they can help IT reduce incidents, shorten the mean time to resolution (MTTR), and increase system reliability by automating tasks and orchestrating processes.

While this may be an operational pivot and a cultural transformation for people working in IT, these leaders found ways to deliver business impacts early in their journeys. Here are three recommendations I picked out from leaders blazing AIOps trails in their organizations.

1. Start Slowly and Learn Automation Capabilities Before Scaling

Didier Le Tien is the VP of Application Development at US Foods, a 150-year-old company with a long history of transforming to meet the new waves of business challenges.

As head of application development, Didier is expected by business stakeholders to deliver new application capabilities and release changes frequently. But Didier also recognizes the importance of methodically improving DevOps practices while solving operational problems.

Delivering new applications and functionality is not about throwing code into production, and Didier has this recommendation for agile development teams. “Improving velocity doesn’t always mean writing code faster. It also means you need better testing or automation to deploy the code faster.”

He also shared this advice on modernizing IT operations. “Solving problems holistically is very key and looking at bottlenecks in operations, not as silos, but more end to end. When it comes to infrastructure as code, go slow before you go too fast. Take the time to learn the tools and make mistakes before you scale.”

Didier stresses the importance of learning on the job so that people in IT get acclimated to new ways of working and solving IT’s growing challenges. It may be counterintuitive, but starting slower can deliver better results in the long run. And pairing AIOps with other DevOps automations such as CI/CD and infrastructure as code (IaC) helps ensure that as teams deploy capabilities faster, incident management and automation practices are in place to identify operational issues and enable more accurate responses.

2. Develop Dictionaries of Actionable Alerts for AIOps in Incident Management

We all know the saying about data, “Garbage in, garbage out,” which certainly applies when system and application alerts are aggregated and centralized. The objective of open box machine learning algorithms in AIOps solutions is to correlate alerts into manageable incidents, but standardizing alert metadata, developing event dictionaries, and cleansing the alert data can significantly improve the results.

Troy Clifton, Sr. Technical Program Manager at The Expedia Group, shared their approach. “We built an event dictionary that would allow us to have specific alert triage definitions. The dictionary allows us to leverage an actionable only alerts concept for our operations team. For us, we want to make sure the dictionary, alerts, and tooling are a reality for our operations center and even for some of the small dev teams that want to monitor and operate on their own.”

Developing this operational dictionary is a critical step for enterprises like The Expedia Group that have multiple business units and a large portfolio of customer-facing and mission-critical applications. Too many unfiltered and unactionable alerts often produce “alert fatigue,” so creating event dictionaries is a prudent step before correlating alerts into actionable incidents.
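To make the idea concrete, here is a minimal Python sketch of what an event dictionary with triage definitions might look like. It is not Expedia’s or BigPanda’s implementation. The event types, owners, runbook notes, and the `normalize` and `triage` helpers are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriageDefinition:
    """One hypothetical event-dictionary entry: how to treat an alert type."""
    actionable: bool
    severity: str
    owner: str
    runbook: str


# Hypothetical dictionary; a real one would be agreed on with each team.
EVENT_DICTIONARY = {
    "disk_usage_high": TriageDefinition(
        True, "warning", "platform-ops", "Expand the volume or clear old logs"),
    "checkout_error_rate": TriageDefinition(
        True, "critical", "payments-dev", "Roll back the latest release"),
    "cpu_spike_transient": TriageDefinition(
        False, "info", "platform-ops", "No action; resolves on its own"),
}


def normalize(raw_alert):
    """Standardize alert metadata so differently formatted sources match."""
    event_type = raw_alert.get("event_type", "").strip().lower()
    event_type = event_type.replace(" ", "_").replace("-", "_")
    host = raw_alert.get("host", "unknown").strip().lower()
    return {"event_type": event_type, "host": host}


def triage(raw_alerts):
    """Return (actionable alerts, alerts missing from the dictionary).

    Alerts whose definition is marked non-actionable are dropped.
    """
    actionable, unknown = [], []
    for raw in raw_alerts:
        alert = normalize(raw)
        definition = EVENT_DICTIONARY.get(alert["event_type"])
        if definition is None:
            unknown.append(alert)  # candidate for a new dictionary entry
        elif definition.actionable:
            actionable.append({
                **alert,
                "severity": definition.severity,
                "owner": definition.owner,
                "runbook": definition.runbook,
            })
    return actionable, unknown
```

Calling `triage` on a batch of raw alerts passes only the actionable ones to the operations team. The second list collects alert types that have no definition yet, which is one simple way to keep the dictionary growing instead of silently dropping unfamiliar alerts.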

Sean Mack, CIO/CISO at Wiley Publishing, also recommends cleaning up the number of alerts. During his talk on IT Ops Modernization, he stated, “We undertook a concerted effort to reduce unwanted alerts reducing them by over 50%.” The approach was critical as demand for Wiley’s learning platforms jumped significantly in 2020 because of COVID and the shift to remote learning.

3. Share Data with Business Leaders and Enable Better Decisions

Sean also provided advice on using AIOps and other IT modernization efforts to engage business leaders. In his presentation, he said, “We rapidly deployed a business continuity dashboard which combined system and key business metrics. By sharing our technical and business teams’ data, we used transparency to allow our business leaders to make better decisions in light of a rapidly evolving and changing environment during unprecedented times.”

Sean’s advice is important. Sharing data, dashboards, and insights with business leaders early in the shift to AIOps platforms and automations enables rapid feedback and encourages more people to join the journey. It helps prioritize which business processes, applications, and alerts to focus on and ensures engineers optimize dashboards for both IT operations and business use cases.

Ben Narramore, Senior Manager Network Operations at Sony PlayStation, reminded attendees why sharing information during times of volatile change is critical to business operations. During the closing executive fireside chat, Ben shared how operations changed for Sony PlayStation during the pandemic. “Our peaks used to be holiday season and big game releases. Since COVID, we’re living peak season 7x24, and it’s the new normal. The new normal for us is, everything is a peak, users are online all the time, and the numbers are higher than ever before.”

Partnering early with business leaders helps them witness real-time changes in how customers and end-users utilize applications and better understand IT operational impacts. Even when dashboards are works in progress, data normalization requires ongoing efforts, and automation takes time to implement, getting stakeholder participation helps deliver quick wins and lasting support.

So, these leaders advise that the secrets to delivering speedy business impacts from AIOps are to start slow, improve data quality, and engage business stakeholders early in the implementation process.

This post is brought to you by BigPanda

The views and opinions expressed herein are those of the author and do not necessarily represent the views and opinions of BigPanda.
