10+ Awesome LLM and Generative AI Capabilities for DevOps and IT Ops

Over the next few years, we’ll see a seismic shift in how devops organizations, agile development teams, site reliability engineers, and IT Ops will achieve an increasingly complex mission.

How can IT improve reliability, performance, and security while deploying more innovations at increasing release frequency and with fewer incidents and defects?  

Over the last ten years, solutions have included migrating to the cloud, centralizing observability data, automating operations, leveraging machine learning, and deploying other AIOps capabilities.

And for the next several years (One? Three? Five? What’s your estimate?), we’ll see new generative AI platforms emerge, and existing platforms add LLM capabilities that will transform how IT teams operate.   

“In platforms targeting DevOps, IT Ops, and ITSM, the remarkable capabilities of GPT and LLM are transforming operations,” says Vijay Iyer, president of Americas at Mastek. “With advanced problem-solving abilities, GPT and LLM platforms empower organizations to efficiently address complex issues, optimize efficiency, and drive innovation in the IT landscape.”

What can IT, DevOps, SREs, and developers do today with gen AI and LLM capabilities to improve IT operations? Here’s a list:

1. Generate service level objectives

Kit Merker, chief growth officer of Nobl9, has an optimistic viewpoint on generative AI’s impact on DevOps, SRE, and IT Ops. “I don’t believe that GPT technologies will put developers or DevOps folks out of a job soon — to the contrary, it will create more jobs! — a lot of mundane and repetitive code-adjacent tasks can be further automated using specialty LLMs,” he says.

Merker shares a great example of how generative AIs can capture reliability data, and help site reliability engineers create service-level objectives. “SLOgpt.ai is an example of this, which uses Google Vertex AI and PaLM2, and is trained to understand reliability engineering concepts and can even answer questions about a Service Level Objective (SLO) generated from a user-uploaded screenshot of an observability metric,” he says. “You can ask SLOgpt.ai to create an OpenSLO yaml or to write a song about your SLO; the choice is yours.”
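To make the output of such a tool concrete, here is a minimal sketch of what generating an OpenSLO definition from a few reliability inputs might look like. The function name and queries are illustrative assumptions; the field names follow the published OpenSLO spec, which you should verify against the current version before using.

```python
# Illustrative sketch: render a minimal ratio-based SLO as OpenSLO-style
# YAML, the kind of artifact a tool like SLOgpt.ai can generate.
def build_openslo_yaml(service: str, slo_name: str, target: float,
                       good_query: str, total_query: str,
                       window_days: int = 28) -> str:
    """Render a minimal ratio-based SLO as an OpenSLO-style YAML string."""
    return f"""\
apiVersion: openslo/v1
kind: SLO
metadata:
  name: {slo_name}
spec:
  service: {service}
  budgetingMethod: Occurrences
  timeWindow:
    - duration: {window_days}d
      isRolling: true
  objectives:
    - displayName: {slo_name}
      target: {target}
      ratioMetric:
        good:
          metricSource:
            spec:
              query: {good_query}
        total:
          metricSource:
            spec:
              query: {total_query}
"""

yaml_doc = build_openslo_yaml(
    service="checkout",
    slo_name="checkout-availability",
    target=0.999,
    good_query="sum(rate(http_requests_total{code!~'5..'}[5m]))",
    total_query="sum(rate(http_requests_total[5m]))",
)
print(yaml_doc)
```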

2. Propose incident root cause

Marko Anastasov, co-founder of Semaphore CI/CD, says that instead of gathering in war rooms and organizing bridge calls to review mounds of operational data, IT Ops can use LLMs to identify the root cause of incidents. “In this field, GPT and LLM can be used to automate incident response by providing real-time insights into the root cause of an incident and suggesting remediation steps,” he says. “This reduces the time to solve incidents, improves customer satisfaction, and makes the lives of support staff much easier.”
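A minimal sketch of the first step in that workflow might look like the following: packaging an alert and the relevant log excerpt into a root-cause prompt. The prompt template and filtering heuristic are illustrative assumptions, not any specific vendor's API.

```python
# Sketch: assemble incident context into a root-cause prompt for an LLM,
# keeping only error-bearing lines to fit a model's context window.
def build_root_cause_prompt(alert: str, recent_logs: list[str],
                            max_lines: int = 20) -> str:
    errors = [ln for ln in recent_logs if "ERROR" in ln or "WARN" in ln]
    context = "\n".join(errors[-max_lines:])
    return (
        "You are an SRE assistant. Given this alert and log excerpt, "
        "propose the most likely root cause and remediation steps.\n\n"
        f"Alert: {alert}\n\nLogs:\n{context}"
    )

prompt = build_root_cause_prompt(
    "p99 latency > 2s on checkout-service",
    [
        "INFO request handled in 120ms",
        "WARN connection pool exhausted, waiting 500ms",
        "ERROR upstream timeout calling payments-api",
    ],
)
# `prompt` would then be sent to your LLM of choice for analysis.
```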

3. Grind out troubleshooting, documentation, and policy management

Working in IT has many bright spots to showcase innovations, automate processes, and improve system reliability, but some responsibilities are time-consuming drudgeries. Tony Johnson, CI/CD Engineer III at Rise8, says Gen AI can be a powerful assistant. “With the evolution of GPT and LLMs, DevOps, IT Ops, and ITSM platforms now house predictive troubleshooting, automated documentation, and real-time policy enforcement capabilities, unleashing new heights in operational efficiency and resilience,” he says.
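The real-time policy enforcement Johnson mentions can be as simple as automated checks against a deployment manifest. This sketch uses illustrative rules of my own choosing, not any particular platform's policy engine.

```python
# Sketch: the kind of real-time policy check an LLM-assisted platform
# might enforce on a deployment manifest before it ships.
def check_policies(manifest: dict) -> list[str]:
    """Return human-readable policy violations for a deployment manifest."""
    violations = []
    if not manifest.get("resources", {}).get("limits"):
        violations.append("missing resource limits")
    if manifest.get("image", "").endswith(":latest"):
        violations.append("mutable ':latest' image tag")
    if manifest.get("privileged"):
        violations.append("privileged container not allowed")
    return violations

findings = check_policies({"image": "shop/api:latest", "privileged": True})
print(findings)
```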

4. Query log files to find anomalies

One user generates expensive queries undermining performance for all active users – how do you find the needle in the haystack? Emily Arnott, content marketing manager at Blameless, suggests using an LLM to query log files to find the answers. “A capability on the close horizon for LLMs is parsing huge log files that typical regex searching can’t make sense of,” she says. “Operations people often end up with a huge surplus of data and want to find any patterns or anomalies that can be detected in them. LLMs make this easy: even if you don’t know exactly what you’re looking for, they’re sophisticated enough to highlight things worth seeing.”
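Even before an LLM gets involved, a cheap pre-filter can shrink the haystack by surfacing only the rare log patterns, so you are not paying a model to summarize the routine 99 percent. This is a hand-rolled sketch of that idea, not a feature of any specific product.

```python
# Sketch: surface rare log lines by normalizing numbers and IDs into
# templates, then keeping only templates below a frequency threshold.
import re
from collections import Counter

def rare_log_lines(lines: list[str], max_share: float = 0.05) -> list[str]:
    """Return lines whose template appears in at most max_share of traffic."""
    def template(line: str) -> str:
        # Numbers and hex IDs vary per request; collapse them so similar
        # lines share one template.
        return re.sub(r"0x[0-9a-f]+|\d+", "<N>", line)

    counts = Counter(template(ln) for ln in lines)
    total = len(lines)
    return [ln for ln in lines
            if counts[template(ln)] / total <= max_share]

logs = ["user 42 logged in"] * 98 + [
    "query 7 scanned 900000 rows in 4200 ms",
    "deadlock detected on table orders",
]
anomalies = rare_log_lines(logs)
print(anomalies)  # only the two unusual lines survive the filter
```

The survivors are a much better candidate set to hand to an LLM for pattern-spotting and explanation.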

5. Migrate scripts and automations across platforms

When you need to change platforms, do you have to rewrite all the scripts and automations or hire someone to do all the work to port code across platforms? Not so, says Andrew Amann, CEO of NineTwoThree Studio. “We’ve recently leveraged ChatGPT’s innate ability to translate from one language to another to convert Terraform scripts to CloudFormation,” he says. “ChatGPT reduced 90% of the effort, requiring minimal edits and freeing time to test ported scripts thoroughly. We also did the opposite (CloudFormation to Terraform) for another client to become cloud agnostic.”
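The heart of that workflow is a well-formed translation prompt. Here is a hedged sketch of one; the template wording is my own assumption, and as Amann notes, the translated output still needs review and testing before deployment.

```python
# Sketch: build an IaC translation prompt for an LLM, e.g. Terraform to
# CloudFormation. The prompt wording is illustrative.
def build_translation_prompt(source_code: str,
                             src: str = "Terraform",
                             dst: str = "CloudFormation") -> str:
    return (
        f"Translate the following {src} configuration to equivalent {dst}. "
        "Preserve resource names, and flag anything that has no direct "
        f"{dst} equivalent with a TODO comment.\n\n{source_code}"
    )

tf = 'resource "aws_s3_bucket" "logs" {\n  bucket = "app-logs"\n}'
prompt = build_translation_prompt(tf)
# Send `prompt` to ChatGPT (or another LLM), then diff and test the result.
```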

6. Reduce time to resolve incidents

When a major incident causes an outage or slow performance, IT Ops feels customers and employers breathing down their necks to resolve the issue and restore systems.

“We’ve recently added AI-driven insights to our open-source-powered platform via a large language model (LLM) to reduce the critical mean time to recovery (MTTR) metric while extracting more value from less telemetry data and lower storage costs,” says Asaf Yigal, co-founder and VP of product at Logz.io. “Generative AI is proving to be key at helping engineering, DevOps and ITOps teams optimize cloud applications and infrastructure to handle new and emerging availability, performance, resilience and security issues.”

Logz.io recently announced the integration of generative AI into its Open 360 Platform and AI alert recommendations to reduce MTTR.
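For teams that want to track whether these tools are working, the MTTR metric itself is straightforward to compute. A minimal sketch, using illustrative incident timestamps:

```python
# Sketch: compute mean time to recovery (MTTR) in minutes from a list of
# (detected, resolved) ISO-8601 timestamp pairs.
from datetime import datetime

def mttr_minutes(incidents: list[tuple[str, str]]) -> float:
    """Mean minutes from detection to resolution across incidents."""
    durations = [
        (datetime.fromisoformat(end) - datetime.fromisoformat(start))
        .total_seconds() / 60
        for start, end in incidents
    ]
    return sum(durations) / len(durations)

incidents = [
    ("2024-05-01T10:00:00", "2024-05-01T10:45:00"),  # 45 minutes
    ("2024-05-03T22:10:00", "2024-05-03T23:25:00"),  # 75 minutes
]
print(mttr_minutes(incidents))  # 60.0
```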

7. Search to find new technologies and platforms

If you’re researching new technologies, platforms, and services, Amann suggests trying an LLM-powered search like Bing to find answers quickly. “Do you find yourself looking for a step-by-step guide, not knowing the exact name of the technology?” he asks. The new Bing search works when you ask questions like, “Find me a recent guide to set up access from GCP to an on-prem SQL database, similar to what is called site-to-site VPN on AWS.”

8. Remediate issues with an LLM’s recommendations

“Houston, we have a problem,” and now comes the tough task of recommending the appropriate fixes that are easy to execute and are low risk of breaking other services. Matt Riley, GM of enterprise search at Elastic, suggests, “With generative AI technology taking the world by storm, we’ve already seen several leading DevOps solutions add copilots to their toolkits that enable teams to move from just observing and monitoring their data to also receiving effective remediation steps immediately when they need to resolve an issue.”

Developing automations is a key step for creating recipes to remediate issues. Riley adds, “In ITOps, large language models like GPT—especially when augmented by enhanced search capabilities—are helping teams quickly find the information they need and automate previously manual processes.”
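One sensible guardrail when automating those remediations is gating the LLM’s free-text suggestion through an allow-list of pre-approved runbooks, so only vetted automations ever execute. This sketch uses hypothetical runbook names and commands of my own invention:

```python
# Sketch: map a free-text LLM remediation suggestion to a pre-approved
# runbook command; anything unrecognized is escalated to a human.
APPROVED_RUNBOOKS = {
    "restart-service": "ansible-playbook restart_service.yml",
    "scale-out": "kubectl scale deploy/api --replicas=6",
    "rotate-credentials": "ansible-playbook rotate_creds.yml",
}

def plan_remediation(llm_suggestion: str):
    """Return an approved automation command, or None if unrecognized."""
    text = llm_suggestion.lower()
    for name, command in APPROVED_RUNBOOKS.items():
        if name in text or name.replace("-", " ") in text:
            return command
    return None  # unrecognized suggestions go to a human for review

print(plan_remediation("Recommend you restart service api-gateway"))
print(plan_remediation("Drop the users table"))  # None: not approved
```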

9. Add an AI observability assistant to your NOC

Every network operations center (NOC) is under pressure to support more mission-critical applications, increase uptime, and resolve issues faster. Camden Swita, senior product manager at New Relic, says New Relic Grok is the world’s first generative AI observability assistant, “making observability practices ubiquitous by removing barriers to adoption, like learning bespoke query languages or navigating the massive amount of telemetry most engineers confront every day.”

With every new app, database, and service comes more observability data, and “all that data may overwhelm a human,” says Swita. “Pairing New Relic’s unified telemetry data platform with OpenAI’s LLMs works: We take the reasoning power of the LLM, give it tools to translate plain language into queries, look for deviations, and more. Now, engineers don’t need to slog through data manually—they can just ask, ‘What’s on fire?’ and cut right to the chase.”
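To illustrate the translate-plain-language-into-queries step Swita describes, here is a toy sketch stubbed with rule-based intent matching in place of a real LLM call. The query dialect shown is illustrative pseudo-SQL, not actual NRQL or any vendor’s language.

```python
# Sketch: map a plain-language question to an observability query.
# A real assistant would use an LLM here; this stub shows the shape.
def to_query(question: str) -> str:
    q = question.lower()
    if "error" in q or "fire" in q:
        return ("SELECT count(*) FROM requests "
                "WHERE status >= 500 SINCE 30 minutes ago")
    if "slow" in q or "latency" in q:
        return ("SELECT percentile(duration, 99) FROM requests "
                "SINCE 30 minutes ago")
    return "SELECT count(*) FROM requests SINCE 30 minutes ago"

print(to_query("What's on fire?"))
print(to_query("Why is checkout slow?"))
```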

10. Increase dev velocity with code examples

“If you’re a software developer or a devops engineer, you might experiment with generative AI tools and wonder what it will mean for your profession and how it will change your work,” I wrote in a recent article on ChatGPT and software development.

More code = more code to test, so look out for my upcoming article on how LLMs will impact continuous testing.

11. Innovate and deliver new natural language query capabilities

The most exciting area for IT is identifying where generative AI and LLMs can drive innovation, and I’ve shared several suggestions in recent articles on delivering natural language query capabilities to business users.

AI is evolving quickly, and I plan to add to this article as platforms release new generative AI and LLM capabilities targeting DevOps and IT Ops. Please sign up for the Driving Digital Newsletter to access all of my thought leadership, including updated versions of this post. Also, please consider joining us for a future session of Coffee with Digital Trailblazers, where we discuss topics for aspiring transformation leaders, including AI, DevOps, and digital transformation.



About Isaac Sacolick

Isaac Sacolick is President of StarCIO, a technology leadership company that guides organizations on building digital transformation core competencies. He is the author of Digital Trailblazer and the Amazon bestseller Driving Digital and speaks about agile planning, devops, data science, product management, and other digital transformation best practices. Sacolick is a recognized top social CIO and digital transformation influencer with over 900 articles published at InfoWorld, CIO.com, his blog Social, Agile, and Transformation, and other sites. You can find him sharing new insights @NYIke on Twitter, on his Driving Digital Standup YouTube channel, or during Coffee with Digital Trailblazers sessions.