Monitoring and managing today's dynamic IT environments increasingly requires applying artificial intelligence (AI) to IT operations, a practice known as AIOps.
AIOps helps IT Ops and DevOps teams work smarter and faster, so they can detect issues earlier and resolve them quickly. With AIOps, Ops teams can manage the large volume of data generated by modern IT environments. AIOps is expected to become increasingly popular; according to Gartner, its adoption could rise to 30% by 2023.
We therefore spoke with industry experts to shed light on this topic and see what the future holds for AIOps.
What is AIOps?
First, we asked them to explain what AIOps is and what it can do.
Hitesh Khodani, QA Leader, defines AIOps as the application of artificial intelligence to improve IT operations.
AIOps leverages machine learning, big data, and analytics capabilities to:
- Ingest large volumes of operations data from multiple IT infrastructure components, applications, and performance-monitoring tools (e.g., Splunk)
- Intelligently remove noise from the collected data and identify patterns and events related to system issues and performance
- Identify issues and report them to IT for rapid action and response
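To make the "remove the noise" step more concrete, here is a minimal sketch, assuming each monitoring tool emits events as dicts with hypothetical `source`, `host`, `check`, and `severity` fields. It drops events below a severity threshold and collapses duplicates reported by different tools into a single alert with a count. Real AIOps platforms use far richer techniques; this only illustrates the idea.

```python
# Severity levels ordered from least to most important (hypothetical scale).
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def reduce_noise(events: list[dict], min_severity: str = "warning") -> list[dict]:
    """Drop low-severity events and collapse duplicates into one entry with a count."""
    threshold = SEVERITY_RANK[min_severity]
    grouped: dict[tuple, dict] = {}  # dicts preserve insertion order (Python 3.7+)
    for event in events:
        rank = SEVERITY_RANK[event["severity"]]
        if rank < threshold:
            continue  # below the reporting threshold: treat as noise
        # Events about the same check on the same host are treated as duplicates,
        # even when different tools reported them.
        fingerprint = (event["host"], event["check"])
        if fingerprint not in grouped:
            grouped[fingerprint] = {**event, "count": 1}  # copy; input is not mutated
            continue
        existing = grouped[fingerprint]
        existing["count"] += 1
        if rank > SEVERITY_RANK[existing["severity"]]:
            existing["severity"] = event["severity"]  # keep the worst severity seen
    return list(grouped.values())


# Hypothetical events from two tools; field names are illustrative only.
events = [
    {"source": "splunk", "host": "web-1", "check": "cpu", "severity": "warning"},
    {"source": "nagios", "host": "web-1", "check": "cpu", "severity": "critical"},
    {"source": "splunk", "host": "web-2", "check": "disk", "severity": "info"},
    {"source": "nagios", "host": "db-1", "check": "latency", "severity": "error"},
]
for alert in reduce_noise(events):
    print(alert["host"], alert["check"], alert["severity"], alert["count"])
```

Four raw events become two alerts: the duplicated CPU alert on `web-1` (escalated to `critical`) and the latency alert on `db-1`.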
Wayne Ariola, a DevOps thought leader, reinforces this point: as the name “AIOps” suggests, the focus of the data analysis is on improving operations. However, the concept has far-reaching value across every value stream in an organization. For instance, Curiosity Baseline applies these AIOps techniques to the vast array of software quality data spread throughout the software development lifecycle.
George Ukkuru, Head of Quality Engineering at UST, also says that AIOps is all about applying machine learning and analytics to augment IT operations. According to him, AIOps can be used for anything from simple tasks, such as finding the best engineer to fix an issue, to auto-healing.
He adds that AIOps can be used to identify anomalies and make predictions by analyzing patterns in log files, correlate events to identify root causes, and provide automatic resolutions to problems based on continuous learning.
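As a rough illustration of identifying anomalies by analyzing patterns in log files, the sketch below flags minutes whose error count rises sharply above the recent baseline, using a rolling z-score. The per-minute counts, the window size, and the threshold are all illustrative assumptions; production systems typically use seasonal or learned baselines instead.

```python
import statistics


def find_anomalies(counts: list[int], window: int = 5, threshold: float = 3.0) -> list[int]:
    """Return indices whose value is more than `threshold` standard deviations
    above the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mean = statistics.mean(baseline)
        stdev = statistics.pstdev(baseline)
        if stdev == 0:
            # Perfectly flat baseline: any increase is treated as anomalous.
            if counts[i] > mean:
                anomalies.append(i)
            continue
        if (counts[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical errors per minute, already parsed out of a log file.
errors_per_minute = [2, 3, 2, 4, 3, 2, 3, 40, 3, 2]
print(find_anomalies(errors_per_minute))  # → [7]
```

Only the spike at minute 7 is flagged; once it enters the baseline window, the inflated standard deviation keeps the following normal minutes from being flagged.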
Hitesh also emphasizes that legacy operations monitoring relies on multiple IT operations tools to detect and alert on issues, yet the follow-up is still done manually and involves coordinating with multiple teams. With AIOps, big data is used to aggregate siloed operations data in a single place. The data collected can range from system logs, past performance and event data, network data, and past incident data to resolution notes and knowledge articles.
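Aggregating siloed data in one place usually starts with mapping each tool's records onto a shared schema. The sketch below assumes two hypothetical input formats, a JSON-style application log record and a pipe-delimited line from a network monitor; the field names and line format are invented for illustration and do not correspond to any specific product's output.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Event:
    """Common schema shared by all sources (hypothetical)."""
    timestamp: datetime
    source: str
    host: str
    message: str


def from_app_log(record: dict) -> Event:
    # Hypothetical app log record: {"ts": epoch_seconds, "hostname": ..., "msg": ...}
    return Event(
        timestamp=datetime.fromtimestamp(record["ts"], tz=timezone.utc),
        source="app_log",
        host=record["hostname"],
        message=record["msg"],
    )


def from_network_line(line: str) -> Event:
    # Hypothetical network monitor line: "2023-01-05T10:00:30+00:00|router-1|link down"
    ts, host, message = line.split("|", 2)
    return Event(
        timestamp=datetime.fromisoformat(ts),
        source="network",
        host=host,
        message=message,
    )


app_events = [from_app_log({"ts": 1672912800, "hostname": "web-1", "msg": "HTTP 500 on /checkout"})]
net_events = [from_network_line("2023-01-05T10:00:30+00:00|router-1|link down")]

# One time-ordered stream, regardless of which tool produced each record.
unified = sorted(app_events + net_events, key=lambda e: e.timestamp)
for event in unified:
    print(event.timestamp.isoformat(), event.source, event.host, event.message)
```

Normalizing timestamps to timezone-aware UTC values is what makes it safe to sort and compare events from different tools in one stream.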
On this collected data, he continues, AIOps applies machine learning and analytics to filter out noise, identify critical events, and raise alerts, as well as to identify root causes and propose solutions based on past and current information. It can also automate responses and recommend solutions based on past resolution data. Machine learning also enables the system to predict events well before they happen and propose solutions in advance.
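Event correlation for root-cause analysis can take many forms. One of the simplest heuristics, sketched below under the assumption that related events arrive close together in time, groups events whose timestamps fall within a fixed gap of each other and treats the earliest event in each group as the first root-cause candidate. This is a deliberate simplification: real platforms also draw on topology, dependency graphs, and learned patterns. The `Alert` type and the 60-second gap are hypothetical choices.

```python
from typing import NamedTuple


class Alert(NamedTuple):
    timestamp: float  # seconds since an arbitrary reference point (hypothetical)
    host: str
    message: str


def correlate(alerts: list[Alert], max_gap: float = 60.0) -> list[list[Alert]]:
    """Group alerts into incidents: a new incident starts whenever the gap
    since the previous alert exceeds `max_gap` seconds."""
    incidents: list[list[Alert]] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if incidents and alert.timestamp - incidents[-1][-1].timestamp <= max_gap:
            incidents[-1].append(alert)
        else:
            incidents.append([alert])
    return incidents


def root_cause_candidate(incident: list[Alert]) -> Alert:
    # Heuristic: the earliest alert in an incident is the first place to look.
    return min(incident, key=lambda a: a.timestamp)


alerts = [
    Alert(0, "db-1", "disk latency high"),
    Alert(20, "api-1", "query timeout"),
    Alert(45, "web-1", "HTTP 500 rate up"),
    Alert(900, "web-2", "certificate expiring"),
]
incidents = correlate(alerts)
for incident in incidents:
    print(len(incident), root_cause_candidate(incident).host)
```

Here the database, API, and web alerts collapse into one incident pointing at `db-1`, while the unrelated certificate warning stays separate.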
AIOps can therefore help businesses and teams grow and consolidate while improving their performance and productivity.
Why do we need AIOps?
Wayne points out that we need AIOps so we can prevent issues instead of having to fix them. We also need to recognize that advances in computing power, data access, and AI techniques have allowed humans to take another incremental step forward with automation.
Hitesh adds that most organizations are now migrating to new infrastructure, such as cloud and hybrid environments, leveraging virtualized services that can scale instantly to handle demand. Applications across these platforms generate a huge amount of data, which existing manual IT operations processes cannot cope with.
AIOps, with its machine learning, big data, and analytics capabilities, can instead consume large volumes of data across the entire infrastructure, analyze it, report significant events such as performance degradation and outages, and automatically trigger alerts for IT operations staff to act on.
Finally, George highlights three main reasons why AIOps is essential:
- Dependency on infrastructure is very high: critical applications are accessed from the cloud, and five nines (99.999%) availability is the new norm
- The application environment has become very complex due to an increase in scale and elasticity
- Engineering leaders are looking at deriving insights from the vast amount of data that is available in disparate formats
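To put "five nines" in perspective, the short calculation below converts an availability target into an annual downtime budget, assuming a 365-day year. At 99.999% availability, a service may be down for only about 5.26 minutes per year, which leaves little room for manual triage.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes (non-leap year assumed)


def downtime_budget_minutes(availability: float) -> float:
    """Maximum minutes of downtime per year allowed by an availability target."""
    return (1 - availability) * MINUTES_PER_YEAR


for nines, target in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
    print(f"{nines} nines: {downtime_budget_minutes(target):.2f} min/year")
# 3 nines: 525.60 min/year
# 4 nines: 52.56 min/year
# 5 nines: 5.26 min/year
```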
Implementing AIOps
Now that we’ve seen why enterprises should use AIOps, let’s explore how to implement it.
According to George, the first step in implementing AIOps is to identify the pain points in operations and turn them into use cases.
For each use case, you should carry out a cost-benefit analysis to see whether solving the problem will yield sufficient returns. As part of this, evaluate whether you actually need AI, ML, or big data to solve it. The next step is to look at the data you can use to find a solution; it may exist in various formats, such as log files, resolution data, and device data. Then review the quality of the data and eliminate bad data.
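Reviewing data quality and eliminating bad data can start with simple validation rules. The sketch below assumes flat log records stored as dicts with hypothetical `timestamp`, `host`, and `level` fields. It drops records with missing fields, unknown log levels, or timestamps that `datetime.fromisoformat` cannot parse, removes exact duplicates, and reports how many records were rejected, a figure that can also feed into the cost-benefit analysis.

```python
from datetime import datetime

# Hypothetical schema: adjust to the fields your own log sources provide.
REQUIRED_FIELDS = ("timestamp", "host", "level")
VALID_LEVELS = {"DEBUG", "INFO", "WARN", "ERROR"}


def is_valid(record: dict) -> bool:
    if any(not record.get(field) for field in REQUIRED_FIELDS):
        return False  # missing or empty required field
    if record["level"] not in VALID_LEVELS:
        return False  # unknown log level
    try:
        datetime.fromisoformat(record["timestamp"])
    except (TypeError, ValueError):
        return False  # timestamp is not an ISO-format string
    return True


def clean(records: list[dict]) -> tuple[list[dict], int]:
    """Return (valid, de-duplicated records, number of rejected records).
    Assumes flat records whose values are hashable."""
    seen = set()
    kept = []
    for record in records:
        key = tuple(sorted(record.items()))  # order-independent identity for duplicates
        if key in seen or not is_valid(record):
            continue
        seen.add(key)
        kept.append(record)
    return kept, len(records) - len(kept)


raw = [
    {"timestamp": "2023-01-05T10:00:00", "host": "web-1", "level": "ERROR"},
    {"timestamp": "2023-01-05T10:00:00", "host": "web-1", "level": "ERROR"},  # duplicate
    {"timestamp": "yesterday", "host": "web-2", "level": "INFO"},  # bad timestamp
    {"timestamp": "2023-01-05T10:01:00", "host": "", "level": "WARN"},  # missing host
]
kept, rejected = clean(raw)
print(len(kept), rejected)  # → 1 3
```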
You may then need to define, train, and refine the machine learning model. Start with simple models and increase complexity gradually, selecting the best algorithm, and build an interface for visualizations or interactions.
Once the system is ready, George points out, it is time to test it and observe its behavior. You may need to fine-tune the data models and algorithms to improve accuracy before putting the solution into production. Be sure to invest enough in change management to address employees’ concerns about bots replacing humans.
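Testing and fine-tuning before production means measuring accuracy against known outcomes. As a minimal sketch, assuming you have the set of time windows a candidate model flagged and the set in which incidents were actually confirmed (both hypothetical here), you can compute precision (the share of alerts that were real) and recall (the share of real incidents that were caught). Tracking both matters: tuning only to reduce false alarms can silently lower recall.

```python
def precision_recall(flagged: set[int], actual: set[int]) -> tuple[float, float]:
    """Compare flagged windows against labeled incident windows."""
    true_positives = len(flagged & actual)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical minute indices flagged by a model vs. minutes with confirmed incidents.
flagged = {7, 12, 30, 31}
actual = {7, 12, 45}
p, r = precision_recall(flagged, actual)
print(f"precision={p:.2f} recall={r:.2f}")  # → precision=0.50 recall=0.67
```

Re-running this after each tuning pass gives a simple, comparable signal of whether the model is actually improving.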
For Hitesh, the main steps to implement AIOps are:
- Identify the pain points in current IT operations and come up with use cases for implementation
- Map current tooling and infrastructure
- Socialize the AIOps implementation plan with the teams involved
- Identify data requirements
- Configure the solution around existing tools and infrastructure
- Set up and monitor
- Review and refine based on learnings
Hitesh highlights that an AIOps solution integrates an organization’s existing tools and processes. IT teams use multiple monitoring tools for various purposes; AIOps ties them together and delivers seamless, shared visibility across all tools, teams, and domains.
Wayne emphasizes that, for most organizations, AIOps will be implemented with the help of a platform that provides an interface to simplify three steps: data access, data analysis, and event management.
In addition, he continues, some software vendors leverage known patterns within specific value streams of an organization, which helps teams reach valuable business outcomes faster than building AI algorithms themselves.
Wayne also recommends that organizations invest in training key employees on AI and AIOps: the deeper the knowledge an organization possesses, the more ‘realistic’ its outcomes will be.
‘Just like any new technology, AIOps represents a culture change that cannot be ignored. Teams will need to play nice in the sandbox with other teams. Traditional organizational silos will be challenged, and the “answers” provided by predictive systems will precipitate some uncomfortable discussions.’
The benefits…
Implementing AIOps also comes with many advantages…
According to Wayne, the entire goal of AIOps is to manage complexity.
Our interconnected world provides a roadmap for simplifying both mundane and complex tasks, and humanity needs a way to leverage data to expose patterns and risks that would go unnoticed with manual techniques. Patterns that might take years or decades to surface could be highlighted in days.
‘AIOps is another step in the evolutionary journey that started with smoke signals and has evolved to AI.’
For Hitesh, AIOps can help modernize IT operations and operations teams, enable predictive management, and reduce MTTR (mean time to resolution).
AIOps can modernize IT operations by bringing intelligence to the alerting system: it reports only the issues worth reporting, with complete diagnostic details and the best possible solution. It also keeps learning from each alert raised, making future diagnosis easier and helping keep the lights on. In addition, AIOps tools perform continuous monitoring without needing to rest or sleep. This lets the IT operations team focus on serious, complex issues and initiatives that can increase business stability and performance.
AIOps also enables predictive management. Current operations processes are mainly reactive, with action taken after the fact; AIOps makes the whole process predictive, identifying problems before they become major outages.
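One simple way to move from reactive to predictive is to extrapolate a resource trend and warn before a limit is reached. The sketch below fits a least-squares line to hypothetical hourly disk-usage samples and estimates how many hours remain until the disk is full. The straight-line model and the 24-hour warning horizon are illustrative assumptions; real usage is often seasonal or bursty, which calls for more robust forecasting.

```python
from typing import Optional


def hours_until_full(usage_pct: list[float], capacity_pct: float = 100.0) -> Optional[float]:
    """Fit usage = intercept + slope * hour by least squares and return the hours
    from the last sample until usage reaches capacity, or None if usage is not growing."""
    n = len(usage_pct)
    if n < 2:
        return None  # not enough data to fit a trend
    xs = range(n)  # sample index = hours since the first sample
    mean_x = sum(xs) / n
    mean_y = sum(usage_pct) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, usage_pct))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    if slope <= 0:
        return None  # flat or shrinking usage: no predicted exhaustion
    intercept = mean_y - slope * mean_x
    hour_full = (capacity_pct - intercept) / slope
    return max(0.0, hour_full - (n - 1))


samples = [70.0, 72.0, 74.0, 76.0, 78.0]  # hypothetical hourly disk usage (%)
remaining = hours_until_full(samples)
if remaining is not None and remaining < 24:  # hypothetical 24-hour warning horizon
    print(f"Disk predicted to be full in about {remaining:.0f} hours")
```

With usage growing two points per hour from 78%, the sketch predicts exhaustion in about 11 hours, early enough to act before an outage.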
Finally, Hitesh points out that AIOps can reduce MTTR, as it can identify root causes and propose solutions faster and more accurately than manual processes. This enables organizations to set and achieve previously unthinkable MTTR goals.
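MTTR itself is simple to compute: it is the average time between when an incident is opened and when it is resolved. The sketch below uses made-up incident records; which timestamps mark the start and end of an incident varies between organizations and is an assumption here.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (opened, resolved)
incidents = [
    (datetime(2023, 1, 5, 10, 0), datetime(2023, 1, 5, 10, 45)),   # 45 minutes
    (datetime(2023, 1, 9, 14, 0), datetime(2023, 1, 9, 16, 0)),    # 2 hours
    (datetime(2023, 1, 20, 8, 30), datetime(2023, 1, 20, 8, 55)),  # 25 minutes
]


def mean_time_to_resolution(records: list[tuple[datetime, datetime]]) -> timedelta:
    """Average resolution time across incidents."""
    durations = [resolved - opened for opened, resolved in records]
    return sum(durations, timedelta()) / len(durations)


print(mean_time_to_resolution(incidents))  # → 1:03:20
```

Tracking this figure before and after introducing AIOps gives a concrete way to check whether MTTR goals are being met.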
George highlights four key benefits of AIOps:
- Enable faster decision making and provide insights
- Improve the availability and reliability of applications
- Reduce cost by proactively fixing issues
- Decrease the MTTR
… And the challenges
But what are the challenges that come with it?
For George, the major drawback is that you may need to spend a fair amount of effort customizing existing data models or creating a new data model for your use case. It might take time to reach a level of accuracy at which your solution is fully reliable. He adds that you should always assess the ROI (return on investment) and the benefits each use case can deliver before making any investment.
Wayne adds that, as with all transformative technologies, you need to take the human component into account: people must learn that the predictive analysis AIOps delivers is the new normal. This will shift the way we consume information and act upon it.
According to Hitesh, there are a few challenges to consider with AIOps, including:
- AIOps is only as good as the data it is fed, and hence has limitations.
- There is a steep learning and implementation curve, as the initial setup and maintenance require significant effort.
- It depends heavily on diverse data sources, which raises data retention, protection, and storage concerns.
- AIOps can’t handle every type of software monitoring or management task in real time. There will always be situations where manual intervention is required, which can delay resolution.
- It cannot be relied on for more complex, tricky issues that involve many critical systems.
The future of AIOps
Hitesh believes that AIOps will evolve continuously and will be used extensively in large organizations that are currently undergoing digital transformation and platform modernization.
‘With improved algorithms, patterns, and large datasets, AIOps will continue to see large adoption across the industry,’ Hitesh says.
He adds that AIOps is a powerful solution, but we need to be aware that it cannot solve all problems. AIOps will likely be used mostly to resolve simple, routine issues, which will free up significant human time to focus on more critical and complex issues.
George also thinks that AIOps will grow more powerful in the years to come. Preventing outages and improving the satisfaction of digital customers will be every CIO’s number one priority, he points out. AIOps can help with this, and he expects innovation and investment in the area to increase going forward.
Finally, Wayne believes that AIOps is an evolutionary step in data analysis that can make all organizations “data-driven.”
According to him, organizations that are not investigating these techniques, or are slow to adopt them, will fall behind in several ways. First, they will be more disconnected from the actual customer experience than competitors using AIOps. Second, they will be culturally behind in understanding how to leverage data across a broader swath of the organization’s value stream. Finally, due to a lack of data insights, innovation will be riskier for them than for their peers, which will be a distinct disadvantage.
It then seems that AIOps is here to stay and will only grow more powerful in the coming years…
Special thanks to Hitesh Khodani, Wayne Ariola, and George Ukkuru for their insights on the topic!