When I first started weightlifting, I’d watch these massive guys with lifting belts, chalk-covered hands, and special shoes, and I didn’t get it. Why would someone so strong need a belt? Didn’t that mean they weren’t as strong as they thought? But, as I got more into powerlifting, I realized the truth: using a belt didn’t make me weaker—it made me stronger. It improved my form, taught me how to breathe and brace effectively, and prepared my body for heavier loads. The right tool, used at the right time and in the right way, isn’t a crutch—it’s a catalyst for growth. Now, even lifting without a belt, I’m stronger and more skilled because of what I learned while using it.
That’s exactly how I think about artificial intelligence (AI) and machine learning (ML). Be warned: This article uses these terms somewhat interchangeably, much like our industry does. These aren’t replacements for human skills—they’re tools that make us better at what we do.
By now, we have all dabbled in large language models (LLMs) or generative AI. We know these tools can help write our Disaster Recovery Plan (DRP), but most of us have dismissed them as just another writing partner.
In the infamous words of Ron Popeil, “But wait, there’s more!”
The goal of disaster recovery is resilience: The ability to continue functioning even when a disaster strikes and to withstand, adapt to and recover from challenges. Achieving true resilience isn’t just about having a DRP on paper—it’s about leveraging every tool available to anticipate risks, streamline responses, and minimize downtime. That’s where AI comes in. AI offers various capabilities to enhance your DRP, from predictive modeling to real-time optimization.
Why Do We Need AI to Enhance DRP?
AI can sift through mountains of data in seconds, spot and test patterns we might miss, and handle tedious tasks so we can focus on what’s really important. It’s like a belt for heavy lifting—it supports us, enhances our work, and helps us achieve things that would otherwise be impossible. When used thoughtfully, AI doesn’t take anything away; it amplifies our capabilities.
How To Train
Warmups
Where you start matters. You’ll want to prep your environment for implementing AI. Before I do my deadlifts, I like to warm up. Personally, I have some hip tightness issues, so I like to do some banded and jumping squats to get those hips moving and also to assess what tolerance they have for the day.
Simulated Annealing
- Start with broad exploration by trying different lifts
- Allows for “mistakes” when you use novice techniques
Suppose you need to restore multiple systems after a disaster. Some are more critical than others, and resource constraints (like bandwidth or compute power) limit how many can be restored simultaneously. The challenge is finding the sequence of restorations that minimizes downtime and impact.
- Start with a random order of system restoration.
- Slightly tweak the restoration sequence (e.g., swap the order of two systems or shift one system earlier/later).
- If the new sequence reduces the total downtime impact, accept it.
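- If it increases the cost, accept it anyway with a probability based on the current “temperature,” which allows exploration of potentially better solutions.
- Lower the temperature gradually so that worse sequences are accepted less and less often and the search settles on a good order.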
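The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production scheduler: the system names, restore times, and impact weights are made up, the systems are assumed to be restored one at a time, and the cooling schedule (`start_temp`, `cooling`) is an arbitrary choice.

```python
import math
import random

# Hypothetical systems: name -> (hours to restore, impact per hour of downtime)
SYSTEMS = {
    "auth": (1.0, 10.0),
    "database": (3.0, 8.0),
    "web": (1.5, 6.0),
    "reporting": (4.0, 1.0),
    "email": (2.0, 3.0),
}

def downtime_cost(order):
    """Total impact when systems are restored one at a time in `order`."""
    elapsed = 0.0
    cost = 0.0
    for name in order:
        hours, impact = SYSTEMS[name]
        elapsed += hours          # this system stays down until its restore finishes
        cost += elapsed * impact
    return cost

def anneal(seed=0, start_temp=50.0, cooling=0.995, steps=5000):
    rng = random.Random(seed)
    current = list(SYSTEMS)
    rng.shuffle(current)          # start with a random restoration order
    current_cost = downtime_cost(current)
    best, best_cost = current[:], current_cost
    temp = start_temp
    for _ in range(steps):
        # Tweak the sequence slightly: swap two systems
        i, j = rng.sample(range(len(current)), 2)
        candidate = current[:]
        candidate[i], candidate[j] = candidate[j], candidate[i]
        delta = downtime_cost(candidate) - current_cost
        # Always accept improvements; accept worse orders with a
        # probability that shrinks as the temperature drops
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        temp *= cooling           # cool down gradually
    return best, downtime_cost(best)

best_order, best_cost = anneal()
print(best_order, best_cost)
```

With only five systems you could simply try every order, so the payoff comes when the list is long enough that brute force stops being practical.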
Light Weight
Once I have warmed up my joints and figured out how I will move that day, I start with light weight. I don’t just pile up to my one-rep max; I start at 50-60% to test out my form. I like to do a set of 5. I focus on bracing, bending the bar, moving my hips and knees together, and pushing through my heels. I also like to try different variations of my stance. Sometimes, a wide-legged sumo feels stronger than a close-legged standard deadlift. Sometimes, doing some Romanian Deadlifts helps me focus on strengthening my lower back. Variations can help me strengthen what’s weak and also help me find my perfect form for the actual lifts.
Reinforcement Learning
- By trying different techniques in lifting weights, I am homing in on how to do it well.
- At first, I don’t know the best way to lift, so I just experiment. I might try lifting super fast or super slow, close-legged, or in a sumo stance until my body finds the right feeling.
You operate a cloud-based system with multiple interconnected services. When a disaster strikes—such as a server failure or network outage—the system must quickly decide which backup resources to activate, which services to prioritize, and how to allocate bandwidth and compute resources efficiently. The ultimate goal is to minimize downtime and maintain service availability while adhering to cost and resource constraints.
- RL learns to prioritize actions like activating backups, redirecting traffic, or scaling resources based on the system’s current state, ensuring real-time adaptability during disasters.
- By trial and error, RL discovers strategies that minimize downtime, maintain critical services, and reduce costs, effectively balancing competing priorities.
- RL agents improve over time by simulating various disaster scenarios and refining their responses to handle complex and evolving situations more efficiently.
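To make the trial-and-error idea concrete, here is a minimal sketch using tabular Q-learning, one common RL algorithm. Everything about the environment is an assumption made for illustration: the three states, the three actions, and the reward numbers (negative costs) are invented, and a real system would learn against a much richer simulator.

```python
import random

ACTIONS = ["wait", "reroute_traffic", "activate_backup"]

# Hypothetical simulator: (state, action) -> (next state, reward).
# Rewards are negative costs: downtime hurts most, backups cost money.
TRANSITIONS = {
    ("outage", "wait"): ("outage", -10.0),
    ("outage", "reroute_traffic"): ("degraded", -2.0),
    ("outage", "activate_backup"): ("healthy", -8.0),
    ("degraded", "wait"): ("degraded", -3.0),
    ("degraded", "reroute_traffic"): ("degraded", -3.0),
    ("degraded", "activate_backup"): ("healthy", -4.0),
}

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {key: 0.0 for key in TRANSITIONS}   # learned value of each (state, action)
    for _ in range(episodes):
        state = "outage"                     # every simulated disaster starts here
        for _ in range(20):                  # cap the length of an episode
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)     # explore: try something random
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward = TRANSITIONS[(state, action)]
            if next_state == "healthy":
                future = 0.0                 # recovered, nothing more to pay
            else:
                future = max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: nudge the estimate toward what was observed
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
            if state == "healthy":
                break
    return q

def greedy_policy(q):
    """Best known action for each non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in ("outage", "degraded")}

print(greedy_policy(train()))
```

In this toy setup the agent learns that rerouting traffic first and then bringing up the backup is cheaper than activating the backup cold, which is the kind of sequencing decision the bullets above describe.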
Increase Weight
As I get closer to my one-rep max, I start adding the belt and chalk and resting more between sets (usually 5 minutes once I start feeling it). Here, the belt helps me focus more on bracing and breathing. The chalk helps me grip the bar better as it gets heavier. These little tweaks make me more efficient and stronger.
Hyperparameter Tuning
- Testing incremental increases to find the limit without failure.
- Small, incremental changes like these are how you reach precision and peak performance.
Let’s say your system automatically decides how to back up data based on current load. If backups happen too frequently, they slow other processes, but if they happen too infrequently, you risk losing critical data in a disaster. Hyperparameter tuning would help find the balance by adjusting the “backup frequency” parameter.
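As a rough sketch of that balancing act, the snippet below runs a plain grid search, one of the simplest tuning strategies, over candidate backup intervals. The cost model and every number in it (`cost_per_backup`, `failures_per_day`, `loss_per_hour`) are assumptions chosen for illustration. The model also assumes that a failure, on average, loses half an interval of data.

```python
def daily_cost(interval_hours, cost_per_backup=2.0,
               failures_per_day=0.01, loss_per_hour=600.0):
    """Hypothetical daily cost of backing up every `interval_hours` hours."""
    # More frequent backups mean more overhead on other processes
    backup_overhead = cost_per_backup * (24.0 / interval_hours)
    # Less frequent backups mean more data at risk when a failure hits
    expected_loss = failures_per_day * loss_per_hour * (interval_hours / 2.0)
    return backup_overhead + expected_loss

def grid_search(candidates):
    """Try every candidate interval and keep the cheapest one."""
    return min(candidates, key=daily_cost)

candidates = [0.5, 1, 2, 4, 8, 12, 24]   # backup intervals in hours
best = grid_search(candidates)
print(best, daily_cost(best))
```

Swap in your own measurements for the made-up costs and the same loop will show where frequent backups stop paying for themselves.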
Monitor and Adjust
I can’t get to a personal record (PR) unless I know where I am and what affects my strength. Monitoring all variables helps me determine what my next course of action is and what my controls need to be. So, if I try going for my PR and I have only had 3 hours of sleep and 1500 calories—sometimes you’re the windshield, sometimes you’re the bug—and I fail, I will likely look at those variables and adjust for the next time I try my PR.
Bayesian Networks
- Each node in a Bayesian Network represents a factor (Food, Sleep, Mood).
- Each arrow represents the relationship between two factors.
- Probabilities can then be assessed (if I slept poorly, my energy is lower).
- Predictions use the probabilities to calculate how I will likely do (I slept well, ate well, and rested since my last PR attempt, so I will likely get five extra pounds on this lift).
- Learning and updating: over time, I can see that sleep is a larger factor for me than food in terms of strength, and I can update the probabilities in my network to plan for maximal performance.
If you have a multi-cloud environment and must decide whether to reroute traffic during an outage, a Bayesian Network could model how factors like server response times, network latency, and user traffic load are interconnected, and estimate how likely a real outage is given what you are observing. Realistically, this is like monitoring and adjusting, which you must do for all systems. But it’s faster.
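Here is a small sketch of what that could look like, written in plain Python rather than a dedicated library. The network structure and every probability are invented for illustration: an outage and heavy traffic load both influence slow responses, and an outage alone influences high latency. The function answers the question "how likely is an outage, given what I observe?" by adding up all the scenarios consistent with the evidence.

```python
from itertools import product

# Hypothetical probabilities; real values would come from your monitoring data.
P_OUTAGE = 0.05                 # prior chance of a real outage
P_HEAVY_LOAD = 0.30             # prior chance of heavy user traffic
P_SLOW = {                      # P(slow responses | outage, heavy_load)
    (True, True): 0.95, (True, False): 0.90,
    (False, True): 0.50, (False, False): 0.05,
}
P_LATENCY = {True: 0.80, False: 0.10}   # P(high latency | outage)

def bernoulli(p_true, value):
    """Probability that a yes/no factor takes `value`."""
    return p_true if value else 1.0 - p_true

def joint(outage, heavy_load, slow, latency):
    """Probability of one complete scenario, following the arrows."""
    return (bernoulli(P_OUTAGE, outage)
            * bernoulli(P_HEAVY_LOAD, heavy_load)
            * bernoulli(P_SLOW[(outage, heavy_load)], slow)
            * bernoulli(P_LATENCY[outage], latency))

def posterior_outage(**evidence):
    """P(outage | evidence), summing over every scenario that fits."""
    names = ("outage", "heavy_load", "slow", "latency")
    totals = {True: 0.0, False: 0.0}
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        if all(world[key] == seen for key, seen in evidence.items()):
            totals[world["outage"]] += joint(**world)
    return totals[True] / (totals[True] + totals[False])

print(posterior_outage(slow=True))
print(posterior_outage(slow=True, latency=True))
```

Slow responses alone could just mean heavy traffic, so they raise the outage estimate only somewhat. Adding high latency as evidence pushes it much higher, and that is the point at which rerouting starts to look justified.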
Accessory Lifts and More Gym Fun
I can get bored of deadlifting on deadlift day. But I can do other accessory lifts that also enhance my deadlifts. Leg curls and good mornings are two of my favorites.
There are many implementations of ML and AI, and they are fun to do. Figuring out how little or how much training (AI implementation) you need for your environment is part of the fun. But don’t get stuck using them just to write your DRP. Have fun with the tools and leverage them to enhance your program.
To whet your appetite, here are 13 AI-driven approaches that can enhance your disaster recovery strategy.
Hadas Cassorla, JD, MBA, CISSP has a lot of letters after her name, but the three letters she cares the most about are Y-E-S. Marrying her improv and legal background into technology and business, she helps organizations build strong, actionable and implementable security programs by getting buy-in from investors, the boardroom and employees. She has founded her own business, Scale Security Group, and has built corporate security offices from the ground up.