The Ultimate Guide to Understanding Model Drift in Machine Learning

Mandatory concepts every data scientist should know before it’s too late.

I always believed model development was the hardest part, and data cleaning the most time-consuming. Isn’t that what we were all taught? Recently I learned something else is even more important.

The performance of the model was great. We checked it multiple times; it seemed foolproof. So when we deployed our prediction system and automated the pipeline, we thought we were done.

Oh, how wrong we were.

We had no idea what was waiting for us. The results were good for a few months. Then it happened. Nothing had changed on our end, but the predictions no longer made any sense. We had vaguely heard of model drift and how it can change results, but we had no idea what to do next.

This article will introduce model drift, talk about its different types, and discuss how to address the issue after deploying your machine learning project. We will go deep into the concepts and techniques, so that you understand everything intuitively.

In short, this is what I wish I had known before we shipped any real-world machine learning project.

What is Model Drift?

Let me give you an intuitive explanation of the concept:

When we say we trained and built a machine learning model, what do we really do? We learn a mapping between a set of input features and the output. We then test the mapping on separate data we kept aside at the start. (The data used to learn the mapping is called the training set; the data used to test it, the test set.)

y = f(x), where y is the target and x is the input
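As a minimal sketch of this workflow (the synthetic dataset, random-forest model, and 80/20 split here are illustrative assumptions, not the article's actual project):

```python
# Learn the mapping y = f(x) on a training set and evaluate it on a
# held-out test set. The data and model choice are illustrative.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, noise=0.1, random_state=42)

# Keep a separate test set aside to evaluate the learned mapping.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(random_state=42).fit(X_train, y_train)
print(f"Test R^2: {model.score(X_test, y_test):.3f}")
```

The key assumption baked into this evaluation is exactly the one the next section questions: that future data will look like `X_test`.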

We iterate and optimize until we get satisfactory results.

When we finally deploy the model, we assume that the distributions of the features and targets will remain fairly constant in the future. This assumption is reasonable, but it doesn’t always hold. Why?

“The only constant in life is change.”- Heraclitus

I’m not getting philosophical here, but this is true for all real-world machine learning models we build. This concept is called Model Drift. We can’t build, deploy and forget about the models. We need to ensure they stay relevant for the problem we initially set out to solve.

I will outline specific techniques on how we can do this, but first, it’s critical to understand what changes should be expected in the models.

3 Types of Model Drifts Every Data Scientist Should Understand

Whether you have already deployed a model or haven’t started building one yet, you must understand these 3 types of changes. That way, you’ll know what to expect.

1. Concept Drift

Concept drift occurs when the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways.

y' = f(x), where y' != y

A great example of this scenario is spam detection. Mail providers must constantly update their models because spammers change their strategies to evade detection. Similarly, user and customer preferences change over time, and recommendation algorithms need to adapt to this change.

2. Data Drift

Data drift is one of the top reasons model accuracy degrades over time. In this scenario, the statistical properties of the input variables change over time. This again affects the mapping the model established.

y = f(x'), where x' != x

Trends and seasonalities change over the years. Unexpected scenarios, such as a pandemic, heavily change multiple dimensions of human behaviour. These signals are often fed into the model as input features and impact the model’s predictions. Over time, the data we used to train the model becomes outdated.

3. Upstream Data Changes

This is a concept most data scientists don’t account for. Data is never readily available for the models; the models we build depend on the data engineering pipelines that generate the input and target variables.

Due to various business and technical decisions, changes happen upstream that affect the features or the target variables of the models. Common examples are changes in encodings and newly missing values in the data. These changes are frequent yet often go unnoticed, and they affect model predictions.

Now that we know the kinds of model drift that can happen, let’s think about how to attack them and keep our models relevant over time.

How to Attack Model Drifts After Deployment

I have been researching how to do this for the past few months. I’ve tried it all: tweaking, refining and stabilizing the model and its output. After attending countless workshops from industry experts, I’ve summarized the best strategies.

1. Monitoring the Key Performance Metrics

If you don’t track, you’ll never know.

We defined some important metrics during our development cycle and ensured the model performed well on them during development. But does it continue to perform well?

I picked MAPE (mean absolute percentage error) and kept tracking it for weeks after deployment. A change above a certain threshold is a red flag. This is the single most important metric to monitor, but it isn’t sufficient on its own.

Including assertion statements on data types and unit tests throughout the pipeline can help identify upstream data changes. Even if the pipeline runs smoothly, these checks can signal that something is wrong before it’s too late.
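A hedged sketch of such checks, catching missing columns, dtype changes and new nulls before a batch reaches the model (the `EXPECTED_SCHEMA` columns and dtypes here are hypothetical examples, not a real pipeline's schema):

```python
# Lightweight schema checks that flag upstream data changes early.
# The column names and dtypes are illustrative assumptions.
import pandas as pd

EXPECTED_SCHEMA = {"age": "int64", "income": "float64", "segment": "object"}

def validate_batch(df: pd.DataFrame) -> None:
    # Assert the columns the model was trained on are still present.
    missing = set(EXPECTED_SCHEMA) - set(df.columns)
    assert not missing, f"Upstream change: missing columns {missing}"
    # Assert the dtypes match what the model expects (e.g., a changed encoding).
    for col, dtype in EXPECTED_SCHEMA.items():
        assert str(df[col].dtype) == dtype, (
            f"{col}: expected {dtype}, got {df[col].dtype}"
        )
    # Flag newly missing values.
    nulls = df[list(EXPECTED_SCHEMA)].isna().sum()
    assert nulls.sum() == 0, f"Unexpected nulls:\n{nulls[nulls > 0]}"

batch = pd.DataFrame({
    "age": [34, 29],
    "income": [52_000.0, 61_500.0],
    "segment": ["a", "b"],
})
validate_batch(batch)  # passes silently; raises AssertionError on a drifted schema
```

Running this on every incoming batch turns a silent upstream change into a loud, traceable failure.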

We also need to monitor the statistical distributions of the target variable and the input features. Ideally, we should have a quantitative measure that compares the training set’s distribution (on which we established the mapping) to the current incoming real-world data.

The Population Stability Index (PSI) is a metric that measures how much a variable’s distribution has shifted between two samples or over time. Here’s a detailed guide on PSI, how to calculate it, and the thresholds that can be used during monitoring.
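As a minimal sketch of how PSI can be computed for a numeric feature, bucketing by the training set's quantiles (the commonly cited rules of thumb are roughly 0.1 for "watch" and 0.25 for "act", though thresholds vary by team):

```python
# Population Stability Index between a reference (training) sample and
# an incoming sample, bucketed by the reference's quantiles.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    # Bin edges come from the reference (training) distribution.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    # Clip incoming data into the training range so extreme values
    # land in the outer bins instead of falling outside all bins.
    actual = np.clip(actual, edges[0], edges[-1])
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Small floor avoids log(0) / division by zero in empty bins.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train = rng.normal(0, 1, 10_000)
psi_same = psi(train, rng.normal(0, 1, 10_000))     # same distribution
psi_shifted = psi(train, rng.normal(0.5, 1, 10_000))  # mean shifted by 0.5
print(f"same: {psi_same:.4f}, shifted: {psi_shifted:.4f}")
```

A PSI near zero means the incoming distribution still looks like training data; the shifted sample produces a clearly larger value.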

It’s also worth setting up a dashboard that lets you inspect all the metrics discussed above at a glance. Give access to everyone on the team, and set up automatic alerts for certain thresholds. In a nutshell, you should know something is wrong before someone else figures it out from the results.

Since this is an important part, I want to re-iterate this:

  • Pick and monitor the most important performance metric for your problem and model, e.g., MAPE.
  • Monitor the unit tests and assertion statements, e.g., data type checks.
  • Monitor the statistical distributions of the target and input features using a quantitative metric, e.g., PSI.
  • Set up a dashboard with all the above metrics, and set up alerts.
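The first item above can be sketched as a simple threshold check on MAPE; the alert here is just a print (in practice it might post to Slack or a pager), and the baseline and tolerance values are hypothetical:

```python
# Threshold-based alert on a tracked metric (MAPE, as in the article).
# Baseline and tolerance are illustrative, not universal values.
import numpy as np

def mape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

def check_and_alert(y_true, y_pred, baseline_mape: float,
                    tolerance: float = 5.0) -> bool:
    """Return True (and alert) if MAPE degraded beyond `tolerance` points."""
    current = mape(np.asarray(y_true, float), np.asarray(y_pred, float))
    degraded = current > baseline_mape + tolerance
    if degraded:
        print(f"ALERT: MAPE {current:.1f}% exceeds baseline "
              f"{baseline_mape:.1f}% + {tolerance:.1f} points")
    return degraded

# Suppose development measured an 8% baseline MAPE.
check_and_alert([100, 200], [105, 190], baseline_mape=8.0)  # fine, no alert
check_and_alert([100, 200], [150, 120], baseline_mape=8.0)  # fires the alert
```

The same pattern generalizes to the other bullets: any metric with a baseline and a tolerance can feed the dashboard and alerting.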

You’ve done the monitoring part perfectly. You get an alert; what do you do next?

2. Intervening When Required

Are you ready to do it all over again?

If there are any upstream data changes, track them down. Inform the data engineers of the impact and see if the changes can be reversed. If you’ve set up tests and data type assertions, it will be much easier to update the code to reflect these data changes.

Sometimes the worst happens: the distributions have changed, and the model isn’t relevant anymore. In other words, your machine learning model has become stale.

There’s only one fix: model retraining. Be ready to do it all over again.

How to Refresh Your Machine Learning Model

Follow these steps once you’ve identified the need to retrain.

Communicate the need for model retraining.

First things first: stop using the stale model to make real-time decisions. Retraining (refreshing, recalibrating) the model is a significant commitment in time and effort. Create a plan-of-action document and communicate it to all the stakeholders in detail.

There’s nothing worse than someone still using the stale model to make decisions because you hadn’t informed them. Being as transparent as possible helps; they will come to understand the nature of machine learning models.

Once they have approved, you may go ahead and start the refreshing process.

Use the most relevant/recent data.

We have already identified data drifts and concept drifts during the monitoring process. It’s best to use the most recent and relevant data. If some unusual events occurred during the period (e.g., pandemic panic buying, a once-in-four-years sporting event), it’s often best to leave that data out and train on the normal data. The idea is to keep the data general yet recent, to avoid having to retrain again in a short time.
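A small pandas sketch of this selection: keep a recent window but exclude a known anomalous period. The dates and column names are hypothetical examples, not from the article's project:

```python
# Select recent training data while excluding a known anomalous window
# (e.g., a panic-buying period). Dates and columns are illustrative.
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2019-01-01", periods=730, freq="D"),
    "sales": range(730),
})

# Keep the most recent window only.
recent = df[df["date"] >= "2019-07-01"]

# Mark and drop the unusual event (here: March-April 2020).
anomaly = (recent["date"] >= "2020-03-01") & (recent["date"] <= "2020-04-30")
train_df = recent[~anomaly]
```

The boundaries of both windows are judgment calls; the point is that the exclusion is explicit and reproducible rather than ad hoc.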

Attempt hyper-parameter tuning (quick-win).

The model’s poor performance can often be fixed by tuning its hyper-parameters to accommodate the most relevant data. Experimentation is the key here.

I like to keep a log of all my experiments using MLflow, so that I can refer back to the experiments I’ve run before, along with their results. Experimentation can be time-consuming, but it’s a quick win to get the model up and running as soon as possible.
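One way to sketch this tuning step is scikit-learn's GridSearchCV; each trial's parameters and score could additionally be logged to MLflow as described above. The parameter grid, model and synthetic data here are illustrative assumptions:

```python
# Hyper-parameter search over a small grid, scored by negative MAPE.
# The grid, model choice and data are illustrative.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=300, n_features=5, noise=0.1, random_state=0)

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=3,
    scoring="neg_mean_absolute_percentage_error",
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

In practice `X, y` would be the refreshed training data from the previous step, and each `search.cv_results_` entry is a natural candidate for an MLflow run.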

Stick to the same model.

If hyper-parameter tuning isn’t sufficient, you’ll have to do more feature engineering to improve performance. Ask yourself what has changed: Is there newly available data you can take advantage of? Are there features you originally included that no longer seem relevant?

The key is to stick to the same model (e.g., XGBoost, Random Forest) you’ve already built, but change the features you use to train it. I was lucky enough to regain the lost performance at this stage, so I recommend spending time here.

If nothing above works, you might have to go back and try different models from scratch. I hope you don’t have to do this, but that option is also on the table.

Final Thoughts

I’m going to be honest with you: there is no one-size-fits-all strategy to combat model drift. The idea is to understand the possibilities and the approaches we can use to handle change over time.

We looked into ways machine learning models can change and why it’s inevitable. We analysed how we can monitor this change and detailed a process to refresh the model when it becomes stale.

You shouldn’t have to pause everything (like me) and pull your hair out figuring out what went wrong. You don’t have to attend countless workshops (like me) to get a grasp of the situation. Understand these concepts now, and your future self will thank you.

I hope this article gave you a much-needed overview of the concepts and keeps you prepared. Please feel free to reach out on LinkedIn if I can be of any help. This is the article I wish I had read before I deployed my most recent machine learning model. I’m so glad you have it!

For more helpful insights on breaking into data science, exciting collaborations and mentorships, consider joining my private list of email friends.