This quote has been variously attributed to Yogi Berra, Niels Bohr, Mark Twain and others. Nevertheless, it contains a nugget of wisdom about predictive models, especially those dealing with social systems rather than physical ones: people change, the underlying environment changes, and the only thing that remains constant is the difficulty of predicting behaviour in a changing environment!
Let us hark back to one of our earlier posts about model building (Everything but the model). There, we took an example of predicting the propensity of each customer to transact in the next 7 days, and accomplished this by going back a week, characterizing what we knew of each customer back then, and then building a model to map those features to whether or not they transacted this past week. The underlying assumption here was: whatever relationship existed between what we knew about a customer at a point in time, and whether or not they transacted in the following week, continues to hold true. The fundamental things apply as time goes by, as the song goes.
But then, didn’t we also just say, no more than one paragraph ago, that things keep changing? In other words, doesn’t this relationship between input (features) and output (future transactions) keep changing? The technical term for this in machine learning literature is concept drift.
How, then, do we reconcile the two? Here are some pointers.
The nature of the drift matters
Concept drift can be gradual (due to normal market evolution), accelerated (due to a seismic change with immediate after-effects), abrupt (due to a structural change in the environment, like a big merger/acquisition or a price cap) or seasonal (due to, well, seasonal patterns). Empirically, we have observed that changes are rarely accelerated, and abrupt change is, almost by definition, rare.
So, what this means for us is this:
- Gradual change may be small enough that a model built a week ago remains largely relevant. But just to be sure, we add a back-testing step: we go back a week further, build an initial model, and then test it on last week’s data. This out-of-time validation step tells us whether the model is robust enough to withstand gradual change (see the sketch after this list).
- Seasonal change (due to festivals, end-of-season sales (EOSS) and the like) can be explicitly accounted for by learning from the equivalent period in the past. If we have two equivalent periods (say, the past two EOSS), we can even test whether the relationship between input and output from two EOSS ago held true in the last one.
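To make the back-testing idea in the first bullet concrete, here is a minimal sketch in Python. The file names and the label column `transacted_next_7d` are hypothetical; the point is simply that the model is fit on the older snapshot and evaluated on the more recent, unseen week:

```python
# A sketch of out-of-time validation. Assumes two pre-built snapshots
# (hypothetical file names): features as known two weeks ago and one week ago,
# each with a hypothetical binary label `transacted_next_7d` for the week that
# followed the snapshot.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

train = pd.read_csv("features_as_of_two_weeks_ago.csv")  # in-time training data
test = pd.read_csv("features_as_of_one_week_ago.csv")    # out-of-time test data

label = "transacted_next_7d"
features = [c for c in train.columns if c != label]

model = GradientBoostingClassifier().fit(train[features], train[label])

# If the input-output relationship is stable (little drift), the out-of-time
# AUC should be close to the in-time AUC.
in_time = roc_auc_score(train[label], model.predict_proba(train[features])[:, 1])
out_of_time = roc_auc_score(test[label], model.predict_proba(test[features])[:, 1])
print(f"in-time AUC: {in_time:.3f}  |  out-of-time AUC: {out_of_time:.3f}")
```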
The passage of time matters
In the case of gradual drift, there is an incremental effect: small changes over time add up to a big change. If you trained a 7-day model about 3 months ago and used the same model to score customers today, you run the risk of the model being outdated. So frequent retraining helps.
The other implication of the passage of time concerns the length of the response window. For instance, if you’re building a model that predicts response over the next 3 months, you naturally must go back at least 3 months to build the model. The longer the response window, the further back you must go, and the greater the chance of the model being outdated. So find inventive ways to reduce the response window.
For instance, if you’re building an EOSS model and the EOSS typically lasts a month, then the instinct is to build a 30-day model. However, you could also build a model for the first week of EOSS, then move everything forward by a week and refresh the model to predict response in the second week of EOSS, and so on.
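A rough sketch of that rolling weekly refresh is below. The dates, column names, the transactions file and the `build_snapshot` helper are all assumptions for illustration; the point is that both the feature snapshot and the 7-day response window slide forward a week at a time:

```python
# A sketch of the rolling weekly refresh. Dates, column names and the
# transactions file are assumptions for illustration only.
from datetime import date, timedelta

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

def build_snapshot(txns: pd.DataFrame, snapshot_date: date) -> pd.DataFrame:
    """One row per customer: simple features known on snapshot_date, plus a
    binary `responded_next_7d` label for the 7 days that follow it.
    Assumes columns customer_id, txn_date (datetime) and amount."""
    snap = pd.Timestamp(snapshot_date)
    history = txns[txns["txn_date"] < snap]
    future = txns[(txns["txn_date"] >= snap) &
                  (txns["txn_date"] < snap + pd.Timedelta(days=7))]
    feats = history.groupby("customer_id").agg(
        n_txns=("txn_date", "count"),
        total_spend=("amount", "sum"),
        last_txn=("txn_date", "max"),
    )
    feats["days_since_last_txn"] = (snap - feats["last_txn"]).dt.days
    feats = feats.drop(columns=["last_txn"])
    feats["responded_next_7d"] = feats.index.isin(future["customer_id"]).astype(int)
    return feats

txns = pd.read_csv("transactions.csv", parse_dates=["txn_date"])  # hypothetical file
eoss_start = date(2024, 7, 1)                                     # assumed EOSS start

weekly_models = {}
for week in range(4):  # four 7-day models instead of one 30-day model
    # Both the feature snapshot and the response window slide forward a week.
    # In practice, the model for a given week would only be refreshed once the
    # previous week's data has actually arrived.
    snapshot = build_snapshot(txns, eoss_start + timedelta(days=7 * week))
    X = snapshot.drop(columns=["responded_next_7d"])
    y = snapshot["responded_next_7d"]
    weekly_models[week] = GradientBoostingClassifier().fit(X, y)
```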
The evaluation metric matters
Remember, you are predicting a score for each customer, but you’re not deciding one customer at a time. More likely, you have a marketing campaign and want to talk to 30% of your customers rather than the entire 100%, and you want a model to tell you which 30% matters. At that level of granularity, you don’t need to be precise, just good enough that this model-based targeting strategy makes sense. For instance, in a gradually drifting environment, if your recall at the 3rd decile (% of actual responders captured in the top 30% of scores on a validation sample) is 75% on your historical data, then even if it drops in reality to 65%, you’re still assured a response rate at least twice as good as a random targeting strategy (or as talking to everyone): the top 30% of scores captures 65% of responders, where a random 30% would capture only about 30%.
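For reference, here is a minimal sketch of that recall-at-decile check, with made-up toy numbers; in practice the scores and actuals would come from your validation sample:

```python
# A sketch of recall at the 3rd decile: the share of actual responders captured
# in the top 30% of model scores. The toy scores/actuals below are made up.
import numpy as np

def recall_at_decile(scores: np.ndarray, actuals: np.ndarray, decile: int = 3) -> float:
    """Fraction of all actual responders found in the top `decile` * 10% of scores."""
    cutoff = int(len(scores) * decile / 10)
    top = np.argsort(-scores)[:cutoff]        # indices of the highest-scored customers
    return actuals[top].sum() / actuals.sum()

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05])
actuals = np.array([1,   1,   0,   1,   0,   0,   1,   0,   0,   0])
print(recall_at_decile(scores, actuals))      # 2 of 4 responders in the top 3 -> 0.5
```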
In conclusion…
There’s no way around the reality that things will keep changing. However, if you understand the nature of the change, and adopt the appropriate techniques to tackle it, you should be fine!