Analysis splits

We have already seen that an ML job can be split on any categorical field, so that behavior is modeled separately for each instance of that field. This is extremely valuable when each instance behaves differently enough to need its own model.

Take, for example, the case where we have data for different regions of the world.

Whatever this data is (sales KPIs, utilization metrics, and so on), suppose it shows distinctive patterns that are unique to each region. In this case, it makes sense to split any analysis we do with ML by region to capitalize on this uniqueness. We would then be able to detect anomalies in the behavior that is specific to each region.
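To make the benefit of a split concrete, here is a minimal sketch in plain Python. It is not how ML builds its models; it uses a crude z-score as a stand-in, and the region names and readings are invented for illustration. It only shows that a value can look unremarkable against all regions pooled together yet stand out sharply against its own region's history.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (region, value) readings; each region has its own normal level.
readings = [
    ("emea", 100.0), ("emea", 102.0), ("emea", 98.0), ("emea", 101.0),
    ("apac", 10.0), ("apac", 11.0), ("apac", 9.0), ("apac", 10.0),
]


def zscore(value, history):
    """Distance of value from the mean of history, in standard deviations."""
    spread = pstdev(history)
    return 0.0 if spread == 0 else (value - mean(history)) / spread


# Split the history by region, as a split analysis would.
by_region = defaultdict(list)
for region, value in readings:
    by_region[region].append(value)

# A new reading arrives for one region.
new_region, new_value = "apac", 30.0

# Without a split, 30 falls between the two regional levels.
unsplit_score = zscore(new_value, [v for _, v in readings])

# With a split, 30 is far above anything this region has produced.
split_score = zscore(new_value, by_region[new_region])

print(f"unsplit: {unsplit_score:.2f}, split by region: {split_score:.2f}")
```

The pooled score stays well under one standard deviation, while the per-region score is many standard deviations out, which is the kind of region-specific anomaly a split is meant to surface.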

Let's also imagine that, within each region, a fleet of servers supports the application and transaction processing, but that the servers are load balanced and contribute equally to performance and operation. In that case, there's nothing unique about each server's contribution within a region, so it probably doesn't make sense to split the analysis per server.

We've naturally come to the conclusion that splitting by region is more effective than splitting by server. But what if a particular server within a region is having problems and is contributing to the anomalies that are being detected? Wouldn't we want to have this information available immediately, instead of having to diagnose further by hand? ML makes this possible through a feature called influencers.
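As a rough sketch of how this decision might be written down, the snippet below builds a job configuration as a Python dictionary, assuming the job follows the Elastic anomaly detection job format. The split is expressed with `partition_field_name`, and the server is listed only under `influencers`. The field names `region`, `server`, `kpi_value`, and `@timestamp`, the `mean` function, and the 15-minute bucket span are all assumptions chosen for illustration, not values taken from the example data.

```python
import json

# Hypothetical job configuration: split by region, do not split by server.
job_config = {
    "description": "KPI per region, with server as an influencer",
    "analysis_config": {
        "bucket_span": "15m",  # assumed bucket length
        "detectors": [
            {
                "function": "mean",
                "field_name": "kpi_value",         # assumed metric field
                "partition_field_name": "region",  # one model per region
            }
        ],
        # Servers are not modeled separately, but can still be named
        # as contributors when an anomaly is found in a region.
        "influencers": ["region", "server"],
    },
    "data_description": {"time_field": "@timestamp"},
}

print(json.dumps(job_config, indent=2))
```

Listing the split field among the influencers as well is a common choice, since it lets results be attributed to a region and to a server in the same way.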
