Chapter 7
Sensitivity Analysis of Experiment Conclusions

Experiment conclusions may be convincing, but this does not make them immune to instability, especially if they stem from a small or somewhat unrepresentative dataset. To tackle this serious matter, we employ a method called sensitivity analysis, a practice common to all scientific experiments but not yet common in data science. In this chapter, we will look at why it is important in our field, how it relates to the well-known butterfly effect, the most common ways of performing sensitivity analysis using resampling, and how “what if” questions fit into all of this.

The Importance of Sensitivity Analysis

No matter how well we design and conduct our experiments, there is always some bias present, be it in the data we use, in the implicit assumptions we make, or in how we analyze the results and draw conclusions. Unfortunately, this bias is the kind of mistake that often goes unnoticed, and unless we are aware of the issue and take action, our conclusions may not hold true in the future.

Imagine that you created a model based on the answers to certain questions about the data at hand, only to find out that these answers were not as reliable as they seemed. You probably would not want to put such a model into production!

These issues and uncertainties that may threaten the trustworthiness of your work can largely be addressed with just a bit of extra validation work in the form of sensitivity analysis. As a bonus, this process is likely to provide additional understanding of the experiments analyzed and help you gain deeper insight into the dynamics of your data. The fact that many people don’t use sensitivity analysis techniques because they don’t know much about them should not be a reason for you to avoid them too.

There are two broad categories of techniques you can use for sensitivity analysis, depending on the scope of the analysis: global and local. We will look into each one of these later in this chapter. Before we do so, let’s take a look at a very interesting phenomenon called the butterfly effect, which captures the essence of the unstable dynamics that make sensitivity analysis necessary in the first place.

The Butterfly Effect

This is probably one of the most easily recognizable terms in modern science, with various versions of it present in pop culture (particularly films). Unfortunately, its interpretations in non-scientific media may lead you to believe that it is something very niche, akin to the sophisticated field of science from which it came: Chaos Theory. However, even though it became known to scientists through the study of complex systems (e.g. the weather, the stock market, and anything whose behavior is too chaotic to predict accurately), the butterfly effect is present in many places, including data science experiments.

In essence, the butterfly effect is a noticeable change in a result stemming from a minute change in the initial conditions of a simulation. The term came about in the early days of computer-based weather prediction, when a researcher (the meteorologist Edward Lorenz) noticed that seemingly identical inputs to a model yielded entirely different results. After some investigation, it turned out that the initial inputs were slightly different (some rounding had taken place), and this small difference was gradually magnified in the time-series model until it became evident in the later forecasts.

The butterfly effect usually makes its appearance in data science experiments when a change in the sample alters the conclusions of the tests performed on it. It may also appear in models that rely on a particular set of parameters. Whatever the case, it is never a good sign, especially if you plan to act on the conclusions of your experiments when developing your data analytics models.

Global Sensitivity Analysis Using Resampling Methods

One popular and effective way of tackling the inherent uncertainty of experimental results in a holistic manner is resampling methods (there are other methods for global sensitivity analysis, but they are beyond the scope of this book, as they are not as popular). Resampling methods are a set of techniques designed to derive more robust conclusions from a dataset by trying out different samples of it (the dataset itself being a sample of the population). Although this methodology has traditionally been statistics-based, since the development of efficient simulation processes it has come to include other approaches as well. Apart from purely statistical methods for resampling, such as bootstrapping, permutation methods, and the jackknife, there is also Monte Carlo (a very popular method in all kinds of approximation problems). Let’s look at each category in more detail.

Bootstrapping

Also known as the bootstrap method, this resampling technique attempts to do something seemingly impossible: it tries to gain more information about the population from which a sample was drawn, using only that same sample, much like trying to elevate yourself by pulling on the straps of the boots you are wearing (hence the method’s peculiar name).

Although additional samples independent of the original one would clearly provide more information about the population, a single sample can also accomplish this. Bootstrapping manages it by randomly selecting sub-samples of the original sample with replacement. Given a large enough number of sub-samples (at least 10,000), it is possible to generate a reliable confidence interval for the metric we wish to measure (e.g. the p-value of a t-test). This allows us to see whether the value of the metric in the original sample is stable or not, something no statistical test could tell us beforehand. Naturally, the wider the confidence interval, the more unstable the particular metric is.
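To make this concrete, here is a minimal sketch of the idea in Python, assuming NumPy is available; the sample is synthetic and the metric (the sample mean) is chosen purely for illustration:

    import numpy as np

    rng = np.random.default_rng(42)
    sample = rng.normal(loc=5.0, scale=2.0, size=50)  # hypothetical original sample

    n_boot = 10_000                  # number of sub-samples, as suggested above
    boot_means = np.empty(n_boot)
    for i in range(n_boot):
        # draw a sub-sample of the same size, with replacement
        resample = rng.choice(sample, size=sample.size, replace=True)
        boot_means[i] = resample.mean()

    # 95% percentile confidence interval for the mean
    low, high = np.percentile(boot_means, [2.5, 97.5])
    print(f"Bootstrap 95% CI for the mean: [{low:.2f}, {high:.2f}]")

A wide interval here would signal that the metric is unstable, exactly as described above.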

Permutation Methods

Permutation methods are a clever way to perform resampling while making sure that the resulting samples differ from one another, much like the bootstrapping method. The main differences are that in permutation methods the sampling is done without replacement, and that the process aims to test against a hypothesis of “no effect” rather than to find confidence intervals for a metric. The number of possible permutations is limited, since no two sub-samples should be the same. However, as the number of data points in each sub-sample is fairly small compared to the number of data points in the original sample, the number of possible permutations is very high. As for the number of sub-samples needed, around 10,000 is plenty for all practical purposes.

Permutation methods are also known as randomization tests, and they are well established as a resampling approach. It is also important to note that this set of methods is assumption-free: regardless of the distribution of the metrics we calculate (e.g. the p-value of a t-test), the results of this meta-analysis remain reliable.
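As a rough sketch of how a permutation test of a “no effect” hypothesis might look in Python (the two groups and their sizes are hypothetical, and NumPy is assumed):

    import numpy as np

    rng = np.random.default_rng(0)
    group_a = rng.normal(5.0, 1.0, size=40)   # hypothetical treatment group
    group_b = rng.normal(4.6, 1.0, size=40)   # hypothetical control group

    observed = group_a.mean() - group_b.mean()
    pooled = np.concatenate([group_a, group_b])

    n_perm = 10_000
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)    # reassign group labels without replacement
        diff = shuffled[:group_a.size].mean() - shuffled[group_a.size:].mean()
        if abs(diff) >= abs(observed):        # two-sided test against "no effect"
            count += 1

    p_value = (count + 1) / (n_perm + 1)      # add-one correction avoids a p-value of 0
    print(f"Permutation p-value: {p_value:.4f}")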

Jackknife

Jackknife is a lesser-known method that offers an alternative way to perform resampling, different from the previous ones in that it focuses on estimating the bias and the standard error of a particular metric (e.g. the median of a variable in the original sample). It systematically recalculates the required metric, leaving a single data point out of the original sample in each calculation, so the number of calculations equals the number of data points.

Although this method may seem time-consuming for a fairly large original sample, it is very robust and allows you to gain a thorough understanding of your data and of the sensitivity of the metric you are interested in. If you delve deeper into this approach, you can pinpoint the specific data points that influence that metric. For analyzing data stemming from an experiment it is highly suitable, since such datasets rarely contain an exceedingly large number of data points.
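A minimal sketch of the jackknife in Python, assuming NumPy and using the median of a synthetic sample as the metric of interest:

    import numpy as np

    rng = np.random.default_rng(7)
    sample = rng.exponential(scale=3.0, size=30)  # hypothetical skewed sample
    n = sample.size

    theta_hat = np.median(sample)                 # metric on the full sample

    # recompute the metric n times, leaving one data point out each time
    loo = np.array([np.median(np.delete(sample, i)) for i in range(n)])

    theta_bar = loo.mean()
    bias = (n - 1) * (theta_bar - theta_hat)      # jackknife bias estimate
    se = np.sqrt((n - 1) / n * np.sum((loo - theta_bar) ** 2))  # jackknife standard error

    print(f"median = {theta_hat:.3f}, bias = {bias:.3f}, SE = {se:.3f}")

The individual leave-one-out estimates also show which data points move the metric the most, which is how the influential points mentioned above can be pinpointed.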

Monte Carlo

This is a fairly popular method for all kinds of approximations, particularly the more complex ones. It is used widely for simulating the behavior of complicated processes and owes much of its popularity to its simplicity and ease of use.

When it comes to resampling, Monte Carlo is applied as the following process:

  1. Define a simulation of the population whose behavior you plan to investigate, using an unbiased randomizing method.
  2. Draw from it a pseudo-sample emulating a real-life sample of interest and apply to it the procedure under study (in our case, this can be a predictive model or a test for a question we are attempting to answer).
  3. Repeat step 2 for a total of N times.
  4. Compute the probability of interest from the aggregate of all the outcomes of the N trials from steps 2 and 3.

From this meta-testing, we obtain rigorous insight into the stability of the conclusions of our previous experiments, expressed as a probability, much like the p-values of the statistical tests we saw in the previous chapter.
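As an illustration of the four steps above, here is a sketch in Python, assuming NumPy and SciPy are available; the assumed populations and effect size are made up for the example, and the probability computed is how often a t-test at this sample size would reproduce a significant result:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    N = 10_000            # number of Monte Carlo trials (step 3)
    n_points = 30         # size of each pseudo-sample
    significant = 0

    for _ in range(N):
        # steps 1 and 2: simulate pseudo-samples from the assumed populations
        a = rng.normal(5.0, 2.0, size=n_points)   # hypothetical population A
        b = rng.normal(5.8, 2.0, size=n_points)   # hypothetical population B
        _, p = stats.ttest_ind(a, b)              # the test of interest
        if p < 0.05:
            significant += 1

    # step 4: aggregate the outcomes of all N trials
    print(f"Significant result reproduced in {significant / N:.1%} of trials")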

Local Sensitivity Analysis Employing “What If?” Questions

“What if?” questions are great, even if they do not always lend themselves directly to data-related topics. They have a definite place in sensitivity analysis, though, as they can prove valuable in testing the stability of a conclusion and how specific parameters relate to it. For example, such a question can be something like, “What if parameter X increases by 10%? How does this affect the model?” Note that these parameters often relate to specific features, so these questions are also meaningful to people not directly involved in the model (e.g. the stakeholders of the project).

The high-level comprehensibility of this approach is one of its key advantages. Also, as it involves the analysis of different scenarios, it is often referred to as scenario analysis and is common even in situations unrelated to data science. Finally, such questions allow you to delve deeper into the data and the models based on it, oftentimes yielding additional insights to complement those stemming from your other analyses.
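A minimal sketch of such a scenario in Python, assuming scikit-learn and a synthetic dataset; the model, the features, and the 10% increase are all placeholders:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(3)
    X = rng.uniform(0, 10, size=(200, 2))                    # two hypothetical features
    y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 200)  # synthetic target

    model = LinearRegression().fit(X, y)
    baseline = model.predict(X).mean()

    # "What if feature 0 increases by 10%?"
    X_scenario = X.copy()
    X_scenario[:, 0] *= 1.10
    scenario = model.predict(X_scenario).mean()

    print(f"Average prediction shifts by {(scenario - baseline) / baseline:.1%}")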

Some Useful Considerations on Sensitivity Analysis

Although sensitivity analysis is more of a process that helps us decide how stable a model is or how robust the answers to our questions are, it is also something that can be measured. In practice, however, we rarely put a numeric value on it, mainly because there are more urgent matters that demand our time. Besides, in the case of a model, whatever model we decide on will likely be updated or even replaced altogether later on, so even if it is not the most stable model in the world, that’s usually acceptable.

However, if you are new to this and find that you have the time to delve deeper into sensitivity analysis when evaluating the robustness of a model, you can calculate the sensitivity metric, an interesting heuristic that reflects how sensitive a model is to a particular parameter (we will look into heuristics more in Chapter 11). You can accomplish this as follows:

  1. Calculate the relative change of a parameter X.
  2. Record the corresponding change in the model’s performance.
  3. Calculate the relative change of the model’s performance.
  4. Divide the outcome of step 3 by the outcome of step 1.

The resulting number is the sensitivity corresponding to parameter X. The higher it is, the more sensitive the model’s performance is to (i.e. dependent on) that parameter.
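In code, the four steps above boil down to a single ratio; the numbers below are hypothetical:

    def sensitivity(param_before, param_after, perf_before, perf_after):
        """Ratio of the relative change in the model's performance to the
        relative change in parameter X (steps 1 through 4 above)."""
        rel_param = (param_after - param_before) / param_before  # step 1
        rel_perf = (perf_after - perf_before) / perf_before      # steps 2 and 3
        return rel_perf / rel_param                              # step 4

    # e.g. parameter X grows by 10% and accuracy drops from 0.90 to 0.84
    print(sensitivity(1.0, 1.1, 0.90, 0.84))  # about -0.67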

In the previous chapter, we briefly examined K-fold cross validation. This method is actually a very robust way to tackle instability issues proactively, since it reduces the risk of a model performing well merely by chance, owing to the particular sample used to train it. However, the chance of a lucky streak in sampling is not eliminated completely, which is why, to ensure that you have a truly well-performing model, you need to repeat the K-fold cross validation a few times.
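With scikit-learn, repeating the K-fold procedure is straightforward via RepeatedKFold; the dataset and classifier here are placeholders:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import RepeatedKFold, cross_val_score

    X, y = make_classification(n_samples=300, n_features=10, random_state=0)

    # 5-fold cross validation, repeated 10 times with different shuffles
    cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

    # a wide spread across repetitions signals that performance is unstable
    print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")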

Playing around with sensitivity analysis is always useful, even if you are an experienced data scientist. The fact that many professionals in this field choose not to delve too much into it does not make it any less valuable. Especially at the beginning of your data science career, sensitivity analysis can help you better comprehend your models and understand how relative the answers to your questions are once they are tested with data-based experiments. Even if a test validates an answer to a given question on your data, keep in mind that the answer may not always hold true, especially if your sample is unrepresentative of the whole population of the variables you work with.

Summary

Sensitivity analysis is a rigorous way to check if the conclusions of an experiment are stable enough to be reliable.

The butterfly effect is a term denoting the potentially large effects of minor changes in the initial conditions of an experiment. This phenomenon is far from theoretical when it comes to data science problems, as it manifests in many experiments and data analytics models.

Resampling methods are a popular and effective way to perform sensitivity analysis for a particular test on the data at hand without needing additional data. Resampling methods include bootstrapping, permutation methods, the jackknife, and Monte Carlo.

“What if” questions are useful for sensitivity analysis, as they allow you to pinpoint how specific changes in the data influence the end result.
