Calculating variance and standard deviation

In probability theory and statistics, standard deviation and variance indicate how far a set of numbers is spread out from its mean. Let's briefly examine each.

Measuring variance

Variance gives us a feel for the overall amount of spread of the values from the mean. It is defined as follows:
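
Written out with assumed symbols (x_i for each measurement, \bar{x} for the mean, and n for the number of measurements; this notation is chosen here for illustration), the sample variance can be sketched as:

```latex
s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2
```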

Essentially, this states that for each measurement, we calculate the difference between the value and the mean. This difference can be positive or negative, so we square it to ensure that negative differences add to the total rather than canceling out positive ones. The squared differences are then summed and divided by the number of measurements minus one, giving an approximation of the average squared difference.

In pandas, the variance is calculated using the .var() method. The following code calculates the variance of the price for both stocks:
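
As a minimal sketch, assuming a small DataFrame of made-up prices with one column per stock (the name `prices` and every value in it are illustrative, not real market data), the calculation might look like this. The hand computation is included only to show that `.var()` divides by n - 1 by default.

```python
import pandas as pd

# Hypothetical closing prices; illustrative values only, not real market data.
prices = pd.DataFrame({
    "MSFT": [10.0, 12.0, 14.0, 16.0, 18.0],
    "AAPL": [20.0, 22.0, 21.0, 25.0, 27.0],
})

# .var() returns the sample variance of each column (ddof=1 by default).
variances = prices.var()
print(variances)  # MSFT: 10.0, AAPL: 8.5

# The same figure for one column, computed directly from the definition.
msft = prices["MSFT"]
manual = ((msft - msft.mean()) ** 2).sum() / (len(msft) - 1)
print(manual)  # 10.0
```

Passing `ddof=0` to `.var()` would divide by n instead, which gives the population variance.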

Finding the standard deviation

Standard deviation is a similar measurement to variance. It is determined by calculating the square root of the variance and is defined as follows:
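
Using assumed symbols (x_i for each measurement, \bar{x} for the mean, n for the number of measurements, and s^2 for the sample variance; this notation is chosen here for illustration), the relationship can be sketched as:

```latex
s = \sqrt{s^2} = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}
```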

Remember that the variance squares the difference between each measurement and the mean. Because of this, the variance is not in the same units as the actual values. By taking the square root of the variance, the standard deviation is in the same units as the values in the original dataset.

The standard deviation is calculated using the .std() method, as demonstrated here:
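
A minimal sketch, again assuming a hypothetical `prices` DataFrame with made-up values, shows the call and checks that the result matches the square root of the variance:

```python
import pandas as pd

# Hypothetical closing prices; illustrative values only, not real market data.
prices = pd.DataFrame({
    "MSFT": [10.0, 12.0, 14.0, 16.0, 18.0],
    "AAPL": [20.0, 22.0, 21.0, 25.0, 27.0],
})

# .std() returns the sample standard deviation of each column (ddof=1 by default).
std_devs = prices.std()
print(std_devs)  # MSFT: about 3.162, AAPL: about 2.915

# Taking the square root of the variance gives the same values.
from_variance = prices.var() ** 0.5
print(from_variance)
```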

Determining correlation

Covariance can help determine whether values are related, but it does not give a sense of the degree to which the variables move together. To measure the degree to which variables move together, we need to calculate the correlation. Correlation is calculated by dividing the covariance by the product of the standard deviations of both sets of data:
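
With assumed symbols (x_i and y_i for paired measurements, \bar{x} and \bar{y} for their means, s_x and s_y for their sample standard deviations, and n for the number of pairs; this notation is chosen here for illustration), the calculation can be sketched as:

```latex
r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x \, s_y},
\qquad
\operatorname{cov}(x, y) = \frac{1}{n - 1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right) \left( y_i - \bar{y} \right)
```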

Correlation standardizes the measure of interdependence between two variables and consequently tells you how closely the two variables move together. The correlation measurement, called the correlation coefficient, always takes a value between -1 and 1, and the interpretation of this value is as follows:

  • If the correlation coefficient is 1.0, the variables have a perfect positive correlation. This means that if one variable moves by a given amount, the second moves proportionally in the same direction. A positive correlation coefficient of less than 1.0 but greater than 0.0 indicates a less-than-perfect positive correlation, with the strength of the correlation growing as the number approaches 1.0.
  • If the correlation coefficient is 0.0, no linear relationship exists between the variables. If one variable moves, you can make no linear prediction about the movement of the other variable.
  • If the correlation coefficient is -1.0, the variables are perfectly negatively correlated (or inversely correlated) and move opposite to each other. If one variable increases, the other variable decreases proportionally. A negative correlation coefficient greater than -1.0 but less than 0.0 indicates a less-than-perfect negative correlation, with the strength of the correlation growing as the number approaches -1.0.
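
The three cases above can be illustrated with a small sketch. The series below are made up for the purpose (all names are hypothetical), and the third pair is a reminder that the coefficient measures linear association only: the second series is fully determined by the first, yet the coefficient is zero.

```python
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# A series that rises in exact proportion to x: coefficient of 1.0.
same_direction = 2 * x + 1
positive = x.corr(same_direction)

# A series that falls in exact proportion as x rises: coefficient of -1.0.
opposite_direction = -3 * x + 10
negative = x.corr(opposite_direction)

# A symmetric, non-linear dependence: the covariance is zero, so the
# coefficient is zero even though `squared` is computed from `centered`.
centered = pd.Series([-2.0, -1.0, 0.0, 1.0, 2.0])
squared = centered ** 2
zero = centered.corr(squared)

print(positive, negative, zero)
```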

Correlations in pandas are calculated using the .corr() method. The following code calculates the correlation of MSFT to AAPL:
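
As a minimal sketch, assuming a hypothetical `prices` DataFrame with made-up values for the two stocks, the coefficient can be obtained from `.corr()` and cross-checked against the covariance divided by the product of the standard deviations:

```python
import pandas as pd

# Hypothetical closing prices; illustrative values only, not real market data.
prices = pd.DataFrame({
    "MSFT": [10.0, 12.0, 14.0, 16.0, 18.0],
    "AAPL": [20.0, 22.0, 21.0, 25.0, 27.0],
})

# Pearson correlation between the two columns (the default method).
coefficient = prices["MSFT"].corr(prices["AAPL"])
print(coefficient)  # about 0.922

# The same value from the definition: covariance over the product of std devs.
covariance = prices["MSFT"].cov(prices["AAPL"])
manual = covariance / (prices["MSFT"].std() * prices["AAPL"].std())
print(manual)

# Calling .corr() on the DataFrame returns the full pairwise matrix.
matrix = prices.corr()
print(matrix)
```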

This shows that the prices for MSFT and AAPL during this period demonstrate a high level of correlation. This does not mean that the relationship is causal, with one price affecting the other, but that there are likely shared influences on the values, such as the companies being in similar markets.
