Many Python packages, such as SciPy, come with several variants of regression functions. In particular, the statsmodels package complements SciPy with descriptive statistics and the estimation of statistical models. The official page for statsmodels is http://statsmodels.sourceforge.net/.
In this example, we will use the OLS class of the statsmodels module to perform an ordinary least squares regression and view its summary.
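Let's assume that you have implemented an APT model with seven factors that returns the values of Y. Consider a set of data collected over 9 time periods, $t_1$ to $t_9$. $X_1$ to $X_7$ are independent variables observed at each period. The regression problem is therefore structured as: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_7 X_7 + \epsilon$, where $\alpha$ is the intercept, $\beta_1$ to $\beta_7$ are the factor coefficients to be estimated, and $\epsilon$ is the error term.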
A simple ordinary least squares regression on values of X and Y can be performed with the following code:
""" Least squares regression with statsmodels """
import numpy as np
import statsmodels.api as sm

# Generate some sample data
num_periods = 9
all_values = np.array([np.random.random(8) for i in range(num_periods)])

# Filter the data
y_values = all_values[:, 0]  # First column values as Y
x_values = all_values[:, 1:]  # All other values as X

x_values = sm.add_constant(x_values)  # Include the intercept
results = sm.OLS(y_values, x_values).fit()  # Regress and fit the model
Let's view the detailed statistics of the regression:
>>> print(results.summary())
The summary of the OLS regression results is a fairly long table of statistical information. However, our interest lies in the particular section that gives us the coefficients of our APT model:
=============================================
             coef    std err          t
---------------------------------------------
const      0.5224      0.825      0.633
x1         0.0683      0.246      0.277
x2         0.1455      1.010      0.144
x3        -0.2451      0.330     -0.744
x4         0.5907      0.830      0.712
x5        -0.3252      0.256     -1.271
x6        -0.2375      0.788     -0.301
x7        -0.1880      0.703     -0.267
Similarly, we can use the params attribute of the results object to display our coefficients of interest:
>>> print(results.params)
[ 0.52243605 0.06827488 0.14550665 -0.24508947 0.5907154 -0.32515442 -0.23751989 -0.18795065]
Both approaches produce the same coefficient values for the APT model, in the same order: the intercept first, followed by the coefficients of X1 to X7.