This chapter is devoted to estimating various factor models. Factors are variables/attributes that in the past were correlated with (then future) stock returns and are expected to contain the same predictive signals in the future.
These risk factors can be considered a tool for understanding the cross-section of (expected) returns. That is why various factor models are used to explain the excess returns (over the risk-free rate) of a certain portfolio or asset using one or more factors. We can think of the factors as the sources of risk that are the drivers of those excess returns. Each factor carries a risk premium and the overall portfolio/asset return is the weighted average of those premiums.
Factor models play a crucial role in portfolio management, mainly because:
By the end of this chapter, we will have constructed some of the most popular factor models. We will start with the simplest, yet very popular, one-factor model (which is the same as the Capital Asset Pricing Model when the considered factor is the market return) and then explain how to estimate more advanced three-, four-, and five-factor models. We will also cover the interpretation of what these factors represent and give a high-level overview of how they are constructed.
In this chapter, we cover the following recipes:
In this recipe, we learn how to estimate the famous Capital Asset Pricing Model (CAPM) and obtain the beta coefficient. This model represents the relationship between the expected return on a risky asset and the market risk (also known as systematic or undiversifiable risk). CAPM can be considered a one-factor model, on top of which more complex factor models were built.
CAPM is represented by the following equation:
Here, E(ri) denotes the expected return on asset i, rf is the risk-free rate (such as a government bond), E(rm) is the expected return on the market, and is the beta coefficient.
Beta can be interpreted as the level of the asset return’s sensitivity, as compared to the market in general. Below we mention the possible interpretations of the coefficient:
CAPM can also be represented as:
In this specification, the left-hand side of the equation can be interpreted as the risk premium, while the right-hand side contains the market premium. The same equation can be further reshaped into:
Here, and .
In this example, we consider the case of Amazon and assume that the S&P 500 index represents the market. We use 5 years (2016 to 2020) of monthly data to estimate the beta. In current times, the risk-free rate is so low that, for simplicity’s sake, we assume it is equal to zero.
Execute the following steps to implement the CAPM in Python:
import pandas as pd
import yfinance as yf
import statsmodels.api as sm
RISKY_ASSET = "AMZN"
MARKET_BENCHMARK = "^GSPC"
START_DATE = "2016-01-01"
END_DATE = "2020-12-31"
df = yf.download([RISKY_ASSET, MARKET_BENCHMARK],
start=START_DATE,
end=END_DATE,
adjusted=True,
progress=False)
X = (
df["Adj Close"]
.rename(columns={RISKY_ASSET: "asset",
MARKET_BENCHMARK: "market"})
.resample("M")
.last()
.pct_change()
.dropna()
)
covariance = X.cov().iloc[0,1]
benchmark_variance = X.market.var()
beta = covariance / benchmark_variance
The result of the code is beta = 1.2035
.
# separate target
y = X.pop("asset")
# add constant
X = sm.add_constant(X)
# define and fit the regression model
capm_model = sm.OLS(y, X).fit()
# print results
print(capm_model.summary())
Figure 8.1 shows the results of estimating the CAPM model:
Figure 8.1: The summary of the CAPM estimated using OLS
These results indicate that the beta (denoted as the market here) is equal to 1.2, which means that Amazon’s returns are 20% more volatile than the market (proxied by S&P 500). Or in other words, Amazon’s (excess) return is expected to move 1.2 times the market (excess) return. The value of the intercept is relatively small and statistically insignificant at the 5% significance level.
First, we specified the assets we wanted to use (Amazon and S&P 500) and the time frame. In Step 3, we downloaded the data from Yahoo Finance. Then, we only kept the last available price per month and calculated the monthly returns as the percentage change between the subsequent observations.
In Step 5, we calculated the beta as the ratio of the covariance between the risky asset and the benchmark to the benchmark’s variance.
In Step 6, we separated the target (Amazon’s stock returns) and the features (S&P 500 returns) using the pop
method of a pandas
DataFrame. Afterward, we added the constant to the features (effectively adding a column of ones to the DataFrame) with the add_constant
function.
The idea behind adding the intercept to this regression is to investigate whether—after estimating the model—the intercept (in the case of the CAPM, also known as Jensen’s alpha) is zero. If it is positive and significant, it means that— assuming the CAPM model is true—the asset or portfolio generates abnormally high risk-adjusted returns. There are two possible implications: either the market is inefficient or there are some other undiscovered risk factors that should be included in the model. This issue is known as the joint hypothesis problem.
We can also use the formula notation, which adds the constant automatically. To do so, we must import statsmodels.formula.api
as smf
and then run the slightly modified line: capm_model = smf.ols(formula="asset ~ market", data=X).fit()
. The results of both approaches are the same. You can find the complete code in the accompanying Jupyter notebook.
Lastly, we ran the OLS regression and printed the summary. Here, we could see that the coefficient by the market variable (that is, the CAPM beta) is equal to the beta that was calculated using the covariance between the asset and the market in Step 5.
In the example above, we assumed there was no risk-free rate, which is a reasonable assumption to make nowadays. However, there might be cases when we would like to account for a non-zero risk-free rate. To do so, we could use one of the following approaches.
The market premium (rm - rf ) and the risk-free rate (approximated by the one-month Treasury Bill) can be downloaded from Professor Kenneth French’s website (please refer to the See also section of this recipe for the link).
Please bear in mind that the definition of the market benchmark used by Prof. French is different from the S&P 500 index—a detailed description is available on his website. For a description of how to easily download the data, please refer to the Implementing the Fama-French three-factor model recipe.
The second option is to approximate the risk-free rate with, for example, the 13-Week (3-month) Treasury Bill (Yahoo Finance ticker: ^IRX
).
Follow these steps to learn how to download the data and convert it into the appropriate risk-free rate:
N_DAYS = 90
df_rf = yf.download("^IRX",
start=START_DATE,
end=END_DATE,
progress=False)
rf = df_rf.resample("M").last().Close / 100
rf = ( 1 / (1 - rf * N_DAYS / 360) )**(1 / N_DAYS)
rf = (rf ** 30) - 1
rf.plot(title="Risk-free rate (13-Week Treasury Bill)")
Figure 8.2 shows the visualization of the risk-free rate over time:
Figure 8.2: The risk-free rate calculated using the 13-Week Treasury Bill
The last approach is to approximate the risk-free rate using the 3-Month Treasury Bill (Secondary Market Rate), which can be downloaded from the Federal Reserve Economic Data (FRED) database.
Follow these steps to learn how to download the data and convert it to a monthly risk-free rate:
import pandas_datareader.data as web
rf = web.DataReader(
"TB3MS", "fred", start=START_DATE, end=END_DATE
)
rf = (1 + (rf / 100)) ** (1 / 12) - 1
rf.plot(title="Risk-free rate (3-Month Treasury Bill)")
We can compare the results of the two methods by comparing the plots of the risk-free rates:
Figure 8.3: The risk-free rate calculated using the 3-Month Treasury Bill
The above lets us conclude that the plots look very similar.
Additional resources are available here:
In their famous paper, Fama and French expanded the CAPM model by adding two additional factors explaining the excess returns of an asset or portfolio. The factors they considered are:
Please see the See also section for a reference to how the factors are calculated.
The model can be represented as follows:
Or in its simpler form:
Here, E(ri ) denotes the expected return on asset i, rf is the risk-free rate (such as a government bond), and is the intercept. The reason for including the intercept is to make sure its value is equal to 0. This confirms that the three-factor model correctly evaluates the relationship between the excess returns and the factors.
In the case of a statistically significant, non-zero intercept, the model might not evaluate the asset/portfolio return correctly. However, the authors stated that the three-factor model is “fairly correct,” even when it is unable to pass the statistical test.
Due to the popularity of this approach, these factors became collectively known as the Fama-French Factors or the Three-Factor Model. They have been widely accepted in both academia and the industry as stock market benchmarks and they are often used to evaluate investment performance.
In this recipe, we estimate the three-factor model using 5 years (2016 to 2020) of monthly returns on Apple’s stock.
Follow these steps to implement the three-factor model in Python:
import pandas as pd
import yfinance as yf
import statsmodels.formula.api as smf
import pandas_datareader.data as web
RISKY_ASSET = "AAPL"
START_DATE = "2016-01-01"
END_DATE = "2020-12-31"
ff_dict = web.DataReader("F-F_Research_Data_Factors",
"famafrench",
start=START_DATE,
end=END_DATE)
The downloaded dictionary contains three elements: the monthly factors from the requested time frame (indexed as 0
), the corresponding annual factors (indexed as 1
), and a short description of the dataset (indexed as DESCR
).
factor_3_df = ff_dict[0].rename(columns={"Mkt-RF": "MKT"})
.div(100)
factor_3_df.head()
The resulting data should look as follows:
Figure 8.4: Preview of the downloaded factors
asset_df = yf.download(RISKY_ASSET,
start=START_DATE,
end=END_DATE,
adjusted=True)
y = asset_df["Adj Close"].resample("M")
.last()
.pct_change()
.dropna()
y.index = y.index.to_period("m")
y.name = "rtn"
factor_3_df = factor_3_df.join(y)
factor_3_df["excess_rtn"] = (
factor_3_df["rtn"] - factor_3_df["RF"]
)
ff_model = smf.ols(formula="excess_rtn ~ MKT + SMB + HML",
data=factor_3_df).fit()
print(ff_model.summary())
The results of the three-factor model are presented below:
Figure 8.5: The summary of the estimated three-factor model
When interpreting the results of the three-factor model, we should pay attention to two issues:
In our case, the intercept is positive, but not statistically significant at the 5% significance level. Of the risk factors, only the SMB factor is not significant. However, a thorough literature study is required to formulate a hypothesis about the factors and their direction of influence.
We can also look at the F-statistic that was presented in the regression summary, which tests the joint significance of the regression. The null hypothesis states that coefficients of all features (factors, in this case), except for the intercept, have values equal to 0. We can see that the corresponding p-value is much lower than 0.05, which gives us reason to reject the null hypothesis at the 5% significance level.
In the first two steps, we imported the required libraries and defined the parameters – the risky asset (Apple’s stock) and the considered time frame.
In Step 3, we downloaded the data using the functionality of the pandas_datareader
library. We had to specify which dataset (see the There’s more... section for information on inspecting the available datasets) and reader (famafrench
) we wanted to use, as well as the start/end dates (by default, web.DataReader
downloads the last 5 years’ worth of data).
In Step 4, we selected only the dataset containing monthly values (indexed as 0
in the downloaded dictionary), renamed the column containing the MKT factor, and divided all the values by 100. We did it to arrive at the correct encoding of percentages; for example, a value of 3.45 in the dataset represents 3.45%.
In Steps 5 and 6, we downloaded and wrangled the prices of Apple’s stock. We obtained the monthly returns by calculating the percentage change of the end-of-month prices. In Step 6, we also changed the formatting of the index to %Y-%m
(for example, 2000-12
) since the Fama-French factors contain dates in such a format. Then, we joined the two datasets in Step 7.
Finally, in Step 8, we ran the regression using the formula notation—we do not need to manually add an intercept when doing so. One thing worth mentioning is that the coefficient by the MKT variable will not be equal to the CAPM’s beta, as there are also other factors in the model, and the factors’ influence on the excess returns is distributed differently.
We can use the following snippet to see what datasets from the Fama-French category are available for download using pandas_datareader
. For brevity, we only display 5 of the approximately 300 available datasets:
from pandas_datareader.famafrench import get_available_datasets
get_available_datasets()[:5]
Running the snippet returns the following list:
['F-F_Research_Data_Factors',
'F-F_Research_Data_Factors_weekly',
'F-F_Research_Data_Factors_daily',
'F-F_Research_Data_5_Factors_2x3',
'F-F_Research_Data_5_Factors_2x3_daily']
In the previous edition of the book, we also showed how to directly download the CSV files from Prof. French’s website using simple Bash commands from within a Jupyter notebook. You can find the code explaining how to do that in the accompanying notebook.
Additional resources:
In this recipe, we learn how to estimate the three-factor model in a rolling fashion. What we mean by rolling is that we always consider an estimation window of a constant size (60 months, in this case) and roll it through the entire dataset, one period at a time. A potential reason for doing such an experiment is to test the stability of the results. Alternatively, we could also use an expanding window for this exercise.
In contrast to the previous recipes, this time, we use portfolio returns instead of a single asset. To keep things simple, we assume that our allocation strategy is to have an equal share of the total portfolio’s value in each of the following stocks: Amazon, Google, Apple, and Microsoft. For this experiment, we use stock prices from the years 2010 to 2020.
Follow these steps to implement the rolling three-factor model in Python:
import pandas as pd
import numpy as np
import yfinance as yf
import statsmodels.formula.api as smf
import pandas_datareader.data as web
ASSETS = ["AMZN", "GOOG", "AAPL", "MSFT"]
WEIGHTS = [0.25, 0.25, 0.25, 0.25]
START_DATE = "2010-01-01"
END_DATE = "2020-12-31"
factor_3_df = web.DataReader("F-F_Research_Data_Factors",
"famafrench",
start=START_DATE,
end=END_DATE)[0]
factor_3_df = factor_3_df.div(100)
asset_df = yf.download(ASSETS,
start=START_DATE,
end=END_DATE,
adjusted=True,
progress=False)
asset_df = asset_df["Adj Close"].resample("M")
.last()
.pct_change()
.dropna()
asset_df.index = asset_df.index.to_period("m")
asset_df["portfolio_returns"] = np.matmul(
asset_df[ASSETS].values, WEIGHTS
)
factor_3_df = asset_df.join(factor_3_df).drop(ASSETS, axis=1)
factor_3_df.columns = ["portf_rtn", "mkt", "smb", "hml", "rf"]
factor_3_df["portf_ex_rtn"] = (
factor_3_df["portf_rtn"] - factor_3_df["rf"]
)
def rolling_factor_model(input_data, formula, window_size):
coeffs = []
for start_ind in range(len(input_data) - window_size + 1):
end_ind = start_ind + window_size
ff_model = smf.ols(
formula=formula,
data=input_data[start_ind:end_ind]
).fit()
coeffs.append(ff_model.params)
coeffs_df = pd.DataFrame(
coeffs,
index=input_data.index[window_size - 1:]
)
return coeffs_df
For a version with a docstring explaining the input/output, please refer to this book’s GitHub repository.
MODEL_FORMULA = "portf_ex_rtn ~ mkt + smb + hml"
results_df = rolling_factor_model(factor_3_df,
MODEL_FORMULA,
window_size=60)
(
results_df
.plot(title = "Rolling Fama-French Three-Factor model",
style=["-", "--", "-.", ":"])
.legend(loc="center left",bbox_to_anchor=(1.0, 0.5))
)
Executing the code results in the following plot:
Figure 8.6: The coefficients of the rolling three-factor model
By inspecting the preceding plot, we can see the following:
In Steps 3 and 4, we downloaded data using pandas_datareader
and yfinance
. This is very similar to what we did in the Estimating the Fama-French three-factor model recipe, so at this point we will not go into too much detail about this.
In Step 6, we calculated the portfolio returns as a weighted average of the returns of the portfolio constituents (calculated in Step 5). This is possible as we are working with simple returns—for more details, please refer to the Converting prices to returns recipe in Chapter 2, Data Preprocessing. Bear in mind that this simple approach assumes that, at the end of each month, we have exactly the same asset allocation (as indicated by the weights). This can be achieved with portfolio rebalancing, that is, adjusting the allocation after a specified period of time to always match the intended weights’ distribution.
Afterward, we merged the two datasets in Step 7. In Step 8, we defined a function for estimating the n-factor model using a rolling window. The main idea is to loop over the DataFrame we prepared in previous steps and for each month, estimate the Fama-French model using the last 5 years’ worth of data (60 months). By appropriately slicing the input DataFrame, we made sure that we only estimate the model from the 60th month onward, to make sure we always have a full window of observations.
Proper software engineering best practices would suggest writing some assertions to make sure the types of the inputs are as we intended them to be, or that the input DataFrame contains the necessary columns. However, we have not done this here for brevity.
Finally, we applied the defined function to the prepared DataFrame and plotted the results.
In this recipe, we implement two extensions of the Fama-French three-factor model.
First, Carhart’s four-factor model: The underlying assumption of this extension is that, within a short period of time, a winner stock will remain a winner, while a loser will remain a loser. An example of a criterion for classifying winners and losers could be the last 12-month cumulative total returns. After identifying the two groups, we long the winners and short the losers within a certain holding period.
The momentum factor (WML; Winners Minus Losers) measures the excess returns of the winner stocks over the loser stocks in the past 12 months (please refer to the See also section of this recipe for references on the calculations of the momentum factor).
The four-factor model can be expressed as follows:
The second extension is Fama-French’s five-factor model. Fama and French expanded their three-factor model by adding two factors:
The five-factor model can be expressed as follows:
Like in all factor models, if the exposure to the risk factors captures all possible variations in expected returns, the intercept () for all the assets/portfolios should be equal to zero.
In this recipe, we explain monthly returns on Amazon from 2016 to 2020 with the four- and five-factor models.
Follow these steps to implement the four- and five-factor models in Python:
import pandas as pd
import yfinance as yf
import statsmodels.formula.api as smf
import pandas_datareader.data as web
RISKY_ASSET = "AMZN"
START_DATE = "2016-01-01"
END_DATE = "2020-12-31"
# three factors
factor_3_df = web.DataReader("F-F_Research_Data_Factors",
"famafrench",
start=START_DATE,
end=END_DATE)[0]
# momentum factor
momentum_df = web.DataReader("F-F_Momentum_Factor",
"famafrench",
start=START_DATE,
end=END_DATE)[0]
# five factors
factor_5_df = web.DataReader("F-F_Research_Data_5_Factors_2x3",
"famafrench",
start=START_DATE,
end=END_DATE)[0]
asset_df = yf.download(RISKY_ASSET,
start=START_DATE,
end=END_DATE,
adjusted=True,
progress=False)
y = asset_df["Adj Close"].resample("M")
.last()
.pct_change()
.dropna()
y.index = y.index.to_period("m")
y.name = "rtn"
# join all datasets on the index
factor_4_df = factor_3_df.join(momentum_df).join(y)
# rename columns
factor_4_df.columns = ["mkt", "smb", "hml", "rf", "mom", "rtn"]
# divide everything (except returns) by 100
factor_4_df.loc[:, factor_4_df.columns != "rtn"] /= 100
# calculate excess returns
factor_4_df["excess_rtn"] = (
factor_4_df["rtn"] - factor_4_df["rf"]
)
# join all datasets on the index
factor_5_df = factor_5_df.join(y)
# rename columns
factor_5_df.columns = [
"mkt", "smb", "hml", "rmw", "cma", "rf", "rtn"
]
# divide everything (except returns) by 100
factor_5_df.loc[:, factor_5_df.columns != "rtn"] /= 100
# calculate excess returns
factor_5_df["excess_rtn"] = (
factor_5_df["rtn"] - factor_5_df["rf"]
)
four_factor_model = smf.ols(
formula="excess_rtn ~ mkt + smb + hml + mom",
data=factor_4_df
).fit()
print(four_factor_model.summary())
Figure 8.7 shows the results:
Figure 8.7: The summary of the estimated four-factor model
five_factor_model = smf.ols(
formula="excess_rtn ~ mkt + smb + hml + rmw + cma",
data=factor_5_df
).fit()
print(five_factor_model.summary())
Figure 8.8 shows the results:
Figure 8.8: The summary of the estimated five-factor model
According to the five-factor model, Amazon’s excess returns are negatively exposed to most of the factors (all but the market factor). Here, we present an example of the interpretation of the coefficients: an increase by 1 percentage point in the market factor results in an increase of 0.015 p.p. In other words, for a 1% return by the market factor, we can expect our portfolio (Amazon’s stock) to return 1.5117 * 1% in excess of the risk-free rate.
Similar to the three-factor model, if the five-factor model fully explains the excess stock returns, the estimated intercept should be statistically indistinguishable from zero (which is the case for the considered problem).
In Step 2, we defined the parameters—the ticker of the considered stock and timeframes.
In Step 3, we downloaded the necessary datasets using pandas_datareader
, which provides us with a convenient way of downloading the risk factor-related data without manually downloading the CSV files. For more information on this process, please refer to the Estimating the Fama-French three-factor model recipe.
In Steps 4 and 5, we downloaded Amazon’s stock prices and calculated the monthly returns using the previously explained methodology.
In Steps 6 and 7, we joined all the datasets, renamed the columns, and calculated the excess returns. When using the join
method without specifying what we want to join on (the on
argument), the default is the index of the DataFrames.
This way, we prepared all the necessary inputs for the four- and five-factor models. We also had to divide all the data we downloaded from Prof. French’s website by 100 to arrive at the correct scale.
SMB factor in the five-factor dataset is calculated differently compared to how it is in the three-factor dataset. For more details, please refer to the link in the See also section of this recipe.
In Step 8 and Step 9, we estimated the models using the functional form of OLS regression from the statsmodels
library. The functional form automatically adds the intercept to the regression equation.
For details on the calculation of the factors, please refer to the following links:
For papers introducing the four- and five-factor models, please refer to the following links:
In the previous recipes, we have covered estimating different factor models using a single asset or portfolio as the dependent variable. However, we can estimate the factor models for multiple assets at once, using cross-section (panel) data.
Following this approach, we can:
Knowing the risk premiums, we can then estimate the returns for any portfolio provided we can approximate that portfolio’s exposure to the risk factors.
While estimating cross-sectional regression, we can encounter multiple problems due to the fact that some assumptions of linear regression might not hold. We might encounter the following:
To solve those issues, we can use a technique called the Fama-MacBeth regression, which is a two-step procedure specifically designed to estimate the premiums rewarded by the market for the exposure to certain risk factors.
The steps are as follows:
In this recipe, we estimate the Fama-MacBeth regression using five risk factors and the returns of 12 industry portfolios, also available on Prof. French’s website.
Execute the following steps to estimate the Fama-MacBeth regression:
import pandas as pd
import pandas_datareader.data as web
from linearmodels.asset_pricing import LinearFactorModel
START_DATE = "2010"
END_DATE = "2020-12"
factor_5_df = (
web.DataReader("F-F_Research_Data_5_Factors_2x3",
"famafrench",
start=START_DATE,
end=END_DATE)[0]
.div(100)
)
portfolio_df = (
web.DataReader("12_Industry_Portfolios",
"famafrench",
start=START_DATE,
end=END_DATE)[0]
.div(100)
.sub(factor_5_df["RF"], axis=0)
)
factor_5_df = factor_5_df.drop("RF", axis=1)
five_factor_model = LinearFactorModel(
portfolios=portfolio_df,
factors=factor_5_df
)
result = five_factor_model.fit()
print(result)
Running the snippet generates the following summary:
Figure 8.9: The results of the Fama-MacBeth regression
The results in the table are average risk premiums from the T cross-sectional regressions.
We can also print the full summary (containing the risk premiums and each portfolio’s factor loadings). To do so, we need to run the following line of code:
print(result.full_summary)
In the first two steps, we imported the required libraries and defined the start and end date of our exercise. In total, we will be using 11 years of monthly data, resulting in 132 observations of the variables (denoted as T ). For the end date, we had to specify 2020-12
. Using 2020
alone would result in the downloaded datasets ending with January 2020.
In Step 3, we downloaded the five-factor data set using pandas_datareader
. We adjusted the values to express percentages by dividing them by 100.
In Step 4, we downloaded the returns on 12 Industry Portfolios from Prof. French’s website (please see the link in the See also section for more details on the dataset). We have also adjusted the values by dividing them by 100 and calculated the excess returns by subtracting the risk-free rate (available at the factor dataset) from each column of the portfolio dataset. We could do that easily using the sub
method as the time periods are an exact match.
In Step 5, we dropped the risk-free rate, as we will not be using it anymore and it will be easier to estimate the Fama-MacBeth regression model with no redundant columns in the DataFrames.
In the last step, we instantiated an object of the LinearFactorModel
class and provided both datasets as arguments. Then, we used the fit
method to estimate the model. Lastly, we printed the summary.
You might notice a small difference between linearmodels
and scikit-learn
. In the latter, we provide the data while calling the fit
method. With linearmodels
, we had to provide the data while creating an instance of the LinearFactorModel
class.
In linearmodels
, you can also use the formula notation (as we have done when estimating the factor models using statsmodels
). To do so, we need to use the from_formula
method. An example could look as follows: LinearFactorModel.from_formula(formula, data)
, where formula
is the string containing the formula and data
is an object containing both the portfolios/assets and the factors.
We have already estimated the Fama-MacBeth regression using the linearmodels
library. However, it might strengthen our understanding of the procedure to carry out the two steps manually.
Execute the following steps to carry out the two steps of the Fama-MacBeth procedure separately:
from statsmodels.api import OLS, add_constant
factor_loadings = []
for portfolio in portfolio_df:
reg_1 = OLS(
endog=portfolio_df.loc[:, portfolio],
exog=add_constant(factor_5_df)
).fit()
factor_loadings.append(reg_1.params.drop("const"))
factor_load_df = pd.DataFrame(
factor_loadings,
columns=factor_5_df.columns,
index=portfolio_df.columns
)
factor_load_df.head()
Running the code generates the following table containing the factor loadings:
Figure 8.10: First step of the Fama-MacBeth regression—the estimated factor loadings
We can compare those numbers to the output of the full summary from the linearmodels
library.
risk_premia = []
for period in portfolio_df.index:
reg_2 = OLS(
endog=portfolio_df.loc[period, factor_load_df.index],
exog=factor_load_df
).fit()
risk_premia.append(reg_2.params)
risk_premia_df = pd.DataFrame(
risk_premia,
index=portfolio_df.index,
columns=factor_load_df.columns.tolist())
risk_premia_df.head()
Running the code generates the following table containing the risk premiums over time:
Figure 8.11: Second step of the Fama-MacBeth regression—the estimated risk premiums over time
risk_premia_df.mean()
Running the snippet returns:
Mkt-RF 0.012341
SMB -0.006291
HML -0.008927
RMW -0.000908
CMA -0.002484
The risk premiums calculated above match the ones obtained from the linearmodels
library.
linearmodels
library can be a good resource for learning about panel regression models (and not only that—it also contains utilities for instrumental variables models and so on) and their implementation in Python: https://bashtage.github.io/linearmodels/index.htmlFurther reading about the Fama-MacBeth procedure:
In this chapter, we have constructed some of the most popular factor models. We have started with the simplest one-factor model (the CAPM) and then explained how to approach more advanced three-, four-, and five-factor models. We have also described how we can use the Fama-MacBeth regression to estimate the factor models for multiple assets with appropriate cross-section (panel) data.