knitr::opts_knit$set(
  root.dir = normalizePath('..')
)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  echo = TRUE,
  warning = FALSE,
  error = FALSE
)

Introduction

These notes aim at reproducing several works on factors modeling and their comparisons. The framework is the one adopted by \textcite{hou-al-2018}, who systematically examine how seemingly different factor models are related among each other. In particular, their approach is focused on comparing their contributed q-factor models against an assorted set of emblematic models from various strains of the asset pricing literature, on both conceptual and empirical grounds. These benchmark (fundamental) factor models include "standard" ones as the Fama-French five-factor model \parencite{fama-french-2015} and the Fama-French six-factor model \parencite{fama-french-2018}, models methodologically divergent from the Fama-French approach such as \textcite{stambaugh-yuan-2017} and \textcite{daniel-hirshleifer-sun-2020}, and also the "hybrid" model of \textcite{barillas-shanken-2018}. Analyses involving q-factor models and "traditional" models, that is the Fama-French models, are of particular interest given latter authors' pioneering work in the field and thus the fact their models are the de facto benchmarks in the broader literature. In what follows we first of all introduce target models and their theoretical grounds, then proceed with spanning regressions for their statistical comparison and performance assessments. We can largely reproduce \textcite{hou-al-2018}'s empirical results, therefore confirming most of their findings.

Data and methodology

Authors make available data for factors returns analyzed in their research at global-q.org. Importantly, data sets are open-sourced and updated regularly. Similarly to the well known Fama-French academic library, data series they provide span several frequencies including daily, weekly (calendar, Friday close to Friday close), weekly (Wednesday-to-Wednesday, Wednesday close to Wednesday close), monthly, quarterly, and annual. In what follows we limit our attention to monthly data, likely the most common time series frequency adopted in the literature.

# Load data sets ready to use
data(
  list=c(
    'FF5.monthly', 'FF6.monthly', 
    'Q4.monthly', 'Q5.monthly', 
    'SY4.monthly',
    'DHS3.monthly'
  )
)

# Assemble Barillas-Shanken (2018)'s 6-factor data set
# 
# No out-of-the-box replication data set found for Barillas-Shanken (2018)'s 6-factor 
# model, we obtain data from several sources
# 
## Fama-French's MKT.RF and SMB factors
bs.mktrf <- FF5.monthly$MKT.RF
bs.smb <- FF5.monthly$SMB
## Asness-Frazzini's HML.DEV and UMD (MOM) factors
source(file.path('inst', 'parsers', 'Value--Devil-in-HMLs-Details.R'))


# not sure we need to call xts (was throwing error for "needing proper time based object and HML_Devil.HML.DEV$USA is already an xts object). Added code so 'bs.hml' end up being a subset by column "USA". Same change for 'bs.mom' object.

#bs.hml <- xts::xts(HML_Devil.HML.DEV$USA, HML_Devil.HML.DEV$DATE)
#bs.mom <- xts::xts(HML_Devil.UMD$USA, HML_Devil.UMD$DATE)
bs.hml <- HML_Devil.HML.DEV$USA
bs.mom <- HML_Devil.UMD$USA
rm(
  HML_Devil.HML_FF, HML_Devil.ME_1, HML_Devil.RF, 
  HML_Devil.MKT, HML_Devil.SMB, HML_Devil.HML.DEV, HML_Devil.UMD
)
## Hou-Xue-Zhang's IA and ROE factors
bs.ia <- Q4.monthly$IA
bs.roe <- Q4.monthly$ROE

# Our Barillas-Shanken data set
BS6.monthly <- merge(bs.mktrf, bs.smb, bs.hml, bs.mom, bs.ia, bs.roe)
BS6.monthly <- BS6.monthly[complete.cases(BS6.monthly), ]
colnames(BS6.monthly) <- c('MKT.RF', 'SMB', 'HML.DEV', 'MOM', 'IA', 'ROE')
rm(bs.mktrf, bs.smb, bs.hml, bs.mom, bs.ia, bs.roe)

It should be noted that there exist small nuances between the risk-free interest rates (one month T-Bill, RF) downloadable from French's website and those one can obtain from global-q.org. These differences do not seem to be due to import conversions or numerical rounding, they may come directly from the source. In our context, they propagate to the market proxy factor (MKT.RF) and thus if anything we are forced to keep the two separate although they shall be conceptually the same (depending on the assets universe considered).

Q-factor models

Their q-factor model is inspired by the investment-based approach to the asset pricing theory.

q-factor model

In this section we first of all summarize research and the model introduced by \textcite{hou-al-2015}, then aim at reproducing it. It consists of four factors: the market factor ($MKT$), a size factor ($ME$), an investment factor ($IA$), and a profitability factor ($ROE$). The factor regressions specification used to assess the q-factor model performance is $$ r_{i,t} - r_{f, t} = {\alpha}{i,q} + {\beta}{MKT,i}{MKT}{t} + {\beta}{ME,i}r_{ME,t} + {\beta}{I/A,i}r{I/A,t} + {\beta}{ROE,i}r{ROE,t} + {\epsilon}{i,t} $$ where, as usual, $r{i,t} - r_{f, t}$ stands for the returns on excess of the risk-free rate. Authors find their model to largely capture the cross section of average stock returns.

We analyze and share a factors monthly data set authors kindly made available on their website as an .RData object, further details can be found at ?Q4.monthly.

$q^5$ factor model

In this section we first of all summarize research and the model introduced by \textcite{hou-al-2020}, then aim at reproducing it. This model is an extension of the previous four-factor q model in that it additionally includes the expected growth factor, $EG$. The factor regressions specification used to assess the $q^5$-factor model performance is then $$ r_{i,t} - r_{f, t} = {\alpha}{i,q} + {\beta}{MKT,i}{MKT}{t} + {\beta}{ME,i}r_{ME,t} + {\beta}{I/A,i}r{I/A,t} + {\beta}{ROE,i}r{ROE,t} + {\beta}{EG,i}r{EG,t} + {\epsilon}_{i,t} $$

We analyze and share a factors monthly data set authors kindly made available on their website as an .RData object, further details can be found at ?Q5.monthly.

Benchmark models

Fama-French five-factor model

The Fama-French five-factor model was introduced by \textcite{fama-french-2015}. In this model Fama and French include the RMW (Robust Minus Weak) and CMA (Conservative Minus Aggressive) to their standard \textcite{fama-french-1993} three-factor model. Factors are constructed using the six value-weight portfolios formed on size, the six value-weight portfolios formed on size and operating profitability, and the six value-weight portfolios formed on size and investment.

Fama-French six-factor model

The Fama-French six-factor model is by \textcite{fama-french-2018}. It builds on the just discussed Fama-French five-factor model and adds the MOM momentum factor, which is also commonly referred to as Up Minus Down (UMD). At this time we are unable to study the "alternative Fama-French six-factor model", which has the RMWc cash-based factor in place of the RMW factor, because authors apparently discontinued its publication. Readers willing to study this model would thus need to construct it themselves.

Stambaugh-Yuan four-factor model

We provide a brief introduction to \textcite{stambaugh-yuan-2017} factors construction, focusing on their original contributed ones. In later sections the model is studied in comparison with the q-factor and q5 models, on a monthly basis. We share the factors monthly data set authors kindly made available on their website as an .RData object, further details can be found at ?SY4.monthly.

Stambaugh-Yuan analyze a set of 11 anomalies. They categorize these anomalies in two clusters: First cluster: net stock issues, composite equity issues, accruals, net operating assets, asset growth, and investment to assets. Second cluster: distress, O-score, momentum, gross profitability, and return on assets.

Authors construct factors based on equally-weighted averages of stocks' anomaly rankings, in the perspective of having a less noisy mispricing measure for each stock across anomalies. In particular, stock's rankings are averaged with respect to the available anomaly measures within each of the two clusters. Thus, each month a stock has two composite mispricing measures, $P1$ and $P2$. Mispricing factors are then constructed by applying a $2 \times 3$ sorting procedure, similarly to \textcite{fama-french-2015}: First, NYSE, AMEX, and NASDAQ stocks (excluding the ones with a price lower than 5\$) are sorted and split into two groups based on the NYSE median size breakpoint; Second, stock's are sorted by both $P1$ and $P2$ independently, and assigned to three groups ("low", "middle", and "high") with the 20th and 80th percentiles of the NYSE/AMEX/NASDAQ as breakpoints (rather than the commonly used 30th and 70th percentiles of the NYSE). The motivation authors provide for this methodological choice on breakpoints is that relative mispricing in the cross-section is considered to be "more a property of the extremes than of the middle". Finally, value-weighted returns of each of the four portfolios formed by the intersection of the two size categories with high and low categories of either $P1$ or $P2$ sorts are averaged and constitute their two mispricing factors, MGMT and PERF*, respectively.

The \textcite{fama-french-2015}'s market proxy and size factors complete the four-factor model.

Daniel-Hirshleifer-Sun three-factor model

We provide a brief introduction to \textcite{daniel-hirshleifer-sun-2020} (DHS) factors construction, focusing on their original contributed ones. In later sections the model is studied in comparison with the q-factor and q5 models, on a monthly basis. We share the factors monthly data set authors kindly made available on their website as an .RData object, further details can be found at ?DHS3.monthly.

Simplifying, DHS factors construction is the following procedure: First, all NYSE, AMEX, and NASDAQ common stocks (CRSP 10 or 11 share codes, excluding financial firms and firms with negative book equity). Second, at June end firms are assigned to one of two size groups ("small" and "big"), depending on their ME being below or above the NYSE median size breakpoint. Finally, firms are also independently sorted into one of three financing groups ("low", "middle", "high") based on: either stocks' 1-year NSI and 5-year CSI financing measures rankings for the FIN factor, or the 4-day cumulative abnormal return around the most recent quarterly earnings announcement date (CAR) for the PEAD* factor. Both sorts are with respect to NYSE 20th and 80th percentiles breakpoints.

The \textcite{fama-french-2015}'s market proxy factor completes the three-factor model.

Barillas-Shanken six-factor model

We provide a brief introduction to \textcite{barillas-shanken-2018}. In later sections the model is studied in comparison with the q-factor and q5 models, on a monthly basis. To our knowledge there is not an out-of-the-box replication data set readily available for this model, but we rather assemble the model factors from other data sets available. At this time we are unable to share the complete data set as an .RData object to simply load into the environment.

In brief, six factors included in the model are: \textcite{fama-french-2015}'s market proxy excess returns (MKT.RF) and size (SMB) factors, \textcite{asness-frazzini-2013}'s "HML devil" (HML.DEV) and momentum (MOM, or equivalently UMD) factors, and finally \textcite{hou-al-2015}'s investment (IA) and profitability (ROE) factors.

Models empirical comparisons

This section goal is twofold. First, closely following authors, we run spanning regressions in order to compare the q-factor models across the set of benchmark models considered and vice versa. Second, we compute a correlation matrix that includes the complete set of factors involved in all the models discussed.

Let us begin with spanning regressions. It is important to notice that more often than not \textcite{hou-al-2018} construct factors included in the models studied (or alternative versions). The reason behind such replications is that models present deviations from the standard construction procedure of Fama-French, thus they strive to make the models wholly comparable. Because we only use factors data sets disseminated by respective authors themselves, this may lead to slight deviations or inability to provide results (for unavailable alternative versions data).

SpanningRegressions <- function(Y, X, ...) {
  Yn <- rep(Y, each=length(X)) 
  X <- rep(X, length(Y))
  # Models regressions
  fits <- mapply(
    function(y, x) {
      mod <- formula(
        paste(y, paste(x, collapse='+'), sep='~')
      )
      fit <- lm(mod, ...)
      return(fit)
    },
    as.list(Yn), X,
    SIMPLIFY=FALSE
  )
  # Newey-West t-statistics
  nwts <- lapply(fits, function(x) {
    lmtest::coeftest(x, vcov.=sandwich::NeweyWest(x, ...), ...)
  })
  # TODO: names overlap, workaround or longer names
  # Naming by regressand
  # names(fits) <- paste(Yn, 'reg', sep='.')
  # names(nwts) <- paste(Yn, 'nwts', sep='.')
  return(list(fits=fits, nwts=nwts))
}
q4.vars <- c('MKT.RF.1', 'ME', 'IA', 'ROE')
q5.vars <- c(q4.vars, 'EG')
ff5.vars <- c('MKT.RF', 'SMB', 'HML', 'RMW', 'CMA')
ff6.vars <- c(ff5.vars, 'MOM')
data.qff <- merge(FF6.monthly, Q5.monthly)
data.qff <- data.qff[complete.cases(data.qff), c(ff6.vars, q5.vars)]
# Sample period
data.qff <- data.qff['1967-01/2016-12']

# Panel A: Explaining q and q5 factors
q.on.ff <- SpanningRegressions(q5.vars, list(ff5.vars, ff6.vars), data=data.qff)
lapply(q.on.ff$fits, summary)

# Panel B: Explaining Fama–French factors
ff.on.q <- SpanningRegressions(ff6.vars, list(q4.vars, q5.vars), data=data.qff)
lapply(ff.on.q$fits, summary)

\textcite{stambaugh-yuan-2017} series available begin from 1963-01. \textcite{hou-al-2018} compare models on the 1967-01 to 2016-12 sample period. Also, since the SY factors construction presents deviations from the traditional approach, \textcite{hou-al-2018} construct and provide results for the SY model replicated with standard approach. At this time we are not constructing factors and thus we are unable to include such additional set of results.

sy4.vars <- c('MKT.RF', 'SMB', 'MGMT', 'PERF')
data.qsy <- merge(SY4.monthly, Q5.monthly)
data.qsy <- data.qsy[complete.cases(data.qsy), c(q5.vars, sy4.vars)]
# Sample period
data.qsy <- data.qsy['1967-01/2016-12']

# Panel A: Explaining q and q5 factors
q.on.sy <- SpanningRegressions(q5.vars, list(sy4.vars), data=data.qsy)
lapply(q.on.sy$fits, summary)

# Panel B: Explaining the Stambaugh–Yuan factors
sy.on.q <- SpanningRegressions(sy4.vars, list(q4.vars, q5.vars), data=data.qsy)
lapply(sy.on.q$fits, summary)

We have DHS series available on the 1972-07 to 2018-12 period at most, of course series can be cut off on the sample period \textcite{hou-al-2018} consider, i.e. 1972-07 to 2016-12. Also, \textcite{hou-al-2018} raise some concerns on DHS factors construction, especially when it deviates from the traditional approach. For this reason they construct and provide results for the DHS model obtained with the standard approach, on the 1967-01 to 2016-12 sample period. At this time we are not constructing factors and thus we are unable to include such additional set of results.

dhs3.vars <- c('MKT.RF', 'PEAD', 'FIN')
data.qdhs <- merge(DHS3.monthly, Q5.monthly)
data.qdhs <- data.qdhs[complete.cases(data.qdhs), c(q5.vars, dhs3.vars)]
# Sample period
data.qdhs <- data.qdhs['1972-07/2016-12']

# Panel A: Explaining q and q5 factors
q.on.dhs <- SpanningRegressions(q5.vars, list(dhs3.vars), data=data.qdhs)
lapply(q.on.dhs$fits, summary)

# Panel B: Explaining the Daniel-Hirshleifer-Sun factors
dhs.on.q <- SpanningRegressions(dhs3.vars, list(q4.vars, q5.vars), data=data.qdhs)
lapply(dhs.on.q$fits, summary)
bs6.vars <- c('MKT.RF', 'SMB', 'HML.DEV', 'MOM', 'IA', 'ROE')
data.qbs <- merge(BS6.monthly, Q5.monthly)
data.qbs <- data.qbs[complete.cases(data.qbs), unique(c(q5.vars, bs6.vars))]
# Sample period
data.qbs <- data.qbs['1967-01/2016-12']

# Panel A: Regressing non-overlapping q5 factors on Barillas–Shanken factors
q5.on.bs <- SpanningRegressions(c('ME', 'EG'), list(bs6.vars), data=data.qbs)
lapply(q5.on.bs$fits, summary)

# Panel B: Regressing Asness–Frazzini HML factor on the q-factor and q5 models
hmldev.on.q <- SpanningRegressions(c('HML.DEV'), list(q4.vars, q5.vars), data=data.qbs)
lapply(hmldev.on.q$fits, summary)

Next, the correlation matrix, computed with factors series from July 1972 to December 2016. Also, we run a correlation coefficient test at a 95% confidence level -- with the null hypothesis being a statistically indistinguishable from zero coefficient -- and report p-values obtained.

CorFactors <- function(data, ...) {
  # Correlation matrix of the full set of factors
  cor.estimates <- cor(data, ...)
  # Correlation coefficient test
  vars <- colnames(data)
  nvars <- length(vars)
  vars.pairs <- combn(vars, 2)
  cor.pvalues <- apply(vars.pairs, 2, function(j) {
    cor.test(data[, j[1]], data[, j[2]])$p.value
  })
  cor.pvalues <- matrix(cor.pvalues, nvars, nvars, byrow=TRUE)
  rownames(cor.pvalues) <- colnames(cor.pvalues) <- vars
  return(
    list(estimates=cor.estimates, pvalues=cor.pvalues)
  )
}

vars <- unique(c(q5.vars, ff6.vars, sy4.vars, dhs3.vars, bs6.vars))
data.all <- merge(FF6.monthly, Q5.monthly, SY4.monthly, DHS3.monthly, BS6.monthly)
data.all <- data.all[complete.cases(data.all), vars]
factors.correlation <- CorFactors(data.all)


JustinMShea/ExpectedReturns documentation built on Sept. 9, 2023, 9:41 p.m.