View source: R/SetFactorModel.R
SetFactorModel | R Documentation |
In data analysis and data mining, some procedures are routinely carried out to prepare data sets for analysis. These procedures may simply run basic checks on the data, or apply preliminary transformations that "modify" the initial data set (data cleaning being perhaps the best known). This helper function prepares factor model data for further analysis.
SetFactorModel(data, lrhs, clean.method, clean.bounds, across.panel, ...)
data |
A |
lrhs |
A character vector specifying the following |
clean.method |
A character string. One of |
clean.bounds |
A character vector indicating |
across.panel |
A boolean. Would you like to clean |
... |
Any additional pass-through parameters. TODO: param lagged A boolean.
A data.frame
with values on which the selected procedures have been applied.
TODO: crucial checks on cross-section consistency
The function implements several data cleaning procedures. These procedures are often needed in empirical analyses because financial data are typically subject to outliers. Common statistical analyses tend to suffer from these extreme data points, in the sense that their output may become unreliable. Several methods, mostly in the realm of robust statistics, are designed to detect and alleviate the undue effects of such observations on the phenomena being analyzed. Bali et al. (2016) illustrate two techniques commonly adopted in empirical finance:
Winsorization
Truncation
These methods are summarized below to the extent of our implementation. Additional information is provided to give some background and further guidance.
Winsorization. This technique consists of setting "the values of a given variable that are above or below a certain cutoff to that cutoff". The objective is to moderate extreme values, to the extent that the phenomenon under investigation is not substantially altered. The cutoff at which winsorization should be performed depends mainly on how noisy the variable being analyzed is: noisier variables tend to be winsorized at a higher cutoff.
Truncation. Similar to Winsorization, except that the values of a given variable that are above or below a certain cutoff are removed altogether.
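The two techniques can be sketched in base R as follows. This is only an illustration under stated assumptions: `winsorize()` and `truncate_values()` are hypothetical helpers written for this example, not functions exported by the package, and the cutoffs are taken as percentiles of the data.

```r
# Hypothetical helpers for illustration; they are not part of the package API.

# Winsorization: values beyond the percentile cutoffs are set to the cutoffs.
winsorize <- function(x, lower = 0.01, upper = 0.99) {
  cutoffs <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  pmin(pmax(x, cutoffs[1]), cutoffs[2])
}

# Truncation: values beyond the percentile cutoffs are replaced with NA,
# so they drop out of later computations while the vector keeps its length.
truncate_values <- function(x, lower = 0.01, upper = 0.99) {
  cutoffs <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  x[!is.na(x) & (x < cutoffs[1] | x > cutoffs[2])] <- NA
  x
}

set.seed(42)
# Simulated returns with two artificial outliers appended at the end
returns <- c(rnorm(98, mean = 0, sd = 0.02), -0.60, 0.85)

returns_w <- winsorize(returns, lower = 0.05, upper = 0.95)
returns_t <- truncate_values(returns, lower = 0.05, upper = 0.95)

range(returns)         # includes the artificial outliers
range(returns_w)       # bounded by the 5th and 95th percentiles
sum(is.na(returns_t))  # number of observations removed by truncation
```

Marking truncated values as `NA` rather than dropping them is a design choice of this sketch: it keeps the vector aligned with the other columns of a data set.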
Winsorization and Truncation are usually conducted symmetrically, meaning that the same level is applied at both ends of the series. However, this need not be the case. The cleaning procedures can be carried out at arbitrarily asymmetric levels, depending on how noisy the financial data being analyzed are. This is the researcher's decision.
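As a minimal sketch of asymmetric cleaning, assuming percentile-based cutoffs, the lower and upper levels can simply be chosen independently. The `winsorize()` helper below is hypothetical and written only for this example.

```r
# Hypothetical helper (not part of the package): winsorize with
# independently chosen lower and upper percentile levels.
winsorize <- function(x, lower, upper) {
  cutoffs <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  pmin(pmax(x, cutoffs[1]), cutoffs[2])
}

set.seed(7)
x <- rlnorm(200)  # right-skewed simulated variable

x_sym  <- winsorize(x, lower = 0.01, upper = 0.99)  # same level in both tails
x_asym <- winsorize(x, lower = 0.00, upper = 0.95)  # left tail untouched, right tail clipped harder

range(x)
range(x_sym)
range(x_asym)
```

With `lower = 0` the lower cutoff is the sample minimum, so only the right tail is modified.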
There are two ways to perform either cleaning technique:
Cross-sectionally. Percentiles are computed from all values of the given variable, pooled across the whole panel.
Time-indexed. Percentiles are computed based on each time period separately.
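The difference between the two approaches can be sketched on a small simulated panel. The column names (`period`, `ret`) and the `winsorize()` helper are assumptions made for this example and do not reflect the package's internals.

```r
# Hypothetical helper (not part of the package).
winsorize <- function(x, lower = 0.05, upper = 0.95) {
  cutoffs <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  pmin(pmax(x, cutoffs[1]), cutoffs[2])
}

set.seed(123)
# Simulated panel: three periods, the second one far more volatile
panel <- data.frame(
  period = rep(1:3, each = 50),
  ret    = c(rnorm(50, sd = 0.02), rnorm(50, sd = 0.10), rnorm(50, sd = 0.02))
)

# First approach: one pair of cutoffs computed from the pooled panel
panel$ret_pooled <- winsorize(panel$ret)

# Second approach: cutoffs computed separately within each period
panel$ret_by_period <- ave(panel$ret, panel$period, FUN = winsorize)

# Spread (max - min) of each version within each period
aggregate(cbind(ret, ret_pooled, ret_by_period) ~ period,
          data = panel, FUN = function(v) diff(range(v)))
```

In this simulated setting the pooled cutoffs are driven mostly by the volatile period, whereas the per-period cutoffs adapt to each period's own distribution.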
Which to choose depends on the type of statistical analysis to be carried out. Bali et al. (2016) suggest that:
if a single-stage analysis will be performed on the entire panel of data, the first method is most appropriate;
in two-stage analyses the second approach is usually preferable.
They also suggest that if any of these choices is found to substantially influence the results of the analysis, the methodology should be viewed with suspicion.
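One way to probe this sensitivity is to repeat the same estimate under several cleaning levels and compare the results. The sketch below uses simulated data and a simple regression slope; the `winsorize()` helper, the grid of levels, and the data-generating process are all assumptions made for illustration.

```r
# Hypothetical helper (not part of the package).
winsorize <- function(x, lower, upper) {
  cutoffs <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  pmin(pmax(x, cutoffs[1]), cutoffs[2])
}

set.seed(2016)
n <- 500
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
y[1:5] <- y[1:5] + 25  # a few artificial outliers in the response

# Illustrative grid of (lower, upper) winsorization levels
cutoff_grid <- list(
  none     = c(0.000, 1.000),
  mild     = c(0.005, 0.995),
  standard = c(0.010, 0.990),
  strong   = c(0.050, 0.950)
)

# Re-estimate the slope of y on x under each cleaning level
slopes <- sapply(cutoff_grid, function(p) {
  y_clean <- winsorize(y, p[1], p[2])
  unname(coef(lm(y_clean ~ x))[2])
})

round(slopes, 3)
```

Large swings in the estimate across reasonable levels would be a sign that the conclusions hinge on the cleaning choice.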
Which technique to use is a difficult question to answer in general, as some outliers are "legitimate" while others may be data errors. Most empirical asset pricing researchers choose Winsorization over Truncation, as it more closely resembles the robust approach to statistical analysis. In addition, Winsorization preserves the number of observations in the panel being analyzed, which is a good reason to prefer it. It remains, however, the researcher's decision.
Vito Lestingi
Bali, T.G., Engle, R.F., and Murray, S. (2016). Empirical Asset Pricing. The Cross Section of Stock Returns. Wiley.