Estimate Variance Matrix via Statistical Factors

Description

Creates a variance matrix based on the principal components of the variables that have no missing values.

Usage

factor.model.stat(x, weights = seq(0.5, 1.5, length.out = nobs), 
	output = "full", center = TRUE, frac.var = 0.5, iter.max = 1, 
	nfac.miss = 1, full.min = 20, reg.min = 40, sd.min = 20, 
	quan.sd = 0.9, tol = 0.001, zero.load = FALSE, 
	range.factors = c(0, Inf), constant.returns.okay = FALSE, 
	specific.floor = 0.1, floor.type = "quantile", verbose=2)

Arguments

x

required. A numeric matrix. The rows are observations and the columns are the variables. In finance, this will be a matrix of returns where the rows are times and the columns are assets. For the default value of weights the most recent observation should be the last row. The number of columns may exceed the number of rows, and missing values are accepted. A column may even have all missing values.

weights

a vector of observation weights, or NULL.

Equal weights can be specified with NULL or with a single positive number.

Otherwise, the length must be equal to either the original number of rows in x or the number of rows in x minus the number of rows that contain all missing values.
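As a rough illustration of the default time weighting (not the package's internal code), the sketch below builds linearly increasing weights of the same form as the default and passes them to cov.wt from base R's stats package. The simulated matrix x and the normalization of the weights to sum to one are assumptions made for this example.

```r
# Illustration only: linearly increasing observation weights with base R.
set.seed(1)
x <- matrix(rnorm(200 * 4, sd = 0.01), nrow = 200)  # simulated returns, oldest row first
n_obs <- nrow(x)

w <- seq(0.5, 1.5, length.out = n_obs)  # same form as the default 'weights'
w <- w / sum(w)                         # normalized so the weights sum to one

wcov <- cov.wt(x, wt = w)$cov           # weighted covariance of the 4 columns
```

With these weights the last row counts three times as much as the first row, which is why the row order of x matters.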

output

a character string indicating the form of the result. It must partially match one of: "full", "systematic", "specific" or "factor".

center

either a logical value or a numeric vector with length equal to the number of columns in x. If center is TRUE, then the mean of each column is used as the center. If center is FALSE, then the center for each variable is taken to be zero.

frac.var

a control on the number of factors to use – the number of factors is chosen so that the factors account for (just over) frac.var of the total variability.
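One way to read this rule is sketched below with a hypothetical helper, choose_nfac, that works on a vector of eigenvalues sorted in decreasing order. Whether the function itself uses a strict inequality, and how it treats frac.var values of 1 or more, is not stated here, so treat those details as assumptions.

```r
# Hypothetical helper: smallest number of factors whose cumulative share
# of the eigenvalue total exceeds frac.var (assumes frac.var < 1).
choose_nfac <- function(eigenvalues, frac.var = 0.5) {
  cumfrac <- cumsum(eigenvalues) / sum(eigenvalues)
  which(cumfrac > frac.var)[1]
}

choose_nfac(c(4, 3, 2, 1))   # cumulative fractions are 0.4, 0.7, 0.9, 1.0
choose_nfac(c(6, 2, 1, 1))   # the first factor alone accounts for 0.6
```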

iter.max

the maximum number of times to iterate the search for principal factors of the variables with complete data.

nfac.miss

a vector of integers giving the number of factors to use in regressions for variables with missing values. The number of factors used is equal to the i-th element of nfac.miss where i is the number of missing values for the variable. Thus the values in the vector should be non-increasing. The last value is used when the number of missing values is greater than the length of nfac.miss.
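The lookup described above can be sketched as follows. The helper nfac_for_missing is hypothetical and only mirrors the indexing rule in this paragraph; it is not a function from the package.

```r
# Hypothetical helper: number of regression factors for a variable with
# 'n_missing' missing values (n_missing >= 1). Counts beyond the length
# of nfac.miss reuse the last element.
nfac_for_missing <- function(n_missing, nfac.miss) {
  nfac.miss[pmin(n_missing, length(nfac.miss))]
}

nfac.miss <- rep(c(5, 3, 1), c(20, 40, 1))
nfac_for_missing(c(1, 20, 21, 60, 61, 500), nfac.miss)
```

Here variables with 1 to 20 missing values use 5 factors, those with 21 to 60 use 3, and those with more than 60 use 1.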

full.min

an integer giving the minimum number of variables that must have complete data.

reg.min

the minimum number of non-missing values for a variable in order for a regression to be performed on the variable.

sd.min

the minimum number of non-missing values for a variable in order for the standard deviation to be estimated from the data.

quan.sd

the quantile of the standard deviations to use for the standard deviation of variables that do not have enough data for the standard deviation to be estimated.
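The interaction between sd.min and quan.sd can be pictured with the base R sketch below. It uses unweighted standard deviations and takes the quantile over the columns that do have enough data; both choices are simplifying assumptions, and the simulated x is made up for the example.

```r
# Illustration only: fall back to a quantile of the estimated standard
# deviations for columns with too few observations.
set.seed(2)
x <- matrix(rnorm(60 * 5), nrow = 60)
x[1:50, 5] <- NA                       # column 5 keeps only 10 observations

sd.min <- 20
quan.sd <- 0.9

n_ok <- colSums(!is.na(x))             # non-missing count per column
sds <- apply(x, 2, sd, na.rm = TRUE)   # unweighted, for simplicity
enough <- n_ok >= sd.min

# Columns without enough data receive a high quantile of the other columns
sds[!enough] <- quantile(sds[enough], quan.sd, names = FALSE)
```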

tol

a number giving the tolerance for the principal factor convergence (using the assets with full data). If the maximum change in uniquenesses (in the correlation scale) is less than tol from one iteration to the next, then convergence is assumed and the iterations end.

zero.load

a logical value. If TRUE, then loadings for variables with missing values are zero except for those estimated by regression. If FALSE, then loadings for variables with missing values are the average loading for the factor (when they are not estimated by regression).

range.factors

a numeric vector of length two that gives the minimum and maximum number of factors that are allowed to be used.

constant.returns.okay

a logical value: if TRUE, then a column with all of its non-missing values equal does not cause an error.

If the true variance is thought to be non-zero, then a better alternative to setting this to TRUE is to set all the values in the column of x to be NA.

specific.floor

a number indicating how much uniquenesses should be adjusted upwards. The meaning of this number depends on the value of the floor.type argument.

floor.type

a character string that partially matches one of: "quantile" or "fraction".

If the value is "quantile", then all uniquenesses are made to be at least as big as the specific.floor quantile of the uniquenesses.

If the value is "fraction", then all uniquenesses are made to be at least specific.floor.
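The two floor types can be compared with the hypothetical helper below, which applies the rule literally with pmax. How the package rescales the result afterwards is not covered, so this is a sketch of the flooring step only.

```r
# Hypothetical helper illustrating the two floor types.
floor_uniquenesses <- function(u, specific.floor = 0.1,
                               floor.type = c("quantile", "fraction")) {
  floor.type <- match.arg(floor.type)   # allows partial matching, e.g. "frac"
  floor_value <- if (floor.type == "quantile") {
    quantile(u, specific.floor, names = FALSE)
  } else {
    specific.floor
  }
  pmax(u, floor_value)                  # raise anything below the floor
}

u <- c(0.1, 0.2, 0.3, 0.4, 0.5)
floor_uniquenesses(u, 0.1, "quantile")   # floor is the 10% quantile of u
floor_uniquenesses(u, 0.25, "fraction")  # floor is 0.25 itself
```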

verbose

a number indicating the level of warning messages desired. This currently controls warnings:

If at least 1, then a warning will be issued if all the values in x are non-negative. In finance this is an indication that prices rather than returns are input (an easy mistake to make).

If at least 1, then a warning will be issued if there are any assets with constant returns (unless constant.returns.okay is FALSE, in which case an error is thrown).

If at least 2, then a warning will be issued if there are any specific variances that are adjusted from being negative.

Value

If output is "full", then a variance matrix with dimensions equal to the number of columns in the input x. This has two additional attributes: number.of.factors that says how many factors are used in the model, and timestamp that gives the date and time that the object was created.

If output is "systematic", then a matrix with dimensions equal to the number of columns in the input x that contains the systematic portion of the variance matrix.

If output is "specific", then a diagonal matrix with dimensions equal to the number of columns in the input x that contains the specific variance portion of the variance matrix. The full variance matrix is the sum of the systematic and specific matrices.

If output is "factor", then an object of class "statfacmodBurSt" which is a list with components:

loadings

a matrix of the loadings for the correlation matrix.

uniquenesses

the uniquenesses for the correlation matrix. That is, the proportion of the variance that is not explained by the factors.

Note that if there are uniquenesses that have been modified via the specific.floor argument, then the actual proportion is the stated proportion divided by one plus the modification.

sdev

the standard deviations for the variables.

Note that if there are uniquenesses that have been modified via the specific.floor argument, then the corresponding standard deviations in sdev are smaller than the actual standard deviations in the answer.

constant.names

A character vector giving the names of the variables that are constant (if any).

cumulative.variance.fraction

numeric vector giving the cumulative fraction of the variance explained by (all) the factors.

timestamp

character string giving the date and time the calculation was completed.

call

an image of the call that created the object.

Details

Observations that are missing on all variables are deleted. Then a principal components factor model is estimated with the variables that have complete data.

For variables that have missing values, the standard deviation is estimated when there are enough observations; otherwise a given quantile of the standard deviations of the other assets is used as the estimate. The loadings for these variables are set to be either the average loading for the variables with no missing data, or zero. The loadings for the most important factors are modified by performing a regression with the non-missing data for each variable (if there is enough data to do the regression).
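The general idea of the last two paragraphs can be sketched conceptually in base R: principal components are computed from the complete columns, and an incomplete column is then regressed on the leading component using only its non-missing rows. The simulated data, the use of prcomp and lm, the unweighted fit, and the single factor are all assumptions for illustration; the function's own computations may differ.

```r
# Illustration only: principal components on complete columns, then a
# regression of an incomplete column on the leading component.
set.seed(3)
n_obs <- 120
market <- rnorm(n_obs)
x <- sapply(1:6, function(i) 0.7 * market + rnorm(n_obs))  # 6 correlated columns
x[1:30, 6] <- NA                       # column 6 is missing its first 30 rows

complete <- colSums(is.na(x)) == 0     # columns with no missing values
pc <- prcomp(x[, complete], center = TRUE, scale. = TRUE)
pc1 <- pc$x[, 1]                       # scores on the first principal component

# Standardize the incomplete column using its available observations
y <- (x[, 6] - mean(x[, 6], na.rm = TRUE)) / sd(x[, 6], na.rm = TRUE)

fit <- lm(y ~ pc1)                     # lm() drops the rows where y is NA
beta <- unname(coef(fit)["pc1"])       # estimated exposure of column 6
```

Note that the sign of a principal component is arbitrary, so only the magnitude of beta is meaningful in this sketch.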

The treatment of variables with missing values can be quite important. You may well benefit from specializing how missing values are handled to your particular problem. To do this, set output to "factor"; you can then modify the loadings (and perforce the uniquenesses) and the standard deviations to fit your situation. This may include taking sectors and countries into account, for example.

The default settings for missing value treatment are suitable for creating a variance matrix for long-only portfolio optimization – high volatility and average correlation. Take note that the proper treatment of missing values is HIGHLY dependent on the use to which the variance matrix is to be put.

OBSERVATION WEIGHTS. Time weights are quite helpful for estimating variances from returns. The default weighting seems to perform reasonably well over a range of situations.

FACTOR MODEL TO FULL MODEL. This class of object has a method for fitted which returns the variance matrix corresponding to the factor model representation.
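For intuition, the standard identity that links a correlation-scale factor model to a variance matrix can be written in a few lines of base R. The loadings, uniquenesses and sdev values below are made up, and no specific.floor adjustment is applied; the fitted method remains the way to obtain the matrix from an actual "statfacmodBurSt" object.

```r
# Illustration only: variance matrix implied by a correlation-scale factor model.
loadings <- matrix(c(0.8, 0.6, 0.5,
                     0.1, -0.3, 0.4), ncol = 2)   # 3 variables, 2 factors
uniquenesses <- 1 - rowSums(loadings^2)           # unexplained share per variable
sdev <- c(0.02, 0.03, 0.025)                      # standard deviations

scale_mat <- outer(sdev, sdev)                    # converts correlation to variance scale
systematic <- scale_mat * tcrossprod(loadings)    # factor-driven part
specific <- diag(sdev^2 * uniquenesses)           # diagonal specific variances
full <- systematic + specific                     # sum of the two parts
```

In this unadjusted case the diagonal of full equals sdev^2, which matches the statement that the full matrix is the sum of the systematic and specific matrices.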

Warning

The default value for weights assumes that the last row is the most recent observation and the first observation is the most ancient observation.

Research Issues

The method of handling missing values used in the function has not been well studied. It seems not to be the worst approach, but undoubtedly can be improved.

The default method of boosting the result away from singularity is completely unstudied. For optimization it is wise to move away from singularity; just how best to do that seems like a research question.

Revision

This help was last revised 2014 March 09.

Author(s)

Burns Statistics

See Also

fitted.statfacmodBurSt, var.shrink.eqcor, cov.wt, slideWeight.

Examples

## Not run: 
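# retmat is assumed to be a matrix of returns (rows are times, columns are assets)
varian1 <- factor.model.stat(retmat) # full variance matrix, default settings

# factor representation: zero regression factors, zero loadings for incomplete variables
varfac <- factor.model.stat(retmat, nfac.miss=0, zero.load=TRUE, output="factor")

varian2 <- fitted(varfac) # get matrix from factor model

# 5 factors for 1-20 missing values, 3 for 21-60, 1 for more than 60
varian3 <- factor.model.stat(retmat, nfac.miss=rep(c(5,3,1), c(20,40,1)))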

## End(Not run)