Estimate Variance Matrix via Statistical Factors
Description
Creates a variance matrix based on the principal components of the variables that have no missing values.
Usage
factor.model.stat(x, weights = seq(0.5, 1.5, length.out = nobs),
    output = "full", center = TRUE, frac.var = 0.5, iter.max = 1,
    nfac.miss = 1, full.min = 20, reg.min = 40, sd.min = 20,
    quan.sd = 0.9, tol = 0.001, zero.load = FALSE,
    range.factors = c(0, Inf), constant.returns.okay = FALSE,
    specific.floor = 0.1, floor.type = "quantile", verbose = 2)

Arguments
x 
required.
A numeric matrix.
The rows are observations and the columns are the variables.
In finance, this will be a matrix of returns where the rows are
times and the columns are assets.
For the default value of 
weights 
a vector of observation weights, or Equal weights can be specified with Otherwise, the length must be equal to either
the original number of rows
in 
output 
a character string indicating the form of the result.
It must partially match one of: 
center 
either a logical value or a numeric vector with length equal to
the number of columns in 
frac.var 
a control on the number of factors to use – the number of factors
is chosen so that the factors account for (just over) 
iter.max 
the maximum number of times to iterate the search for principal factors of the variables with complete data. 
nfac.miss 
a vector of integers giving the number of factors to use in regressions
for variables with missing values.
The number of factors used is equal to the ith element of

full.min 
an integer giving the minimum number of variables that must have complete data. 
reg.min 
the minimum number of nonmissing values for a variable in order for a regression to be performed on the variable. 
sd.min 
the minimum number of nonmissing values for a variable in order for the standard deviation to be estimated from the data. 
quan.sd 
the quantile of the standard deviations to use for the standard deviation of variables that do not have enough data for the standard deviation to be estimated. 
tol 
a number giving the tolerance for the principal factor convergence
(using the assets with full data).
If the maximum change in uniquenesses (in the correlation scale) is
less than 
zero.load 
a logical value.
If 
range.factors 
a numeric vector that gives the minimum and maximum number of factors that are allowed to be used. 
constant.returns.okay 
a logical vector: if if the true variance is thought to be nonzero, then
a better alternative to setting this to 
specific.floor 
a number indicating how much uniquenesses should be adjusted upwards.
The meaning of this number depends on the value of the

floor.type 
a character string that partially matches one of:
If the value is If the value is 
verbose 
a number indicating the level of warning messages desired. This currently controls warnings: If at least 1, then a warning will be issued if all the values
in If at least 1, then a warning will be issued if there are any
assets with constant returns (unless If at least 2, then a warning will be issued if there are any specific variances that are adjusted from being negative. 
Value
If output is "full", then a variance matrix with dimensions equal to the number of columns in the input x.
This has two additional attributes: number.of.factors, which says how many factors are used in the model, and timestamp, which gives the date and time that the object was created.
If output is "systematic", then a matrix with dimensions equal to the number of columns in the input x that contains the systematic portion of the variance matrix.
If output is "specific", then a diagonal matrix with dimensions equal to the number of columns in the input x that contains the specific variance portion of the variance matrix.
The full variance matrix is the sum of the systematic and specific matrices.
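As a rough illustration of how the "systematic" and "specific" outputs relate to the "full" output, the sketch below builds the three matrices by hand in base R. The loadings matrix B and the vector specific.var are made-up numbers in the variance scale, not output from factor.model.stat, and the code is not the package's implementation.

```r
# Hypothetical loadings (3 assets, 2 factors) and specific variances.
B <- matrix(c(0.10, 0.08, 0.12,
              0.03, -0.02, 0.01), nrow = 3,
            dimnames = list(c("A", "B", "C"), c("F1", "F2")))
specific.var <- c(A = 0.004, B = 0.006, C = 0.005)

# Systematic part: low rank (at most the number of factors).
systematic <- B %*% t(B)

# Specific part: diagonal, one variance per asset.
specific <- diag(specific.var)
dimnames(specific) <- dimnames(systematic)

# Full variance matrix is the sum of the two parts.
full <- systematic + specific
```

The systematic matrix alone is singular whenever there are fewer factors than assets; adding the positive diagonal of specific variances is what makes the full matrix positive definite.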
If output is "factor", then an object of class "statfacmodBurSt", which is a list with components:
loadings 
a matrix of the loadings for the correlation matrix. 
uniquenesses 
the uniquenesses for the correlation matrix. That is, the proportion of the variance that is not explained by the factors. Note that if there are uniquenesses that have been modified
via the 
sdev 
the standard deviations for the variables. Note that if there are uniquenesses that have been modified
via the 
constant.names 
A character vector giving the names of the variables that are constant (if any). 
cumulative.variance.fraction 
numeric vector giving the cumulative fraction of the variance explained by (all) the factors. 
timestamp 
character string giving the date and time the calculation was completed. 
call 
an image of the call that created the object. 
Details
Observations that are missing on all variables are deleted. Then a principal components factor model is estimated with the variables that have complete data.
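The two steps just described can be sketched in base R. This is a simplified illustration under stated assumptions, not the code inside factor.model.stat: it uses simulated data, ignores observation weights, and picks the number of factors as the smallest count whose cumulative share of the eigenvalues reaches frac.var.

```r
set.seed(1)
nobs <- 120; nvar <- 8
x <- matrix(rnorm(nobs * nvar), nobs, nvar,
            dimnames = list(NULL, paste0("A", seq_len(nvar))))
x[, 1:4] <- x[, 1:4] + rnorm(nobs)   # shared component for the first 4 columns
x[5, ] <- NA                          # an observation missing on all variables
x[1:30, "A8"] <- NA                   # a variable with a short history

# Delete observations that are missing on all variables.
all.missing <- rowSums(!is.na(x)) == 0
x <- x[!all.missing, , drop = FALSE]

# Keep only the variables with complete data.
complete <- colSums(is.na(x)) == 0
xfull <- x[, complete, drop = FALSE]

# Principal components of the (unweighted) correlation matrix.
eig <- eigen(cor(xfull), symmetric = TRUE)
cum.frac <- cumsum(eig$values) / sum(eig$values)

# Smallest number of components that accounts for at least frac.var.
frac.var <- 0.5
nfac <- which(cum.frac >= frac.var)[1]
```

Here cum.frac plays a role similar to the cumulative.variance.fraction component described under Value, although the function's own calculation may differ in detail.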
For variables that have missing values, the standard deviation is estimated from the data when there are enough observations; otherwise a given quantile of the standard deviations of the other assets is used as the estimate. The loadings for these variables are set to be either the average loading for the variables with no missing data, or zero. The loadings for the most important factors are then modified by performing a regression with the nonmissing data for each variable (if there is enough data to do the regression).
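The standard deviation fallback can be illustrated with a few lines of base R. This is only a sketch of the idea using the argument names sd.min and quan.sd; the data are simulated, the column name NEW is hypothetical, and the function itself may differ in details such as weighting.

```r
set.seed(2)
x <- matrix(rnorm(60 * 5, sd = 0.02), 60, 5,
            dimnames = list(NULL, c("A", "B", "C", "D", "NEW")))
x[1:50, "NEW"] <- NA        # NEW has only 10 nonmissing observations

sd.min <- 20                # minimum observations to estimate a standard deviation
quan.sd <- 0.9              # quantile used when there is too little data

nonmissing <- colSums(!is.na(x))
sdev <- apply(x, 2, sd, na.rm = TRUE)

# Variables with too little data get a high quantile of the other
# variables' standard deviations instead of their own estimate.
enough <- nonmissing >= sd.min
sdev[!enough] <- quantile(sdev[enough], probs = quan.sd, names = FALSE)
```

Using a high quantile such as 0.9 deliberately assigns a large volatility to assets with little history.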
The treatment of variables with missing values can be quite important.
You may well benefit from specializing how missing values are handled to your particular problem.
To do this, set output to "factor"; you can then modify the loadings (and perforce the uniquenesses) and the standard deviations to fit your situation.
This may include taking sectors and countries into account, for example.
The default settings for missing value treatment are suitable for creating a variance matrix for long-only portfolio optimization: high volatility and average correlation. Take note that the proper treatment of missing values is HIGHLY dependent on the use to which the variance matrix is to be put.
OBSERVATION WEIGHTS. Time weights are quite helpful for estimating variances from returns. The default weighting seems to perform reasonably well over a range of situations.
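To make the default weighting concrete, the sketch below builds the default weight vector and passes it to cov.wt from the stats package. It only illustrates linearly increasing time weights on simulated returns; it does not reproduce what factor.model.stat does internally.

```r
set.seed(3)
nobs <- 100
x <- matrix(rnorm(nobs * 3, sd = 0.01), nobs, 3,
            dimnames = list(NULL, c("A", "B", "C")))

# Default weights: linear from 0.5 (first row) to 1.5 (last row).
wt <- seq(0.5, 1.5, length.out = nobs)
wt <- wt / sum(wt)            # normalize so the weights sum to 1

# Time-weighted and equally weighted variance matrices for comparison.
wcov <- cov.wt(x, wt = wt)$cov
ucov <- cov(x)
```

With these weights the last observation counts three times as much as the first.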
FACTOR MODEL TO FULL MODEL.
Objects of class "statfacmodBurSt" have a method for fitted, which returns the variance matrix corresponding to the factor model representation.
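The relationship between the factor representation and the variance matrix can be sketched as follows. The numbers are made up, and the formula is an assumption based on the component descriptions under Value (loadings and uniquenesses in the correlation scale, scaled by sdev); it is not taken from the source of fitted.statfacmodBurSt and ignores any adjustment made through specific.floor.

```r
# Hypothetical one-factor model for three assets, in the correlation scale.
loadings <- matrix(c(0.8, 0.7, 0.6), ncol = 1,
                   dimnames = list(c("A", "B", "C"), NULL))
uniquenesses <- 1 - rowSums(loadings^2)   # unexplained share of each variance
sdev <- c(A = 0.20, B = 0.25, C = 0.30)

# Correlation matrix implied by the factor model (unit diagonal).
cormat <- loadings %*% t(loadings) + diag(uniquenesses)

# Scale rows and columns by the standard deviations to get variances.
varmat <- cormat * outer(sdev, sdev)
```

When the loadings or standard deviations of a "factor" result are edited by hand, keeping the uniquenesses consistent with the edited loadings is what keeps the implied correlation matrix valid.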
Warning
The default value for weights assumes that the last row is the most recent observation and the first observation is the most ancient observation.
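If the data are stored newest-first, the rows need to be reordered before the default weights make sense. A minimal sketch, assuming a hypothetical returns matrix whose row names are ISO dates:

```r
# Hypothetical returns matrix stored newest-first.
dates <- as.Date("2014-03-07") - 0:4
retmat <- matrix(c(0.01, -0.02, 0.00, 0.03, -0.01,
                   0.02,  0.01, -0.01, 0.00, 0.01), ncol = 2,
                 dimnames = list(format(dates), c("A", "B")))

# Reorder so the oldest observation is first and the newest is last.
ord <- order(as.Date(rownames(retmat)))
retmat <- retmat[ord, , drop = FALSE]

# The default weights now give the largest weight to the newest row.
w <- seq(0.5, 1.5, length.out = nrow(retmat))
names(w) <- rownames(retmat)
```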
Research Issues
The method of handling missing values used in the function has not been well studied. It seems not to be the worst approach, but undoubtedly can be improved.
The default method of boosting the result away from singularity is completely unstudied. For optimization it is wise to move away from singularity; just how best to do that seems like a research question.
Revision
This help was last revised 2014 March 09.
Author(s)
Burns Statistics
See Also
fitted.statfacmodBurSt, var.shrink.eqcor, cov.wt, slideWeight.
Examples
## Not run:
varian1 <- factor.model.stat(retmat)
# nfac, zero and "fact" rely on partial matching of nfac.miss, zero.load and "factor"
varfac <- factor.model.stat(retmat, nfac=0, zero=TRUE, output="fact")
varian2 <- fitted(varfac) # get matrix from factor model
varian3 <- factor.model.stat(retmat, nfac=rep(c(5,3,1), c(20,40,1)))
## End(Not run)
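The examples above assume that a returns matrix retmat already exists. The sketch below simulates one so the calls can be tried end to end; the simulated data and the object names are illustrative only, and the calls run only if the BurStFin package happens to be installed.

```r
set.seed(42)
nobs <- 250; nassets <- 30      # at least full.min = 20 complete variables

# Simulated returns: a common market component plus asset-specific noise.
market <- rnorm(nobs, sd = 0.01)
retmat <- sapply(seq_len(nassets),
                 function(i) market + rnorm(nobs, sd = 0.015))
colnames(retmat) <- paste0("asset", seq_len(nassets))

if (requireNamespace("BurStFin", quietly = TRUE)) {
  varian1 <- BurStFin::factor.model.stat(retmat)
  varfac <- BurStFin::factor.model.stat(retmat, output = "factor")
  varian2 <- fitted(varfac)     # variance matrix from the factor model
  print(dim(varian1))
}
```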
