Description Usage Arguments Details Value Note Author(s) References See Also Examples
Maximum likelihood estimation of the mean and covariance matrix of multivariate normal (MVN) distributed data with a monotone missingness pattern. Through the use of parsimonious/shrinkage regressions (e.g., plsr, pcr, ridge, lasso, etc.), where standard regressions fail, this function can handle an (almost) arbitrary amount of missing data
1 2 3 4 
y 
data 
pre 
logical indicating whether preprocessing of the

method 
describes the type of parsimonious
(or shrinkage) regression to
be performed when standard least squares regression fails.
From the pls package we have 
p 
when performing regressions, 
ncomp.max 
maximal number of (principal) components to include
in a 
batch 
indicates whether the columns with equal missingness should be processed together using a multiresponse regression. This is more efficient if many OLS regressions are used, but can lead to slightly poorer, even unstable, fits when parsimonious regressions are used 
validation 
method for cross validation when applying
a parsimonious regression method. The default setting
of 
obs 
logical indicating whether or not to (additionally)
compute a mean vector and covariance matrix based only on the observed
data, without regressions. I.e., means are calculated as averages
of each non 
verb 
whether or not to print progress indicators. The default
( 
quiet 
causes 
If pre = TRUE
then monomvn
first rearranges the columns
of y
into nondecreasing order with respect to the number of
missing (NA
) entries. Then (at least) the first column should
be completely observed. The mean components and covariances between
the first set of complete columns are obtained through the standard
mean
and cov
routines.
Next each successive group of columns with the same missingness pattern
is processed in sequence (assuming batch = TRUE
).
Suppose a total of j
columns have
been processed this way already. Let y2
represent the nonmissing
contingent of the next group of k
columns of y
with and identical missingness pattern, and let y1
be the
previously processed j1
columns of y
containing only the rows
corresponding to each nonNA
entry in y2
. I.e.,
nrow(y1) = nrow(y2)
. Note that y1
contains no
NA
entries since the missing data pattern is monotone.
The k
next entries (indices j:(j+k)
) of the mean vector,
and the j:(j+k)
rows and columns of the covariance matrix are
obtained by multivariate regression of y2
on y1
.
The regression method used (except in the case of method =
"factor"
depends on the number of rows and columns
in y1
and on the p
parameter. Whenever ncol(y1)
< p*nrow(y1)
leastsquares regression is used, otherwise
method = c("pcr", "plsr")
. If ever a leastsquares regression
fails due to colinearity then one of the other method
s is
tried. The "factor"
method always involves an OLS regression
on (a subset of) the first p
columns of y
.
All method
s require a scheme for estimating the amount of
variability explained by increasing the numbers of coefficients
(or principal components) in the model.
Towards this end, the pls and lars packages support
10fold cross validation (CV) or leaveoneout (LOO) CV estimates of
root mean squared error. See pls and lars for
more details. monomvn
uses
CV in all cases except when nrow(y1) <= 10
, in which case CV fails and
LOO is used. Whenever nrow(y1) <= 3
pcr
fails, so plsr
is used instead.
If quiet = FALSE
then a warning
is given whenever the first choice for a regression fails.
For pls methods, RMSEs are calculated for a number of
components in 1:ncomp.max
where
a NULL
value for ncomp.max
it is replaced with
ncomp.max < min(ncomp.max, ncol(y2), nrow(y1)1)
which is the max allowed by the pls package.
Simple heuristics are used to select a small number of components
(ncomp
for pls), or number of coefficients (for
lars), which explains a large amount of the variability (RMSE).
The lars methods use a “onestandard error rule” outlined
in Section 7.10, page 216 of HTF below. The
pls package does not currently support the calculation of
standard errors for CV estimates of RMSE, so a simple linear penalty
for increasing ncomp
is used instead. The ridge constant
(lambda) for lm.ridge
is set using the
optimize
function on the GCV
output.
Based on the ML ncol(y1)+1
regression coefficients (including
intercept) obtained for each of the
columns of y2
, and on the corresponding matrix
of
residual sum of squares, and on the previous j1
means
and rows/cols of the covariance matrix, the j:(j+k)
entries and
rows/cols can be filled in as described by Little and Rubin, section 7.4.3.
Once every column has been processed, the entries of the mean vector, and rows/cols of the covariance matrix are rearranged into their original order.
monomvn
returns an object of class "monomvn"
, which is a
list
containing a subset of the components below.
call 
a copy of the function call as used 
mu 
estimated mean vector with columns corresponding to the
columns of 
S 
estimated covariance matrix with rows and columns
corresponding to the columns of 
na 
when 
o 
when 
method 
method of regression used on each column, or

ncomp 
number of components in a 
lambda 
if 
mu.obs 
when 
S.obs 
when 
The CV in plsr and lars are random in nature, and so
can be dependent on the random seed. Use validation=LOO
for
deterministic (but slower) result.
When using method = "factor"
in the current version of
the package, the factors in the first p
columns of y
must also obey the monotone pattern, and,
have no more NA
entries than the other columns of y
.
Be warned that the lars implementation of
"forward.stagewise"
can sometimes get stuck in
(what seems like) an infinite loop.
This is not a bug in the monomvn
package;
the bug has been reported to the authors of lars
Robert B. Gramacy rbg@vt.edu
Robert B. Gramacy, Joo Hee Lee, and Ricardo Silva (2007).
On estimating covariances between many assets with histories
of highly variable length.
Preprint available on arXiv:0710.5837:
http://arxiv.org/abs/0710.5837
Roderick J.A. Little and Donald B. Rubin (2002). Statistical Analysis with Missing Data, Second Edition. Wilely.
BjornHelge Mevik and Ron Wehrens (2007). The pls Package: Principal Component and Partial Least Squares Regression in R. Journal of Statistical Software 18(2)
Bradley Efron, Trevor Hastie, Ian Johnstone and Robert Tibshirani
(2003).
Least Angle Regression (with discussion).
Annals of Statistics 32(2); see also
http://wwwstat.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf
Trevor Hastie, Robert Tibshirani and Jerome Friedman (2002). Elements of Statistical Learning. Springer, NY. [HTF]
Some of the code for monomvn
, and its subroutines, was inspired
by code found on the world wide web, written by Daniel Heitjan.
Search for “fcn.q”
http://bobby.gramacy.com/r_packages/monomvn
bmonomvn
, em.norm
in the now defunct norm
and mvnmle
packages
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55  ## standard usage, duplicating the results in
## Little and Rubin, section 7.4.3  try adding
## verb=3 argument for a stepbystep breakdown
data(cement.miss)
out < monomvn(cement.miss)
out
out$mu
out$S
##
## A bigger example, comparing the various methods
##
## generate N=100 samples from a 10d random MVN
xmuS < randmvn(100, 20)
## randomly impose monotone missingness
xmiss < rmono(xmuS$x)
## plsr
oplsr < monomvn(xmiss, obs=TRUE)
oplsr
Ellik.norm(oplsr$mu, oplsr$S, xmuS$mu, xmuS$S)
## calculate the complete and observed RMSEs
n < nrow(xmiss)  max(oplsr$na)
x.c < xmiss[1:n,]
mu.c < apply(x.c, 2, mean)
S.c < cov(x.c)*(n1)/n
Ellik.norm(mu.c, S.c, xmuS$mu, xmuS$S)
Ellik.norm(oplsr$mu.obs, oplsr$S.obs, xmuS$mu, xmuS$S)
## plcr
opcr < monomvn(xmiss, method="pcr")
Ellik.norm(opcr$mu, opcr$S, xmuS$mu, xmuS$S)
## ridge regression
oridge < monomvn(xmiss, method="ridge")
Ellik.norm(oridge$mu, oridge$S, xmuS$mu, xmuS$S)
## lasso
olasso < monomvn(xmiss, method="lasso")
Ellik.norm(olasso$mu, olasso$S, xmuS$mu, xmuS$S)
## lar
olar < monomvn(xmiss, method="lar")
Ellik.norm(olar$mu, olar$S, xmuS$mu, xmuS$S)
## forward.stagewise
ofs < monomvn(xmiss, method="forward.stagewise")
Ellik.norm(ofs$mu, ofs$S, xmuS$mu, xmuS$S)
## stepwise
ostep < monomvn(xmiss, method="stepwise")
Ellik.norm(ostep$mu, ostep$S, xmuS$mu, xmuS$S)

Loading required package: pls
Attaching package: 'pls'
The following object is masked from 'package:stats':
loadings
Loading required package: lars
Loaded lars 1.2
Loading required package: MASS
Call:
monomvn(y = cement.miss)
Methods used (p=0.9):
2 complete
3 lsr
[,1]
x1 6.655166
x2 49.965259
x3 11.769231
x4 27.047089
y 95.423077
x1 x2 x3 x4 y
x1 23.80287 22.26922 26.97542 12.28292 50.86578
x2 22.26922 258.86275 17.13549 273.80147 211.90393
x3 26.97542 17.13549 41.02564 10.39915 51.51923
x4 12.28292 273.80147 10.39915 319.38777 206.48170
y 50.86578 211.90393 51.51923 206.48170 226.31359
Call:
monomvn(y = xmiss, obs = TRUE)
Methods used (p=0.9):
1 complete
17 lsr
2 plsrCV, ncomp range: [4,7]
[1] 24423.72
[1] NA
[1] NA
[1] 24421.08
[1] 24422.77
[1] 24947.67
[1] 24421.34
[1] 24421.53
[1] 24423.06
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.