impute missing values for a matrix via nuclearnorm regularization.
Description
fit a lowrank matrix approximation to a matrix with missing values via nuclearnorm regularization. The algorithm works like EM, filling in the missing values with the current guess, and then solving the optimization problem on the complete matrix using a softthresholded SVD. Special sparsematrix classes available for very large matrices.
Usage
1 2 
Arguments
x 
An m by n matrix with NAs. For large matrices can be of class

rank.max 
This restricts the rank of the solution. If sufficiently large, and with

lambda 
nuclearnorm regularization parameter. If 
type 
two algorithms are implements, 
thresh 
convergence threshold, measured as the relative change in the Frobenius norm between two successive estimates. 
maxit 
maximum number of iterations. 
trace.it 
with 
warm.start 
an svd object can be supplied as a warm start. This is particularly
useful when constructing a path of solutions with decreasing values of

final.svd 
only applicable to 
Details
SoftImpute solves the following problem for a matrix X with missing entries:
\minXM_o^2 +λM_*.
Here \cdot_o is the Frobenius norm, restricted to the entries
corresponding to the
nonmissing entries of X, and M_* is the nuclear norm
of M (sum of singular values).
For full details of the "svd" algorithm are described in the reference
below. The "als" algorithm will be described in a forthcoming
article. Both methods employ special sparsematrix tricks for large
matrices with many missing values. This package creates a new
sparsematrix class "SparseplusLowRank"
for matrices of the form
x+ab',
where x is sparse and a and b are tall
skinny matrices, hence ab' is low rank. Methods for efficient left
and right matrix multiplication are provided for this class. For large
matrices, the function Incomplete()
can be used to build the
appropriate
sparse input matrix from marketformat data.
Value
An svd object is returned, with components "u", "d", and "v".
If the solution has zeros in "d", the solution is truncated to rank one
more than the number of zeros (so the zero is visible). If the input
matrix had been centered and scaled by biScale
, the scaling
details are assigned as attributes inherited from the input matrix.
Author(s)
Trevor Hastie, Rahul Mazumder
Maintainer: Trevor Hastie hastie@stanford.edu
References
Rahul Mazumder, Trevor Hastie and Rob Tibshirani (2010)
Spectral Regularization Algorithms for Learning Large Incomplete
Matrices,
http://www.stanford.edu/~hastie/Papers/mazumder10a.pdf
Journal of Machine Learning Research 11 (2010) 22872322
See Also
biScale
, svd.als
,\ codeIncomplete,
lambda0
, impute
, complete
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26  set.seed(101)
n=200
p=100
J=50
np=n*p
missfrac=0.3
x=matrix(rnorm(n*J),n,J)%*%matrix(rnorm(J*p),J,p)+matrix(rnorm(np),n,p)/5
ix=seq(np)
imiss=sample(ix,np*missfrac,replace=FALSE)
xna=x
xna[imiss]=NA
###uses regular matrix method for matrices with NAs
fit1=softImpute(xna,rank=50,lambda=30)
###uses sparse matrix method for matrices of class "Incomplete"
xnaC=as(xna,"Incomplete")
fit2=softImpute(xnaC,rank=50,lambda=30)
###uses "svd" algorithm
fit3=softImpute(xnaC,rank=50,lambda=30,type="svd")
ximp=complete(xna,fit1)
### first scale xna
xnas=biScale(xna)
fit4=softImpute(xnas,rank=50,lambda=10)
ximp=complete(xna,fit4)
impute(fit4,i=c(1,3,7),j=c(2,5,10))
impute(fit4,i=c(1,3,7),j=c(2,5,10),unscale=FALSE)#ignore scaling and centering
