impute missing values for a matrix via nuclear-norm regularization.

Share:

Description

fit a low-rank matrix approximation to a matrix with missing values via nuclear-norm regularization. The algorithm works like EM, filling in the missing values with the current guess, and then solving the optimization problem on the complete matrix using a soft-thresholded SVD. Special sparse-matrix classes available for very large matrices.

Usage

1
2
softImpute(x, rank.max = 2, lambda = 0, type = c("als", "svd"), thresh = 1e-05,
           maxit = 100, trace.it = FALSE, warm.start = NULL, final.svd = TRUE)

Arguments

x

An m by n matrix with NAs. For large matrices can be of class "Incomplete", in which case the missing values are represented as pseudo zeros leading to dramatic storage reduction. x can have been centered and scaled via biScale, and this information is carried along with the solution.

rank.max

This restricts the rank of the solution. If sufficiently large, and with type="svd", the solution solves the nuclear-norm convex matrix-completion problem. In this case the number of nonzero singular values returned will be less than or equal to rank.max. If smaller ranks are used, the solution is not guaranteed to solve the problem, although still results in good local minima. rank.max should be no bigger than min(dim(x)-1.

lambda

nuclear-norm regularization parameter. If lambda=0, the algorithm reverts to "hardImpute", for which convergence is typically slower, and to local minimum. Ideally lambda should be chosen so that the solution reached has rank slightly less than rank.max. See also lambda0() for computing the smallest lambda with a zero solution.

type

two algorithms are implements, type="svd" or the default type="als". The "svd" algorithm repeatedly computes the svd of the completed matrix, and soft thresholds its singular values. Each new soft-thresholded svd is used to re-impute the missing entries. For large matrices of class "Incomplete", the svd is achieved by an efficient form of alternating orthogonal ridge regression. The "als" algorithm uses this same alternating ridge regression, but updates the imputation at each step, leading to quite substantial speedups in some cases. The "als" approach does not currently have the same theoretical convergence guarantees as the "svd" approach.

thresh

convergence threshold, measured as the relative change in the Frobenius norm between two successive estimates.

maxit

maximum number of iterations.

trace.it

with trace.it=TRUE, convergence progress is reported.

warm.start

an svd object can be supplied as a warm start. This is particularly useful when constructing a path of solutions with decreasing values of lambda and increasing rank.max. The previous solution can be provided directly as a warm start for the next.

final.svd

only applicable to type="als". The alternating ridge-regressions do not lead to exact zeros. With the default final.svd=TRUE, at the final iteration, a one step unregularized iteration is performed, followed by soft-thresholding of the singular values, leading to hard zeros.

Details

SoftImpute solves the following problem for a matrix X with missing entries:

\min||X-M||_o^2 +λ||M||_*.

Here ||\cdot||_o is the Frobenius norm, restricted to the entries corresponding to the non-missing entries of X, and ||M||_* is the nuclear norm of M (sum of singular values). For full details of the "svd" algorithm are described in the reference below. The "als" algorithm will be described in a forthcoming article. Both methods employ special sparse-matrix tricks for large matrices with many missing values. This package creates a new sparse-matrix class "SparseplusLowRank" for matrices of the form

x+ab',

where x is sparse and a and b are tall skinny matrices, hence ab' is low rank. Methods for efficient left and right matrix multiplication are provided for this class. For large matrices, the function Incomplete() can be used to build the appropriate sparse input matrix from market-format data.

Value

An svd object is returned, with components "u", "d", and "v". If the solution has zeros in "d", the solution is truncated to rank one more than the number of zeros (so the zero is visible). If the input matrix had been centered and scaled by biScale, the scaling details are assigned as attributes inherited from the input matrix.

Author(s)

Trevor Hastie, Rahul Mazumder
Maintainer: Trevor Hastie hastie@stanford.edu

References

Rahul Mazumder, Trevor Hastie and Rob Tibshirani (2010) Spectral Regularization Algorithms for Learning Large Incomplete Matrices, http://www.stanford.edu/~hastie/Papers/mazumder10a.pdf
Journal of Machine Learning Research 11 (2010) 2287-2322

See Also

biScale, svd.als,\ codeIncomplete, lambda0, impute, complete

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
set.seed(101)
n=200
p=100
J=50
np=n*p
missfrac=0.3
x=matrix(rnorm(n*J),n,J)%*%matrix(rnorm(J*p),J,p)+matrix(rnorm(np),n,p)/5
ix=seq(np)
imiss=sample(ix,np*missfrac,replace=FALSE)
xna=x
xna[imiss]=NA
###uses regular matrix method for matrices with NAs
fit1=softImpute(xna,rank=50,lambda=30)
###uses sparse matrix method for matrices of class "Incomplete"
xnaC=as(xna,"Incomplete")
fit2=softImpute(xnaC,rank=50,lambda=30)
###uses "svd" algorithm
fit3=softImpute(xnaC,rank=50,lambda=30,type="svd")
ximp=complete(xna,fit1)
### first scale xna
xnas=biScale(xna)
fit4=softImpute(xnas,rank=50,lambda=10)
ximp=complete(xna,fit4)
impute(fit4,i=c(1,3,7),j=c(2,5,10))
impute(fit4,i=c(1,3,7),j=c(2,5,10),unscale=FALSE)#ignore scaling and centering