Description

The softImpute algorithm is used to impute missing values. For more
details, see softImpute.
Usage

impute_soft(
  data_ref,
  data_new = NULL,
  cols = dplyr::everything(),
  rank_max_ovrl = min(dim(data_ref) - 1),
  rank_max_init = min(2, rank_max_ovrl),
  rank_stp_size = 1,
  lambda = seq(rank_max_ovrl * 0.6, 1, length.out = 10),
  grid = FALSE,
  restore_data = TRUE,
  verbose = 1,
  bs = TRUE,
  bs_maxit = 20,
  bs_thresh = 1e-09,
  bs_row.center = FALSE,
  bs_col.center = TRUE,
  bs_row.scale = FALSE,
  bs_col.scale = TRUE,
  si_type = "als",
  si_thresh = 1e-05,
  si_maxit = 100,
  si_final.svd = TRUE
)
Arguments

data_ref: a data frame.

data_new: an optional data frame. If supplied, imputed values for
data_new are created using imputation models fitted to data_ref.

cols: columns that should be imputed and/or used to impute other
columns. Supports tidy select functions (see the example after this
list).

rank_max_ovrl: an integer value that restricts the rank of the
solution for all fitted softImpute models.

rank_max_init: an integer value that restricts the rank of the
solution for the first fitted softImpute model.

rank_stp_size: an integer value that indicates how much the maximum
rank of the solution increases with each successive softImpute model.

lambda: nuclear-norm regularization parameter(s). Together with the
rank sequence and grid, the number of lambda values determines how
many sets of imputed values are returned (see Details).

grid: a logical value. If TRUE, every combination of lambda and the
rank sequence is fitted; if FALSE, lambda values are paired with
increasing maximum ranks (see Details).

restore_data: a logical value. If ...

verbose: an integer value of 0, 1, or 2. Larger values print more
information while models are being fitted.

bs: a logical value. If TRUE, the data are centered and scaled with
softImpute::biScale() before imputation.

bs_maxit: an integer indicating the maximum number of iterations for
the biScale algorithm.

bs_thresh: convergence threshold for the biScale algorithm.

bs_row.center: a logical value. If TRUE, rows are centered to have
mean zero.

bs_col.center: a logical value. If TRUE, columns are centered to have
mean zero.

bs_row.scale: a logical value. If TRUE, rows are scaled to have
variance one.

bs_col.scale: a logical value. If TRUE, columns are scaled to have
variance one.

si_type: two algorithms are implemented, si_type = "svd" or the
default si_type = "als". The "svd" algorithm repeatedly computes the
svd of the completed matrix and soft-thresholds its singular values.
Each new soft-thresholded svd is used to re-impute the missing
entries. For large matrices of class "Incomplete", the svd is
achieved by an efficient form of alternating orthogonal ridge
regression. The "als" algorithm uses this same alternating ridge
regression, but updates the imputation at each step, leading to quite
substantial speedups in some cases. The "als" approach does not
currently have the same theoretical convergence guarantees as the
"svd" approach.

si_thresh: convergence threshold for the softImpute algorithm.

si_maxit: maximum number of iterations for the softImpute algorithm.

si_final.svd: only applicable to si_type = "als". If TRUE, a final
svd step is run when the "als" algorithm finishes so that very small
singular values are thresholded to exact zeros.
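A minimal usage sketch (not taken from the package's own documented
examples): the data frame, column names, and missingness pattern below
are hypothetical, and the package providing impute_soft() is assumed
to be attached.

# Hypothetical data frame with scattered missing values
df <- data.frame(
  age  = c(31, NA, 45, 52, NA, 29, 60, 48),
  bmi  = c(22.1, 27.4, NA, 30.2, 25.0, NA, 28.8, 24.3),
  chol = c(180, 210, 195, NA, 240, 205, NA, 190)
)

# Default: impute all columns (cols = dplyr::everything())
imputes_all <- impute_soft(data_ref = df)

# Tidy selection: impute/use only the bmi and chol columns
imputes_subset <- impute_soft(data_ref = df, cols = c(bmi, chol))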
Details

Multiple imputation: The number of imputations returned depends on
rank_max_init, rank_max_ovrl, rank_stp_size, lambda, and grid. If grid
is FALSE, then there will be length(lambda) sets of imputed values in
the returned output, and they will be based on fitted softImpute
models with increasing maximum ranks. Generally, these ranks are
seq(rank_max_init, rank_max_ovrl, by = rank_stp_size), but they are
adjusted automatically as needed for consistency with (1) lambda and
(2) the maximum allowed rank for data_ref. If grid is TRUE, then every
combination of lambda and the rank sequence will be fitted, and the
output will contain one set of imputed values for each combination.
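A rough sketch of this bookkeeping; the tuning values below are
illustrative, and the counts are computed by hand rather than taken
from impute_soft() output.

# Hypothetical tuning settings
rank_max_ovrl <- 5
rank_max_init <- 2
rank_stp_size <- 1
lambda <- seq(rank_max_ovrl * 0.6, 1, length.out = 10)

# Candidate maximum ranks, before any automatic adjustment
ranks <- seq(rank_max_init, rank_max_ovrl, by = rank_stp_size)  # 2 3 4 5

# grid = FALSE: one set of imputed values per lambda value
length(lambda)                   # 10

# grid = TRUE: one set per (lambda, rank) combination
length(lambda) * length(ranks)   # 40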
Rank inputs: If the maximum rank is sufficiently large, and with
si_type = "svd", the softImpute algorithm solves the nuclear-norm
convex matrix-completion problem (see Reference 1). In this case the
number of nonzero singular values returned will be less than or equal
to the maximum rank. If smaller ranks are used, the solution is not
guaranteed to solve the convex problem, although it still yields good
local minima. The rank of a softImpute fit should not exceed
min(dim(data_ref) - 1).
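A small sketch of this bound, assuming a hypothetical 100-row,
12-column data_ref.

dims <- c(100, 12)   # dim(data_ref) for the hypothetical data frame

# Upper bound on the rank of a softImpute fit; this is also the
# default value of rank_max_ovrl
min(dims - 1)        # 11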
biScale: The softImpute::biScale() function is more flexible than the
current function indicates. Specifically, biScale allows users to
supply vectors to its row/column centering/scaling inputs that will
in turn be used to center/scale the corresponding rows/columns.
impute_soft() is more strict and does not offer this option. Also,
impute_soft() uses different default values to increase the
likelihood of the biScale algorithm converging quickly.
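A short sketch of the difference, assuming the softImpute package is
installed; the matrix and its missingness pattern are made up for
illustration.

library(softImpute)

set.seed(1)
x <- matrix(rnorm(60), nrow = 12, ncol = 5)
x[sample(length(x), 10)] <- NA

# softImpute::biScale() accepts numeric vectors as well as logicals,
# e.g. centering each column at a pre-specified value:
col_centers <- colMeans(x, na.rm = TRUE)
x_centered <- biScale(x,
                      row.center = FALSE, row.scale = FALSE,
                      col.center = col_centers, col.scale = FALSE)

# impute_soft() only exposes logical on/off switches for these steps,
# with defaults chosen to help the biScale algorithm converge quickly:
#   bs_row.center = FALSE, bs_col.center = TRUE,
#   bs_row.scale  = FALSE, bs_col.scale  = TRUE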
Value

A data frame with fitting parameters and imputed values.
References

Rahul Mazumder, Trevor Hastie, and Rob Tibshirani (2010). Spectral
Regularization Algorithms for Learning Large Incomplete Matrices.
Journal of Machine Learning Research 11, 2287-2322.
http://www.stanford.edu/~hastie/Papers/mazumder10a.pdf