imputeX  R Documentation 
Replace 'NA'/'NaN' values in new 'X' data according to the model predictions, given that same 'X' data and optionally 'U' data.
Note: this function does not perform any internal reindexing of the data. If the 'X' to which the model was fit was a 'data.frame', the numbering of the items is stored under 'model$info$item_mapping'. The function predict_new is also available, and lets the model do the appropriate reindexing.
imputeX(model, X, weight = NULL, U = NULL, U_bin = NULL, nthreads = model$info$nthreads)
model 
A collective matrix factorization model as output by function CMF. This functionality is not available for the other model classes. 
X 
New 'X' data with missing values which will be imputed. Must be passed as a dense matrix from base R (class 'matrix'). 
weight 
Associated observation weights for entries in 'X'. If passed, must have the same shape as 'X'. 
U 
New 'U' data, with rows matching to those of 'X'. Can be passed in the following formats:

U_bin 
New binary columns of 'U' (rows matching to those of 'X'). Must be passed as a dense matrix from base R or as a 'data.frame'. 
nthreads 
Number of parallel threads to use. 
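Because no reindexing is done and 'X' must be a dense base R matrix, new data usually has to be put into the model's column order before the call. The following is a minimal sketch of that preparation step, not part of the package. It assumes the new data arrives as a wide 'data.frame' whose column names are item IDs, and it uses a hypothetical character vector `item_mapping` as a stand-in for 'model$info$item_mapping'.

```r
# Hypothetical stand-in for model$info$item_mapping: item IDs in the
# order the model numbered them when it was fit (an assumption here).
item_mapping <- c("item_a", "item_b", "item_c")

# New data as a wide data.frame, with columns in a different order.
new_df <- data.frame(
  item_c = c(1.5, NA, 3.0),
  item_a = c(NA, 2.0, 4.0),
  item_b = c(0.5, 1.0, NA)
)

# Fail early if a column the model expects is absent from the new data.
missing_items <- setdiff(item_mapping, colnames(new_df))
if (length(missing_items) > 0) {
  stop("Missing columns: ", paste(missing_items, collapse = ", "))
}

# Reorder the columns to the fitting order and convert to a dense
# numeric matrix of class 'matrix'.
X_new <- as.matrix(new_df[, item_mapping, drop = FALSE])
storage.mode(X_new) <- "double"
```

The resulting `X_new` would then be the matrix passed as 'X', with its 'NA' entries as the values to impute.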
If using the matrix factorization model as a general missing-value imputer, it's recommended to:
Fit a model without user biases.
Set a lower regularization for the item biases than for the matrices.
Tune the regularization parameter(s) very well.
In general, matrix factorization works better for imputation of selected entries of sparse-and-wide matrices, whereas for dense matrices, the method is unlikely to provide better results than mean/median imputation, but it is nevertheless provided for experimentation purposes.
The 'X' matrix with its missing values imputed according to the model predictions.
library(cmfrec)

### Simplest example
SeqMat <- matrix(1:50, nrow=10)
SeqMat[2,1] <- NaN
SeqMat[8,3] <- NaN
m <- CMF(SeqMat, k=1, lambda=1e-10, nthreads=1L, verbose=FALSE)
imputeX(m, SeqMat)

### Better example with multivariate normal data
if (require("MASS")) {
    ### Generate random data, set some values as NA
    set.seed(1)
    n_rows <- 1000
    n_cols <- 5
    mu <- rnorm(n_cols)
    S <- matrix(rnorm(n_cols^2), nrow = n_cols)
    S <- t(S) %*% S
    X <- MASS::mvrnorm(n_rows, mu, S)
    X_na <- X
    values_NA <- matrix(runif(n_rows*n_cols) < .15, nrow=n_rows)
    X_na[values_NA] <- NaN

    ### In the event that any column is fully missing
    if (any(colSums(is.na(X_na)) == n_rows)) {
        cols_remove <- colSums(is.na(X_na)) == n_rows
        X_na <- X_na[, !cols_remove, drop=FALSE]
        values_NA <- values_NA[, !cols_remove, drop=FALSE]
    }

    ### Impute missing values with model
    model <- CMF(X_na, k=3, lambda=c(0,0,1,1,1,1),
                 user_bias=FALSE,
                 verbose=FALSE, nthreads=1L)
    X_imputed <- imputeX(model, X_na)
    cat(sprintf("RMSE for imputed values w/model: %f\n",
                sqrt(mean((X[values_NA] - X_imputed[values_NA])^2))))

    ### Compare against simple mean imputation
    X_means <- apply(X_na, 2, mean, na.rm=TRUE)
    X_imp_mean <- X_na
    for (cl in 1:n_cols)
        X_imp_mean[values_NA[,cl], cl] <- X_means[cl]
    cat(sprintf("RMSE for imputed values w/means: %f\n",
                sqrt(mean((X[values_NA] - X_imp_mean[values_NA])^2))))
}
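The details above name median imputation as a baseline alongside the mean. The following is a minimal base R sketch of a column-median baseline, on a small made-up matrix rather than the data from the examples. All object names here are illustrative and nothing in it comes from the package.

```r
### Column-median baseline (illustrative sketch, base R only)
set.seed(1)
X <- matrix(rnorm(40), nrow = 10)

# Hide three known entries so the imputation error can be measured.
X_na <- X
X_na[cbind(c(2, 5, 8), c(1, 3, 4))] <- NA
is_missing <- is.na(X_na)

# Per-column medians, computed over the observed entries only.
col_medians <- apply(X_na, 2, median, na.rm = TRUE)

# Fill each column's missing entries with that column's median.
X_imp_median <- X_na
for (cl in seq_len(ncol(X_na))) {
  X_imp_median[is_missing[, cl], cl] <- col_medians[cl]
}

# RMSE over the entries that were hidden.
rmse_median <- sqrt(mean((X[is_missing] - X_imp_median[is_missing])^2))
cat(sprintf("RMSE for imputed values w/medians: %f\n", rmse_median))
```

Running the same RMSE on the model's imputations shows whether the factorization beats this baseline on a given dataset.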