imputeX  R Documentation 
Replace 'NA'/'NaN' values in new 'X' data according to the model predictions, given that same 'X' data and optionally 'U' data.
Note: this function will not perform any internal reindexing of the data. If the 'X' to which the model was fit was a 'data.frame', the item numbering is stored under 'model$info$item_mapping'. There is also a function predict_new which will let the model do the appropriate reindexing.
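As a rough sketch of what aligning columns by hand could look like, the base R snippet below reorders a new matrix to follow a given item order. It assumes that 'model$info$item_mapping' holds the item IDs in the model's column order. The vector 'item_mapping' here is a made-up stand-in, and 'X_new' and 'X_aligned' are hypothetical names.

```r
# Stand-in for model$info$item_mapping
# (assumption: item IDs listed in the model's column order)
item_mapping <- c("item_b", "item_a", "item_c")

# Hypothetical new data whose named columns are in a different order
X_new <- matrix(c(1, NaN, 3, 4), nrow = 2,
                dimnames = list(NULL, c("item_a", "item_b")))

# Start from an all-missing matrix laid out in the model's column order
X_aligned <- matrix(NaN, nrow = nrow(X_new), ncol = length(item_mapping),
                    dimnames = list(NULL, item_mapping))

# Copy over the columns that the new data actually contains
known <- intersect(colnames(X_new), item_mapping)
X_aligned[, known] <- X_new[, known]

print(X_aligned)
```

Items absent from the new data stay as 'NaN' columns in this sketch. Using predict_new avoids this manual step.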
imputeX(
model,
X,
weight = NULL,
U = NULL,
U_bin = NULL,
nthreads = model$info$nthreads
)
model 
A collective matrix factorization model as output by function CMF. This functionality is not available for the other model classes. 
X 
New 'X' data with missing values which will be imputed. Must be passed as a dense matrix from base R (class 'matrix'). 
weight 
Associated observation weights for entries in 'X'. If passed, must have the same shape as 'X'. 
U 
New 'U' data, with rows matching to those of 'X'. Can be passed in the following formats:

U_bin 
New binary columns of 'U' (rows matching to those of 'X'). Must be passed as a dense matrix from base R or as a 'data.frame'. 
nthreads 
Number of parallel threads to use. 
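To illustrate the shape requirements for 'X' and 'weight', the base R sketch below converts a 'data.frame' into a dense matrix and builds a weight matrix with the same dimensions. The names 'df_new', 'X_new' and 'W' and the weight values are invented for illustration. The final call is left commented out because it needs a fitted model, which is not created here.

```r
# Hypothetical new data arriving as a data.frame with missing entries
df_new <- data.frame(a = c(1.0, NA, 3.0), b = c(NaN, 5.0, 6.0))

# 'X' must be a dense base R matrix, so convert explicitly
X_new <- as.matrix(df_new)
storage.mode(X_new) <- "double"

# One weight per entry of 'X': same number of rows and columns
W <- matrix(1, nrow = nrow(X_new), ncol = ncol(X_new))
W[, 2] <- 0.5  # illustrative choice: give the second column less weight

stopifnot(is.matrix(X_new), identical(dim(W), dim(X_new)))

# With a fitted CMF model named 'model' (assumed to exist):
# X_filled <- imputeX(model, X_new, weight = W)
```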
If using the matrix factorization model as a general missing-value imputer, it's recommended to:
Fit a model without user biases.
Set a lower regularization for the item biases than for the matrices.
Tune the regularization parameter(s) very well.
In general, matrix factorization works better for imputation of selected entries of sparse-and-wide matrices, whereas for dense matrices, the method is unlikely to provide better results than mean/median imputation, but it is nevertheless provided for experimentation purposes.
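For a baseline to compare against, a column-wise median imputer can be sketched in base R as follows. The helper 'impute_median' is a hypothetical name and is not part of cmfrec; it assumes a numeric matrix in which no column is entirely missing.

```r
# Hypothetical helper (not part of cmfrec): fill NA/NaN entries
# with the median of the observed values in the same column
impute_median <- function(X) {
    stopifnot(is.matrix(X), is.numeric(X))
    col_medians <- apply(X, 2, median, na.rm = TRUE)
    missing <- is.na(X)  # TRUE for both NA and NaN
    X[missing] <- col_medians[col(X)[missing]]
    X
}

# Small demo: column 1 is (1, 2, NaN, 4), column 2 is (10, NA, 30, 50)
X_demo <- matrix(c(1, 2, NaN, 4,
                   10, NA, 30, 50), ncol = 2)
X_filled <- impute_median(X_demo)
print(X_filled)
```

Comparing the error of such a baseline on held-out entries against that of imputeX is one way to judge whether the factorization model is worth using on a given dataset.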
The 'X' matrix with its missing values imputed according to the model predictions.
library(cmfrec)
### Simplest example
SeqMat <- matrix(1:50, nrow=10)
SeqMat[2,1] <- NaN
SeqMat[8,3] <- NaN
m <- CMF(SeqMat, k=1, lambda=1e-10, nthreads=1L, verbose=FALSE)
imputeX(m, SeqMat)
### Better example with multivariate normal data
if (require("MASS")) {
    ### Generate random data, set some values as NA
    set.seed(1)
    n_rows <- 1000
    n_cols <- 5
    mu <- rnorm(n_cols)
    S <- matrix(rnorm(n_cols^2), nrow = n_cols)
    S <- t(S) %*% S
    X <- MASS::mvrnorm(n_rows, mu, S)
    X_na <- X
    values_NA <- matrix(runif(n_rows*n_cols) < .15, nrow=n_rows)
    X_na[values_NA] <- NaN
    ### In the event that any column is fully missing
    if (any(colSums(is.na(X_na)) == n_rows)) {
        cols_remove <- colSums(is.na(X_na)) == n_rows
        X_na <- X_na[, !cols_remove, drop=FALSE]
        values_NA <- values_NA[, !cols_remove, drop=FALSE]
    }
    ### Impute missing values with model
    model <- CMF(X_na, k=3, lambda=c(0,0,1,1,1,1),
                 user_bias=FALSE,
                 verbose=FALSE, nthreads=1L)
    X_imputed <- imputeX(model, X_na)
    cat(sprintf("RMSE for imputed values w/model: %f\n",
                sqrt(mean((X[values_NA] - X_imputed[values_NA])^2))))
    ### Compare against simple mean imputation
    X_means <- apply(X_na, 2, mean, na.rm=TRUE)
    X_imp_mean <- X_na
    for (cl in 1:n_cols)
        X_imp_mean[values_NA[,cl], cl] <- X_means[cl]
    cat(sprintf("RMSE for imputed values w/means: %f\n",
                sqrt(mean((X[values_NA] - X_imp_mean[values_NA])^2))))
}