| sim_na | R Documentation |
sim_na() corrupts a given data matrix D by setting a random fraction perc
of its entries to missing (NA). It is used by grid_search_cv() to construct
test matrices for PCP models, and can also be used directly to experiment
with PCP models.
Note: only observed values can be corrupted to NA. If a matrix D already
has, e.g., 20% of its values missing, then sim_na(D, perc = 0.2) results in
a matrix with 40% of its values missing.
If, e.g., perc = 0.6 is passed when D has only 10% of its entries observed,
then all remaining corruptible entries are set to NA.
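The arithmetic in the two notes above can be checked with a minimal base-R sketch. The helper sketch_sim_na() below is hypothetical and is not the package's implementation. It assumes that perc is a fraction of all entries in D (as the 20% + 20% = 40% example suggests), that the resulting count is rounded down, and that only observed entries are eligible.

```r
# Hypothetical stand-in for sim_na(), written only to illustrate the notes above.
# Assumptions: perc is a fraction of ALL entries, rounded down; only observed
# (non-NA) entries are eligible; requests are capped at what is available.
sketch_sim_na <- function(D, perc, seed = 42) {
  observed <- which(!is.na(D))                  # linear indices of corruptible entries
  n_target <- floor(perc * length(D))           # requested number of new NAs
  n_corrupt <- min(n_target, length(observed))  # cap at what is left
  set.seed(seed)
  # Sample positions within `observed` so a single remaining entry is handled correctly
  picked <- observed[sample.int(length(observed), n_corrupt)]
  D[picked] <- NA
  D
}

D <- matrix(as.numeric(1:100), 10, 10)
once <- sketch_sim_na(D, perc = 0.2)
twice <- sketch_sim_na(once, perc = 0.2)
mean(is.na(once))   # 0.2
mean(is.na(twice))  # 0.4

# Only 10 of 100 entries observed, perc = 0.6 requested:
# every remaining observed entry becomes NA
sparse <- D
sparse[11:100] <- NA
mean(is.na(sketch_sim_na(sparse, perc = 0.6)))  # 1
```

The cap in min() is what keeps the second note from failing: a request larger than the number of observed entries simply uses up everything that is left.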
sim_na(D, perc, seed = 42)
| D | The input data matrix. |
| perc | A double in the range |
| seed | (Optional) An integer specifying the seed for the random selection of entries in |
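What a fixed seed buys can be illustrated in base R alone. The sketch below does not call sim_na(); it only shows that resetting the random number generator to the same seed before sampling reproduces the same selection, which is the general mechanism a seed argument relies on. The vector entries is an illustrative stand-in for the positions of a 5x5 matrix.

```r
# Illustrative only: reproducible random selection with a fixed seed
entries <- 1:25                 # stand-in for the 25 positions of a 5x5 matrix

set.seed(42)
first <- sample(entries, 5)     # pick 5 positions

set.seed(42)
second <- sample(entries, 5)    # same seed, same call

identical(first, second)        # TRUE: the selection is reproduced exactly
```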
A list containing:
D_tilde: The original matrix D with a random perc percent of its
entries set to NA.
tilde_mask: A binary matrix of dim(D) specifying the locations of
corrupted entries (1) and uncorrupted entries (0).
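The relationship between the two list elements can be sketched by hand in base R. The objects below are built manually for illustration and are not produced by sim_na(); the choice of corrupted positions is arbitrary, and D is assumed to be fully observed to begin with.

```r
# Hand-built stand-ins for D_tilde and tilde_mask (illustrative only)
D <- matrix(as.numeric(1:12), 3, 4)
tilde_mask <- matrix(0, nrow(D), ncol(D))
tilde_mask[c(2, 7, 11)] <- 1       # arbitrary entries marked as corrupted
D_tilde <- D
D_tilde[tilde_mask == 1] <- NA     # corrupted entries are NA in D_tilde

# The mask has the same dimensions as D ...
identical(dim(tilde_mask), dim(D))        # TRUE
# ... and, because D had no NAs of its own, it flags exactly the NAs in D_tilde
all(is.na(D_tilde) == (tilde_mask == 1))  # TRUE

# Held-out values can be read back from the original matrix via the mask
held_out <- D[tilde_mask == 1]
held_out                                  # 2 7 11
```

If D already contains missing values, is.na(D_tilde) also flags those, so the mask is the way to tell newly corrupted entries apart from entries that were missing beforehand.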
grid_search_cv(), sim_lod(), impute_matrix(), sim_data()
# Simple example corrupting 20% of a 5x5 matrix
D <- matrix(1:25, 5, 5)
corrupted_data <- sim_na(D, perc = 0.2)
corrupted_data$D_tilde
sum(is.na(corrupted_data$D_tilde)) / prod(dim(corrupted_data$D_tilde))
# Now corrupting another 20% on top of the original 20%
double_corrupted <- sim_na(corrupted_data$D_tilde, perc = 0.2)
double_corrupted$D_tilde
sum(is.na(double_corrupted$D_tilde)) / prod(dim(double_corrupted$D_tilde))
# Corrupting the remaining entries by passing in a large value for perc
all_corrupted <- sim_na(double_corrupted$D_tilde, perc = 1)
all_corrupted$D_tilde