sim_data | R Documentation |
sim_data() generates a simulated dataset D = L + S + Z for experimentation with Principal Component Pursuit (PCP) algorithms.
sim_data(
n = 100,
p = 10,
r = 3,
sparse_nonzero_idxs = NULL,
sigma = 0.05,
seed = 42
)
n, p |
(Optional) A pair of integers specifying the simulated dataset's
number of rows (n) and columns (p). By default, n = 100 and p = 10.
r |
(Optional) An integer specifying the rank of the simulated dataset's
low-rank component. Intuitively, the number of latent patterns governing
the simulated dataset. Must be that r <= min(n, p). By default, r = 3.
sparse_nonzero_idxs |
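(Optional) An integer vector with the indices of the entries of the
sparse component S that are set to 1; it is applied as
S[sparse_nonzero_idxs] <- 1. By default, sparse_nonzero_idxs = NULL.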
sigma |
(Optional) A double specifying the standard deviation of the
dense (Gaussian) noise component Z. By default, sigma = 0.05.
seed |
(Optional) An integer specifying the seed for random number
generation. By default, seed = 42.
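Because the sparse component is filled with S[sparse_nonzero_idxs] <- 1, the indices act as linear (column-major) positions in an n-by-p matrix rather than as (row, column) pairs. The base R snippet below is a small illustration of that indexing only; the dimensions and the vector idxs are made-up values chosen for the example, not package defaults.

```r
# Illustration only: how a vector of linear indices selects matrix entries.
# n, p and idxs are hypothetical values picked for this example.
n <- 4
p <- 3
S <- matrix(0, n, p)

idxs <- c(1, 6, 12)   # linear positions, counted down the columns
S[idxs] <- 1

# Translate the linear positions into (row, column) pairs.
pos <- arrayInd(idxs, dim(S))
colnames(pos) <- c("row", "col")
print(pos)
print(S)
```

Counting down the columns, position 6 in a 4-row matrix is the second entry of the second column, so the non-zero entries here sit at (1, 1), (2, 2) and (4, 3).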
The data is simulated as follows:
L <- matrix(runif(n * r), n, r) %*% matrix(runif(r * p), r, p)
S <- matrix(0, n, p)
S[sparse_nonzero_idxs] <- 1
Z <- matrix(rnorm(n * p, sd = sigma), n, p)
D <- L + S + Z
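For readers who want to experiment with these steps without the package, they can be wrapped into a small base R function. This is only a sketch: sim_sketch() and its idxs argument are hypothetical names, the placement of set.seed() is an assumption, and leaving S all zeros when idxs is NULL is a simplification that may not match what sim_data() does with its own NULL default.

```r
# Hypothetical helper, not part of the package: a base R sketch of the
# simulation steps listed above.
sim_sketch <- function(n = 100, p = 10, r = 3, idxs = NULL,
                       sigma = 0.05, seed = 42) {
  stopifnot(r <= min(n, p))      # a rank-r matrix needs r <= min(n, p)
  set.seed(seed)                 # assumption: seed is set once, up front

  # Low-rank component: product of an n x r and an r x p uniform matrix.
  L <- matrix(runif(n * r), n, r) %*% matrix(runif(r * p), r, p)

  # Sparse component: ones at the requested linear indices.
  # Simplification: with idxs = NULL, S is left all zeros here.
  S <- matrix(0, n, p)
  if (!is.null(idxs)) S[idxs] <- 1

  # Dense Gaussian noise with standard deviation sigma.
  Z <- matrix(rnorm(n * p, sd = sigma), n, p)

  list(D = L + S + Z, L = L, S = S, Z = Z)
}

sim <- sim_sketch(idxs = c(1, 12, 23))

# The observed matrix is the sum of the three components.
all.equal(sim$D, sim$L + sim$S + sim$Z)

# Numerical rank (via QR) of the low-rank part and of the noisy data.
qr(sim$L)$rank
qr(sim$D)$rank
```

Comparing the two ranks shows why PCP is needed at all: adding dense noise and sparse outliers generally destroys the low-rank structure of L in the observed matrix D.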
A list containing:
D: The observed data matrix, where D = L + S + Z.
L: The ground truth rank-r low-rank matrix.
S: The ground truth sparse matrix.
Z: The ground truth dense (Gaussian) noise matrix.
sim_na(), sim_lod(), impute_matrix()
# rank 3 example
data <- sim_data()
matrix_rank(data$D)
matrix_rank(data$L)
# rank 7 example
data <- sim_data(n = 1000, p = 25, r = 7)
matrix_rank(data$D)
matrix_rank(data$L)