datagen: Generate sparse data with outliers

In rospca: Robust Sparse PCA using the ROSPCA Algorithm

dataGen

R Documentation

Generate sparse data with outliers

Description

Generate sparse data with outliers using the simulation scheme detailed in Hubert et al. (2016).

Usage

dataGen(m = 100, n = 100, p = 10, a = c(0.9,0.5,0), bLength = 4, SD = c(10,5,2), 
        eps = 0, seed = TRUE)

Arguments

`m`	Number of datasets to generate, default is 100.
`n`	Number of observations, default is 100.
`p`	Number of dimensions, default is 10.
`a`	Numeric vector containing the within-group correlation for each block. The number of useful blocks is thus `k = length(a) - 1`, which should be at least 2. By default, the correlations are 0.9, 0.5 and 0, respectively.
`bLength`	Length of the blocks of useful variables, default is 4.
`SD`	Numeric vector containing the standard deviations of the blocks of variables, default is `c(10,5,2)`. Note that `SD` and `a` should have the same length.
`eps`	Proportion of contamination, should be between 0 and 0.5. Default is 0 (no contamination).
`seed`	Logical indicating if a seed is used when generating the datasets, default is `TRUE`.
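
The constraints stated in the argument descriptions can be collected in a small check. The sketch below is illustrative only: `checkDataGenArgs` is a hypothetical helper, not a function of rospca, and the requirement that the `k` useful blocks fit into `p` variables is an assumption inferred from the description of the remaining variables, not something the documentation states.

```r
# Hypothetical helper, not part of rospca: checks the documented constraints.
checkDataGenArgs <- function(p = 10, a = c(0.9, 0.5, 0), bLength = 4,
                             SD = c(10, 5, 2), eps = 0) {
  k <- length(a) - 1  # number of useful blocks
  stopifnot(
    "need at least 2 useful blocks" = k >= 2,
    "SD and a must have the same length" = length(SD) == length(a),
    "eps must lie in [0, 0.5]" = eps >= 0 && eps <= 0.5,
    # Assumption: the k useful blocks must fit into the p variables
    "k * bLength must not exceed p" = k * bLength <= p
  )
  invisible(k)
}

checkDataGenArgs()  # passes silently for the default values
```

Named conditions in `stopifnot` require R 4.0.0 or later.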

Details

Firstly, we generate a correlation matrix R that has sparse eigenvectors. We design R to have length(a) = k+1 groups of variables, with no correlation between variables from different groups. The first k groups consist of bLength variables each, and the correlation between different variables within group i is equal to a[i], for i=1,...,k. The (k+1)th group contains the remaining p - k \times bLength variables, which we specify to have correlation a[k+1].
Secondly, the correlation matrix R is transformed into the covariance matrix \Sigma= V^{0.5} \cdot R \cdot V^{0.5}, where V=diag(SD^2).
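
The first two steps can be sketched in base R as follows. This is a minimal illustration for the default arguments, not the code used inside rospca. The names `sizes`, `group` and `sdVec` are introduced here, and expanding `SD` so that each variable receives the standard deviation of its block is an assumption about how V is formed.

```r
# Illustrative sketch only; not the rospca source code.
p <- 10
bLength <- 4
a <- c(0.9, 0.5, 0)
SD <- c(10, 5, 2)
k <- length(a) - 1

# k blocks of bLength variables, followed by the remaining variables
sizes <- c(rep(bLength, k), p - k * bLength)
group <- rep(seq_along(sizes), times = sizes)

# Correlation a[g] within group g, 0 between groups, 1 on the diagonal
R <- outer(group, group, function(g1, g2) ifelse(g1 == g2, a[g1], 0))
diag(R) <- 1

# Assumption: every variable gets the standard deviation of its block
sdVec <- SD[group]

# V = diag(sdVec^2), hence V^{0.5} = diag(sdVec)
Sigma <- diag(sdVec) %*% R %*% diag(sdVec)
```

Under these assumptions, `cov2cor(Sigma)` recovers `R`, and the diagonal of `Sigma` holds the squared block standard deviations.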
Thirdly, the n observations are generated from a p-variate normal distribution with mean the p-variate zero-vector and covariance matrix \Sigma. Standard normally distributed noise terms are also added to each of the p variables to make the sparse structure of the data harder to detect.
Finally, (100 \times eps)% of the data points are randomly replaced by outliers. These outliers are generated from a p-variate normal distribution as in Croux et al. (2013).
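
The sampling and contamination steps might look roughly like the sketch below, which uses base R only. Several parts are assumptions made for illustration: the seed value, the use of `floor` to obtain the number of outliers, and above all the outlier distribution, which here is a placeholder (mean 25 in every coordinate, identity covariance) and not the setting of Croux et al. (2013).

```r
# Illustrative sketch only; not the rospca source code.
set.seed(1)  # hypothetical seed, for reproducibility of this sketch
n <- 100
p <- 10
eps <- 0.2

# Covariance matrix for the default block structure (4, 4 and 2 variables)
group <- rep(1:3, times = c(4, 4, 2))
R <- outer(group, group,
           function(g1, g2) ifelse(g1 == g2, c(0.9, 0.5, 0)[g1], 0))
diag(R) <- 1
sdVec <- c(10, 5, 2)[group]
Sigma <- diag(sdVec) %*% R %*% diag(sdVec)

# Clean data: rows are N_p(0, Sigma) draws, using the Cholesky factor of Sigma
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# Independent standard normal noise on each of the p variables
X <- X + matrix(rnorm(n * p), n, p)

# Contamination: replace a random subset of the rows by outliers
nOut <- floor(eps * n)          # assumption: rounding down
ind <- sort(sample(n, nOut))    # indices of the contaminated observations
# Placeholder outlier distribution (assumption, see the text above)
X[ind, ] <- matrix(rnorm(nOut * p, mean = 25), nOut, p)
```

Here `ind` plays the role of one element of the `ind` component returned by `dataGen`.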
The ith eigenvector of R, for i=1,...,k, is a sparse vector whose elements (bLength \times (i-1)+1) through (bLength \times i) are equal to 1/\sqrt{bLength} and whose other elements are all zero.
See Hubert et al. (2016) for more details.
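
The statement about the eigenvectors of R can be checked numerically. The sketch below rebuilds R for the default arguments in base R and compares its leading eigenvectors with the sparse vectors described above; it is an illustration, not rospca code. For a block with correlation a[i], the corresponding eigenvalue is 1 + (bLength - 1) \times a[i], which follows from the block structure of R.

```r
# Illustrative sketch only; not the rospca source code.
bLength <- 4
a <- c(0.9, 0.5, 0)
group <- rep(1:3, times = c(4, 4, 2))  # p = 10, k = 2 useful blocks

R <- outer(group, group, function(g1, g2) ifelse(g1 == g2, a[g1], 0))
diag(R) <- 1

eig <- eigen(R, symmetric = TRUE)

for (i in 1:2) {
  # Sparse vector with 1/sqrt(bLength) on the entries of block i
  v <- numeric(10)
  v[(bLength * (i - 1) + 1):(bLength * i)] <- 1 / sqrt(bLength)
  # Eigenvectors are only defined up to sign, so compare absolute values
  cat("block", i,
      ": |<v, e_i>| =", abs(sum(v * eig$vectors[, i])),
      ", eigenvalue =", eig$values[i], "\n")
}
```

An absolute inner product of 1 (up to rounding) indicates that the computed eigenvector coincides with the sparse vector up to sign.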

Value

A list with components:

`data`	List of length `m` containing all data matrices.
`ind`	List of length `m` containing the numeric vectors with the indices of the contaminated observations.
`R`	Correlation matrix of the data, a numeric matrix of size `p` by `p`.
`Sigma`	Covariance matrix of the data (`\Sigma`), a numeric matrix of size `p` by `p`.

Author(s)

Tom Reynkens

References

Hubert, M., Reynkens, T., Schmitt, E., and Verdonck, T. (2016), “Sparse PCA for High-Dimensional Data with Outliers,” Technometrics, 58, 424–434.

Croux, C., Filzmoser, P., and Fritz, H. (2013), “Robust Sparse Principal Component Analysis,” Technometrics, 55, 202–214.

Examples

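library(rospca)

# Generate a single dataset with 20% contamination
X <- dataGen(m = 1, n = 100, p = 10, eps = 0.2, bLength = 4)$data[[1]]

# Fit ROBPCA with 2 components and draw the diagnostic plot
resR <- robpca(X, k = 2, skew = FALSE)
diagPlot(resR)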

rospca documentation built on April 3, 2025, 5:58 p.m.
