PCADSC: Compute the elements used for PCADSC
In PCADSC: Tools for Principal Component Analysis-Based Data Structure Comparisons



Prepares a dataset for the diagnostic plots used in Principal Component Analysis-based Data Structure Comparison (PCADSC). More specifically, PCADSC performs PCA on two subsets of a dataset in order to compare their structures, e.g. to assess whether the two subsets can reasonably be pooled for analysis. The results of the PCAs are then processed and stored for easy plotting with the three PCADSC plotting tools: the CEPlot, the anglePlot and the chromaPlot.

PCADSC(data, splitBy, vars = NULL, doCE = TRUE, doAngle = TRUE, doChroma = TRUE, B = 10000)

`data`	A dataset, either a `data.frame` or a `matrix` with variables in columns and observations in rows. Note that `tibble`s and `data.table`s are accepted as input, but they are immediately converted to `data.frame`s. Future releases might include dedicated implementations for these data representations.
`splitBy`	The name of a grouping variable with two levels defining the two groups within the dataset whose data structures we wish to compare.
`vars`	The variable names in `data` to include in the PCADSC. If `NULL` (the default), all variables except for `splitBy` are used.
`doCE`	Logical. Should the cumulative eigenvalue plot information be computed?
`doAngle`	Logical. Should the angle plot information be computed?
`doChroma`	Logical. Should the chroma plot information be computed?
`B`	A positive integer. The number of resampling steps performed in the cumulative eigenvalue step. Only relevant when the cumulative eigenvalue information is computed (`doCE = TRUE`).
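
The requirements on `data`, `splitBy` and `vars` described above can be checked before calling PCADSC. The following is a minimal sketch in base R; `checkSplitInput` is a hypothetical helper written for illustration, not a function of the PCADSC package, and the checks it performs are assumptions based on the argument descriptions rather than a reproduction of the package's own validation.

```r
# Hypothetical helper (NOT part of PCADSC): checks that the inputs match the
# argument descriptions before they are passed on to PCADSC().
checkSplitInput <- function(data, splitBy, vars = NULL) {
  # tibbles, data.tables and matrices all become plain data.frames
  data <- as.data.frame(data)
  stopifnot(splitBy %in% names(data))

  # Default: use every variable except the grouping variable
  if (is.null(vars)) vars <- setdiff(names(data), splitBy)
  stopifnot(all(vars %in% names(data)))

  # The grouping variable must define exactly two groups
  nGroups <- length(unique(data[[splitBy]]))
  if (nGroups != 2) {
    stop("splitBy must define exactly two groups, found ", nGroups)
  }

  # PCA needs numeric variables
  stopifnot(all(vapply(data[vars], is.numeric, logical(1))))

  list(data = data, splitBy = splitBy, vars = vars)
}

# Usage on the iris data, grouped by setosa / non-setosa
irisData <- datasets::iris
irisData$group <- ifelse(irisData$Species == "setosa", "setosa", "non-setosa")
irisData$Species <- NULL

checked <- checkSplitInput(irisData, "group")
checked$vars
```

If the check passes, `checked$vars` holds the variable names that would be used when `vars = NULL`.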

PCADSC presents a suite of non-parametric, visual tools for comparing the structures of two subsets of a dataset. These tools are all based on PCA (principal component analysis) and thus they can be interpreted as comparisons of the covariance matrices of the two (sub)datasets. PCADSC performs PCA using singular value decomposition for increased numerical precision. Before performing PCA on the full dataset and the two subsets, all variables within each such dataset are standardized.
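
The standardize-then-decompose step described above can be illustrated in base R. This is a sketch only and does not reproduce the package's internal code: `pcaBySVD` is a hypothetical helper, and the iris data with a setosa / non-setosa split is used purely as an example.

```r
# Sketch only: base R, not PCADSC's internal implementation.
irisData <- datasets::iris
isSetosa <- irisData$Species == "setosa"
X <- as.matrix(irisData[, 1:4])

# Hypothetical helper: standardize the variables within a dataset,
# then perform PCA via singular value decomposition.
pcaBySVD <- function(x) {
  z <- scale(x)   # center each variable and scale it to unit variance
  s <- svd(z)
  list(
    eigenvalues = s$d^2 / (nrow(z) - 1),  # variances of the components
    loadings = s$v                        # one column per component
  )
}

pcaFirst  <- pcaBySVD(X[isSetosa, ])    # first subset
pcaSecond <- pcaBySVD(X[!isSetosa, ])   # second subset
pcaFull   <- pcaBySVD(X)                # full dataset

# Cumulative share of the total variance explained in each subset
cumFirst  <- cumsum(pcaFirst$eigenvalues) / sum(pcaFirst$eigenvalues)
cumSecond <- cumsum(pcaSecond$eigenvalues) / sum(pcaSecond$eigenvalues)
round(rbind(first = cumFirst, second = cumSecond), 3)
```

Because each dataset is standardized separately, the eigenvalues in every decomposition sum to the number of variables, so differences between the two subsets reflect differences in correlation structure rather than in scale.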

An object of class PCADSC, which is a named list with the following entries:

pcaRes: The results of the PCAs performed on the first subset, the second subset and the full dataset, together with information about the data splitting.
CEInfo: The information needed for making a cumulative eigenvalue plot (see CEPlot).
angleInfo: The information needed for making an angle plot (see anglePlot).
chromaInfo: The information needed for making a chroma plot (see chromaPlot).
data: The original (full) dataset.
splitBy: The name of the variable that splits the dataset in two.
vars: The names of the variables in the dataset that should be used for PCA.
B: The number of resamplings performed for the CEInfo.

doCE, doAngle, doChroma, CEPlot, anglePlot, chromaPlot

#load iris data
data(iris)

#Define grouping variable, grouping the observations by whether their species is
#Setosa or not
iris$group <- "setosa"
iris$group[iris$Species != "setosa"] <- "non-setosa"
iris$Species <- NULL

## Not run: 
#Make a full PCADSC object, splitting the data by "group"
irisPCADSC <- PCADSC(iris, "group")

#The three plotting functions can now be called on irisPCADSC:
CEPlot(irisPCADSC)
anglePlot(irisPCADSC)
chromaPlot(irisPCADSC)

#Make a partial PCADSC object with no angle plot information and add
#angle plot information afterwards:
irisPCADSC2 <- PCADSC(iris, "group", doAngle = FALSE)
irisPCADSC2 <- doAngle(irisPCADSC2)

## End(Not run)

#Make a partial PCADSC object with no plotting (angle/CE/chroma)
#information:
irisPCADSC_minimal <- PCADSC(iris, "group", doAngle = FALSE,
  doCE = FALSE, doChroma = FALSE)