kEstimateFast: Estimate best number of Components for missing value...
In hredestig/pcaMethods: A collection of PCA methods

kEstimateFast

R Documentation

Estimate best number of Components for missing value estimation

Description

This is a simple estimator for the optimal number of components when applying PCA or LLSimpute for missing value estimation. No cross validation is performed; instead, the estimation quality is defined as Matrix[!missing] - Estimate[!missing]. This gives only a relatively rough estimate, but it is fast: the number of iterations equals the length of the parameter evalPcs.
Note that this function does not work with LLSimpute. The error measure is either the NRMSEP (see Feten et al., 2005) or the Q2 distance. The NRMSEP normalises the RMSD between the original data and the estimate by the variable-wise variance, because a higher variance generally leads to a higher estimation error. If the number of samples is small, the gene-wise variance may become an unstable criterion and the Q2 distance should be used instead. The Q2 distance should also be used if variance normalisation was applied previously.
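The variance normalisation described above can be illustrated with a short base R sketch. This is only one possible reading of the description, not the package's internal code: the helper name nrmsep_sketch and the order of the averaging steps are assumptions made for illustration. Only the non-missing entries of the original matrix enter the comparison.

```r
# Hypothetical helper, NOT part of pcaMethods: one possible reading of a
# variance-normalised error between an original matrix and its estimate.
nrmsep_sketch <- function(original, estimate) {
  stopifnot(all(dim(original) == dim(estimate)))
  # Mean squared deviation per variable (column), skipping missing entries
  mse <- colMeans((original - estimate)^2, na.rm = TRUE)
  # Variable-wise variance of the original data
  v <- apply(original, 2, var, na.rm = TRUE)
  # Average the normalised errors over all variables, then take the root
  sqrt(mean(mse / v))
}

set.seed(1)
orig <- matrix(rnorm(50), nrow = 10, ncol = 5)
est <- orig + matrix(rnorm(50, sd = 0.1), nrow = 10, ncol = 5)

nrmsep_sketch(orig, est)   # small positive value for a close estimate
nrmsep_sketch(orig, orig)  # zero for a perfect estimate
```

Because each variable's error is divided by its own variance, rescaling the data leaves this measure unchanged, which is why high-variance variables do not dominate the result.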

Usage

kEstimateFast(Matrix, method = "ppca", evalPcs = 1:3, em = "nrmsep",
  allVariables = FALSE, verbose = interactive(), ...)

Arguments

`Matrix`	`matrix` – numeric matrix containing observations in rows and variables in columns
`method`	`character` – a valid pca method (see `pca`).
`evalPcs`	`numeric` – The numbers of principal components to evaluate, or cluster sizes if used with llsImpute. Should be an array containing integer values, e.g. evalPcs = 1:10 or evalPcs = c(2,5,8). The NRMSEP is calculated for each component.
`em`	`character` – The error measure. This can be "nrmsep" or "q2".
`allVariables`	`boolean` – If TRUE, the NRMSEP is calculated for all variables. If FALSE, only the incomplete ones are included. You may want to set this to TRUE to compare several methods on a complete data set.
`verbose`	`boolean` – If TRUE, the NRMSEP and the variance are printed to the console in each iteration.
`...`	Further arguments to `pca`

Value

list

Returns a list with the elements:

minNPcs - number of PCs for which the minimal average NRMSEP was obtained
eError - an array of size length(evalPcs). Contains the estimation error for each number of components.
evalPcs - The evaluated numbers of components or cluster sizes (the same as the evalPcs input parameter).
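
As a rough illustration of how these three elements relate, the sketch below builds a mock list with the documented element names and looks up the entry of evalPcs with the smallest error. The error values are invented for illustration, and the assumption that minNPcs equals evalPcs at the position of the minimal eError follows from the descriptions above rather than from the package source.

```r
# Mock result with the three documented elements.
# The error values are made up purely for illustration.
esti <- list(
  minNPcs = 3,
  eError  = c(0.42, 0.31, 0.35),
  evalPcs = 2:4
)

# Position of the smallest estimation error
idx <- which.min(esti$eError)

# Number of components evaluated at that position
best <- esti$evalPcs[idx]
print(best)
```

Note that eError is indexed by position in evalPcs, not by the number of components itself, so with evalPcs = 2:4 the first error value belongs to two components.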

Author(s)

Wolfram Stacklies

Examples

data(metaboliteData)
# Estimate the best number of PCs with ppca, evaluating 2 to 4 components
esti <- kEstimateFast(t(metaboliteData), method = "ppca", evalPcs = 2:4, em="nrmsep")
barplot(drop(esti$eError), xlab = "Components",ylab = "NRMSEP (1 iterations)")
# The best k value is:
print(esti$minNPcs)

hredestig/pcaMethods documentation built on Sept. 30, 2023, 10:38 a.m.