nlpca: Non-linear PCA



Description

Neural network based non-linear PCA

Usage

nlpca(Matrix, nPcs = 2, maxSteps = 2 * prod(dim(Matrix)),
  unitsPerLayer = NULL, functionsPerLayer = NULL,
  weightDecay = 0.001, weights = NULL, verbose = interactive(), ...)
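
As a quick illustration of the default in the signature above, maxSteps is twice the number of cells in the input matrix. The sketch below uses a made-up 100 x 3 matrix (an assumption for illustration, not package data) and only base R:

```r
## Hypothetical input: 100 observations, 3 variables
X <- matrix(rnorm(300), nrow = 100, ncol = 3)

## Same expression as the default argument in the signature
defaultMaxSteps <- 2 * prod(dim(X))
defaultMaxSteps  # 600
```

Because the number of steps grows with the size of the matrix, it may be worth setting maxSteps explicitly for larger data sets.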

Arguments

Matrix

matrix – Preprocessed data with the variables in columns and observations in rows. The data may contain missing values, denoted as NA.

nPcs

numeric – Number of components to estimate. The precision of the missing value estimation depends on the number of components, which should resemble the internal structure of the data.

maxSteps

numeric – Number of estimation steps. Default is based on a generous rule of thumb.

unitsPerLayer

The network units per layer, for example c(2,4,6) for two input units (feature units, i.e. principal components), one hidden layer of four units for non-linearity, and six output units (original number of variables).

functionsPerLayer

The function to apply at each layer, e.g. c("linr", "tanh", "linr").

weightDecay

Value between 0 and 1.

weights

Starting weights for the network. Defaults to uniform random values but can be set explicitly to make the algorithm deterministic.

verbose

boolean – nlpca prints the number of steps and warning messages if set to TRUE. Default is interactive().

...

Reserved for future use. Not passed on anywhere.
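
To make the unitsPerLayer, functionsPerLayer and weights arguments more concrete, the following base R sketch propagates one point from component space to data space through a network of the shape c(2,4,6). It is an illustration only and not the pcaMethods implementation: the bias column, the weight range of -0.1 to 0.1, the reading of "linr" as the identity function, and the names activation, W and forward are all assumptions. The layout that nlpca expects for weights is not described on this page, so these matrices are not meant to be passed to it.

```r
## Illustration only: NOT the pcaMethods implementation.
unitsPerLayer <- c(2, 4, 6)                      # 2 feature, 4 hidden, 6 output units
functionsPerLayer <- c("linr", "tanh", "linr")

## Assumed meaning of the function names: "linr" = identity, "tanh" = tanh
activation <- list(linr = identity, tanh = tanh)

## Fixing the seed makes the uniform random starting weights reproducible
set.seed(1)
nLayers <- length(unitsPerLayer)
W <- lapply(seq_len(nLayers - 1), function(i) {
  ## One weight matrix per pair of adjacent layers, with an assumed bias column
  matrix(runif(unitsPerLayer[i + 1] * (unitsPerLayer[i] + 1), -0.1, 0.1),
         nrow = unitsPerLayer[i + 1])
})

## Propagate 2 scores through the hidden layer to 6 reconstructed variables
forward <- function(scores, W) {
  a <- activation[[functionsPerLayer[1]]](scores)
  for (i in seq_along(W)) {
    a <- activation[[functionsPerLayer[i + 1]]](W[[i]] %*% c(a, 1))
  }
  drop(a)
}

out <- forward(c(0.5, -0.2), W)
length(out)  # 6
```

Re-running the block with the same seed gives the same starting weights, which is the sense in which fixed weights make a run deterministic.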

Details

Artificial neural network (MLP) for performing non-linear PCA. Non-linear PCA is conceptually similar to classical PCA but theoretically quite different. Instead of simply decomposing the matrix (X) into scores (T), loadings (P) and an error (E), a neural network (which takes the role of the loadings) is trained to find a curve through the multidimensional space of X that describes as much variance as possible. Classical ways of interpreting PCA results are thus not applicable to NLPCA, since the loadings are hidden in the network. However, the scores of components that lead to low cross-validation errors can still be interpreted via the score plot. Unfortunately, this method depends on slow iterations, which are currently implemented in R only, making it extremely slow. Furthermore, the algorithm does not decide by itself when it has converged but simply performs 'maxSteps' iterations.
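
For contrast, the classical decomposition of X into scores, loadings and an error can be sketched in a few lines of base R. This is a minimal sketch using svd on made-up data, not code from pcaMethods; the variable names scoresT, loadP and resE are chosen here for illustration.

```r
set.seed(42)
## Hypothetical data: 50 observations, 4 variables, column-centred
X <- scale(matrix(rnorm(200), nrow = 50, ncol = 4), center = TRUE, scale = FALSE)

nPcs <- 2
s <- svd(X)
loadP <- s$v[, 1:nPcs, drop = FALSE]   # loadings (P)
scoresT <- X %*% loadP                 # scores (T)
resE <- X - scoresT %*% t(loadP)       # error (E)

## Adding the error back reproduces X
all.equal(X, scoresT %*% t(loadP) + resE, check.attributes = FALSE)
```

In this linear case the loadings are an explicit matrix that can be inspected directly, which is what is lost when a trained network takes their place.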

Value

Standard PCA result object used by all PCA-based methods of this package. Contains scores, loadings, data mean and more. See pcaRes for details.

Author(s)

Based on a MATLAB script by Matthias Scholz and ported to R by Henning Redestig.

References

Matthias Scholz, Fatma Kaplan, Charles L Guy, Joachim Kopka and Joachim Selbig. Non-linear PCA: a missing data approach. Bioinformatics, 21(20):3887-3895, Oct 2005.

Examples

library(pcaMethods)
## Data set with three variables where data points constitute a helix
data(helix)
helixNA <- helix
## not a single complete observation
helixNA <- t(apply(helix, 1, function(x) { x[sample(1:3, 1)] <- NA; x}))
## 50 steps is not enough, for good estimation use 1000
helixNlPca <- pca(helixNA, nPcs=1, method="nlpca", maxSteps=50)
fittedData <- fitted(helixNlPca, helixNA)
plot(fittedData[which(is.na(helixNA))], helix[which(is.na(helixNA))])
## compared to solution by Nipals PCA which cannot extract non-linear patterns
helixNipPca <- pca(helixNA, nPcs=2)
fittedData <- fitted(helixNipPca)
plot(fittedData[which(is.na(helixNA))], helix[which(is.na(helixNA))])
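
The scatter plots in the example compare estimated and true values visually. A numeric summary could be sketched as below, using only base R. The synthetic helix, the helper rmseAtMissing and the column-mean baseline are assumptions made for illustration; none of them are part of pcaMethods.

```r
set.seed(7)
## Synthetic helix (an assumption for illustration, not the package's 'helix' data)
tt <- seq(0, 4 * pi, length.out = 200)
truth <- cbind(x = sin(tt), y = cos(tt), z = tt / (4 * pi))

## Same masking idiom as in the example: one random NA per row
masked <- t(apply(truth, 1, function(x) { x[sample(1:3, 1)] <- NA; x }))

## Every row now has exactly one missing value
table(rowSums(is.na(masked)))

## Hypothetical helper: root mean squared error at the masked positions
rmseAtMissing <- function(estimate, truth, masked) {
  idx <- is.na(masked)
  sqrt(mean((estimate[idx] - truth[idx])^2))
}

## Simple baseline estimate: fill each NA with its column mean
baseline <- masked
for (j in seq_len(ncol(baseline))) {
  baseline[is.na(baseline[, j]), j] <- mean(baseline[, j], na.rm = TRUE)
}
rmseAtMissing(baseline, truth, masked)
```

With the objects from the example, the same helper could be called as rmseAtMissing(fitted(helixNlPca, helixNA), helix, helixNA) to put a number on each of the two plots.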


pcaMethods documentation built on Nov. 8, 2020, 6:19 p.m.