PriorNormPCA: Prior PCA analysis for threshold setting and noise removal



Description

This function performs a simple PCA analysis to aid in threshold setting and noise removal.

Usage

PriorNormPCA(X)

Arguments

X

Data matrix (need not be normalised). The subsequent ICA treats each independent mode as a distribution whose values run "down the rows" of X.

Details

This function performs a simple PCA analysis and is used prior to application of the main ICA algorithm. The objective of the prior PCA is to help determine the dimensionality of a subspace on which the further ICA converges. The convention used here is that the rows of 'X' label the space over which independent components are sought. For a typical microarray application in which ICA is being used as a generative model for gene expression, rows should label genes and columns should label samples. If, however, ICA is to be used as an unsupervised projection pursuit algorithm, rows should label samples and columns genes. For the latter application, the number of genes should be less than the number of samples.
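The prior PCA step described above can be sketched in base R. This is a minimal illustration rather than the package's own code: the simulated matrix and the names `n_genes`, `n_samples`, `Xc`, `eig_from_svd` and `eig_direct` are assumptions made for the example. It follows the generative-model convention (rows label genes, columns label samples) and checks that the squared singular values of the column-centred matrix, divided by the number of rows, equal the eigenvalues of the column covariance matrix.

```r
# Simulated data, assumed for illustration: rows label genes, columns samples.
set.seed(42)
n_genes <- 100
n_samples <- 5
X <- matrix(rnorm(n_genes * n_samples), nrow = n_genes, ncol = n_samples)

# Centre each column on zero.
Xc <- sweep(X, 2, colMeans(X))

# Eigenvalues obtained from the SVD: squared singular values / number of rows.
sv <- svd(Xc)
eig_from_svd <- sv$d^2 / nrow(Xc)

# The same eigenvalues from the column covariance matrix directly.
cov_mat <- crossprod(Xc) / nrow(Xc)
eig_direct <- eigen(cov_mat, symmetric = TRUE)$values

# Both routes should agree up to numerical tolerance.
stopifnot(isTRUE(all.equal(eig_from_svd, eig_direct)))
```

For the projection pursuit convention, the same steps would apply to a matrix with samples in the rows and genes in the columns.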

Value

A list with the following components:

X: Normalised data matrix with the mean of each column set to zero.

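Dx: Eigenvalues in a diagonal matrix, computed as the squared singular values of the centred X divided by the number of rows.

Ex: Eigenvectors (the right singular vectors of the centred X), one per column.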
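One way the returned components might be used to set a threshold is sketched below. The 90% cumulative-variance rule and the names `pca`, `variance_threshold`, `k` and `scores` are hypothetical choices for illustration, not something the package prescribes. To stay runnable without mlica2 installed, the sketch builds a stand-in list with the same three components in base R.

```r
# Stand-in for the list returned by PriorNormPCA(), built in base R
# (an assumption so that the sketch is self-contained).
set.seed(1)
X <- matrix(rnorm(200 * 6), nrow = 200, ncol = 6)
X[, 1] <- 5 * X[, 1]  # inflate two columns so that a few
X[, 2] <- 3 * X[, 2]  # components carry most of the variance
Xc <- sweep(X, 2, colMeans(X))
sv <- svd(Xc)
pca <- list(X = Xc, Dx = diag(sv$d^2) / nrow(Xc), Ex = sv$v)

# The eigenvalues sit on the diagonal of Dx.
eigenvalues <- diag(pca$Dx)
explained <- cumsum(eigenvalues) / sum(eigenvalues)

# Hypothetical rule: keep the fewest components that reach 90% of the variance.
variance_threshold <- 0.9
k <- which(explained >= variance_threshold)[1]

# Project the centred data onto the k retained eigenvectors.
scores <- pca$X %*% pca$Ex[, seq_len(k), drop = FALSE]
```

Here `k` is a candidate for the dimensionality of the subspace passed on to the ICA step; inspecting the bar plot of eigenvalues for a visible gap is an alternative to a fixed threshold.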

Author(s)

Andrew Teschendorff a.teschendorff@ucl.ac.uk

References

Hyvaerinen A., Karhunen J., and Oja E. (2001): Independent Component Analysis, John Wiley and Sons, New York.

Kreil D. and MacKay D. (2003): Reproducibility Assessment of Independent Component Analysis of Expression Ratios from DNA microarrays, Comparative and Functional Genomics 4 (3), 300-317.

Liebermeister W. (2002): Linear Modes of gene expression determined by independent component analysis, Bioinformatics 18 (1), 51-60.

Examples

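## The function is currently defined as
function (X) 
{
    ndim <- ncol(X)   # number of columns
    ntp <- nrow(X)    # number of rows
    # Centre each column on zero
    for (s in 1:ndim) {
        X[, s] <- X[, s] - mean(X[, s])
    }
    print("Performing SVD")
    # Note: recent R versions reject the LINPACK argument; call svd(X) there
    svd.o <- svd(X, LINPACK = TRUE)
    # Eigenvalues: squared singular values divided by the number of rows
    Dx <- diag(svd.o$d * svd.o$d)/ntp
    # Eigenvectors: right singular vectors of the centred matrix
    Ex <- svd.o$v
    # The bars show the eigenvalues held in Dx, despite the plot title
    barplot(Dx, main = "Singular values")
    return(list(X = X, Dx = Dx, Ex = Ex))
}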

mlica2 documentation built on May 1, 2019, 10:45 p.m.