# correlationSquaredDecomp: Compute SVD of squared correlation matrix

In GabrielHoffman/pinnacle: pinnacle: Gene set enrichment analysis based on genomic intervals

## Description

Given the SVD of C, compute the SVD of C^2

## Usage

```r
correlationSquaredDecomp(V, d, rank = ncol(V))
```

## Arguments

- `V`: eigenvectors of correlation matrix
- `d`: *singular* values of correlation matrix
- `rank`: use the first 'rank' singular vectors from the SVD. Increasing 'rank' increases the accuracy of the estimation, but note that the computational complexity is O(P choose(rank, 2)), where P is the number of features in the dataset

## Details

Consider a data matrix X (N x P) of P features and N samples, where N << P. Let the columns of X be scaled so that C (P x P) = X^T X. Here C^2 denotes the elementwise square of C, as in the R expression `C^2` used in the example. C is often too big to compute directly, since it has O(P^2) entries and takes O(P^3) time to invert. But we can compute the SVD of X in O(PN^2).

The goal is to compute the SVD of the matrix C^2, given only the SVD of C, in less than O(P^2) time. Here we compute this SVD of C^2 in O(PN^4) time, which is tractable for small N. Moreover, if we use an SVD X = UDV^T of rank R, we can approximate the SVD of C^2 in O(PR^4) using only D and V. In practice, this can be reduced to O(P (choose(N,2) + N)^4).
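One way to see why D and V suffice is the identity C^2 = W W^T, where the columns of W are d_i d_j (v_i * v_j) for every pair i <= j, with an extra factor of sqrt(2) when i < j. The following is a minimal sketch of that identity, not necessarily how `correlationSquaredDecomp` is implemented. The helper name `sq_factor_sketch` is hypothetical, and the sketch assumes `d` holds the singular values of X, so that the eigenvalues of C are `d^2`.

```r
set.seed(1)
N <- 10   # samples
P <- 40   # features

# Simulated feature matrix, scaled so that crossprod(X) is a correlation matrix
X <- scale(matrix(rnorm(N * P), N, P)) / sqrt(N - 1)

# SVD of X gives C = V diag(d^2) V^T without ever forming C
dcmp <- svd(X, nu = 0)
V <- dcmp$v
d <- dcmp$d

# Hypothetical helper (an assumption, not the package's function):
# builds W with choose(rank, 2) + rank columns such that
# W %*% t(W) approximates the elementwise square of C
sq_factor_sketch <- function(V, d, rank = ncol(V)) {
  # All index pairs (i, j) with i <= j
  idx <- which(upper.tri(diag(rank), diag = TRUE), arr.ind = TRUE)
  i <- idx[, 1]
  j <- idx[, 2]
  # Off-diagonal pairs appear twice in the double sum, hence sqrt(2)
  w <- ifelse(i == j, 1, sqrt(2))
  # Column k is w_k * d_i * d_j * (v_i * v_j), an elementwise product
  sweep(V[, i, drop = FALSE] * V[, j, drop = FALSE], 2, w * d[i] * d[j], "*")
}

# Full rank: the left singular vectors of W are eigenvectors of C^2,
# and the squared singular values of W are its eigenvalues
W <- sq_factor_sketch(V, d)
sv <- svd(W, nv = 0)

# Check against the direct O(P^2) computation
Csq <- crossprod(X)^2
err_full <- max(abs(Csq - sv$u %*% (sv$d^2 * t(sv$u))))

# Truncated rank: fewer columns in W, larger reconstruction error
W5 <- sq_factor_sketch(V, d, rank = 5)
err_r5 <- max(abs(Csq - tcrossprod(W5)))
```

With the full rank, `err_full` should be at the level of floating-point round-off, while `err_r5` shows the loss from keeping only the first 5 singular vectors. The SVD is taken of the P x (choose(rank, 2) + rank) matrix W, so the P x P matrix is formed here only for the check.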

## Value

The SVD of C^2.

## Examples

```r
N = 50 # samples
P = 200 # features

# Simulate feature matrix
X = matrix(rnorm(N*P), N, P)

# Scale feature matrix
X = scale(X) / sqrt(N-1)

# Compute SVD of feature matrix
dcmp = svd(X, nu=0)

# Compute correlation and squared correlation matrices
# This is O(P^2)
C = crossprod(X)
Csq = C^2

# Compute SVD of Csq using only the svd of C
# this is faster than O(PN^4)
# if R is the rank of X
# Time is O(P (choose(N,2) + N)^4)
dcmp_C2 = correlationSquaredDecomp( dcmp$v, dcmp$d )
```

GabrielHoffman/pinnacle documentation built on May 3, 2019, 3:02 p.m.