ScalablePCA: Perform Principal Component Analysis on a large data set


View source: R/ScalablePCA.R

Description

Runs prcomp on subsamples of the rows of a data set and combines the results into component weights for the first principal component.

Usage

  ScalablePCA(x, filename = NULL, db = NULL,
    subsample = 10000, n.subsamples = 1000, ignore.cols,
    use.cols, return.sds = FALSE, progress.bar = FALSE)

Arguments

x

data.frame, data over which to run PCA

filename

character, name of the file containing the data. This must be a tab-delimited file with a header row formatted per the default options for read.delim.
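To illustrate the expected file layout, the base-R sketch below writes iris to a temporary tab-delimited file with a header row and reads it back using read.delim's default options. It does not call ScalablePCA, and the temporary file name is arbitrary.

```r
# Write a data frame as a tab-delimited file with a header row,
# i.e. the layout read.delim() expects with its default options.
tmp <- tempfile(fileext = ".txt")
write.table(iris, tmp, sep = "\t", quote = FALSE, row.names = FALSE)

# read.delim() defaults include header = TRUE and sep = "\t"
round_trip <- read.delim(tmp)
str(round_trip)

unlink(tmp)  # clean up the temporary file
```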

db

object, connection to a database table containing the data (NOT IMPLEMENTED).

subsample

numeric or logical. If a positive integer, the number of rows in each subsample; if FALSE, PCA is run on the entire data set.

n.subsamples

numeric, number of subsamples.

ignore.cols

numeric, indices of columns not to include.

use.cols

numeric, indices of columns to use.
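The hypothetical helper below, select_pca_cols (not part of dils), sketches how use.cols and ignore.cols could select the columns passed to PCA. Giving use.cols precedence when both are supplied is an assumption, not documented behaviour.

```r
# Hypothetical helper (not part of dils): choose the columns to analyse
# from use.cols / ignore.cols, following the argument descriptions.
select_pca_cols <- function(x, ignore.cols, use.cols) {
  all.cols <- seq_len(ncol(x))
  if (!missing(use.cols)) {
    cols <- use.cols                        # keep only the listed columns (assumed to win)
  } else if (!missing(ignore.cols)) {
    cols <- setdiff(all.cols, ignore.cols)  # drop the listed columns
  } else {
    cols <- all.cols                        # default: every column
  }
  x[, cols, drop = FALSE]
}

names(select_pca_cols(iris, use.cols = 1:4))
names(select_pca_cols(iris, ignore.cols = 5))
```

Both calls select the four numeric iris columns, matching the two ScalablePCA calls in the examples.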

return.sds

logical; if TRUE, also return the standard deviations of the component weights across subsamples.

progress.bar

logical; if TRUE, show a progress bar while the subsamples are processed.

Details

Scales the function prcomp to data sets with an arbitrarily large number of rows by running prcomp on repeated subsamples of the rows.
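A minimal base-R sketch of the idea follows. The function subsample_pca_sketch is hypothetical and is not the dils implementation. It assumes that the first-component loadings from each subsample are sign-aligned and then averaged, with their standard deviations returned when return.sds is TRUE. It calls prcomp with center = FALSE and scale. = FALSE to match the comparison in the examples.

```r
# Hypothetical sketch of subsample-based PCA (not the dils source).
# Assumption: per-subsample loadings are sign-aligned, then averaged.
subsample_pca_sketch <- function(x, subsample = 10000, n.subsamples = 1000,
                                 return.sds = FALSE) {
  x <- as.matrix(x)
  # First-component loadings of prcomp() on the given rows
  first_pc <- function(rows) {
    prcomp(x[rows, , drop = FALSE], center = FALSE, scale. = FALSE)$rotation[, 1]
  }
  if (isFALSE(subsample)) {
    # subsample = FALSE: a single PCA on the entire data set
    return(first_pc(seq_len(nrow(x))))
  }
  size <- min(subsample, nrow(x))
  # One column of loadings per subsample (rows = variables)
  loadings <- replicate(n.subsamples, first_pc(sample.int(nrow(x), size)))
  # The sign of a principal component is arbitrary, so flip each column
  # to agree with the first one before combining them.
  signs <- sign(colSums(loadings * loadings[, 1]))
  signs[signs == 0] <- 1
  loadings <- sweep(loadings, 2, signs, `*`)
  coefficients <- rowMeans(loadings)
  if (return.sds) {
    return(list(coefficients = coefficients, sds = apply(loadings, 1, sd)))
  }
  coefficients
}

set.seed(1)
subsample_pca_sketch(iris[, 1:4], subsample = 10, n.subsamples = 200)
```

With 200 subsamples of 10 rows, the averaged weights should point in nearly the same direction as prcomp on the full data, up to sign. Each prcomp call only sees `subsample` rows, which is what lets the approach scale with the number of rows.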

Value

If return.sds is FALSE, returns a named vector of the component weights (loadings) for the first principal component. The sign of a principal component is arbitrary, so these weights may have the opposite sign to those returned by prcomp (see the examples for a comparison).

If return.sds is TRUE, returns a list with the following elements.

coefficients

named vector of the component weights for the first principal component, as above.

sds

named vector of the standard deviations of the component weights across subsamples.
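Since principal-component signs are arbitrary, comparing ScalablePCA's weights with prcomp's is easiest after flipping one vector to match the other. The sketch below uses the loadings printed in the example output: prcomp on all 150 rows, and the first ScalablePCA call. align_sign is a hypothetical helper, not part of dils.

```r
# Loadings printed in the example output below
pc_full <- c(Sepal.Length = -0.7511082, Sepal.Width = -0.3800862,
             Petal.Length = -0.5130089, Petal.Width = -0.1679075)  # prcomp, all rows
pc_sub  <- c(Sepal.Length = 0.7521740, Sepal.Width = 0.3836565,
             Petal.Length = 0.5066391, Petal.Width = 0.1646170)    # ScalablePCA, subsample = 10

# Hypothetical helper: flip v if it points away from ref
align_sign <- function(v, ref) if (sum(v * ref) < 0) -v else v

# Largest absolute difference after aligning signs
max(abs(align_sign(pc_full, pc_sub) - pc_sub))
```

After alignment, the largest difference is about 0.0064 (Petal.Length).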

Author(s)

Stephen R. Haptonstahl srh@haptonstahl.org

References

https://github.com/shaptonstahl/

See Also

prcomp

Examples

library(dils)     # provides ScalablePCA
data(iris)        # provides example data
prcomp(iris[,1:4], center=FALSE, scale.=FALSE)$rotation[,1]
ScalablePCA(iris, subsample=10, use.cols=1:4)
ScalablePCA(iris, subsample=10, ignore.cols=5)

Example output

Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
  -0.7511082   -0.3800862   -0.5130089   -0.1679075 
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
   0.7521740    0.3836565    0.5066391    0.1646170 
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
   0.7524501    0.3841336    0.5060183    0.1645433 

dils documentation built on May 2, 2019, 8:28 a.m.