skIncrPPCA: optionally fault tolerant incremental partial PCA for...

skIncrPPCA    R Documentation

optionally fault tolerant incremental partial PCA for projection of samples from SummarizedExperiment

Description

Optionally fault-tolerant incremental partial PCA for projecting samples from a SummarizedExperiment.

Usage

skIncrPPCA(
  se,
  chunksize,
  n_components,
  assayind = 1,
  picklePath = "./skIdump.pkl",
  matTx = force,
  ...
)

Arguments

se

instance of SummarizedExperiment

chunksize

integer number of samples per step

n_components

integer number of PCs to compute

assayind

not used; the assay index is assumed to be 1

picklePath

if non-NULL, path to which incremental results are saved via joblib.dump after each chunk. If NULL, incremental results are not saved.

matTx

a function defaulting to force() that accepts a matrix and returns a matrix with identical dimensions, e.g., function(x) log(x+1)

...

not used
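The contract for matTx (matrix in, matrix of identical dimensions out) can be checked on a toy matrix before running the full computation. The sketch below uses only base R; the name log1p_tx and the toy counts are hypothetical and are not part of BiocSklearn.

```r
# Hypothetical transformation suitable for matTx: log(x + 1), applied elementwise.
log1p_tx <- function(x) log(x + 1)

# Toy count matrix: 4 features (rows) by 3 samples (columns).
counts <- matrix(c(0, 1, 3, 7, 2, 0, 5, 1, 9, 4, 0, 6), nrow = 4)

transformed <- log1p_tx(counts)

# A valid matTx must not change the shape of the matrix.
stopifnot(identical(dim(transformed), dim(counts)))

# The default, force, returns its input unchanged.
stopifnot(identical(force(counts), counts))
```

A function passing these checks could then be supplied as matTx = log1p_tx.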

Value

python instance of sklearn.decomposition.incremental_pca.IncrementalPCA

Note

Samples are treated as records and all features (rows) as attributes, projecting to an n_components-dimensional space. The method acquires each chunk of assay data and transposes it before computing PCA contributions. In case of a crash, restore from picklePath using joblib$load after loading reticulate. The n_samples_seen_ component of the restored Python reference indicates where to restart. Resumption can be managed using skPartialPCA_step.
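The chunking, transposition and restart arithmetic described in this note can be illustrated in base R. This is a minimal sketch under stated assumptions: the helper chunk_indices and the toy matrix are hypothetical, and the way skIncrPPCA forms its chunks internally may differ.

```r
# Toy assay: 6 features (rows) by 12 samples (columns); values are arbitrary.
set.seed(1)
assay_mat <- matrix(rpois(6 * 12, lambda = 5), nrow = 6)

chunksize <- 5L

# Hypothetical helper: column (sample) indices for each chunk,
# starting from sample number 'start'.
chunk_indices <- function(n_samples, chunksize, start = 1L) {
  if (start > n_samples) return(list())
  starts <- seq(start, n_samples, by = chunksize)
  lapply(starts, function(s) s:min(s + chunksize - 1L, n_samples))
}

chunks <- chunk_indices(ncol(assay_mat), chunksize)

# Each chunk is transposed so that samples become rows (records)
# and features become columns (attributes).
first_chunk <- t(assay_mat[, chunks[[1]], drop = FALSE])
dim(first_chunk)

# Resuming: if a restored model reported n_samples_seen_ equal to 5,
# processing would pick up at sample 6.
n_seen <- 5L
remaining <- chunk_indices(ncol(assay_mat), chunksize, start = n_seen + 1L)
```

Here n_seen stands in for the n_samples_seen_ value read from the restored Python object; the chunks in remaining are the ones that would still need to be processed.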

Examples

# demo SE made with TENxGenomics:
# mm = matrixSummarizedExperiment(h5path, 1:27998, 1:750)
# saveHDF5SummarizedExperiment(mm, "tenx_750")
#
if (FALSE) {
if (requireNamespace("HDF5Array")) {
  se750 = HDF5Array::loadHDF5SummarizedExperiment(
     system.file("hdf5/tenx_750", package="BiocSklearn"))
  lit = skIncrPPCA(se750[, 1:50], chunksize=5, n_components=4)
  dat = t(as.matrix(assay(se750[,1:50])))  # samples as rows, features as columns
  pypc = lit$transform(dat)                # project samples onto the 4 PCs
  round(cor(pypc), 3)
  rpc = prcomp(dat)
  round(cor(rpc$x[,1:4], pypc), 3)
} # this has to be made basilisk-compliant
} # and is blocked until then

vjcitn/BiocSklearn documentation built on Feb. 4, 2024, 5:12 a.m.