bigstatsr-package: bigstatsr: Statistical Tools for Filebacked Big Matrices

bigstatsr-package R Documentation

bigstatsr: Statistical Tools for Filebacked Big Matrices

Description

Easy-to-use, efficient, flexible and scalable statistical tools. Package bigstatsr provides and uses Filebacked Big Matrices via memory-mapping. It provides, for instance, matrix operations, Principal Component Analysis, sparse linear supervised models, utility functions and more <doi:10.1093/bioinformatics/bty185>.

Arguments

X

An object of class FBM.

X.code

An object of class FBM.code256.

y.train

Vector of responses, corresponding to ind.train.

y01.train

Vector of responses, corresponding to ind.train. Must be only 0s and 1s.

ind.train

An optional vector of the row indices that are used, for the training part. If not specified, all rows are used. Don't use negative indices.

ind.row

An optional vector of the row indices that are used. If not specified, all rows are used. Don't use negative indices.

ind.col

An optional vector of the column indices that are used. If not specified, all columns are used. Don't use negative indices.

block.size

Maximum number of columns read at once. Default uses block_size().

ncores

Number of cores used. Default doesn't use parallelism. You may use nb_cores().
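
As a minimal sketch of how the index, block and core arguments fit together, the example below builds a toy FBM and computes column statistics on a subset of rows and columns. The matrix size and values are made up for illustration, and big_colstats() and big_apply() are used here only as examples of functions that accept these arguments.

```r
library(bigstatsr)

# Toy FBM backed by a temporary file; values 1..60 are filled column-wise
X <- FBM(10, 6, init = 1:60)

# Positive indices only: the first five rows and every other column
ind.row <- 1:5
ind.col <- c(2, 4, 6)

# Column sums and variances on the selected submatrix, on one core
# (ncores = nb_cores() would use the recommended number of cores instead)
stats <- big_colstats(X, ind.row = ind.row, ind.col = ind.col, ncores = 1)

# Block-wise column sums over all rows, reading at most 2 columns at a time
col_sums <- big_apply(
  X,
  a.FUN = function(X, ind) colSums(X[, ind, drop = FALSE]),
  a.combine = "c",
  ind = ind.col,
  block.size = 2
)

# Both results agree with the in-memory computation
sub <- X[ind.row, ind.col]
stopifnot(all.equal(stats$sum, colSums(sub)))
stopifnot(all.equal(col_sums, colSums(X[, ind.col])))
```

Reading a submatrix with X[ind.row, ind.col] loads it into memory, which is fine for a toy example but defeats the purpose on a genuinely large matrix.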

fun.scaling

A function with parameters X, ind.row and ind.col that returns a data.frame with $center and $scale for the columns corresponding to ind.col. Each element of these columns is then scaled as follows:

\frac{X_{i,j} - center_j}{scale_j}.

Default doesn't use any scaling. You can also provide your own center and scale by using as_scaling_fun().
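
The sketch below illustrates what a scaling function might look like. The name my_scaling and the choice of mean and standard deviation are assumptions made for illustration; the toy data are random. It also shows precomputed values being wrapped with as_scaling_fun().

```r
library(bigstatsr)

set.seed(1)
X <- FBM(20, 5, init = rnorm(100))

# Hypothetical custom scaling function (the name my_scaling is an assumption):
# centers each column by its mean and scales it by its standard deviation.
# `...` absorbs any extra arguments a caller might pass.
my_scaling <- function(X, ind.row = rows_along(X), ind.col = cols_along(X), ...) {
  stats <- big_colstats(X, ind.row = ind.row, ind.col = ind.col)
  data.frame(
    center = stats$sum / length(ind.row),
    scale  = sqrt(stats$var)
  )
}

sc <- my_scaling(X)

# Apply (X[i, j] - center[j]) / scale[j] in memory to check the convention
scaled <- sweep(sweep(X[], 2, sc$center, "-"), 2, sc$scale, "/")
stopifnot(all.equal(scaled, scale(X[]), check.attributes = FALSE))

# Precomputed centers and scales can be wrapped with as_scaling_fun()
fixed_scaling <- as_scaling_fun(center.col = sc$center, scale.col = sc$scale)
svd <- big_SVD(X, fun.scaling = fixed_scaling, k = 2)
```

For the common case of centering and scaling by standard deviation, big_scale() already returns a function of this form, so a custom function is only needed for other conventions.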

covar.train

Matrix of covariables to be added in each model to correct for confounders (e.g. the scores of PCA), corresponding to ind.train. Default is NULL and corresponds to only adding an intercept to each model. You can use covar_from_df() to convert from a data frame.

covar.row

Matrix of covariables to be added in each model to correct for confounders (e.g. the scores of PCA), corresponding to ind.row. Default is NULL and corresponds to only adding an intercept to each model. You can use covar_from_df() to convert from a data frame.
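
To make the covariate arguments concrete, here is a small sketch that converts a data frame to a numeric matrix with covar_from_df() and passes the training rows to a univariate linear regression. The simulated data, the column names age and sex, and the choice of big_univLinReg() are assumptions made for illustration.

```r
library(bigstatsr)

set.seed(1)
n <- 50
X <- FBM(n, 10, init = rnorm(n * 10))
y <- rnorm(n)

# Hypothetical covariates: one numeric column and one factor
df <- data.frame(
  age = rnorm(n, mean = 50, sd = 10),
  sex = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Convert to a numeric matrix suitable for covar.train / covar.row
covar <- covar_from_df(df)

# Responses and covariates must correspond to the same training rows
ind.train <- 1:40
res <- big_univLinReg(
  X,
  y.train     = y[ind.train],
  ind.train   = ind.train,
  covar.train = covar[ind.train, , drop = FALSE]
)
```

Note that y.train and covar.train are subset by ind.train before the call, while X is passed whole together with ind.train.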

center

Vector of the same length as ind.col, to subtract from the columns of X.

scale

Vector of the same length as ind.col, by which to divide the columns of X.
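
As a hedged illustration of center and scale, the sketch below assumes that big_prodVec() is one of the functions accepting these two arguments, and compares its result with an explicit in-memory scaling. The data and the names ctr and scl are made up for the example.

```r
library(bigstatsr)

set.seed(1)
X <- FBM(10, 4, init = rnorm(40))
v <- rnorm(4)

# One center and one scale value per column in ind.col (here, all columns)
ctr <- colMeans(X[])
scl <- apply(X[], 2, sd)

# Product of the scaled matrix with v, without materializing the scaled matrix
res <- big_prodVec(X, v, center = ctr, scale = scl)

# Reference: scale in memory, then multiply
ref <- drop(scale(X[], center = ctr, scale = scl) %*% v)
stopifnot(all.equal(res, ref))
```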

Matrix parallelization

Large matrix computations are performed block-wise and are not parallelized, so that the size of these blocks does not have to be reduced. Instead, you may use Microsoft R Open or OpenBLAS to accelerate these block matrix computations. You can also control the number of cores used by BLAS with bigparallelr::set_blas_ncores().
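
A minimal sketch of adjusting the BLAS thread count around a block-wise matrix product follows. It assumes the bigparallelr package is installed. The value 2 is arbitrary, and with a reference (single-threaded) BLAS the setting is expected to have no effect.

```r
library(bigstatsr)

set.seed(1)
X <- FBM(20, 5, init = rnorm(100))
A <- matrix(rnorm(20 * 3), nrow = 20)

# Remember the current BLAS thread count so it can be restored afterwards
old_ncores <- bigparallelr::get_blas_ncores()

# Ask BLAS to use 2 threads (arbitrary value chosen for illustration)
bigparallelr::set_blas_ncores(2)

# Block-wise cross-product t(X) %*% A; the matrix products inside use BLAS
res <- big_cprodMat(X, A)

# Restore the previous setting
bigparallelr::set_blas_ncores(old_ncores)

stopifnot(all.equal(res, crossprod(X[], A)))
```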

Author(s)

Maintainer: Florian Privé <florian.prive.21@gmail.com>

bigstatsr documentation built on Oct. 14, 2022, 9:05 a.m.