big_colstats: Standard univariate statistics
In bigstatsr: Statistical Tools for Filebacked Big Matrices

Description

Computes standard univariate statistics for the columns of a Filebacked Big Matrix. For now, only the sum and the variance are implemented; the mean and the standard deviation can easily be deduced from them (see Examples).
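As a worked summary of that deduction (the symbols $s_j$, $v_j$ and $n$ are notation introduced here, not names from the package), take column $j$ over the $n$ selected rows, where $n$ is `length(ind.row)`, $s_j$ is the returned `sum` and $v_j$ is the returned `var`. The Examples check `var` against `stats::var()`, which uses the $n - 1$ denominator:

```latex
\bar{x}_j = \frac{s_j}{n}, \qquad
\mathrm{sd}_j = \sqrt{v_j}, \qquad
v_j = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_{ij} - \bar{x}_j \right)^2 .
```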

Usage

big_colstats(X, ind.row = rows_along(X), ind.col = cols_along(X), ncores = 1)

Arguments

`X`	An object of class FBM.
`ind.row`	An optional vector of the row indices to use. If not specified, all rows are used. Negative indices are not supported.
`ind.col`	An optional vector of the column indices to use. If not specified, all columns are used. Negative indices are not supported.
`ncores`	Number of cores used. The default (1) does not use parallelism. You may use `nb_cores()` to choose a value.
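The following is a minimal sketch of how `ind.row` and `ind.col` restrict the computation. It assumes the `as_FBM()` helper from bigstatsr (not mentioned on this page) to turn a small in-memory matrix into an FBM; the matrix sizes and the names `mat`, `rows`, `cols` and `st` are purely illustrative.

```r
library(bigstatsr)

# Toy data: a 20 x 5 matrix converted to an FBM (sizes are arbitrary;
# as_FBM() is assumed here, it is not described on this page)
set.seed(42)
mat <- matrix(rnorm(100), nrow = 20, ncol = 5)
X <- as_FBM(mat)

# Restrict to rows 1..10 and columns 2 and 4 (positive indices only)
rows <- 1:10
cols <- c(2, 4)

# ncores = 1 runs sequentially; ncores = nb_cores() would enable parallelism
st <- big_colstats(X, ind.row = rows, ind.col = cols, ncores = 1)

# One result row per selected column; compare with base R on the same subset
all.equal(st$sum, colSums(mat[rows, cols]))
all.equal(st$var, apply(mat[rows, cols], 2, var))
```

The result has one row per element of `ind.col`, so `st` above has two rows.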

Value

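A data.frame with two numeric columns, `sum` and `var`, giving the corresponding statistic for each selected column. `var` is the sample variance (denominator n - 1), as computed by `var()`.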
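A short sketch of the returned value when all rows and columns are used, again assuming `as_FBM()` to build a toy FBM (the matrix and variable names are illustrative). It deduces the means by dividing by `nrow(X)` and checks that `var` uses the n - 1 denominator.

```r
library(bigstatsr)

set.seed(1)
mat <- matrix(rnorm(30), nrow = 10, ncol = 3)
X <- as_FBM(mat)  # assumed helper, not described on this page

# All rows and columns: a data.frame with one row per column of X
st <- big_colstats(X)
str(st)

# Mean and sd when all rows are used: divide the sum by nrow(X)
n <- nrow(X)
means <- st$sum / n
sds <- sqrt(st$var)

# `var` uses the n - 1 denominator, like stats::var()
manual_var <- colSums(sweep(mat, 2, means)^2) / (n - 1)
all.equal(st$var, manual_var)
```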

Examples

set.seed(1)

X <- big_attachExtdata()

# Check the results
str(test <- big_colstats(X))

# Only with the first 100 rows
ind <- 1:100
str(test2 <- big_colstats(X, ind.row = ind))
plot(test$sum, test2$sum)
abline(lm(test2$sum ~ test$sum), col = "red", lwd = 2)

X.ind <- X[ind, ]
all.equal(test2$sum, colSums(X.ind))
all.equal(test2$var, apply(X.ind, 2, var))

# deduce mean and sd
# note that they are also implemented in big_scale()
means <- test2$sum / length(ind) # if using all rows,
                                 # divide by nrow(X) instead
all.equal(means, colMeans(X.ind))
sds <- sqrt(test2$var)
all.equal(sds, apply(X.ind, 2, sd))
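To illustrate the note about `big_scale()`, here is a sketch that compares the deduced means and standard deviations with its output. It assumes that `big_scale()` with its defaults (`center = TRUE`, `scale = TRUE`) returns a function whose result is a data.frame with `center` and `scale` columns; this is our understanding of bigstatsr, not something stated on this page. It also assumes `as_FBM()` for the toy data, and all variable names are illustrative.

```r
library(bigstatsr)

set.seed(1)
mat <- matrix(rnorm(60), nrow = 20, ncol = 3)
X <- as_FBM(mat)  # assumed helper for building a small FBM
ind <- 1:10

# Means and sds deduced from big_colstats(), as in the examples above
st <- big_colstats(X, ind.row = ind)
means <- st$sum / length(ind)
sds <- sqrt(st$var)

# big_scale() is assumed to return a function; calling it yields
# `center` (column means) and `scale` (column sds) for the selected rows
sc <- big_scale()(X, ind.row = ind)
all.equal(sc$center, means)
all.equal(sc$scale, sds)
```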

bigstatsr documentation built on Sept. 11, 2024, 7:08 p.m.