geosketch: Run geosketch to subsample a matrix

View source: R/geosketch.R

geosketch R Documentation

Run geosketch to subsample a matrix

Description

Perform geometric sketching with the geosketch python package.

Usage

geosketch(
  mat,
  N,
  replace = FALSE,
  k = "auto",
  alpha = 0.1,
  seed = NULL,
  max_iter = 200,
  one_indexed = TRUE,
  verbose = FALSE
)

Arguments

mat

m x n matrix. Samples (the dimension along which to subsample) should be in the rows, features in the columns.

N

Numeric scalar, the number of samples to retain.

replace

Logical scalar, whether to sample with replacement.

k

Numeric scalar or "auto", specifying the number of covering boxes. If k = "auto" (the default), it is set to sqrt(nrow(mat)) for replace = TRUE and to N for replace = FALSE.

alpha

Numeric scalar defining the acceptable interval around k. Binary search halts when it obtains between k * (1 - alpha) and k * (1 + alpha) covering boxes.

seed

Numeric scalar or NULL (default). If not NULL, it will be converted to integer and passed to numpy to seed the random number generator.

max_iter

Numeric scalar giving the maximum number of iterations after which the binary search is terminated. This guards against rare cases in which the number of covering boxes is non-monotonic.

one_indexed

Logical scalar, whether to return one-indexed indices.

verbose

Logical scalar, whether to print logging output while running.
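
The interplay of k and alpha described above can be sketched in base R. This is only an illustration of the documented rules, not the code used by the package: resolve_k() and k_interval() are hypothetical helper names, and the actual resolution happens inside the geosketch python package.

```r
# Hypothetical helper (not part of sketchR): resolve k as documented,
# i.e. sqrt(number of samples) with replacement, N without.
resolve_k <- function(k, n_samples, N, replace) {
  if (identical(k, "auto")) {
    if (replace) sqrt(n_samples) else N
  } else {
    k
  }
}

# Hypothetical helper: range of covering-box counts at which the
# binary search is documented to halt.
k_interval <- function(k, alpha = 0.1) {
  c(lower = k * (1 - alpha), upper = k * (1 + alpha))
}

# 100 samples, 10 to retain, sampling without replacement
k <- resolve_k("auto", n_samples = 100, N = 10, replace = FALSE)
k_interval(k, alpha = 0.1)
```

A larger alpha widens the accepted interval, so the search may stop earlier at the cost of a box count further from k.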

Details

The first time this function is run, it creates a conda environment containing the geosketch python package. This is done via the basilisk R/Bioconductor package; see the documentation for that package for troubleshooting.

Value

A numeric vector with the indices of the rows of mat to retain. The indices are one-based if one_indexed = TRUE (the default).
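
As a minimal sketch of how the returned indices might be used, the code below subsets the rows of a matrix. The index vector idx is an invented stand-in for the output of geosketch(), so that the snippet runs without the python environment; the zero-based variant assumes that one_indexed = FALSE yields indices shifted down by one.

```r
# Example matrix: 100 samples (rows), 5 features (columns)
x <- matrix(rnorm(500), nrow = 100)

# Stand-in for the return value of geosketch(); these values are invented
idx <- c(5, 42, 87)

# Keep only the sketched samples; drop = FALSE preserves the matrix shape
x_sub <- x[idx, , drop = FALSE]
dim(x_sub)

# Assumed zero-based indices (one_indexed = FALSE) must be shifted by one
# before they can be used for subsetting in R
idx0 <- idx - 1
x_sub0 <- x[idx0 + 1, , drop = FALSE]
identical(x_sub, x_sub0)
```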

Author(s)

Charlotte Soneson, Michael Stadler

References

Hie et al. (2019): Geometric sketching compactly summarizes the single-cell transcriptomic landscape. Cell Systems 8, 483–493.

Examples

x <- matrix(rnorm(500), nrow = 100)
geosketch(mat = x, N = 10, seed = 42)


csoneson/sketchR documentation built on Nov. 4, 2024, 4:05 p.m.