raw_to_sce: raw_to_sce

View source: R/raw_to_sce.R

raw_to_sce    R Documentation

raw_to_sce

Description

Essentially a parser: reads raw counts (or log-normalized counts) stored in a file (e.g. .txt) and returns a SingleCellExperiment object in the format expected downstream. If raw counts are passed, log-normalization is performed (optional but recommended) and logcounts will be used downstream. If no batch is supplied, size factors are calculated using scran::quickCluster (arguments d and min.mean regulate the coarseness of the clustering) and the calculated size factors are then supplied to scuttle::logNormCounts to calculate log-normalized counts. If batch is supplied, size factors are calculated within each batch and then supplied to batchelor::multiBatchNorm.
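The two normalization routes described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the package's actual source: the helper name normalize_sketch is hypothetical, the step that turns quickCluster output into size factors is assumed to be scran::computeSumFactors, and the input is simulated with scuttle::mockSCE.

```r
library(SingleCellExperiment)
library(scran)
library(scuttle)
library(batchelor)

# Hypothetical helper (not part of geneBasisR) mirroring the two routes
# in the Description. The use of computeSumFactors is an assumption.
normalize_sketch <- function(sce, batch = NULL, d = 50, min.mean = 0.1) {
  if (is.null(batch)) {
    # No batch: coarse clustering, pooled size factors, then log-normalization.
    clusters <- scran::quickCluster(sce, d = d, min.mean = min.mean)
    sce <- scran::computeSumFactors(sce, clusters = clusters)
    sce <- scuttle::logNormCounts(sce)
  } else {
    # Batch supplied: size factors are computed separately within each batch.
    sf <- numeric(ncol(sce))
    for (b in unique(batch)) {
      idx <- which(batch == b)
      sub <- sce[, idx]
      clusters <- scran::quickCluster(sub, d = d, min.mean = min.mean)
      sub <- scran::computeSumFactors(sub, clusters = clusters)
      sf[idx] <- sizeFactors(sub)
    }
    sizeFactors(sce) <- sf
    # multiBatchNorm rescales the size factors across batches and adds logcounts.
    sce <- batchelor::multiBatchNorm(sce, batch = batch)
  }
  sce
}

# Simulated data only; real input would come from the counts file.
set.seed(1)
sce <- scuttle::mockSCE(ncells = 400)
sce_nobatch <- normalize_sketch(sce)
sce_batch <- normalize_sketch(sce, batch = rep(c("a", "b"), each = 200))
```

In both branches the result carries a logcounts assay, which is what the Description says is used downstream.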

Usage

raw_to_sce(
  counts_dir,
  counts_type = "counts",
  transform_counts_to_logcounts = TRUE,
  header = TRUE,
  sep = "\t",
  meta_dir = NULL,
  batch = NULL,
  verbose = TRUE,
  d = 50,
  min.mean = 0.1,
  ...
)

Arguments

counts_dir

String specifying the path to the file containing the counts matrix (assuming counts were already calculated).

counts_type

String specifying whether the raw data are stored as counts or log-counts ('counts' and 'logcounts' respectively). For geneBasis we recommend working with log-counts. If you do not have log-counts precomputed, they can be computed within this function.

transform_counts_to_logcounts

If the raw data are counts (as opposed to log-counts), a Boolean specifying whether log-normalization should be performed.

header

Boolean specifying whether the counts_dir file has cell IDs stored in colnames.

sep

The field separator string. Note that it should be the same for the counts_dir and meta_dir files (if the latter exists).
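To illustrate how header and sep relate to the file contents, here is a small base R sketch. It assumes a gene-by-cell layout with gene names as row names and read.table-style parsing; neither is confirmed by this page, and the file and values are made up for the example.

```r
# Made-up 3 genes x 4 cells count matrix.
counts <- matrix(
  c(0, 3, 1, 5, 0, 2, 7, 1, 0, 4, 0, 2),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Write a tab-separated file whose first line holds the cell IDs
# (this corresponds to header = TRUE, sep = "\t").
counts_file <- tempfile(fileext = ".txt")
write.table(counts, counts_file, sep = "\t", quote = FALSE)

# Read it back with matching header and sep settings.
counts_read <- as.matrix(read.table(counts_file, header = TRUE, sep = "\t"))
colnames(counts_read)   # cell IDs taken from the header line
```

If the separator passed to the reader differs from the one used in the file, the columns will not be split correctly, which is why the same sep is expected for both files.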

meta_dir

If not NULL (NULL is default), a string specifying the path to the meta-data file (e.g. celltype, batch, UMAP-coordinates). Store UMAP-coordinates as 'x' and 'y' (relevant for plotting functions). Also, if the meta-data contains a field named cell, this field will be used for cell IDs (so ensure the values are unique).

batch

If not NULL (NULL is default and means no batch), a string specifying the column in the meta file that will be used as the batch ID. Please check that the specified batch name exists in the meta file.
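Before calling raw_to_sce with a batch, it can help to confirm that the meta file has the expected columns. The following is a minimal base R sketch: the helper name check_meta, the batch column name sample, and the file contents are hypothetical and only serve the illustration.

```r
# Write a small, made-up meta table to a temporary tab-separated file.
meta_file <- tempfile(fileext = ".txt")
meta <- data.frame(
  cell = c("cell1", "cell2", "cell3", "cell4"),
  celltype = c("B", "B", "T", "T"),
  sample = c("s1", "s1", "s2", "s2"),  # hypothetical batch column
  x = c(0.1, 0.3, -1.2, -0.9),         # UMAP coordinates stored as 'x' and 'y'
  y = c(1.0, 0.8, 0.2, 0.4)
)
write.table(meta, meta_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Hypothetical helper (not part of geneBasisR): checks that the batch column
# exists and that cell IDs, if present, are unique.
check_meta <- function(meta_dir, batch = NULL, sep = "\t") {
  meta <- read.table(meta_dir, header = TRUE, sep = sep,
                     stringsAsFactors = FALSE)
  if (!is.null(batch) && !(batch %in% colnames(meta))) {
    stop("Batch column '", batch, "' not found in meta file.")
  }
  if ("cell" %in% colnames(meta) && anyDuplicated(meta$cell) > 0) {
    stop("Values in the 'cell' column are not unique.")
  }
  invisible(meta)
}

meta_checked <- check_meta(meta_file, batch = "sample")
```

A misspelled batch name then fails early with a clear message rather than somewhere inside the normalization step.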

verbose

Boolean specifying whether intermediate output should be printed. Default verbose=TRUE.

d

Only used for log-normalization: an integer scalar specifying the number of principal components to retain.

min.mean

Only used for log-normalization: a numeric scalar specifying the filter to be applied on the average count for each gene prior to computing ranks.

...

Additional arguments. This includes d and min.mean for scran::quickCluster - used to calculate size factors to compute normalized log-counts.

Value

SingleCellExperiment object with gene counts/logcounts and meta-data (if supplied) stored in colData.
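As a rough illustration of how such an object is typically inspected, the sketch below builds a toy SingleCellExperiment by hand as a stand-in for the value returned by raw_to_sce. All values and column names are made up; only the standard SingleCellExperiment accessors are used.

```r
library(SingleCellExperiment)

# Toy stand-in for the returned object (3 genes x 4 cells, made-up values).
logc <- matrix(
  c(0, 1.2, 0.5, 2.1, 0, 0.3, 1.7, 0.9, 0, 0.4, 0, 1.1),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)
sce <- SingleCellExperiment(
  assays = list(logcounts = logc),
  colData = DataFrame(
    cell = colnames(logc),
    celltype = c("B", "B", "T", "T"),
    row.names = colnames(logc)
  )
)

logcounts(sce)[1:2, 1:2]       # expression values
colData(sce)$celltype          # per-cell meta-data stored in colData
table(colData(sce)$celltype)   # quick summary of a meta-data column
```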

Examples

library(SingleCellExperiment)
library(geneBasisR)
# Example files shipped with the package
counts_dir <- system.file("extdata", "raw_spleen.txt", package = "geneBasisR")
meta_dir <- system.file("extdata", "raw_spleen_meta.txt", package = "geneBasisR")
# The example file already holds log-counts, so no transformation is requested
out <- raw_to_sce(counts_dir, counts_type = "logcounts",
                  transform_counts_to_logcounts = FALSE, header = TRUE,
                  sep = "\t", meta_dir = meta_dir, batch = NULL)


MarioniLab/geneBasisR documentation built on June 30, 2023, 2:04 p.m.