mix_samples_with_jitter: Mix samples with jitter

View source: R/function_mix_with_Jitter.R

mix_samples_with_jitter    R Documentation

Mix samples with jitter

Description

'mix_samples_with_jitter' takes an expression matrix ('expr.data') and 'pheno' information, and mixes the samples in known quantities so that the result can be used for loss-function learning digital tissue deconvolution. For each mixture, it randomly draws a quantity for every cell type. It then randomly selects profiles from 'expr.data' for each cell type, multiplies them by the respective quantity, and averages them into a simulated bulk profile. Note that the frequency of a cell type in the mixtures is governed by the 'prob.each' vector, not by how often the type occurs in 'pheno'. Alternatively, the 'mix_samples' function reflects the underlying composition of the data set.
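The mixing procedure described above can be sketched in base R. This is only an illustration, not the DTD implementation: the toy data, the helper names 'draw_quantities' and 'mix_once', and the way quantities are drawn (uniform draws weighted by 'prob.each', then scaled to sum to 1) are all assumptions made for this example.

```r
# Minimal base-R sketch of the mixing idea; NOT the DTD implementation.
set.seed(1)

# Toy data (hypothetical): 6 features, 3 cell types with 4 profiles each.
types <- c("A", "B", "C")
expr.data <- matrix(
  rexp(6 * 12),
  nrow = 6,
  dimnames = list(paste0("gene", 1:6), paste0("cell", 1:12))
)
pheno <- setNames(rep(types, each = 4), colnames(expr.data))

# Assumed sampling scheme: uniform draws weighted by prob.each,
# scaled so that the quantities of one mixture sum to 1.
draw_quantities <- function(prob.each) {
  q <- runif(length(prob.each)) * prob.each
  q / sum(q)
}

# Build one simulated bulk profile from one random profile per cell type.
mix_once <- function(expr.data, pheno, included, prob.each) {
  q <- setNames(draw_quantities(prob.each), included)
  picked <- vapply(
    included,
    function(type) sample(names(pheno)[pheno == type], 1),
    character(1)
  )
  # Weighted average of the picked profiles: (features x types) %*% (types).
  mixture <- as.vector(expr.data[, picked, drop = FALSE] %*% q)
  names(mixture) <- rownames(expr.data)
  list(quantities = q, mixture = mixture)
}

one <- mix_once(expr.data, pheno, included = types, prob.each = c(2, 1, 1))

# Averaged over many draws, the type with the larger weight dominates,
# regardless of how often each type occurs in pheno.
many <- replicate(1000, draw_quantities(c(2, 1, 1)))
mean.quantities <- rowMeans(many)
```

In this sketch every cell type contributes exactly one profile per mixture. The actual function additionally exposes 'n.per.mixture', which is not modelled here.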

Usage

mix_samples_with_jitter(
  included.in.X,
  prob.each = NA,
  n.samples,
  expr.data,
  pheno,
  verbose = FALSE,
  add.jitter = FALSE,
  chosen.mean = 1,
  chosen.sd = 0.05,
  n.per.mixture = 1,
  normalize.to.count = TRUE
)

Arguments

included.in.X

vector of strings, indicating types that are in the reference matrix. Only those types, and sorted in that order, will be included in the quantity matrix.

prob.each

numeric vector with the same length as 'included.in.X'. For each cell type in 'included.in.X', 'prob.each' holds the expected average quantity in the mixtures.

n.samples

integer above 0, number of samples to be drawn

expr.data

numeric matrix, with features as rows and samples as columns

pheno

named vector of strings, with pheno information ('pheno') for each sample in 'expr.data'. 'names(pheno)' must all be in 'colnames(expr.data)'

verbose

logical, should information be printed to console?

add.jitter

logical, should each mixture be multiplied with a vector of normally distributed numbers? (JITTER)

chosen.mean

float, mean of jitter

chosen.sd

float, standard deviation of jitter

n.per.mixture

integer above 0 and below 'ncol(expr.data)', how many samples should be included per mixture

normalize.to.count

logical, should each mixture be normalized to the same total count?
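The effect of 'add.jitter', 'chosen.mean', 'chosen.sd' and 'normalize.to.count' can be illustrated with a small base-R sketch. It assumes that the jitter is one normally distributed factor per feature and that normalization rescales a profile to a fixed total. The helper names 'add_jitter' and 'normalize_profile' and the target total are hypothetical, and the package may implement both steps differently.

```r
# Hedged sketch of jitter and normalization; NOT the DTD implementation.
set.seed(2)

# A toy bulk profile (hypothetical values).
mixture <- c(gene1 = 10, gene2 = 20, gene3 = 30, gene4 = 40)

# Multiplicative jitter: one normally distributed factor per feature (assumed).
add_jitter <- function(mixture, chosen.mean = 1, chosen.sd = 0.05) {
  mixture * rnorm(length(mixture), mean = chosen.mean, sd = chosen.sd)
}

# Assumed normalization: rescale so that the profile sums to a fixed total.
# The default total (number of features) is a choice made for this example.
normalize_profile <- function(mixture, total = length(mixture)) {
  mixture * total / sum(mixture)
}

jittered <- add_jitter(mixture, chosen.mean = 1, chosen.sd = 0.05)
normalized <- normalize_profile(jittered)
```

With 'chosen.mean = 1' the jitter perturbs each feature around its original value, and a small 'chosen.sd' keeps the perturbation mild. Normalizing afterwards brings all mixtures back to the same total.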

Value

list with two entries. "quantities": matrix (nrow = length(included.in.X), ncol = n.samples) and "mixtures": matrix (nrow = nrow(expr.data), ncol = n.samples)

Examples

library(DTD)
random.data <- generate_random_data(
      n.types = 10,
      n.samples.per.type = 10,
      n.features = 500,
      sample.type = "Cell",
      feature.type = "gene"
      )

# normalize all samples to the same amount of counts:
random.data <- normalize_to_count(random.data)

# extract the indicator vector:
# its values are the sample types, and its names are the sample names
indicator.list <- gsub("^Cell[0-9]*\\.", "", colnames(random.data))
names(indicator.list) <- colnames(random.data)

# First, decide which cells should be deconvoluted.
included.in.X <- c("Type2", "Type3", "Type4", "Type5")

training.data <- mix_samples_with_jitter(
    included.in.X = included.in.X
    , prob.each = c(2,1,1,1)
    , n.samples = 1e3
    , expr.data = random.data
    , pheno = indicator.list
    , add.jitter = TRUE
    , chosen.mean = 1
    , chosen.sd = 0.05
)

# see the effect of "Type2" having a higher 'prob.each' entry (mean quantity per cell type):
apply(training.data$quantities, 1, mean)


MarianSchoen/DTD documentation built on April 29, 2022, 1:59 p.m.