mix_samples_with_jitter: Mix samples with jitter
In MarianSchoen/DTD: Digital Tissue Deconvolution

View source: R/function_mix_with_Jitter.R

mix_samples_with_jitter

R Documentation

Mix samples with jitter

Description

'mix_samples_with_jitter' takes an expression matrix ('expr.data') and phenotype information ('pheno'), and mixes the samples in known quantities, so that the result can be used for loss-function learning digital tissue deconvolution. For each mixture, it randomly draws a quantity for each cell type. It then randomly selects profiles of each cell type from 'expr.data', multiplies them by the respective quantity, and averages them into a simulated bulk profile. Note that, in the mixtures, the frequency of a cell type is determined by the 'prob.each' vector, and not by how often the type occurs in 'pheno'. Alternatively, the 'mix_samples' function reflects the underlying composition of the data set.
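The mixing step described above can be sketched in base R for a single mixture. This is a minimal illustration and not the DTD implementation: the toy data, the type names, and in particular the way quantities are drawn (uniform random numbers weighted by 'prob.each', rescaled to sum to 1) are assumptions made for the example.

```r
# Hypothetical base-R sketch of one mixing step; not the DTD implementation.
set.seed(1)

# Toy expression matrix: 5 features x 6 samples, two samples per cell type.
expr.data <- matrix(
  runif(30, min = 0, max = 10),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("Cell", 1:6))
)
pheno <- c("TypeA", "TypeA", "TypeB", "TypeB", "TypeC", "TypeC")
names(pheno) <- colnames(expr.data)

included.in.X <- c("TypeA", "TypeB", "TypeC")
prob.each <- c(2, 1, 1)

# Assumed scheme: draw a random quantity per cell type, weight it by
# prob.each, and rescale so that the quantities sum to 1.
raw <- runif(length(included.in.X)) * prob.each
quantities <- raw / sum(raw)
names(quantities) <- included.in.X

# For each cell type, pick one profile at random and add it to the
# mixture, weighted by the quantity of that type.
mixture <- rep(0, nrow(expr.data))
names(mixture) <- rownames(expr.data)
for (type in included.in.X) {
  candidates <- names(pheno)[pheno == type]
  chosen <- sample(candidates, size = 1)
  mixture <- mixture + quantities[[type]] * expr.data[, chosen]
}
```

Here 'quantities' plays the role of one column of the quantity matrix and 'mixture' the role of one simulated bulk profile. Repeating the procedure 'n.samples' times would give one column per mixture.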

Usage

mix_samples_with_jitter(
  included.in.X,
  prob.each = NA,
  n.samples,
  expr.data,
  pheno,
  verbose = FALSE,
  add.jitter = FALSE,
  chosen.mean = 1,
  chosen.sd = 0.05,
  n.per.mixture = 1,
  normalize.to.count = TRUE
)

Arguments

`included.in.X`	vector of strings, indicating types that are in the reference matrix. Only those types, and sorted in that order, will be included in the quantity matrix.
`prob.each`	numeric vector with the same length as 'included.in.X'. For each cell type in 'included.in.X', 'prob.each' holds the expected average quantity in the mixtures.
`n.samples`	integer above 0, number of samples to be drawn
`expr.data`	numeric matrix, with features as rows and samples as columns
`pheno`	named vector of strings, with pheno information ('pheno') for each sample in 'expr.data'. 'names(pheno)' must all be in 'colnames(expr.data)'
`verbose`	logical, should information be printed to console?
`add.jitter`	logical, should each mixture be multiplied with a vector of normally distributed numbers? (JITTER)
`chosen.mean`	float, mean of jitter
`chosen.sd`	float, standard deviation of jitter
`n.per.mixture`	integer above 0 and below ncol(expr.data), how many samples should be included per mixture
`normalize.to.count`	logical, normalize each mixture?
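The effect of 'add.jitter', 'chosen.mean', 'chosen.sd' and 'normalize.to.count' on a single mixture can be sketched as follows. This is a hedged illustration in base R, not the package code: the toy profile is invented, and the normalization target ('target.count', here the number of features) is an assumption chosen for the example.

```r
# Hypothetical sketch of the jitter and normalization steps; not DTD code.
set.seed(42)

# A toy simulated bulk profile with 5 features.
mixture <- c(gene1 = 10, gene2 = 20, gene3 = 30, gene4 = 40, gene5 = 50)

chosen.mean <- 1
chosen.sd <- 0.05

# One normally distributed factor per feature, centred on chosen.mean.
jitter <- rnorm(length(mixture), mean = chosen.mean, sd = chosen.sd)
jittered <- mixture * jitter

# Assumed normalization: rescale the profile so that it sums to a fixed
# count. Using length(mixture) as the target is an illustrative choice.
target.count <- length(mixture)
normalized <- jittered * target.count / sum(jittered)
```

With 'chosen.mean = 1' and a small 'chosen.sd', each feature is perturbed by a few percent, so the jittered profile stays close to the original mixture while no two mixtures are exact linear combinations of the same reference profiles.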

Value

list with two entries. "quantities": matrix (nrow = length(included.in.X), ncol = n.samples) and "mixtures": matrix (nrow = nrow(expr.data), ncol = n.samples)

Examples

library(DTD)
random.data <- generate_random_data(
      n.types = 10,
      n.samples.per.type = 10,
      n.features = 500,
      sample.type = "Cell",
      feature.type = "gene"
      )

# normalize all samples to the same amount of counts:
random.data <- normalize_to_count(random.data)

# extract the indicator vector:
# its values are the sample types, its names are the sample names
indicator.list <- gsub("^Cell[0-9]*\\.", "", colnames(random.data))
names(indicator.list) <- colnames(random.data)

# First, decide which cells should be deconvoluted.
included.in.X <- c("Type2", "Type3", "Type4", "Type5")

training.data <- mix_samples_with_jitter(
    included.in.X = included.in.X
    , prob.each = c(2,1,1,1)
    , n.samples = 1e3
    , expr.data = random.data
    , pheno = indicator.list
    , add.jitter = TRUE
    , chosen.mean = 1
    , chosen.sd = 0.05
)

# see the effect of "Type2" having higher 'prob.each' entry:
apply(training.data$quantities, 1, mean)

MarianSchoen/DTD documentation built on April 29, 2022, 1:59 p.m.