View source: R/function_mix_with_Jitter.R
| mix_samples_with_jitter | R Documentation |
'mix_samples_with_jitter' takes an expression matrix ('expr.data')
and 'pheno' information.
It then mixes the samples in known quantities, so that the result can be used
for loss-function learning digital tissue deconvolution.
For each mixture, it randomly selects a quantity for each cell type. Then, it
randomly selects profiles from 'expr.data' for each cell type, multiplies them
by the respective quantity, and averages them into a simulated bulk profile.
Note that in the mixtures, the frequency of a cell type is determined by the
'prob.each' vector, and not by its occurrence in 'pheno'. Alternatively,
the 'mix_samples' function reflects the underlying composition
of the data set.
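The mixing procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the DTD implementation: the helper 'mix_one_sketch' and the toy data are hypothetical, quantities are assumed to be uniform draws weighted by 'prob.each' and rescaled to sum to 1, and one profile is drawn per cell type.

```r
# Illustrative sketch only; NOT the code used inside DTD.
set.seed(1)

# Toy expression matrix: 5 features x 6 samples, two samples per type.
expr.data <- matrix(
  runif(30, min = 0, max = 10),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("Cell", 1:6))
)
pheno <- c("Type1", "Type1", "Type2", "Type2", "Type3", "Type3")
names(pheno) <- colnames(expr.data)

included.in.X <- c("Type1", "Type2", "Type3")
prob.each <- c(2, 1, 1)

# Hypothetical helper: builds one simulated bulk profile.
mix_one_sketch <- function(expr.data, pheno, included.in.X, prob.each) {
  # Assumption: draw a random quantity per cell type, weight it by
  # prob.each, and rescale so that the quantities sum to 1.
  quantities <- runif(length(included.in.X)) * prob.each
  quantities <- quantities / sum(quantities)
  names(quantities) <- included.in.X

  mixture <- numeric(nrow(expr.data))
  for (type in included.in.X) {
    # Pick one random profile of this cell type.
    candidates <- names(pheno)[pheno == type]
    chosen <- sample(candidates, size = 1)
    # Weight the profile by its quantity and accumulate.
    mixture <- mixture + quantities[[type]] * expr.data[, chosen]
  }
  names(mixture) <- rownames(expr.data)
  list(quantities = quantities, mixture = mixture)
}

result <- mix_one_sketch(expr.data, pheno, included.in.X, prob.each)
```

Because the quantities sum to 1 in this sketch, the accumulated profile is a weighted average of the selected profiles. Repeating the call 'n.samples' times and binding the results column-wise would give matrices shaped like the function's return value.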
mix_samples_with_jitter(
  included.in.X,
  prob.each = NA,
  n.samples,
  expr.data,
  pheno,
  verbose = FALSE,
  add.jitter = FALSE,
  chosen.mean = 1,
  chosen.sd = 0.05,
  n.per.mixture = 1,
  normalize.to.count = TRUE
)
included.in.X |
vector of strings, indicating types that are in the reference matrix. Only those types, and sorted in that order, will be included in the quantity matrix. |
prob.each |
numeric vector with the same length as 'included.in.X'. For each cell type in 'included.in.X', 'prob.each' holds the expected average quantity in the mixtures. |
n.samples |
integer above 0, number of samples to be drawn |
expr.data |
numeric matrix, with features as rows and samples as columns |
pheno |
named vector of strings, with pheno information ('pheno') for each sample in 'expr.data'. 'names(pheno)' must all be in 'colnames(expr.data)' |
verbose |
logical, should information be printed to console? |
add.jitter |
logical, should each mixture be multiplied with a vector of normally distributed numbers? (JITTER) |
chosen.mean |
float, mean of jitter |
chosen.sd |
float, standard deviation of jitter |
n.per.mixture |
integer above 0 and below 'ncol(expr.data)', how many samples should be included per mixture |
normalize.to.count |
logical, normalize each mixture? |
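The effect of 'add.jitter', 'chosen.mean' and 'chosen.sd' on a single mixture can be sketched as follows. This is a minimal illustration, not the DTD implementation: the toy profile is hypothetical, and the normalization step assumes that each mixture is rescaled to a fixed total count (here, the number of features), which may differ from the package's convention.

```r
# Illustrative sketch only; NOT the code used inside DTD.
set.seed(2)

# Toy bulk profile with 4 features (hypothetical values).
mixture <- c(gene1 = 10, gene2 = 20, gene3 = 30, gene4 = 40)

chosen.mean <- 1
chosen.sd <- 0.05

# Jitter: one normally distributed factor per feature,
# multiplied element-wise with the mixture.
jitter.vec <- rnorm(length(mixture), mean = chosen.mean, sd = chosen.sd)
jittered <- mixture * jitter.vec

# Assumed normalization: rescale so that the profile sums to a fixed
# count (the number of features in this sketch).
normalized <- jittered / sum(jittered) * length(jittered)
```

With 'chosen.mean = 1' and a small 'chosen.sd', each feature is perturbed by only a few percent, so the jittered profile stays close to the original mixture.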
A list with two entries. "quantities": matrix (nrow = length(included.in.X), ncol = n.samples) and "mixtures": matrix (nrow = nrow(expr.data), ncol = n.samples)
library(DTD)
random.data <- generate_random_data(
  n.types = 10,
  n.samples.per.type = 10,
  n.features = 500,
  sample.type = "Cell",
  feature.type = "gene"
)
# normalize all samples to the same amount of counts:
random.data <- normalize_to_count(random.data)
# extract the indicator list:
# its values are the sample types, and its names are the sample names
indicator.list <- gsub("^Cell[0-9]*\\.", "", colnames(random.data))
names(indicator.list) <- colnames(random.data)
# First, decide which cells should be deconvoluted.
included.in.X <- c("Type2", "Type3", "Type4", "Type5")
training.data <- mix_samples_with_jitter(
  included.in.X = included.in.X,
  prob.each = c(2, 1, 1, 1),
  n.samples = 1e3,
  expr.data = random.data,
  pheno = indicator.list,
  add.jitter = TRUE,
  chosen.mean = 1,
  chosen.sd = 0.05
)
# see the effect of "Type2" having higher 'prob.each' entry:
apply(training.data$quantities, 1, mean)