View source: R/function_mix_with_Jitter.R
mix_samples_with_jitter | R Documentation |
'mix_samples_with_jitter' takes an expression matrix ('expr.data')
and 'pheno' information.
It then mixes the samples in known quantities, so that the result can be used for
loss-function learning digital tissue deconvolution.
For each mixture, it randomly selects a quantity for each cell type. Then, it
randomly selects profiles from 'expr.data' for each cell type, multiplies them
by the respective quantity, and averages them into a simulated bulk profile.
Note that, in the mixtures, the frequency of a cell type is determined by the
'prob.each' vector, and not by its occurrence in 'pheno'. Alternatively,
the 'mix_samples' function reflects the underlying composition
of the data set.
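The mixing procedure described above can be sketched in base R. This is a minimal illustration and not the package's implementation: the helper 'simulate_one_mixture' and the toy data are hypothetical, the quantities are assumed to be uniform draws weighted by 'prob.each' and rescaled to sum to one, and only the types in 'included.in.X' are assumed to be mixed.

```r
set.seed(1)

# Toy expression matrix: 5 features x 6 samples, two samples per cell type.
expr.data <- matrix(
  rexp(5 * 6),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("Cell", 1:6))
)
pheno <- setNames(
  rep(c("Type1", "Type2", "Type3"), each = 2),
  colnames(expr.data)
)

included.in.X <- c("Type1", "Type2")
prob.each <- c(2, 1)

# Hypothetical helper, not part of DTD: simulate a single bulk profile.
simulate_one_mixture <- function(expr.data, pheno, included.in.X, prob.each) {
  # Assumption: uniform draws, weighted by 'prob.each', rescaled to sum to one.
  quantities <- runif(length(included.in.X)) * prob.each
  quantities <- quantities / sum(quantities)
  names(quantities) <- included.in.X

  mixture <- numeric(nrow(expr.data))
  for (type in included.in.X) {
    # Pick one random profile of this cell type ...
    candidates <- names(pheno)[pheno == type]
    chosen <- candidates[sample.int(length(candidates), 1)]
    # ... and add it, weighted by the quantity drawn for that type.
    mixture <- mixture + quantities[[type]] * expr.data[, chosen]
  }
  names(mixture) <- rownames(expr.data)
  list(quantities = quantities, mixture = mixture)
}

one <- simulate_one_mixture(expr.data, pheno, included.in.X, prob.each)
```

Because the quantities sum to one, the weighted sum is a weighted average of the selected profiles. Repeating the call 'n.samples' times and binding the results column-wise would give matrices shaped like the ones this function returns.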
mix_samples_with_jitter(
  included.in.X,
  prob.each = NA,
  n.samples,
  expr.data,
  pheno,
  verbose = FALSE,
  add.jitter = FALSE,
  chosen.mean = 1,
  chosen.sd = 0.05,
  n.per.mixture = 1,
  normalize.to.count = TRUE
)
included.in.X |
vector of strings, indicating types that are in the reference matrix. Only those types, and sorted in that order, will be included in the quantity matrix. |
prob.each |
numeric vector with the same length as 'included.in.X'. For each cell type in 'included.in.X', 'prob.each' holds the expected average quantity in the mixtures. |
n.samples |
integer above 0, number of samples to be drawn |
expr.data |
numeric matrix, with features as rows and samples as columns |
pheno |
named vector of strings, with pheno information ('pheno') for each sample in 'expr.data'. 'names(pheno)' must all be in 'colnames(expr.data)' |
verbose |
logical, should information be printed to console? |
add.jitter |
logical, should each mixture be multiplied with a vector of normally distributed numbers? (JITTER) |
chosen.mean |
float, mean of jitter |
chosen.sd |
float, standard deviation of jitter |
n.per.mixture |
integer above 0 and below 'ncol(expr.data)', how many samples should be included per mixture |
normalize.to.count |
logical, normalize each mixture? |
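To illustrate what 'add.jitter', 'chosen.mean', 'chosen.sd' and 'normalize.to.count' control, here is a minimal base-R sketch on a toy profile. It rests on two assumptions that the documentation does not confirm: that the jitter is one normally distributed factor per feature, multiplied elementwise, and that normalization rescales the profile to a fixed total. The name 'target.count' is hypothetical.

```r
set.seed(42)

# Toy bulk profile with 4 features (illustrative values only).
mixture <- c(gene1 = 10, gene2 = 20, gene3 = 30, gene4 = 40)

chosen.mean <- 1
chosen.sd <- 0.05

# Assumption: one normally distributed factor per feature,
# multiplied elementwise with the mixture.
jitter <- rnorm(length(mixture), mean = chosen.mean, sd = chosen.sd)
jittered <- mixture * jitter

# Assumption: normalizing rescales the profile to a fixed total.
# 'target.count' is a hypothetical name; here it is the number of features.
target.count <- length(jittered)
normalized <- jittered * target.count / sum(jittered)
```

With 'chosen.mean = 1' and a small 'chosen.sd', each feature is perturbed by a few percent, so the jittered profile stays close to the original while no two mixtures are exact linear combinations of the reference profiles.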
list with two entries. "quantities": matrix with one row per cell type in 'included.in.X' and one column per mixture (ncol = n.samples); "mixtures": matrix (nrow = nrow(expr.data), ncol = n.samples)
library(DTD)

random.data <- generate_random_data(
  n.types = 10,
  n.samples.per.type = 10,
  n.features = 500,
  sample.type = "Cell",
  feature.type = "gene"
)

# normalize all samples to the same amount of counts:
random.data <- normalize_to_count(random.data)

# extract indicator list.
# This list contains the type of the sample as value, and the sample name as names
indicator.list <- gsub("^Cell[0-9]*\\.", "", colnames(random.data))
names(indicator.list) <- colnames(random.data)

# First, decide which cells should be deconvoluted.
included.in.X <- c("Type2", "Type3", "Type4", "Type5")

training.data <- mix_samples_with_jitter(
  included.in.X = included.in.X,
  prob.each = c(2, 1, 1, 1),
  n.samples = 1e3,
  expr.data = random.data,
  pheno = indicator.list,
  add.jitter = TRUE,
  chosen.mean = 1,
  chosen.sd = 0.05
)

# see the effect of "Type2" having higher 'prob.each' entry:
apply(training.data$quantities, 1, mean)
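Before calling the function on your own data, it may help to check the constraints listed in the argument descriptions. The following base-R sketch does this on toy data. The helper 'check_mix_inputs' is hypothetical and not part of DTD, and the check that every entry of 'included.in.X' occurs in 'pheno' is an assumption rather than a documented requirement.

```r
# Hypothetical helper, not part of DTD: validate inputs against the
# constraints stated in the argument descriptions.
check_mix_inputs <- function(expr.data, pheno, included.in.X,
                             prob.each = NA, n.samples, n.per.mixture = 1) {
  stopifnot(
    is.matrix(expr.data), is.numeric(expr.data),
    # 'pheno' must be named, and all names must be sample names
    !is.null(names(pheno)),
    all(names(pheno) %in% colnames(expr.data)),
    # Assumption: every requested type should occur in 'pheno'
    all(included.in.X %in% pheno),
    n.samples > 0,
    n.per.mixture > 0, n.per.mixture < ncol(expr.data)
  )
  # 'prob.each' defaults to NA; if given, it needs one entry per type
  if (!all(is.na(prob.each))) {
    stopifnot(
      is.numeric(prob.each),
      length(prob.each) == length(included.in.X)
    )
  }
  invisible(TRUE)
}

# Toy data: 3 features x 4 samples, two samples per cell type.
expr.data <- matrix(
  as.numeric(1:12),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("Cell", 1:4))
)
pheno <- setNames(c("Type1", "Type1", "Type2", "Type2"), colnames(expr.data))

ok <- check_mix_inputs(
  expr.data, pheno,
  included.in.X = c("Type1", "Type2"),
  prob.each = c(2, 1),
  n.samples = 10
)
```

The helper stops with an error at the first violated condition, for example when 'pheno' has no names.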