generate_synthetic_data: Generate a data set according to the probabilistic dropout...


View source: R/synthetic_data.R

Description

Specify the number of rows in the dataset, the number of conditions and replicates, how many proteins have a different mean, and a few additional hyperparameters to get a synthetic dataset that is similar to data from a real label-free mass spectrometry experiment.

Usage

generate_synthetic_data(n_rows, experimental_design = NULL,
  n_replicates = as.numeric(table(experimental_design)),
  n_conditions = length(n_replicates), frac_changed = 0.1,
  n_changed = round(n_rows * min(1, frac_changed)), mu0 = 20,
  sigma20 = 10, nu = 3, eta = 0.3, rho = rep(18, times = if
  (length(n_replicates) == 1) n_replicates * n_conditions else
  sum(n_replicates)), zeta = rep(-1, times = if (length(n_replicates) ==
  1) n_replicates * n_conditions else sum(n_replicates)))
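Several defaults in the signature are derived from other arguments. The following base-R sketch evaluates the same expressions that appear in the signature for one illustrative design; the concrete values (three conditions, four samples each, ten rows) are chosen only for demonstration.

```r
# Illustrative design: three conditions with four samples each
experimental_design <- rep(letters[1:3], each = 4)
n_rows <- 10
frac_changed <- 0.1

# Same expressions as the defaults in the signature
n_replicates <- as.numeric(table(experimental_design))
n_conditions <- length(n_replicates)
n_changed <- round(n_rows * min(1, frac_changed))

# rho and zeta default to one value per sample
n_samples <- if (length(n_replicates) == 1) {
  n_replicates * n_conditions
} else {
  sum(n_replicates)
}
rho <- rep(18, times = n_samples)
zeta <- rep(-1, times = n_samples)
```

Here 'n_replicates' has one entry per condition, so 'rho' and 'zeta' end up with one entry per sample.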

Arguments

n_rows

integer. The number of rows in the new dataset

experimental_design

a vector that specifies which samples belong to the same condition. Default: 'NULL' in which case 'n_replicates' must be specified

n_replicates

integer or vector. The number of replicates in each condition.

n_conditions

The number of conditions. Setting 'n_replicates=3' and 'n_conditions=2' is equal to specifying 'experimental_design=c(1,1,1,2,2,2)'.

frac_changed

the fraction of rows for which different means are drawn for each condition.

n_changed

alternative way to specify how many rows have different means in each condition.

mu0

the global mean around which the row means are drawn. Default '20'

sigma20

the global variance specifying the spread of means around 'mu0'. Default '10'.

nu

degrees of freedom for the global variance prior. Default '3'.

eta

scale of the global variance prior. Default '0.3'.

rho

vector specifying the intensity where the chance of a dropout is 50/50. Either length one or same length as 'n_replicates * n_conditions' or 'length(experimental_design)' respectively. Default '18'.

zeta

vector specifying the scale of the dropout curve. Either length one or same length as 'n_replicates * n_conditions' or 'length(experimental_design)' respectively. Default '-1'.
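To make the roles of 'rho' and 'zeta' concrete, here is a minimal sketch of a sigmoidal dropout curve. It assumes an inverse-probit shape, pnorm((x - rho) / zeta), which this page does not confirm, and 'dropout_prob' is a hypothetical helper, not a function of the package. Under that assumption a negative 'zeta' makes the dropout probability fall as the intensity rises, and the probability is 0.5 at x = rho.

```r
# Hypothetical helper (not part of proDD): assumed inverse-probit dropout curve
dropout_prob <- function(x, rho = 18, zeta = -1) {
  pnorm((x - rho) / zeta)
}

# Dropout probability at a few intensities around rho
intensities <- c(15, 17, 18, 19, 21)
p <- dropout_prob(intensities)

# Apply the curve to a toy vector of true intensities:
# each value is replaced by NA with its dropout probability
set.seed(1)
true_x <- rnorm(1000, mean = 20, sd = sqrt(10))
observed <- ifelse(runif(1000) < dropout_prob(true_x), NA, true_x)
```

A larger absolute value of 'zeta' flattens the curve, so dropouts become less tightly tied to low intensities.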

Value

a list with 5 elements

X

the data matrix with missing values

t_X

the true data matrix, before data dropped out

mus

matrix of size 'n_rows * n_conditions'. The true means for each condition

sigmas2

a vector of size 'n_rows'. The true variance for each row.

changed

a logical vector of size 'n_rows' that indicates whether a row has different means for each condition
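As a sketch of how these elements relate to each other, the snippet below builds a small hand-made list with the same element names and derives the dropout mask and per-row missingness from it. All numbers are made up for illustration and are not output of the function.

```r
# Hand-made stand-in for the returned list; values are illustrative only
t_X <- matrix(c(21, 20, 17, 16,
                19, 22, 23, 24), nrow = 2, byrow = TRUE)
X <- t_X
X[1, 3:4] <- NA  # pretend the two lowest intensities dropped out

res <- list(
  X = X,
  t_X = t_X,
  mus = matrix(c(20.5, 16.5,
                 20.5, 23.5), nrow = 2, byrow = TRUE),
  sigmas2 = c(0.5, 1.2),
  changed = c(TRUE, TRUE)
)

dropped <- is.na(res$X)            # entries that were removed
frac_missing <- rowMeans(dropped)  # share of missing values per row

# Entries that were not dropped are identical in X and t_X
same <- all(res$X[!dropped] == res$t_X[!dropped])
```

Comparing 'X' with 't_X' in this way is what makes the synthetic data useful for benchmarking: the values hidden by dropout are still available for evaluation.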

Examples

 data <- generate_synthetic_data(n_rows=10,
                n_replicates=3, n_conditions=2)

 data2 <- generate_synthetic_data(n_rows=10,
                experimental_design=c(1,1,1,2,2,2))

 data3 <- generate_synthetic_data(n_rows=10,
                 experimental_design=rep(letters[1:3], each=4))

const-ae/proDD documentation built on Jan. 14, 2020, 9:34 a.m.