generate_synthetic_data: Generate a dataset according to the probabilistic dropout...

View source: R/generate_synthetic_data.R

generate_synthetic_data    R Documentation

Generate a dataset according to the probabilistic dropout model

Description

Generate a synthetic dataset of protein intensities, including missing values, according to the probabilistic dropout model.

Usage

generate_synthetic_data(
  n_proteins,
  n_conditions = 2,
  n_replicates = 3,
  frac_changed = 0.1,
  dropout_curve_position = 18.5,
  dropout_curve_scale = -1.2,
  location_prior_mean = 20,
  location_prior_scale = 4,
  variance_prior_scale = 0.05,
  variance_prior_df = 2,
  effect_size = 2,
  return_summarized_experiment = FALSE
)

Arguments

n_proteins

the number of rows in the dataset

n_conditions

the number of conditions. Default: 2

n_replicates

the number of replicates per condition. Can either be a single number or a vector with length(n_replicates) == n_conditions. Default: 3

frac_changed

the fraction of proteins that actually differ between the conditions. Default: 0.1

dropout_curve_position

the intensity at which the chance to observe a value is 50%. Can be a single number or a vector with length(dropout_curve_position) == n_conditions * n_replicates. Default: 18.5

dropout_curve_scale

the width of the dropout curve. Negative numbers mean that lower intensities are more likely to be missing. Can be a single number or a vector with length(dropout_curve_scale) == n_conditions * n_replicates. Default: -1.2

location_prior_mean, location_prior_scale

the position around which the individual condition means (t_mu) scatter and the variance of that scatter. Defaults: 20 and 4

variance_prior_scale, variance_prior_df

the scale and the degrees of freedom of the inverse Chi-squared distribution used as a prior for the variances. Default: 0.05 and 2

effect_size

the standard deviation used to draw the differing values for the frac_changed fraction of the proteins. Default: 2

return_summarized_experiment

a boolean indicating whether the function should return a SummarizedExperiment object instead of a list. Default: FALSE
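The dropout curve and variance prior arguments can be illustrated with a short base R sketch. It assumes a probit-shaped dropout curve and the standard parametrization of the scaled inverse Chi-squared distribution (df * scale divided by a Chi-squared draw). The exact forms used inside generate_synthetic_data may differ, and dropout_prob and draw_variances are hypothetical helper names that are not part of the package.

```r
# Hypothetical helper (not part of the package): probability that an
# intensity x is missing, ASSUMING a probit-shaped dropout curve.
dropout_prob <- function(x, position = 18.5, scale = -1.2) {
  pnorm((x - position) / scale)
}

dropout_prob(18.5)             # 0.5 at the curve position
dropout_prob(c(16, 18.5, 21))  # decreases with intensity because scale < 0

# Hypothetical helper: draw variances from a scaled inverse Chi-squared
# distribution, ASSUMING the parametrization df * scale / chisq(df).
draw_variances <- function(n, scale = 0.05, df = 2) {
  df * scale / rchisq(n, df = df)
}

set.seed(1)
sigma2 <- draw_variances(5)
sigma2
```

With a negative scale the curve falls as the intensity rises, which matches the description of dropout_curve_scale above.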

Value

a list with the following elements

Y

the intensity matrix including the missing values

Z

the intensity matrix before dropping out values

t_mu

a matrix with n_proteins rows and n_conditions columns that contains the underlying means for each protein

t_sigma2

a vector with the true variances for each protein

changed

a boolean vector indicating whether each protein is actually changed

group

the group structure mapping samples to conditions

The list is returned if return_summarized_experiment is FALSE. Otherwise, a SummarizedExperiment with the same information is returned.
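To make the relationship between the returned elements concrete, the following base R sketch simulates a small dataset by hand. It is a simplified illustration under several assumptions (a normal prior for the means with location_prior_scale treated as a variance, a probit-shaped dropout curve, missing values encoded as NA, changed proteins placed first), not the package's implementation. simulate_sketch is a hypothetical name.

```r
# Hypothetical, simplified re-creation of the generative steps.
# All distributional choices below are ASSUMPTIONS for illustration.
simulate_sketch <- function(n_proteins, n_conditions = 2, n_replicates = 3,
                            frac_changed = 0.1, position = 18.5, scale = -1.2) {
  n_samples <- n_conditions * n_replicates
  group <- rep(seq_len(n_conditions), each = n_replicates)

  # One baseline mean per protein, copied to every condition
  base_mu <- rnorm(n_proteins, mean = 20, sd = sqrt(4))
  t_mu <- matrix(base_mu, nrow = n_proteins, ncol = n_conditions)

  # A fraction of the proteins gets condition-specific shifts (sd = 2)
  changed <- seq_len(n_proteins) <= round(frac_changed * n_proteins)
  shift <- rnorm(sum(changed) * n_conditions, sd = 2)
  t_mu[changed, ] <- t_mu[changed, , drop = FALSE] + shift

  # Per-protein variances: scaled inverse Chi-squared (df = 2, scale = 0.05)
  t_sigma2 <- 2 * 0.05 / rchisq(n_proteins, df = 2)

  # Complete intensities Z: one column per sample around its condition mean
  noise <- matrix(rnorm(n_proteins * n_samples, sd = sqrt(t_sigma2)),
                  nrow = n_proteins)
  Z <- t_mu[, group, drop = FALSE] + noise

  # Observed intensities Y: drop values with intensity-dependent probability
  p_missing <- pnorm((Z - position) / scale)
  Y <- Z
  Y[runif(length(Z)) < p_missing] <- NA

  list(Y = Y, Z = Z, t_mu = t_mu, t_sigma2 = t_sigma2,
       changed = changed, group = group)
}

set.seed(1)
sim <- simulate_sketch(n_proteins = 20)
dim(sim$Y)          # 20 rows, 6 columns
mean(is.na(sim$Y))  # fraction of dropped-out values
```

In this sketch Y equals Z wherever a value was observed, so comparing the two shows which intensities the dropout step removed.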

Examples

  library(proDA)

  # Default output: a list with the elements described under Value
  syn_data <- generate_synthetic_data(n_proteins = 10)
  names(syn_data)
  head(syn_data$Y)

  # Returning a SummarizedExperiment
  se <- generate_synthetic_data(n_proteins = 10, return_summarized_experiment = TRUE)
  se
  head(SummarizedExperiment::assay(se))


const-ae/proDA documentation built on Oct. 31, 2023, 9:39 p.m.