View source: R/create_synthetic_data.R
create_synthetic_data | R Documentation |
This function creates a synthetic limited proteolysis proteomics dataset that can be used to test functions while knowing the ground truth.
create_synthetic_data(
  n_proteins,
  frac_change,
  n_replicates,
  n_conditions,
  method = "effect_random",
  concentrations = NULL,
  median_offset_sd = 0.05,
  mean_protein_intensity = 16.88,
  sd_protein_intensity = 1.4,
  mean_n_peptides = 12.75,
  size_n_peptides = 0.9,
  mean_sd_peptides = 1.7,
  sd_sd_peptides = 0.75,
  mean_log_replicates = -2.2,
  sd_log_replicates = 1.05,
  effect_sd = 2,
  dropout_curve_inflection = 14,
  dropout_curve_sd = -1.2,
  additional_metadata = TRUE
)
n_proteins |
a numeric value that specifies the number of proteins in the synthetic dataset. |
frac_change |
a numeric value that specifies the fraction of proteins that have a peptide changing in abundance. Currently, only one peptide per protein changes. |
n_replicates |
a numeric value that specifies the number of replicates per condition. |
n_conditions |
a numeric value that specifies the number of conditions. |
method |
a character value that specifies the method type for the random sampling of
significantly changing peptides. If |
concentrations |
a numeric vector of length equal to the number of conditions, only needs
to be specified if |
median_offset_sd |
a numeric value that specifies the standard deviation of the normal distribution that is used to sample inter-sample differences. Default: 0.05. |
mean_protein_intensity |
a numeric value that specifies the mean of the protein intensity distribution. Default: 16.88. |
sd_protein_intensity |
a numeric value that specifies the standard deviation of the protein intensity distribution. Default: 1.4. |
mean_n_peptides |
a numeric value that specifies the mean number of peptides per protein. Default: 12.75. |
size_n_peptides |
a numeric value that specifies the dispersion parameter (the shape
parameter of the gamma mixing distribution). Can be theoretically calculated as
|
mean_sd_peptides |
a numeric value that specifies the mean of peptide intensity standard deviations within a protein. Default: 1.7. |
sd_sd_peptides |
a numeric value that specifies the standard deviation of the peptide intensity standard deviations within a protein. Default: 0.75. |
mean_log_replicates, sd_log_replicates |
a numeric value that specifies the |
effect_sd |
a numeric value that specifies the standard deviation of a normal distribution
around |
dropout_curve_inflection |
a numeric value that specifies the intensity inflection point of a probabilistic dropout curve that is used to sample intensity-dependent missing values. This argument determines how many missing values the dataset contains. Default: 14. |
dropout_curve_sd |
a numeric value that specifies the standard deviation of the probabilistic dropout curve. It must be negative so that dropout is sampled towards low intensities. Default: -1.2. |
additional_metadata |
a logical value that determines if metadata such as protein coverage, missed cleavages and charge state should be sampled and added to the list. |
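To illustrate how dropout_curve_inflection and dropout_curve_sd could interact, here is a minimal sketch of a probabilistic dropout curve. It assumes the curve is a normal cumulative distribution centred on the inflection point, and that a negative standard deviation simply flips the curve so that low intensities are more likely to go missing. The helper dropout_probability() is hypothetical and is not part of the package; the exact curve used inside create_synthetic_data() may differ.

```r
# Hypothetical helper (not a package function): probability that an
# intensity value is dropped, assuming a normal-CDF-shaped dropout curve.
dropout_probability <- function(intensity, inflection = 14, curve_sd = -1.2) {
  # Assumption: the sign of curve_sd only sets the direction of the curve.
  # A negative value makes dropout more likely at low intensities.
  stats::pnorm(
    intensity,
    mean = inflection,
    sd = abs(curve_sd),
    lower.tail = curve_sd > 0
  )
}

set.seed(1)
# Simulated intensities drawn with the documented protein intensity defaults
intensity <- stats::rnorm(1000, mean = 16.88, sd = 1.4)

# Sample missingness per value from its dropout probability
p_dropout <- dropout_probability(intensity)
is_missing <- stats::runif(length(intensity)) < p_dropout
intensity_missing <- ifelse(is_missing, NA_real_, intensity)

# Fraction of values that were dropped
mean(is_missing)
```

Under these assumptions, moving the inflection point up increases the share of missing values, while a larger absolute standard deviation makes the transition between "mostly missing" and "mostly observed" more gradual.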
A data frame that contains complete peptide intensities as well as peptide intensities with missing values that were created based on a probabilistic dropout curve.
create_synthetic_data(
  n_proteins = 10,
  frac_change = 0.1,
  n_replicates = 3,
  n_conditions = 2
)

# determination of mean_n_peptides and size_n_peptides parameters based on real data (count)
# example peptide count per protein
count <- c(6, 3, 2, 0, 1, 0, 1, 2, 2, 0)
theta <- c(mu = 1, k = 1)
negbinom <- function(theta) {
  -sum(stats::dnbinom(count, mu = theta[1], size = theta[2], log = TRUE))
}
fit <- stats::optim(theta, negbinom)
fit

# determination of mean_log_replicates and sd_log_replicates parameters
# based on real data (standard_deviations)
# example standard deviations of replicates
standard_deviations <- c(0.61, 0.54, 0.2, 1.2, 0.8, 0.3, 0.2, 0.6)
theta2 <- c(meanlog = 1, sdlog = 1)
lognorm <- function(theta2) {
  -sum(stats::dlnorm(standard_deviations, meanlog = theta2[1], sdlog = theta2[2], log = TRUE))
}
fit2 <- stats::optim(theta2, lognorm)
fit2
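As a cross-check for the log-normal fit in the example, the maximum likelihood estimates of a log-normal distribution also have a closed form: the mean of the log-transformed values, and the standard deviation of the log-transformed values computed with a denominator of n (not n - 1). The sketch below reuses the example standard deviations; the names closed_form and neg_log_lik are illustrative and not part of the package.

```r
# Example standard deviations of replicates (same values as in the example)
standard_deviations <- c(0.61, 0.54, 0.2, 1.2, 0.8, 0.3, 0.2, 0.6)

# Closed-form maximum likelihood estimates for a log-normal distribution
log_sd <- log(standard_deviations)
meanlog_hat <- mean(log_sd)
sdlog_hat <- sqrt(mean((log_sd - meanlog_hat)^2)) # denominator n, not n - 1

closed_form <- c(meanlog = meanlog_hat, sdlog = sdlog_hat)
closed_form

# Negative log-likelihood, as minimised numerically in the example
neg_log_lik <- function(par) {
  -sum(stats::dlnorm(standard_deviations, meanlog = par[1], sdlog = par[2], log = TRUE))
}

# The closed-form estimates should give the smallest negative log-likelihood
neg_log_lik(closed_form)
```

A numerical optimiser such as stats::optim should land close to these values. A large discrepancy would suggest that the optimiser did not converge or started too far from the optimum.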