simulate_data: simulate_data

View source: R/simulate_data.R

simulate_data    R Documentation

simulate_data

Description

Simulate LC-MS/MS data, or any other spectra, from SDF database files.

Usage

simulate_data(
  db_ids = NULL,
  compound_names = NULL,
  xls_file_name = NULL,
  valid_sdf_file = NULL,
  nbatch = 3,
  nsamps_per_batch = c(100, 200, 300),
  QC_freq = c(25, 25, 25),
  multiplyer = 100,
  seed = 123,
  sim_sd = NULL,
  m_eff = 1.25,
  b_eff_pos = 1.25,
  b_eff_neg = 0.75,
  save_rds = FALSE,
  rds_name = NULL
)

Arguments

db_ids

character() ID(s) of compounds to be read from a validated sdf file.

compound_names

character() Name(s) of the compound(s) to be queried. Matching is not case sensitive, but only compounds whose names begin with the string entered are matched.

xls_file_name

character(1) Index file in .xls format.
This file can be produced using sdf2Index.

valid_sdf_file

character(1) sdf file to be read.
This file can also be produced using sdf2Index.

nbatch

numeric() Number of batches to simulate.

nsamps_per_batch

numeric() A numeric vector of length nbatch giving the number of samples in each simulated batch.

QC_freq

numeric() A numeric vector of length nbatch giving the frequency of QC samples in each batch.

multiplyer

numeric(1) Value by which the exact m/z value from the MoNA database is multiplied to obtain an area measure (default is 1e2).

seed

numeric() The seed to be used for reproducibility.

sim_sd

numeric() Standard deviation of the simulated data. Defaults to 0.25 times the m/z value after the multiplier has been applied.

m_eff

numeric() Monotonic effect (slope). Should be >= 1.

b_eff_pos

numeric() Batch effect in the positive direction (between-batch mean differences). Should be >= 1 (default is 1.25).

b_eff_neg

numeric() Batch effect in the negative direction (between-batch mean differences). Should be <= 1 (default is 0.75).

save_rds

logical(1) Whether to write the result to an .rds file (default is FALSE).

rds_name

character(1) Name of the .rds file to write when save_rds is TRUE (save_rds defaults to FALSE).
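The per-batch arguments and the default standard deviation described above can be checked by hand before calling simulate_data. The following is a minimal base R sketch, not part of pseudoDrift: check_batch_args is a hypothetical helper, and the m/z value is made up for illustration.

```r
# Hypothetical helper (not a pseudoDrift function): verifies that the
# per-batch vectors have exactly one entry per batch.
check_batch_args <- function(nbatch, nsamps_per_batch, QC_freq) {
  stopifnot(
    is.numeric(nbatch), length(nbatch) == 1,
    length(nsamps_per_batch) == nbatch,
    length(QC_freq) == nbatch
  )
  invisible(TRUE)
}

# The documented defaults pass the check.
check_batch_args(nbatch = 3,
                 nsamps_per_batch = c(100, 200, 300),
                 QC_freq = c(25, 25, 25))

# Default standard deviation as described for sim_sd:
# 0.25 times the m/z value after the multiplier has been applied.
mz         <- 330.07   # assumed exact m/z, for illustration only
multiplyer <- 100      # documented default
area       <- mz * multiplyer
default_sd <- 0.25 * area
default_sd
```

A mismatch such as nbatch = 3 with a two-element nsamps_per_batch stops with an error in this sketch; how simulate_data itself reacts to such input is not stated on this page.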

Value

A nested tibble. The first column is the db id of the compounds simulated. The second column is the simulated matrix. The following four columns represent four batch simulation scenarios, which are:

  • 1) t1_sim_mat - a monotonic (up/down) effect by batch.

  • 2) t2_sim_mat - a batch to batch block effect.

  • 3) t3_sim_mat - random, no systematic change.

  • 4) t4_sim_mat - monotonic and batch to batch block effect.
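To make the four scenarios listed above concrete, here is a rough base R sketch of one possible reading of them. It is an illustration under stated assumptions, not the package's implementation: the baseline area, the within-batch linear drift, and the way m_eff, b_eff_pos and b_eff_neg enter as multiplicative factors are all assumed here.

```r
set.seed(123)

nsamps_per_batch <- c(100, 200, 300)   # documented defaults
batch <- rep(seq_along(nsamps_per_batch), times = nsamps_per_batch)
n     <- length(batch)

base_area <- 33007                     # assumed m/z * multiplyer value
noise_sd  <- 0.25 * base_area          # documented default for sim_sd

m_eff     <- 1.25                      # documented defaults
b_eff_pos <- 1.25
b_eff_neg <- 0.75

# t3-like: random variation around a constant mean, no systematic change.
random <- rnorm(n, mean = base_area, sd = noise_sd)

# Assumed drift: runs linearly from 1 to m_eff within each batch.
pos_in_batch <- unlist(lapply(nsamps_per_batch,
                              function(k) seq(0, 1, length.out = k)))
drift <- 1 + (m_eff - 1) * pos_in_batch

# Assumed block factors: batch 1 unchanged, batch 2 up, batch 3 down.
block_factor <- c(1, b_eff_pos, b_eff_neg)[batch]

monotonic <- random * drift                 # t1-like: monotonic effect
block     <- random * block_factor          # t2-like: batch-to-batch block effect
both      <- random * drift * block_factor  # t4-like: both effects combined

sim <- data.frame(batch, random, monotonic, block, both)

# Per-batch means make the differences between scenarios visible.
aggregate(cbind(random, monotonic, block, both) ~ batch, data = sim, FUN = mean)
```

In this sketch the random column has roughly equal means across batches, while the block column shifts batch 2 up and batch 3 down by the assumed factors.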

Examples

sim1 <- simulate_data(compound_names = "tricin",
                      xls_file_name = system.file("extdata", "Index.xls", package = "pseudoDrift"),
                      valid_sdf_file = system.file("extdata", "valid-test.sdf", package = "pseudoDrift"))

sim2 <- simulate_data(db_ids = c("FIO00738", "FIO00739", "FIO00740"),
                      xls_file_name = system.file("extdata", "Index.xls", package = "pseudoDrift"),
                      valid_sdf_file = system.file("extdata", "valid-test.sdf", package = "pseudoDrift"))

## Providing compound names or compound db IDs gives the same result
identical(sim1, sim2)

jrod55/pseudoDrift documentation built on April 6, 2024, 5:23 a.m.