
simulate_data

Simulate Multi-omic Data

Description

Simulates multi-omic data with N common samples across D omics datasets, each with P features. Other parameters, such as the signal-to-noise level and the response family, can be set through the arguments described below.
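
To make the N, D, and P layout concrete, here is a minimal base R sketch of D matrices that share the same N samples and have P features each. It does not use SPEAR and is not how `simulate_data` generates its values. The list-of-matrices layout, the names `sample_ids`, `X_list`, and `X_all`, and the pure-noise entries are all assumptions made for illustration.

```r
# Illustrative only: pure-noise matrices with the documented dimensions.
set.seed(123)
N <- 100  # samples shared by every dataset
D <- 3    # number of omics datasets
P <- 100  # features per dataset

# Hypothetical sample identifiers shared across datasets
sample_ids <- paste0("sample_", seq_len(N))

# One N x P matrix per dataset, all with the same row (sample) names
X_list <- lapply(seq_len(D), function(d) {
  matrix(
    rnorm(N * P),
    nrow = N,
    ncol = P,
    dimnames = list(sample_ids, paste0("omic", d, "_feat", seq_len(P)))
  )
})
names(X_list) <- paste0("omic", seq_len(D))

# Concatenating the datasets column-wise gives P * D features in total
X_all <- do.call(cbind, X_list)
print(dim(X_all))
```

With the default values, the combined matrix has 100 rows and P * D = 300 columns.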

Usage

```r
simulate_data(
  N = 100,
  D = 3,
  P = 100,
  c1 = 3,
  c2 = 1,
  sparsity = 0.2,
  method = "factor",
  num.factors = 5,
  family = "gaussian",
  factors.influencing.X = 2,
  factors.influencing.Y = 2,
  ordinal.centers = c(-1.2, -1, 0, 1, 1.2),
  multi.centers.x = c(-1, -1, 1, 1)/sqrt(2),
  multi.centers.y = c(-1, 1, -1, 1)/sqrt(2),
  N.test = 1000,
  seed = 123
)
```
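
As a hedged example of overriding the defaults, the sketch below requests a smaller ordinal simulation with three classes. It assumes the SPEAR package is installed and that `simulate_data` is exported from it. The structure of the returned object is not documented on this page, so the example only inspects it with `str()`. The variable names `sim_args` and `sim` are illustrative.

```r
# Arguments to override; anything not listed keeps its documented default.
sim_args <- list(
  N = 50,                         # training samples
  D = 2,                          # omics datasets
  P = 20,                         # features per dataset
  family = "ordinal",
  ordinal.centers = c(-1, 0, 1),  # one center per ordinal class (C = 3)
  N.test = 200,                   # test samples
  seed = 42
)

# Only call into SPEAR if it is available (assumption: simulate_data is exported).
if (requireNamespace("SPEAR", quietly = TRUE)) {
  sim <- do.call(SPEAR::simulate_data, sim_args)
  str(sim, max.level = 1)  # inspect the top-level structure of the result
} else {
  message("SPEAR is not installed; skipping the simulate_data() call.")
}
```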

Arguments

Argument | Description
------------- | ----------------
`N` | Number of samples (N). Defaults to 100.
`D` | Number of datasets (D). Defaults to 3.
`P` | Number of features (P) per dataset. The total number of features is P * D. Defaults to 100.
`c1` | Primary signal level. `c1 = 1` gives a lower signal-to-noise ratio, whereas a higher value (`c1 >= 3`) gives a higher one. Defaults to 3.
`c2` | Secondary signal parameter that controls the spread of the signal around the true means. `c2 = 1` (default) gives normal spread. Increase it to reduce the spread, or decrease it to spread points further apart. Only used when `family = "ordinal"` or `family = "multinomial"`; ignored when `family = "gaussian"`.
`sparsity` | Amount of sparsity to apply in X. Must be between 0 and 1. Defaults to 0.2.
`method` | How to simulate the data. `method = "factor"` (default) simulates data from `num.factors` true factors. `method = "random"` simulates X randomly (with correlation depending upon c) and Y directly from X.
`num.factors` | Number of factors to simulate in U. Defaults to 5. Ignored when `method = "random"`.
`family` | Type of response to simulate for Y. Options are `"gaussian"` (default), `"ordinal"`, and `"multinomial"`. Note that `"ordinal"` and `"multinomial"` require the additional parameters below.
`factors.influencing.X` | Number of factors that influence X. Defaults to 2. Ignored when `method = "random"`.
`factors.influencing.Y` | Number of factors that influence Y. Defaults to 2. Ignored when `method = "random"`.
`ordinal.centers` | Signal centers for `family = "ordinal"`. Defaults to `c(-1.2, -1, 0, 1, 1.2)`. Must be a vector of length C, where C is the number of ordinal classes. If `family = "multinomial"`, use `multi.centers.x` and `multi.centers.y` instead. Ignored when `family = "gaussian"`.
`multi.centers.x` | X-axis signal centers for `family = "multinomial"`. Defaults to `c(-1, -1, 1, 1)/sqrt(2)`. Must be a vector of length C, where C is the number of multinomial classes. If `family = "ordinal"`, use `ordinal.centers` instead. Ignored when `family = "gaussian"`.
`multi.centers.y` | Y-axis signal centers for `family = "multinomial"`. Defaults to `c(-1, 1, -1, 1)/sqrt(2)`. Must be a vector of length C, where C is the number of multinomial classes. If `family = "ordinal"`, use `ordinal.centers` instead. Ignored when `family = "gaussian"`.
`N.test` | Number of samples in the test dataset that is returned. Defaults to 1000.
`seed` | Random seed to set for reproducible results. Defaults to 123.
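
The two multinomial center vectors are read together: element i of `multi.centers.x` and element i of `multi.centers.y` describe one class, so both must have the same length C. The base R sketch below only inspects the documented defaults. It does not show how SPEAR uses them internally, and the names `centers`, `n_classes`, and `dist_from_origin` are illustrative.

```r
# Documented defaults for family = "multinomial"
multi.centers.x <- c(-1, -1, 1, 1) / sqrt(2)
multi.centers.y <- c(-1, 1, -1, 1) / sqrt(2)

# Both vectors must describe the same number of classes (C)
stopifnot(length(multi.centers.x) == length(multi.centers.y))

# Pair the coordinates: one row per class
centers <- cbind(x = multi.centers.x, y = multi.centers.y)
n_classes <- nrow(centers)

# Euclidean distance of each class center from the origin
dist_from_origin <- sqrt(rowSums(centers^2))

print(centers)
print(n_classes)
print(dist_from_origin)
```

The defaults define four classes whose centers sit at the corners of a square, each at distance 1 from the origin because of the division by `sqrt(2)`.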



jgygi/SPEAR documentation built on July 5, 2023, 5:35 p.m.