simulate_data: Simulate full data set
In seroanalytics/serosolver: Inference Framework for Serological Data

Description

Simulates a full serological data set (antibody data, infection histories and ages) for a given parameter table, set of attack rates and sampling design.

Usage

simulate_data(
  par_tab,
  group = 1,
  n_indiv = 100,
  antigenic_map = NULL,
  possible_exposure_times = NULL,
  measured_biomarker_ids = NULL,
  sampling_times,
  nsamps = 2,
  missing_data = 0,
  age_min = 5,
  age_max = 80,
  age_group_bounds = NULL,
  attack_rates,
  repeats = 1,
  measurement_bias = NULL,
  data_type = NULL,
  demographics = NULL,
  verbose = FALSE
)

Arguments

`par_tab`	the full parameter table controlling parameter ranges and values
`group`	which group index to give this simulated data
`n_indiv`	number of individuals to simulate
`antigenic_map`	(optional) A data frame of antigenic x and y coordinates. Must have column names: x_coord; y_coord; inf_times. See `example_antigenic_map`.
`possible_exposure_times`	(optional) If no antigenic map is specified, this argument gives the vector of times at which individuals can be infected
`measured_biomarker_ids`	vector of biomarker IDs for which titres are measured; each ID must match an entry in possible_exposure_times
`sampling_times`	possible sampling times for the individuals, matching entries in possible_exposure_times
`nsamps`	the number of samples each individual has (e.g., nsamps=2 gives each individual 2 random sampling times from sampling_times)
`missing_data`	numeric between 0 and 1; the proportion of titre observations to censor at random (missing at random, MAR)
`age_min`	simulated age minimum
`age_max`	simulated age maximum
`attack_rates`	a vector of attack rates, one for each entry in possible_exposure_times, to be used in the simulation (each between 0 and 1)
`repeats`	number of repeat observations for each year
`data_type`	if not NULL, then a vector of data types to use for each biomarker_group
`demographics`	if not NULL, then a tibble for each individual (1:n_indiv) giving demographic variable entries. Most importantly must include "birth" as the birth time. This is used if, for example, you have a stratification grouping in 'par_tab'
`verbose`	if TRUE, prints additional messages
`measurement_indices`	default NULL; optional vector giving, for each antigen/biomarker ID, the index of `measurement_bias` from which it takes its measurement shift. E.g., if there are 6 circulation years and 3 strain clusters, this might be c(1,1,2,2,3,3)
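The sampling-related arguments can be pictured with a short base-R sketch. It illustrates the argument descriptions above and is not serosolver's internal code: it assumes each individual draws `nsamps` distinct times from `sampling_times`, and that `missing_data` acts as an independent per-observation probability of being set to NA. The object names (`toy_obs`, `titre`, `shift_per_biomarker`) and the bias values are hypothetical.

```r
set.seed(1)

n_indiv        <- 5
sampling_times <- 2010:2015
nsamps         <- 2
missing_data   <- 0.25

# Assumption: each individual gets `nsamps` distinct times from `sampling_times`
toy_obs <- do.call(rbind, lapply(seq_len(n_indiv), function(i) {
  data.frame(individual  = i,
             sample_time = sort(sample(sampling_times, nsamps)))
}))

# Placeholder titres; real values would come from the antibody kinetics model
toy_obs$titre <- rpois(nrow(toy_obs), lambda = 3)

# Assumption: each observation is independently set to NA
# with probability `missing_data`
is_missing <- runif(nrow(toy_obs)) < missing_data
toy_obs$titre[is_missing] <- NA

# Index lookup using the example from the `measurement_indices` description:
# 6 circulation years grouped into 3 strain clusters
measurement_indices <- c(1, 1, 2, 2, 3, 3)
measurement_bias    <- c(-0.5, 0, 0.5)  # hypothetical shifts, one per cluster
shift_per_biomarker <- measurement_bias[measurement_indices]
```

Under these assumptions `toy_obs` has `n_indiv * nsamps` rows, and `shift_per_biomarker` holds one shift per biomarker ID, with IDs in the same cluster sharing a value.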

Value

a list with: 1) the data frame of antibody data as returned by simulate_group; 2) a matrix of infection histories as returned by simulate_infection_histories; 3) a vector of ages
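To make the shape of the infection history element concrete, the sketch below builds a toy infection-history matrix in base R. It assumes one row per individual and one column per entry of `possible_exposure_times`, with each infection drawn as an independent Bernoulli trial using the matching attack rate and no infections before birth. This is an assumed simplification for illustration; the matrix actually returned comes from simulate_infection_histories and may differ in detail. The names `toy_inf_hist` and `birth_years` are hypothetical.

```r
set.seed(42)

possible_exposure_times <- 2000:2009
attack_rates <- runif(length(possible_exposure_times), 0, 0.2)
n_indiv      <- 4
birth_years  <- c(1990, 1995, 2003, 2007)  # hypothetical birth times

# One Bernoulli draw per individual (row) and exposure time (column)
toy_inf_hist <- t(sapply(seq_len(n_indiv), function(i) {
  rbinom(length(possible_exposure_times), size = 1, prob = attack_rates)
}))

# Assumption: nobody can be infected before they are born
alive <- outer(birth_years, possible_exposure_times, FUN = "<=")
toy_inf_hist <- toy_inf_hist * alive

dim(toy_inf_hist)  # n_indiv rows, length(possible_exposure_times) columns
```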

Examples

library(serosolver)

data(example_par_tab)
data(example_antigenic_map)

## Times at which individuals can be infected
possible_exposure_times <- example_antigenic_map$inf_times
## Simulate some random attack rates between 0 and 0.2
attack_rates <- runif(length(possible_exposure_times), 0, 0.2)
## Vector giving the circulation times of measured antigens
sampled_antigens <- seq(min(possible_exposure_times), max(possible_exposure_times), by=2)
all_simulated_data <- simulate_data(
  par_tab = example_par_tab,
  group = 1,
  n_indiv = 50,
  antigenic_map = example_antigenic_map,
  possible_exposure_times = possible_exposure_times,
  measured_biomarker_ids = sampled_antigens,
  sampling_times = 2010:2015,
  nsamps = 2,
  age_min = 10,
  age_max = 75,
  attack_rates = attack_rates,
  repeats = 2
)
## Extract the simulated antibody data and merge in the age information
antibody_data <- all_simulated_data$data
antibody_data <- merge(antibody_data, all_simulated_data$ages)

seroanalytics/serosolver documentation built on April 12, 2025, 7:49 p.m.