simulate_evolData: Simulate Data Evolution along a Tree

View source: R/user_functions.R

simulate_evolDataR Documentation

Simulate Data Evolution along a Tree

Description

This function simulates methylation data evolution along a tree. Either by simulating data at the root of the provided evolutionary tree (if infoStr is given) or by using pre-existing data at the root (if rootData is given) and letting it evolve along the tree.

Usage

simulate_evolData(
  infoStr = NULL,
  rootData = NULL,
  tree = NULL,
  params = NULL,
  dt = 0.01,
  CFTP = FALSE,
  CFTP_step_limit = 327680000,
  n_rep = 1,
  only_tip = TRUE
)

Arguments

infoStr

A data frame containing columns 'n' for the number of sites, and 'globalState' for the favoured global methylation state. If customized initial equilibrium frequencies are given, it also contains columns 'u_eqFreq', 'p_eqFreq', and 'm_eqFreq' with the equilibrium frequency values for unmethylated, partially methylated, and methylated.

rootData

The output of the simulate_initialData()$data function. It represents the initial data at the root of the evolutionary tree.

tree

A string in Newick format representing the evolutionary tree.

params

Optional data frame with specific parameter values. Structure as in get_parameterValues() output. If not provided, default values will be used.

dt

Length of time step for Gillespie's Tau-Leap Approximation (default is 0.01).

CFTP

Default FALSE. TRUE for calling cftp algorithm to set root state according to model equilibrium (Note that current implementation neglects IWE process).

CFTP_step_limit

when CFTP = TRUE, maximum number of steps before applying an approximation method (default 327680000 corresponding to size of CFTP info of approx 6.1 GB).

n_rep

Number of replicates to simulate (default is 1).

only_tip

Logical indicating whether to extract data only for tips (default is TRUE, FALSE to extract the information for all the tree branches).

Value

A list containing the parameters used ($params), the length of the time step used for the Gillespie's tau-leap approximation ($dt, default 0.01), the tree used ($tree). simulated data and the simulated data ($data). In $data, each list element corresponds to a simulation replicate.

  • If only_tip is TRUE: In $data, each list element corresponds to a simulation replicate. Each replicate includes one list per tree tip, each containing:

    • The name of each tip in the simulated tree (e.g. replicate 2, tip 1: $data[[2]][[1]]$name).

    • A list with the sequence of methylation states for each tip-specific structure (e.g. replicate 1, tip 2, 3rd structure: $data[[1]][[2]]$seq[[3]]. The methylation states are encoded as 0 for unmethylated, 0.5 for partially methylated, and 1 for methylated.

  • If only_tip is FALSE, $data contains 2 lists:

    • $data$branchInTree: a list in which each element contains the information of the relationship with other branches:

      • Index of the parent branch (e.g. branch 2): $data$branchInTree[[2]]$parent_index)

      • Index(es) of the offspring branch(es) (e.g. branch 1 (root)): $data$branchInTree[[1]]$offspring_index)

    • $data$sim_data: A list containing simulated data. Each list element corresponds to a simulation replicate. Each replicate includes one list per tree branch, each containing:

      • The name of each branch in the simulated tree. It's NULL for the tree root and inner nodes, and the name of the tips for the tree tips. (e.g. replicate 2, branch 1: $data$sim_data[[2]][[1]]$name)

      • Information of IWE events on that branch. It's NULL for the tree root and FALSE for the branches in which no IWE event was sampled, and a list containing $islands with the index(ces) of the island structure(s) that went through the IWE event and $times for the branch time point(s ) in which the IWE was sampled. (e.g. replicate 1, branch 3: $data$sim_data[[1]][[3]]$IWE)

      • A list with the sequence of methylation states for each structure (the index of the list corresponds to the index of the structures). The methylation states are encoded as 0 for unmethylated, 0.5 for partially methylated, and 1 for methylated. (e.g. replicate 3, branch 2, structure 1: $data$sim_data[[3]][[2]]$seq[[1]])

      • A list with the methylation equilibrium frequencies for each structure (the index of the list corresponds to the index of the structures). Each structure has a vector with 3 values, the first one corresponding to the frequency of unmethylated, the second one to the frequency of partially methylated, and the third one to the frequency of methylated CpGs. (e.g. replicate 3, branch 2, structure 1: $data$sim_data[[3]][[2]]$eqFreqs[[1]])

Examples

# Example data
infoStr <- data.frame(n = c(10, 100, 10), globalState = c("M", "U", "M"))

# Simulate data evolution along a tree with default parameters
simulate_evolData(infoStr = infoStr, tree = "(A:0.1,B:0.1);")

# Simulate data evolution along a tree with custom parameters
custom_params <- get_parameterValues()
custom_params$iota <- 0.5
simulate_evolData(infoStr = infoStr, tree = "(A:0.1,B:0.1);", params = custom_params)


MethEvolSIM documentation built on April 12, 2025, 1:30 a.m.