mean_TreeFreqsChange_i: Mean Number of Significant Frequency Changes per Island...

View source: R/summaryStatistics.R

mean_TreeFreqsChange_i    R Documentation

Mean Number of Significant Frequency Changes per Island Across all Tree Tips

Description

This function analyzes the frequency changes of methylation states (unmethylated, partially methylated, methylated) across tree tips for a given set of islands. It performs a chi-squared test for each island to check for significant changes in frequencies across tips and returns the proportion of islands showing significant changes.

Usage

mean_TreeFreqsChange_i(
  tree,
  data,
  categorized_data = FALSE,
  index_islands,
  pValue_threshold,
  testing = FALSE
)

Arguments

tree

A phylogenetic tree object, typically of class phylo, containing tip labels.

data

A list containing methylation states at tree tips for each genomic structure (e.g., island/non-island). The data should be structured as data[[tip]][[structure]], where each structure has the same number of sites across tips. The input data must be prefiltered so that CpG sites are represented consistently across tips. Each element contains the methylation states at the sites of a given tip and structure, represented as 0, 0.5, or 1 (unmethylated, partially methylated, and methylated, respectively). If methylation states are not represented as 0, 0.5, or 1, they are categorized as 0 when the value is at or below 0.2, 0.5 when the value is between 0.2 and 0.8, and 1 when the value is above 0.8. For customized categorization thresholds, use categorize_siteMethSt.
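The default categorization described above can be sketched in a few lines of base R. The helper name categorize_default is hypothetical and is not part of MethEvolSIM. Assigning a value of exactly 0.8 to the 0.5 category is an assumption, since the text only says "between 0.2 and 0.8".

```r
# Hypothetical helper (not part of MethEvolSIM): default categorization
# of continuous methylation values into 0, 0.5, or 1.
# Assumption: a value of exactly 0.8 falls in the 0.5 category.
categorize_default <- function(x) {
  stopifnot(is.numeric(x), all(x >= 0 & x <= 1))
  ifelse(x <= 0.2, 0, ifelse(x <= 0.8, 0.5, 1))
}

# Returns 0, 0, 0.5, 0.5, 1
categorize_default(c(0.05, 0.2, 0.5, 0.8, 0.95))
```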

categorized_data

Logical, defaults to FALSE. Set to TRUE to skip redundant categorization when methylation states are already represented as 0, 0.5, and 1.

index_islands

A vector of indices of genomic structures corresponding to islands in data.

pValue_threshold

A numeric value between 0 and 1 that serves as the threshold for statistical significance in the chi-squared test.

testing

Logical, defaults to FALSE. Set to TRUE to return output intended for testing.

Details

The function calls chisq.test with simulate.p.value = TRUE, so the p-value is computed by Monte Carlo simulation. This keeps the test reliable even when the expected frequencies do not meet the assumptions of the chi-squared approximation (i.e., expected counts of at least 5 in each category).
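As a rough illustration of the per-island test, the sketch below builds a tips-by-states count table for one island and passes it to chisq.test with simulate.p.value = TRUE. The helper island_pvalue is hypothetical and is not the package's internal code. Dropping states that are absent from every tip and returning 1 when fewer than two states remain are both assumptions made so the example runs cleanly.

```r
# Hypothetical helper (not the package's internal code): test one island for
# differences in methylation-state frequencies across tips.
island_pvalue <- function(states_by_tip, B = 2000) {
  levels_meth <- c(0, 0.5, 1)
  # Rows = tips, columns = counts of unmethylated / partial / methylated sites
  counts <- t(sapply(states_by_tip, function(s) {
    as.vector(table(factor(s, levels = levels_meth)))
  }))
  # Assumption: drop states absent from every tip so expected counts are nonzero
  counts <- counts[, colSums(counts) > 0, drop = FALSE]
  # Assumption: with a single observed state there is nothing to compare
  if (ncol(counts) < 2) return(1)
  chisq.test(counts, simulate.p.value = TRUE, B = B)$p.value
}

set.seed(1)
# Third structure of the example data on this page:
# tips 1 and 3 are mostly unmethylated, tip 2 is mostly methylated
island_pvalue(list(c(rep(0, 9), 0.5), c(rep(1, 9), 0), c(rep(0, 9), 0.5)))
```

Because the p-value is simulated, it varies slightly between runs unless a seed is set.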

Throws errors if:

  • The tree is not valid.

  • data is not structured correctly across tips.

  • index_islands is empty.

  • pValue_threshold is not between 0 and 1.
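The checks listed above might look roughly like the following sketch. The function check_inputs and its error messages are hypothetical, and the package's actual validation may differ. Tree validation is omitted here, and treating 0 and 1 as valid threshold values is an assumption.

```r
# Hypothetical validation sketch; the package's actual checks and
# error messages may differ. Tree validation is not covered here.
check_inputs <- function(data, index_islands, pValue_threshold) {
  if (length(index_islands) == 0) {
    stop("index_islands is empty")
  }
  # Assumption: the bounds 0 and 1 are themselves accepted
  if (!is.numeric(pValue_threshold) || length(pValue_threshold) != 1 ||
      pValue_threshold < 0 || pValue_threshold > 1) {
    stop("pValue_threshold must be between 0 and 1")
  }
  # Each tip must have the same structures with the same number of sites
  sites <- lapply(data, lengths)
  same <- vapply(sites, identical, logical(1), sites[[1]])
  if (!all(same)) {
    stop("data is not structured consistently across tips")
  }
  invisible(TRUE)
}

# Two tips, two structures with 2 and 3 sites each: passes silently
d <- list(list(c(1, 0), c(0, 0, 1)), list(c(1, 1), c(0, 1, 1)))
check_inputs(d, index_islands = 1, pValue_threshold = 0.05)
```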

Value

A numeric value representing the mean proportion of islands with significant frequency changes across tips.
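To make the returned quantity concrete, the sketch below turns a vector of per-island p-values into a proportion of significant islands. The p-values are made up for illustration, and the strict comparison (p < threshold) is an assumption about how significance is decided.

```r
# Made-up p-values for two islands, for illustration only
p_values <- c(island_1 = 0.004, island_3 = 0.21)
pValue_threshold <- 0.05

# Assumption: an island counts as significant when p < threshold
prop_significant <- mean(p_values < pValue_threshold)
prop_significant  # 0.5: one of the two islands is significant
```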

Examples

# Example of usage:

tree <- "((d:1,e:1):2,a:2);"

data <- list(
  #Tip 1
  list(c(rep(1,9), rep(0,1)), 
       c(rep(0,9), 1), 
       c(rep(0,9), rep(0.5,1))), 
  #Tip 2
  list(c(rep(1,9), rep(0.5,1)), 
       c(rep(0.5,9), 1), 
       c(rep(1,9), rep(0,1))), 
  #Tip 3
  list(c(rep(1,9), rep(0.5,1)), 
       c(rep(0.5,9), 1), 
       c(rep(0,9), rep(0.5,1)))) 
       
index_islands <- c(1,3)


mean_TreeFreqsChange_i(tree, 
                       data, categorized_data = TRUE,
                       index_islands, 
                       pValue_threshold = 0.05)


MethEvolSIM documentation built on April 12, 2025, 1:30 a.m.