
View source: R/summaryStatistics.R

freqSites_cherryMethDiff {MethEvolSIM}    R Documentation

Compute Methylation Frequency Differences Between Cherry Pairs

Description

This function calculates the frequency of methylation differences between the two tips of each cherry in a phylogenetic tree. A cherry is a pair of tips (leaf nodes) that share the same parent node, i.e. two tips that are each other's closest relatives. For each genomic structure (e.g., island/non-island), the function counts the sites with full and half methylation differences and divides these counts by the number of sites in that structure to obtain frequencies.
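The cherry definition above can be illustrated by searching for internal nodes whose two children are both tips. The following is a minimal base-R sketch, not the package's implementation: the tree is the one from the Examples section, stored by hand in a list that mimics the edge-matrix layout of an ape "phylo" object (tips numbered 1 to 4, root 5, internal nodes 6 and 7), and find_cherries_sketch is a hypothetical helper. The distance is taken as the sum of the two terminal branch lengths, matching the dist column described under Value.

```r
# Hand-built edge list for "((a:1.5,b:1.5):2,(c:2,d:2):1.5);"
# (assumed ape-like layout: column 1 = parent node, column 2 = child node)
tree_sketch <- list(
  edge = matrix(c(5, 6,
                  6, 1,
                  6, 2,
                  5, 7,
                  7, 3,
                  7, 4), ncol = 2, byrow = TRUE),
  edge.length = c(2, 1.5, 1.5, 1.5, 2, 2),
  tip.label = c("a", "b", "c", "d")
)

# Hypothetical helper: one row per cherry with tip names, tip indices and
# the sum of the two branch lengths leading to the shared parent
find_cherries_sketch <- function(tr) {
  n_tips <- length(tr$tip.label)
  internal <- sort(unique(tr$edge[, 1]))
  rows <- lapply(internal, function(node) {
    idx <- which(tr$edge[, 1] == node)      # edges leaving this node
    children <- tr$edge[idx, 2]
    # A cherry: exactly two children, both of them tips
    if (length(children) == 2 && all(children <= n_tips)) {
      data.frame(
        tip_names   = paste(tr$tip.label[children], collapse = "-"),
        tip_indices = paste(children, collapse = "-"),
        dist        = sum(tr$edge.length[idx])
      )
    } else {
      NULL
    }
  })
  do.call(rbind, Filter(Negate(is.null), rows))
}

find_cherries_sketch(tree_sketch)
```

For this tree the sketch reports the cherries a-b (distance 3) and c-d (distance 4); the root's children are internal nodes, so it is not a cherry.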

Usage

freqSites_cherryMethDiff(
  tree,
  data,
  categorized_data = FALSE,
  input_control = TRUE
)

Arguments

tree

A phylogenetic tree, e.g. a Newick string as in the Examples section. The function assumes the tree is in a format suitable for downstream processing.

data

A list containing the methylation states at the tree tips for each genomic structure (e.g., island/non-island), structured as data[[tip]][[structure]]. Each structure must have the same number of sites across tips, and the input data must be prefiltered so that CpG sites are represented consistently across tips. Each element holds the methylation states of the sites in a given tip and structure, encoded as 0, 0.5 or 1 (unmethylated, partially methylated and methylated). Values not encoded this way are categorized as follows: values equal to or below 0.2 become 0, values between 0.2 and 0.8 become 0.5, and values above 0.8 become 1. For custom categorization thresholds, use categorize_siteMethSt.

categorized_data

Logical; defaults to FALSE. Set to TRUE to skip redundant categorization when methylation states are already represented as 0, 0.5 and 1.

input_control

A logical value indicating whether to validate the input. If TRUE (default), the function checks that the data have the required structure: that the number of tips is sufficient and that the data structure is consistent across tips and structures. If FALSE, the function assumes the tree and data are valid and skips the validation step.
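The data-related arguments above can be illustrated with a few hypothetical base-R helpers. They are a sketch under stated assumptions, not functions provided by MethEvolSIM: categorize_sketch applies the default 0.2/0.8 thresholds described for data (for custom thresholds the package's categorize_siteMethSt is the documented route), is_categorized_sketch tests whether categorized_data = TRUE is appropriate, and check_data_sketch performs consistency checks in the spirit of input_control = TRUE, with an assumed minimum of two tips. The package's actual validation may check more, for example the tree.

```r
# Hypothetical helper: default categorization described for data
# (<= 0.2 -> 0, between 0.2 and 0.8 -> 0.5, > 0.8 -> 1)
categorize_sketch <- function(x) {
  ifelse(x <= 0.2, 0, ifelse(x > 0.8, 1, 0.5))
}

# Hypothetical helper: TRUE when every state is already 0, 0.5 or 1,
# i.e. when categorized_data = TRUE would not skip a needed categorization
is_categorized_sketch <- function(data) {
  all(unlist(data) %in% c(0, 0.5, 1))
}

# Hypothetical helper: every tip has the same number of structures, and
# each structure has the same number of sites in every tip
# (min_tips = 2 is an assumption: a cherry needs two tips)
check_data_sketch <- function(data, min_tips = 2) {
  if (!is.list(data) || length(data) < min_tips) {
    stop("data must be a list with at least ", min_tips, " tips")
  }
  n_struct <- vapply(data, length, integer(1))
  if (length(unique(n_struct)) != 1) {
    stop("tips differ in their number of structures")
  }
  for (s in seq_len(n_struct[1])) {
    n_sites <- vapply(data, function(tip) length(tip[[s]]), integer(1))
    if (length(unique(n_sites)) != 1) {
      stop("structure ", s, " has different numbers of sites across tips")
    }
  }
  invisible(TRUE)
}

# Raw (uncategorized) values laid out as data[[tip]][[structure]]
data_raw <- list(
  list(c(0.1, 0.9, 0.5), c(0.95, 0.15)),
  list(c(0.3, 0.85, 0.05), c(0.7, 0.2))
)
check_data_sketch(data_raw)
is_categorized_sketch(data_raw)   # FALSE: raw values still need categorizing
data_cat <- lapply(data_raw, function(tip) lapply(tip, categorize_sketch))
is_categorized_sketch(data_cat)   # TRUE

categorize_sketch(c(0.05, 0.2, 0.35, 0.8, 0.93))   # returns c(0, 0, 0.5, 0.5, 1)
```

Note that a value of exactly 0.8 falls in the partially methylated class here, because only values above 0.8 are described as becoming 1.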

Details

The function first validates the tree structure and extracts pairwise distances between cherry tips. It then counts methylation differences using countSites_cherryMethDiff and normalizes these counts by the number of sites per structure to compute frequencies. The resulting data frame provides a per-cherry frequency of methylation differences (half or full difference) across different genomic structures.
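The counting and normalization step described above can be sketched for a single cherry as follows. This is a minimal illustration, not countSites_cherryMethDiff itself: the hypothetical cherry_freq_sketch compares the categorized states of the same sites in two tips, treats an absolute difference of 1 as a full difference and 0.5 as a half difference (following the definitions in the Value section below), and divides each count by the number of sites in the structure. The two tips are the third and fourth tips of the Examples data.

```r
# Hypothetical helper: full/half difference frequencies per structure
# for one cherry; names follow "<structure number>_f" / "<structure number>_h"
cherry_freq_sketch <- function(tip1, tip2) {
  out <- list()
  for (s in seq_along(tip1)) {
    d <- abs(tip1[[s]] - tip2[[s]])   # per-site difference in states
    n <- length(d)                    # number of sites in structure s
    out[[paste0(s, "_f")]] <- sum(d == 1) / n     # methylated vs unmethylated
    out[[paste0(s, "_h")]] <- sum(d == 0.5) / n   # partial vs full or none
  }
  unlist(out)
}

# Third and fourth tips of the Examples data (3 structures: 10, 5, 8 sites)
tip_c <- list(rep(1, 10), rep(0.5, 5), rep(0, 8))
tip_d <- list(c(rep(0, 5), rep(0.5, 5)), c(0, 0, 1, 1, 1), c(0.5, 1, rep(0, 6)))

cherry_freq_sketch(tip_c, tip_d)
# returns 1_f = 0.5, 1_h = 0.5, 2_f = 0, 2_h = 1, 3_f = 0.125, 3_h = 0.125
```

The differences are compared with == because categorized states are exactly 0, 0.5 or 1, so their differences are exact in floating point.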

Value

A data frame with one row per cherry, containing the following columns:

tip_names

A character string representing the names of the two tips in the cherry, concatenated with a hyphen.

tip_indices

A character string representing the indices of the two tips in the cherry, concatenated with a hyphen.

dist

A numeric value giving the distance between the two cherry tips, i.e. the sum of their branch lengths to the shared parent node.

One column for each structure named with the structure number followed by _f

A numeric value representing the frequency of sites with a full methylation difference (where one tip is methylated and the other is unmethylated) for the given structure.

One column for each structure named with the structure number followed by _h

A numeric value representing the frequency of sites with a half methylation difference (where one tip is partially methylated and the other is either fully methylated or unmethylated) for the given structure.

Examples

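# Example data setup: 4 tips, each with 3 structures of 10, 5 and 8 sites
library(MethEvolSIM)

data <- list(
  list(rep(1, 10), rep(0, 5), rep(1, 8)),
  list(rep(1, 10), rep(0.5, 5), rep(0, 8)),
  list(rep(1, 10), rep(0.5, 5), rep(0, 8)),
  list(c(rep(0, 5), rep(0.5, 5)), c(0, 0, 1, 1, 1), c(0.5, 1, rep(0, 6)))
)

# Newick tree with two cherries: (a,b) and (c,d)
tree <- "((a:1.5,b:1.5):2,(c:2,d:2):1.5);"

# States are already 0, 0.5 or 1, so categorization can be skipped
freqSites_cherryMethDiff(tree, data, categorized_data = TRUE)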


MethEvolSIM documentation built on April 12, 2025, 1:30 a.m.