countSites_cherryMethDiff: Count Methylation Differences Between Cherry Pairs

View source: R/summaryStatistics.R

countSites_cherryMethDiff    R Documentation

Count Methylation Differences Between Cherry Pairs

Description

This function calculates the number of methylation differences between pairs of cherry tips in a phylogenetic tree. A cherry is a pair of leaf nodes that share a direct common ancestor. The function quantifies full and half methylation differences for each genomic structure (e.g., island/non-island) across all sites.
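As a minimal sketch of what "full" and "half" differences mean, the base R snippet below compares two tips of a single cherry at four sites in one structure. The vectors `tip_a` and `tip_b` are illustrative values, and the absolute-difference approach is an assumption about how the counts could be obtained, not the package's own code.

```r
# Methylation states at four sites for the two tips of one cherry
# (illustrative values, one genomic structure)
tip_a <- c(0, 1, 0.5, 0)
tip_b <- c(1, 0, 0.5, 0.5)

# Absolute difference per site:
# 1 means a full difference, 0.5 a half difference, 0 no difference
site_diff <- abs(tip_a - tip_b)

full_diff <- sum(site_diff == 1)    # sites 1 and 2
half_diff <- sum(site_diff == 0.5)  # site 4
```

Because the states are restricted to 0, 0.5 and 1, the exact comparisons with 1 and 0.5 are safe here.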

Usage

countSites_cherryMethDiff(
  cherryDist,
  data,
  categorized_data = FALSE,
  input_control = TRUE
)

Arguments

cherryDist

A data frame containing pairwise distances between the tips of a phylogenetic tree that form cherries. This should be the output of get_cherryDist, and must include the following columns:

first_tip_name

A character string representing the name of the first tip in the cherry.

second_tip_name

A character string representing the name of the second tip in the cherry.

first_tip_index

An integer representing the index of the first tip in the cherry.

second_tip_index

An integer representing the index of the second tip in the cherry.

dist

A numeric value representing the sum of the branch lengths between the two tips (i.e., the distance between the two tips of the cherry).

data

A list containing methylation states at tree tips for each genomic structure (e.g., island/non-island). The data should be structured as data[[tip]][[structure]], where each structure has the same number of sites across tips. The input data must be prefiltered to ensure that CpG sites are represented consistently across tips. Each element contains the methylation states at the sites of a given tip and structure, represented as 0, 0.5 or 1 (for unmethylated, partially methylated and methylated, respectively). If methylation states are not represented as 0, 0.5 or 1, they are categorized as 0 for values equal to or below 0.2, as 0.5 for values between 0.2 and 0.8, and as 1 for values above 0.8. For customized categorization thresholds, use categorize_siteMethSt.
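The default thresholds described above can be illustrated in base R. `categorize_default` is a hypothetical helper written for this sketch; it is not a MethEvolSIM function and does not reproduce the interface of categorize_siteMethSt. It assumes that a value of exactly 0.8 falls in the partially methylated class, since only values above 0.8 are described as methylated.

```r
# Hypothetical helper (not part of MethEvolSIM): apply the default
# thresholds to a numeric vector of methylation values.
categorize_default <- function(x) {
  out <- rep(0.5, length(x))  # values between 0.2 and 0.8
  out[x <= 0.2] <- 0          # equal to or below 0.2: unmethylated
  out[x > 0.8] <- 1           # above 0.8: methylated
  out
}

# Illustrative raw values arranged as raw_data[[tip]][[structure]]
raw_data <- list(
  list(c(0.05, 0.9, 0.45), c(0.85, 0.1, 0.2)),  # tip 1, structures 1 and 2
  list(c(0.95, 0.3, 0.5),  c(0.0, 0.7, 1.0))    # tip 2, structures 1 and 2
)

# Categorize every structure of every tip, keeping the nested layout
categorized <- lapply(raw_data, function(tip) lapply(tip, categorize_default))
```

The result keeps the same nesting as the input, so it can be indexed as categorized[[tip]][[structure]].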

categorized_data

A logical value, FALSE by default. Set to TRUE to skip the redundant categorization step when methylation states are already represented as 0, 0.5 and 1.

input_control

A logical value indicating whether to validate the input data. If TRUE (default), the function checks that data has the required structure: that the number of tips is sufficient and that the data structure is consistent across tips and structures. If FALSE, the function assumes the input is already valid and skips the validation step.
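The kind of consistency check described above might look like the following sketch. `check_structure` is a hypothetical function written for illustration, not the package's internal validation, and the minimum of two tips is an assumption about what "sufficient" means.

```r
# Hypothetical check (not the package's own code): every tip must hold the
# same number of structures, and each structure must have the same number
# of sites in every tip.
check_structure <- function(data) {
  if (length(data) < 2) {
    stop("data must contain at least two tips")  # assumed minimum
  }
  # Number of sites per structure, computed separately for each tip
  n_sites <- lapply(data, function(tip) vapply(tip, length, integer(1)))
  same <- vapply(n_sites, function(n) identical(n, n_sites[[1]]), logical(1))
  if (!all(same)) {
    stop("number of structures or sites differs across tips")
  }
  invisible(TRUE)
}

good <- list(
  list(c(0, 1, 0.5), c(1, 0)),
  list(c(1, 1, 0),   c(0, 0.5))
)
bad <- list(
  list(c(0, 1, 0.5), c(1, 0)),
  list(c(1, 1),      c(0, 0.5))  # first structure has one site fewer
)

ok <- check_structure(good)
bad_result <- try(check_structure(bad), silent = TRUE)
```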

Details

The function first verifies that cherryDist contains the required columns and has at least one row. It also ensures that data contains a sufficient number of tips and that all structures have the same number of sites. The function then iterates over each cherry and genomic structure to compute the number of full and half methylation differences between the two tips of each cherry.
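The iteration over cherries and structures can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: `cherryDist` is built by hand with the documented columns instead of calling get_cherryDist, the tip indices are assumed to index directly into `data`, and differences are counted from absolute differences between categorized states.

```r
# Hand-built stand-in for the output of get_cherryDist (illustrative values)
cherryDist <- data.frame(
  first_tip_name = "tip1", second_tip_name = "tip2",
  first_tip_index = 1L, second_tip_index = 2L,
  dist = 0.5,
  stringsAsFactors = FALSE
)

# Categorized states arranged as data[[tip]][[structure]]
data <- list(
  list(c(0, 1, 0.5, 0), c(1, 1, 0, 0.5)),   # tip 1
  list(c(1, 0, 0.5, 1), c(0, 1, 0.5, 0.5))  # tip 2
)

# One row per cherry, with tip names and indices joined by a hyphen
res <- data.frame(
  tip_names = paste(cherryDist$first_tip_name, cherryDist$second_tip_name, sep = "-"),
  tip_indices = paste(cherryDist$first_tip_index, cherryDist$second_tip_index, sep = "-"),
  dist = cherryDist$dist,
  stringsAsFactors = FALSE
)

n_cherries <- nrow(cherryDist)
for (s in seq_along(data[[1]])) {
  f <- integer(n_cherries)  # full differences per cherry
  h <- integer(n_cherries)  # half differences per cherry
  for (i in seq_len(n_cherries)) {
    a <- data[[cherryDist$first_tip_index[i]]][[s]]
    b <- data[[cherryDist$second_tip_index[i]]][[s]]
    d <- abs(a - b)
    f[i] <- sum(d == 1)
    h[i] <- sum(d == 0.5)
  }
  # Columns named with the structure number followed by _f or _h
  res[[paste0(s, "_f")]] <- f
  res[[paste0(s, "_h")]] <- h
}
```

For these values the sketch gives three full and no half differences in the first structure, and one full and one half difference in the second.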

Value

A data frame with one row per cherry, containing the following columns:

tip_names

A character string representing the names of the two tips in the cherry, concatenated with a hyphen.

tip_indices

A character string representing the indices of the two tips in the cherry, concatenated with a hyphen.

dist

A numeric value representing the sum of the branch lengths between the two tips of the cherry.

One column for each structure named with the structure number followed by _f

An integer count of the sites with a full methylation difference (where one tip is methylated and the other is unmethylated) for the given structure.

One column for each structure named with the structure number followed by _h

An integer count of the sites with a half methylation difference (where one tip is partially methylated and the other is either fully methylated or unmethylated) for the given structure.

Examples

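# Methylation states arranged as data[[tip]][[structure]]:
# two tips, two structures, four sites per structure
data <- list(
  list(c(0, 1, 0.5, 0), c(1, 1, 0, 0.5)),   # tip 1
  list(c(1, 0, 0.5, 1), c(0, 1, 0.5, 0.5))  # tip 2
)

# Newick tree with a single cherry formed by tip1 and tip2
tree <- "(tip1:0.25, tip2:0.25);"

# Pairwise distances between the tips that form cherries
cherryDist <- get_cherryDist(tree)

# States are already 0, 0.5 or 1, so the categorization step is skipped
countSites_cherryMethDiff(cherryDist, data, categorized_data = TRUE)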


MethEvolSIM documentation built on April 12, 2025, 1:30 a.m.