R/zeroRDD_to_splits.R


#' Identifying the splits of the tree-topology from zero-valued RD differences
#'
#' This function identifies splits of the tree-topology from zero-valued RD differences.
#'
#' Each zero-valued RD difference identifies a split in the tree topology. In practice, many zero-valued RD differences identify what is essentially the same split, so many of the splits produced are equivalent. Comparing the taxa involved in each split reveals which splits are equivalent; once all equivalences are determined, a set of distinct splits remains. Merging equivalent splits while keeping all splits that conflict with one another greatly improves the accuracy and efficiency of the function `topology_to_newick`.
#'
#' The procedure has two steps: first, check the splits for equivalence; second, eliminate the repeated splits. Details on how to identify distinct splits can be found in Peng et al. (2021).
#'
#' @param array_zero_ID A P x P x P array in which zero-valued RD differences are coded as 0 and all other entries as 1. For each i, `array_zero_ID[i, , ]` is symmetric. This array can be generated by the function `BMaf_to_zeroRDD`.
#'
#' @return A matrix with P columns in which each row represents a split. In a given row, a k-way split is encoded by giving each set of taxa descending from the split an identifying number from 1 to k and placing that number at the position of every taxon in the set. All other positions are set to zero.
#' @noRd
#'
#' @examples
#'
#' # load example data from rapidphylo package
#' data("Human_Allele_Frequencies")
#' mat_allele_freq <- Human_Allele_Frequencies
#' # clamp frequencies of exactly 0 or 1, then apply the logit transformation
#' mat_allele_freq[mat_allele_freq==1] <- 0.99
#' mat_allele_freq[mat_allele_freq==0] <- 0.01
#' trans_mat_allele_freq <- log(mat_allele_freq/(1-mat_allele_freq))
#' # convert type of object into data frame
#' trans_mat_allele_freq <- as.data.frame(trans_mat_allele_freq)
#' outgroup <- 'Han'
#' # the row names of the transformed allele frequency matrix must be the population names
#' pop_names <- row.names(trans_mat_allele_freq)
#' # the outgroup may be given by name or by row index
#' if (is.character(outgroup)) {
#'   index <- which(pop_names == outgroup)
#' } else {
#'   index <- outgroup
#' }
#' # move the outgroup to the last row
#' trans_mat_allele_freq <- rbind(trans_mat_allele_freq[-index, ], trans_mat_allele_freq[index, ])
#' label <- row.names(trans_mat_allele_freq)
#' array_zero_ID<-BMaf_to_zeroRDD(trans_mat_allele_freq,use="pairwise.complete.obs")
#' # run function
#' base_tree<-zeroRDD_to_splits(array_zero_ID)
#' base_tree[1:20, 1:13]

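zeroRDD_to_splits <- function(array_zero_ID) {
  # Step 1: check the equivalence of the splits identified by zero-valued RD differences
  split_set_indicator <- check.equivalence(array_zero_ID)
  # Step 2: eliminate the repeated splits
  new_split_set_indicator <- eliminate.rep(split_set_indicator)
  return(new_split_set_indicator)
}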
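
The split encoding described under `@return` can be illustrated with a small toy example. The sketch below is not the package's implementation: `canonical_split` and `drop_equivalent_splits` are hypothetical helpers, and they only cover the simplest case, in which two rows describe the same split with the group numbers permuted. The internal functions `check.equivalence` and `eliminate.rep` may well use a different equivalence criterion (see Peng et al. 2021).

```r
# Hypothetical helper (not part of rapidphylo): relabel the non-zero groups of
# one split row so that labels follow order of first appearance (1, 2, ..., k).
canonical_split <- function(split_row) {
  nonzero <- split_row != 0
  out <- integer(length(split_row))
  out[nonzero] <- match(split_row[nonzero], unique(split_row[nonzero]))
  out
}

# Hypothetical helper (not part of rapidphylo): keep one row per distinct split,
# treating rows that differ only by a permutation of group labels as equal.
drop_equivalent_splits <- function(split_matrix) {
  canon <- t(apply(split_matrix, 1, canonical_split))
  canon[!duplicated(canon), , drop = FALSE]
}

# Toy input with P = 5 taxa: each row is a split, 0 marks taxa not involved.
toy_splits <- rbind(
  c(1, 1, 2, 2, 0),  # taxa 1-2 versus taxa 3-4
  c(2, 2, 1, 1, 0),  # the same split with the labels swapped
  c(1, 2, 2, 0, 0)   # a different split: taxon 1 versus taxa 2-3
)

distinct_splits <- drop_equivalent_splits(toy_splits)
print(distinct_splits)
```

In this toy input the first two rows collapse into one, so two distinct splits remain. Relabelling by order of first appearance is just one convenient way to make label-permuted rows comparable with `duplicated`.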

rapidphylo documentation built on Feb. 16, 2023, 10:41 p.m.