max_incong: Algorithm for maximizing incongruence between two phylogenies

max_incong R Documentation


Description

Prunes the host (H) and symbiont (S) phylogenies to conform with the trimmed matrix and computes the selected global-fit statistic (PACo or ParaFit) between the pruned trees. It then determines the frequency with which each host-symbiont association occurs in the given percentile of runs that maximize phylogenetic incongruence.
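
The percentile-and-frequency step can be illustrated with a base-R sketch. It is not the Rtapas implementation: the association labels, the random sampling of n associations per run and the uniform "statistic" are all made up, and the sketch assumes that larger values mean more incongruence, as with a PACo residual sum of squares.

```r
# Sketch only: made-up associations, runs and statistic values.
set.seed(1)
assoc <- c("H1-S1", "H1-S2", "H2-S2", "H3-S3", "H4-S3", "H4-S4")
N <- 200  # number of runs
n <- 4    # associations retained in each run
runs <- replicate(N, sample(assoc, n), simplify = FALSE)
stat <- runif(N)  # hypothetical incongruence statistic, one value per run

# Keep the runs whose statistic reaches the p-th percentile.
p <- 0.9
keep <- stat >= quantile(stat, probs = p)

# Frequency of each association among the retained runs.
freq <- table(factor(unlist(runs[keep]), levels = assoc))
freq
```

Here p = 0.9 is used so that 20 of the 200 made-up runs are retained; with the default percentile = 0.99 many more runs are needed for the retained set to be informative.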

Usage

max_incong(
  HS,
  treeH,
  treeS,
  n,
  N,
  method = "paco",
  symmetric = FALSE,
  ei.correct = "none",
  percentile = 0.99,
  diff.fq = FALSE,
  strat = "sequential",
  cl = 1
)

Arguments

HS

Host-Symbiont association matrix.

treeH

Host phylogeny. An object of class "phylo".

treeS

Symbiont phylogeny. An object of class "phylo".

n

Number of associations retained in each trimmed matrix.

N

Number of runs.

method

Global-fit method to apply: "paco" (PACo, the default) or "paraF" (ParaFit).

symmetric

Specifies the type of Procrustes superimposition. The default, FALSE, applies the superimposition asymmetrically (S depends on H). If TRUE, PACo is applied symmetrically (the dependency between S and H is reciprocal).

ei.correct

Specifies how to correct potential negative eigenvalues arising from the conversion of phylogenetic distances into Principal Coordinates: "none" (the default) applies no correction, which is appropriate in particular when H and S are ultrametric; "sqrt.D" takes the element-wise square root of the phylogenetic distances; "lingoes" and "cailliez" apply the classical Lingoes and Cailliez corrections, respectively.

percentile

Percentile to evaluate (p). Default is 0.99 (99%).

diff.fq

Determines whether to apply a correction that detects associations which contribute similarly to (in)congruence and therefore occur with some frequency at both the 0.01 and 0.99 percentiles. This correction prevents multiple associations from being overrepresented. If TRUE, a corrected frequency (observed in p minus observed in 1 - p) is computed for each host-symbiont association.

strat

Flag indicating whether execution is to be "sequential" or "parallel". The default, "sequential", resolves R expressions sequentially in the current R process; "parallel" resolves R expressions in parallel in separate R sessions running in the background.

cl

Number of clusters (cores) to use for parallel computing. parallelly::availableCores() returns the number of cores available. The default, cl = 1, results in "sequential" execution.
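
The difference between symmetric = FALSE and symmetric = TRUE can be illustrated with the textbook orthogonal Procrustes residual, written in base R below. This is a sketch under assumptions, not the code Rtapas runs: procrustes_ss() is a hypothetical helper, and in the asymmetric case it assumes that the S configuration is rotated and scaled onto a fixed H configuration.

```r
# Hypothetical helper: orthogonal Procrustes residual sum of squares between
# two coordinate matrices with matching rows (textbook form, not Rtapas code).
procrustes_ss <- function(X, Y, symmetric = FALSE) {
  X <- scale(X, center = TRUE, scale = FALSE)
  Y <- scale(Y, center = TRUE, scale = FALSE)
  if (symmetric) {
    # Scale both configurations to unit sum of squares, so swapping
    # X and Y gives the same residual.
    X <- X / sqrt(sum(X^2))
    Y <- Y / sqrt(sum(Y^2))
    return(1 - sum(svd(crossprod(X, Y))$d)^2)
  }
  # Asymmetric: X (e.g. H) is the fixed target; Y (e.g. S) is rotated and
  # scaled to fit it, so swapping X and Y generally changes the result.
  sum(X^2) - sum(svd(crossprod(X, Y))$d)^2 / sum(Y^2)
}

set.seed(42)
H  <- matrix(rnorm(20), ncol = 2)                    # made-up host coordinates
th <- pi / 6
R  <- matrix(c(cos(th), sin(th), -sin(th), cos(th)), 2)
S  <- 3 * H %*% R + 5                                # rotated, scaled, shifted copy
S2 <- matrix(rnorm(20), ncol = 2)                    # unrelated configuration

procrustes_ss(H, S)                                  # close to 0: perfect fit
procrustes_ss(H, S2); procrustes_ss(S2, H)           # generally differ
procrustes_ss(H, S2, TRUE); procrustes_ss(S2, H, TRUE)  # equal
```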
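
The negative eigenvalues that ei.correct addresses can be reproduced on a tiny distance matrix. The base-R sketch below uses a hypothetical helper, pcoa_eigen(), to compute the eigenvalues of the double-centered matrix, then applies an element-wise square root (as "sqrt.D" describes) and a Lingoes-style constant. It only illustrates the corrections; Rtapas may compute them differently. Base R's stats::cmdscale(add = TRUE) returns the Cailliez additive constant as $ac.

```r
# Hypothetical helper: eigenvalues of the double-centered matrix used by
# principal coordinates analysis (PCoA) of a distance matrix D.
pcoa_eigen <- function(D) {
  D <- as.matrix(D)
  k <- nrow(D)
  J <- diag(k) - matrix(1 / k, k, k)     # centering matrix
  G <- J %*% (-0.5 * D^2) %*% J
  eigen(G, symmetric = TRUE, only.values = TRUE)$values
}

# Made-up non-Euclidean distances: point 4 is at distance 1 from three
# points that are 2 apart from each other, which no Euclidean space allows.
D <- matrix(c(0, 2, 2, 1,
              2, 0, 2, 1,
              2, 2, 0, 1,
              1, 1, 1, 0), nrow = 4)

ev_raw  <- pcoa_eigen(D)          # approximately 2, 2, 0, -0.25
ev_sqrt <- pcoa_eigen(sqrt(D))    # element-wise square root: no negatives here

# Lingoes-style: add 2 * |lambda_min| to every squared off-diagonal distance.
c1      <- abs(min(ev_raw))
D_ling  <- sqrt(D^2 + 2 * c1 * (1 - diag(4)))
ev_ling <- pcoa_eigen(D_ling)     # smallest eigenvalue is now 0 (up to rounding)

# Base R also computes the Cailliez additive constant directly.
ac <- cmdscale(D, k = 2, add = TRUE)$ac
```

The Lingoes constant shifts every non-trivial eigenvalue up by |lambda_min|, which is why the smallest one lands exactly at zero; the square root removes the negative eigenvalue in this example, but that is a property of these particular distances.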

Value

A data frame with host-symbiont associations in rows. The first and second columns give the names of the host and symbiont terminals, respectively; the third column identifies the host-symbiont association by pasting the names of the terminals; and the fourth column gives the frequency of occurrence of each host-symbiont association in p. If diff.fq = TRUE, column 5 gives the corrected frequencies.
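
The layout of the returned data frame, including the corrected frequency for diff.fq = TRUE (observed in p minus observed in 1 - p), can be mimicked in base R. The frequencies, the column names and the "-" separator below are assumptions for illustration only; inspect names() of a real result for the actual column names.

```r
# Made-up frequencies for four associations among the runs retained at
# the p = 0.99 and the 1 - p = 0.01 percentiles.
freq_p   <- c("H1-S1" = 40, "H2-S2" = 25, "H3-S3" = 10, "H4-S4" = 0)
freq_1mp <- c("H1-S1" = 35, "H2-S2" = 2,  "H3-S3" = 0,  "H4-S4" = 30)

# Split the pasted association names back into host and symbiont terminals
# (assumes "-" never occurs inside a terminal name).
parts <- strsplit(names(freq_p), "-", fixed = TRUE)

result <- data.frame(
  host     = vapply(parts, `[`, character(1), 1),
  symbiont = vapply(parts, `[`, character(1), 2),
  HS       = names(freq_p),
  Freq     = unname(freq_p),
  # diff.fq = TRUE style correction: observed in p minus observed in 1 - p
  Freq.d   = unname(freq_p - freq_1mp[names(freq_p)]),
  stringsAsFactors = FALSE
)
result
```

H1-S1 is frequent at both percentiles, so its corrected value (5) is small, whereas H2-S2 (23) stands out as contributing mainly to incongruence.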

NOTE

The node.label object in both trees cannot contain NAs or null values (i.e., labels with no numeric value): every node must have a value. Otherwise, remove the node labels from the "phylo" object with tree$node.label <- NULL. For more details, see distory::dist.multiPhylo().

The GD method cannot be used with the trimmed matrices produced with trimHS_maxI() or with the algorithm max_incong() for datasets with multiple associations.
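
The node-label requirement in the note can be checked before calling the function. The base-R sketch below uses a hypothetical helper, drop_bad_node_labels() (not an Rtapas or ape function), that applies tree$node.label <- NULL when any label is missing, empty or non-numeric; the three-tip tree is built by hand so no package is needed.

```r
# Hypothetical helper: remove node labels when any of them is missing,
# empty or non-numeric.
drop_bad_node_labels <- function(tree) {
  lab <- tree$node.label
  if (!is.null(lab) && anyNA(suppressWarnings(as.numeric(lab)))) {
    tree$node.label <- NULL
  }
  tree
}

# A hand-built three-tip tree of class "phylo".
tr <- structure(list(
  edge       = rbind(c(4L, 1L), c(4L, 5L), c(5L, 2L), c(5L, 3L)),
  tip.label  = c("A", "B", "C"),
  Nnode      = 2L,
  node.label = c("", "95")        # empty root label
), class = "phylo")

tr_clean <- drop_bad_node_labels(tr)
is.null(tr_clean$node.label)      # TRUE
```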

Examples

library(Rtapas)
data(nuc_pc)
N = 1  # for this example only; we recommend N = 1e+4 for real analyses
n = 15
NPi <- max_incong(np_matrix, NUCtr, CPtr, n, N, method = "paco",
                  symmetric = FALSE, ei.correct = "sqrt.D",
                  percentile = 0.99, diff.fq = TRUE)



Rtapas documentation built on June 22, 2024, 10:47 a.m.