estPhylo: Function to estimate & plot phylogenetic tree

View source: R/other_useful_functions.R

estPhyloR Documentation

Function to estimate & plot phylogenetic tree

Description

Function to estimate & plot phylogenetic tree

Usage

estPhylo(
  blockInterest = NULL,
  gwasRes = NULL,
  nTopRes = 1,
  gene.set = NULL,
  indexRegion = 1:10,
  chrInterest = NULL,
  posRegion = NULL,
  blockName = NULL,
  nHaplo = NULL,
  pheno = NULL,
  geno = NULL,
  ZETA = NULL,
  chi2Test = TRUE,
  thresChi2Test = 0.05,
  plotTree = TRUE,
  distMat = NULL,
  distMethod = "manhattan",
  evolutionDist = FALSE,
  subpopInfo = NULL,
  groupingMethod = "kmedoids",
  nGrp = 3,
  nIterClustering = 100,
  kernelTypes = "addNOIA",
  n.core = parallel::detectCores() - 1,
  parallel.method = "mclapply",
  hOpt = "optimized",
  hOpt2 = "optimized",
  maxIter = 20,
  rangeHStart = 10^c(-1:1),
  saveName = NULL,
  saveStyle = "png",
  pchBase = c(1, 16),
  colNodeBase = c(2, 4),
  colTipBase = c(3, 5, 6),
  cexMax = 2,
  cexMin = 0.7,
  edgeColoring = TRUE,
  tipLabel = TRUE,
  ggPlotTree = FALSE,
  cexMaxForGG = 0.12,
  cexMinForGG = 0.06,
  alphaBase = c(0.9, 0.3),
  verbose = TRUE
)

Arguments

blockInterest

A n \times M matrix representing the marker genotype that belongs to the haplotype block of interest. If this argument is NULL, this argument will automatically be determined by 'geno',

gwasRes

You can use the results (data.frame) of haplotype-based (SNP-set) GWAS by 'RGWAS.multisnp' function.

nTopRes

Haplotype blocks (or gene sets, SNP-sets) with top 'nTopRes' p-values by 'gwasRes' will be used.

gene.set

If you have information of gene (or haplotype block), you can use it to perform kernel-based GWAS. You should assign your gene information to gene.set in the form of a "data.frame" (whose dimension is (the number of gene) x 2). In the first column, you should assign the gene name. And in the second column, you should assign the names of each marker, which correspond to the marker names of "geno" argument.

indexRegion

You can specify the haplotype block (or gene set, SNP-set) of interest by the marker index in 'geno'.

chrInterest

You can specify the haplotype block (or gene set, SNP-set) of interest by the marker position in 'geno'. Please assign the chromosome number to this argument.

posRegion

You can specify the haplotype block (or gene set, SNP-set) of interest by the marker position in 'geno'. Please assign the position in the chromosome to this argument.

blockName

You can specify the haplotype block (or gene set, SNP-set) of interest by the name of haplotype block in 'geno'.

nHaplo

Number of haplotypes. If not defined, this is automatically defined by the data. If defined, k-medoids clustering is performed to define haplotypes.

pheno

Data frame where the first column is the line name (gid). The remaining columns should be a phenotype to test.

geno

Data frame with the marker names in the first column. The second and third columns contain the chromosome and map position. Columns 4 and higher contain the marker scores for each line, coded as [-1, 0, 1] = [aa, Aa, AA].

ZETA

A list of covariance (relationship) matrix (K: m \times m) and its design matrix (Z: n \times m) of random effects. Please set names of list "Z" and "K"! You can use more than one kernel matrix. For example,

ZETA = list(A = list(Z = Z.A, K = K.A), D = list(Z = Z.D, K = K.D))

Z.A, Z.D

Design matrix (n \times m) for the random effects. So, in many cases, you can use the identity matrix.

K.A, K.D

Different kernels which express some relationships between lines.

For example, K.A is additive relationship matrix for the covariance between lines, and K.D is dominance relationship matrix.

chi2Test

If TRUE, chi-square test for the relationship between haplotypes & subpopulations will be performed.

thresChi2Test

The threshold for the chi-square test.

plotTree

If TRUE, the function will return the plot of phylogenetic tree.

distMat

You can assign the distance matrix of the block of interest. If NULL, the distance matrix will be computed in this function.

distMethod

You can choose the method to calculate distance between accessions. This argument corresponds to the 'method' argument in the 'dist' function.

evolutionDist

If TRUE, the evolution distance will be used instead of the pure distance. The 'distMat' will be converted to the distance matrix by the evolution distance.

subpopInfo

The information of subpopulations. This argument should be a vector of factor.

groupingMethod

If 'subpopInfo' argument is NULL, this function estimates subpopulation information from marker genotype. You can choose the grouping method from 'kmeans', 'kmedoids', and 'hclust'.

nGrp

The number of groups (or subpopulations) grouped by 'groupingMethod'. If this argument is 0, the subpopulation information will not be estimated.

nIterClustering

If 'groupingMethod' = 'kmeans', the clustering will be performed multiple times. This argument specifies the number of classification performed by the function.

kernelTypes

In the function, similarlity matrix between accessions will be computed from marker genotype to estimate genotypic values. This argument specifies the method to compute similarity matrix: If this argument is 'addNOIA' (or one of other options in 'methodGRM' in 'calcGRM'), then the 'addNOIA' (or corresponding) option in the 'calcGRM' function will be used, and if this argument is 'phylo', the gaussian kernel based on phylogenetic distance will be computed from phylogenetic tree. You can assign more than one kernelTypes for this argument; for example, kernelTypes = c("addNOIA", "phylo").

n.core

Setting n.core > 1 will enable parallel execution on a machine with multiple cores. This argument is not valid when 'parallel.method = "furrr"'.

parallel.method

Method for parallel computation in optimizing hyperparameters for estimating haplotype effects. We offer three methods, "mclapply", "furrr", and "foreach".

When 'parallel.method = "mclapply"', we utilize pbmclapply function in the 'pbmcapply' package with 'count = TRUE' and mclapply function in the 'parallel' package with 'count = FALSE'.

When 'parallel.method = "furrr"', we utilize future_map function in the 'furrr' package. With 'count = TRUE', we also utilize progressor function in the 'progressr' package to show the progress bar, so please install the 'progressr' package from github (https://github.com/HenrikBengtsson/progressr). For 'parallel.method = "furrr"', you can perform multi-thread parallelization by sharing memories, which results in saving your memory, but quite slower compared to 'parallel.method = "mclapply"'.

When 'parallel.method = "foreach"', we utilize foreach function in the 'foreach' package with the utilization of makeCluster function in 'parallel' package, and registerDoParallel function in 'doParallel' package. With 'count = TRUE', we also utilize setTxtProgressBar and txtProgressBar functions in the 'utils' package to show the progress bar.

We recommend that you use the option 'parallel.method = "mclapply"', but for Windows users, this parallelization method is not supported. So, if you are Windows user, we recommend that you use the option 'parallel.method = "foreach"'.

hOpt

Optimized hyper parameter for constructing kernel when estimating haplotype effects. If hOpt = "optimized", hyper parameter will be optimized in the function. If hOpt = "tuned", hyper parameter will be replaced by the median of off-diagonal of distance matrix. If hOpt is numeric, that value will be directly used in the function.

hOpt2

Optimized hyper parameter for constructing kernel when estimating haplotype effects of nodes. If hOpt2 = "optimized", hyper parameter will be optimized in the function. If hOpt2 = "tuned", hyper parameter will be replaced by the median of off-diagonal of distance matrix. If hOpt2 is numeric, that value will be directly used in the function.

maxIter

Max number of iterations for optimization algorithm.

rangeHStart

The median of off-diagonal of distance matrix multiplied by rangeHStart will be used as the initial values for optimization of hyper parameters.

saveName

When drawing any plot, you can save plots in png format. In saveName, you should substitute the name you want to save. When saveName = NULL, the plot is not saved.

saveStyle

This argument specifies how to save the plot of phylogenetic tree. The function offers 'png', 'pdf', 'jpg', and 'tiff'.

pchBase

A vector of two integers specifying the plot types for the positive and negative genotypic values respectively.

colNodeBase

A vector of two integers or chracters specifying color of nodes for the positive and negative genotypic values respectively.

colTipBase

A vector of integers or chracters specifying color of tips for the positive and negative genotypic values respectively. The length of the vector should equal to the number of subpopulations.

cexMax

A numeric specifying the maximum point size of the plot.

cexMin

A numeric specifying the minimum point size of the plot.

edgeColoring

If TRUE, the edge branch of phylogenetic tree wiil be colored.

tipLabel

If TRUE, lavels for tips will be shown.

ggPlotTree

If TRUE, the function will return the ggplot version of phylogenetic tree. It offers the precise information on subgroups for each haplotype.

cexMaxForGG

A numeric specifying the maximum point size of the plot for ggtree, relative to the range of x and y-axes (0 < cexMaxForGG <= 1).

cexMinForGG

A numeric specifying the minimum point size of the plot for ggtree, relative to the range of x and y-axes (0 < cexMaxForGG <= 1).

alphaBase

alpha (parameter that indicates the opacity of a geom) for tip with positive / negative effects. alpha for node will be same as the alpha for tip with negative effects.

verbose

If this argument is TRUE, messages for the current step_s will be shown.

Value

A list / lists of

$haplotypeInfo

A list of haplotype information with

$haploCluster

A vector indicating each individual belongs to which haplotypes.

$haploMat

A n x h matrix where n is the number of genotypes and h is the number of haplotypes.

$haploBlock

Marker genotype of haplotype block of interest for the representing haplotypes.

$subpopInfo

The information of subpopulations.

$distMats

A list of distance matrix:

$distMat

Distance matrix between haplotypes.

$distMatEvol

Evolutionary distance matrix between haplotypes.

$distMatNJ

Phylogenetic distance matrix between haplotypes including nodes.

$pValChi2Test

A p-value of the chi-square test for the dependency between haplotypes & subpopulations. If 'chi2Test = FALSE', 'NA' will be returned.

$njRes

The result of phylogenetic tree by neighborhood-joining method

$gvTotal

Estimated genotypic values by kernel regression for each haplotype.

$gvTotalForLine

Estimated genotypic values by kernel regression for each individual.

$minuslog10p

-log_{10}(p) for haplotype block of interest. p is the p-value for the siginifacance of the haplotype block effect.

$hOpts

Optimized hyper parameters, hOpt1 & hOpt2.

$EMMResults

A list of estimated results of kernel regression:

$EM3Res

Estimated results of kernel regression for the estimation of haplotype effects. (1st step)

$EMMRes

Estimated results of kernel regression for the estimation of haplotype effects of nodes. (2nd step)

$EMM0Res

Estimated results of kernel regression for the null model.

$clusterNosForHaplotype

A list of cluster Nos of individuals that belong to each haplotype.


RAINBOWR documentation built on July 4, 2024, 1:11 a.m.