estNetwork: Function to estimate & plot haplotype network
In RAINBOWR: Genome-Wide Association Study with SNP-Set Methods

estNetwork

R Documentation

Function to estimate & plot haplotype network

Description

Function to estimate & plot haplotype network

Usage

estNetwork(
  blockInterest = NULL,
  gwasRes = NULL,
  nTopRes = 1,
  gene.set = NULL,
  indexRegion = 1:10,
  chrInterest = NULL,
  posRegion = NULL,
  blockName = NULL,
  nHaplo = NULL,
  pheno = NULL,
  geno = NULL,
  ZETA = NULL,
  chi2Test = TRUE,
  thresChi2Test = 0.05,
  plotNetwork = TRUE,
  distMat = NULL,
  distMethod = "manhattan",
  evolutionDist = FALSE,
  complementHaplo = "phylo",
  subpopInfo = NULL,
  groupingMethod = "kmedoids",
  nGrp = 3,
  nIterClustering = 100,
  iterRmst = 100,
  networkMethod = "rmst",
  autogamous = FALSE,
  probParsimony = 0.95,
  nMaxHaplo = 1000,
  kernelTypes = "addNOIA",
  n.core = parallel::detectCores() - 1,
  parallel.method = "mclapply",
  hOpt = "optimized",
  hOpt2 = "optimized",
  maxIter = 20,
  rangeHStart = 10^c(-1:1),
  saveName = NULL,
  saveStyle = "png",
  plotWhichMDS = 1:2,
  colConnection = c("grey40", "grey60"),
  ltyConnection = c("solid", "dashed"),
  lwdConnection = c(1.5, 0.8),
  pchBase = c(1, 16),
  colCompBase = c(2, 4),
  colHaploBase = c(3, 5, 6),
  cexMax = 2,
  cexMin = 0.7,
  ggPlotNetwork = FALSE,
  cexMaxForGG = 0.025,
  cexMinForGG = 0.008,
  alphaBase = c(0.9, 0.3),
  verbose = TRUE
)

Arguments

`blockInterest`	A `n \times M` matrix representing the marker genotype that belongs to the haplotype block of interest. If this argument is NULL, this argument will automatically be determined by 'geno',
`gwasRes`	You can use the results (data.frame) of haplotype-based (SNP-set) GWAS by 'RGWAS.multisnp' function.
`nTopRes`	Haplotype blocks (or gene sets, SNP-sets) with top 'nTopRes' p-values by 'gwasRes' will be used.
`gene.set`	If you have information of gene (or haplotype block), you can use it to perform kernel-based GWAS. You should assign your gene information to gene.set in the form of a "data.frame" (whose dimension is (the number of gene) x 2). In the first column, you should assign the gene name. And in the second column, you should assign the names of each marker, which correspond to the marker names of "geno" argument.
`indexRegion`	You can specify the haplotype block (or gene set, SNP-set) of interest by the marker index in 'geno'.
`chrInterest`	You can specify the haplotype block (or gene set, SNP-set) of interest by the marker position in 'geno'. Please assign the chromosome number to this argument.
`posRegion`	You can specify the haplotype block (or gene set, SNP-set) of interest by the marker position in 'geno'. Please assign the position in the chromosome to this argument.
`blockName`	You can specify the haplotype block (or gene set, SNP-set) of interest by the name of haplotype block in 'geno'.
`nHaplo`	Number of haplotypes. If not defined, this is automatically defined by the data. If defined, k-medoids clustering is performed to define haplotypes.
`pheno`	Data frame where the first column is the line name (gid). The remaining columns should be a phenotype to test.
`geno`	Data frame with the marker names in the first column. The second and third columns contain the chromosome and map position. Columns 4 and higher contain the marker scores for each line, coded as [-1, 0, 1] = [aa, Aa, AA].
`ZETA`	A list of covariance (relationship) matrix (K: `m \times m`) and its design matrix (Z: `n \times m`) of random effects. Please set names of list "Z" and "K"! You can use more than one kernel matrix. For example, ZETA = list(A = list(Z = Z.A, K = K.A), D = list(Z = Z.D, K = K.D)) Z.A, Z.D Design matrix (`n \times m`) for the random effects. So, in many cases, you can use the identity matrix. K.A, K.D Different kernels which express some relationships between lines. For example, K.A is additive relationship matrix for the covariance between lines, and K.D is dominance relationship matrix.
`chi2Test`	If TRUE, chi-square test for the relationship between haplotypes & subpopulations will be performed.
`thresChi2Test`	The threshold for the chi-square test.
`plotNetwork`	If TRUE, the function will return the plot of haplotype network.
`distMat`	You can assign the distance matrix of the block of interest. If NULL, the distance matrix will be computed in this function.
`distMethod`	You can choose the method to calculate distance between accessions. This argument corresponds to the 'method' argument in the 'dist' function.
`evolutionDist`	If TRUE, the evolution distance will be used instead of the pure distance. The 'distMat' will be converted to the distance matrix by the evolution distance when you use 'complementHaplo = "phylo"'.
`complementHaplo`	how to complement unobserved haplotypes. When 'complementHaplo = "all"', all possible haplotypes will be complemented from the observed haplotypes. When 'complementHaplo = "never"', unobserved haplotypes will not be complemented. When 'complementHaplo = "phylo"', unobserved haplotypes will be complemented as nodes of phylogenetic tree. When 'complementHaplo = "TCS"', unobserved haplotypes will be complemented by TCS methods (Clement et al., 2002).
`subpopInfo`	The information of subpopulations. This argument should be a vector of factor.
`groupingMethod`	If 'subpopInfo' argument is NULL, this function estimates subpopulation information from marker genotype. You can choose the grouping method from 'kmeans', 'kmedoids', and 'hclust'.
`nGrp`	The number of groups (or subpopulations) grouped by 'groupingMethod'. If this argument is 0, the subpopulation information will not be estimated.
`nIterClustering`	If 'groupingMethod' = 'kmeans', the clustering will be performed multiple times. This argument specifies the number of classification performed by the function.
`iterRmst`	The number of iterations for RMST (randomized minimum spanning tree).
`networkMethod`	Either one of 'mst' (minimum spanning tree), 'msn' (minimum spanning network), and 'rmst' (randomized minimum spanning tree). 'rmst' is recommended.
`autogamous`	This argument will be valid only when you use 'complementHaplo = "all"' or 'complementHaplo = "TCS"'. This argument specifies whether the plant is autogamous or not. If autogamous = TRUE, complemented haplotype will consist of only homozygous sites ([-1, 1]). If FALSE, complemented haplotype will consist of both homozygous & heterozygous sites ([-1, 0, 1]).
`probParsimony`	Equal to the argument 'prob' in 'haplotypes::parsimnet' function: A numeric vector of length one in the range [0.01, 0.99] giving the probability of parsimony as defined in Templeton et al. (1992). In order to set maximum connection steps to Inf (to connect all the haplotypes in a single network), set the probability to NULL.
`nMaxHaplo`	The maximum number of haplotypes. If the number of total (complemented + original) haplotypes are larger than 'nMaxHaplo', we will only show the results only for the original haplotypes to reduce the computational time.
`kernelTypes`	In the function, similarlity matrix between accessions will be computed from marker genotype to estimate genotypic values. This argument specifies the method to compute similarity matrix: If this argument is 'addNOIA' (or one of other options in 'methodGRM' in 'calcGRM'), then the 'addNOIA' (or corresponding) option in the 'calcGRM' function will be used, and if this argument is 'diffusion', the diffusion kernel based on Laplacian matrix will be computed from network. You can assign more than one kernelTypes for this argument; for example, kernelTypes = c("addNOIA", "diffusion").
`n.core`	Setting n.core > 1 will enable parallel execution on a machine with multiple cores. This argument is not valid when 'parallel.method = "furrr"'.
`parallel.method`	Method for parallel computation in optimizing hyperparameters for estimating haplotype effects. We offer three methods, "mclapply", "furrr", and "foreach". When 'parallel.method = "mclapply"', we utilize `pbmclapply` function in the 'pbmcapply' package with 'count = TRUE' and `mclapply` function in the 'parallel' package with 'count = FALSE'. When 'parallel.method = "furrr"', we utilize `future_map` function in the 'furrr' package. With 'count = TRUE', we also utilize `progressor` function in the 'progressr' package to show the progress bar, so please install the 'progressr' package from github (https://github.com/futureverse/progressr). For 'parallel.method = "furrr"', you can perform multi-thread parallelization by sharing memories, which results in saving your memory, but quite slower compared to 'parallel.method = "mclapply"'. When 'parallel.method = "foreach"', we utilize `foreach` function in the 'foreach' package with the utilization of `makeCluster` function in 'parallel' package, and `registerDoParallel` function in 'doParallel' package. With 'count = TRUE', we also utilize `setTxtProgressBar` and `txtProgressBar` functions in the 'utils' package to show the progress bar. We recommend that you use the option 'parallel.method = "mclapply"', but for Windows users, this parallelization method is not supported. So, if you are Windows user, we recommend that you use the option 'parallel.method = "foreach"'.
`hOpt`	Optimized hyper parameter for constructing kernel when estimating haplotype effects. If hOpt = "optimized", hyper parameter will be optimized in the function. If hOpt is numeric, that value will be directly used in the function.
`hOpt2`	Optimized hyper parameter for constructing kernel when estimating complemented haplotype effects. If hOpt2 = "optimized", hyper parameter will be optimized in the function. If hOpt2 is numeric, that value will be directly used in the function.
`maxIter`	Max number of iterations for optimization algorithm.
`rangeHStart`	The median of off-diagonal of distance matrix multiplied by rangeHStart will be used as the initial values for optimization of hyper parameters.
`saveName`	When drawing any plot, you can save plots in png format. In saveName, you should substitute the name you want to save. When saveName = NULL, the plot is not saved.
`saveStyle`	This argument specifies how to save the plot of phylogenetic tree. The function offers 'png', 'pdf', 'jpg', and 'tiff'.
`plotWhichMDS`	We will show the MDS (multi-dimensional scaling) plot, and this argument is a vector of two integers specifying that will define which MDS dimension will be plotted. The first and second integers correspond to the horizontal and vertical axes, respectively.
`colConnection`	A vector of two integers or characters specifying the colors of connection between nodes for the original and complemented haplotypes, respectively.
`ltyConnection`	A vector of two characters specifying the line types of connection between nodes for the original and complemented haplotypes, respectively.
`lwdConnection`	A vector of two integers specifying the line widths of connection between nodes for the original and complemented haplotypes, respectively.
`pchBase`	A vector of two integers specifying the plot types for the positive and negative genotypic values respectively.
`colCompBase`	A vector of two integers or characters specifying color of complemented haplotypes for the positive and negative genotypic values respectively.
`colHaploBase`	A vector of integers or characters specifying color of original haplotypes for the positive and negative genotypic values respectively. The length of the vector should equal to the number of subpopulations.
`cexMax`	A numeric specifying the maximum point size of the plot.
`cexMin`	A numeric specifying the minimum point size of the plot.
`ggPlotNetwork`	If TRUE, the function will return the ggplot version of haplotype network. It offers the precise information on subgroups for each haplotype.
`cexMaxForGG`	A numeric specifying the maximum point size of the plot for the ggplot version of haplotype network, relative to the range of x and y-axes (0 < cexMaxForGG <= 1).
`cexMinForGG`	A numeric specifying the minimum point size of the plot for the ggplot version of haplotype network, relative to the range of x and y-axes (0 < cexMaxForGG <= 1).
`alphaBase`	alpha (parameter that indicates the opacity of a geom) for original haplotype with positive / negative effects. alpha for complemented haplotype will be same as the alpha for original haplotype with negative effects.
`verbose`	If this argument is TRUE, messages for the current steps will be shown.

Value

A list / lists of

$haplotypeInfo

A list of haplotype information with

$haploCluster: A vector indicating each individual belongs to which haplotypes.
$haploMat: A n x h matrix where n is the number of genotypes and h is the number of haplotypes.
$haploBlock: Marker genotype of haplotype block of interest for the representing haplotypes.

$subpopInfo

The information of subpopulations.

$pValChi2Test

A p-value of the chi-square test for the dependency between haplotypes & subpopulations. If 'chi2Test = FALSE', 'NA' will be returned.

$mstResults

A list of estimated results of MST / MSN / RMST:

$mstRes: Estimated results of MST / MSN / RMST for the data including original haplotypes.
$mstResComp: Estimated results of MST / MSN / RMST for the data including both original and complemented haplotype.

$distMats

A list of distance matrix:

$distMat: Distance matrix between haplotypes.
$distMatComp: Distance matrix between haplotypes (including unobserved ones).
$laplacianMat: Laplacian matrix between haplotypes (including unobserved ones).

$gvTotal

Estimated genotypic values by kernel regression for each haplotype.

$gvTotalForLine

Estimated genotypic values by kernel regression for each individual.

$minuslog10p

-log_{10}(p) for haplotype block of interest. p is the p-value for the siginifacance of the haplotype block effect.

$hOpts

Optimized hyper parameters, hOpt1 & hOpt2.

$EMMResults

A list of estimated results of kernel regression:

$EM3Res: Estimated results of kernel regression for the estimation of haplotype effects. (1st step)
$EMMRes: Estimated results of kernel regression for the estimation of haplotype effects of nodes. (2nd step)
$EMM0Res: Estimated results of kernel regression for the null model.

$clusterNosForHaplotype