LDnaRaw: Produces necessary data for LDna from a matrix of pairwise LD...
In petrikemppainen/LDna: Linkage disequilibrium network analysis (LDna)

LDnaRaw

R Documentation

Produces necessary data for LDna from a matrix of pairwise LD values.

Description

Takes a lower diagonal matrix of pairwise LD values and produces data files for subsequent LD network analyses. This version works in conjunction with extractBranches.

Usage

LDnaRaw(
  LDmat,
  digits = 2,
  method = "single",
  mc.cores = NULL,
  length_out = NULL,
  fun = function(x) {
     min(x, na.rm = TRUE)
 }
)

Arguments

`LDmat`	Lower diagonal matrix of pairwise LD values, `R^2` is strongly recommended. Must include row and column names.
`digits`	Number of digits for rounding r2 values.
`method`	Specifies clustering method (see `hclust`), defaults to 'single'.
`mc.cores`	Specifies number of cores for `mclapply`. `NULL` (default) uses `lapply`.
`length_out`	Specifies the maximum number of possible node depths. Overrides `digits`, further details are given below.
`fun`	Only used internally.
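
The choice between `lapply` and `mclapply` described for `mc.cores` follows a common pattern, sketched below. This is an illustration only and not the package's internal code; the helper name `apply_maybe_parallel` is hypothetical. Note that `mclapply` relies on forking, so values of `mc.cores` above 1 are not supported on Windows.

```r
library(parallel)

# Hypothetical helper: use lapply when mc.cores is NULL, else mclapply
apply_maybe_parallel <- function(x, f, mc.cores = NULL) {
  if (is.null(mc.cores)) {
    lapply(x, f)                          # serial
  } else {
    mclapply(x, f, mc.cores = mc.cores)   # forked workers
  }
}

# Serial call; both branches return a list of the same shape
res <- apply_maybe_parallel(1:4, function(i) i^2)
unlist(res)
```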

Details

In LDna, clusters represent loci (vertices) connected by LD values (edges) above given thresholds. From a lower diagonal matrix of LD values, a clustering algorithm (see hclust) is used to produce a tree that describes cluster merging with decreasing LD threshold. In these trees, nodes represent clusters/loci and node distance indicates the LD threshold values at which these events occur. This version (in contrast to the original LDnaRaw) does not estimate lambda, as it is intended to be used with extractBranches, where |E|min (minimum number of edges) is the only parameter that defines LD-clusters (it essentially determines how "thick" branches are).

The clustering method can be specified with method. For 'strict' clusters use 'single' (e.g. for outlier or inversion detection); for chromosome binning, use 'average', 'complete' or 'median'. R2 values can be rounded (digits) to limit the number of clustering events in large data sets, but too much rounding leads to problems when recursively collapsing (too large) polytomies.
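
The link between LD thresholds and the clustering tree can be sketched with base R alone. The snippet is an illustration, not the package's internal code: the matrix `ld` and its values are invented, and converting LD to a distance as 1 - R2 is an assumption made here for the example.

```r
# Hypothetical 4-locus lower diagonal LD matrix (invented values)
loci <- c("A", "B", "C", "D")
ld <- matrix(NA_real_, 4, 4, dimnames = list(loci, loci))
ld[lower.tri(ld)] <- c(0.90, 0.30, 0.20, 0.40, 0.10, 0.80)

# Assumption: treat 1 - LD as a distance, so high LD means short distance
d <- as.dist(1 - round(ld, digits = 2))

# Single linkage: two clusters merge as soon as any one pair of loci,
# one from each cluster, is connected at the current threshold
tree <- hclust(d, method = "single")

# Merge heights converted back to LD thresholds, strongest first
merge_ld <- 1 - tree$height
merge_ld
```

With `method = "complete"` the same two pairs would still merge first, but the final merger would be governed by the weakest pair across the two clusters rather than the strongest, which is why single linkage gives the 'strict' clusters described above.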

Value

A list with three objects: clusterfile, stats and a tree-file tree. Clusterfile is a matrix with all unique clusters as columns and loci as rows, where TRUE indicates the presence of a locus in a specific cluster (else FALSE is given). Stats is a data frame that gives all edges for the single linkage clustering tree along with the following information for each edge: cluster (focal cluster), parent cluster (cluster after merger), and the corresponding nV (number of loci), nE (number of edges connecting the loci), min_LD (minimum LD among any two loci in the cluster), merge_at_below (the weakest link in the cluster before merger at lower thresholds), merge_at_above (the weakest link in the cluster just before splitting into two or more clusters at higher thresholds). See the example below.
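
To make the clusterfile layout and the per-cluster statistics concrete, the sketch below builds a small membership matrix by hand and derives nV, min_LD and an edge count from it. All values and cluster names are invented, and counting an edge as a pair with LD at or above a chosen threshold is an assumption for illustration; the function itself may compute these quantities differently.

```r
# Toy lower diagonal LD matrix (invented values)
loci <- c("A", "B", "C", "D")
ld <- matrix(NA_real_, 4, 4, dimnames = list(loci, loci))
ld[lower.tri(ld)] <- c(0.90, 0.30, 0.20, 0.40, 0.10, 0.80)

# Hand-built matrix in the clusterfile layout: loci as rows,
# clusters as columns, TRUE marks membership (hypothetical names)
clusterfile <- cbind(
  cl_AB   = c(TRUE, TRUE, FALSE, FALSE),
  cl_ABCD = c(TRUE, TRUE, TRUE, TRUE)
)
rownames(clusterfile) <- loci

# Loci belonging to one cluster
members <- rownames(clusterfile)[clusterfile[, "cl_ABCD"]]

# Pairwise LD values within that cluster (lower triangle only)
sub <- ld[members, members]
pair_ld <- sub[lower.tri(sub)]

n_v    <- length(members)      # number of loci
min_ld <- min(pair_ld)         # weakest pairwise LD in the cluster
# Assumption: an edge is a pair with LD at or above the threshold 0.4
n_e    <- sum(pair_ld >= 0.4)
```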

Author(s)

Petri Kemppainen petrikemppainen2@gmail.com, Christopher Knight Chris.Knight@manchester.ac.uk

Examples

# Simple lower diagonal LD matrix
LDmat <- structure(c(NA, 0.84, 0.64, 0.24, 0.2, 0.16, 0.44, 0.44, NA, NA, 0.8, 0.28, 0.4, 0.36, 0.36, 0.24, NA, NA, NA, 0.48, 0.32, 0.2, 0.36, 0.2, NA, NA, NA, NA, 0.76, 0.56, 0.6, 0.2, NA, NA, NA, NA, NA, 0.72, 0.68, 0.24, NA, NA, NA, NA, NA, NA, 0.44, 0.24, NA, NA, NA, NA, NA, NA, NA, 0.2, NA, NA, NA, NA, NA, NA, NA, NA), .Dim = c(8L, 8L), .Dimnames = list(c("L1", "L2", "L3", "L4", "L5", "L6", "L7", "L8"), c("L1", "L2", "L3", "L4", "L5", "L6", "L7", "L8")))
# Produce raw data
ldna <- LDnaRaw(LDmat)
ldna$clusterfile
ldna$stats


# Illustrating the use of merge_at_below and merge_at_above
clusters <- extractBranches(ldna, min.edges=0, merge.min=0.8)
clusters
cl_info <- ldna$stats[cluster==names(clusters)[2]] # focus on cluster 5_0.68
cl_info
abline(v=cl_info$merge_at_below, col="blue")
abline(v=cl_info$merge_at_above, col="green")


# using multiple cores for a larger data set
data(LDna)
ldna <- LDnaRaw(r2.baimaii_subs, mc.cores=4)
clusters <- extractBranches(ldna, min.edges=20, merge.min=0.8) ## higher threshold for cluster size
# proceed to explore networks using function "plotLDnetwork"

petrikemppainen/LDna documentation built on April 14, 2024, 6:37 p.m.