summariseDistsForEdges: Summarise allelic physical distances for edges in an...

View source: R/func__distanceAnalyser__summariseDistsForEdges.R

summariseDistsForEdges    R Documentation

Summarise allelic physical distances for edges in an association network

Description

This function summarises allelic physical distances for a table of edges. The result shows how incorporating physical distances affects the edge weights (weighted distance score: w_d).

Usage

summariseDistsForEdges(
  E,
  ds,
  d.max = 250000,
  n.max = 2,
  source.graph = "graph",
  source.contig = "contig",
  source.complete = "complete",
  sort.output = TRUE
)

Arguments

E

An edge list having columns in the order: allele_1, allele_2, co-occurrence count and the distance score s_d. An additional pair column can be included.

ds

A data frame of allelic physical distances imported from the output of the pipeline physDist.

d.max

Maximum distance to be considered accurate.

n.max

Maximum node number for distances that are considered accurate.

source.graph

Name for assembly graphs as a source of distance measurements.

source.contig

Name for contigs as a source of distance measurements.

source.complete

Name for finished-grade genomes as a source of distance measurements.

sort.output

Keep it TRUE to sort the output data frame in descending order of the measurability of reliable distances (Mr) and the count of reliable distances.
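As an illustration of how d.max and n.max might act together, the sketch below flags toy distance measurements as reliable. The column names (distance, node_number, source) and the filtering rule are assumptions made for this example; they are not confirmed to match the physDist output or the internals of summariseDistsForEdges.

```r
# Toy distance table; column names are hypothetical, not the physDist format.
ds_toy <- data.frame(
  distance    = c(1200, 300000, 5000, 80000),
  node_number = c(1, 2, 4, 2),
  source      = c("graph", "graph", "graph", "contig"),
  stringsAsFactors = FALSE
)

d.max <- 250000
n.max <- 2

# Assumed rule: a measurement counts as reliable when it does not exceed
# d.max and its path spans no more than n.max nodes.
ds_toy$reliable <- ds_toy$distance <= d.max & ds_toy$node_number <= n.max

print(ds_toy)
```

Under this assumed rule, the second measurement fails on distance and the third fails on node number, so only the first and last rows would contribute to counts of reliable distances.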

Value

A data frame with the following columns:

Allele_1, Allele_2: names of associated alleles, ordered alphabetically;
S_d, Co: distance score s_d and co-occurrence count;
M: overall measurability of physical distances based on the co-occurrence count;
Mr: measurability of reliable distances;
N: overall count of physical distances;
Nr: number of all reliable distances;
Ng: overall number of distances from assembly graphs;
Ng_r: number of reliable distances from assembly graphs;
Nc_r: number of reliable distances from contigs;
Nf: number of distances from finished-grade genomes.
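To make the relationship between the count columns and the measurability columns concrete, here is a minimal sketch on made-up numbers. It assumes that M = N / Co and Mr = Nr / Co, which is an interpretation of the column descriptions above rather than a confirmed formula; the allele names and counts are invented.

```r
# Toy counts for two edges; all values are made up for illustration.
edges_toy <- data.frame(
  Allele_1 = c("geneA", "geneC"),
  Allele_2 = c("geneB", "geneD"),
  Co = c(40, 25),  # co-occurrence count
  N  = c(30, 5),   # all measured distances
  Nr = c(20, 5),   # reliable distances
  stringsAsFactors = FALSE
)

# Assumed definitions: fraction of co-occurrence events that have
# any distance (M) or a reliable distance (Mr).
edges_toy$M  <- edges_toy$N / edges_toy$Co
edges_toy$Mr <- edges_toy$Nr / edges_toy$Co

# Mimic sort.output = TRUE: descending Mr, then descending Nr.
edges_toy <- edges_toy[order(-edges_toy$Mr, -edges_toy$Nr), ]
print(edges_toy)
```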

Note

Since the distance measurements may be prioritised according to their sources, Ng and Ng_r may not be accurate when Nc_r or Nf > 0; Nc_r may not be accurate when Nf > 0.
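The effect described in this note can be sketched on toy data. The example assumes that prioritisation keeps one measurement per sample, preferring finished-grade genomes over contigs over assembly graphs; this ordering, the per-sample grouping and the column names are assumptions for illustration, not the documented behaviour of the package.

```r
# Toy measurements for one allele pair; sample IDs and columns are hypothetical.
meas <- data.frame(
  sample   = c("s1", "s1", "s2", "s3"),
  source   = c("graph", "contig", "graph", "complete"),
  distance = c(1500, 1480, 2100, 1490),
  stringsAsFactors = FALSE
)

# Assumed priority: finished-grade genome > contig > assembly graph.
priority <- c(complete = 1, contig = 2, graph = 3)
meas$rank <- unname(priority[meas$source])

# Keep only the highest-priority measurement per sample.
meas <- meas[order(meas$sample, meas$rank), ]
kept <- meas[!duplicated(meas$sample), ]

# s1 was measured in both a graph and a contig, but only the contig
# record survives, so the graph count understates what was measured.
n_graph_before <- sum(meas$source == "graph")
n_graph_after  <- sum(kept$source == "graph")
print(c(before = n_graph_before, after = n_graph_after))
```

In this toy case the graph-derived count drops from 2 to 1 after prioritisation, which is the kind of undercount the note warns about for Ng and Ng_r.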

Author(s)

Yu Wan, wanyuac@126.com

Examples

  assoc_lmm <- findPhysLink(...)
  a_lmm_dif <- subset(assoc_lmm$assoc, beta > 0 & p_adj <= 0.05)
  ds_stats <- summariseDistsForEdges(E = a_lmm_dif[, c("pair", "y", "x", "n_xy", "s_d")],
                                     ds = assoc_lmm$ds, d.max = 250e3, n.max = 2,
                                     source.graph = "graph", source.contig = "contig",
                                     source.complete = NA, sort.output = TRUE)
  # For prioritised distances, Nc_r and Ng are mutually exclusive.
  ds_stats <- ds_stats[, c("Allele_1", "Allele_2", "Co", "S_d", "M", "Mr", "N", "Nr", "Nc_r")]



wanyuac/GeneMates documentation built on Aug. 12, 2022, 7:37 a.m.