summarisePhysDistForClusters: Summarising physical distances involving clusters of alleles

View source: R/func__networkAnalyser__summarisePhysDistForClusters.R

summarisePhysDistForClusters    R Documentation

Description

This function provides evidence that allele clusters are also physical clusters in the bacterial strains where all of their alleles co-occur. It takes a data frame of physical distance measurements as input and extracts the distances that correspond to edges within each allele cluster. It then summarises these distances for every strain in which they were measured, as well as for every cluster.
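The two steps described above (extracting within-cluster distances, then summarising them per strain and per cluster) can be sketched in base R. This is only an illustration of the idea, not the function's actual implementation: the toy data and the column names query, subject, sample, cluster, node1 and node2 are assumptions made for this sketch, and the summary statistics (count, maximum, median) are arbitrary choices.

```r
# Toy distance table. Column names here are hypothetical, not taken from the package.
ds_toy <- data.frame(
  query    = c("a1", "a1", "a2", "a1", "a3"),
  subject  = c("a2", "a2", "a3", "a3", "a4"),
  sample   = c("s1", "s2", "s1", "s1", "s1"),
  distance = c(1200, 1500, 800, 2000, 50000),
  stringsAsFactors = FALSE
)

# Hypothetical edge list of a single allele cluster "c1"
edges_toy <- data.frame(
  cluster = "c1",
  node1   = c("a1", "a2", "a1"),
  node2   = c("a2", "a3", "a3"),
  stringsAsFactors = FALSE
)

# Step 1: keep only distances whose allele pair is an edge of the cluster
in_cluster <- merge(ds_toy, edges_toy,
                    by.x = c("query", "subject"), by.y = c("node1", "node2"))

# Step 2a: summarise per strain (number of distances and the largest distance)
n_dists  <- aggregate(distance ~ cluster + sample, data = in_cluster, FUN = length)
names(n_dists)[3] <- "n_dists"
max_dist <- aggregate(distance ~ cluster + sample, data = in_cluster, FUN = max)
names(max_dist)[3] <- "max_dist"
per_strain <- merge(n_dists, max_dist, by = c("cluster", "sample"))

# Step 2b: summarise per cluster (median distance across all strains)
per_cluster <- aggregate(distance ~ cluster, data = in_cluster, FUN = median)

print(per_strain)
print(per_cluster)
```

In this toy case the pair a3/a4 is dropped because it is not an edge of cluster c1, so only four distances contribute to the summaries.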

Usage

summarisePhysDistForClusters(
  cls.distr,
  cls.col = 1,
  allele.col = 2,
  cls,
  ds,
  bidirectional = TRUE,
  sample.dists = NULL,
  clade.pam = NULL,
  clade.sizes = NULL
)

Arguments

cls.distr

A data frame of cluster distributions produced by the function getClusterMemberCooccurrence. It shows the strains in which the largest number of alleles consistently co-occur.

cls.col

The name or index for the column of cluster IDs in cls.distr. Default: 1 (the first column).

allele.col

The name or index for the column of allele IDs in cls.distr. Default: 2 (the second column).

cls

A GraphSet-class object produced by the function extractSubgraphs, which lists edges per cluster.

ds

A data frame of physical distances measured in all strains. It can be obtained from the data frame "ds" in the output list of the function findPhysLink. The distances may be pre-filtered by a maximum number of nodes or a maximum distance.

bidirectional

A logical value specifying whether there are always two edges of opposing directions connecting a pair of vertices. Default: TRUE.

sample.dists

(optional) A square matrix for distances between samples. It can be acquired through the function projectSamples.

clade.pam

(optional) A matrix for the presence/absence of samples in each clade of an input tree. It can be obtained using the function tree2Clades of phylix.

clade.sizes

(optional) A named vector of integers for the number of samples in each clade. It can be obtained from the element "sizes" in the outputs of the function tree2Clades.
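To make the meaning of the bidirectional argument concrete, the sketch below builds a small edge table in which every connected pair of vertices has two edges of opposing directions, checks that property, and collapses the table to one row per pair. The data and the column names from and to are hypothetical; how the function itself handles bidirectional edges is not shown in this page.

```r
# Hypothetical directed edge table: each pair appears once in each direction
edges_bi <- data.frame(
  from = c("a1", "a2", "a2", "a3"),
  to   = c("a2", "a1", "a3", "a2"),
  stringsAsFactors = FALSE
)

# Order-independent key, so that a1 -> a2 and a2 -> a1 map to the same pair
pair_key <- paste(pmin(edges_bi$from, edges_bi$to),
                  pmax(edges_bi$from, edges_bi$to), sep = "|")

# TRUE only when every pair is represented by exactly two edges
is_bidirectional <- all(table(pair_key) == 2)

# Keep the first edge of each pair to obtain an undirected edge list
edges_undirected <- edges_bi[!duplicated(pair_key), ]

print(is_bidirectional)
print(edges_undirected)
```

When an edge table like this is not strictly bidirectional, bidirectional = FALSE would presumably be the appropriate setting.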

Note

All summaries and expected numbers are based on the alleles that actually co-occur in a set of strains. For instance, if five out of six alleles in a cluster consistently co-occur in the corresponding strains, the outputs refer only to those five alleles. Results should therefore be interpreted with respect to the alleles listed in the 'alleles_max_co' element of the data frame 'cls.distr'.
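As a rough illustration of why this matters, the following sketch counts allele pairs for the five-out-of-six case in the note. It assumes, purely for illustration, that every pair of alleles in the cluster is connected by an edge; the package's own expected numbers may be derived differently.

```r
n_cluster     <- 6  # alleles in the cluster
n_cooccurring <- 5  # alleles that consistently co-occur

# Unordered allele pairs if all six alleles were considered
pairs_all <- choose(n_cluster, 2)

# Unordered allele pairs among the co-occurring alleles only
pairs_co <- choose(n_cooccurring, 2)

# With two opposing edges per pair (bidirectional graph), the count doubles
edges_co_bidirectional <- 2 * pairs_co

print(c(pairs_all = pairs_all, pairs_co = pairs_co,
        edges_co_bidirectional = edges_co_bidirectional))
```

Under this assumption, the relevant number of pairs is 10 rather than 15, so a summary that covers 10 pairs is complete for that set of strains.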

Author(s)

Yu Wan (wanyuac@126.com)

Examples

assoc <- findPhysLink(...)
clusters <- ...  # from a network package of your preference
com <- getClusterMemberCooccurrence(
  com = clusters, pam = assoc[["alleles"]][["A"]], cluster.colname = "community",
  clade.pam = assoc[["struc"]][["clades"]][["pam"]],
  clade.sizes = assoc[["struc"]][["clades"]][["sizes"]],
  sample.dists = assoc[["struc"]][["C"]][["d"]], n.cores = 4
)
g_com <- extractSubgraphs(V = network[["V"]], E = network[["E"]], clusters = edges)  # cf. the documentation of this function
# Pre-filter distances by the number of nodes and the maximum distance
ds <- subset(assoc[["ds"]], node_number <= 3 & distance <= 2.8e6)
# The cluster edges are passed through the argument cls (see Usage)
ds.com <- summarisePhysDistForClusters(cls.distr = com, cls.col = "community", cls = g_com, ds = ds)

Dependency: data.table


wanyuac/GeneMates documentation built on Aug. 12, 2022, 7:37 a.m.