View source: R/func__networkAnalyser__summarisePhysDistForClusters.R
summarisePhysDistForClusters | R Documentation |
This function provides evidence that allele clusters exist as physical clusters in bacterial strains where all of the clustered alleles co-occur. It takes a data frame of physical distance measurements as input and extracts the distances corresponding to edges within each allele cluster. It then summarises these distances for every strain in which they were measured, as well as for every cluster.
summarisePhysDistForClusters( cls.distr, cls.col = 1, allele.col = 2, cls, ds, bidirectional = TRUE, sample.dists = NULL, clade.pam = NULL, clade.sizes = NULL )
cls.distr |
A data frame of cluster distributions produced by the function getClusterMemberCooccurrence. It shows the strains in which the largest number of alleles consistently co-occur. |
cls.col |
The name or index for the column of cluster IDs in cls.distr. Default: 1 (the first column). |
allele.col |
The name or index for the column of allele IDs in cls.distr. Default: 2 (the second column). |
cls |
A GraphSet-class object produced by the function extractSubgraphs, which lists edges per cluster. |
ds |
A data frame of physical distances measured in all strains. It can be obtained from the data frame "ds" in the output list of the function findPhysLink. The distances may be pre-filtered for a maximal number of nodes or a maximal distance. |
bidirectional |
A logical value specifying whether there are always two edges of opposing directions connecting a pair of vertices. Default: TRUE. |
sample.dists |
(optional) A square matrix for distances between samples. It can be acquired through the function projectSamples. |
clade.pam |
(optional) A matrix for the presence/absence of samples in each clade of an input tree. It can be obtained using the function tree2Clades of phylix. |
clade.sizes |
(optional) A named vector of integers for the number of samples in each clade. It can be obtained from the element "sizes" in the outputs of the function tree2Clades. |
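To illustrate what the bidirectional argument describes (not how this function handles it internally), the following base R sketch checks whether every edge in a toy edge list has a reverse counterpart and collapses the list to one row per vertex pair. The data frame E and its column names "from" and "to" are hypothetical and are not taken from the package.

```r
# Toy directed edge list; the column names are hypothetical.
E <- data.frame(from = c("a1", "a2", "a1", "a3"),
                to   = c("a2", "a1", "a3", "a1"),
                stringsAsFactors = FALSE)

# An edge set is bidirectional if every edge has a reverse counterpart.
fwd <- paste(E$from, E$to, sep = "|")
bwd <- paste(E$to, E$from, sep = "|")
is_bidirectional <- all(bwd %in% fwd)

# Collapse to one row per unordered vertex pair so that each pair is
# represented once rather than twice.
pair_key <- paste(pmin(E$from, E$to), pmax(E$from, E$to), sep = "|")
E_undirected <- E[!duplicated(pair_key), ]
```

In this toy example, both vertex pairs are connected in both directions, so the four directed edges reduce to two unordered pairs.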
All summaries and expected numbers are based on alleles that actually co-occur in a set of strains. For instance, if five out of six alleles in a cluster consistently co-occur in the corresponding strains, the outputs refer only to those five alleles. As such, the results should be interpreted with respect to the alleles in the 'alleles_max_co' element of the data frame 'cls.distr'.
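The idea described above can be sketched in base R with toy data. This is a minimal illustration under stated assumptions, not the function's implementation: the column names ("cluster", "allele1", "allele2", "sample", "distance"), the vector co_occurring, and the choice of max and median as summary statistics are all hypothetical.

```r
# Toy inputs; every column name here is hypothetical, not the package's schema.
edges <- data.frame(cluster = c(1, 1, 1),
                    allele1 = c("a1", "a1", "a2"),
                    allele2 = c("a2", "a3", "a3"),
                    stringsAsFactors = FALSE)

# Stand-in for the alleles that actually co-occur (a3 is excluded).
co_occurring <- c("a1", "a2")

# Toy physical distances measured per sample (strain) and allele pair.
ds <- data.frame(sample   = c("s1", "s2", "s1", "s2"),
                 allele1  = c("a1", "a1", "a1", "a2"),
                 allele2  = c("a2", "a2", "a3", "a3"),
                 distance = c(1200, 1500, 90000, 45000),
                 stringsAsFactors = FALSE)

# Keep only edges whose two alleles both belong to the co-occurring set.
edges_co <- edges[edges$allele1 %in% co_occurring &
                  edges$allele2 %in% co_occurring, ]

# Pull out the distances that correspond to the retained edges.
d <- merge(edges_co, ds, by = c("allele1", "allele2"))

# Summarise per strain within each cluster, then per cluster.
per_strain  <- aggregate(distance ~ cluster + sample, data = d, FUN = max)
per_cluster <- aggregate(distance ~ cluster, data = d, FUN = median)
```

Because a3 is not among the co-occurring alleles, the distances for the pairs a1/a3 and a2/a3 do not contribute to either summary.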
Yu Wan (wanyuac@126.com)
assoc <- findPhysLink(...)
clusters <- ...  # from a network package of your preference

# Determine the strains in which alleles of each cluster co-occur
com <- getClusterMemberCooccurrence(com = clusters, pam = assoc[["alleles"]][["A"]],
    cluster.colname = "community",
    clade.pam = assoc[["struc"]][["clades"]][["pam"]],
    clade.sizes = assoc[["struc"]][["clades"]][["sizes"]],
    sample.dists = assoc[["struc"]][["C"]][["d"]], n.cores = 4)

# List edges per cluster; cf. the documentation of extractSubgraphs
g_com <- extractSubgraphs(V = network[["V"]], E = network[["E"]], clusters = edges)

# Pre-filter distances by node number and maximal distance
ds <- subset(assoc[["ds"]], node_number <= 3 & distance <= 2.8e6)

# The argument name is cls, as in the usage above (not cls.edges)
ds.com <- summarisePhysDistForClusters(cls.distr = com, cls.col = "community",
    cls = g_com, ds = ds)

Dependency: data.table