View source: R/func__networkAnalyser__getClusterMemberCooccurrence.R
getClusterMemberCooccurrence | R Documentation |
This function returns a data frame given a data frame of cluster IDs and alleles. In each row, a string of strain names (comma-delimited) are returned if all of the alleles are co-occurring in these strains, otherwise, an NA is returned because alleles may differ in strains when only a fixed number (less than the number of all alleles) of alleles are co-occurring in the strains. Moreover, the function reports the minimal inclusive clade of strains and the frequency of co-occurrence events in this clade when all alleles are co-occurring.
Specifically, a cluster may refer to a community (also known as a module) or a clique in a network.
getClusterMemberCooccurrence( clstr, cluster.colname = "cluster", allele.colname = "allele", pam, clade.pam = NULL, clade.sizes = NULL, sample.dists = NULL, n.cores = -1 )
clstr |
a data frame consisting of at least two columns: one for cluster IDs and the other for allele names (comma-delimited). I recommend to use integers or characters as clster IDs. This argument can be generated by taking the columns Clique and Alleles in the output of the function summariseCliques. |
cluster.colname |
the column name for cluster IDs in clstr. |
allele.colname |
the column name for allele IDs in clstr. |
pam |
an allelic PAM with columns for allele names and rows for strain names. |
clade.pam |
(optional) a matrix for the presence/absence of samples in each clade of an input tree. It can be obtained using the function tree2Clades. |
clade.sizes |
(optional) A named vector of integers for the number of samples in each clade. It can be obtained from the element "sizes" in the outputs of the function tree2Clades. Optional. |
sample.dists |
(optional) A square matrix for distances between samples. It can be acquired through the function projectSamples. |
n.cores |
An integer determining the number of cores used for parallel computing. Valid values are the same as those for the function findPhysLink. |
A data frame of 12 columns. Explanations to some columns: Column 1, named by the argument cluster.colname, which contains cluster IDs; Column 2, named by the argument allele.colname, which stores alleles per cluster; size, number of alleles per cluster; co_max, maximum number of co-occurring alleles in strains; co_perc, co_max / size * 100
Clade-wise summaries are not returned or only a few summary columns are returned when the three optional arguments are incomplete (some may be NULL).
Yu Wan (wanyuac@126.com)
assoc <- findPhysLink(...) clusters <- ... # from a network package of your preference com.co <- getClusterMemberCooccurrence(com = clusters, pam = assoc[["alleles"]][["A"]], cluster.colname = "community", clade.pam = assoc[["struc"]][["clades"]][["pam"]], clade.sizes = assoc[["struc"]][["clades"]][["sizes"]], sample.dists = assoc[["struc"]][["C"]][["d"]], n.cores = 2) Dependency: data.table, parallel
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.