panMatrix: Computing the pan-matrix for a set of gene clusters


View source: R/panmat.R

Description

A pan-matrix has one row for each genome and one column for each gene cluster, and cell [i,j] indicates how many members genome i has in gene cluster j.

Usage

panMatrix(clustering)

Arguments

clustering

A named vector of integers.

Details

The pan-matrix is a central data structure for pan-genomic analysis. It is a matrix with one row for each genome in the study, and one column for each gene cluster. Cell [i,j] contains an integer indicating how many members genome i has in cluster j.

The input clustering must be a named integer vector with one element for each sequence in the study, typically produced by either bClust or dClust. The name of each element is a text string identifying that sequence. The value of each element indicates the cluster, i.e. sequences with identical values are in the same cluster. IMPORTANT: The name of each sequence must contain the genome_id of the genome it comes from, i.e. names must be of the form GID111_seq1, GID111_seq2,... where the GIDxxx part indicates which genome the sequence belongs to. See panPrep for details.
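As an illustration of the input format described above, here is a minimal sketch of a hand-made clustering vector. The genome and sequence names are hypothetical, and the regular expression used to recover the genome_id is an assumption based on the GIDxxx naming convention; it is not taken from the micropan source.

```r
# Toy input: five sequences from two hypothetical genomes (GID1 and GID2).
# Names follow the GIDxxx_seqN pattern; values are cluster numbers.
clustering <- c(GID1_seq1 = 1L, GID1_seq2 = 1L, GID1_seq3 = 2L,
                GID2_seq1 = 1L, GID2_seq2 = 3L)

# Recover the genome_id part of each name.
# The pattern "GID[0-9]+" is an assumption, not micropan's own code.
genome_id <- regmatches(names(clustering),
                        regexpr("GID[0-9]+", names(clustering)))

print(genome_id)
# [1] "GID1" "GID1" "GID1" "GID2" "GID2"
```

A vector of this shape has the properties the function expects: integer values, one name per sequence, and a genome_id embedded in every name.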

The rows of the pan-matrix are named by the genome_id of each genome. The columns are named Cluster_x, where x is an integer copied from clustering.
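To make the layout concrete, the following base-R sketch builds a matrix with the same shape and naming from a toy clustering vector. It is only an illustration under stated assumptions (hypothetical sequence names, genome_id extracted with the pattern "GID[0-9]+") and is not the implementation used by panMatrix.

```r
# Toy clustering vector with hypothetical names (see the naming rule above).
clustering <- c(GID1_seq1 = 1L, GID1_seq2 = 1L, GID1_seq3 = 2L,
                GID2_seq1 = 1L, GID2_seq2 = 3L)

# Assumed way of extracting the genome_id from each sequence name.
genome_id <- regmatches(names(clustering),
                        regexpr("GID[0-9]+", names(clustering)))

# Cross-tabulate genomes against clusters: cell [i,j] counts the members
# genome i has in cluster j.
counts <- table(genome_id, clustering)

# Convert to a plain integer matrix with Cluster_x column names.
toy_panmat <- matrix(as.integer(counts), nrow = nrow(counts),
                     dimnames = list(rownames(counts),
                                     paste0("Cluster_", colnames(counts))))
print(toy_panmat)
#      Cluster_1 Cluster_2 Cluster_3
# GID1         2         1         0
# GID2         1         0         1
```

Here genome GID1 has two sequences in cluster 1 and one in cluster 2, while GID2 has one sequence each in clusters 1 and 3.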

Value

An integer matrix with a row for each genome and a column for each sequence cluster. The input vector clustering is attached as the attribute clustering.

Author(s)

Lars Snipen and Kristian Hovde Liland.

See Also

bClust, dClust, distManhattan, distJaccard, fluidity, chao, binomixEstimate, heaps, rarefaction.

Examples

# Loading clustering data in this package
data(xmpl.bclst)

# Pan-matrix based on the clustering
panmat <- panMatrix(xmpl.bclst)

## Not run: 
# Plotting cluster distribution: number of clusters present in exactly 1, 2, ... genomes
library(dplyr)    # provides tibble() and the %>% pipe
library(ggplot2)
tibble(Clusters = as.integer(table(factor(colSums(panmat > 0), levels = 1:nrow(panmat)))),
       Genomes = 1:nrow(panmat)) %>%
  ggplot(aes(x = Genomes, y = Clusters)) +
  geom_col()

## End(Not run)

micropan documentation built on July 15, 2020, 5:08 p.m.