PgR6M | R Documentation |
PgR6 with Methods. End users should use pagoo
instead of this class, since it is easier to understand.
Inherits: PgR6
pagoo::PgR6
-> PgR6M
new()
Create a PgR6M
object.
PgR6M$new( data, org_meta, cluster_meta, core_level = 95, sep = "__", verbose = TRUE, DF, group_meta )
data
A data.frame or DataFrame containing at least the following columns: gene (gene name), org (name of the organism to which the gene belongs), and cluster (group of orthologs to which the gene belongs). More columns can be added as metadata for each gene.
org_meta
(optional) A data.frame or DataFrame containing additional metadata for organisms. This data.frame must have a column named "org" with valid organism names (that is, they should match those provided in the org column of data); additional columns will be used as metadata. Each row should correspond to one organism.
cluster_meta
(optional) A data.frame or DataFrame containing additional metadata for clusters. This data.frame must have a column named "cluster" with valid cluster names (that is, they should match those provided in the cluster column of data); additional columns will be used as metadata. Each row should correspond to one cluster.
core_level
The initial core_level, that is, the percentage of organisms a cluster must be in to be considered part of the core genome. Must be a number between 85 and 100 (default: 95). You can change it later through the $core_level field once the object has been created.
sep
A separator. The default is '__' (two underscores). It is used to create a unique gid (gene identifier) for each gene. gids are created by pasting org to gene, separated by sep.
verbose
logical. Whether to display progress messages when loading the class.
DF
Deprecated. Use data instead.
group_meta
Deprecated. Use cluster_meta instead.
An R6 object of class PgR6M. It contains basic fields and methods for analyzing a pangenome, plus additional statistical methods and methods for making basic exploratory plots.
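The input layout described above can be sketched with base R alone. The values below are made up, and the core test is an assumed reading of core_level (a cluster is core when it occurs in at least core_level percent of the organisms); the class itself may round or count differently.

```r
# Toy input in the layout described for `data` (all values are made up).
data <- data.frame(
  gene    = c("g1", "g2", "g1", "g2", "g1"),
  org     = c("orgA", "orgA", "orgB", "orgB", "orgC"),
  cluster = c("c1", "c2", "c1", "c2", "c1"),
  stringsAsFactors = FALSE
)

# gid: `org` pasted to `gene`, separated by `sep`.
sep <- "__"
data$gid <- paste(data$org, data$gene, sep = sep)

# Assumption: a cluster is core when it occurs in at least `core_level`
# percent of the organisms.
core_level <- 95
n_org <- length(unique(data$org))
orgs_per_cluster <- tapply(data$org, data$cluster,
                           function(o) length(unique(o)))
is_core <- orgs_per_cluster / n_org * 100 >= core_level
core_clusters <- names(orgs_per_cluster)[is_core]

# A data frame like this could then be passed as PgR6M$new(data = data),
# following the signature shown above (not run here).
```

Because org and gene are pasted together, gids stay unique as long as gene names are unique within each organism.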
rarefact()
Rarefy the pangenome or coregenome. Compute the number of genes which belong to the pangenome or to the coregenome, for a number of random permutations of increasingly larger samples of genomes.
PgR6M$rarefact(what = "pangenome", n.perm = 10)
what
One of "pangenome" or "coregenome".
n.perm
The number of permutations to compute (default: 10).
A matrix: rows are the number of genomes added, columns are permutations, and each cell holds the number of genes in the requested category.
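As an illustration of the procedure, here is a minimal base R sketch of a rarefaction over a presence/absence matrix. The function name rarefy and the toy matrix are hypothetical, and the coregenome is taken as strict (present in every sampled genome) rather than honouring core_level, so this is not the package's own implementation.

```r
set.seed(1)  # reproducible toy data

# Hypothetical presence/absence panmatrix: 6 genomes (rows) x 20 clusters.
pm <- matrix(rbinom(6 * 20, 1, 0.6), nrow = 6)

# Hypothetical helper, not part of pagoo.
rarefy <- function(pm, what = c("pangenome", "coregenome"), n.perm = 10) {
  what <- match.arg(what)
  n <- nrow(pm)
  out <- matrix(NA_integer_, nrow = n, ncol = n.perm)
  for (p in seq_len(n.perm)) {
    ord <- sample(n)  # random order in which genomes are added
    for (i in seq_len(n)) {
      # number of sampled genomes carrying each cluster
      counts <- colSums(pm[ord[seq_len(i)], , drop = FALSE] > 0)
      out[i, p] <- if (what == "pangenome") {
        sum(counts > 0)   # clusters seen in at least one sampled genome
      } else {
        sum(counts == i)  # clusters present in all i sampled genomes
      }
    }
  }
  out
}

pan  <- rarefy(pm, "pangenome",  n.perm = 5)
core <- rarefy(pm, "coregenome", n.perm = 5)
```

Within each permutation the pangenome column can only grow and the coregenome column can only shrink as genomes are added.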
dist()
Compute distance between all pairs of genomes. The default dist method is "bray" (Bray-Curtis distance). Another commonly used distance method is "jaccard", but you should set binary = FALSE (see below) to obtain a meaningful result. This is just a wrapper function; see vegdist for details.
PgR6M$dist( method = "bray", binary = FALSE, diag = FALSE, upper = FALSE, na.rm = FALSE, ... )
method
The distance method to use. See vegdist for available methods, and details for each one.
binary
Transform abundance matrix into a presence/absence matrix before computing distance.
diag
Compute diagonals.
upper
Return only the upper diagonal.
na.rm
Pairwise deletion of missing observations when computing dissimilarities.
...
Other parameters. See vegdist for details.
A dist object containing all pairwise dissimilarities between genomes.
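To make the default measure concrete, the sketch below computes Bray-Curtis dissimilarities by hand in base R on a made-up abundance matrix. The helper bray_curtis is hypothetical and only mimics the binary switch described above; the Jaccard line uses the relation 2B/(1+B) given in the vegdist documentation.

```r
# Hypothetical abundance panmatrix: rows are genomes, columns are clusters,
# cells are gene counts.
pm <- rbind(
  orgA = c(1, 2, 0, 1),
  orgB = c(1, 0, 1, 1),
  orgC = c(0, 1, 1, 3)
)

# Hypothetical helper, not part of pagoo or vegan.
bray_curtis <- function(x, binary = FALSE) {
  if (binary) x <- (x > 0) * 1  # presence/absence transform
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (j in seq_len(n)) {
    for (k in seq_len(n)) {
      # sum of absolute differences over sum of totals
      d[j, k] <- sum(abs(x[j, ] - x[k, ])) / sum(x[j, ] + x[k, ])
    }
  }
  as.dist(d)  # keeps the lower triangle, like stats::dist()
}

bc  <- bray_curtis(pm)
jac <- 2 * bc / (1 + bc)  # Jaccard expressed through Bray-Curtis
```

With real data the wrapped vegdist call is the better choice; the loop here only spells out the formula.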
pan_pca()
Performs a principal components analysis on the panmatrix.
PgR6M$pan_pca(center = TRUE, scale. = FALSE, ...)
center
a logical value indicating whether the variables should be shifted to be zero centered. Alternatively, a vector of length equal to the number of columns of x can be supplied. The value is passed to scale.
scale.
a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is FALSE.
...
Other arguments. See prcomp.
Returns a list with class "prcomp". See prcomp for more information.
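Since the return value is a plain "prcomp" object, it can be explored like any other. The following sketch runs prcomp directly on a made-up matrix with the same center and scale. settings as the defaults above; the toy data and object names are assumptions.

```r
set.seed(42)

# Hypothetical panmatrix: 8 organisms (rows) x 12 clusters (columns).
pm <- matrix(rpois(8 * 12, 1), nrow = 8,
             dimnames = list(paste0("org", 1:8), paste0("cluster", 1:12)))

# Drop constant columns: they carry no variance to decompose.
pm <- pm[, apply(pm, 2, var) > 0, drop = FALSE]

pca <- prcomp(pm, center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component.
explained <- pca$sdev^2 / sum(pca$sdev^2)

# Coordinates of the organisms on the first two components.
scores <- pca$x[, 1:2]
```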
pg_power_law_fit()
Fits a power law curve for the pangenome rarefaction simulation.
PgR6M$pg_power_law_fit(raref, ...)
raref
(Optional) A rarefaction matrix, as returned by rarefact().
...
Further arguments to be passed to rarefact(). If raref is missing, it will be computed with default arguments, or with the ones provided here.
A list of two elements: $formula with a fitted function, and $params with fitted parameters. An attribute "alpha" is also returned (if alpha > 1, the pangenome is closed; otherwise it is open).
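A rough way to see what such a fit involves is a log-log linear regression in base R. This is a swapped-in technique, not the fitting routine the package uses. The model y = K * N^delta, the convention alpha = 1 - delta, the parameter names, and the data are all assumptions made for illustration.

```r
# Hypothetical mean pangenome sizes after adding 1..8 genomes.
n_genomes <- 1:8
pan_size  <- c(2000, 2350, 2580, 2760, 2900, 3030, 3140, 3240)

# Assumed model: y = K * N^delta, fitted on the log-log scale.
fit   <- lm(log(pan_size) ~ log(n_genomes))
K     <- exp(coef(fit)[[1]])
delta <- coef(fit)[[2]]

power_law <- function(n) K * n^delta

# Assumed convention: alpha = 1 - delta.
alpha <- 1 - delta

# Same shape as the return value described above.
result <- list(formula = power_law, params = c(K = K, delta = delta))
attr(result, "alpha") <- alpha
```

With these made-up numbers alpha comes out below 1, which under the rule above would indicate an open pangenome.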
cg_exp_decay_fit()
Fits an exponential decay curve for the coregenome rarefaction simulation.
PgR6M$cg_exp_decay_fit(raref, pcounts = 10, ...)
raref
(Optional) A rarefaction matrix, as returned by rarefact().
pcounts
An integer number of pseudo-counts. This is used to better fit the function at small numbers, as the linearization method requires subtracting a constant C, which is the coregenome size, from y. As y approaches the coregenome size, this difference tends to 0 and its logarithm diverges. By default pcounts = 10.
...
Further arguments to be passed to rarefact(). If raref is missing, it will be computed with default arguments, or with the ones provided here.
A list of two elements: $formula with a fitted function, and $params with fitted intercept and decay parameters.
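The role of pcounts is easier to see in a small base R sketch of the linearization. The data are invented, C is assumed to be the smallest observed coregenome size, and the back-transformation is simply the inverse of the regression used here, so the package's own fit may differ in detail.

```r
# Hypothetical mean coregenome sizes after adding 1..8 genomes.
n_genomes <- 1:8
core_size <- c(2000, 1700, 1530, 1430, 1370, 1335, 1315, 1305)
pcounts   <- 10

# Assumption: C is estimated as the smallest observed coregenome size.
C <- min(core_size)

# Without pcounts, log(core_size - C) would be -Inf at the smallest value.
fit <- lm(log(core_size - C + pcounts) ~ n_genomes)
intercept <- coef(fit)[[1]]
decay     <- coef(fit)[[2]]

# Inverse of the transformation used in the regression.
exp_decay <- function(n) exp(intercept + decay * n) + C - pcounts

result <- list(formula = exp_decay,
               params  = c(intercept = intercept, decay = decay))
```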
gg_barplot()
Plot a barplot with the frequency of genes within the total number of genomes.
PgR6M$gg_barplot()
A barplot, and a gg object (ggplot2 package) invisibly.
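One reading of the quantity behind this plot is the number of clusters found in exactly 1, 2, ..., n genomes. Under that assumption it can be tabulated in base R from a made-up panmatrix as follows.

```r
# Hypothetical panmatrix: 3 organisms (rows) x 5 clusters (columns).
pm <- rbind(
  orgA = c(1, 1, 0, 1, 1),
  orgB = c(1, 0, 1, 1, 0),
  orgC = c(1, 1, 0, 0, 0)
)

# In how many genomes does each cluster occur?
n_genomes_with_cluster <- colSums(pm > 0)

# Number of clusters found in exactly 1, 2, ..., n genomes.
freq <- table(factor(n_genomes_with_cluster, levels = seq_len(nrow(pm))))

# graphics::barplot(freq) would draw a base-graphics version of the plot.
```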
gg_binmap()
Plot a pangenome binary map representing the presence/absence of each gene within each organism.
PgR6M$gg_binmap()
A binary map (ggplot2::geom_raster()), and a gg object (ggplot2 package) invisibly.
gg_dist()
Plot a heatmap showing the computed distance between all pairs of organisms.
PgR6M$gg_dist(method = "bray", ...)
method
Distance method (default: "bray"). See dist() above for the available methods.
...
More arguments to be passed to dist().
A heatmap (ggplot2::geom_tile()), and a gg object (ggplot2 package) invisibly.
gg_pca()
Plot a scatter plot of a Principal Components Analysis.
PgR6M$gg_pca(colour = NULL, ...)
colour
The name of the column in the $organisms field from which points will take colour (if provided). NULL (default) renders black points.
...
More arguments to be passed to ggplot2::autoplot().
A scatter plot (ggplot2::autoplot()), and a gg object (ggplot2 package) invisibly.
gg_pie()
Plot a pie chart showing the number of clusters of each pangenome category: core, shell, or cloud.
PgR6M$gg_pie()
A pie chart (ggplot2::geom_bar() + coord_polar()), and a gg object (ggplot2 package) invisibly.
gg_curves()
Plot pangenome and/or coregenome curves with the fitted functions returned by pg_power_law_fit() and cg_exp_decay_fit(). You can add points by adding + geom_point() from the ggplot2 package.
PgR6M$gg_curves(what = c("pangenome", "coregenome"), ...)
what
One or both of "pangenome" and "coregenome" (default: both).
...
????
A scatter plot, and a gg object (ggplot2 package) invisibly.
runShinyApp()
Launch an interactive shiny app. It contains a sidebar with controls and switches to interact with the pagoo object. You can drop/recover organisms from the dataset, modify the core_level, visualize statistics, plots, and browse cluster and gene information. In the main body, it contains 2 tabs to switch between summary statistics plots and core genome information on one side, and accessory genome plots and information on the other.
The lower part of each tab contains two tables, side by side. On the "Summary" tab, the left one contains information about core clusters, with one cluster per row. When one of them is selected (clicked), the one on the right is updated to show information about its genes (if provided), one gene per row. The "Accessory" tab has a similar layout, but in this case only accessory clusters/genes are displayed. A slider on the sidebar lets you select the accessory frequency range to display.
Give it a try!
Keep in mind that big pangenomes can slow down the app. More than 50-70 organisms often leads to a delay in the update of the plots/tables.
PgR6M$runShinyApp()
clone()
The objects of this class are cloneable with this method.
PgR6M$clone(deep = FALSE)
deep
Whether to make a deep clone.