PgR6 | R Documentation
A basic PgR6 class constructor. It contains basic fields and subset functions to handle a pangenome. End users should use pagoo instead, since it is easier to understand.
pan_matrix
The pan matrix. Rows are organisms and columns are groups of orthologous genes (clusters). Cells indicate the presence (>= 1) or absence (0) of a given gene in a given organism. Cells can have values greater than 1 if they contain in-paralogs.
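As a rough illustration of this layout, the base R sketch below (not pagoo's own code) builds a matrix of the same shape from a hypothetical long-format table with gene, org and cluster columns, counting genes per organism and cluster:

```r
# Hypothetical long-format input: one row per gene with its name, the
# organism it belongs to, and its cluster (group of orthologous genes).
data <- data.frame(
  gene    = c("g1", "g2", "g3", "g1", "g2", "g1", "g9"),
  org     = c("A",  "A",  "A",  "B",  "B",  "C",  "C"),
  cluster = c("c1", "c2", "c2", "c1", "c2", "c1", "c3"),
  stringsAsFactors = FALSE
)

# Count genes per organism (rows) and cluster (columns); unclass() turns the
# contingency table into a plain integer matrix. A value of 2 means the
# organism has two genes (in-paralogs) in that cluster, as A does in c2.
pan_matrix <- unclass(table(data$org, data$cluster))
pan_matrix
```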
organisms
A DataFrame with the available organism names, and the organism number identifier as rownames(). Dropped organisms will not be displayed in this field (see $dropped below). Additional metadata will be shown as additional columns, if provided.
genes
A SplitDataFrameList object with one entry per cluster. Each element contains a DataFrame with gene ids (<gid>) and additional metadata, if provided. gids are created by pasting organism and gene names together, so duplicated gene names across organisms are avoided.
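The per-cluster grouping of $genes can be pictured with base R's split(). This is only a sketch: pagoo stores a Bioconductor SplitDataFrameList rather than a plain list of data.frames, and the gene table below is hypothetical.

```r
# Hypothetical gene table with precomputed gene ids (org, "__", gene).
genes_df <- data.frame(
  gid     = c("A__g1", "A__g2", "A__g3", "B__g1", "B__g2", "C__g1"),
  cluster = c("c1",    "c2",    "c2",    "c1",    "c2",    "c1"),
  stringsAsFactors = FALSE
)

# One list element per cluster, each holding the rows (genes) of that cluster.
genes_by_cluster <- split(genes_df, genes_df$cluster)
genes_by_cluster$c2
```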
clusters
A DataFrame with the groups of orthologous genes (clusters). Each row corresponds to one cluster. Additional metadata will be shown as additional columns, if provided.
core_level
The percentage of organisms a gene must be in to be considered part of the core genome (default: core_level = 95). It can't be set above 100, and setting it below 85 raises a warning.
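The documented limits can be expressed as a small validation function. Both helpers below are hypothetical, and the rounding in min_core_orgs() (rounding up core_level percent of the organisms) is an assumption rather than pagoo's confirmed rule.

```r
# Hypothetical helper (not part of pagoo) mimicking the documented rules:
# values above 100 are rejected, values below 85 trigger a warning.
check_core_level <- function(level) {
  if (!is.numeric(level) || length(level) != 1L) stop("core_level must be a single number")
  if (level > 100) stop("core_level can't be set above 100")
  if (level < 85) warning("core_level is below 85")
  level
}

# Assumed rule: a core cluster must be present in at least
# ceiling(core_level% of the organisms).
min_core_orgs <- function(n_orgs, level = 95) {
  ceiling(n_orgs * check_core_level(level) / 100)
}

min_core_orgs(20)
```

Under that assumption, with 20 organisms and the default of 95, a cluster would need to be present in at least 19 of them.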
core_genes
Like $genes, but only showing core genes.
core_clusters
Like $clusters, but only showing core clusters.
cloud_genes
Like $genes, but only showing cloud genes. These are defined as the clusters that contain a single gene (singletons), plus those with more than one gene whose organisms are probably clonal, judging by their identical overall gene content. Colloquially, strain-specific genes.
cloud_clusters
Like $clusters, but only showing cloud clusters, as defined above.
shell_genes
Like $genes, but only showing shell genes. These are defined as the clusters that belong neither to the core genome nor to the cloud genome. Colloquially, genes that are present in some but not all strains, and that aren't strain-specific.
shell_clusters
Like $clusters, but only showing shell clusters, as defined above.
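The three categories can be sketched from a pan matrix as follows. This is a simplified, hypothetical classification: it assumes the core threshold is rounded up from core_level percent of the organisms, and it treats only singletons as cloud, leaving out pagoo's additional check for probably-clonal organisms.

```r
# Minimal sketch, NOT pagoo's implementation. Simplifications:
#  - core threshold assumed to be ceiling(core_level% of organisms);
#  - cloud = singletons only (the clonal-organism refinement is omitted).
classify_clusters <- function(pan_matrix, core_level = 95) {
  presence <- colSums(pan_matrix > 0)   # organisms carrying each cluster
  total    <- colSums(pan_matrix)       # genes in each cluster
  min_core <- ceiling(nrow(pan_matrix) * core_level / 100)

  category <- ifelse(presence >= min_core, "core",
              ifelse(total == 1, "cloud", "shell"))
  setNames(category, colnames(pan_matrix))
}

pm <- matrix(
  c(1, 2, 1, 0,
    1, 1, 0, 0,
    1, 0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("c1", "c2", "c3", "c4"))
)
classify_clusters(pm)
```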
summary_stats
A DataFrame with information about the number of core, shell, and cloud clusters, as well as the total number of clusters.
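A table of this kind can be assembled from per-cluster categories with table(). The column names Category and Number, and the lowercase labels, are assumptions for illustration and may not match the exact output of $summary_stats.

```r
# Hypothetical per-cluster categories (e.g. from a core/shell/cloud split).
category <- c(c1 = "core", c2 = "shell", c3 = "cloud", c4 = "cloud")

# Count in a fixed order, then prepend the total number of clusters.
counts <- table(factor(category, levels = c("core", "shell", "cloud")))
summary_stats <- data.frame(
  Category = c("Total", names(counts)),
  Number   = c(sum(counts), as.integer(counts)),
  stringsAsFactors = FALSE
)
summary_stats
```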
random_seed
The last .Random.seed. Used for reproducibility purposes only.
dropped
A character vector with the dropped organism names, and the organism number identifier as names().
new()
A basic PgR6 class constructor. It contains basic fields and subset functions to handle a pangenome.
PgR6$new(data, org_meta, cluster_meta, core_level = 95, sep = "__", verbose = TRUE, DF, group_meta)
data
A data.frame or DataFrame containing at least the following columns: gene (gene name), org (organism name to which the gene belongs), and cluster (group of orthologous genes to which the gene belongs). More columns can be added as metadata for each gene.
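A hypothetical minimal input table is shown below, together with a quick check (not part of pagoo) that the three required columns are present; the extra length column stands in for optional gene metadata.

```r
# Hypothetical input: one row per gene; "length" is optional gene metadata.
data <- data.frame(
  gene    = c("g1", "g2", "g1", "g7"),
  org     = c("A",  "A",  "B",  "B"),
  cluster = c("c1", "c2", "c1", "c3"),
  length  = c(900L, 1200L, 897L, 450L),
  stringsAsFactors = FALSE
)

# Pre-flight check for the documented required columns.
required <- c("gene", "org", "cluster")
missing_cols <- setdiff(required, colnames(data))
if (length(missing_cols) > 0) {
  stop("missing columns: ", paste(missing_cols, collapse = ", "))
}
```

Per the signature above, a table like this would be passed as PgR6$new(data = data).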
org_meta
(optional) A data.frame or DataFrame containing additional metadata for organisms. This data.frame must have a column named "org" with valid organism names (that is, they should match those provided in data, column org), and additional columns will be used as metadata. Each row should correspond to one organism.
cluster_meta
(optional) A data.frame or DataFrame containing additional metadata for clusters. This data.frame must have a column named "cluster" with valid cluster names (that is, they should match those provided in data, column cluster), and additional columns will be used as metadata. Each row should correspond to one cluster.
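Before passing org_meta or cluster_meta, it can help to check that the identifiers line up with data. The check_meta() helper below is a hypothetical pre-flight check, not a pagoo function, and the metadata values are made up.

```r
data <- data.frame(
  gene    = c("g1", "g2", "g1"),
  org     = c("A",  "A",  "B"),
  cluster = c("c1", "c2", "c1"),
  stringsAsFactors = FALSE
)
org_meta <- data.frame(org = c("A", "B"), host = c("human", "soil"),
                       stringsAsFactors = FALSE)
cluster_meta <- data.frame(cluster = c("c1", "c2", "c9"),
                           product = c("DnaA", "GyrB", "unknown"),
                           stringsAsFactors = FALSE)

# Hypothetical check: the key column exists, each key appears once,
# and every key matches a value in data.
check_meta <- function(meta, data, key) {
  if (!key %in% colnames(meta)) stop("metadata needs a column named '", key, "'")
  if (anyDuplicated(meta[[key]]) > 0) stop("each ", key, " value must appear in only one row")
  unknown <- setdiff(meta[[key]], data[[key]])
  if (length(unknown) > 0) stop("unknown ", key, " values: ", paste(unknown, collapse = ", "))
  invisible(TRUE)
}

check_meta(org_meta, data, "org")               # passes
try(check_meta(cluster_meta, data, "cluster"))  # fails: "c9" is not in data$cluster
```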
core_level
The initial core_level (that is, the percentage of organisms a cluster must be in to be considered part of the core genome). Should be a number between 85 and 100 (default: 95). You can change it later through the $core_level field once the object has been created.
sep
A separator (default: "__", two underscores). It is used to create a unique gid (gene identifier) for each gene: gids are created by pasting org to gene, separated by sep.
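The gid construction described above amounts to a call to paste(). The example data are hypothetical, and the check that sep does not occur inside organism or gene names is a precaution suggested here, not a documented pagoo requirement.

```r
data <- data.frame(
  gene = c("g1", "g2", "g1"),
  org  = c("A",  "A",  "B"),
  stringsAsFactors = FALSE
)
sep <- "__"

# If sep occurred inside a name, a gid could not be split back
# unambiguously into organism and gene.
stopifnot(!any(grepl(sep, c(data$org, data$gene), fixed = TRUE)))

# gid = org, sep, gene. "g1" occurs in two organisms, but the gids are unique.
data$gid <- paste(data$org, data$gene, sep = sep)
data$gid
```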
verbose
logical. Whether to display progress messages when loading the class.
DF
Deprecated. Use data instead.
group_meta
Deprecated. Use cluster_meta instead.
An R6 object of class PgR6. It contains basic fields and methods for analyzing a pangenome.
add_metadata()
Add metadata to the object. You can add metadata to each organism, to each group of orthologous genes (cluster), or to each gene. Elements with missing data should be filled with NA (the dimensions of the provided data.frame must be consistent with the object data).
PgR6$add_metadata(map = "org", data)
map
character identifying the metadata to map. Can be one of "org", "cluster", or "gid".
data
data.frame or DataFrame with the metadata to add. In each case, a column named after the map value (e.g. "org") must exist, containing identifiers for each element. When adding gene (gid) metadata, each gene should be referenced by the organism name and the gene name as provided in the data data.frame, separated by the sep argument.
Returns self, invisibly, with the additional metadata added.
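The mapping step can be pictured as matching each metadata row to an existing element by its identifier, with NA where nothing matches. The sketch below uses base R's match() on hypothetical cluster metadata; it is not pagoo's implementation.

```r
# Existing clusters and hypothetical metadata for some of them.
clusters <- data.frame(cluster = c("c1", "c2", "c3"), stringsAsFactors = FALSE)
meta <- data.frame(
  cluster    = c("c3", "c1"),
  annotation = c("transposase", "dnaA"),
  stringsAsFactors = FALSE
)

map <- "cluster"

# Align metadata rows to the existing elements by identifier; elements
# without a matching row get NA, and the original order is kept.
idx <- match(clusters[[map]], meta[[map]])
for (col in setdiff(colnames(meta), map)) {
  clusters[[col]] <- meta[[col]][idx]
}
clusters
```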
drop()
Drop an organism from the dataset. This method hides an organism from the real dataset, so that it is ignored in downstream analyses. All fields and methods will behave as if it did not exist. For instance, if you drop organism 1, the $pan_matrix field (see above) will not show it when called.
PgR6$drop(x)
x
character or numeric. The name of the organism to drop, or its numeric id as returned in the $organisms field (see above).
Returns self, invisibly, with x dropped. Since R6 objects are mutable, there is no need to assign the call to a new object or to overwrite the existing one.
recover()
Recover a previously $drop()ped organism (see above). All fields and methods will again take this organism into account.
PgR6$recover(x)
x
character or numeric. The name of the organism to recover, or its numeric id as returned in the $dropped field (see above).
Returns self, invisibly, with x recovered. Since R6 objects are mutable, there is no need to assign the call to a new object or to overwrite the existing one.
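The hide-rather-than-delete behaviour of $drop() and $recover(), together with the in-place modification that R6 objects allow, can be sketched in base R with an environment and a logical mask. All names here are hypothetical; pagoo's internals may differ.

```r
# State lives in an environment (as in an R6 object), so functions that
# receive it modify it in place. The full data are never deleted.
pg <- new.env()
pg$full_pm <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("c1", "c2", "c3"))
)
pg$visible <- setNames(rep(TRUE, 3), c("A", "B", "C"))

# Hypothetical accessors: only visible organisms are shown.
pan_matrix  <- function(obj) obj$full_pm[obj$visible, , drop = FALSE]
drop_org    <- function(obj, x) { obj$visible[x] <- FALSE; invisible(obj) }
recover_org <- function(obj, x) { obj$visible[x] <- TRUE; invisible(obj) }

drop_org(pg, "B")                  # no reassignment needed
after_drop <- rownames(pan_matrix(pg))
recover_org(pg, "B")
after_recover <- rownames(pan_matrix(pg))
```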
write_pangenome()
Write the pangenome data as flat (text) tables. This is not the recommended way to save a pangenome, since you can lose information such as numeric precision, column classes (factor, numeric, integer), and the state of the object itself (i.e. dropped organisms or core_level), losing reproducibility. Use $save_pangenomeRDS for a more precise way of saving a pagoo object. Still, it is useful if you want to work with the data outside R; just keep the above in mind.
PgR6$write_pangenome(dir = "pangenome", force = FALSE)
dir
Name of a non-existing directory in which to write the data files. Default: "pangenome".
force
logical. Whether to overwrite the directory if it already exists. Default: FALSE.
A directory with at least 3 files. "data.tsv" contains the basic pangenome data as it is provided to the data argument in the initialization method ($new(...)). "clusters.tsv" contains any metadata associated with the clusters. "organisms.tsv" contains any metadata associated with the organisms. The latter 2 files will contain a single column if no metadata was provided.
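The loss of column classes mentioned above is easy to reproduce with base R alone. The sketch writes a small hypothetical data.frame to a temporary TSV file and reads it back; the factor column returns as character.

```r
df <- data.frame(org = factor(c("A", "B")), n_genes = c(4000L, 4100L))

tsv <- tempfile(fileext = ".tsv")
write.table(df, tsv, sep = "\t", quote = FALSE, row.names = FALSE)
back <- read.delim(tsv, stringsAsFactors = FALSE)

class(df$org)    # "factor"
class(back$org)  # "character": the factor class did not survive the round-trip
```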
save_pangenomeRDS()
Save a pagoo pangenome object. This method saves a pagoo object and its state into an "RDS" file. To load the pangenome, use the load_pangenomeRDS function in this package. It *should* be compatible between pagoo versions, so you could update pagoo and still recover the same pangenome. Even sep and core_level are restored, unless the user provides those arguments to load_pangenomeRDS. Dropped organisms are also kept hidden, just as they were in the original object.
PgR6$save_pangenomeRDS(file = "pangenome.rds")
file
The name of the file to save. Default: "pangenome.rds".
Writes a list with all the information needed to restore the object by using the load_pangenomeRDS function, into an RDS (binary) file.
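By contrast, R's own serialization keeps classes intact. The sketch below bundles data and state into a list and round-trips it through saveRDS() and readRDS(); the list layout is hypothetical and not the actual structure written by $save_pangenomeRDS.

```r
# Hypothetical state bundle: data plus the settings that text export loses.
state <- list(
  data = data.frame(org = factor(c("A", "B")), gene = c("g1", "g1"),
                    stringsAsFactors = FALSE),
  core_level = 95,
  sep        = "__",
  dropped    = c("3" = "C")   # organism name, with its number id as name
)

rds <- tempfile(fileext = ".rds")
saveRDS(state, rds)
restored <- readRDS(rds)
```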
clone()
The objects of this class are cloneable with this method.
PgR6$clone(deep = FALSE)
deep
Whether to make a deep clone.
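Because R6 objects are environments, plain assignment does not copy them: both names refer to the same object, which is why a clone() method exists. The base R sketch below shows that aliasing with a bare environment; copy_env() is a hypothetical helper making a shallow copy, which for plain fields is roughly what clone(deep = FALSE) provides.

```r
a <- new.env()
a$core_level <- 95

b <- a                  # alias: both names point to the same object
b$core_level <- 90
alias_value <- a$core_level   # 90: changing b changed a

# Hypothetical shallow copy: a new environment holding the same fields.
copy_env <- function(env) list2env(as.list(env, all.names = TRUE))
c2 <- copy_env(a)
c2$core_level <- 99
copy_value <- a$core_level    # still 90: the copy is independent
```

For a PgR6 object p, the documented equivalent would be p2 <- p$clone().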