If you want a quick look at pagoo and a chance to start playing with pangenome objects, this short tutorial introduces the main concepts. Let's start by loading a Campylobacter spp. dataset included in the package.
library(pagoo, quietly = TRUE, warn.conflicts = FALSE) # Load package
rds <- system.file('extdata', 'campylobacter.RDS', package = 'pagoo')
campy <- load_pangenomeRDS(rds) # Load pangenome
Now that the object (campy) is loaded, we can start querying it. pagoo was developed around the idea that, in a pangenome, each individual gene belongs to a given organism and is assigned to a cluster of orthologs. These three variables are interconnected, but each can carry metadata of its own. For example, an individual gene has coordinates within a genome, which does not apply to a whole cluster; a given organism has, for instance, a host from which it was isolated, which does not apply to an individual gene.
These 3 variables are stored as 3 separate tables that can be queried:
campy$organisms
(Tip: to see all fields and methods, type campy$ in any R console and press the [TAB] key twice.)
This dataset consists of 7 Campylobacter spp. genomes. Each organism has a row with its associated metadata. The first column, org, identifies the organism.
campy$clusters
The $clusters field returns a table with metadata associated with each group of orthologs; in this case, the Pfam domain architecture (second column).
The last and most important field is $genes, which returns a list of DataFrame objects with information about each individual gene, grouped by cluster. We leave it to the reader to inspect this field.
campy$genes
The first 3 columns (cluster, org, and gene) are the glue that interconnects the 3 "variables".
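To make the role of these key columns concrete, here is a minimal base R sketch. It does not use pagoo itself: the three data frames and every value in them (orgA, group1, arch_1, and so on) are hypothetical stand-ins for the organisms, clusters, and genes tables, and the column names host, arch, and start are assumptions chosen for illustration.

```r
# Toy stand-ins for the three tables; all values are made up for illustration.
organisms <- data.frame(
  org  = c("orgA", "orgB"),
  host = c("human", "chicken")
)
clusters <- data.frame(
  cluster = c("group1", "group2"),
  arch    = c("arch_1", "arch_2")   # hypothetical domain architectures
)
genes <- data.frame(
  cluster = c("group1", "group1", "group2"),
  org     = c("orgA", "orgB", "orgA"),
  gene    = c("orgA_g1", "orgB_g7", "orgA_g2"),
  start   = c(10L, 250L, 900L)      # hypothetical gene coordinates
)

# Attach organism-level and cluster-level metadata to each gene via the keys.
annotated <- merge(merge(genes, organisms, by = "org"), clusters, by = "cluster")
annotated[, c("gene", "org", "host", "cluster", "arch")]
```

Each gene row picks up the host of its organism and the architecture of its cluster, while gene-specific fields such as start stay at the gene level only.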
Another useful field is $pan_matrix, which returns a matrix with the gene abundance for each cluster (columns) and each organism (rows).
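As a rough illustration of that layout, the sketch below builds a small abundance matrix from a toy gene table using base R only. It is not how pagoo computes $pan_matrix internally; the organism and cluster names are hypothetical, and the object names pan_matrix, present, and in_all are chosen just for this example.

```r
# Toy gene table (made-up values): one row per gene.
genes <- data.frame(
  cluster = c("group1", "group1", "group1", "group1", "group2", "group3", "group3"),
  org     = c("orgA",   "orgA",   "orgB",   "orgC",   "orgA",   "orgB",   "orgC")
)

# Count genes per organism (rows) and cluster (columns).
pan_matrix <- unclass(table(genes$org, genes$cluster))
pan_matrix

# Presence/absence view: clusters found in every organism,
# and the number of distinct clusters per organism.
present <- pan_matrix > 0
in_all  <- colnames(present)[colSums(present) == nrow(present)]
in_all
rowSums(present)
```

The same kind of row and column summaries can be applied to the matrix returned by campy$pan_matrix, since it follows the organisms-by-clusters layout described above.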
pagoo objects include basic methods to analyze the pangenome, from general statistics to basic plotting capabilities. Some of these methods also take arguments.
For example:
campy$dist(method = "bray")
Or:
campy$gg_barplot()
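To give a sense of what a call like campy$dist(method = "bray") reports, the following base R sketch computes Bray-Curtis dissimilarities by hand on a toy abundance matrix. It assumes that "bray" refers to the usual Bray-Curtis definition, sum(|x - y|) / sum(x + y); the matrix values and the helper bray_curtis are hypothetical and are not part of pagoo.

```r
# Toy abundance matrix (made-up values): organisms in rows, clusters in columns.
pm <- matrix(
  c(2, 1, 0,
    1, 0, 1,
    1, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("orgA", "orgB", "orgC"), c("group1", "group2", "group3"))
)

# Bray-Curtis dissimilarity between two abundance vectors (hypothetical helper).
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Fill a pairwise matrix and convert it to a 'dist' object.
n <- nrow(pm)
d <- matrix(0, n, n, dimnames = list(rownames(pm), rownames(pm)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- bray_curtis(pm[i, ], pm[j, ])
  }
}
as.dist(d)
```

Organisms with identical gene content per cluster (orgB and orgC here) get a dissimilarity of 0, and the value grows as their abundance profiles diverge.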
One of the main advantages of using pagoo is how easily it lets you manipulate sequences. Sequences are stored as a List of DNAStringSet objects from the Biostrings package.
campy$sequences