Quick Start

If you want a quick look at pagoo and to start playing with pangenome objects, this short tutorial introduces the basic concepts. Let's start by loading a Campylobacter spp. dataset included in the package.

library(pagoo, quietly = TRUE, warn.conflicts = FALSE) # Load package
rds <- system.file('extdata', 'campylobacter.RDS', package = 'pagoo')
campy <- load_pangenomeRDS(rds) # Load pangenome

Now that the object (campy) is loaded, we can start querying it. pagoo was designed around the idea that, in a pangenome, each individual gene belongs to a given organism and is assigned to a cluster of orthologs. These three variables (genes, organisms, and clusters) are interconnected, but each can carry metadata of its own. For example, an individual gene can have coordinates within a genome, which has no meaning for a whole cluster; likewise, an organism may have a host from which it was isolated, but that information does not apply to an individual gene.

Basic Fields

These three variables are stored in three separate tables that can be queried:

campy$organisms

(Tip: To see all fields and methods, type campy$ in any R console and press the [TAB] key twice.)

This dataset consists of 7 Campylobacter spp. genomes. Each organism has one row with its associated metadata. The first column, org, identifies the organism.

campy$clusters

The $clusters field returns a table with the metadata associated with each group of orthologs, which in this case is the Pfam domain architecture (second column).

The last and most important field is $genes, which returns a list of DataFrame objects with information about each individual gene, grouped by cluster. We leave it to the reader to inspect this field.

campy$genes

The first three columns (cluster, org, and gene) are the glue that interconnects the three "variables".
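
To make the idea of shared key columns concrete, here is a minimal base R sketch that does not use pagoo at all. The three toy tables and every value in them (organism names, hosts, domain labels, coordinates) are invented for illustration and are not taken from the campy dataset; the point is only that gene-level rows can pick up organism-level and cluster-level metadata through the org and cluster columns.

```r
# Toy tables, invented for illustration (not the campy dataset).
organisms <- data.frame(
  org  = c("orgA", "orgB"),
  host = c("cattle", "human")          # organism-level metadata
)
clusters <- data.frame(
  cluster = c("group1", "group2"),
  domain  = c("domain_x", "domain_y")  # cluster-level metadata (placeholder labels)
)
genes <- data.frame(
  cluster = c("group1", "group1", "group2"),
  org     = c("orgA", "orgB", "orgA"),
  gene    = c("orgA_g1", "orgB_g1", "orgA_g2"),
  start   = c(10L, 55L, 300L)          # gene-level metadata
)

# Join on the shared key columns: first organisms (by org),
# then clusters (by cluster).
annotated <- merge(merge(genes, organisms, by = "org"), clusters, by = "cluster")
print(annotated)
```

Each gene row now carries the host of its organism and the domain label of its cluster, while the coordinate column remains specific to the gene.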

Another useful field is $pan_matrix, which returns a matrix of gene abundance, with one column per cluster and one row per organism.
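
As a rough illustration of what such a matrix contains, the following base R sketch counts genes per organism and cluster from a toy gene table. The data are invented, and this is not necessarily how pagoo builds $pan_matrix internally; it only shows the shape of the result.

```r
# Toy gene table, invented for illustration: one row per gene.
genes <- data.frame(
  cluster = c("group1", "group1", "group1", "group2"),
  org     = c("orgA", "orgA", "orgB", "orgA")
)

# Cross-tabulate organisms (rows) against clusters (columns);
# each cell is the number of genes of that organism in that cluster.
pan_matrix <- unclass(table(genes$org, genes$cluster))
print(pan_matrix)
```

Here orgA has two genes in group1 (for instance, paralogs), and orgB has none in group2, so that cell is zero.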

Basic Methods

pagoo objects contain basic methods to analyze the pangenome, from general statistics to some basic plotting capabilities. Some of these methods can also take arguments.

For example:

campy$dist(method = "bray")

Or:

campy$gg_barplot()
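
For readers unfamiliar with the "bray" option passed to $dist above, the following is a hand-written sketch of Bray-Curtis dissimilarity on a toy abundance matrix, using base R only. The matrix and the helper bray_curtis() are hypothetical and written for this example; pagoo's own implementation may differ in its details.

```r
# Toy abundance matrix, invented for illustration:
# organisms in rows, clusters in columns.
pm <- matrix(
  c(1, 1, 0,
    1, 0, 2,
    1, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("orgA", "orgB", "orgC"),
                  c("group1", "group2", "group3"))
)

# Bray-Curtis dissimilarity between two abundance vectors:
# sum of absolute differences divided by the sum of all abundances.
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Fill a symmetric organism-by-organism matrix of pairwise dissimilarities.
n <- nrow(pm)
d <- matrix(0, n, n, dimnames = list(rownames(pm), rownames(pm)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- bray_curtis(pm[i, ], pm[j, ])
  }
}

bc <- as.dist(d)  # compact lower-triangle representation
print(bc)
```

Organisms with identical gene content (orgA and orgC here) are at distance 0, while organisms that share few clusters approach 1.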

Sequence Manipulation

One of the main advantages of using pagoo is how easily sequences can be manipulated. Sequences are stored as a List of DNAStringSet objects from the Biostrings package.