knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
Read and analyse RepeatMasker output in R.
Very early in development!
library(devtools)
install_github("dwinter/repeatR")
The package comes with a small example dataset, including the repeats from one
scaffold in the kākāpō assembly.
We can read this file into memory using read_rm:
library(repeatR)
# create a file path relative to the installed package; this step is not
# necessary for normal usage
rm_file <- system.file("extdata", "kakapo.out", package="repeatR")
kakapo <- read_rm(rm_file)
kakapo
As you can see, the function reads the data and returns a data.frame with the
alignment information from RepeatMasker. We can now quickly look at the
composition of the repeat alignments on this scaffold:
library(ggplot2)
ggplot(kakapo, aes(tclass)) +
  geom_bar() +
  coord_flip() +
  theme_bw(base_size=14)
It is important to note, however, that the alignment between a reference genome
and a given repeat element might be broken up over multiple rows in RepeatMasker
output. This occurs when elements are nested within each other (a pattern that
is very common for some elements in some species). repeatR provides the
function summarise_rm_ID to produce a new table with one row per unique
element in the genome.
kakapo_aggregated <- summarise_rm_ID(kakapo)
head(kakapo_aggregated)
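To make the idea of collapsing fragments concrete, here is a minimal base-R sketch on a toy table. It is not how summarise_rm_ID is implemented, and the column names ID, qstart and qend are assumptions chosen for illustration rather than the columns returned by read_rm. Element 1 is interrupted by a nested element 2, so it appears on two rows that are then summed into one.

```r
# Toy table with hypothetical columns; NOT the real read_rm() output.
fragments <- data.frame(
  ID     = c(1, 2, 1, 3),
  tclass = c("LINE/CR1", "LTR/ERVL", "LINE/CR1", "DNA/hAT"),
  qstart = c(100, 400, 900, 2000),
  qend   = c(399, 899, 1200, 2500),
  stringsAsFactors = FALSE
)

# Length of each aligned fragment on the scaffold (inclusive coordinates).
fragments$frag_len <- fragments$qend - fragments$qstart + 1

# Sum the fragment lengths within each element ID, giving one row per element.
per_element <- aggregate(frag_len ~ ID + tclass, data = fragments, FUN = sum)
per_element <- per_element[order(per_element$ID), ]
per_element
```

The four input rows become three elements, with the two fragments of element 1 combined into a single length.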
With this data, we can start to analyse the total amount of the scaffold covered by elements of different classes:
ggplot(kakapo_aggregated, aes(qlen, tclass)) +
  geom_col() +
  theme_bw(base_size=14) +
  scale_x_continuous(labels=Mb_lab)
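The bar lengths in a plot like this are the summed qlen values for each class. If you want those totals as numbers, something along these lines should work. This sketch uses a small made-up table with the qlen and tclass columns used in the plot; the values are invented, and swapping in kakapo_aggregated assumes qlen is numeric.

```r
# Made-up per-element table using the two columns from the plot.
elements <- data.frame(
  tclass = c("LINE/CR1", "LTR/ERVL", "LINE/CR1", "DNA/hAT"),
  qlen   = c(601, 500, 1200, 501),
  stringsAsFactors = FALSE
)

# Total bases covered by each class.
covered <- tapply(elements$qlen, elements$tclass, sum)

# The same totals expressed in megabases.
covered_mb <- covered / 1e6
covered
```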
Quite often, you will want to remove some of the sequences that are included in
the output file, for instance simple repeats and low complexity regions. The
function filter_by_tclass will remove these sequences along with functional
RNAs and ARTEFACT sequences.
kakapo_just_TEs <- filter_by_tclass(kakapo_aggregated)
table(kakapo_just_TEs$tclass)
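For readers who want to see what this kind of filter amounts to, the following is a rough base-R equivalent on a toy table. The vector of excluded labels is an assumption based on common RepeatMasker class names; the exact set that filter_by_tclass removes may differ, so treat this as an illustration rather than a replacement.

```r
# Toy per-element table; the values are made up.
elements <- data.frame(
  tclass = c("LINE/CR1", "Simple_repeat", "Low_complexity",
             "tRNA", "LTR/ERVL", "ARTEFACT"),
  qlen   = c(1801, 40, 65, 73, 500, 120),
  stringsAsFactors = FALSE
)

# Assumed list of non-TE classes to drop (hypothetical, for illustration only).
non_TE <- c("Simple_repeat", "Low_complexity", "tRNA", "rRNA", "snRNA", "ARTEFACT")

# Keep only the rows whose class is not in the excluded list.
just_TEs <- elements[!elements$tclass %in% non_TE, ]
table(just_TEs$tclass)
```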
We can also look at the distribution of the p_sub statistic (the proportion of
bases that differ from the consensus element). The function make_TE_pallete
includes a pre-defined palette for the tclass column.
ggplot(kakapo_just_TEs, aes(p_sub, fill=tclass)) +
  geom_histogram(colour="black") +
  scale_fill_manual(values=make_TE_pallete(kakapo_aggregated)) +
  theme_bw(base_size=14)
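A histogram can be complemented with a per-class numeric summary. The sketch below computes the median p_sub for each class on invented values; it assumes only the p_sub and tclass columns used in the plot above, and the same call could be pointed at kakapo_just_TEs if p_sub is numeric there.

```r
# Invented p_sub values for two classes.
elements <- data.frame(
  tclass = c("LINE/CR1", "LINE/CR1", "LINE/CR1",
             "LTR/ERVL", "LTR/ERVL", "LTR/ERVL"),
  p_sub  = c(0.02, 0.05, 0.20, 0.10, 0.12, 0.30),
  stringsAsFactors = FALSE
)

# Median proportion of substituted bases within each class.
median_p_sub <- tapply(elements$p_sub, elements$tclass, median)
median_p_sub
```

A lower median suggests copies that are, on average, closer to their consensus sequence.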