Aligning reads with Rhisat2

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(width=100)

The r Biocpkg("Rhisat2") package provides an interface to the HISAT2 short-read aligner software [@Kim2015-mu].

Installation

Use the BiocManager package to download and install the package from Bioconductor as follows:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Rhisat2")

If required, the latest development version of the package can also be installed from GitHub.

BiocManager::install("fmicompbio/Rhisat2")

Once the package is installed, load it into your R session:

library(Rhisat2)

Building a genome index

To build an index for the alignment, use the hisat2_build function. You need to provide one or more fasta file with reference sequences, as well as an output directory where the index will be stored, and a "prefix" (that will determine the name of the index files in the output directory). Any additional arguments to the hisat2-build binary can also be supplied to the function (see hisat2_build_usage() or https://ccb.jhu.edu/software/hisat2/manual.shtml#the-hisat2-build-indexer for an overview of the available options).

The package contains three example reference sequences, corresponding to short pieces of three different chromosomes. We show how to create an index (named myindex) based on these sequences.

list.files(system.file("extdata/refs", package="Rhisat2"), pattern="\\.fa$")
refs <- list.files(system.file("extdata/refs", package="Rhisat2"), 
                   full.names=TRUE, pattern="\\.fa$")
td <- tempdir()
hisat2_build(references=refs, outdir=td, prefix="myindex", 
             force=TRUE, strict=TRUE, execute=TRUE)

By instead setting execute=FALSE in the command above, hisat2_build() will return the index building shell command for inspection, without executing it.

print(hisat2_build(references=refs, outdir=td, prefix="myindex",
                   force=TRUE, strict=TRUE, execute=FALSE))

Aligning reads to the genome index

After creating the index, reads can be aligned using the hisat2 wrapper function. Most commonly, the reads will be provided in fastq files (one file for single-end reads, two files for paired-end reads). The names of these files can be provided directly to the hisat2 function, either as a vector (for single-end reads) or as a list of length 2 (for paired-end reads, each list element corresponds to one mate). You also need to provide the path to the index generated by hisat2_build (the output directory combined with the prefix), and the output file name where the alignments should be written.

list.files(system.file("extdata/reads", package="Rhisat2"), 
           pattern="\\.fastq$")
reads <- list.files(system.file("extdata/reads", package="Rhisat2"),
                    pattern="\\.fastq$", full.names=TRUE)
hisat2(sequences=as.list(reads), index=file.path(td, "myindex"), 
       type="paired", outfile=file.path(td, "out.sam"), 
       force=TRUE, strict=TRUE, execute=TRUE)

As for hisat2_build(), any additional arguments to the hisat2 binary can be provided to the hisat2() function (see hisat2_usage() or https://ccb.jhu.edu/software/hisat2/manual.shtml#running-hisat2 for an overview of the available options).

With HISAT2, you can provide a file with known splice sites at the alignment step, which can help in finding the correct alignments across known splice junctions. A text file in the required format can be generated using the extract_splice_sites() function, starting from an annotation file in gtf or gff3 format, or from a GRanges or TxDb object.

spsfile <- tempfile()
gtf <- system.file("extdata/refs/genes.gtf", package="Rhisat2")
extract_splice_sites(features=gtf, outfile=spsfile)
hisat2(sequences=as.list(reads), index=file.path(td, "myindex"),
       type="paired", outfile=file.path(td, "out_sps.sam"),
       `known-splicesite-infile`=spsfile, 
       force=TRUE, strict=TRUE, execute=TRUE)

Miscellaneous helper functions

To get the version of HISAT2:

hisat2_version()

To see the usage of hisat2_build():

hisat2_build_usage()

And similarly to see the usage of hisat2():

hisat2_usage()

Session info

sessionInfo()

References



Try the Rhisat2 package in your browser

Any scripts or data that you put into this service are public.

Rhisat2 documentation built on Nov. 8, 2020, 5:49 p.m.