Using minimapR

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

'minimapR' is an R wrapper for 'minimap2', a long-read aligner developed by Heng Li (me@liheng.org) and widely used with data from the PacBio and Oxford Nanopore Technologies (ONT) sequencing platforms. It aligns long reads to a reference genome and is used in many bioinformatics workflows.

library(minimapR)

Installation

'minimap2' and 'samtools' must be installed on your system, along with the R packages 'Rsamtools', 'git2r', and 'pafr'. To install 'minimap2', run the function below or follow the instructions it prints for your operating system. Where a path such as "/your/path/to/directory/for/install" appears, replace it with the directory in which you want to install 'minimap2'.

minimap2_installation()

Check that the dependencies were installed successfully.

minimap2_check()
samtools_check()
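
Independently of the package helpers, base R's `Sys.which()` offers a quick way to see whether the two command-line tools are visible to the current R session. The sketch below is only an illustration and is not how 'minimap2_check()' or 'samtools_check()' are implemented; `tools_on_path` is a hypothetical helper name.

```r
# Hypothetical helper (not part of minimapR): report which command-line
# tools can be found on the PATH that this R session sees.
tools_on_path <- function(tools = c("minimap2", "samtools")) {
  paths <- unname(Sys.which(tools))  # "" for tools that are not found
  data.frame(
    tool  = tools,
    found = nzchar(paths),
    path  = paths,
    stringsAsFactors = FALSE
  )
}

status <- tools_on_path()
print(status)
```

If a tool shows `found = FALSE` even though it is installed, the directory that contains it is probably missing from the PATH seen by R.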

Usage

Downloading the example data

We will download the reference genomes and query sequences used in this example. The references are a human genome sequence (GRCh38; the example file is a chromosome 1 excerpt) and the yeast genome (S288C). The query sequences are human ONT reads and yeast HiFi reads.

tmp_folder <- tempdir()
cat("Temporary folder is:", tmp_folder, "\n")
href_url <- "https://github.com/jake-bioinfo/minimapR/raw/master/inst/extdata/GRCh38_chr1_130k.fa.gz"
hfq_url <- "https://github.com/jake-bioinfo/minimapR/raw/master/inst/extdata/ont_hs_sample.fastq.gz"
yref_url <- "https://github.com/jake-bioinfo/minimapR/raw/master/inst/extdata/S288C_ref_genome.fasta.gz"
yfq_url <- "https://github.com/jake-bioinfo/minimapR/raw/master/inst/extdata/yeast_sample_hifi.fastq.gz"
url_list <- c(href_url, hfq_url, yref_url, yfq_url)
# mode = "wb" keeps the gzipped files intact on Windows
invisible(lapply(url_list, function(x) {
  download.file(x, destfile = file.path(tmp_folder, basename(x)), mode = "wb")
}))

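# Sequence files in the temporary folder
cat("Sequence files in the temporary folder are:", "\n")
# Escape the dot and anchor the pattern so only the downloaded
# .fa.gz / .fasta.gz / .fastq.gz files are matched
fa_list <- list.files(tmp_folder, pattern = "\\.(fa|fasta|fastq)\\.gz$", full.names = TRUE)
fa_list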
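
Before aligning, it can be worth confirming that a downloaded read file is intact. The sketch below counts records in a gzipped FASTQ file with base R only, assuming the common four-lines-per-record layout. `count_fastq_records` is a hypothetical helper, not a 'minimapR' function, and the demo file is made up so that the example is self-contained.

```r
# Hypothetical helper (not part of minimapR): count records in a gzipped
# FASTQ file, assuming four lines per record.
count_fastq_records <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  n_lines <- length(readLines(con))   # reads the whole file into memory
  if (n_lines %% 4 != 0) {
    warning("Line count is not a multiple of 4; the file may be truncated.")
  }
  n_lines %/% 4
}

# Self-contained demo on a tiny made-up FASTQ file with two reads
demo_fq <- tempfile(fileext = ".fastq.gz")
demo_con <- gzfile(demo_fq, open = "wt")
writeLines(c("@read1", "ACGT", "+", "IIII",
             "@read2", "GGCC", "+", "IIII"), demo_con)
close(demo_con)

n_reads <- count_fastq_records(demo_fq)
unlink(demo_fq)
```

With the example data, the same helper could be called as `count_fastq_records(file.path(tmp_folder, basename(hfq_url)))`. Because it reads the whole file into memory, it is only suitable for small files like these samples.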

Aligning reads to the reference genomes

We will align the human ONT reads to the human reference and the yeast HiFi reads to the yeast reference using the 'minimap2' function. With 'return = TRUE', as in the calls below, it returns a data frame with the alignment information.

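# Human ONT alignment
# Build the input paths from the download URLs rather than relying on the
# sort order of fa_list, which can differ between platforms and locales
h_out <- file.path(tmp_folder, "ont_hs_sample")
h_alignment <- minimap2(reference = file.path(tmp_folder, basename(href_url)),
                        query_sequences = file.path(tmp_folder, basename(hfq_url)),
                        output_file_prefix = h_out,
                        preset_string = "map-ont",
                        threads = 4,
                        return = TRUE,
                        verbose = FALSE)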

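# Yeast HiFi alignment
# Input paths are built from the download URLs, as for the human sample
y_out <- file.path(tmp_folder, "yeast_sample_hifi")
y_alignment <- minimap2(reference = file.path(tmp_folder, basename(yref_url)),
                        query_sequences = file.path(tmp_folder, basename(yfq_url)),
                        output_file_prefix = y_out,
                        preset_string = "map-hifi",
                        threads = 4,
                        return = TRUE,
                        verbose = TRUE)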

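# Check the alignment: first seven columns of up to five rows
# (head() avoids NA rows if fewer than five alignments were returned)
cat("Alignment for the human sample is:\n")
head(h_alignment[, 1:7], 5)
cat("Alignment for the yeast sample is:\n")
head(y_alignment[, 1:7], 5)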
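
For orientation, the arguments used above correspond roughly to options of the 'minimap2' command-line tool: a preset (`-x`), a thread count (`-t`), a reference, and a query file. The sketch below assembles such a command with base R. It is an assumption about what an equivalent plain call looks like, not a description of what 'minimapR' runs internally; `build_minimap2_args` and the file names are hypothetical.

```r
# Hypothetical helper (not part of minimapR): assemble the arguments for a
# plain 'minimap2' call that writes SAM output.
#   -a  output in SAM format
#   -x  preset (e.g. "map-ont", "map-hifi")
#   -t  number of threads
#   -o  output file
build_minimap2_args <- function(reference, query, out_sam,
                                preset = "map-ont", threads = 4) {
  c("-a", "-x", preset, "-t", as.character(threads),
    "-o", out_sam, reference, query)
}

# Placeholder file names, used only to show the resulting command line
mm2_args <- build_minimap2_args("ref.fa.gz", "reads.fastq.gz", "out.sam")
cat("minimap2", mm2_args, "\n")

# To actually run it (requires 'minimap2' on the PATH and real input files):
# system2("minimap2", args = shQuote(mm2_args))
```

Keeping the arguments in a character vector, rather than pasting one long string, makes it easier to quote paths that contain spaces before handing them to `system2()`.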

Clean up

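# Remove the downloaded files and alignment outputs, but keep the session
# temporary directory itself, which R still needs for later tempfile() calls
unlink(file.path(tmp_folder, basename(url_list)))
unlink(Sys.glob(paste0(c(h_out, y_out), "*")))
sessionInfo()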


minimapR documentation built on April 12, 2025, 1:15 a.m.