run_samsort: Wrapper scripts for SAMtools functions


Description

run_samsort sorts alignment (SAM/BAM/CRAM) files either by position or read name using samtools sort.
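The sketch below shows how the run_samsort options could map onto a `samtools sort` command line. It is a sketch under stated assumptions, not chompR's implementation: `build_sort_args` is a hypothetical helper, and the output-naming rule (input basename + suffix + format extension) is assumed. The samtools flags themselves (`-@`, `-m`, `-O`, `-o`, `-n`) are the documented ones. The helper only builds the argument vector, so it runs without samtools installed.

```r
# Hypothetical helper (not part of chompR): builds the argument vector for
# `samtools sort` from options mirroring run_samsort's parameters.
build_sort_args <- function(file, outformat = "BAM", threads = 1,
                            memory = "768M", sortbyname = FALSE,
                            suffix = "") {
  stopifnot(outformat %in% c("SAM", "BAM", "CRAM"))
  # Assumed naming rule: input basename without extension + suffix + new extension
  stem <- tools::file_path_sans_ext(file)
  outfile <- paste0(stem, suffix, ".", tolower(outformat))
  if (outfile == file) stop("output would overwrite the input; set a suffix")
  args <- c("sort",
            "-@", as.character(threads),   # sorting and compression threads
            "-m", memory,                  # max memory per thread (K/M/G)
            "-O", outformat,               # output format
            "-o", outfile)                 # output file
  if (sortbyname) args <- c(args, "-n")    # sort by read name, not position
  c(args, file)
}

args <- build_sort_args("HB1_sample.sam", threads = 4, suffix = "_psort")
print(args)
# To actually run it (requires samtools on the PATH):
# system2("samtools", args)
```

With `suffix = ""` and an input already in the requested format, the assumed naming rule would reuse the input name. The helper therefore stops instead of letting samtools overwrite the file it is reading.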

run_samindex indexes sorted BAM files using samtools index.
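The following sketch shows indexing for a vector of sorted BAM files. It assumes one `samtools index` call per file. `build_index_args` and the second file name are hypothetical. By default, samtools writes a BAI index next to the input as `<input>.bai`.

```r
# Hypothetical helper (not part of chompR): one `samtools index` argument
# vector per sorted BAM file.
build_index_args <- function(bamfile, threads = 1) {
  lapply(bamfile, function(f) c("index", "-@", as.character(threads), f))
}

bams <- c("HB1_sample.bam", "HB2_sample.bam")   # HB2_sample.bam is hypothetical
index_args <- build_index_args(bams, threads = 2)
bai_files <- paste0(bams, ".bai")               # default samtools output: <input>.bai
# for (a in index_args) system2("samtools", a)  # requires samtools on the PATH
```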

run_samview uses samtools view to output, in SAM or BAM format, all alignments that pass the specified flag and region filters.
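The following sketch shows how a subset of the run_samview options could be translated into `samtools view` arguments. The helper `build_view_args` is hypothetical. The rule that a single `.bed` path goes to `-L` while anything else is treated as "RNAME:START[-END]" strings is an assumption about the wrapper. The flags used (`-b`, `-h`, `-c`, `-f`, `-F`, `-q`, `-s`, `-o`, `-L`, `-@`) are standard samtools view options. Region queries require an indexed BAM.

```r
# Hypothetical helper (not part of chompR): assembles `samtools view`
# arguments from a subset of run_samview's options.
build_view_args <- function(file, regions = NULL, include.flag = NULL,
                            exclude.flag = NULL, minQual = NULL,
                            outformat = "SAM", outname = NULL,
                            include.header = FALSE, count = FALSE,
                            threads = 1, subsample = NULL) {
  args <- c("view", "-@", as.character(threads))
  if (count) {
    args <- c(args, "-c")                  # print a count instead of reads
  } else if (identical(outformat, "BAM")) {
    args <- c(args, "-b")                  # BAM output (header always kept)
  } else if (include.header) {
    args <- c(args, "-h")                  # SAM output with header
  }
  if (!is.null(include.flag)) args <- c(args, "-f", include.flag)
  if (!is.null(exclude.flag)) args <- c(args, "-F", exclude.flag)
  if (!is.null(minQual))      args <- c(args, "-q", minQual)
  if (!is.null(subsample))    args <- c(args, "-s", subsample)
  if (!is.null(outname))      args <- c(args, "-o", outname)
  # Assumed rule: a single .bed path goes to -L; anything else is treated
  # as region strings appended after the input file.
  is_bed <- length(regions) == 1 &&
    grepl("\\.bed$", regions, ignore.case = TRUE)
  if (is_bed) args <- c(args, "-L", regions)
  args <- c(args, file)
  if (!is.null(regions) && !is_bed) args <- c(args, regions)
  as.character(args)
}

# 1796 = unmapped (4) + secondary (256) + QC-fail (512) + duplicate (1024)
v <- build_view_args("HB1_sample.bam", regions = "chr1:1000-2000",
                     exclude.flag = 1796, minQual = 30,
                     outformat = "BAM", outname = "HB1_filtered.bam")
print(v)
```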

run_samflagstat uses samtools flagstat to calculate and print statistics from a BAM file. It provides counts for 13 categories of reads: total, secondary, supplementary, duplicates, mapped, paired in sequencing, read1, read2, properly paired, with itself and its mate mapped, singletons, with mate mapped to a different chromosome, and with mate mapped to a different chromosome (MAPQ >= 5).
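samtools flagstat prints each category as "QC-passed + QC-failed category". The sketch below shows one way to turn that text into a data frame. `parse_flagstat` is a hypothetical function, not chompR's code. The input lines are made-up numbers in the flagstat layout and do not come from real data.

```r
# Hypothetical parser (not part of chompR): converts `samtools flagstat`
# text output into a data frame.
parse_flagstat <- function(lines) {
  m <- regmatches(lines, regexec("^([0-9]+) \\+ ([0-9]+) (.*)$", lines))
  m <- m[lengths(m) == 4]                    # keep only lines that matched
  category <- vapply(m, `[`, character(1), 4)
  # Drop percentage annotations such as "(95.00% : N/A)"
  category <- sub(" \\([^()]*:[^()]*\\)$", "", category)
  data.frame(
    category  = category,
    qc_passed = as.numeric(vapply(m, `[`, character(1), 2)),
    qc_failed = as.numeric(vapply(m, `[`, character(1), 3)),
    stringsAsFactors = FALSE
  )
}

# Made-up lines in flagstat format; with samtools installed they could be
# captured via system2("samtools", c("flagstat", "HB1_sample.bam"), stdout = TRUE)
example_lines <- c(
  "1000 + 0 in total (QC-passed reads + QC-failed reads)",
  "0 + 0 secondary",
  "950 + 0 mapped (95.00% : N/A)",
  "12 + 0 with mate mapped to a different chr (mapQ>=5)"
)
stats <- parse_flagstat(example_lines)
print(stats)
```

Only parentheticals that contain a colon (the percentage annotations) are removed. This keeps "(mapQ>=5)" and distinguishes the last category from the plain "with mate mapped to a different chr" line.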

Usage

run_samsort(file = NULL, samtools = "samtools", outformat = "BAM",
  threads = 1, memory = "768M", sortbyname = FALSE, suffix = "",
  keep = TRUE)

run_samindex(samtools = "samtools", bamfile = NULL, threads = 1)

run_samview(samtools = "samtools", file = NULL, regions = NULL,
  chrom.sizes = NULL, include.flag = NULL, exclude.flag = NULL,
  minQual = NULL, outformat = NULL, outname = NULL,
  include.header = FALSE, count = FALSE, threads = 1,
  subsample = NULL, keep.paired = TRUE, keep.proper.pair = TRUE,
  remove.unmapped = FALSE, remove.not.primary = FALSE,
  remove.duplicates = FALSE, remove.supplementary.alignment = FALSE,
  remove.mitochondrial = NULL)

run_samflagstat(bamfile = NULL, samtools = "samtools", threads = 1)

Arguments

file

A character vector specifying the path(s) to the alignment files.

samtools

Path to the samtools executable (needed only if samtools is not on the system PATH).

outformat

String specifying the output format: 'SAM', 'BAM' or 'CRAM'. For run_samview, only 'SAM' or 'BAM' is supported.

threads

A positive integer specifying the number of sorting and compression threads.

memory

String specifying maximum memory per thread; suffix K/M/G recognized.

sortbyname

Boolean. If TRUE, reads are sorted by name. If FALSE, reads are sorted by chromosome/position.

suffix

Suffix to add to the basename of the file, e.g. _psort (for position-sorted).

keep

Boolean. If TRUE, the input file is kept. If FALSE, the input file is deleted after a successful sort.

bamfile

A character vector specifying the path(s) to sorted BAM files.

regions

Either the path to a single BED file containing regions of interest, or regions given in the format "RNAME:START[-END]".

chrom.sizes

Path to a chromosome size file. Required if converting to SAM format and including the header. If not supplied, one is generated automatically from the existing header. This is a tab-delimited text file where each line contains the reference name in the first column and its length in the second column.

include.flag

Integer value. Only output alignments that have all bits of this value set in the FLAG field. See https://broadinstitute.github.io.

exclude.flag

Integer value. Exclude alignments that have any bit of this value set in the FLAG field. See https://broadinstitute.github.io.

minQual

Skip alignments with MAPQ smaller than this value.

outname

Name for output file.

include.header

Boolean. If TRUE, the header is included. This option only matters when outformat = "SAM", because BAM output always includes the header.

count

Boolean. If TRUE, only count alignments instead of printing them.

subsample

Float value - if set then subsampling is performed. The integer part is used to seed the random number generator and the part after the decimal sets the fraction of reads to subsample.

keep.paired

Boolean. If TRUE, keep paired reads only.

keep.proper.pair

Boolean. If TRUE, keep concordantly paired reads only.

remove.unmapped

Boolean. If TRUE, remove unmapped reads.

remove.not.primary

Boolean. If TRUE, remove reads mapped as secondary alignments.

remove.duplicates

Boolean. If TRUE, remove reads marked as optical or PCR duplicates.

remove.supplementary.alignment

Boolean. If TRUE, remove reads marked as supplementary alignments.

remove.mitochondrial

Character string. If set, reads mapping to the mitochondrial genome are removed. The string must match the reference name of the mitochondrial genome in the alignment file. Examples include "ChrM", "M" and "MT".
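The flag-based arguments above rely on SAM FLAG bits. The sketch below illustrates how they combine, assuming include.flag is passed to samtools view -f (an alignment is kept only if all of these bits are set) and exclude.flag to -F (an alignment is dropped if any of these bits is set). The folding of the logical filters into a single -f and a single -F value is one plausible mapping, not necessarily what chompR does. The bit values are those defined by the SAM specification.

```r
# Hypothetical illustration (not chompR code): SAM FLAG bits under
# samtools view -f (all bits required) and -F (any bit excludes).
flag_bits <- c(paired = 0x1L, proper_pair = 0x2L, unmapped = 0x4L,
               secondary = 0x100L, duplicate = 0x400L,
               supplementary = 0x800L)

passes_include <- function(flag, include) bitwAnd(flag, include) == include
passes_exclude <- function(flag, exclude) bitwAnd(flag, exclude) == 0L

# Assumed mapping: keep.paired + keep.proper.pair -> -f
include <- bitwOr(flag_bits[["paired"]], flag_bits[["proper_pair"]])
# Assumed mapping: remove.unmapped, remove.not.primary, remove.duplicates,
# remove.supplementary.alignment -> -F
exclude <- Reduce(bitwOr, flag_bits[c("unmapped", "secondary",
                                      "duplicate", "supplementary")])

# 99 = properly paired read 1; 1123 = same read marked duplicate;
# 1 = paired but not properly paired; 4 = unmapped
flags <- c(99L, 1123L, 1L, 4L)
keep <- passes_include(flags, include) & passes_exclude(flags, exclude)
print(keep)
```

Flag 1 fails the include test even though it shares a bit with the mask, because -f requires every bit in the mask to be present.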

Value

run_samsort returns a sorted alignment file.

run_samindex returns a BAI-format index file for sorted BAM files.

run_samview returns aligned reads according to the regions and filters specified in BAM or SAM format.

run_samflagstat returns a data frame of alignment statistics.

Examples

## Not run: 
run_samsort(file = "HB1_sample.sam", samtools = "samtools",
            outformat = "BAM", threads = (parallel::detectCores() - 1),
            memory = "1G", sortbyname = FALSE, suffix = "", keep = TRUE)


## End(Not run)

## Not run: 
run_samindex(samtools = "samtools", bamfile = "HB1_sample.bam",
             threads = (parallel::detectCores() - 1))

## End(Not run)

## Not run: 
run_samview(samtools = "samtools", file = "HB1_sample.bam", regions = NULL,
            chrom.sizes = NULL, include.flag = NULL, exclude.flag = NULL,
            minQual = NULL, outformat = "BAM", outname = NULL,
            include.header = FALSE, count = FALSE,
            threads = (parallel::detectCores() - 1), subsample = NULL,
            keep.paired = TRUE, keep.proper.pair = TRUE, remove.unmapped = TRUE,
            remove.not.primary = TRUE, remove.duplicates = TRUE,
            remove.supplementary.alignment = TRUE, remove.mitochondrial = "ChrM")

## End(Not run)

## Not run: 
run_samflagstat(samtools = "samtools", threads = parallel::detectCores(),
                bamfile = "HB1_sample.bam")

## End(Not run)

anilchalisey/chompR documentation built on May 9, 2019, 3:59 a.m.