run_sambambadup: Mark duplicates in BAM file

View source: R/run_sambamba.R

Description

Wrapper function that marks duplicate reads in a BAM file, and optionally removes them, using Sambamba.

Usage

run_sambambadup(sambamba = "sambamba", bamfile = NULL, outfile = NULL,
  remove = FALSE, threads = 1, hash_table = 262144,
  overflow_size = 2e+05, io_buffer = 128)

Arguments

sambamba

Path to Sambamba.

bamfile

Character vector specifying the path(s) to the input BAM file(s).

outfile

Name of the output file. If left as NULL, the suffix _markdup (duplicates marked only) or _dedup (duplicates removed) is appended to the input file name.
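A minimal sketch of how such a default name could be derived. The helper `default_outfile` is hypothetical and not part of parseR, and placing the suffix before the `.bam` extension is an assumption based on the file names used in the Examples section, not confirmed behaviour of the package.

```r
# Hypothetical helper: derive a default output name from the input name.
# Assumption: the suffix goes before the ".bam" extension.
default_outfile <- function(bamfile, remove = FALSE) {
  suffix <- if (remove) "_dedup" else "_markdup"
  # Strip a trailing ".bam" (case-insensitive), then append suffix + extension
  stem <- sub("\\.bam$", "", bamfile, ignore.case = TRUE)
  paste0(stem, suffix, ".bam")
}

default_outfile("HB1_sample.bam")
default_outfile("HB1_sample.bam", remove = TRUE)
```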

remove

Logical. If TRUE, duplicate reads are removed; if FALSE (the default), they are only marked.

threads

Number of threads to use.

hash_table

Size of the hash table used for finding read pairs (default is 262144 reads); the value is rounded down to the nearest power of two. For best performance, it should be greater than (average coverage) * (insert size).
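As a rough illustration of the sizing rule above, the sketch below shows what a requested size becomes after rounding down, and picks the smallest power of two that still exceeds (average coverage) * (insert size). The helpers `round_down_pow2` and `suggest_hash_table` are hypothetical and not part of parseR or Sambamba, and the coverage and insert-size values are made-up inputs.

```r
# Hypothetical helper: largest power of two that is <= x,
# i.e. the effective size after rounding down.
round_down_pow2 <- function(x) {
  p <- 1
  while (p * 2 <= x) p <- p * 2
  p
}

# Hypothetical helper: smallest power of two strictly greater than
# coverage * insert_size, so rounding down leaves the value unchanged.
suggest_hash_table <- function(coverage, insert_size) {
  target <- coverage * insert_size
  p <- 1
  while (p <= target) p <- p * 2
  p
}

round_down_pow2(1000000)     # effective size for a request of 1,000,000
suggest_hash_table(30, 400)  # illustrative 30x coverage, 400 bp inserts
```

For modest coverage the suggested value can fall below the default of 262144, in which case keeping the default is probably the simpler choice.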

overflow_size

Size of the overflow list, where reads evicted from the hash table get a second chance to meet their pairs (default is 200000 reads); increasing the size reduces the number of temporary files created.

io_buffer

Size (in MB) of each of the two buffers used for reading and writing BAM data during the second pass (default is 128).
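To make the relationship between these arguments and the Sambamba command line more concrete, here is a sketch of how a `sambamba markdup` call could be assembled. The helper `build_markdup_cmd` is hypothetical; how run_sambambadup builds its command internally is an assumption and is not shown in this documentation. The flags used (`-r`, `-t`, `--hash-table-size`, `--overflow-list-size`, `--io-buffer-size`) are those documented for `sambamba markdup`.

```r
# Hypothetical helper: assemble a "sambamba markdup" command string.
# This only builds the string; it does not run Sambamba.
build_markdup_cmd <- function(sambamba = "sambamba", bamfile, outfile,
                              remove = FALSE, threads = 1,
                              hash_table = 262144, overflow_size = 2e+05,
                              io_buffer = 128) {
  # Avoid scientific notation such as "2e+05" on the command line
  num <- function(x) format(x, scientific = FALSE, trim = TRUE)
  args <- c(
    "markdup",
    if (remove) "-r",  # dropped from the vector when remove is FALSE
    "-t", num(threads),
    paste0("--hash-table-size=", num(hash_table)),
    paste0("--overflow-list-size=", num(overflow_size)),
    paste0("--io-buffer-size=", num(io_buffer)),
    shQuote(bamfile),
    shQuote(outfile)
  )
  paste(c(sambamba, args), collapse = " ")
}

cmd <- build_markdup_cmd(bamfile = "HB1_sample.bam",
                         outfile = "HB1_sample_markdup.bam")
print(cmd)
```

A string built this way could be passed to `system()`; quoting the file names with `shQuote()` guards against paths that contain spaces.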

Value

A BAM file in which duplicate reads have been marked or removed.

Examples

## Not run: 
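run_sambambadup(sambamba = "sambamba", bamfile = "HB1_sample.bam",
                outfile = "HB1_sample_markdup.bam", remove = FALSE,
                # leave one core free, but never request fewer than one thread
                threads = max(1, parallel::detectCores() - 1, na.rm = TRUE),
                # hash_table is rounded down to a power of two (here 524288)
                hash_table = 1000000, overflow_size = 1000000)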

## End(Not run)

anilchalisey/parseR documentation built on May 7, 2019, 7:45 a.m.