run_sambambadup: Mark duplicates in BAM file
In anilchalisey/parseR: Pipeline for Analysis of RNA-seq in R

Wrapper around Sambamba that marks duplicate reads in a BAM file and, optionally, removes them.

run_sambambadup(sambamba = "sambamba", bamfile = NULL, outfile = NULL,
  remove = FALSE, threads = 1, hash_table = 262144,
  overflow_size = 2e+05, io_buffer = 128)

`sambamba`	Path to Sambamba.
`bamfile`	Character vector specifying the paths to the input BAM files.
`outfile`	Name of the output file. If left as NULL, the suffix _markdup (duplicates marked only) or _dedup (duplicates removed) is appended to the input name.
`remove`	Boolean. If TRUE, duplicate reads are removed.
`threads`	Number of threads to use.
`hash_table`	Size of the hash table used for finding read pairs (default is 262144 reads); the value is rounded down to the nearest power of two. For best performance it should be greater than (average coverage) * (insert size).
`overflow_size`	Size of the overflow list, where reads thrown out of the hash table get a second chance to meet their pairs (default is 200000 reads); increasing the size reduces the number of temporary files created.
`io_buffer`	Controls sizes of the two buffers (in MB) used for reading and writing BAM during the second pass (default is 128).
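
As a rough illustration of how these arguments could map onto a `sambamba markdup` call, the sketch below assembles the command-line arguments in R. The helper `build_markdup_args` is hypothetical and is not part of parseR; the assumption is that the wrapper passes its arguments through to the `-t`, `-r`, `--hash-table-size`, `--overflow-list-size` and `--io-buffer-size` options of `sambamba markdup`. The actual implementation may differ.

```r
# Hypothetical helper (not the parseR implementation): assembles the
# arguments for a `sambamba markdup` call from the wrapper's parameters.
build_markdup_args <- function(bamfile, outfile, remove = FALSE, threads = 1,
                               hash_table = 262144, overflow_size = 2e+05,
                               io_buffer = 128) {
  args <- c(
    "markdup",
    "-t", as.character(threads),
    # format(..., scientific = FALSE) avoids strings such as "2e+05"
    paste0("--hash-table-size=", format(hash_table, scientific = FALSE)),
    paste0("--overflow-list-size=", format(overflow_size, scientific = FALSE)),
    paste0("--io-buffer-size=", format(io_buffer, scientific = FALSE))
  )
  # -r asks sambamba to remove duplicates instead of only marking them
  if (remove) args <- c(args, "-r")
  c(args, bamfile, outfile)
}

args <- build_markdup_args("HB1_sample.bam", "HB1_sample_markdup.bam",
                           threads = 4)
cat("sambamba", args, "\n")

# To actually run the command (requires sambamba on the PATH):
# system2("sambamba", args)
```

Building the arguments as a character vector and handing them to `system2` avoids pasting together a single shell string, which makes file names containing spaces easier to handle.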

A BAM file in which duplicate reads have been marked or removed.
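
When `outfile` is NULL, the output name is derived from the input name. The sketch below shows one way that default could be computed; `default_outfile` is a hypothetical helper, and placing the suffix before the `.bam` extension is an assumption based on the file name used in the example on this page.

```r
# Hypothetical helper (not part of parseR): derives a default output name.
# Assumption: the suffix is inserted before the ".bam" extension.
default_outfile <- function(bamfile, remove = FALSE) {
  suffix <- if (remove) "_dedup" else "_markdup"
  paste0(tools::file_path_sans_ext(bamfile), suffix, ".bam")
}

marked  <- default_outfile("HB1_sample.bam")                 # marking only
deduped <- default_outfile("HB1_sample.bam", remove = TRUE)  # duplicates removed
print(marked)
print(deduped)
```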

## Not run: 
run_sambambadup(sambamba = "sambamba", bamfile = "HB1_sample.bam",
                outfile = "HB1_sample_markdup.bam", remove = FALSE,
                threads = (parallel::detectCores() - 1),
                hash_table = 1000000, overflow_size = 1000000)

## End(Not run)
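
The example above requests `hash_table = 1000000`, which is not a power of two, so according to the argument description it would be rounded down. The sketch below illustrates that rounding and one way to pick a size from the "(average coverage) * (insert size)" guideline. Both helpers are hypothetical, and the coverage and insert size values are made up for illustration.

```r
# Hypothetical helpers (not part of parseR or Sambamba).

# Largest power of two that does not exceed n
round_down_pow2 <- function(n) 2^floor(log2(n))

# Smallest power of two strictly greater than coverage * insert_size
suggest_hash_table <- function(coverage, insert_size) {
  2^(floor(log2(coverage * insert_size)) + 1)
}

effective <- round_down_pow2(1000000)
print(effective)   # 524288, i.e. 2^19

# Illustrative values only: 30x average coverage, 300 bp insert size
suggested <- suggest_hash_table(coverage = 30, insert_size = 300)
print(suggested)   # 16384, the first power of two above 9000
```

Choosing a power of two directly means the requested value and the value actually used are the same.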