These functions tally a set of BAM files in blocks spanning a specified region and write the results to an HDF5 tally file; they use BatchJobs for parallel computation on HPC clusters.
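As a rough illustration of the block layout, the sketch below splits the region from start to stop (both inclusive) into consecutive blocks of at most blocksize bases, assuming one cluster job per block. The helper makeBlocks is hypothetical and not part of h5vc; the exact way h5vc splits a region (for example, whether blocks are aligned to fixed boundaries) is an assumption here.

```r
# Hypothetical helper, NOT part of h5vc: split the inclusive region
# [start, stop] into consecutive blocks of at most `blocksize` bases.
makeBlocks <- function(start, stop, blocksize = 100000) {
  stopifnot(start <= stop, blocksize >= 1)
  blockStarts <- seq(start, stop, by = blocksize)
  # The last block is truncated so that it ends exactly at `stop`
  blockStops <- pmin(blockStarts + blocksize - 1, stop)
  data.frame(start = blockStarts, stop = blockStops)
}

# NRAS region from the Examples, split into 5 kb blocks
blocks <- makeBlocks(115247090, 115259515, blocksize = 5000)
nrow(blocks)  # 3 blocks, i.e. 3 jobs under the one-job-per-block assumption
```

The NRAS region used in the Examples spans 12,426 bases, so a blocksize of 5000 gives three blocks, while the default blocksize of 100000 keeps the whole region in a single block.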
batchTallyParam(
  bamFiles,
  destination,
  group,
  chrom, start, stop,
  blocksize = 100000,
  registryDir = tempdir(),
  resources = list(queue = "research-rh6", memory = "4000", ncpus = "4", walltime = "90:00"),
  q = 25, ncycles = 0, max.depth = 1000000,
  reference = NULL,
  sleep = 5
)
batchTallies(confList = batchTallyParam())
rerunBatchTallies(confList, tryCollect = TRUE)
collectTallies(blocks, confList, registries)
bamFiles: A character vector of filenames of the BAM files that should be tallied. Note that for writing to an HDF5 file the order of this vector must match the order of the Column field in the sampledata object that corresponds to the dataset; see
reference: A DNAString object containing the reference sequence corresponding to the region that is to be tallied – if this is
destination: Filename of the HDF5 tally file to write to; this file must already contain all the required groups and datasets – see
group: Location within the tally file where the data will be written – e.g.
chrom: Chromosome in which to tally
start: First position of the tally
stop: Last position of the tally
q: Quality cut-off for considering a base call
ncycles: Number of sequencing cycles from the front and back of the read that should be considered unreliable; used for stratifying the nucleotide counts
max.depth: Only tally a position if fewer than this many reads overlap it; this can prevent long runtimes in unreliable regions
blocksize: Size of the blocks (in bases) in which the tallying is performed; this determines the number of jobs sent to the cluster
registryDir: Directory in which the registries created by
resources: A named list specifying the resource requirements of the cluster jobs; it must contain entries for the fields specified in the cluster configuration file – see the documentation of
confList: A configuration list as returned by a call to batchTallyParam
sleep: Number of seconds to sleep before checking whether blocks are finished; increase this if you have large blocks and find the output of
tryCollect: Boolean flag specifying whether the
blocks:
registries: A list mapping registry IDs to the work paths of the corresponding registries
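To make the semantics of q, ncycles and max.depth more concrete, here is a toy sketch in base R that applies each rule to made-up values. The three helper functions are hypothetical and are not h5vc internals; in particular, keeping a base whose quality is exactly equal to q is an assumption, and the actual stratification of counts by cycle is not shown.

```r
# Toy illustration only, NOT h5vc internals.

# Assumption: a base call is considered if its quality is >= q
passesQuality <- function(qual, q = 25) qual >= q

# Cycles (1-based positions within a read) that lie within `ncycles`
# of either end of the read are flagged as unreliable; per the
# documentation this flag stratifies counts rather than discarding them
unreliableCycle <- function(cycle, readLength, ncycles = 0) {
  cycle <= ncycles | cycle > readLength - ncycles
}

# A position is tallied only if fewer than max.depth reads overlap it
tallyPosition <- function(depth, max.depth = 1000000) depth < max.depth

passesQuality(c(10, 25, 40), q = 25)               # FALSE TRUE TRUE
unreliableCycle(1:10, readLength = 10, ncycles = 2) # first 2 and last 2 TRUE
tallyPosition(c(50, 2e6))                          # TRUE FALSE
```

With the default ncycles = 0 no cycle is flagged, so the counts are not stratified by read position.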
This is a wrapper function for applying tallyBAM to a set of BAM files specified in the bamFiles argument. The order of samples along the sample dimension is the same as the order of the file names (i.e. the order of the bamFiles argument). The function uses BatchJobs to dispatch tallying in blocks along the genome to an HPC cluster, collects the results and writes them into the HDF5 tally file specified in the destination parameter.
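Because the sample dimension follows the order of bamFiles, one way to keep the two in sync is to derive bamFiles from the sample table itself. The sketch below assumes a hypothetical data.frame with a Column field (the position along the sample dimension, as in the sampledata object mentioned under bamFiles) plus a File field added purely for illustration.

```r
# Hypothetical sample table: Column = position along the sample dimension,
# File = BAM file for that sample (File is an illustrative assumption)
sampleData <- data.frame(
  Sample = c("Control", "AML"),
  Column = c(2, 1),
  File = c("NRAS.Control.bam", "NRAS.AML.bam"),
  stringsAsFactors = FALSE
)

# Order the files by Column so that bamFiles[i] belongs to sample column i
bamFiles <- sampleData$File[order(sampleData$Column)]
bamFiles  # "NRAS.AML.bam" "NRAS.Control.bam"
```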
rerunBatchTallies can be used to re-submit failed blocks.
collectTallies can be used to manually collect tally data from the registries created by batchTallies.
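None; progress messages are printed along the way. (batchTallyParam returns the configuration list that is passed to the other functions as confList.)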
Paul Pyl
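## Not run:
library(h5vc)
files <- c("NRAS.AML.bam", "NRAS.Control.bam")
bamFiles <- file.path(system.file("extdata", package = "h5vcData"), files)
chrom <- "1"
startpos <- 115247090
endpos <- 115259515
# Hypothetical tally file and group: the file must already contain the
# groups and datasets that the tallies are written into
tallyFile <- "NRAS.tally.hdf5"
group <- "/NRAS/1"
# Pass arguments by name; positionally, chrom would be matched to 'destination'
confList <- batchTallyParam(bamFiles = bamFiles, destination = tallyFile,
                            group = group, chrom = chrom,
                            start = startpos, stop = endpos)
batchTallies(confList = confList)
## End(Not run)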