View source: R/tally.in.ranges.r
Functions for tallying BAM files in genomic intervals provided as GRanges
objects. Special versions of the function exist for writing directly to a tally file and for computation on a cluster.
tallyRanges(bamfiles, ranges, reference, q = 25, ncycles = 10, max.depth = 1e+06)
tallyRangesToFile(tallyFile, study, bamfiles, ranges, reference, samples = NULL, q = 25, ncycles = 0, max.depth=1e6)
tallyRangesBatch(tallyFile, study, bamfiles, ranges, reference, q = 25, ncycles = 10, max.depth=1e6, regID = "Tally", res = list("ncpus" = 2, "memory" = 24000, "queue"="research-rh6"), written = c(), wrfile = "written.jobs.RDa", waitTime = Inf)
bamfiles |
Character vector giving the locations of the bam files to be tallied |
ranges |
A GRanges object describing the ranges that tallies shall be generated in, e.g. the result of a call to |
reference |
|
samples |
The indices (within the HDF5 datasets) corresponding to the samples that the data represents. You can use this option to write sub-sets of samples from a cohort. |
q |
Read alignment quality cut-off. |
ncycles |
Number of cycles from the front and back of the reads that should be considered unreliable for mismatch detection |
max.depth |
Maximum depth of coverage to consider |
tallyFile |
Filename of the HDF5 tally file that the data shall be written to |
study |
The location within the HDF5 file that corresponds to the HDF5-group representing the study we are working on. |
regID |
Identifier for a |
res |
Resource list specifying the compute resources to be requested for each of the cluster jobs. |
written |
Numeric vector giving the job IDs of jobs whose results have already been written to the tally file. This can be used to resume writing after a crash. |
wrfile |
Name of a file in which the IDs of already written jobs are stored. It can be used to resume writing after a crash. |
waitTime |
How long the function should wait for cluster jobs to finish before giving up. The default is to wait forever. |
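The resume options (written and wrfile) can be pictured with a small bookkeeping sketch in base R. This is only an illustration of the idea, not the code h5vc runs: the job IDs are made up, and the assumption that the file holds a variable named written is hypothetical.

```r
# Hypothetical bookkeeping sketch; not h5vc's internal implementation.
wrfile   <- file.path(tempdir(), "written.jobs.RDa")  # plays the role of the `wrfile` argument
all_jobs <- 1:5                                       # made-up job IDs

# First session: suppose jobs 1 and 2 were written before a crash.
written <- c(1, 2)
save(written, file = wrfile)

# Later session: restore the IDs and work out which jobs are still pending.
rm(written)
load(wrfile)                          # restores the variable `written`
pending <- setdiff(all_jobs, written)
print(pending)
```

A vector restored this way is the kind of value that could be supplied through the written argument of tallyRangesBatch when restarting.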
tallyRanges returns the tallies corresponding to the specified ranges; tallyRangesToFile performs the same task but writes the results to the tally file directly. tallyRangesBatch uses the BatchJobs package to set up cluster jobs for tallying, then collects the results of those jobs and writes them to the tally file. This requires a properly configured cluster, including a .BatchJobs.R configuration file as well as a template file. See the documentation of BatchJobs for details.
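As a rough orientation, a .BatchJobs.R configuration file is an R script that sets a handful of variables read by BatchJobs. The fragment below is a sketch under assumptions: it presumes an LSF scheduler, and the template file name lsf.tmpl is a placeholder. Sites running a different scheduler would use the corresponding makeClusterFunctions* constructor from BatchJobs instead.

```r
# Sketch of a .BatchJobs.R file; every value here is a placeholder.
# Assumption: the cluster runs LSF and "lsf.tmpl" is the site's template file.
cluster.functions = makeClusterFunctionsLSF("lsf.tmpl")
mail.start = "none"     # no e-mail notifications when a job starts
mail.done  = "none"     # ... when it finishes
mail.error = "none"     # ... or when it fails
db.driver  = "SQLite"
db.options = list()
debug      = FALSE
```

The resource names in the res argument (such as ncpus, memory and queue) are only meaningful if the template file refers to them, so the two need to be kept in step.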
For tallyRanges the return value is a list of lists, where the top level corresponds to the ranges provided as input to the function and each element is a list of datasets in a compatible format that can be written directly to an HDF5 file using the writeToTallyFile function.
The other two functions perform the writing directly and return
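To make the list-of-lists shape of the tallyRanges return value concrete, here is a mock object built in base R. It is a stand-in only: the dataset names (Counts, Coverages) and the array dimensions are assumptions chosen for illustration, and real output should be inspected with str() as in the example below.

```r
# Mock stand-in for the return value of tallyRanges();
# dataset names and dimensions are illustrative assumptions.
mock_range <- function() {
  list(
    Counts    = array(0L, dim = c(4, 2, 2, 10)),
    Coverages = array(0L, dim = c(2, 2, 10))
  )
}

# One top-level element per input range (three ranges here).
mock_tallies <- list(mock_range(), mock_range(), mock_range())

length(mock_tallies)                               # number of ranges
names(mock_tallies[[1]])                           # datasets held for the first range
sapply(mock_tallies, function(r) dim(r$Counts))    # dataset dimensions, one column per range
```

Each inner list groups the datasets for one range, which is the unit the text describes as being written to the HDF5 file.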
Paul Theodor Pyl
suppressPackageStartupMessages(library("h5vc"))
suppressPackageStartupMessages(library("rhdf5"))
# Locate the example BAM files shipped with the h5vcData package
files <- list.files( system.file("extdata", package = "h5vcData"), "Pt.*bam$" )
bamFiles <- file.path( system.file("extdata", package = "h5vcData"), files)
suppressPackageStartupMessages(require(BSgenome.Hsapiens.NCBI.GRCh38))
suppressPackageStartupMessages(require(GenomicRanges))
# Build the ranges to tally from the DNMT3A annotation table and merge overlapping intervals
dnmt3a <- read.table(system.file("extdata", "dnmt3a.txt", package = "h5vcData"), header=TRUE, stringsAsFactors = FALSE)
dnmt3a <- with( dnmt3a, GRanges(seqname, ranges = IRanges(start = start, end = end)))
dnmt3a <- reduce(dnmt3a)
# Register a parallel back-end, then tally the first three ranges
require(BiocParallel)
register(MulticoreParam())
theData <- tallyRanges( bamFiles, ranges = dnmt3a[1:3], reference = Hsapiens )
str(theData)