Description Usage Arguments Details Value Author(s) Examples
This is the central function of the h5vc package, allows an apply operation along common dimensions of datasets in a tally file.
1 2 3 4 5 6 |
blocksize |
The size of the blocks in which to process the data (integer) |
... |
Further parameters to be handed over to |
range |
The range along the specified dimensions which should be
processed, this allows for limiting the apply to a specific region
or set of samples, etc. - optional (defaults to the whole chromosome); This can be a |
group |
The group (location) within the HDF5 file, note that when range is |
Additional function parameters are:
The name of a tally file to process
The name of a group in that tally file
The function to apply to each block, defaults to
function(x) x
, which returns the data as is (a list of
arrays)
The names of the datasets to extract,
e.g. c("Counts","Coverages")
- optional (defaults to all datasets)
The dimension to apply along for each dataset in the same
order as names
, these should correspond to compatible
dimensions between the datsets. - optional (defaults to the genomic position dimension)
Character vector of sample names - must match contents of sampleData stored in the tallyFile
A list mapping dataset names to their respective sample dimensions - default provides values for "Counts", "Coverages", "Deletions" and "Reference"
Boolean flag that controls the amount of messages being
printed by h5dapply
BPPARAM object to be passed to the bplapply
call used to apply FUN
to the blocks - see BiocParallel
documentation for details; if this is NULL
a normal lapply
will be used instead of bplapply
.
This function applys parameter FUN
to blocks along a specified
axis within the tally file, group and specified datasets. It creates a
list of arrays (one for each dataset) and processes that list with the
function FUN
.
This is by far the most essential and powerful function within this package since it allows the user to execute their own analysis functions on the tallies stored within the HDF5 tally file.
The supplied function FUN
must have a parameter data
or ...
(the former is the expected behaviour), which will be supplied to FUN
from h5dapply
for each block. This structure is a list
with one slot for each dataset specified in the names
argument to h5dapply
containing the array corresponding to the current block in the given dataset. Furthemore the slot h5dapplyInfo
is reserved and contains another list
with the following content:
Blockstart
is an integer specifying the starting position of the current block (in the dimension specified by the dims
argument to h5dapply
)
Blockend
is an integer specifying the end position of the current block (in the dimension specified by the dims
argument to h5dapply
)
Datasets
Contains a data.frame
as it is returned by h5ls
listing all datasets present in the other slots of data
with their group, name, dimensions, number of dimensions (DimCount
) and the dimension that is used for splitting into blocks (PosDim
)
Group
contains the name of the group as specified by the group
argument to h5dapply
A list with one entry per block, which is the result of applying
FUN
to the datasets specified in the parameter names
within the block.
Paul Pyl
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 | # loading library and example data
library(h5vc)
tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
# check the available samples and sampleData
print(sampleData)
data <- h5dapply( #extracting coverage using h5dapply
filename = tallyFile,
group = "/ExampleStudy/16",
blocksize = 1000,
FUN = function(x) rowSums(x$Coverages),
names = c( "Coverages" ),
range = c(29000000,29010000),
verbose = TRUE
)
coverages <- do.call( rbind, data )
colnames(coverages) <- sampleData$Sample[order(sampleData$Column)]
coverages
#Subsetting by Sample
sampleData <- sampleData[sampleData$Patient == "Patient5",]
data <- h5dapply( #extracting coverage using h5dapply
filename = tallyFile,
group = "/ExampleStudy/16",
blocksize = 1000,
FUN = function(x) rowSums(x$Coverages),
names = c( "Coverages" ),
range = c(29000000,29010000),
samples = sampleData$Sample,
verbose = TRUE
)
coverages <- do.call( rbind, data )
colnames(coverages) <- sampleData$Sample[order(sampleData$Column)]
coverages
#Using GRanges and IRanges
library(GenomicRanges)
library(IRanges)
granges <- GRanges(
c(rep("16", 10), rep("22", 10)),
ranges = IRanges(
start = c(seq(29000000,29009000, 1000), seq(39000000,39009000, 1000)),
width = 1000
))
data <- h5dapply( #extracting coverage using h5dapply
filename = tallyFile,
group = "/ExampleStudy",
blocksize = 1000,
FUN = function(x) rowSums(x$Coverages),
names = c( "Coverages" ),
range = granges,
verbose = TRUE
)
lapply( data, function(x) do.call(rbind, x) )
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.