prepareTallyFile: prepareTallyFile
In h5vc: Managing alignment tallies using a hdf5 backend


Functions for preparing an HDF5 file for storing tally data and / or modifying an existing file


prepareTallyFile( filename, study, chrom, chromlength, nsamples, maxsamples = nsamples, chunkSize = 50000, sampleChunkSize = nsamples, compressionLevel = 9, referenceFillValue = 5 )
resizeCohort( filename, study, chrom, newNumberOfSamples, dimmap = .sampleDimMap, force = FALSE )
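A minimal sketch of a call with the signature above. The temporary file, the study name `/ExampleStudy`, the chromosome name `22`, and the sizes are illustrative assumptions, not values taken from this page. `h5ls` comes from the `rhdf5` package and is only used here to look at the resulting file.

```r
library(h5vc)   # provides prepareTallyFile
library(rhdf5)  # provides h5ls for listing the file contents

# Illustrative values only: a temporary file and a made-up study / chromosome
tallyFile <- tempfile(fileext = ".hdf5")

ok <- prepareTallyFile(
  filename         = tallyFile,
  study            = "/ExampleStudy",  # assumed study identifier
  chrom            = "22",             # assumed chromosome name
  chromlength      = 1e6,              # size of the genomic position dimension
  nsamples         = 4,                # samples stored now
  maxsamples       = 8,                # leaves room to grow the cohort later
  chunkSize        = 50000,
  compressionLevel = 9
)

# List the groups and datasets that were created
h5ls(tallyFile)
```

Setting `maxsamples` above `nsamples` at creation time is what later allows `resizeCohort` to enlarge the cohort.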

`filename`	Filename of the HDF5 file that should store the tallies
`study`	Study identifier which will be used in structuring the file
`chrom`	Chromosome for which the structure should be generated
`chromlength`	The length of the chromosome, this will be the size of the genomic position dimension
`nsamples`	Number of samples that will be stored in the file
`maxsamples`	Maximum number of samples that can be stored in the file, this relates to the maxdim property of HDF5 datasets, which specifies how far a dataset can be resized after creation - see `http://www.hdfgroup.org` for details
`chunkSize`	The size of the chunks used in HDF5 storage, specified along the genomic position dimension. By default each chunk holds the data of all samples for `chunkSize` consecutive genomic positions
`compressionLevel`	Compression level to use in the HDF5 file, defaults to `9` (highest), use lower numbers to improve access time at the cost of disk space usage
`sampleChunkSize`	Size of the HDF5 chunks along the sample dimension, the default value is the whole dataset, i.e. all samples. For larger datasets where the typical use-case is to extract only data corresponding to a specific sample and genomic position, smaller values of `sampleChunkSize` should be used.
`referenceFillValue`	Default value to be used for the Reference dataset, this is set to `5` by default, which corresponds to the nucleotide `N`
`newNumberOfSamples`	New cohort size, this must be smaller than the value of `maxsamples` that was provided when the file was created
`dimmap`	A list mapping dataset names to the dimension in which the samples are stored (e.g. "Counts" -> 2)
`force`	Boolean parameter that controls whether a shrinking operation (i.e. newNumberOfSamples is smaller than the current number of samples) should be performed or throw an error. Shrinking will result in data loss.
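To make the effect of `chunkSize` and `sampleChunkSize` concrete, the following sketch counts how many chunks a dataset is split into along the genomic position and sample dimensions. It uses base R only; the helper `chunksAlong` and the example sizes are hypothetical, and any further dataset dimensions are ignored.

```r
# Hypothetical helper: number of chunks along one dimension,
# i.e. the dimension's extent divided by the chunk extent, rounded up
chunksAlong <- function(extent, chunkExtent) {
  as.integer(ceiling(extent / chunkExtent))
}

chromlength <- 1000000  # illustrative chromosome length
nsamples    <- 10       # illustrative cohort size

positionChunks      <- chunksAlong(chromlength, 50000)   # default chunkSize
sampleChunksDefault <- chunksAlong(nsamples, nsamples)   # default sampleChunkSize
sampleChunksSingle  <- chunksAlong(nsamples, 1)          # one sample per chunk

print(c(position       = positionChunks,
        samplesDefault = sampleChunksDefault,
        samplesSingle  = sampleChunksSingle))
```

With the default `sampleChunkSize`, reading one sample at one position still touches a chunk that contains all samples; with `sampleChunkSize = 1` the same read touches a chunk that contains only that sample, at the price of more chunks overall.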

prepareTallyFile prepares (and creates if necessary) an HDF5 file for storing the datasets that are associated with a tally. It creates the required groups and datasets (filled with 0's). resizeCohort resizes the datasets to a new number of samples, this is limited by the value of maxsamples that was provided in the initial call to prepareTallyFile.
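The interplay of the two functions might look as follows. This is a sketch under assumed values: the file, study name, chromosome name, and cohort sizes are made up for illustration, and the file is created first so that the example is self-contained.

```r
library(h5vc)  # provides prepareTallyFile and resizeCohort

# Illustrative values only
tallyFile <- tempfile(fileext = ".hdf5")
study     <- "/ExampleStudy"  # assumed study identifier
chrom     <- "22"             # assumed chromosome name

# Create the file with 2 samples and room for up to 6
prepareTallyFile(tallyFile, study, chrom,
                 chromlength = 100000, nsamples = 2, maxsamples = 6)

# Grow the cohort to 4 samples, which stays within maxsamples
resizeCohort(tallyFile, study, chrom, newNumberOfSamples = 4)

# Shrinking discards data, so it has to be requested explicitly with force = TRUE
resizeCohort(tallyFile, study, chrom, newNumberOfSamples = 3, force = TRUE)
```

Without `force = TRUE` the last call is documented to throw an error instead of shrinking the datasets.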

Returns TRUE on success

Paul Pyl