helpers: helper functions

Description

These functions are helpers for dealing with tally data stored in HDF5 files.

Usage

formatGenomicPosition( x, unit = "Mb", divisor = 1000000, digits = 3,
nsmall = 1 )
encodeDNAString( ds )
defineBlocks( start, stop, blocksize )
getChromSize( tallyFile, group, dataset = "Reference", posDim = 1 )

Arguments

x

Numerical genomic position

unit

Which unit to convert the position to

divisor

Divisor corresponding to the unit, e.g. 'Mb' -> 1e6, 'Kb' -> 1e3

digits

number of digits to keep

nsmall

nsmall parameter to the format function

ds

A DNAString object to be encoded using the nucleotide encoding specific to HDF5 tally files.

start

first position

stop

last position

blocksize

size of blocks

tallyFile

Tally file to work on

group

Group within tallyFile that we want to find the chromosome size for

dataset

Dataset to extract chromosome size from - default is "Reference"

posDim

Which dimension of the dataset describes the genomic position

Details

formatGenomicPosition: Helps format genomic positions for annotating axes, e.g. in mismatch plots.

encodeDNAString: This translates a DNAString object into a compatible encoding that can be written to the Reference dataset of an HDF5-based tally file. Since the Python script for generating tallies only sets the Reference dataset at positions where mismatches exist, updating the Reference dataset becomes necessary if one would like to perform analyses involving sequence context (GC-bias, mutationSpectrum, etc.).

defineBlocks: This function returns a data.frame with the columns Start and End for blocks of size blocksize spanning the interval [start, stop].

getChromSize: This function is a helper to quickly look up the chromosome size for a given group in a tally file.

Value

formatGenomicPosition: formatted genomic position, e.g. "123.4 Mb"
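To illustrate how the arguments might interact, here is a minimal base R sketch. It assumes that the position is divided by divisor and that digits and nsmall are both handed to format(); the h5vc source may do this differently, and format_position_sketch is a hypothetical name, not part of the package.

```r
# Hypothetical stand-in for illustration only, not the h5vc implementation.
# Assumption: scale by `divisor`, then let format() apply `digits`
# (significant digits) and `nsmall` (minimum number of decimal places).
format_position_sketch <- function(x, unit = "Mb", divisor = 1e6,
                                   digits = 3, nsmall = 1) {
  scaled <- x / divisor
  paste(format(scaled, digits = digits, nsmall = nsmall), unit)
}

# Same position as in the Examples section, with the default unit.
label_mb <- format_position_sketch(123456789)

# Switching the unit requires changing the divisor to match.
label_kb <- format_position_sketch(123456789, unit = "Kb", divisor = 1e3)
```

The unit string is only a label; the scaling is controlled entirely by divisor, so the two have to be kept in sync by the caller.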

encodeDNAString: A numeric vector encoding the nucleotide sequence provided in ds according to the scheme c("A"=0,"C"=1,"G"=2,"T"=3).
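The encoding scheme above can be sketched in base R without Biostrings. This is only an illustration of the mapping: encode_sequence_sketch is a hypothetical helper that takes a plain character string rather than a DNAString, and it is not the package's implementation.

```r
# Encoding scheme as stated in the Value section.
nucleotide_code <- c("A" = 0, "C" = 1, "G" = 2, "T" = 3)

# Hypothetical helper for illustration only (not part of h5vc):
# split the string into single bases and look each one up by name.
encode_sequence_sketch <- function(sequence) {
  bases <- strsplit(sequence, "")[[1]]
  unname(nucleotide_code[bases])
}

encoded <- encode_sequence_sketch("GATTACA")
```

In this sketch, any character outside A, C, G and T (for example N) maps to NA; how encodeDNAString itself treats such characters is not stated here.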

defineBlocks: A data.frame with the columns Start and End for blocks of size blocksize spanning the interval [start, stop].
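A block layout of this kind can be sketched in base R as follows. The sketch assumes inclusive coordinates and a shortened final block when the interval length is not a multiple of blocksize; whether defineBlocks handles the last block the same way is an assumption, and define_blocks_sketch is a hypothetical name.

```r
# Hypothetical stand-in for illustration only, not the h5vc implementation.
# Assumptions: Start and End are inclusive, and the last block is cut at `stop`.
define_blocks_sketch <- function(start, stop, blocksize) {
  block_starts <- seq(start, stop, by = blocksize)
  block_ends <- pmin(block_starts + blocksize - 1, stop)
  data.frame(Start = block_starts, End = block_ends)
}

# Split positions 1..10 into blocks of at most 4 positions.
blocks <- define_blocks_sketch(1, 10, 4)
```

Each row of such a data.frame can then be processed independently, which is convenient when a chromosome is too large to load from a tally file in one piece.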

getChromSize: Returns a numeric that is the size of the chromosome.

Author(s)

Paul Pyl

Examples

  formatGenomicPosition(123456789)
  library(Biostrings)
  lapply( DNAStringSet( c("simple"="ACGT", "movie"="GATTACA") ), encodeDNAString )
  getChromSize( system.file("extdata", "example.tally.hfs5", package="h5vcData"), "/ExampleStudy/16" )

h5vc documentation built on Nov. 8, 2020, 4:56 p.m.