getSampleData: Reading and writing sample data from / to a tally file


View source: R/get.sample.metadata.R

Description

These functions read sample data from, and write sample data to, the HDF5-based tally files. The sample data is stored as attributes of the group.

Usage

getSampleData( filename, group )
setSampleData( filename, group, sampleData, largeAttributes = FALSE, stringSize = 64 )

Arguments

filename

The name of a tally file

group

The name of a group within that tally file, e.g. /ExampleStudy/22

sampleData

A data.frame with k rows (one for each sample) and columns Type, Column and (SampleGroup or Patient). Additional columns will be stored as well but are not required.

largeAttributes

HDF5 limits the size of attributes to 64KB. If you have many samples, setting this flag writes the attributes to a separate dataset instead. getSampleData is aware of this and automatically uses the dataset-stored attributes if they are present.

stringSize

Maximum length for string attributes (number of characters). The default of 64 characters should be fine for most cases. This has to be specified because variable-length strings are not supported as of now.
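Before writing, it can help to check whether the default stringSize is large enough for the text columns in your sample data. The following is a minimal sketch in base R, not part of h5vc; the sample ids and the variable names (isText, longest, neededStringSize) are made up for illustration.

```r
# Hypothetical sample data; the ids are invented for this sketch.
sampleData <- data.frame(
  Sample  = c("PT8PrimaryDNA", "PT8ControlDNA"),
  Patient = c("Patient8", "Patient8"),
  Type    = c("Case", "Control"),
  Column  = 1:2,
  stringsAsFactors = FALSE
)

# Find the longest string over all character columns.
isText  <- vapply(sampleData, is.character, logical(1))
longest <- max(vapply(sampleData[isText],
                      function(x) max(nchar(x)), integer(1)))

# Keep the documented default of 64 unless a longer string is present.
neededStringSize <- max(64L, longest)

# The value could then be passed on, e.g. (tallyFile and group are placeholders):
# setSampleData( tallyFile, group, sampleData, stringSize = neededStringSize )
```

If longest exceeds 64, passing a larger stringSize is the safer choice.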

Details

The returned data.frame contains the sample ids and the sample columns, i.e. the positions of the samples in the sample dimension of the dataset. The Type of a sample must be one of c("Case","Control") for it to be used with the provided SNV calling function. Additional relevant per-sample information may be stored here.

Note that the following columns are required in the sample data, where each row represents one sample in the cohort:

Sample: the sample id of the corresponding sample

Column: the index within the genomic position dimension of the corresponding sample. Be aware that getSampleData adds 1 to this value when reading and setSampleData subtracts 1 when writing, since internally the tally files store the dimension 0-based whereas within R we count 1-based.

Patient: the patient id of the corresponding sample

Type: the type of sample
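The requirements above can be sketched in base R as follows. This is an illustration only: the cohort is invented, checkSampleData is a hypothetical helper that is not part of h5vc, and the index arithmetic merely mirrors what getSampleData and setSampleData are described as doing internally.

```r
# Hypothetical cohort: two patients, one case and one control sample each.
sampleData <- data.frame(
  Sample  = c("P1Case", "P1Control", "P2Case", "P2Control"),
  Column  = 1:4,                        # 1-based, as seen from R
  Patient = c("P1", "P1", "P2", "P2"),
  Type    = c("Case", "Control", "Case", "Control"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not an h5vc function): check the documented requirements.
checkSampleData <- function(sd) {
  required    <- c("Sample", "Column", "Patient", "Type")
  missingCols <- setdiff(required, colnames(sd))
  if (length(missingCols) > 0) {
    stop("missing required columns: ", paste(missingCols, collapse = ", "))
  }
  if (!all(sd$Type %in% c("Case", "Control"))) {
    stop("Type must be 'Case' or 'Control' for use with SNV calling")
  }
  invisible(TRUE)
}

checkSampleData(sampleData)

# Illustration of the index shift only; the real conversion happens inside
# setSampleData (R to file) and getSampleData (file to R).
storedColumn <- sampleData$Column - 1L   # 0-based, as stored in the tally file
loadedColumn <- storedColumn + 1L        # 1-based again after reading
```

Because the conversion is applied automatically, the Column values you pass to setSampleData should stay 1-based.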

Value

sampledata

A data.frame with k rows (one for each sample) and columns Type, Column and (SampleGroup or Patient).

Author(s)

Paul Pyl

Examples

  # loading library and example data
  library(h5vc)
  tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
  sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
  sampleData
  # modify the sample data by adding an extra (optional) column
  sampleData$AnotherColumn <- paste( sampleData$Patient, "Modified" )
  # write to tallyFile
  setSampleData( tallyFile, "/ExampleStudy/16", sampleData )
  # re-load and check if it worked
  sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
  sampleData

h5vc documentation built on Nov. 8, 2020, 4:56 p.m.