GenoGAMDataSet constructor.

Description

This is the constructor function for GenoGAMDataSet. Currently, a GenoGAMDataSet can be constructed from an experiment design file, from a data.frame, or directly from a RangedSummarizedExperiment whose rowRanges is a GPos object.

Usage

GenoGAMDataSet(experimentDesign, chunkSize, overhangSize, design,
  directory = ".", settings = NULL, ...)

Arguments

experimentDesign

Either a character object specifying the path to a delimited text file (the delimiter will be determined automatically), or a data.frame specifying the experiment design. See details for the structure of the experimentDesign.

chunkSize

An integer specifying the size of one chunk in bp.

overhangSize

An integer specifying the size of the overhang in bp. As the overhang is taken to be symmetrical, only the overhang of one side should be provided.

design

An mgcv-like formula object. See details for its structure.

directory

The directory from which to read the data. By default the current working directory is taken.

settings

A GenoGAMSettings object. This class is already present but not yet fully tested and therefore not accessible to the user. This argument exists however in order to allow some workarounds if necessary. See the vignette for a possible use.

...

Further parameters, mostly for arguments of custom processing functions or to specify a different method for fragment size estimation. See details for further information.

Details

The experimentDesign file/data.frame must contain at least three columns with fixed names: 'ID', 'file' and 'paired'. The field 'ID' stores a unique identifier for each alignment file. It is recommended to use short and easy to understand identifiers because they are subsequently used for labelling data and plots. The field 'file' stores the BAM file name. The field 'paired' takes the value TRUE for paired-end sequencing data and FALSE for single-end sequencing data. All other columns are stored in the colData slot of the GenoGAMDataSet object. Note that every column used for analysis must have at most two conditions, which are for now restricted to 0 and 1. For example, if the IP data should be corrected for input, then the input will be 0 and IP will be 1, since we are interested in the corrected IP. See examples.
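The column rules above can be checked before calling the constructor. The following is a minimal sketch in base R: the helper checkExperimentDesign and its analysisCols argument are hypothetical and not part of GenoGAM, and the BAM file names are placeholders.

```r
# Example design table; the BAM file names are placeholders.
expDesign <- data.frame(
  ID = c("input", "ip"),
  file = c("myInput.bam", "myIP.bam"),
  paired = c(FALSE, FALSE),
  experiment = c(0, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a GenoGAM function): checks the documented rules.
# analysisCols names the columns that will be used in the design formula.
checkExperimentDesign <- function(df, analysisCols = character(0)) {
  required <- c("ID", "file", "paired")
  missingCols <- setdiff(c(required, analysisCols), colnames(df))
  if (length(missingCols) > 0) {
    stop("Missing column(s): ", paste(missingCols, collapse = ", "))
  }
  if (anyDuplicated(df$ID) > 0) {
    stop("'ID' must be unique for each alignment file")
  }
  if (!is.logical(df$paired)) {
    stop("'paired' must be TRUE (paired-end) or FALSE (single-end)")
  }
  # Columns used for analysis are restricted to the values 0 and 1.
  for (col in analysisCols) {
    vals <- as.character(df[[col]])
    if (!all(vals %in% c("0", "1"))) {
      stop("Column '", col, "' may only contain 0 and 1")
    }
  }
  invisible(TRUE)
}

checkExperimentDesign(expDesign, analysisCols = "experiment")
```

Columns that are not used in the design formula are deliberately left unchecked, since the restriction to 0 and 1 is only stated for columns used in the analysis.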

The design must be an mgcv-like formula. At the moment only the following forms are possible: '~ 1' for a constant; '~ s(x)' for a smooth fit over the entire data; '~ s(x, by = "myColumn")', where 'myColumn' is a column name in the experimentDesign, which fits only the samples annotated with 1 in this column; or '~ s(x) + s(x, by = "myColumn") + s(x, by = ...) + ...'. The last form lets you combine any number of columns, provided they are binary with the values 0 and 1. For example, the formula for correcting IP for input would look like this: ~ s(x) + s(x, by = "experiment"), where 'experiment' is a column of 0s and 1s, with the IP samples annotated with 1 and the input samples with 0.

In case of single-end data it might be useful to specify a different method for fragment size estimation. The argument 'shiftMethod' can be supplied with the values 'coverage' (default), 'correlation' or 'SISSR'. See ?chipseq::estimate.mean.fraglen for an explanation.
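The design formula forms listed in the details can be written down as ordinary R formula objects. The sketch below uses base R only, since formulas are stored unevaluated and s() is never called here. The helper buildDesign and the column name 'genotype' are hypothetical and serve only as illustration.

```r
# The formula forms described in the details, as plain R formula objects.
constantOnly <- ~ 1
smoothOnly   <- ~ s(x)
ipVsInput    <- ~ s(x) + s(x, by = "experiment")

# Hypothetical helper (not a GenoGAM function): builds
# ~ s(x) + s(x, by = "col1") + s(x, by = "col2") + ...
# from a character vector of binary (0/1) column names.
buildDesign <- function(byColumns = character(0)) {
  terms <- c("s(x)", sprintf("s(x, by = \"%s\")", byColumns))
  as.formula(paste("~", paste(terms, collapse = " + ")))
}

# 'genotype' is an assumed second 0/1 column in the experiment design.
twoColumns <- buildDesign(c("experiment", "genotype"))
print(twoColumns)
```

Building the formula from column names keeps it in step with the experiment design when several binary columns are combined.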

Value

An object of class GenoGAMDataSet.

Author(s)

Georg Stricker georg.stricker@in.tum.de

Examples

## Not run: 
## Experiment design for correcting IP (1) for input (0)
myConfig <- data.frame(ID = c("input", "ip"),
                       file = c("myInput.bam", "myIP.bam"),
                       paired = c(FALSE, FALSE),
                       experiment = factor(c(0, 1)),
                       stringsAsFactors = FALSE)

## Experiment design with two wildtype (0) and two mutant (1) samples
myConfig2 <- data.frame(ID = c("wildtype1", "wildtype2",
                               "mutant1", "mutant2"),
                        file = c("myWT1.bam", "myWT2.bam",
                                 "myMutant1.bam", "myMutant2.bam"),
                        paired = c(FALSE, FALSE, FALSE, FALSE),
                        experiment = factor(c(0, 0, 1, 1)),
                        stringsAsFactors = FALSE)

## Build the data sets; 'experiment' is the 0/1 column defined above
gtiles <- GenoGAMDataSet(myConfig, chunkSize = 2000,
                         overhangSize = 250,
                         design = ~ s(x) + s(x, by = "experiment"))
gtiles <- GenoGAMDataSet(myConfig2, chunkSize = 2000,
                         overhangSize = 250,
                         design = ~ s(x) + s(x, by = "experiment"))

## End(Not run)
## make a test dataset
ggd <- makeTestGenoGAMDataSet()
ggd
