dba: Construct a DBA object
In DiffBind: Differential Binding Analysis of ChIP-Seq Peak Data


Constructs a new DBA object from a sample sheet, or based on an existing DBA object.

dba(DBA,mask, minOverlap=2,
    sampleSheet="dba_samples.csv", 
    config=data.frame(AnalysisMethod=DBA_DESEQ2,th=0.05,
                      DataType=DBA_DATA_GRANGES, RunParallel=TRUE, 
                      minQCth=15, fragmentSize=125, 
                      bCorPlot=FALSE, reportInit="DBA", 
                      bUsePval=FALSE, design=TRUE,
                      doBlacklist=TRUE, doGreylist=TRUE),
    peakCaller="raw", peakFormat, scoreCol, bLowerScoreBetter, 
    filter, skipLines=0, 
    bAddCallerConsensus=FALSE, 
    bRemoveM=TRUE, bRemoveRandom=TRUE, 
    bSummarizedExperiment=FALSE,
    attributes, dir)

`DBA`	existing DBA object – if present, will return a fully-constructed DBA object based on the passed one, using criteria specified in the `mask` and/or `minOverlap` parameters. If missing, will create a new DBA object based on the `sampleSheet`.
`mask`	logical or numerical vector indicating which peaksets to include in the resulting model if basing DBA object on an existing one. See `dba.mask`.
`minOverlap`	only include peaks present in at least this many peaksets in the main binding matrix, if basing the DBA object on an existing one. If `minOverlap` is between zero and one, peaks present in at least this proportion of peaksets will be included.
`sampleSheet`	data frame containing the sample sheet, or the file name of a sample sheet to load (ignored if `DBA` is specified). Column names in the sample sheet may include:
    `SampleID:` Identifier string for sample. Must be unique for each sample.
    `Tissue:` Identifier string for tissue type.
    `Factor:` Identifier string for factor.
    `Condition:` Identifier string for condition.
    `Treatment:` Identifier string for treatment.
    `Replicate:` Replicate number of sample.
    `bamReads:` File path for the BAM file containing aligned reads for the ChIP sample.
    `bamControl:` File path for the BAM file containing aligned reads for the control sample.
    `Spikein:` File path for the BAM file containing aligned spike-in reads.
    `ControlID:` Identifier string for control sample.
    `Peaks:` Path for the file containing peaks for the sample. The format is determined by the `PeakCaller` column or the `peakCaller` parameter.
    `PeakCaller:` Identifier string for the peak caller used. If `Peaks` is not a bed file, this determines how the `Peaks` file is parsed. If missing, the default peak caller specified in the `peakCaller` parameter is used. Possible values:
        “raw”: text file; peak score is in fourth column
        “bed”: .bed file; peak score is in fifth column
        “narrow”: narrowPeak file (default narrowPeak format)
        “macs”: MACS .xls file
        “swembl”: SWEMBL .peaks file
        “bayes”: bayesPeak file
        “peakset”: peakset written out using pv.writepeakset
        “fp4”: FindPeaks v4
    `PeakFormat:` String indicating format for peak files; see `PeakCaller` and `dba.peakset`.
    `ScoreCol:` Column in peak files that contains peak scores.
    `LowerBetter:` Logical indicating that lower scores signify better peaks.
    `Counts:` File path for externally computed read counts; see `dba.peakset` (`counts` parameter).
    For sample sheets loaded from a file, the accepted formats are comma-separated values (column headers, followed by one line per sample) or Excel-formatted spreadsheets (`.xls` or `.xlsx` extension). Leading and trailing white space will be removed from all values, with a warning.
`config`	data frame containing configuration options, or the file name of a config file to load, when constructing a new DBA object from a sample sheet. `NULL` indicates no config file. Relevant fields include:
    `AnalysisMethod:` either `DBA_DESEQ2` or `DBA_EDGER`.
    `th:` default threshold for reporting and plotting analysis results.
    `DataType:` default class for peaks and reports (`DBA_DATA_GRANGES`, `DBA_DATA_RANGEDDATA`, or `DBA_DATA_FRAME`).
    `RunParallel:` logical indicating whether counting and analysis operations should be run in parallel (using multicore) by default.
    `minQCth:` numeric threshold for filtering reads on mapping quality; only reads with a mapping quality score greater than or equal to this value will be counted.
    `fragmentSize:` numeric mean fragment size. Reads will be extended to this length before counting overlaps. May be a vector of lengths, one for each sample.
    `bCorPlot:` logical indicating that a correlation heatmap should be plotted automatically.
    `reportInit:` string to prepend to saved report file names.
    `bUsePval:` logical; default for whether thresholds apply to FDR (`FALSE`) or to p-values (`TRUE`).
    `doBlacklist:` logical indicating whether to attempt to find and apply a blacklist, if none is present, when running `dba.analyze`.
    `doGreylist:` logical indicating whether to attempt to generate and apply a greylist, if none is present, when running `dba.analyze`.
`peakCaller`	if a `sampleSheet` is specified, the default peak caller that will be used if the `PeakCaller` column is absent.
`peakFormat`	if a `sampleSheet` is specified, the default peak file format that will be used if the `PeakFormat` column is absent.
`scoreCol`	if a `sampleSheet` is specified, the default column in the peak files that will be used for scoring if the `ScoreCol` column is absent.
`bLowerScoreBetter`	if a `sampleSheet` is specified, the sort order for peak scores if the `LowerBetter` column is absent.
`filter`	if a `sampleSheet` is specified, a filter value if the `Filter` column is absent. Peaks with scores lower than this value (or higher if `bLowerScoreBetter` or `LowerBetter` is `TRUE`) will be removed.
`skipLines`	if a `sampleSheet` is specified, the number of lines (i.e. header lines) at the beginning of each peak file to skip.
`bAddCallerConsensus`	add a consensus peakset for each sample with more than one peakset (i.e. different peak callers) when constructing a new DBA object from a `sampleSheet`.
`bRemoveM`	logical indicating whether to remove peaks on chrM (mitochondria) when constructing a new DBA object from a sample sheet.
`bRemoveRandom`	logical indicating whether to remove peaks on chrN_random when constructing a new DBA object from a sample sheet.
`bSummarizedExperiment`	logical indicating whether to return resulting object as a `SummarizedExperiment`.
`bCorPlot`	logical indicating that a correlation heatmap should be plotted before returning. If `DBA` is `NULL` (a new DBA object is being created) and `bCorPlot` is missing, the default value (`FALSE`) is used. If `DBA` is `NULL` and `bCorPlot` is specified, the specified value becomes the default value of `bCorPlot` for the resulting DBA object.
`attributes`	vector of attributes to use subsequently as defaults when generating labels in plotting functions: `DBA_ID` `DBA_TISSUE` `DBA_FACTOR` `DBA_CONDITION` `DBA_TREATMENT` `DBA_REPLICATE` `DBA_CONSENSUS` `DBA_CALLER` `DBA_CONTROL`
`dir`	directory path. If supplied, this path is prepended to files referenced in the `sampleSheet`: the `Peaks`, `bamReads`, `bamControl`, and `Spikein` columns, if present. If `sampleSheet` is a file path, `dir` is prepended to it as well.
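As a sketch of the `sampleSheet` and `dir` arguments described above, the following base R code assembles a small sample sheet as a data frame and writes it in the comma-separated form (one header line, one line per sample) that `dba` accepts. All sample names, tissue/factor labels, and file paths are hypothetical placeholders, and only a subset of the optional columns is shown.

```r
# Hypothetical sample sheet: every name and path here is a placeholder.
ids <- c("ctrl1", "ctrl2", "treat1", "treat2")
samples <- data.frame(
  SampleID   = ids,
  Tissue     = "MCF7",
  Factor     = "ER",
  Condition  = c("Control", "Control", "Treated", "Treated"),
  Replicate  = c(1, 2, 1, 2),
  bamReads   = file.path("reads", paste0(ids, ".bam")),
  Peaks      = file.path("peaks", paste0(ids, ".narrowPeak")),
  PeakCaller = "narrow",
  stringsAsFactors = FALSE
)

# SampleID must be unique for each sample.
stopifnot(anyDuplicated(samples$SampleID) == 0)

# Column headers followed by one line per sample; no row-name column.
sheet <- file.path(tempdir(), "my_samples.csv")
write.csv(samples, sheet, row.names = FALSE)
```

With real BAM and peak files in place, either the `samples` data frame or the written file could then be passed to `dba`, e.g. `dba(sampleSheet=samples, dir="/path/to/data")`, where `/path/to/data` is a placeholder; `dir` is prepended to the `bamReads` and `Peaks` paths.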

MODE: Construct a new DBA object from a sample sheet:

dba(sampleSheet, config, bAddCallerConsensus, bRemoveM, bRemoveRandom, attributes)

MODE: Construct a DBA object based on an existing one:

dba(DBA, mask, attributes)

MODE: Convert a DBA object to a SummarizedExperiment object:

dba(DBA, bSummarizedExperiment=TRUE)
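When a DBA object is based on an existing one, the `minOverlap` rule decides which peaks enter the binding matrix. The base R code below illustrates that rule; it is a hypothetical sketch, not DiffBind's internal code. The logical presence matrix, the `keep_peaks()` helper, and rounding a proportion up to a whole number of peaksets are assumptions made for illustration.

```r
# Hypothetical presence matrix: rows are merged peaks, columns are peaksets,
# TRUE means the peak was called in that peakset.
present <- matrix(
  c(TRUE,  TRUE,  TRUE,  TRUE,
    TRUE,  FALSE, TRUE,  FALSE,
    TRUE,  FALSE, FALSE, FALSE,
    FALSE, TRUE,  TRUE,  TRUE),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("peak", 1:4), paste0("set", 1:4))
)

# Hypothetical helper, not part of DiffBind.
keep_peaks <- function(present, minOverlap = 2) {
  # Assumption: a value between zero and one is treated as a proportion of
  # peaksets and rounded up to a whole number of peaksets.
  if (minOverlap > 0 && minOverlap < 1) {
    minOverlap <- ceiling(ncol(present) * minOverlap)
  }
  # Keep peaks present in at least minOverlap peaksets.
  rownames(present)[rowSums(present) >= minOverlap]
}

keep_peaks(present)        # "peak1" "peak2" "peak4" (at least 2 of 4 peaksets)
keep_peaks(present, 0.75)  # "peak1" "peak4" (at least 3 of 4 peaksets)
```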

Value: a DBA object, or a `SummarizedExperiment` object if `bSummarizedExperiment=TRUE`.

Author(s): Rory Stark and Gordon Brown

See Also: dba.peakset, dba.show

# Create DBA object from a samplesheet
## Not run: 
basedir <- system.file("extra", package="DiffBind")
tamoxifen <- dba(sampleSheet="tamoxifen.csv", dir=basedir)
tamoxifen

tamoxifen <- dba(sampleSheet="tamoxifen_allfields.csv", dir=basedir)
tamoxifen

tamoxifen <- dba(sampleSheet="tamoxifen_allfields.csv", dir=basedir, config="config.csv")
tamoxifen

## End(Not run)

# Create a DBA object with a subset of samples
data(tamoxifen_peaks)
Responsive <- dba(tamoxifen, mask=tamoxifen$masks$Responsive)
Responsive

# change peak caller but leave peak format the same
basedir <- system.file("extra", package="DiffBind")
tamoxifen <- dba(sampleSheet="tamoxifen.csv", dir=basedir,
                 peakCaller="macs", peakFormat="raw", scoreCol=5 )
dba.show(tamoxifen, attributes=c(DBA_TISSUE,DBA_CONDITION,DBA_REPLICATE,DBA_CALLER))

# Convert DBA object to SummarizedExperiment
data(tamoxifen_counts)
sset <- dba(tamoxifen,bSummarizedExperiment=TRUE)
sset