defaultSettings: Default CHiCAGO settings


View source: R/chicago.R

Description

A function that gives the default settings used for a CHiCAGO experiment.

IMPORTANT: from version 1.13, the following parameters are set from the values in the .npb file header and checked for consistency with the headers of the .npbp and .poe files and with any custom-defined settings. They should therefore be provided to the makeDesignFiles.py script, which must be rerun if they need to be modified:

rmapfile (only the basename is checked; inconsistent baitmapfile will only generate a warning for compatibility with publicly released older designs), minFragLen, maxFragLen, binsize, removeAdjacent, adjBait2bait.

Usage

defaultSettings()

Value

A list of the following settings:

rmapfile

Default: NA. The location of the restriction map file; see the vignette for a description of what this file should contain.

baitmapfile

Default: NA. The location of the bait map file; see the vignette for a description of what this file should contain.

nperbinfile

Default: NA. See vignette.

nbaitsperbinfile

Default: NA. See vignette.

proxOEfile

Default: NA. See vignette.

Ncol

Default: "N". The column in intData(cd) that contains the number of reads.

baitmapFragIDcol

Default: 4. In the bait map file, the number of the column that specifies the fragment ID of each bait.

baitmapGeneIDcol

Default: 5. In the bait map file, the number of the column that specifies which gene(s) are on each fragment.

maxLBrownEst

Default: 1500000. The distance range to be used for estimating the Brownian component of the null model. The parameter setting should approximately reflect the maximum distance at which the power-law distance dependence is still observable.

minFragLen

Default: 150. (See maxFragLen.)

maxFragLen

Default: 40000. minFragLen and maxFragLen correspond to the limits within which we observed no clear dependence between fragment length and the numbers of reads mapping to these fragments in HindIII PCHiC data.

These parameters need to be modified when using a restriction enzyme with a different cutting frequency (such as a 4-cutter) and can also be verified by users with their datasets in each individual case. However, we note that the fragment-level scaling factors (s_i and s_j) generally incorporate the effects of fragment size, so this filtering step only aims to remove the strongest bias.
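The length filter described above can be sketched in base R as follows. This is only an illustration, not CHiCAGO's internal code: the `frags` table and its values are hypothetical, and treating both limits as inclusive is an assumption.

```r
# Hypothetical other-end fragment lengths in base pairs (illustrative values).
frags <- data.frame(
  otherEndID  = 1:6,
  otherEndLen = c(90, 150, 4200, 39999, 40000, 52000)
)

minFragLen <- 150    # default listed on this page
maxFragLen <- 40000  # default listed on this page

# Assumption: both limits are inclusive; the package may handle the
# boundaries differently.
keep <- frags$otherEndLen >= minFragLen & frags$otherEndLen <= maxFragLen
filtered <- frags[keep, ]
print(filtered)
```

Plotting read counts against fragment length for your own data, before and after such a filter, is one way to check whether the limits suit a different restriction enzyme.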

minNPerBait

Default: 250. Minimum number of reads that a bait has to accumulate to be included in the analysis.

Reasonable numbers of per-bait reads are required for robust parameter estimation. If this value is too low, the confidence of interaction calling is reduced. If it is too high, too many baits may be unreasonably excluded from the analysis. If it is desirable to include baits below this threshold, we recommend decreasing this parameter and then visually examining the resulting bait profiles (for example, using plotBaits()).
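To see how many baits a given threshold would exclude, the per-bait read totals can be tabulated before running the pipeline. The sketch below uses a hypothetical interaction table with the default column names (baitID, otherEndID, N); whether a bait with exactly minNPerBait reads is kept is an assumption.

```r
# Hypothetical interaction table (illustrative values only).
ints <- data.frame(
  baitID     = c(1, 1, 1, 2, 2, 3),
  otherEndID = c(10, 11, 12, 10, 13, 14),
  N          = c(120, 100, 40, 200, 30, 500)
)

minNPerBait <- 250  # default listed on this page

# Total reads accumulated by each bait.
readsPerBait <- tapply(ints$N, ints$baitID, sum)

# Assumption: baits whose total reaches the threshold are retained.
keptBaits <- names(readsPerBait)[readsPerBait >= minNPerBait]
ints_kept <- ints[ints$baitID %in% as.numeric(keptBaits), ]

print(readsPerBait)
print(keptBaits)
```

Here bait 2 accumulates 230 reads and would be dropped, while baits 1 and 3 are kept.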

binsize

Default: 20000. The bin size (in bases) used when estimating the Brownian collision parameters.

The bin size should, on average, include several (~4-5) restriction fragments to increase the robustness of parameter estimation. However, bins that are too large will reduce the precision of distance function estimation. Therefore, this value needs to be changed if using an enzyme with a different cutting frequency (such as a 4-cutter).
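As a rough back-of-the-envelope check, a candidate bin size can be derived from the mean fragment length of the restriction map. The fragment lengths below are hypothetical, and rounding to the nearest kilobase is an arbitrary choice made for this sketch. Since binsize is set through the design files (see the note in the Description), the chosen value would be supplied to makeDesignFiles.py.

```r
# Hypothetical restriction fragment lengths in base pairs.
fragLen <- c(3500, 4200, 5100, 2800, 4400, 4000)

# Target number of fragments per bin, following the ~4-5 guideline above.
fragsPerBin <- 5

# Candidate bin size, rounded to the nearest 1000 bp (arbitrary choice).
rawBinsize <- mean(fragLen) * fragsPerBin
binsize <- round(rawBinsize / 1000) * 1000
print(binsize)

# Conversely, the average number of fragments covered by the default bin size.
fragsInDefaultBin <- 20000 / mean(fragLen)
print(fragsInDefaultBin)
```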

removeAdjacent

Default: TRUE. Should fragments adjacent to baits be removed from analysis?

We remove fragments adjacent to baits by default, as the corresponding ligation products are indistinguishable from incomplete digestion. However, this setting may be set to FALSE if the rmap and baitmap files represent bins over multiple fragments as opposed to fragment-level data (e.g., to address sparsity issues with low-coverage experiments).

adjBait2bait

Default: TRUE. Should baited fragments be treated separately? Baited fragments are treated separately from the rest in estimating other end-level scaling factors (s_i) and technical noise levels. It is a free parameter mainly for development purposes, and we do not recommend changing it.

tlb.filterTopPercent

Default: 0.01. Top percent of fragments with respect to accumulated trans-counts to be filtered out in the binning procedure.

Other ends are pooled together when calculating their scaling factors and as part of technical noise estimation. Binning is performed by quantile, and for the most extreme outliers this approach is not going to be adequate. Increasing this value may potentially make the estimation for the highest-count bin more robust, but will exclude additional other ends from the analysis.
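The effect of this filter can be illustrated with a quantile cutoff. The sketch assumes that the value is a literal percentage (so 0.01 removes the top 0.01% of other ends) and that the cutoff is applied with `quantile()`; the trans-counts are made up, and the package's own implementation may differ in detail.

```r
# Hypothetical accumulated trans-counts, one value per other end:
# 10000 ordinary fragments plus a single extreme outlier.
transCounts <- c(rep(1:10, each = 1000), 5000)

tlb.filterTopPercent <- 0.01  # default listed on this page

# Assumption: the setting is a percentage, so divide by 100 to get a fraction.
cutoff <- quantile(transCounts, 1 - tlb.filterTopPercent / 100)

# Other ends above the cutoff are excluded before quantile binning.
kept <- transCounts[transCounts <= cutoff]

print(length(transCounts) - length(kept))  # number of other ends removed
```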

tlb.minProxOEPerBin

Default: 50000. Minimum pool size (i.e. minimum number of other ends per pool), used when pooling other ends together based on trans-counts.

If this parameter is set too small, then estimates will be imprecise due to sparsity issues. If this parameter is set too large, then the model becomes inflexible and so the model fit is hindered. This parameter could be decreased in a dataset that has been sequenced to an extremely high depth. Alternatively, it may need to be decreased out of necessity in a dataset with very few other ends. For example, the vignette decreases this setting to process the PCHiCdata package data, since each of these data sets spans only a small subset of the genome.

tlb.minProxB2BPerBin

Default: 2500. Minimum pool size, used when pooling other ends together (bait-to-bait interactions only). (See previous entry, tlb.minProxOEPerBin, for advice on setting parameter.)

techNoise.minBaitsPerBin

Default: 1000. Minimum pool size, used when pooling baits together based on accumulated trans-counts. (See tlb.minProxOEPerBin for advice on setting parameter.)

brownianNoise.samples

Default: 5. Number of times subsampling occurs when estimating the Brownian collision dispersion.

Dispersion estimation from a subset of baits has an error attached. Averaging over multiple subsamples allows us to decrease this error. Increasing this number improves the precision of dispersion estimation at the expense of greater runtime.

brownianNoise.subset

Default: 1000. Number of baits sampled when estimating the Brownian noise dispersion. If set to NA, then all baits are used.

Estimating dispersion from the entire dataset usually requires a prohibitively large amount of memory. A subset is chosen that is large enough to get a reasonably precise estimate of the dispersion, but small enough to stay in memory. A user with excess memory may wish to increase this number to further improve the estimate's precision.

brownianNoise.seed

Default: NA. If not NA, then brownianNoise.seed is used as the random number generator seed when subsampling baits. Set this to make your analysis reproducible.
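The interplay of the three brownianNoise settings can be sketched as a subsample-and-average loop. This is a simplified illustration, not CHiCAGO's estimator: `estimate_dispersion()` and `subsample_average()` are hypothetical helpers, the method-of-moments formula is a stand-in, and each element of `counts` stands in for one bait.

```r
# Stand-in estimator (hypothetical): method-of-moments negative binomial
# size parameter, mean^2 / (variance - mean).
estimate_dispersion <- function(counts) {
  m <- mean(counts)
  v <- var(counts)
  m^2 / (v - m)
}

# Average the estimate over several random subsets.
subsample_average <- function(counts, n.samples = 5, subset = 1000, seed = NA) {
  if (!is.na(seed)) set.seed(seed)  # fixed seed => reproducible subsamples
  estimates <- vapply(seq_len(n.samples), function(i) {
    idx <- if (is.na(subset) || subset >= length(counts)) {
      seq_along(counts)             # NA (or too large): use everything
    } else {
      sample(length(counts), subset)
    }
    estimate_dispersion(counts[idx])
  }, numeric(1))
  mean(estimates)
}

# Simulated data for illustration only.
set.seed(42)
counts <- rnbinom(20000, size = 2, mu = 10)

a <- subsample_average(counts, seed = 123)
b <- subsample_average(counts, seed = 123)
print(c(a, b))  # identical because the same seed was used
```

Raising `n.samples` or `subset` in this sketch narrows the spread of the averaged estimate, at the cost of more computation and memory, which mirrors the trade-off described above.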

baitIDcol

Default: "baitID". The name of the baitID column in intData(cd).

otherEndIDcol

Default: "otherEndID". The name of the otherEndID column in intData(cd).

otherEndLencol

Default: "otherEndLen". The name of the column in intData(cd) that contains the lengths of the other end fragments.

distcol

Default: "distSign". The name of the column in intData(cd) that contains the genomic distance that an interaction spans.

weightAlpha

Default: 34.1157346557331. This, and the following parameters, are used in the p-value weighting procedure.

weightBeta

Default: -2.58688050486759

weightGamma

Default: -17.1347845819659

weightDelta

Default: -7.07609245521541

Author(s)

Mikhail Spivakov, Jonathan Cairns, Paula Freire Pritchett

See Also

setExperiment, modifySettings
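A common pattern is to start from the defaults and override only a few entries. The sketch below merges overrides into the defaults with base R's `modifyList()` and checks that every override names an existing setting. The override values are illustrative, not recommendations, and the fallback list is a hand-copied subset of the defaults from this page, used only when Chicago is not installed.

```r
# Use the package defaults when Chicago is installed; otherwise fall back to
# a hand-copied subset of the defaults listed on this page.
defaults <- if (requireNamespace("Chicago", quietly = TRUE)) {
  Chicago::defaultSettings()
} else {
  list(minNPerBait = 250, tlb.minProxOEPerBin = 50000,
       tlb.minProxB2BPerBin = 2500, brownianNoise.seed = NA)
}

# Illustrative overrides (hypothetical values).
overrides <- list(
  tlb.minProxOEPerBin  = 1000,
  tlb.minProxB2BPerBin = 100,
  brownianNoise.seed   = 1
)

# Guard against typos: every override must name a known setting.
unknown <- setdiff(names(overrides), names(defaults))
stopifnot(length(unknown) == 0)

merged <- modifyList(defaults, overrides)
print(merged[names(overrides)])

# With Chicago installed and a design directory available, the overrides
# would typically be passed via the settings argument, for example:
#   cd <- Chicago::setExperiment(designDir = "path/to/designDir",
#                                settings = overrides)
```

Parameters that are fixed by the design files (see the note in the Description) are deliberately left out of the overrides here.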

Examples

s <- defaultSettings()
print(s)

Example output

Loading required package: data.table

Welcome to CHiCAGO - version 1.4.0
If you are new to CHiCAGO, please consider reading the vignette through the command: vignette("Chicago").
NOTE: Default values of tlb.minProxOEPerBin and tlb.minProxB2BPerBin changed as of Version 1.1.5. No action is required unless you specified non-default values, or wish to re-run the pipeline on old chicagoData objects. See news(package="Chicago")
$rmapfile
[1] NA

$baitmapfile
[1] NA

$nperbinfile
[1] NA

$nbaitsperbinfile
[1] NA

$proxOEfile
[1] NA

$Ncol
[1] "N"

$baitmapFragIDcol
[1] 4

$baitmapGeneIDcol
[1] 5

$maxLBrownEst
[1] 1500000

$minFragLen
[1] 150

$maxFragLen
[1] 40000

$minNPerBait
[1] 250

$binsize
[1] 20000

$removeAdjacent
[1] TRUE

$adjBait2bait
[1] TRUE

$tlb.filterTopPercent
[1] 0.01

$tlb.minProxOEPerBin
[1] 50000

$tlb.minProxB2BPerBin
[1] 2500

$techNoise.minBaitsPerBin
[1] 1000

$brownianNoise.samples
[1] 5

$brownianNoise.subset
[1] 1000

$brownianNoise.seed
[1] NA

$baitIDcol
[1] "baitID"

$otherEndIDcol
[1] "otherEndID"

$otherEndLencol
[1] "otherEndLen"

$distcol
[1] "distSign"

$weightAlpha
[1] 34.11573

$weightBeta
[1] -2.586881

$weightGamma
[1] -17.13478

$weightDelta
[1] -7.076092

Chicago documentation built on Nov. 8, 2020, 8:15 p.m.