estimateBsWidth: Function to estimate the appropriate binding site width...

estimateBsWidth R Documentation

Function to estimate the appropriate binding site width together with the optimal gene-wise filter level.

Description

This function tests different binding site widths across different gene-wise filtering steps. For each test, the signal-to-flank score ratio is calculated. The mean over all gene-wise filterings at each binding site width is used to extract the optimal width, which serves as the anchor for selecting the optimal gene-wise filter.

Usage

estimateBsWidth(
  object,
  bsResolution = c("medium", "fine", "coarse"),
  geneResolution = c("medium", "coarse", "fine", "finest"),
  est.maxBsWidth = 13,
  est.minimumStepGain = 0.02,
  est.maxSites = Inf,
  est.subsetChromosome = "chr1",
  est.minWidth = 2,
  est.offset = 1,
  sensitive = FALSE,
  sensitive.size = 5,
  sensitive.minWidth = 2,
  anno.annoDB = NULL,
  anno.genes = NULL,
  bsResolution.steps = NULL,
  geneResolution.steps = NULL,
  quiet = TRUE,
  veryQuiet = FALSE,
  reportScoresPerBindingSite = FALSE,
  ...
)

Arguments

object

a BSFDataSet object with stored crosslink sites. This means that ranges should have a width = 1.

bsResolution

character; level of resolution at which different binding site widths should be tested

geneResolution

character; level of resolution at which gene-wise filtering steps should be tested

est.maxBsWidth

numeric; the largest binding site width which should be considered in the testing

est.minimumStepGain

numeric; the minimum additional gain in the score (in percent) that the next binding site width must achieve to be selected as the best option

est.maxSites

numeric; maximum number of PureCLIP sites that are used

est.subsetChromosome

character; define on which chromosome the estimation should be done in function estimateBsWidth

est.minWidth

the minimum size of regions that are subjected to the iterative merging routine, after the initial region concatenation.

est.offset

constant added to the flanking count in the signal-to-flank ratio calculation to avoid division by zero

sensitive

logical; whether to enable sensitive pre-filtering before binding site merging or not

sensitive.size

numeric; the size (in nucleotides) of the merged sensitive region

sensitive.minWidth

numeric; the minimum size (in nucleotides) of the merged sensitive region

anno.annoDB

an object of class OrganismDbi that contains the gene annotation (!!! Experimental !!!).

anno.genes

an object of class GenomicRanges that represents the gene ranges directly

bsResolution.steps

numeric vector; option to supply user-defined binding site widths directly. Overrides bsResolution

geneResolution.steps

numeric vector; option to supply a user-defined threshold vector for the gene-wise filtering resolution. Overrides geneResolution

quiet

logical; whether to print messages

veryQuiet

logical; whether to suppress all messages

reportScoresPerBindingSite

logical; whether to report the ratio score for each binding site separately. Warning! This is for debugging and testing only. Downstream functions can be impaired.

...

additional arguments passed to pureClipGeneWiseFilter

Details

Parameter estimation is done on a subset of all crosslink sites (est.subsetChromosome).

Gene-level filter can be tested with varying levels of accuracy ranging from 'finest' to 'coarse', representing 1 20

Binding site computation at each step can be done at three different accuracy levels (bsResolution). Option 'fine' is equal to a normal run of the makeBindingSites function. Option 'medium' performs a shorter version of the binding site computation, skipping some of the refinement steps. Option 'coarse' approximates binding sites by merged crosslink regions, aligning the center at the site with the highest score.
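The 'coarse' idea described above can be illustrated with a small base R sketch. It is not the package implementation: the toy data, the helper name approximateSites, and the rule that only directly adjacent crosslink sites are merged are all assumptions made for illustration.

```r
# Hypothetical toy data: crosslink positions on one strand with per-site scores
sites <- data.frame(
  pos   = c(100, 101, 102, 110, 111, 112, 113, 200),
  score = c(2.0, 5.0, 1.0, 1.5, 3.0, 8.0, 2.5, 4.0)
)

# Sketch of a 'coarse'-style approximation (assumed logic, not the package code):
# 1) merge directly adjacent crosslink sites into regions,
# 2) drop regions narrower than minWidth,
# 3) centre a window of fixed width on the highest-scoring site of each region.
approximateSites <- function(sites, bsWidth = 5, minWidth = 2) {
  stopifnot(bsWidth %% 2 == 1)
  sites <- sites[order(sites$pos), ]
  # a new region starts wherever the gap to the previous site exceeds 1 nt
  regionId <- cumsum(c(TRUE, diff(sites$pos) > 1))
  regions <- split(sites, regionId)
  # positions inside a region are consecutive, so row count equals region width
  regions <- Filter(function(r) nrow(r) >= minWidth, regions)
  half <- (bsWidth - 1) / 2
  do.call(rbind, lapply(regions, function(r) {
    center <- r$pos[which.max(r$score)]
    data.frame(start = center - half, end = center + half, center = center)
  }))
}

bs <- approximateSites(sites, bsWidth = 5, minWidth = 2)
print(bs)
```

In this toy input the isolated site at position 200 is discarded because its region is narrower than minWidth, which mirrors the role the est.minWidth argument is described as playing.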

For each binding site in each set given the defined resolutions, a signal-to-flank score ratio is calculated and the mean of this score per set is returned. Next, a mean of means is computed, which results in a single score for each binding site width that was tested. The width that yielded the highest score is selected as optimal. In addition, the est.minimumStepGain option allows control over the minimum additional gain in the score that a tested width has to have to be selected as the best option.
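The scoring and selection steps can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: the helper names signalToFlank and selectWidth, the toy numbers, the use of the mean of the two flanking windows, and the interpretation of the step gain as a relative improvement over the current best width are all hypothetical.

```r
# Assumed form of the signal-to-flank ratio for one binding site:
# crosslinks inside the site divided by the mean of the two flanking
# windows of the same width, plus an offset to avoid division by zero.
signalToFlank <- function(counts, start, end, offset = 1) {
  width  <- end - start + 1
  signal <- sum(counts[start:end])
  flankUp   <- sum(counts[(start - width):(start - 1)])
  flankDown <- sum(counts[(end + 1):(end + width)])
  signal / (mean(c(flankUp, flankDown)) + offset)
}

counts <- c(0, 1, 0, 5, 6, 4, 1, 0, 1)   # hypothetical per-nucleotide counts
ratio  <- signalToFlank(counts, start = 4, end = 6, offset = 1)

# Hypothetical mean scores per set:
# rows = gene-wise filter levels, columns = tested binding site widths
scores <- matrix(
  c(1.0, 1.6, 2.0, 2.02, 1.9,
    1.2, 1.8, 2.2, 2.22, 2.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(geneFilter = c("0.1", "0.2"),
                  bsWidth = c("3", "5", "7", "9", "11"))
)

# Mean of means: one score per tested width
meanPerWidth <- colMeans(scores)

# Assumed selection rule: a wider site replaces the current best only if it
# improves the score by more than minimumStepGain (relative gain).
selectWidth <- function(meanScores, minimumStepGain = 0.02) {
  best <- 1
  for (i in seq_along(meanScores)[-1]) {
    gain <- meanScores[i] / meanScores[best] - 1
    if (gain > minimumStepGain) best <- i
  }
  as.numeric(names(meanScores)[best])
}

optimalWidth  <- selectWidth(meanPerWidth, minimumStepGain = 0.02)
greedyMaximum <- selectWidth(meanPerWidth, minimumStepGain = 0)
```

With these toy values, width 9 has the highest mean score, but its gain over width 7 is below the 2% threshold, so the narrower width is kept. Setting the threshold to 0 recovers the plain maximum.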

To enhance the sensitivity of the binding site estimation, a sensitivity mode exists. In this mode, crosslink sites undergo a pre-filtering and merging step to exclude potential artificial peaks (experimental or mapping biases). If sensitivity mode is activated, the est.minWidth option should be set to 1.

The optimal geneFilter is selected as the first one that passes the merged mean of the selected optimal binding site width.
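One possible reading of this rule is sketched below in base R. The scores are made up, and the interpretation that "passes" means "is at least as large as the mean across all filter levels at the optimal width" is an assumption, not a statement about the package internals.

```r
# Hypothetical scores at the selected optimal width, one per gene-wise
# filter level (names are the filter cutoffs, in increasing order)
scoresAtOptimalWidth <- c("0.05" = 1.8, "0.1" = 2.0, "0.2" = 2.3, "0.3" = 2.5)

# Merged mean over all gene-wise filter levels at that width
mergedMean <- mean(scoresAtOptimalWidth)

# Assumed rule: take the first filter level whose score reaches the merged mean
passing <- which(scoresAtOptimalWidth >= mergedMean)
optimalGeneFilter <- as.numeric(names(scoresAtOptimalWidth)[passing[1]])
```

Choosing the first passing level rather than the best-scoring one keeps the filter as permissive as possible while still reaching the average score at the chosen width.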

The function is part of the standard workflow performed by BSFind.

Value

an object of class BSFDataSet with binding sites with the 'params' slots 'bsSize' and 'geneFilter' being filled

See Also

BSFind, estimateBsWidthPlot

Examples

# load the package and the example CLIP data shipped with it
library(BindingSiteFinder)
files <- system.file("extdata", package="BindingSiteFinder")
load(list.files(files, pattern = ".rda$", full.names = TRUE))
load(list.files(files, pattern = ".rds$", full.names = TRUE)[1])
load(list.files(files, pattern = ".rds$", full.names = TRUE)[2])
# estimate binding site width and gene-wise filter on chr22 at coarse resolution
estimateBsWidth(bds, anno.genes = gns, est.maxBsWidth = 19,
 geneResolution = "coarse", bsResolution = "coarse", est.subsetChromosome = "chr22")


ZarnackGroup/BindingSiteFinder documentation built on May 2, 2024, 12:38 a.m.