BSFind: RBP binding site definition for iCLIP data

View source: R/bsfind.R

BSFindR Documentation

RBP binding site definition for iCLIP data

Description

This is the main function that performs the binding site definition analysis through the following steps:

  1. Filter PureCLIP sites by their score distribution: pureClipGlobalFilter

  2. Estimate the appropriate binding site width together with the optimal gene-wise filter level: estimateBsWidth

  3. Filter PureCLIP sites by their score distribution per gene: pureClipGeneWiseFilter

  4. Define equally sized binding sites: makeBindingSites

  5. Perform replicate reproducibility filter: reproducibilityFilter

  6. Assign binding sites to their hosting genes: assignToGenes

  7. Assign binding sites to their hosting transcript regions: assignToTranscriptRegions

  8. Re-assign PureCLIP scores to binding sites: annotateWithScore

  9. Calculation of signal-to-flank ratio: calculateSignalToFlankScore

Usage

BSFind(
  object,
  bsSize = NULL,
  cutoff.geneWiseFilter = NULL,
  cutoff.globalFilter = 0.01,
  est.bsResolution = "medium",
  est.geneResolution = "medium",
  est.maxBsWidth = 13,
  est.minimumStepGain = 0.02,
  est.maxSites = Inf,
  est.subsetChromosome = "chr1",
  est.minWidth = 2,
  est.offset = 1,
  est.sensitive = FALSE,
  est.sensitive.size = 5,
  est.sensitive.minWidth = 2,
  merge.minWidth = 2,
  merge.minCrosslinks = 2,
  merge.minClSites = 1,
  merge.CenterIsClSite = TRUE,
  merge.CenterIsSummit = TRUE,
  repro.cutoff = NULL,
  repro.nReps = NULL,
  repro.minCrosslinks = 1,
  overlaps.geneWiseFilter = "keepSingle",
  overlaps.geneAssignment = "frequency",
  overlaps.rule.geneAssignment = NULL,
  overlaps.TranscriptRegions = "frequency",
  overlaps.rule.TranscriptRegions = NULL,
  stf.flank = "bs",
  stf.flank.size = NULL,
  match.score = "score",
  match.geneID = "gene_id",
  match.geneName = "gene_name",
  match.geneType = "gene_type",
  match.ranges.score = NULL,
  match.option.score = "max",
  anno.annoDB = NULL,
  anno.genes = NULL,
  anno.transcriptRegionList = NULL,
  quiet = FALSE,
  veryQuiet = FALSE,
  ...
)

Arguments

object

a BSFDataSet object with stored ranges

bsSize

an odd integer value specifying the size of the output binding sites

cutoff.geneWiseFilter

numeric; defines the cutoff for which sites to remove in in function pureClipGeneWiseFilter. The smallest step is 1% (0.01). A cutoff of 5% will remove the lowest 5% sites, given their score, on each gene, thus keeping the strongest 95%.

cutoff.globalFilter

numeric; defines the cutoff for which sites to keep, the smallest step is 1% (0.01) in function pureClipGlobalFilter

est.bsResolution

character; level of resolution of the binding site width in function estimateBsWidth

est.geneResolution

character; level of resolution of the gene-wise filtering in function estimateBsWidth

est.maxBsWidth

numeric; the largest binding site width which should considered in the testing

est.minimumStepGain

numeric; the minimum additional gain in the score in percent the next binding site width has to have, to be selected as best option

est.maxSites

numeric; maximum number of PureCLIP sites that are used

est.subsetChromosome

character; define on which chromosome the estimation should be done in function estimateBsWidth

est.minWidth

the minimum size of regions that are subjected to the iterative merging routine, after the initial region concatenation.

est.offset

constant added to the flanking count in the signal-to-flank ratio calculation to avoid division by Zero

est.sensitive

logical; whether to enable sensitive pre-filtering before binding site merging or not

est.sensitive.size

numeric; the size (in nucleotides) of the merged sensitive region

est.sensitive.minWidth

numeric; the minimum size (in nucleoties) of the merged sensitive region

merge.minWidth

the minimum size of regions that are subjected to the iterative merging routine, after the initial region concatenation.

merge.minCrosslinks

the minimal number of positions to overlap with at least one crosslink event in the final binding sites

merge.minClSites

the minimal number of crosslink sites that have to overlap a final binding site

merge.CenterIsClSite

logical, whether the center of a final binding site must be covered by an initial crosslink site

merge.CenterIsSummit

logical, whether the center of a final binding site must exhibit the highest number of crosslink events

repro.cutoff

numeric; percentage cutoff to be used for the reproducibility quantile filtering

repro.nReps

numeric; number of replicates that must meet the cutoff defined in repro.cutoff for a binding site to be called reproducible. Defaults to N-1.

repro.minCrosslinks

numeric; minimal number of crosslinks a binding site needs to have to be called reproducible. Acts as a lower boundary for repro.cutoff. Defaults to 1.

overlaps.geneWiseFilter

character; how overlaps should be handled in pureClipGeneWiseFilter

overlaps.geneAssignment

character; how overlaps should be handled in assignToGenes

overlaps.rule.geneAssignment

character vector; a vector of gene types that should be used to handle overlaps if option 'hierarchy' is selected for assignToGenes. The order of the vector is the order of the hierarchy.

overlaps.TranscriptRegions

character; how overlaps should be handled in assignToTranscriptRegions

overlaps.rule.TranscriptRegions

character vector; a vector of gene types that should be used to handle overlaps if option 'hierarchy' is selected for assignToTranscriptRegions. The order of the vector is the order of the hierarchy.

stf.flank

character; how the flanking region shoule be set. Options are 'bs', 'manual'

stf.flank.size

numeric; if flank='manual' provide the desired flanking size

match.score

character; meta column name of the crosslink site

match.geneID

character; meta column name of the genes

match.geneName

character; meta column name of the gene name

match.geneType

character; meta column name of the gene type

match.ranges.score

a GRanges object, with numeric column for the score to match in function annotateWithScore

match.option.score

character; meta column name of the crosslink site in function annotateWithScore

anno.annoDB

an object of class OrganismDbi that contains the gene annotation !!! Experimental !!!

anno.genes

an object of class GenomicRanges that represents the gene ranges directly

anno.transcriptRegionList

an object of class CompressedGRangesList that holds an ranges for each transcript region

quiet

logical; whether to print messages

veryQuiet

logical; whether to suppress all messages

...

additional arguments passed to estimateBsWidth, makeBindingSites and reproducibilityFilter

Details

If only the annotation is provided (anno.genes and anno.transcriptRegionList), then binding sites size (bsSize) and gene-wise cutoff (cutoff.geneWiseFilter) are estimated using estimateBsWidth. To avoid this behavior one has to provide input values for the arguments bsSize and cutoff.geneWiseFilter.

If no binding site size is provided through bsSize, then estimateBsWidth is called to estimate the optimal size for the given data-set. The result of this estimation can be looked at with estimateBsWidthPlot and arguments can be adjusted if needed.

Use the processingStepsFlowChart function to get an overview of all steps carried out by the function.

For complete details on each step, see the manual pages of the respective functions. The BSFind function returns a BSFDataSet with ranges merged into binding sites. A full flowchart for the entire process can be visualized with processingStepsFlowChart. For each of the individual steps dedicated diagnostic plots exists. Further information can be found in our Bioconductor vignette: https://www.bioconductor.org/packages/release/bioc/html/BindingSiteFinder.html

Value

an object of class BSFDataSet with ranges merged into binding sites given the inputs.

See Also

BSFDataSet, estimateBsWidth, pureClipGlobalFilter, pureClipGeneWiseFilter, assignToGenes, assignToTranscriptRegions, annotateWithScore, reproducibilityFilter, calculateSignalToFlankScore, processingStepsFlowChart

Examples

# load clip data
files <- system.file("extdata", package="BindingSiteFinder")
load(list.files(files, pattern = ".rda$", full.names = TRUE))
# Load genes
load(list.files(files, pattern = ".rds$", full.names = TRUE)[1])
# load transcript regions
load(list.files(files, pattern = ".rds$", full.names = TRUE)[2])
BSFind(object = bds, bsSize = 9, anno.genes = gns,
 anno.transcriptRegionList = regions, est.subsetChromosome = "chr22")


ZarnackGroup/BindingSiteFinder documentation built on Nov. 24, 2024, 10:41 a.m.