pureClipGeneWiseFilter: Filter PureCLIP sites by their score distribution per gene

View source: R/workflow.R

pureClipGeneWiseFilter R Documentation

Filter PureCLIP sites by their score distribution per gene

Description

Applies a filter to the crosslink site score distribution at the gene level, keeping the sites with the strongest signal on each gene. Because scores are tied to the expression level of the host transcript, filtering per gene treats all genes fairly and is partially independent of their expression level.

Usage

pureClipGeneWiseFilter(
  object,
  cutoff = 0.05,
  overlaps = c("keepSingle", "removeAll", "keepAll"),
  anno.annoDB = NULL,
  anno.genes = NULL,
  match.score = "score",
  match.geneID = "gene_id",
  quiet = FALSE
)

Arguments

object

a BSFDataSet object with stored crosslink ranges of width=1

cutoff

numeric; the fraction of sites to remove per gene. The smallest step is 1% (0.01). A cutoff of 5% (0.05) removes the lowest-scoring 5% of sites on each gene and keeps the strongest 95%.

overlaps

character; how overlapping gene loci should be handled. One of 'keepSingle', 'removeAll' or 'keepAll'.

anno.annoDB

an object of class OrganismDbi that contains the gene annotation (experimental).

anno.genes

an object of class GenomicRanges that represents the gene ranges directly

match.score

character; meta column name of the crosslink site GenomicRanges object that holds the score which is used for sub-setting

match.geneID

character; meta column name of the genes GenomicRanges object that holds a unique geneID

quiet

logical; whether to suppress messages

Details

The GenomicRanges contained in the BSFDataSet need to have a meta-column that holds a numeric score value, which is used for filtering. The name of the column can be set with match.score.
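
The gene-wise cutoff can be illustrated with a toy base-R sketch. It is not the package's implementation: the data frame, the helper name geneWiseFilterSketch and the use of a per-gene quantile as threshold are all assumptions made for illustration. It only shows why a per-gene threshold removes weak sites on every gene, even when one gene has much higher scores overall.

```r
# Toy data (hypothetical): ten sites per gene, geneB scores are 100x higher.
sites <- data.frame(
  gene_id = rep(c("geneA", "geneB"), each = 10),
  score   = c(1:10, seq(100, 1000, by = 100))
)

# Hypothetical helper, assuming the threshold is the per-gene `cutoff` quantile.
geneWiseFilterSketch <- function(df, cutoff = 0.05) {
  # ave() computes the quantile within each gene and repeats it for each row
  threshold <- ave(df$score, df$gene_id,
                   FUN = function(s) quantile(s, probs = cutoff, names = FALSE))
  # keep sites whose score reaches the threshold of their own gene
  df[df$score >= threshold, , drop = FALSE]
}

filtered <- geneWiseFilterSketch(sites, cutoff = 0.05)
table(filtered$gene_id)
```

In this toy case each gene loses its single weakest site. A single global threshold on the pooled scores would instead remove sites only from the weakly scoring geneA.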

When gene annotations overlap, a single crosslink site can be attributed to multiple genes. The overlaps parameter controls how these cases are handled: 'keepSingle' keeps only a single instance of the site, 'removeAll' removes all instances of the site, and 'keepAll' keeps all instances.
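
The three overlaps options can be sketched on a toy table of site-to-gene assignments. Again this is only an illustration in base R: the table, the helper name resolveOverlapsSketch and the choice to keep the first instance under 'keepSingle' are assumptions, since the text above does not state which instance the function keeps.

```r
# Hypothetical assignments; site "s2" lies in two overlapping genes.
hits <- data.frame(
  site_id = c("s1", "s2", "s2", "s3"),
  gene_id = c("geneA", "geneA", "geneB", "geneB"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the three documented options.
resolveOverlapsSketch <- function(df,
                                  overlaps = c("keepSingle", "removeAll", "keepAll")) {
  overlaps <- match.arg(overlaps)
  # TRUE for every row whose site is assigned to more than one gene
  multi <- df$site_id %in% df$site_id[duplicated(df$site_id)]
  switch(overlaps,
    keepSingle = df[!duplicated(df$site_id), , drop = FALSE], # first instance (assumption)
    removeAll  = df[!multi, , drop = FALSE],                  # drop ambiguous sites entirely
    keepAll    = df                                           # keep every instance
  )
}

nrow(resolveOverlapsSketch(hits, "keepSingle"))
nrow(resolveOverlapsSketch(hits, "removeAll"))
nrow(resolveOverlapsSketch(hits, "keepAll"))
```

With the toy table, 'keepSingle' leaves three rows, 'removeAll' two and 'keepAll' four.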

The function is part of the standard workflow performed by BSFind.

Value

an object of class BSFDataSet whose ranges are restricted to those that passed the gene-wise threshold set with cutoff

See Also

BSFind, estimateBsWidthPlot

Examples

# load clip data
files <- system.file("extdata", package="BindingSiteFinder")
load(list.files(files, pattern = ".rda$", full.names = TRUE))
# Load GRanges with genes
load(list.files(files, pattern = ".rds$", full.names = TRUE)[1])
# apply 5% gene-wise filter
pureClipGeneWiseFilter(object = bds, anno.genes = gns, cutoff = 0.05, overlaps = "keepSingle")


ZarnackGroup/BindingSiteFinder documentation built on May 31, 2024, 3:29 a.m.