FiltDeepSignal: FiltDeepSignal Function (Filter)

View source: R/Filter.R

FiltDeepSignal    R Documentation

FiltDeepSignal Function (Filter)

Description

Filter out data from contigs or modifications that do not meet the selection criteria. Can also be used to obtain a gposDeepSignalMod object by simply keeping target sites which have a fraction above 0.

Usage

FiltDeepSignal(
  gposDeepSignalModBase = NULL,
  gposDeepSignalMod = NULL,
  cContigToBeRemoved = NULL,
  dnastringsetGenome,
  nContigMinSize = -1,
  listPctSeqByContig,
  nContigMinPctOfSeq = -1,
  listMeanCovByContig,
  nContigMinCoverage = -1,
  cParamNameForFilter = NULL,
  nFiltParamLoBoundaries = NULL,
  nFiltParamUpBoundaries = NULL,
  cFiltParamBoundariesToInclude = NULL,
  listMeanParamByContig = NULL,
  nContigFiltParamLoBound = NULL,
  nContigFiltParamUpBound = NULL,
  nModMinCoverage = NULL
)

Arguments

gposDeepSignalModBase

An UnStitched GPos object containing DeepSignal modification target sites data to be filtered. Defaults to NULL.

gposDeepSignalMod

An UnStitched GPos object containing DeepSignal modified sites data to be filtered. Defaults to NULL.

cContigToBeRemoved

Names of contigs for which the data will be removed. gposDeepSignalModBase must be provided if using this argument. Defaults to NULL.

dnastringsetGenome

A DNAStringSet object containing the sequence for each contig.

nContigMinSize

Minimum size for contigs to keep. Contigs with a size below this value will be removed. gposDeepSignalModBase must be provided if using this argument. Defaults to -1 (= no filter).

listPctSeqByContig

List containing, for each strand, the percentage of sequencing for each contig. This list must be composed of 2 dataframes (one per strand) called f_strand and r_strand. Each dataframe must contain a "refName" column giving the names of contigs and a "seqPct" column giving the percentage of sequencing. gposDeepSignalModBase must be provided if using this argument.

nContigMinPctOfSeq

Minimum percentage of sequencing for contigs to keep. Contigs with a percentage below this value will be removed. gposDeepSignalModBase must be provided if using this argument. Defaults to -1 (= no filter).

listMeanCovByContig

List containing, for each strand, the mean coverage for each contig. This list must be composed of 2 dataframes (one per strand) called f_strand and r_strand. Each dataframe must contain a "refName" column giving the names of contigs and a "mean_coverage" column giving the mean coverage. gposDeepSignalModBase must be provided if using this argument.

nContigMinCoverage

Minimum mean coverage for contigs to keep. Contigs with a mean coverage below this value will be removed. gposDeepSignalModBase must be provided if using this argument. Defaults to -1 (= no filter).

cParamNameForFilter

A character vector giving the name of the parameter to be filtered. Must correspond to the name of one column in the object provided with gposDeepSignalModBase or gposDeepSignalMod. Defaults to NULL.

nFiltParamLoBoundaries

A numeric vector giving the lower boundaries of the intervals. Must have the same length as "nFiltParamUpBoundaries". Defaults to NULL.

If this parameter is provided, the function will remove modifications whose value for the given parameter falls outside the intervals defined by "nFiltParamLoBoundaries" and "nFiltParamUpBoundaries".

nFiltParamUpBoundaries

A numeric vector giving the upper boundaries of the intervals. Must have the same length as "nFiltParamLoBoundaries". Defaults to NULL.

If this parameter is provided, the function will remove modifications whose value for the given parameter falls outside the intervals defined by "nFiltParamLoBoundaries" and "nFiltParamUpBoundaries".

cFiltParamBoundariesToInclude

A character vector describing which interval boundaries must be included in the intervals provided. Can be "upperOnly" (only upper boundaries), "lowerOnly" (only lower boundaries), "both" (both upper and lower boundaries) or "none" (neither upper nor lower boundaries). If NULL, both upper and lower boundaries will be included (= "both") for all intervals. Defaults to NULL.

listMeanParamByContig

List containing, for each strand, the mean of a given parameter for each contig. This list must be composed of 2 dataframes (one per strand) called f_strand and r_strand. Each dataframe must contain a "refName" column giving the names of contigs and a "mean_"[parameter name] column giving the mean of the given parameter. If not NULL, contigs in the list provided whose mean falls outside the interval centered on the mean of the parameter across all contigs are removed. Defaults to NULL.

nContigFiltParamLoBound

A numeric value to be subtracted from the mean of the given parameter of all contigs (calculates the lower bound of the interval centered on the mean). Defaults to NULL.

nContigFiltParamUpBound

A numeric value to be added to the mean of the given parameter of all contigs (calculates the upper bound of the interval centered on the mean). Defaults to NULL.

nModMinCoverage

Minimum coverage for all Modifications to be kept. Modifications with a coverage below this value will be removed. Defaults to NULL (no filter).
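
The interval arguments above (nFiltParamLoBoundaries, nFiltParamUpBoundaries and cFiltParamBoundariesToInclude) can be easier to read with a small base R sketch. The helper in_any_interval below is hypothetical and is not part of DNAModAnnot. It only mirrors the four documented inclusion modes and, as a simplifying assumption, applies one mode to every interval. It is not the package's implementation.

```r
# Hypothetical helper (NOT a DNAModAnnot function): returns TRUE for each value
# that falls in at least one interval, using the documented inclusion modes.
in_any_interval <- function(x, lo, up, include = "both") {
  # Lower and upper boundaries must be paired, as the documentation requires.
  stopifnot(length(lo) == length(up))
  # match.arg() returns the first choice ("both") when include is NULL.
  include <- match.arg(include, c("both", "upperOnly", "lowerOnly", "none"))
  keep <- rep(FALSE, length(x))
  for (i in seq_along(lo)) {
    above_lo <- if (include %in% c("both", "lowerOnly")) x >= lo[i] else x > lo[i]
    below_up <- if (include %in% c("both", "upperOnly")) x <= up[i] else x < up[i]
    keep <- keep | (above_lo & below_up)
  }
  keep
}

# Toy fractions, invented for illustration.
frac <- c(0, 0.25, 1)
in_any_interval(frac, lo = 0, up = 1, include = "upperOnly")
# FALSE TRUE TRUE: 0 is excluded, 1 is included
```

With boundaries 0 and 1 and "upperOnly", a fraction of exactly 0 is dropped while a fraction of exactly 1 is kept, which matches the use described in the Description (keeping target sites with a fraction above 0).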

Examples

# Loading Nanopore data
myDeepSignalModPath <- system.file(
  package = "DNAModAnnot", "extdata",
  "FAB39088-288418386-Chr1.CpG.call_mods.frequency.tsv"
)
mygposDeepSignalModBase <- ImportDeepSignalModFrequency(
  cDeepSignalModPath = myDeepSignalModPath,
  lSortGPos = TRUE,
  cContigToBeAnalyzed = "all"
)
mygposDeepSignalModBase

# Filtering
mygposDeepSignalMod <- FiltDeepSignal(
  gposDeepSignalModBase = mygposDeepSignalModBase,
  cParamNameForFilter = "frac",
  nFiltParamLoBoundaries = 0,
  nFiltParamUpBoundaries = 1,
  cFiltParamBoundariesToInclude = "upperOnly"
)$Mod
mygposDeepSignalMod
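
The per-strand list arguments described in Arguments (for example listMeanCovByContig) share one layout: a list of two dataframes named f_strand and r_strand, each with a "refName" column. The sketch below builds such a list by hand in base R. The contig names and coverage values are invented, and the comparison at the end only illustrates what "below this value" means for each strand. How the function combines the two strands is not stated on this page and is not assumed here.

```r
# Toy input with the documented layout; all values are invented.
listMeanCovByContig <- list(
  f_strand = data.frame(
    refName = c("contig1", "contig2"),
    mean_coverage = c(35.2, 12.8)
  ),
  r_strand = data.frame(
    refName = c("contig1", "contig2"),
    mean_coverage = c(33.9, 25.1)
  )
)

# Example threshold, chosen for illustration (the documented default is -1).
nContigMinCoverage <- 20

# For each strand, list the contigs whose mean coverage is below the threshold.
low_by_strand <- lapply(listMeanCovByContig, function(df) {
  as.character(df$refName[df$mean_coverage < nContigMinCoverage])
})
low_by_strand
```

A list built this way could be passed as listMeanCovByContig together with nContigMinCoverage. listPctSeqByContig follows the same pattern with a "seqPct" column in place of "mean_coverage".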

AlexisHardy/DNAModAnnot documentation built on Feb. 27, 2023, 12:03 a.m.