filterData: Filter the positions of interest
In lcolladotor/derfinder: Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach

Description

For a group of samples this function reads the coverage information for a specific chromosome directly from the BAM files. It then merges them into a DataFrame and removes the bases that do not pass the cutoff. This is a helper function for loadCoverage and preprocessCoverage.

Usage

filterData(
  data,
  cutoff = NULL,
  index = NULL,
  filter = "one",
  totalMapped = NULL,
  targetSize = 8e+07,
  ...
)

Arguments

`data`	Either a list of Rle objects or a DataFrame with the coverage information.
`cutoff`	The base-pair level cutoff to use. Its behavior is controlled by `filter`.
`index`	A logical Rle with the positions of the chromosome that passed the cutoff. If `NULL` it is assumed that this is the first time using filterData and thus no previous index exists.
`filter`	Has to be either `'one'` (default) or `'mean'`. In the first case, at least one sample has to have coverage above `cutoff`. In the second case, the mean coverage has to be greater than `cutoff`.
`totalMapped`	A vector with the total number of reads mapped for each sample. The vector should be in the same order as the samples in `data`. When provided, the coverage is adjusted to a library of `targetSize` reads prior to filtering. See getTotalMapped for calculating this vector.
`targetSize`	The target library size to adjust the coverage to. Used only when `totalMapped` is specified. By default, it adjusts to libraries with 80 million reads.
`...`	Arguments passed to other methods and/or advanced arguments. Advanced arguments:
verbose: If `TRUE`, basic status updates will be printed along the way.
returnMean: If `TRUE`, the mean coverage is included in the result. `FALSE` by default.
returnCoverage: If `TRUE`, the coverage DataFrame is returned. `TRUE` by default.
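The two `filter` rules and the `totalMapped` adjustment can be illustrated on a plain matrix. This is a minimal base R sketch, not derfinder's implementation: the names `cov`, `pass_one`, `pass_mean` and `adjusted` are hypothetical, the toy values are made up, and the scaling `coverage * targetSize / totalMapped` is an assumption about how coverage is adjusted to a library of `targetSize` reads.

```r
## Toy coverage: 5 bases (rows) x 3 samples (columns); values are made up
cov <- matrix(
    c(0, 2, 6, 9, 1,
      0, 3, 1, 8, 1,
      1, 7, 2, 7, 1),
    ncol = 3, dimnames = list(NULL, c("x", "y", "z"))
)
cutoff <- 5

## filter = "one": keep a base if at least one sample is above the cutoff
pass_one <- rowSums(cov > cutoff) >= 1

## filter = "mean": keep a base if the mean coverage is above the cutoff
pass_mean <- rowMeans(cov) > cutoff

## Assumed library-size adjustment: scale each sample to targetSize reads
totalMapped <- c(x = 4e7, y = 8e7, z = 1.6e8)
targetSize <- 8e7
adjusted <- sweep(cov, 2, targetSize / totalMapped, `*`)

which(pass_one)
which(pass_mean)
```

In this toy case `'one'` retains bases 2 to 4, while `'mean'` retains only base 4, so `'mean'` is the stricter rule whenever the samples disagree.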

Details

If cutoff is NULL then the data is grouped into a DataFrame without applying any cutoffs. This can be useful if you want to use loadCoverage to build the coverage DataFrame without filtering, for downstream purposes such as plotting the coverage values of a given region. You can always specify the colsubset argument in preprocessCoverage to filter the data before calculating the F statistics.
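A short sketch of the difference between `cutoff = NULL` and a numeric cutoff, assuming the derfinder and IRanges packages are installed. The toy data, the seed and the variable names are illustrative only.

```r
## Assumes the derfinder and IRanges packages are installed
library("derfinder")
library("IRanges")

## Toy coverage for two samples over 1000 bases
set.seed(20240101)
x <- Rle(round(runif(1000, max = 10)))
y <- Rle(round(runif(1000, max = 10)))
DF <- DataFrame(x, y)

## No cutoff: the coverage is returned for every base
unfiltered <- filterData(DF, cutoff = NULL)
nrow(unfiltered$coverage)

## With a cutoff: only the bases that pass it are kept
filtered <- filterData(DF, cutoff = 5)
nrow(filtered$coverage)
```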

Value

A list with up to three components.

coverage: is a DataFrame object where each column represents a sample. The number of rows depends on the number of base pairs that passed the cutoff and the information stored is the coverage at that given base. Included only when returnCoverage = TRUE.
position: is a logical Rle with the positions of the chromosome that passed the cutoff.
meanCoverage: is a numeric Rle with the mean coverage at each base. Included only when returnMean = TRUE.
Two further advanced arguments can be passed through `...`; they are options, not components of the returned list:

colnames: Specifies the column names to be used for the results DataFrame. If NULL, names from data are used.
smoothMean: Whether to smooth the mean. Used only when filter = 'mean'. This option is used internally by regionMatrix.

Arguments in `...` are also passed to the internal function .smootherFstats; see findRegions.
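The position component links the rows of coverage back to chromosome coordinates, and it is what `index` expects on a later call. The bookkeeping can be sketched in base R with plain logical vectors standing in for Rle objects. The variable names are hypothetical, and the sketch assumes the updated index stays relative to the full chromosome; it is not derfinder's internal code.

```r
## First pass: 8 bases, TRUE where the base passed the cutoff
position <- c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)

## Row i of the filtered coverage corresponds to chromosome base coords[i]
coords <- which(position)

## Second pass over the 5 retained rows: which of them pass a new filter
pass_again <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

## Update the chromosome-length index so that it reflects both passes
position2 <- position
position2[position] <- pass_again
which(position2)
```

Keeping the index at chromosome length means that repeated filtering never loses track of the original base positions.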

Author(s)

Leonardo Collado-Torres

Examples

## Construct some toy data
library("derfinder")
library("IRanges")
x <- Rle(round(runif(1e4, max = 10)))
y <- Rle(round(runif(1e4, max = 10)))
z <- Rle(round(runif(1e4, max = 10)))
DF <- DataFrame(x, y, z)

## Filter the data
filt1 <- filterData(DF, 5)
filt1

## Filter again but only using the first two samples
filt2 <- filterData(filt1$coverage[, 1:2], 5, index = filt1$position)
filt2

lcolladotor/derfinder documentation built on Dec. 17, 2024, 4:53 p.m.