processData: Pre-process the count matrix


View source: R/processData.R

Description

Estimate size factors using DESeq2, then pre-filter low-count and low-variability genes using custom thresholds on the median normalized count and on the median absolute deviation (MAD).

Usage

processData(
  y,
  geoMeans = NULL,
  estimateSFtype = "ratio",
  med_filt = TRUE,
  MAD_filt = TRUE,
  med_thresh = 100,
  MAD_quant_thresh = 50
)

Arguments

y

integer, gene expression count matrix (output from simulateData)

geoMeans

(optional) numeric vector of length g (the number of rows of y): custom geometric means of gene counts. Used to estimate size factors for a prediction set from the geometric means of the training set.

estimateSFtype

string, passed to the "type" argument of estimateSizeFactors. Must be 'ratio', 'poscounts', or 'iterate'. See the DESeq2 vignette.

med_filt

logical, TRUE to filter low-count genes via median threshold

MAD_filt

logical, TRUE to filter low-variability genes via MAD quantile threshold

med_thresh

numeric, median threshold for pre-filtering low-count genes (default 100, i.e. pre-filters genes with median normalized count below 100)

MAD_quant_thresh

numeric value between 0 and 100, percentile threshold imposed on MAD values for pre-filtering low-variability genes (default 50, i.e. pre-filters genes below the 50th percentile of MAD values)
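
To make the role of geoMeans more concrete, the following is a minimal base-R sketch of the median-of-ratios idea behind estimateSFtype = "ratio". It is a simplified illustration, not the DESeq2 or FSCseq source: the helper name size_factors_ratio and the toy matrices are hypothetical, and DESeq2's own implementation includes additional handling that is omitted here.

```r
# Hypothetical helper (not part of FSCseq or DESeq2): a simplified
# median-of-ratios size factor estimate for a genes x samples count matrix.
size_factors_ratio <- function(y, geoMeans = NULL) {
  # Log geometric mean per gene; genes with a zero count get -Inf
  log_geo <- if (is.null(geoMeans)) rowMeans(log(y)) else log(geoMeans)
  apply(y, 2, function(cnts) {
    # Use only genes with a finite log geometric mean and a positive count
    ok <- is.finite(log_geo) & cnts > 0
    exp(median(log(cnts[ok]) - log_geo[ok]))
  })
}

# Toy training set: sample 2 is sequenced 2x as deeply as sample 1, sample 3 3x
train <- rbind(c(10, 20, 30),
               c(100, 200, 300),
               c(5, 10, 15))
sf_train <- size_factors_ratio(train)

# Toy prediction set: one new sample, scaled against the training geometric means
train_geo <- exp(rowMeans(log(train)))
new_sample <- matrix(c(20, 200, 10), ncol = 1)
sf_new <- size_factors_ratio(new_sample, geoMeans = train_geo)
```

Because the new sample has the same counts as the second training sample, reusing the training geometric means gives it the same size factor, which is the point of passing geoMeans for a prediction set.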

Value

list containing the following objects:

dds: DESeq2 output

size_factors: numeric vector, estimated size factors, from DESeq2

norm_y: numeric normalized count matrix, corrected for differences in sequencing depth, from DESeq2

idx: logical vector, TRUE for inclusion after pre-filtering low-count and low-variable genes

row_medians: numeric vector, median normalized count for each gene

row_MADs: numeric vector, median absolute deviation (MAD) value of log(norm_y+0.1) for each gene
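
The relationship between idx, row_medians, and row_MADs can be sketched in base R as follows. This is an illustration under stated assumptions, not the FSCseq source: the helper name prefilter_genes is hypothetical, and the sketch assumes that genes at or above each threshold are kept, that the MAD percentile is taken over all genes, and that stats::mad is used with its default scaling constant.

```r
# Hypothetical helper (not the FSCseq source): median and MAD pre-filtering
# of a normalized genes x samples count matrix.
prefilter_genes <- function(norm_y, med_thresh = 100, MAD_quant_thresh = 50) {
  # Median normalized count per gene
  row_medians <- apply(norm_y, 1, median)
  # MAD per gene on the log scale, with a 0.1 offset to avoid log(0)
  row_MADs <- apply(log(norm_y + 0.1), 1, mad)
  # Percentile cutoff on the MAD values (assumed to be taken over all genes)
  mad_cutoff <- quantile(row_MADs, MAD_quant_thresh / 100, names = FALSE)
  # Assumption: keep genes at or above both thresholds
  idx <- row_medians >= med_thresh & row_MADs >= mad_cutoff
  list(idx = idx, row_medians = row_medians, row_MADs = row_MADs)
}

# Toy normalized counts: 4 genes (rows) x 5 samples (columns)
norm_y <- rbind(c(1, 2, 3, 4, 5),              # low count
                c(200, 200, 200, 200, 200),    # no variability
                c(100, 150, 200, 250, 300),    # moderate variability
                c(50, 500, 1000, 2000, 4000))  # high count, high variability
res <- prefilter_genes(norm_y)

# Subset the matrix to the genes retained by both filters
filtered <- norm_y[res$idx, , drop = FALSE]
```

Computing the MAD on log(norm_y + 0.1) rather than on the raw scale keeps high-count genes from dominating the variability ranking simply because their counts are larger.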

Author(s)

David K. Lim, deelim@live.unc.edu

References

https://github.com/DavidKLim/FSCseq

Examples

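# Simulate a dataset and keep the first element of the result
sim.dat <- simulateData(B=1, g=10000, K=2, n=50, LFCg=1, pDEg=0.05, beta0=12, phi0=0.35, nsims=1, save_file=FALSE)[[1]]
# Estimate size factors and pre-filter genes on the simulated count matrix
proc.dat <- processData(sim.dat$cts)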

DavidKLim/FSCseq documentation built on Dec. 12, 2021, 3:46 a.m.