processSeq: Process Sequencing Data for Poisson-based MRFs


Description

Process and normalize RNA-sequencing (RNA-seq) count data so that it follows a distribution appropriate for Poisson Markov random fields (MRFs).

Usage

processSeq(X, quanNorm = 0.75, nLowCount = 20, percentLowCount = 0.95, NumGenes = 500,
           PercentGenes = 0.1)

Arguments

X

An n x p data matrix of read counts (n samples in rows, p genes in columns).

quanNorm

Optional; the quantile used for sample normalization. Defaults to 0.75.

nLowCount

Read-count threshold used when filtering genes: counts below this value are considered low. Defaults to 20.

percentLowCount

Proportion of samples: a gene is filtered out if this proportion of its samples have counts below nLowCount. Defaults to 0.95.

NumGenes

Number of genes to retain in the final data set. Defaults to 500.

PercentGenes

Proportion of genes to retain (e.g., 0.1 keeps 10% of the genes). Defaults to 0.1.
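The filtering parameters above can be sketched in R as follows. This is a minimal illustration, not XMRF's code: the helper filter_genes_sketch and its argument nKeep are hypothetical, and the sketch assumes that a gene is dropped when the proportion of its samples below nLowCount is at least percentLowCount, and that the variance filter keeps the nKeep most variable genes, where nKeep would be derived from NumGenes or PercentGenes.

```r
# Hypothetical sketch, NOT the XMRF implementation of processSeq.
# X is an n x p count matrix with samples in rows and genes in columns.
filter_genes_sketch <- function(X, nLowCount = 20, percentLowCount = 0.95,
                                nKeep = NULL) {
  # Proportion of samples in which each gene has a count below nLowCount
  lowFrac <- colMeans(X < nLowCount)
  # Assumption: drop a gene when that proportion is at least percentLowCount
  X <- X[, lowFrac < percentLowCount, drop = FALSE]
  if (!is.null(nKeep)) {
    # Keep the nKeep genes with the largest sample variance
    v <- apply(X, 2, var)
    top <- order(v, decreasing = TRUE)[seq_len(min(nKeep, ncol(X)))]
    X <- X[, top, drop = FALSE]
  }
  X
}

# Toy data: 4 samples x 3 genes
X <- cbind(g1 = c(0, 1, 2, 100),     # low (< 20) in 3 of 4 samples
           g2 = c(50, 60, 70, 80),   # never low, small variance
           g3 = c(30, 300, 25, 500)) # never low, large variance

# nKeep could be NumGenes, or ceiling(PercentGenes * ncol(X))
filter_genes_sketch(X, nLowCount = 20, percentLowCount = 0.5)             # keeps g2, g3
filter_genes_sketch(X, nLowCount = 20, percentLowCount = 0.5, nKeep = 1)  # keeps g3
```

Because this page does not say how NumGenes and PercentGenes interact when both are given, the sketch takes a single gene count, nKeep, and leaves that choice to the caller.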

Details

To bring next-generation sequencing count data closer to a Poisson distribution (with overdispersion removed), this function takes the following steps:

  1. Quantile normalization of the samples (controlled by quanNorm).

  2. Filter out genes whose counts are low in most samples (controlled by nLowCount and percentLowCount).

  3. Retain the genes with the largest variance, as specified by NumGenes or PercentGenes.

  4. Transform the data to be closer to a Poisson distribution. Both a log and a power transform are considered, and the one used is selected with a Kolmogorov-Smirnov goodness-of-fit test.
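Steps 1 and 4 can be sketched as follows. This is a hypothetical illustration under stated assumptions, not the processSeq source: quantile_normalize assumes the normalization scales each sample (row) by its quanNorm-quantile, and select_transform compares log(x + 1) with a few power transforms x^a (the powers are arbitrary choices here), rounds the results, and keeps the transform with the smallest average Kolmogorov-Smirnov distance to a Poisson distribution with the same mean, computed gene by gene.

```r
# Hypothetical sketch, NOT the XMRF implementation of processSeq.

# Step 1 (assumed form): divide each sample (row) by its quanNorm-quantile,
# then rescale by the mean of those quantiles so counts keep their scale.
quantile_normalize <- function(X, quanNorm = 0.75) {
  q <- apply(X, 1, quantile, probs = quanNorm)  # one quantile per sample
  sweep(X, 1, q, "/") * mean(q)
}

# Kolmogorov-Smirnov distance between the empirical CDF of rounded,
# non-negative data and a Poisson CDF with the same mean.
ks_pois_distance <- function(x) {
  x <- round(x)
  k <- 0:max(x)                       # both CDFs only jump at integers
  max(abs(ecdf(x)(k) - ppois(k, lambda = mean(x))))
}

# Step 4 (assumed form): score log(x + 1) and a few power transforms x^a,
# averaging the per-gene (per-column) KS distance, and keep the best one.
select_transform <- function(X, powers = c(0.25, 0.5, 0.75)) {
  power_fns <- lapply(powers, function(a) { force(a); function(x) x^a })
  candidates <- c(list(log = function(x) log(x + 1)),
                  setNames(power_fns, paste0("power_", powers)))
  scores <- vapply(candidates, function(f) {
    mean(apply(f(X), 2, ks_pois_distance))
  }, numeric(1))
  best <- names(which.min(scores))
  list(name = best, scores = scores, X = round(candidates[[best]](X)))
}

# Toy data: 4 samples (rows) x 3 genes (columns)
X <- matrix(c(10, 20, 30, 40,
              5, 15, 25, 35,
              100, 200, 300, 400), nrow = 4)
Xn <- quantile_normalize(X, quanNorm = 0.75)
res <- select_transform(Xn)
print(res$scores)
print(res$name)
```

The distance is computed directly from the empirical and Poisson CDFs rather than with ks.test(), whose classical p-values assume a continuous distribution; only the size of the discrepancy is needed to rank the candidate transforms.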

Value

An n x p' matrix of processed data, where p' is the number of genes retained (determined by NumGenes or PercentGenes).

Examples

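library(XMRF)
data('brcadat')
# processSeq takes an n x p (samples x genes) matrix, hence t(brcadat);
# the result is transposed back. PercentGenes = 1 sets the proportion of genes to retain to 100%.
brca = t(processSeq(t(brcadat), PercentGenes=1))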


XMRF documentation built on May 2, 2019, 8:18 a.m.