sp_weights: Non-parametric local heteroscedasticity weights


View source: R/sp_weights.R

Description

Computes precision weights that account for heteroscedasticity in RNA-seq count data based on non-parametric local linear regression estimates.
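
The idea behind such weights can be sketched in base R. This is an illustration only, not the package's implementation: the simulated counts, the voom-style log2 counts-per-million offsets (0.5 and 1), and the use of loess with degree = 1 as the local linear smoother are all assumptions made for the example.

```r
set.seed(123)
G <- 500   # number of genes (hypothetical)
n <- 12    # number of samples (hypothetical)
counts <- matrix(rnbinom(G * n, size = 2, mu = 200), nrow = G, ncol = n)

# Assumed normalization: log2 counts per million with small offsets
lib_size <- colSums(counts)
lcpm <- log2(t((t(counts) + 0.5) / (lib_size + 1)) * 1e6)

# Per-gene mean and variance of the normalized expression
gene_mean <- rowMeans(lcpm)
gene_var <- apply(lcpm, 1, var)

# Local linear fit (degree = 1) of the variance against the mean
fit <- loess(gene_var ~ gene_mean, degree = 1)
smoothed_var <- pmax(predict(fit), .Machine$double.eps)

# Precision weights are the inverse of the smoothed variance
w_gene <- 1 / smoothed_var
summary(w_gene)
```

Here one weight is produced per gene; observation-level weights would instead evaluate the fitted mean-variance curve at each observation's own fitted mean.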

Usage

sp_weights(
  y,
  x,
  phi,
  use_phi = TRUE,
  preprocessed = FALSE,
  doPlot = FALSE,
  gene_based = FALSE,
  bw = c("nrd", "ucv", "SJ", "nrd0", "bcv"),
  kernel = c("gaussian", "epanechnikov", "rectangular", "triangular", "biweight",
    "tricube", "cosine", "optcosine"),
  exact = FALSE,
  transform = TRUE,
  verbose = TRUE,
  na.rm = FALSE
)

Arguments

y

a numeric matrix of size G x n containing the raw RNA-seq counts or preprocessed expression from n samples for G genes.

x

a numeric matrix of size n x p containing the model covariate(s) from n samples (design matrix).

phi

a numeric design matrix of size n x K containing the K variable(s) of interest (e.g. bases of time).

use_phi

a logical flag indicating whether conditional means should be conditioned on phi and on covariate(s) x, or on x alone. Default is TRUE in which case conditional means are estimated conditionally on both x and phi.

preprocessed

a logical flag indicating whether the expression data have already been preprocessed (e.g. log2 transformed). Default is FALSE, in which case y is assumed to contain raw counts and is normalized into log-counts per million (log-cpm).

doPlot

a logical flag indicating whether the mean-variance plot should be drawn. Default is FALSE.

gene_based

a logical flag indicating whether to estimate weights at the gene level. Default is FALSE, in which case weights are estimated at the observation level.

bw

a character string indicating the smoothing bandwidth selection method to use. See bandwidth for details. Possible values are "ucv", "SJ", "bcv", "nrd" or "nrd0". Default is "nrd".

kernel

a character string indicating which kernel should be used. Possibilities are "gaussian", "epanechnikov", "rectangular", "triangular", "biweight", "tricube", "cosine", "optcosine". Default is "gaussian" (NB: "tricube" kernel corresponds to the loess method).

exact

a logical flag indicating whether the non-parametric weights accounting for the mean-variance relationship should be computed exactly, or approximated by interpolating the local regression of the variance on the mean. Default is FALSE, which uses interpolation (faster).

transform

a logical flag indicating whether values should be transformed to uniform for the purpose of local linear smoothing. This may be helpful if tail observations are sparse and the specified bandwidth gives suboptimal performance there. Default is TRUE.

verbose

a logical flag indicating whether informative messages are printed during the computation. Default is TRUE.

na.rm

logical: should missing values (including NA and NaN) be omitted from the calculations? Default is FALSE.
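
The bw values appear to correspond to the bandwidth selectors documented under bandwidth in the stats package (bw.nrd, bw.nrd0, bw.ucv, bw.bcv and bw.SJ). The sketch below compares them on a simulated vector; the vector m is a hypothetical stand-in for mean expression values, and how sp_weights calls these selectors internally is not shown here.

```r
set.seed(1)
m <- rnorm(200, mean = 5, sd = 2)  # hypothetical mean expression values

# Bandwidth selectors from the stats package, named after the bw options
selectors <- list(
  nrd  = stats::bw.nrd,
  nrd0 = stats::bw.nrd0,
  ucv  = stats::bw.ucv,
  bcv  = stats::bw.bcv,
  SJ   = stats::bw.SJ
)

# The cross-validation selectors may warn when the minimum lies at a boundary
bws <- suppressWarnings(vapply(selectors, function(f) f(m), numeric(1)))
print(round(bws, 3))
```

The rule-of-thumb selectors (nrd, nrd0) are cheap to compute, while the cross-validation and plug-in selectors (ucv, bcv, SJ) adapt more closely to the data at a higher cost.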

Value

an n x G matrix containing the computed precision weights.

See Also

bandwidth, density

Examples

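# rm(list = ls())
set.seed(123)

G <- 10000  # number of genes
n <- 12     # number of samples
p <- 2      # number of covariates

# G x n matrix of simulated negative binomial counts (one column per sample)
y <- sapply(1:n, FUN = function(i){rnbinom(n = G, size = 0.07, mu = 200)})

# n x p matrix of simulated covariates
x <- sapply(1:p, FUN = function(j){rnorm(n = n, mean = n, sd = 1)})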
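
A possible way to call the function on simulated data of this kind is sketched below. It follows the signature given under Usage; the smaller G, the phi matrix (a hypothetical three-level time variable) and the choice use_phi = FALSE are assumptions made for illustration, and the call only runs if the tcgsaseq package is installed.

```r
set.seed(123)
G <- 1000  # fewer genes than above, to keep the sketch fast
n <- 12
p <- 2

# G x n count matrix and n x p covariate matrix
y <- matrix(rnbinom(G * n, size = 0.07, mu = 200), nrow = G, ncol = n)
x <- matrix(rnorm(n * p, mean = n, sd = 1), nrow = n, ncol = p)

# Hypothetical variable of interest: three time points, four samples each
phi <- matrix(rep(1:3, each = 4), ncol = 1)

my_w <- NULL
if (requireNamespace("tcgsaseq", quietly = TRUE)) {
  # Condition the means on x alone and drop missing values
  my_w <- tcgsaseq::sp_weights(y, x, phi, use_phi = FALSE,
                               na.rm = TRUE, verbose = FALSE)
  str(my_w)
}
```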

tcgsaseq documentation built on Sept. 13, 2020, 5:13 p.m.