normQC: Normalized bin count QC metrics for normalization benchmark

View source: R/normQC.R

Description

Quality control statistics of bin counts. First, the counts in each bin are tested for a normal (or Poisson) distribution across samples. Then, the randomness of the sample ranks is tested. Finally, Z-scores are computed for a subset of the bins and their normality is tested. The second and third tests are the most important. Consistent rankings indicate a sample-specific technical bias, and hence reduced power to detect 'true' abnormal read counts. Non-normal Z-scores lead to an inappropriate fit of the null distribution.

Usage

normQC(bc.df, n.subset = 10000, win.size = 100, nb.cores = 1,
  plot = FALSE)

Arguments

bc.df

matrix with bin counts (bins x samples).

n.subset

number of bins to use for the analysis. Default is 10 000. Bins are selected randomly.

win.size

the number of consecutive bins in each window for the window-based analysis. Default is 100.

nb.cores

the number of cores to use. Default is 1.

plot

if TRUE, diagnostic graphs are produced. Default is FALSE.

Details

The Shapiro-Wilk test is used to test the normality of the bin counts across samples. The proportion of bins with a non-normal distribution is derived from the Pi0 estimate computed by the qvalue package. Pi0 is the estimated proportion of P-values that follow the null distribution, so the proportion of non-normal bins is 1 - Pi0.
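The Shapiro-Wilk/Pi0 procedure can be sketched in base R as follows. This is a minimal illustration, not the PopSV implementation: the helper names pi0_storey and prop_non_normal_bins are hypothetical, and Storey's estimator at a single lambda = 0.5 is an assumed stand-in for the smoothed Pi0 estimate the qvalue package uses by default.

```r
# Storey's pi0 at a single lambda (assumption): a simpler stand-in for the
# smoothed estimate that the qvalue package uses by default.
pi0_storey <- function(pvals, lambda = 0.5) {
  min(1, mean(pvals > lambda) / (1 - lambda))
}

# Hypothetical helper: proportion of randomly chosen bins whose counts are
# non-normal across samples, estimated as 1 - pi0 of the Shapiro-Wilk P-values.
prop_non_normal_bins <- function(bc, n.subset = 10000) {
  idx <- sample(nrow(bc), min(n.subset, nrow(bc)))
  pvals <- apply(bc[idx, , drop = FALSE], 1, function(counts) {
    # shapiro.test() errors on constant input, so skip such bins
    if (length(unique(counts)) < 2) return(NA_real_)
    shapiro.test(counts)$p.value
  })
  1 - pi0_storey(pvals[!is.na(pvals)])
}
```

On a bins x samples matrix whose rows are normal across samples, the P-values are roughly uniform, Pi0 is close to 1 and the returned proportion is close to 0.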

The goodness-of-fit test from the vcd package is used to test whether the bin counts follow a Poisson distribution. Again, the Pi0 estimate from the qvalue package is used to compute the proportion of bins that do not follow a Poisson distribution.
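A base-R sketch of the Poisson check is shown below. As an assumption, it swaps in the index-of-dispersion test for the vcd goodness-of-fit test: under a Poisson model the variance equals the mean, so sum((x - mean(x))^2) / mean(x) is approximately chi-squared with n - 1 degrees of freedom. The helper names are hypothetical, and Pi0 again uses the simplified single-lambda estimator.

```r
# Storey's pi0 at a single lambda (assumed stand-in for qvalue's estimate)
pi0_storey <- function(pvals, lambda = 0.5) {
  min(1, mean(pvals > lambda) / (1 - lambda))
}

# Index-of-dispersion test (swapped in for vcd's goodness-of-fit test)
dispersion_pvalue <- function(counts) {
  m <- mean(counts)
  if (m == 0) return(NA_real_)
  stat <- sum((counts - m)^2) / m
  df <- length(counts) - 1
  # Two-sided: flag both over- and under-dispersion
  2 * min(pchisq(stat, df), pchisq(stat, df, lower.tail = FALSE))
}

# Hypothetical helper: proportion of bins that are not Poisson across samples
prop_non_poisson_bins <- function(bc, n.subset = 10000) {
  idx <- sample(nrow(bc), min(n.subset, nrow(bc)))
  pvals <- apply(bc[idx, , drop = FALSE], 1, dispersion_pvalue)
  1 - pi0_storey(pvals[!is.na(pvals)])
}
```

Read counts from sequencing are usually over-dispersed, which this test flags as non-Poisson; the vcd test also compares the full count frequencies, so the two may disagree on individual bins.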

The randomness of the sample ranks in genomic windows is assessed by comparing the position of each sample relative to the median. If the ranks are random, this position should follow a binomial distribution. For each window, we report the number of samples that fail this assumption (Bonferroni-corrected P-value < 0.05). The reported estimate is the average across all analyzed windows, i.e. the average number of samples with non-random ranks.
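The window-based rank test can be sketched as follows. This is an illustration under stated assumptions, not PopSV's code: windows are taken as non-overlapping blocks of win.size consecutive bins, ties are assumed absent, and the helper name nonrandom_rank_samples is hypothetical. In each window, the number of bins where a sample lies above the bin median is compared with a binomial distribution using binom.test, and the Bonferroni correction is applied across samples.

```r
# Hypothetical helper, not PopSV's code: average number of samples per window
# whose position relative to the bin median is non-random.
nonrandom_rank_samples <- function(bc, win.size = 100, alpha = 0.05) {
  stopifnot(nrow(bc) >= win.size)
  n <- ncol(bc)
  # Under random ranks and no ties: P(sample strictly above the bin median)
  p0 <- floor(n / 2) / n
  # Non-overlapping windows of win.size consecutive bins (assumption)
  starts <- seq(1, nrow(bc) - win.size + 1, by = win.size)
  n.fail <- vapply(starts, function(s) {
    win <- bc[s:(s + win.size - 1), , drop = FALSE]
    # Compare each row with its own median (the vector recycles down columns)
    above <- win > apply(win, 1, median)
    n.above <- colSums(above)
    pvals <- vapply(n.above, function(k) binom.test(k, win.size, p0)$p.value,
                    numeric(1))
    as.numeric(sum(p.adjust(pvals, method = "bonferroni") < alpha))
  }, numeric(1))
  mean(n.fail)
}
```

A sample with a constant technical bias sits above the median in almost every bin of a window, so it is counted in nearly every window and pushes the average towards 1 or more.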

The normality of the Z-scores is assessed by comparing their density with that of a fitted normal distribution. The estimate is the proportion of the area under the Z-score density curve that does not overlap the fitted normal curve.
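The area-based normality estimate can be sketched in base R. Two assumptions: the Z-scores are computed per bin across samples as (count - bin mean) / bin standard deviation, which may differ from how PopSV derives them, and the normal curve is fitted by the mean and standard deviation of the Z-scores. The helper names are hypothetical. The kernel density from density() and the fitted dnorm() curve are evaluated on the same equally spaced grid, and the area where the Z-score density exceeds the normal curve is divided by the total Z-score area.

```r
# Hypothetical helper (assumption): per-bin Z-scores across samples
z_scores <- function(bc) {
  mu <- rowMeans(bc)
  sigma <- apply(bc, 1, sd)
  (bc - mu) / sigma  # length-nrow vectors recycle down each column
}

# Share of the area under the Z-score density that is not shared with a
# normal density fitted by mean and standard deviation
non_normal_area <- function(z) {
  d <- density(z)
  fitted <- dnorm(d$x, mean = mean(z), sd = sd(z))
  # Equally spaced grid, so the step size cancels in the ratio
  sum(pmax(d$y - fitted, 0)) / sum(d$y)
}

# One value per sample; its mean and max mirror prop.non.norm.z.mean/.max
prop_non_norm_z <- function(bc) {
  z <- z_scores(bc)
  apply(z, 2, function(zs) non_normal_area(zs[is.finite(zs)]))
}
```

Because both curves integrate to about 1, this unique area equals half the total absolute difference between them, so it ranges from 0 (identical curves) to 1 (no overlap).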

Value

a list with

prop.non.normal.bin

proportion of bins with non-normal distribution across samples.

prop.nonRand.rank

proportion of bins with non-random ranks.

prop.non.norm.z.mean

average (across samples) proportion of bins with non-normal Z-scores.

prop.non.norm.z.max

maximum (i.e. for the worst sample) proportion of bins with non-normal Z-scores.

prop.non.norm.z

proportion of bins with non-normal Z-scores, for each sample.

z.worst.dens

a data.frame with the density of the worst Z distribution.

n.subset

number of bins used for the analysis.

Author(s)

Jean Monlong


jmonlong/PopSV documentation built on Sept. 15, 2019, 9:29 p.m.