QC: Perform quality control analysis for high-throughput...

QC    R Documentation

Perform quality control analysis for high-throughput screening data.

Description

This function performs comprehensive quality control analysis on high-throughput screening data to evaluate experimental design and data quality. It generates multiple diagnostic plots and calculates SSMD (Strictly Standardized Mean Difference) scores to assess the separation between positive and negative controls.

Usage

QC(countMat, negGene, posGene)

Arguments

countMat

A matrix of raw count data where rows represent genes/siRNAs and columns represent readouts/conditions. The matrix should have row names corresponding to gene/siRNA identifiers.

negGene

A data frame or matrix containing negative control gene/siRNA identifiers. The first column should contain gene/siRNA names that match the row names in countMat.

posGene

A data frame or matrix containing positive control gene/siRNA identifiers. The first column should contain gene/siRNA names that match the row names in countMat.
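Because the control identifiers must match the row names of countMat, it can help to check for mismatches before calling QC. The following is a minimal sketch of such a check in base R. The helper check_controls and the toy objects are hypothetical illustrations, not part of the package.

```r
# Hypothetical helper (not provided by the package): reports which control
# identifiers in the first column of `ctrl` are present in rownames(mat).
check_controls <- function(mat, ctrl) {
  ids <- as.character(ctrl[, 1])  # works for a data frame or a matrix
  list(
    matched = intersect(ids, rownames(mat)),
    missing = setdiff(ids, rownames(mat))
  )
}

# Toy inputs, invented for illustration only.
toy_mat <- matrix(
  1:6, nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("readout1", "readout2"))
)
toy_neg <- data.frame(id = c("geneA", "geneX"))

chk <- check_controls(toy_mat, toy_neg)
chk$matched  # identifiers found in the row names
chk$missing  # identifiers with no matching row
```

An empty missing element suggests that every control identifier has a corresponding row in the count matrix.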

Details

The function performs the following quality control analyses:

  1. Creates jitter plots to visualize score distributions across readouts

  2. Performs t-SNE dimensionality reduction to assess global sample separation

  3. Generates boxplots to compare score distributions between control groups

  4. Calculates SSMD scores for each readout: \mathrm{SSMD} = (\mu_{pos} - \mu_{neg}) / \sqrt{\sigma_{pos}^2 + \sigma_{neg}^2}

  5. Reports the percentage of readouts with |\mathrm{SSMD}| \ge 2 (considered high quality)

SSMD scores \ge 2 indicate good separation between positive and negative controls, suggesting high-quality readouts.
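The SSMD formula in step 4 and the percentage in step 5 can be illustrated with a short base R sketch. This is not the package's internal code: the function ssmd_per_readout and the toy data are hypothetical, and the sketch assumes the sample variance (var) is used for each control group.

```r
# Toy matrix, invented for illustration: 3 negative and 3 positive controls
# (rows) measured across 3 readouts (columns).
toy_mat <- rbind(
  neg1 = c(1.0, 2.0, 5.0),
  neg2 = c(2.0, 2.5, 5.5),
  neg3 = c(3.0, 3.0, 6.0),
  pos1 = c(9.0, 3.0, 1.0),
  pos2 = c(10.0, 3.5, 1.5),
  pos3 = c(11.0, 4.0, 2.0)
)
colnames(toy_mat) <- c("readout1", "readout2", "readout3")
toy_neg <- data.frame(id = c("neg1", "neg2", "neg3"))
toy_pos <- data.frame(id = c("pos1", "pos2", "pos3"))

# Hypothetical helper: SSMD per readout (column), following
# (mean_pos - mean_neg) / sqrt(var_pos + var_neg).
ssmd_per_readout <- function(mat, neg, pos) {
  neg_rows <- mat[rownames(mat) %in% as.character(neg[, 1]), , drop = FALSE]
  pos_rows <- mat[rownames(mat) %in% as.character(pos[, 1]), , drop = FALSE]
  (colMeans(pos_rows) - colMeans(neg_rows)) /
    sqrt(apply(pos_rows, 2, var) + apply(neg_rows, 2, var))
}

ssmd <- ssmd_per_readout(toy_mat, toy_neg, toy_pos)
ssmd  # approximately 5.66, 1.41, -5.66

# Percentage of readouts with |SSMD| >= 2, as in step 5.
pct_high <- 100 * mean(abs(ssmd) >= 2)
pct_high  # two of three readouts, about 66.7
```

In this toy example, readout2 falls below the threshold because its control means differ by only 1 while the pooled spread is comparatively large.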

Value

A list containing four diagnostic plots:

score_qc

A jitter plot showing the distribution of raw scores across all readouts for positive and negative controls

tSNE_QC

A t-SNE plot showing the global separation of positive and negative control samples in 2D space

QC_box

Side-by-side boxplots showing the distribution of scores for positive and negative controls across all readouts

QC_SSMD

A density plot showing the distribution of SSMD scores across readouts, with a threshold line at SSMD=2 and the percentage of high-quality readouts displayed

Author(s)

Yajing Hao, Shuyang Zhang, Junhui Li, Guofeng Zhao, Xiang-Dong Fu

References

van der Maaten L, Hinton G: Visualizing data using t-SNE. Journal of Machine Learning Research 2008, 9:2579-2605.

Zhang XD: A pair of new statistical parameters for quality control in RNA interference high-throughput screening assays. Genomics 2007, 89:552-561.

Examples

data(countMat)
data(negGene)
data(posGene)
QC(countMat, negGene, posGene)
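The call above prints the whole returned list. As a sketch of how the individual plots might be retrieved, the result can be stored and indexed by the element names listed under Value. The variable names qc_res and expected_names are illustrative, and the sketch assumes the list elements are plot objects that draw when printed.

```r
# Element names as listed in the Value section.
expected_names <- c("score_qc", "tSNE_QC", "QC_box", "QC_SSMD")

# Run only when the package is installed.
if (requireNamespace("ZetaSuite", quietly = TRUE)) {
  library(ZetaSuite)
  data(countMat)
  data(negGene)
  data(posGene)

  qc_res <- QC(countMat, negGene, posGene)

  # Compare the returned element names with the documented ones.
  print(setdiff(expected_names, names(qc_res)))

  # Display a single plot, here the SSMD density plot (assumed printable).
  print(qc_res$QC_SSMD)
}
```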


ZetaSuite documentation built on Nov. 5, 2025, 6:37 p.m.