ZetaSuite: A Comprehensive Guide to Multi-dimensional High-throughput Data Analysis
In ZetaSuite: Analyze High-Dimensional High-Throughput Dataset and Quality Control Single-Cell RNA-Seq

knitr::opts_chunk$set(warning = FALSE, message = FALSE, fig.width = 8, fig.height = 6)

Introduction

Overview

ZetaSuite is an R package designed for analyzing multi-dimensional high-throughput screening data, particularly two-dimensional RNAi screens and single-cell RNA sequencing data. The package addresses the limitations of simple Z-based statistics when dealing with complex multi-dimensional datasets where experimental noise and off-target effects accumulate.

Key Features

Quality Control Analysis: Comprehensive evaluation of experimental design and data quality
Z-score Normalization: Standardization using negative controls as reference
Event Coverage Analysis: Quantification of regulatory effects across thresholds
Zeta Score Calculation: Area-under-curve based scoring for regulatory effects
SVM-based Background Correction: Machine learning approach to filter noise
Screen Strength Analysis: Optimal threshold selection for hit identification
Single Cell Quality Control: Cell quality assessment for scRNA-seq data

Applications

Two-dimensional RNAi screens: Identify genes critical for cell fitness and proliferation
Alternative splicing analysis: Detect splicing regulators and their effects
Single-cell RNA-seq QC: Differentiate high-quality cells from damaged ones
Cancer dependency screens: Reveal genes with opposite roles in cancer biology

Installation and Setup

R Package Installation

# Install from CRAN
install.packages("ZetaSuite")

# Load the package
library(ZetaSuite)

Interactive Shiny Application

ZetaSuite includes an interactive web interface for easy-to-use analysis:

# Launch the Shiny app
ZetaSuiteApp()

# Launch without opening browser automatically
ZetaSuiteApp(launch.browser = FALSE)

# Launch on a specific port
ZetaSuiteApp(port = 3838)

The Shiny app provides: - Interactive data upload and visualization - Step-by-step analysis workflow with progress indicators - Real-time results and interactive plots - Data export capabilities for all analysis results - Built-in example dataset for demonstration - Bug report integration with GitHub issues

Original ZetaSuite Installation

Example Dataset

The package includes an example dataset from an in-house HTS2 screening experiment. This dataset contains:

countMat: 1,609 genes × 100 alternative splicing events (fold change values)
negGene: 30 negative control genes (non-specific siRNAs)
posGene: 20 positive control genes (PTB-targeting siRNAs)
nonExpGene: 50 non-expressed genes (RPKM < 1 in HeLa cells)
ZseqList: Pre-calculated Z-score thresholds
SVMcurve: Pre-calculated SVM decision boundaries

library(ZetaSuite)

# Load example data
data(countMat)
data(negGene)
data(posGene)
data(nonExpGene)
data(ZseqList)
data(SVMcurve)

# Display data dimensions
cat("Count matrix dimensions:", dim(countMat), "\n")
cat("Negative controls:", nrow(negGene), "genes\n")
cat("Positive controls:", nrow(posGene), "genes\n")
cat("Non-expressed genes:", nrow(nonExpGene), "genes\n")

Analysis Workflow

Step 1: Quality Control Analysis

Quality control evaluates the ability of functional readouts to discriminate between negative and positive controls. This step provides diagnostic plots and SSMD (Strictly Standardized Mean Difference) scores.

# Perform quality control analysis
qc_results <- QC(countMat, negGene, posGene)

# Display QC plots
cat("QC analysis completed. Generated", length(qc_results), "diagnostic plots.\n")

QC Plot 1: Score Distribution

This plot shows the distribution of raw scores across all readouts for positive and negative controls.

qc_results$score_qc

QC Plot 2: t-SNE Visualization

Global evaluation of sample separation based on all readouts.

qc_results$tSNE_QC

QC Plot 3: Box Plots

Side-by-side comparison of score distributions between control groups.

grid::grid.draw(qc_results$QC_box)

QC Plot 4: SSMD Distribution

Distribution of SSMD scores with quality threshold (SSMD ≥ 2).

qc_results$QC_SSMD

Step 2: Z-score Normalization

Z-score normalization standardizes the data using negative controls as reference, making readouts comparable across different conditions.

# Calculate Z-scores
zscore_matrix <- Zscore(countMat, negGene)

# Display first few rows and columns
cat("Z-score matrix dimensions:", dim(zscore_matrix), "\n")
cat("First 5 rows and columns:\n")
print(zscore_matrix[1:5, 1:5])

Step 3: Event Coverage Analysis

Event coverage quantifies the proportion of readouts that exceed different Z-score thresholds for each gene, creating the foundation for zeta score calculations.

# Calculate event coverage
ec_results <- EventCoverage(zscore_matrix, negGene, posGene, binNum = 100, combine = TRUE)

# Display event coverage plots
cat("Event coverage analysis completed.\n")

Event Coverage Plots

# Decrease direction (exon skipping)
ec_results[[2]]$EC_jitter_D

# Increase direction (exon inclusion)
ec_results[[2]]$EC_jitter_I

Step 4: Zeta Score Calculation

Zeta scores represent the area under the event coverage curve, quantifying the cumulative regulatory effect of each gene across all Z-score thresholds.

# Calculate zeta scores without SVM correction
zeta_scores <- Zeta(zscore_matrix, ZseqList, SVM = FALSE)

# Display summary statistics
cat("Zeta score summary:\n")
cat("Number of genes:", nrow(zeta_scores), "\n")
cat("Zeta_D range:", range(zeta_scores$Zeta_D), "\n")
cat("Zeta_I range:", range(zeta_scores$Zeta_I), "\n")

# Show top hits
cat("\nTop 10 genes by Zeta_D (decrease direction):\n")
top_decrease <- head(zeta_scores[order(zeta_scores$Zeta_D, decreasing = TRUE), ], 10)
print(top_decrease)

Step 5: SVM Background Correction (Optional)

SVM analysis creates decision boundaries to separate positive and negative controls, enabling background correction in zeta score calculations.

# Run SVM analysis (can be computationally intensive)
svm_results <- SVM(ec_results)

# Calculate zeta scores with SVM correction
zeta_scores_svm <- Zeta(zscore_matrix, ZseqList, SVMcurve = svm_results, SVM = TRUE)

Step 6: Screen Strength Analysis

Screen Strength analysis determines optimal cutoff thresholds by balancing sensitivity and specificity, using the ratio of apparent FDR to baseline FDR.

# Calculate FDR cutoffs and Screen Strength
fdr_results <- FDRcutoff(zeta_scores, negGene, posGene, nonExpGene, combine = TRUE)

# Display Screen Strength plots
cat("Screen Strength analysis completed.\n")

Zeta Score Distribution by Gene Type

fdr_results[[2]]$Zeta_type

Screen Strength Curves

fdr_results[[2]]$SS_cutOff

FDR Cutoff Results

# Display FDR cutoff results
fdr_table <- fdr_results[[1]]
cat("FDR cutoff results summary:\n")
cat("Number of thresholds tested:", nrow(fdr_table), "\n")
cat("Screen Strength range:", range(fdr_table$SS), "\n")

# Show optimal thresholds (SS > 0.8)
optimal_thresholds <- fdr_table[fdr_table$SS > 0.8, ]
cat("\nOptimal thresholds (SS > 0.8):\n")
print(head(optimal_thresholds, 10))

Step 7: Hit Selection

Based on the Screen Strength analysis, select appropriate thresholds for hit identification.

# Example: Select threshold with SS > 0.8 and reasonable number of hits
selected_threshold <- fdr_table[fdr_table$SS > 0.8 & fdr_table$TotalHits > 50, ]
if(nrow(selected_threshold) > 0) {
  best_threshold <- selected_threshold[which.max(selected_threshold$SS), ]
  cat("Recommended threshold:", best_threshold$Cut_Off, "\n")
  cat("Screen Strength:", best_threshold$SS, "\n")
  cat("Total hits:", best_threshold$TotalHits, "\n")

  # Identify hits
  combined_zeta <- zeta_scores$Zeta_D + zeta_scores$Zeta_I
  hits <- names(combined_zeta[combined_zeta >= best_threshold$Cut_Off])
  cat("Number of hits identified:", length(hits), "\n")
}

Single Cell RNA-seq Quality Control

ZetaSuite also provides functionality for single-cell RNA-seq quality control, helping to differentiate high-quality cells from damaged ones.

# Example single cell analysis (requires single cell count matrix)
# single_cell_results <- ZetaSuitSC(count_matrix_sc, binNum = 10, filter = TRUE)

Advanced Usage

Custom Parameter Settings

# Custom event coverage analysis
ec_custom <- EventCoverage(zscore_matrix, negGene, posGene, 
                          binNum = 200,    # More bins for finer resolution
                          combine = FALSE) # Separate decrease/increase directions

# Custom zeta score calculation with SVM
zeta_custom <- Zeta(zscore_matrix, ZseqList, 
                   SVMcurve = svm_results, 
                   SVM = TRUE)  # Use SVM correction

# Custom FDR analysis
fdr_custom <- FDRcutoff(zeta_scores, negGene, posGene, nonExpGene, 
                       combine = FALSE)  # Analyze directions separately

Batch Processing

# Example: Process multiple datasets
datasets <- list(dataset1 = list(countMat = countMat1, negGene = negGene1),
                dataset2 = list(countMat = countMat2, negGene = negGene2))

results <- lapply(datasets, function(ds) {
  zscore <- Zscore(ds$countMat, ds$negGene)
  ec <- EventCoverage(zscore, ds$negGene, ds$posGene, binNum = 100)
  zeta <- Zeta(zscore, ZseqList, SVM = FALSE)
  return(list(zscore = zscore, ec = ec, zeta = zeta))
})

Interpretation Guidelines

Quality Control Interpretation

SSMD ≥ 2: High-quality readouts with good separation between controls
t-SNE separation: Clear clustering indicates good experimental design
Box plot overlap: Minimal overlap suggests effective control design

Zeta Score Interpretation

Higher Zeta_D: Genes promoting exon skipping
Higher Zeta_I: Genes promoting exon inclusion
Combined score: Overall regulatory effect strength

Screen Strength Interpretation

SS = 1: Perfect separation (no false positives)
SS > 0.8: Excellent separation
SS > 0.6: Good separation
Optimal threshold: Balance between sensitivity and specificity

Troubleshooting

Common Issues

Insufficient controls: Ensure adequate number of positive/negative controls
Data quality: Check for missing values and extreme outliers
Memory issues: Reduce bin number for large datasets
SVM convergence: Adjust SVM parameters or use pre-calculated curves

Performance Optimization

Use combine = TRUE for faster event coverage analysis
Reduce binNum for large datasets
Consider using pre-calculated SVM curves for repeated analyses

References

Software Citation

Hao, Y., Shao, C., Zhao, G., Fu, X.D. (2021). ZetaSuite: A Computational Method for Analyzing Multi-dimensional High-throughput Data, Reveals Genes with Opposite Roles in Cancer Dependency. Forthcoming

Dataset Citation

Shao, C., Hao, Y., Qiu, J., Zhou, B., Li, H., Zhou, Y., Meng, F., Jiang, L., Gou, L.T., Xu, J., Li, Y., Wang, H., Yeo, G.W., Wang, D., Ji, X., Glass, C.K., Aza-Blanc, P., Fu, X.D. (2021). HTS2 Screen for Global Splicing Regulators Reveals a Key Role of the Pol II Subunit RPB9 in Coupling between Transcription and Pre-mRNA Splicing. Cell. Forthcoming

Session Information

sessionInfo()

Any scripts or data that you put into this service are public.

ZetaSuite documentation built on Nov. 5, 2025, 6:37 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ZetaSuite Analyze High-Dimensional High-Throughput Dataset and Quality Control Single-Cell RNA-Seq

ZetaSuite: A Comprehensive Guide to Multi-dimensional High-throughput Data Analysis In ZetaSuite: Analyze High-Dimensional High-Throughput Dataset and Quality Control Single-Cell RNA-Seq