A concise summary of the statistical methods implemented in splikit. For a hands-on walkthrough see the Splikit Manual; the full source is at https://github.com/csglab/splikit.
Splice junctions are grouped into local junction variants (LJVs): sets of junctions sharing either a 5-prime or a 3-prime coordinate. For each junction, splikit builds an inclusion matrix M1 of its per-cell read counts and an exclusion matrix M2 holding the summed per-cell counts of the other junctions in its LJV. M1 and M2 are sparse dgCMatrix objects of dimension events x cells. A junction that participates in two LJVs (one per shared coordinate) contributes two rows, one per LJV, with the same M1 values but different M2 values; downstream code tolerates this by design.
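The grouping and the M1 / M2 bookkeeping described above can be illustrated with a minimal sketch. This is not splikit's implementation: the function name build_m1_m2, the use of plain (start, end) coordinates in place of strand-aware 5-prime / 3-prime ends, dense Python lists in place of sparse matrices, and the choice to drop single-junction groups are all assumptions made for illustration.

```python
from collections import defaultdict


def build_m1_m2(junctions, counts):
    """Hypothetical helper: build dense M1 / M2 rows from junction counts.

    junctions: list of (start, end) integer coordinates.
    counts:    list of per-cell count lists, one per junction, same order.
    """
    n_cells = len(counts[0]) if counts else 0

    # Group junction indices by shared start and by shared end coordinate.
    groups = defaultdict(list)
    for idx, (start, end) in enumerate(junctions):
        groups[("start", start)].append(idx)
        groups[("end", end)].append(idx)

    events, m1, m2 = [], [], []
    for key in sorted(groups):
        members = groups[key]
        if len(members) < 2:
            continue  # assumption: a lone junction has no alternative to compare with
        # Per-cell total over every junction in this group.
        totals = [sum(counts[j][c] for j in members) for c in range(n_cells)]
        for j in members:
            events.append((key, junctions[j]))
            m1.append(list(counts[j]))  # inclusion: the junction's own reads
            # exclusion: reads of the other junctions in the same group
            m2.append([totals[c] - counts[j][c] for c in range(n_cells)])
    return events, m1, m2


# Three junctions, two cells. (100, 300) shares its start with (100, 200)
# and its end with (150, 300), so it yields two rows.
junctions = [(100, 200), (100, 300), (150, 300)]
counts = [[5, 0], [3, 2], [1, 4]]
events, m1, m2 = build_m1_m2(junctions, counts)
```

In this toy input the junction (100, 300) appears in both the shared-start and the shared-end group, so it produces two rows with identical M1 values and different M2 values, matching the duplication noted above.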
find_variable_events() computes, for each event, the per-library binomial deviance of the inclusion ratio M1 / (M1 + M2) against an intercept-only baseline p_hat = sum(M1) / sum(M1 + M2). Events with the largest summed deviance are retained as highly variable.
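As a rough sketch of the deviance calculation for a single event, the snippet below evaluates the binomial deviance against the pooled baseline p_hat and sums it over libraries. The function names, the dense list inputs, and the choice to estimate p_hat separately within each library are assumptions for illustration; the package's C++ kernel may differ in these details.

```python
import math
from collections import defaultdict


def _term(obs, expected):
    # obs * log(obs / expected), using the convention 0 * log(0) = 0
    return obs * math.log(obs / expected) if obs > 0 else 0.0


def binomial_deviance(m1, m2):
    """Deviance of one event's inclusion ratio against an intercept-only fit."""
    total_inc = sum(m1)
    total = total_inc + sum(m2)
    if total == 0:
        return 0.0
    p_hat = total_inc / total
    if p_hat in (0.0, 1.0):
        return 0.0  # every cell agrees with the baseline exactly
    dev = 0.0
    for inc, exc in zip(m1, m2):
        n = inc + exc
        dev += _term(inc, n * p_hat) + _term(exc, n * (1.0 - p_hat))
    return 2.0 * dev


def summed_deviance(m1, m2, library):
    """Sum per-library deviances (assumes p_hat is estimated within each library)."""
    by_lib = defaultdict(lambda: ([], []))
    for inc, exc, lib in zip(m1, m2, library):
        by_lib[lib][0].append(inc)
        by_lib[lib][1].append(exc)
    return sum(binomial_deviance(a, b) for a, b in by_lib.values())
```

An event whose inclusion ratio is the same in every cell scores zero, while an event that switches between full inclusion and full exclusion within a library scores high. Under the per-library baseline assumed here, differences that exist only between libraries do not contribute.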
find_variable_genes() offers two methods on the gene-expression matrix: "sum_deviance" fits a per-gene negative-binomial deviance with a method-of-moments theta estimate, and "vst" returns a Seurat-style variance-stabilising transformation.
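The "sum_deviance" idea can be sketched for a single gene as follows. This is an illustration under simplifying assumptions, not the package's code: the function names are hypothetical, the fitted mean is the plain sample mean with no library-size adjustment, theta comes from the moment relation var = mu + mu^2 / theta, and genes with no overdispersion fall back to the Poisson deviance.

```python
import math


def mom_theta(y):
    """Method-of-moments NB dispersion: var = mu + mu^2 / theta."""
    n = len(y)
    mu = sum(y) / n
    var = sum((v - mu) ** 2 for v in y) / (n - 1)
    if var <= mu:
        return math.inf  # no overdispersion: treat as Poisson
    return mu * mu / (var - mu)


def nb_deviance(y):
    """Negative-binomial deviance of one gene against a constant mean."""
    mu = sum(y) / len(y)
    if mu == 0:
        return 0.0
    theta = mom_theta(y)
    dev = 0.0
    for v in y:
        first = v * math.log(v / mu) if v > 0 else 0.0
        if math.isinf(theta):
            dev += first - (v - mu)  # Poisson limit of the NB deviance
        else:
            dev += first - (v + theta) * math.log((v + theta) / (mu + theta))
    return 2.0 * dev
```

A gene with constant counts scores zero, and a gene whose counts are split between zero and high values scores above zero, which is the ordering a highly-variable-gene ranking relies on.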
get_pseudo_correlation() fits a per-event binomial logistic GLM of the inclusion ratio on a target covariate by iteratively reweighted least squares, and reports a Cox-Snell / Nagelkerke pseudo-R-squared computed from the residual deviance. This quantifies how strongly each event tracks the covariate (e.g. a cluster label or a gene's expression).
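The fit and the pseudo-R-squared can be sketched for one event and a single numeric covariate. Everything below is an illustrative assumption rather than splikit's implementation: the function names, the two-parameter model logit(p) = b0 + b1 * x, the use of the number of covered cells as the sample size N, and the Nagelkerke upper bound 1 - exp(-D_null / N), which treats the saturated log-likelihood as zero.

```python
import math


def _sigmoid(eta):
    eta = max(-30.0, min(30.0, eta))  # clamp to keep p strictly inside (0, 1)
    return 1.0 / (1.0 + math.exp(-eta))


def _term(obs, expected):
    return obs * math.log(obs / expected) if obs > 0 else 0.0


def _deviance(y, n, p):
    return 2.0 * sum(_term(yi, ni * pi) + _term(ni - yi, ni * (1.0 - pi))
                     for yi, ni, pi in zip(y, n, p))


def fit_logistic_irls(y, n, x, max_iter=25, tol=1e-8):
    """IRLS for logit(p_i) = b0 + b1 * x_i with y_i successes out of n_i."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for yi, ni, xi in zip(y, n, x):
            eta = b0 + b1 * xi
            p = _sigmoid(eta)
            w = ni * p * (1.0 - p)          # IRLS weight
            if w < 1e-12:
                continue                     # uncovered or saturated cell
            z = eta + (yi - ni * p) / w      # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        det = s_w * s_wxx - s_wx * s_wx
        if abs(det) < 1e-12:
            break                            # covariate has no usable variation
        # Solve the 2x2 weighted least-squares normal equations.
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


def pseudo_r2(y, n, x):
    """Return (Cox-Snell, Nagelkerke) pseudo-R-squared from deviances."""
    n_obs = sum(1 for ni in n if ni > 0)
    if n_obs == 0:
        return 0.0, 0.0
    p_null = sum(y) / sum(n)
    d_null = _deviance(y, n, [p_null] * len(y))
    b0, b1 = fit_logistic_irls(y, n, x)
    d_res = _deviance(y, n, [_sigmoid(b0 + b1 * xi) for xi in x])
    cox_snell = 1.0 - math.exp((d_res - d_null) / n_obs)
    upper = 1.0 - math.exp(-d_null / n_obs)
    nagelkerke = cox_snell / upper if upper > 0 else 0.0
    return cox_snell, nagelkerke


# Inclusion is low where the covariate is 0 and high where it is 1.
y = [2, 3, 2, 8, 7, 8]
n = [10] * 6
x = [0, 0, 0, 1, 1, 1]
b0, b1 = fit_logistic_irls(y, n, x)
cs, nk = pseudo_r2(y, n, x)
```

Here y holds the inclusion counts (a row of M1) and n the total counts (M1 + M2). With a binary covariate the fitted probabilities reduce to the two group-level inclusion ratios, so the toy data give a pseudo-R-squared close to 1, while an event whose ratio ignores the covariate gives 0.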
All four kernels are written in C++ via Rcpp / RcppArmadillo with OpenMP parallelism over rows or cells. make_m2() automatically falls back to a data.table batched path when the working set would overflow 32-bit Armadillo indices.