scmet_hvf_lvf: Detect highly (or lowly) variable features with scMET

scmet_hvf R Documentation

Detect highly (or lowly) variable features with scMET


Function for calling features as highly (or lowly) variable within a dataset or cell population. This can be thought of as a feature selection step, where the highly variable features (HVFs) can be used for diverse downstream tasks, such as clustering or visualisation. There are two approaches for identifying HVFs (or LVFs): (1) If we correct for the mean-dispersion relationship, we work directly on the residual overdispersion epsilon and define a percentile threshold delta_e. This is the preferred option, since the residual overdispersion is not confounded by mean methylation levels. (2) Work directly with the overdispersion parameter gamma and define an overdispersion contribution threshold delta_g, above (below) which we call HVFs (LVFs).
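The percentile-threshold idea in approach (1) can be sketched in base R on simulated data. This is only an illustration under stated assumptions, not the scMET implementation: the posterior draws are simulated, the names `eps_draws`, `cutoff_hvf` and `cutoff_lvf` are hypothetical, and taking the percentile across feature-level posterior medians is an assumption made for the example.

```r
# Toy illustration only: NOT the scMET implementation.
set.seed(1)
n_draws <- 500  # hypothetical number of posterior draws
n_feat  <- 20   # hypothetical number of features

# Simulated posterior draws of residual overdispersion (draws x features);
# feature-specific means spread the features apart.
feat_mean <- seq(-1, 1, length.out = n_feat)
eps_draws <- sapply(feat_mean, function(m) rnorm(n_draws, mean = m, sd = 0.1))

evidence_thresh <- 0.8  # posterior evidence threshold (documented default)

# Assumption: the percentile cut-off is taken across posterior medians.
eps_median <- apply(eps_draws, 2, median)

# HVF: tail probability of exceeding the delta_e = 0.9 percentile
cutoff_hvf    <- quantile(eps_median, probs = 0.9)
tail_prob_hvf <- colMeans(eps_draws > cutoff_hvf)
is_hvf        <- tail_prob_hvf > evidence_thresh

# LVF: mirror image, tail probability of falling below the 0.1 percentile
cutoff_lvf    <- quantile(eps_median, probs = 0.1)
tail_prob_lvf <- colMeans(eps_draws < cutoff_lvf)
is_lvf        <- tail_prob_lvf > evidence_thresh
```

Approach (2) would follow the same pattern, with draws of gamma compared against delta_g in place of the percentile cut-off.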


scmet_hvf(
  scmet_obj,
  delta_e = 0.9,
  delta_g = NULL,
  evidence_thresh = 0.8,
  efdr = 0.1
)

scmet_lvf(
  scmet_obj,
  delta_e = 0.1,
  delta_g = NULL,
  evidence_thresh = 0.8,
  efdr = 0.1
)



scmet_obj: The scMET posterior object after performing inference, i.e. after calling the scmet function.


delta_e: Percentile threshold for residual overdispersion to detect variable features (between 0 and 1). Default: 0.9 for HVF and 0.1 for LVF (i.e. the top and bottom 10% of features, respectively). NOTE: This parameter should be used when correcting for the mean-dispersion relationship.


delta_g: Overdispersion contribution threshold (between 0 and 1).


evidence_thresh: Optional parameter. Posterior evidence probability threshold parameter alpha_{H} (between 0.6 and 1).


efdr: Target for the expected false discovery rate related to HVF/LVF detection (default = 0.1).


The scMET posterior object with an additional element named hvf or lvf according to the analysis performed. This is a list object containing the following elements:

  • summary: A data.frame containing the HVF or LVF analysis output per feature, including posterior medians for mu, gamma, and epsilon. The tail_prob column contains the posterior tail probability of a feature being called as HVF or LVF. The logical is_variable column indicates whether the feature is called as variable.

  • evidence_thresh: The optimal evidence threshold.

  • efdr: The EFDR value.

  • efnr: The EFNR value.

  • efdr_grid: The EFDR values for the grid search.

  • efnr_grid: The EFNR values for the grid search.

  • evidence_thresh_grid: The grid where we searched for optimal evidence threshold.
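The relationship between the grid elements above can be sketched with a common formulation of expected false discovery and false negative rates computed from posterior tail probabilities. This is a hedged illustration, not scMET's code: the tail probabilities are simulated, the helper names `efdr_at` and `efnr_at` are hypothetical, and the rule of picking the grid point whose EFDR is closest to the target is an assumption made for the example.

```r
# Toy sketch of an EFDR-calibrated evidence threshold; NOT scMET's code.
set.seed(42)

# Hypothetical posterior tail probabilities for 200 features
tail_prob <- c(rbeta(40, 20, 1), rbeta(160, 1, 5))

efdr_target <- 0.1
evidence_thresh_grid <- seq(0.6, 0.99, by = 0.01)

# Expected false discovery rate among features called at threshold a
efdr_at <- function(a, p) {
  called <- p > a
  if (!any(called)) return(NA_real_)
  sum((1 - p)[called]) / sum(called)
}

# Expected false negative rate among features not called at threshold a
efnr_at <- function(a, p) {
  not_called <- p <= a
  if (!any(not_called)) return(NA_real_)
  sum(p[not_called]) / sum(not_called)
}

efdr_grid <- vapply(evidence_thresh_grid, efdr_at, numeric(1), p = tail_prob)
efnr_grid <- vapply(evidence_thresh_grid, efnr_at, numeric(1), p = tail_prob)

# Assumption: choose the grid point whose EFDR is closest to the target
best <- which.min(abs(efdr_grid - efdr_target))
optimal_thresh <- evidence_thresh_grid[best]
```

Raising the threshold lowers the EFDR at the cost of a higher EFNR, which is why both grids are worth inspecting together.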



See Also

scmet, scmet_differential


# Load the package (provides scmet and the scmet_dt example data)
library(scMET)

# Fit scMET
obj <- scmet(Y = scmet_dt$Y, X = scmet_dt$X, L = 4, iter = 100)

# Run HVF analysis
obj <- scmet_hvf(scmet_obj = obj)

# Run LVF analysis
obj <- scmet_lvf(scmet_obj = obj)
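After the calls above, the results sit in the hvf and lvf elements of the returned object. The sketch below shows one way the summary table might be filtered. So that it runs without fitting a model, it uses a mock object that only mimics the documented structure: all values are made up, and the `feature_name` column is a hypothetical name not confirmed by this page.

```r
# Mock object mimicking the documented return structure (toy values only).
obj <- list(hvf = list(
  summary = data.frame(
    feature_name = paste0("feature_", 1:5),  # hypothetical column name
    tail_prob    = c(0.95, 0.10, 0.85, 0.40, 0.99),
    is_variable  = c(TRUE, FALSE, TRUE, FALSE, TRUE),
    stringsAsFactors = FALSE
  ),
  evidence_thresh = 0.8
))

hvf_summary <- obj$hvf$summary

# Keep only features called as highly variable, strongest evidence first
hvf_calls <- hvf_summary[hvf_summary$is_variable, ]
hvf_calls <- hvf_calls[order(hvf_calls$tail_prob, decreasing = TRUE), ]
hvf_names <- hvf_calls$feature_name
```

With a real fit, the same subsetting would apply to obj$lvf$summary for lowly variable features.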

andreaskapou/scMET documentation built on June 1, 2022, 11:47 p.m.