Add_MALAT1_Threshold: Add MALAT1 QC Threshold

View source: R/Generics.R

Add_MALAT1_Threshold R Documentation

Add MALAT1 QC Threshold

Description

Adds a TRUE/FALSE value to each cell based on a calculated MALAT1 threshold. This function incorporates the threshold calculation and procedure described in Clarke & Bader (2024), bioRxiv, doi:10.1101/2024.07.14.603469. Please cite this preprint whenever using this function.

Usage

Add_MALAT1_Threshold(object, ...)

## S3 method for class 'Seurat'
Add_MALAT1_Threshold(
  object,
  species,
  sample_col = NULL,
  malat1_threshold_name = NULL,
  ensembl_ids = FALSE,
  assay = NULL,
  overwrite = FALSE,
  print_plots = NULL,
  save_plots = FALSE,
  save_plot_path = NULL,
  save_plot_name = NULL,
  plot_width = 11,
  plot_height = 8,
  whole_object = FALSE,
  homolog_name = NULL,
  bw = 0.1,
  lwd = 2,
  breaks = 100,
  chosen_min = 1,
  smooth = 1,
  abs_min = 0.3,
  rough_max = 2,
  ...
)

Arguments

object

Seurat or LIGER object

...

Arguments passed to other methods

species

Species of origin for given Seurat Object. Only accepted species are: mouse, human (name or abbreviation).

sample_col

column name in meta.data that contains sample ID information.

malat1_threshold_name

name to use for the new meta.data column containing the TRUE/FALSE MALAT1 threshold values. The default depends on the species gene symbol.

ensembl_ids

logical, whether feature names in the object are gene names or ensembl IDs (default is FALSE; set TRUE if feature names are ensembl IDs).

assay

Assay to use (default is the current object default assay).

overwrite

Logical. Whether to overwrite existing meta.data columns. Default is FALSE, meaning that the function will abort if a column with the name provided to malat1_threshold_name is already present in the meta.data slot.

print_plots

logical, whether plots should be printed to output when running the function (default is NULL). If NULL, it is automatically set to FALSE when calculating across samples or TRUE when calculating across the whole object.

save_plots

logical, whether or not to save plots to pdf (default is FALSE).

save_plot_path

path to save location for plots (default is NULL; current working directory).

save_plot_name

name for pdf file containing plots.

plot_width

the width (in inches) for output page size. Default is 11.

plot_height

the height (in inches) for output page size. Default is 8.

whole_object

logical, whether to perform the calculation on the whole object (default is FALSE). Should only be set to TRUE if the object contains a single sample.

homolog_name

feature name for MALAT1 homolog in non-default species (if annotated).

bw

The "bandwidth" value used when fitting the density function to the MALAT1 distribution; default is bw = 0.1. Lower this parameter (e.g. to 0.01) if the resulting line does not trace the shape of the histogram accurately; a lower value makes the line less "stiff" and more closely fitted to the data.

lwd

The "line width" passed to the abline function, which adds the vertical red line to the output plots; default is 2, and it can be increased or decreased depending on the user's plotting preferences.

breaks

The number of bins used for plotting the histogram of normalized MALAT1 values; default is 100.

chosen_min

The minimum MALAT1 value cutoff above which a MALAT1 peak in the density function should be found. This value is necessary to determine which peak in the density function fitted to the MALAT1 distribution is likely representative of what we would expect to find in real cells. This is because some samples may have large numbers of cells or empty droplets with lower than expected normalized MALAT1 values, and therefore have a peak close to or at zero. Ideally, "chosen_min" would be chosen manually after looking at a histogram of MALAT1 values, and be the normalized MALAT1 value that cuts out all of the cells that stray from the expected distribution (a unimodal distribution above zero). The default value is 1, as this works well in many test cases, but different types of normalization may change the MALAT1 distribution and make the user want to change this parameter (e.g. Seurat's original normalization function generates different results from their SCT function). Increase or decrease chosen_min depending on where your MALAT1 peak is located.

smooth

The "smoothing parameter" fed into the "smooth.spline" function that adjusts the trade-off between the smoothness of the line fitting the histogram, and how closely it fits the histogram; the default is 1, and can be lowered if it looks like the line is underfitting the data, and raised in the case of overfitting. The ideal scenario is for the line to trace the histogram in a way where the only inflection point(s) are between major peaks, e.g. separating the group of poor-quality cells or empty droplets with lower normalized MALAT1 expression from higher-quality cells with higher normalized MALAT1 expression.

abs_min

The absolute lowest value allowed as the MALAT1 threshold. This parameter increases the robustness of the function if working with an outlier data distribution (e.g. an entire sample is poor quality so there is a unimodal MALAT1 distribution that is very low but above zero, but also many values close to zero) and prevents a resulting MALAT1 threshold of zero. In the case where a calculated MALAT1 value is zero, the function will return 0.3 by default.

rough_max

A rough value for the location of a MALAT1 peak if a peak is not found. This is possible if there are so few cells with higher MALAT1 values, that a distribution fitted to the data finds no local maxima. For example, if a sample only has poor-quality cells such that all have near-zero MALAT1 expression, the fitted function may look similar to a positive quadratic function which has no local maxima. In this case, the function searches for the closest MALAT1 value to the default value, 2, to use in place of a real local maximum.
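The roles of bw, smooth, chosen_min, rough_max, and abs_min described above can be illustrated with a simplified base R sketch. It is not the package's implementation and may differ from the published procedure: the simulated data, the variable names, and the choice of the nearest local minimum as the cutoff are all assumptions made for illustration.

```r
# Simulated normalized MALAT1 values (illustrative data, not from the package):
# a small low-quality group near zero and a main population around 4.
set.seed(42)
malat1 <- c(abs(rnorm(200, mean = 0, sd = 0.15)),
            rnorm(1800, mean = 4, sd = 0.6))

# Parameter values mirroring the documented defaults.
bw <- 0.1
smooth <- 1
chosen_min <- 1
abs_min <- 0.3
rough_max <- 2

# 1. Kernel density estimate; a smaller `bw` follows the histogram more closely.
dens <- density(malat1, bw = bw, from = min(malat1), to = max(malat1))

# 2. Smooth the density curve; `spar` is the smoothing parameter of
#    smooth.spline(). Then take the first derivative of the fitted curve.
fit <- smooth.spline(dens$x, dens$y, spar = smooth)
slope <- predict(fit, dens$x, deriv = 1)$y

# 3. Sign changes of the derivative mark local maxima (+ to -)
#    and local minima (- to +).
turn <- diff(sign(slope))
maxima <- dens$x[which(turn == -2)]
minima <- dens$x[which(turn == 2)]

# 4. Keep peaks above `chosen_min` and take the tallest one. If there is
#    none, fall back to the grid point closest to `rough_max`.
peaks <- maxima[maxima > chosen_min]
if (length(peaks) > 0) {
  peak_height <- dens$y[match(peaks, dens$x)]
  peak <- peaks[which.max(peak_height)]
} else {
  peak <- dens$x[which.min(abs(dens$x - rough_max))]
}

# 5. Simplified cutoff (an assumption of this sketch): the nearest local
#    minimum to the left of the peak, floored at `abs_min` so the
#    threshold can never collapse to zero.
left_minima <- minima[minima < peak]
threshold <- if (length(left_minima) > 0) max(left_minima) else abs_min
threshold <- max(threshold, abs_min)

# One logical value per cell, analogous to the added meta.data column.
above_threshold <- malat1 > threshold
table(above_threshold)
```

Rerunning the sketch with a lower bw or smooth value shows how the number of detected maxima and minima changes, which is why the plots should be inspected before trusting a threshold.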

Value

Seurat object with added meta.data column

Author(s)

Zoe Clarke (original function and manuscript) & Samuel Marsh (wrappers and updates for inclusion in package)

References

This function incorporates a threshold calculation and procedure as described in Clarke & Bader (2024), bioRxiv, doi:10.1101/2024.07.14.603469. Please cite this preprint whenever using this function.

Examples

## Not run: 
object <- Add_MALAT1_Threshold(object = object, species = "Human")

## End(Not run)
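Two further calls are sketched below, using only arguments listed under Usage. The objects `object` and `single_sample_object` are assumed to be existing Seurat objects, and the column name "orig.ident", the output path, and the file name are placeholder values rather than package defaults.

```r
## Not run: 
library(scCustomize)

# Per-sample thresholds, with the diagnostic plots saved to a PDF.
# "orig.ident", "plots/", and "malat1_thresholds" are placeholder values.
object <- Add_MALAT1_Threshold(
  object = object,
  species = "human",
  sample_col = "orig.ident",
  save_plots = TRUE,
  save_plot_path = "plots/",
  save_plot_name = "malat1_thresholds"
)

# Object containing a single sample: one threshold across all cells.
single_sample_object <- Add_MALAT1_Threshold(
  object = single_sample_object,
  species = "mouse",
  whole_object = TRUE
)

## End(Not run)
```

If the threshold column already exists in meta.data, the call aborts unless overwrite = TRUE is set.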


scCustomize documentation built on Aug. 26, 2025, 9:08 a.m.