diagnose_totalumi: Diagnostic function for UMI based datasets

View source: R/diagnose_totalumi.R

diagnose_totalumi    R Documentation

Diagnostic function for UMI based datasets

Description

The function fits a loess line to log2-transformed total UMI counts as a function of cell cycle position, to help diagnose datasets that fit poorly, for example those in which the cells are not cycling.

Arguments

theta.v

The cell cycle position - a numeric vector with values between 0 and 2π.

totalumis

The total UMI count for each cell (without log2 transformation) - a numeric vector of the same length as theta.v.

span

The parameter α which controls the degree of smoothing. See loess. Default: 0.3

length.out

The number of data points on the fitted lines to be output in the prediction data.frame. Default: 200

plot

If TRUE, a ggplot scatter plot will be included in the output list. The figure will plot log2(totalumis) ~ theta.v with points and the fitted loess line. Default: TRUE

fig.title

The title of the figure. Default: NULL

point.size

The size of the points in the scatter plot, used by geom_scattermore. Default: 2.1

point.alpha

The alpha value (transparency) of the points in the scatter plot, used by geom_scattermore. Default: 0.6

line.size

The size of the fitted line, used by geom_path. Default: 0.8

line.alpha

The alpha value (transparency) of the fitted line, used by geom_path. Default: 0.8

x_lab

Title of x-axis. Default: "θ"

y_lab

Title of y-axis. Default: "log2(totalumis)"

...

Other arguments passed to loess.

Details

This function fits a loess line between cell cycle position and log2-transformed total UMI count, as described in fit_periodic_loess. If almost all cells in a dataset are not cycling, the estimated cell cycle positions might be incorrect because the embedding center is shifted. A cell should have its highest total UMI count at the end of S phase and roughly half of that count at M phase; using this fact, we can detect datasets that should be analysed and interpreted carefully when using the tricycle package. For such problematic datasets, the default embedding center (0, 0) could lead to wrong inference. Thus, we do not recommend using the cell cycle position values if you get warnings from the diagnose_totalumi function.
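The idea above can be sketched in base R on simulated data. This is only an illustration, not the package's implementation: the data are simulated, all variable names are hypothetical, and stacking three shifted copies of the data is an assumption about how a periodic loess fit can be obtained. A halving of raw UMI counts corresponds to a difference of 1 on the log2 scale, so the peak-to-trough swing of the fitted line is the quantity of interest.

```r
# Illustrative sketch: base R only, simulated data, hypothetical variable names.
set.seed(1)
n <- 500
theta <- runif(n, 0, 2 * pi)  # simulated cell cycle positions in [0, 2*pi)

# Simulated log2 total UMIs with a peak-to-trough swing of 1 log2 unit,
# i.e. a two-fold difference in raw counts between peak and trough.
log2_umis <- 12 + 0.5 * cos(theta - pi) + rnorm(n, sd = 0.2)

# Assumption: make the fit periodic by stacking three shifted copies of the
# data, so points near 0 and 2*pi have neighbours on both sides.
d3 <- data.frame(
  x = c(theta - 2 * pi, theta, theta + 2 * pi),
  y = rep(log2_umis, 3)
)
span <- 0.3
# Divide span by 3 so the smoothing window matches that of a single copy.
fit <- loess(y ~ x, data = d3, span = span / 3)

# Keep only the fitted values for the middle (unshifted) copy.
fitted_mid <- fit$fitted[(n + 1):(2 * n)]
residual <- log2_umis - fitted_mid

# Evaluate the fitted line on a uniform grid over [0, 2*pi].
grid <- data.frame(x = seq(0, 2 * pi, length.out = 200))
pred <- predict(fit, newdata = grid)

# Peak-to-trough swing in log2 units and the position of the peak.
swing <- max(pred) - min(pred)
peak_theta <- grid$x[which.max(pred)]
```

In this simulation the swing should be close to 1 by construction. A fitted line that is nearly flat would suggest that the total UMI count carries little cell cycle signal, which is the kind of dataset this diagnostic is meant to flag.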

Value

A diagnostic message and a list with the following elements:

  • fitted - The fitted values on the loess line. A vector with the same length as y.

  • residual - The residual values from the fitted loess line, i.e. y - y.fit. A vector of the length of y.

  • pred.df - The prediction data.frame obtained by sampling theta uniformly from 0 to 2*pi. Variable names: x and y. The number of rows equals length.out.

  • loess.o - The fitted loess object.

  • rsquared - The coefficient of determination R². Calculated as 1 - (residual sum of squares / total sum of squares).

  • fig - When plot is TRUE, a ggplot scatter plot object will be returned with other items.
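The rsquared definition given in the list can be reproduced by hand from any loess fit. The sketch below uses base R and simulated data; the variable names are hypothetical and the fit is an ordinary (non-periodic) loess, used only to show the arithmetic.

```r
# Illustrative sketch: compute R^2 from loess residuals on simulated data.
set.seed(2)
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.2)

fit <- loess(y ~ x, span = 0.3)
y_fit <- fitted(fit)
residual <- y - y_fit                 # same convention as above: y - y.fit

rss <- sum(residual^2)                # residual sum of squares
tss <- sum((y - mean(y))^2)           # total sum of squares
rsquared <- 1 - rss / tss
```

A value near 1 means the fitted line explains most of the variation in y; a value near 0 means the line explains little more than the mean of y does.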

Author(s)

Shijie C. Zheng

See Also

fit_periodic_loess.

Examples

data(neurosphere_example, package = "tricycle")
neurosphere_example <- estimate_cycle_position(neurosphere_example)
diagnose.l <- diagnose_totalumi(neurosphere_example$tricyclePosition,
 neurosphere_example$TotalUMIs, plot = TRUE)
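As a possible follow-up, the elements of the returned list can be inspected by name. The sketch below repeats the example above and reads only the elements documented in the Value section; it is wrapped in a check so that it does nothing when the tricycle package is not installed.

```r
# Runs only if the tricycle package is installed.
if (requireNamespace("tricycle", quietly = TRUE)) {
  library(tricycle)
  data(neurosphere_example, package = "tricycle")
  neurosphere_example <- estimate_cycle_position(neurosphere_example)
  diagnose.l <- diagnose_totalumi(neurosphere_example$tricyclePosition,
    neurosphere_example$TotalUMIs, plot = TRUE)

  print(names(diagnose.l))         # element names of the returned list
  print(diagnose.l$rsquared)       # coefficient of determination of the fit
  print(head(diagnose.l$pred.df))  # first rows of the fitted line (x and y)
  p <- diagnose.l$fig              # ggplot object; print(p) draws the plot
}
```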

hansenlab/tricycle documentation built on March 19, 2022, 7:24 p.m.