mean_variance_fit: Linear regression on gene variance and mean


View source: R/gene_statistics.R

Description

This function fits a linear regression to the log-transformed variance against the log-transformed mean, i.e., log(variance)~log(mean). Genes whose (log(mean), log(variance)) points lie above the fitted line can be considered high variance genes. Because the residual is reported here as fitted minus observed, these genes have a negative residual (.resid column in dataframe residuals).

Usage

mean_variance_fit(gene_stat_df)

Arguments

gene_stat_df

A dataframe with at least three columns: gene, mean, and variance.

Details

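To keep only the high variance genes, filter the residuals dataframe by residual: mean_variance_fit(foo)$residuals %>% dplyr::filter(.resid < 0), where foo stands for your gene statistics dataframe. The %>% pipe must be available, e.g., by attaching dplyr or magrittr.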
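The sign convention behind this filter can be sketched in base R without calling the package. This is an illustration only, not the package's implementation: the toy values are made up, the use of lm() is an assumption about how the fit could be reproduced, and .resid is computed by hand as fitted minus observed to match the definition given on this page.

```r
# Toy input with the three columns the function expects (values are made up).
gene_stat_df <- data.frame(
  gene     = c("g1", "g2", "g3", "g4", "g5"),
  mean     = c(1, 2, 4, 8, 16),
  variance = c(1.2, 1.8, 9.0, 7.5, 20)
)

# Fit log(variance) ~ log(mean) with base R (assumed stand-in for the package fit).
fit <- lm(log(variance) ~ log(mean), data = gene_stat_df)

# Residual under the convention described on this page: fitted minus observed.
gene_stat_df$.resid <- unname(fitted(fit) - log(gene_stat_df$variance))

# Genes above the fitted line have a negative .resid under this convention.
high_variance <- gene_stat_df[gene_stat_df$.resid < 0, ]
```

With the usual definition (observed minus fitted, as returned by residuals(fit)), the same genes would instead be selected with a positive-residual filter.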

Value

A list of three dataframes:

regression_coeficients

Information about the regression coefficients. Each row is a regression coefficient.

goodness_of_fit

Information on the goodness of fit.

residuals

Information about each fitted point. The column .resid contains the difference between the fitted value and the observed value, i.e., \hat{y}-y. Please note this is not the customary definition of a residual, which is usually y-\hat{y}: see broom issue 802. Under the definition used here, the high variance genes are those that show a negative residual.
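Written out for gene i, the convention reads as follows. The symbols \hat{\beta}_0 and \hat{\beta}_1 (fitted intercept and slope) are notation introduced here for illustration, and natural logarithms are assumed, since that is the default of R's log().

```latex
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 \log(\mathrm{mean}_i), \qquad
y_i = \log(\mathrm{variance}_i), \qquad
\mathtt{.resid}_i = \hat{y}_i - y_i

y_i > \hat{y}_i \iff \mathtt{.resid}_i < 0
```

A point above the fitted line has an observed value larger than its fitted value, which is why a negative .resid marks a high variance gene.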


ramiromagno/oscillation documentation built on April 20, 2020, 10:37 a.m.