rf_importance_correlation: Correlation between gene importances computed by random...


View source: R/correlation.R

Description

This metric was introduced in the review paper by Saelens et al. We first choose trajectory "milestones" according to our knowledge of the true trajectory. Then, for each milestone, we compute the geodesic distance from each cell to the milestone according to our estimated trajectory, and use the normalized gene expression data as a covariate matrix to predict these distances with a random forest model. We then extract the Gini impurity index for each gene as a measure of gene "importance". This process is repeated using the geodesic distances according to the true trajectory. After computing the Gini importances for each gene at all of the milestones, we average the importances of each gene across milestones and compute the Pearson correlation between the estimated importances and the true importances.
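The procedure described above can be sketched roughly as follows. This is not the pstmeval source: the helper names (`milestone_importance`, `sketch_importance_correlation`), the names of the returned list elements, and the simulated data are all hypothetical. The sketch also assumes a linear trajectory, so that the geodesic distance from a cell to a milestone reduces to the absolute difference in time. Note that for a numeric response, ranger's `importance = "impurity"` mode measures variance reduction rather than the classification Gini index.

```r
# Sketch only, not the package implementation. Assumes a linear trajectory,
# so the geodesic distance from a cell to a milestone is |time - milestone|.
milestone_importance <- function(expression_matrix, time, milestones, ...) {
  x <- t(expression_matrix)  # ranger expects cells in rows, genes in columns
  if (is.null(colnames(x))) colnames(x) <- paste0("gene", seq_len(ncol(x)))
  per_milestone <- sapply(milestones, function(m) {
    d <- abs(time - m)  # assumed distance from every cell to milestone m
    fit <- ranger::ranger(x = x, y = d, importance = "impurity", ...)
    fit$variable.importance  # one impurity importance per gene
  })
  rowMeans(per_milestone)  # average each gene's importance over milestones
}

sketch_importance_correlation <- function(expression_matrix, pseudotime,
                                          truetime, milestones, ...) {
  est <- milestone_importance(expression_matrix, pseudotime, milestones, ...)
  tru <- milestone_importance(expression_matrix, truetime, milestones, ...)
  # Element names below are illustrative, not the package's return structure.
  list(estimated = est, true = tru,
       correlation = cor(est, tru, method = "pearson"))
}

# Simulated data (hypothetical): gene 1 tracks time, the rest are noise.
if (requireNamespace("ranger", quietly = TRUE)) {
  set.seed(1)
  n <- 60
  p <- 8
  truetime <- seq(0, 1, length.out = n)
  expr <- matrix(rnorm(p * n), nrow = p)
  expr[1, ] <- expr[1, ] + 3 * truetime
  pseudotime <- truetime + rnorm(n, sd = 0.05)
  res <- sketch_importance_correlation(expr, pseudotime, truetime,
                                       milestones = truetime[c(1, 30, 60)],
                                       num.trees = 100)
  str(res)
}
```

The example run is skipped when ranger is not installed, so the function definitions can still be sourced on their own.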

Usage

rf_importance_correlation(
  expression_matrix,
  pseudotime,
  truetime,
  milestones,
  ...
)

Arguments

expression_matrix

A p x n normalized gene expression matrix (p genes in rows, n cells in columns).

pseudotime

An n-vector of pseudotime estimates.

truetime

An n-vector of true time points.

milestones

A subset of the true time points, used to compute Gini impurity indices.

...

Additional parameters passed to ranger.
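A call with inputs of the shapes described above might look like the following sketch. The simulated data and the choice of milestones are illustrative assumptions, as is the expectation that `num.trees` (a real ranger argument) is forwarded through `...`. The call itself is skipped when pstmeval is not installed, and it assumes the function is exported by the package.

```r
set.seed(42)
p <- 20   # number of genes
n <- 100  # number of cells

# p x n expression matrix: genes in rows, cells in columns.
expression_matrix <- matrix(rnorm(p * n), nrow = p, ncol = n,
                            dimnames = list(paste0("gene", 1:p), NULL))

truetime <- sort(runif(n))                    # n-vector of true time points
pseudotime <- truetime + rnorm(n, sd = 0.1)   # n-vector of noisy estimates
milestones <- truetime[c(1, 50, 100)]         # subset of the true time points

if (requireNamespace("pstmeval", quietly = TRUE)) {
  result <- pstmeval::rf_importance_correlation(
    expression_matrix, pseudotime, truetime, milestones,
    num.trees = 200  # assumed to be passed on to ranger via ...
  )
  str(result)
}
```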

Details

A high correlation suggests that the estimated trajectory is informed by the same genes as the true trajectory, which we take as a surrogate for accuracy.
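As a toy illustration of this interpretation, the sketch below compares a made-up vector of "true" average importances with two hypothetical estimates: one that ranks the genes similarly and one that reverses the ranking. The numbers are invented for illustration and do not come from any real data set.

```r
# Hypothetical average importances for five genes under the true trajectory.
true_importance <- c(0.90, 0.70, 0.10, 0.05, 0.02)

# An estimate that emphasizes the same genes as the true trajectory.
est_similar <- c(0.80, 0.75, 0.15, 0.03, 0.04)

# An estimate that emphasizes the opposite genes.
est_reversed <- rev(true_importance)

cor_similar <- cor(est_similar, true_importance, method = "pearson")
cor_reversed <- cor(est_reversed, true_importance, method = "pearson")

print(c(similar = cor_similar, reversed = cor_reversed))
```

The first correlation is close to 1 and the second is negative, which is the sense in which the metric rewards trajectories driven by the same genes.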

Value

A list containing the gene importances according to pseudotime, the gene importances according to true time, and the Pearson correlation between the two.


pknight24/pstmeval documentation built on Nov. 18, 2020, 9:42 p.m.