rf_importance_correlation: Correlation between gene importances computed by random...
In pknight24/pstmeval: Functions for evaluation pseudotime estimates


View source: R/correlation.R

This metric is introduced in the review paper by Saelens et al. We first choose trajectory "milestones" according to our knowledge of the true trajectory. Then, for each milestone, we compute the geodesic distance from each cell to the milestone according to our estimated trajectory, and use the normalized gene expression data as a covariate matrix to predict these distances with a random forest model. We then extract the Gini impurity index for each gene (as a measure of gene "importance"). This process is repeated using the geodesic distances according to the true trajectory. After computing the Gini importances for each gene at all of the milestones, we average each gene's importance over the milestones, and compute the Pearson correlation between the estimated importances and the true importances.
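The steps above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the package's implementation: the helper names `milestone_importance` and `importance_correlation_sketch` are hypothetical, the trajectory is assumed to be linear so that the geodesic distance reduces to the absolute difference in time, and pseudotime is assumed to be on the same scale as the true time. Importances come from `ranger` with `importance = "impurity"`.

```r
library(ranger)

# Hypothetical helper: impurity importances from a forest that predicts each
# cell's distance to one milestone.
# ASSUMPTION: linear trajectory, so the distance is |time - milestone|.
milestone_importance <- function(expression_matrix, time, milestone, ...) {
  x <- t(expression_matrix)  # ranger expects cells in rows, genes in columns
  if (is.null(colnames(x))) {
    colnames(x) <- paste0("gene", seq_len(ncol(x)))
  }
  dist_to_milestone <- abs(time - milestone)
  fit <- ranger::ranger(x = x, y = dist_to_milestone,
                        importance = "impurity", ...)
  fit$variable.importance
}

# Hypothetical sketch of the whole metric (not the pstmeval source).
importance_correlation_sketch <- function(expression_matrix, pseudotime,
                                          truetime, milestones, ...) {
  # One column of gene importances per milestone.
  est <- do.call(cbind, lapply(milestones, function(m) {
    milestone_importance(expression_matrix, pseudotime, m, ...)
  }))
  tru <- do.call(cbind, lapply(milestones, function(m) {
    milestone_importance(expression_matrix, truetime, m, ...)
  }))
  # Average over milestones, then correlate the two importance vectors.
  est_mean <- rowMeans(est)
  tru_mean <- rowMeans(tru)
  list(estimated = est_mean,
       true = tru_mean,
       correlation = cor(est_mean, tru_mean, method = "pearson"))
}

# Small simulated example: only gene1 tracks the true time.
set.seed(1)
n <- 60
p <- 5
truetime <- seq(0, 1, length.out = n)
expr <- rbind(truetime + rnorm(n, sd = 0.1),
              matrix(rnorm((p - 1) * n), nrow = p - 1, ncol = n))
rownames(expr) <- paste0("gene", seq_len(p))
pseudotime <- truetime + rnorm(n, sd = 0.05)

res <- importance_correlation_sketch(expr, pseudotime, truetime,
                                     milestones = c(0, 0.5, 1),
                                     num.trees = 100)
```

Averaging over milestones before correlating means a gene that matters near only one milestone still contributes to the comparison, just with a smaller weight.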

rf_importance_correlation(
  expression_matrix,
  pseudotime,
  truetime,
  milestones,
  ...
)

`expression_matrix`	A p x n normalized gene expression matrix (p genes in rows, n cells in columns).
`pseudotime`	An n-vector of pseudotime estimates.
`truetime`	An n-vector of true time points.
`milestones`	A subset of the true time points, used to compute Gini impurity indices.
`...`	Additional parameters passed to `ranger`.
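As an illustration of the expected argument shapes, the following sketch builds simulated inputs and checks that they agree with each other. All of the data are made up for the example. The final call is guarded because it assumes that `pstmeval` is installed; `num.trees` is a standard `ranger` argument forwarded through `...`.

```r
set.seed(42)
n_cells <- 50
n_genes <- 10

# True time points: five sampling times, ten cells each (simulated).
truetime <- rep(c(0, 12, 24, 36, 48), each = 10)

# p x n matrix: genes in rows, cells in columns (simulated noise).
expression_matrix <- matrix(
  rnorm(n_genes * n_cells),
  nrow = n_genes, ncol = n_cells,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# Pseudotime estimates: one value per cell (here, a noisy copy of truetime).
pseudotime <- truetime + rnorm(n_cells, sd = 2)

# Milestones must be a subset of the true time points.
milestones <- c(0, 24, 48)

# Sanity checks on the shapes described in the argument list.
stopifnot(ncol(expression_matrix) == length(pseudotime),
          length(pseudotime) == length(truetime),
          all(milestones %in% truetime))

# Only runs if the package is available.
if (requireNamespace("pstmeval", quietly = TRUE)) {
  result <- pstmeval::rf_importance_correlation(
    expression_matrix, pseudotime, truetime, milestones,
    num.trees = 200
  )
  str(result)
}
```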

High correlation values are intended to indicate that the estimated trajectory is informed by the same genes as the true trajectory, which we can take as a surrogate for accuracy.

Returns a list containing the gene importances computed from pseudotime, the gene importances computed from true time, and the Pearson correlation between the two.

pknight24/pstmeval documentation built on Nov. 18, 2020, 9:42 p.m.