psupertime: Supervised pseudotime

View source: R/psupertime.R

Description

Supervised pseudotime

Usage

psupertime(
  x,
  y,
  y_labels = NULL,
  assay_type = "logcounts",
  sel_genes = "hvg",
  gene_list = NULL,
  scale = TRUE,
  smooth = TRUE,
  min_expression = 0.01,
  penalization = "1se",
  method = "proportional",
  score = "xentropy",
  n_folds = 5,
  test_propn = 0.1,
  lambdas = NULL,
  max_iters = 1000,
  seed = 1234
)

Arguments

x

Either a SingleCellExperiment object containing a genes * cells matrix, or a matrix of log TPM values (also genes * cells).

y

Vector of labels, which must have the same length as the number of columns in x. Factor levels will be taken as the intended order for training.
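
Because factor levels define the training order, it can help to set them explicitly rather than rely on R's default sorting. The sketch below uses hypothetical stage labels (illustrative data, not from the package) to show the difference.

```r
# Hypothetical stage labels for six cells (illustrative data only)
stage_chr <- c("E12", "E9", "E10", "E12", "E9", "E10")

# factor() sorts levels as character strings by default,
# which places "E9" after "E10" and "E12"
y_default <- factor(stage_chr)

# Setting the levels explicitly gives the intended biological order
y <- factor(stage_chr, levels = c("E9", "E10", "E12"))

print(levels(y_default))
print(levels(y))
```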

y_labels

Alternative ordering and/or subset of the labels in y. All labels must be present in y. Smoothing and scaling are done on the whole dataset, before any subsetting takes place.

assay_type

If a SingleCellExperiment object is used as input, specifies which assay is to be used.

sel_genes

Method used to select the genes that psupertime is trained on. Must be a string, with permitted values 'hvg', 'all', 'tf_mouse', 'tf_human' and 'list', corresponding to: highly variable genes, all genes, transcription factors in mouse, transcription factors in human, and a user-selected list. If sel_genes='list', then gene_list must also be specified, containing the user-selected genes. Alternatively, sel_genes may itself be a list specifying the parameters used for selecting highly variable genes via scran, with names 'hvg_cutoff', 'bio_cutoff' (optionally also 'span').

gene_list

If sel_genes is specified as 'list', gene_list specifies the list of user-specified genes.

scale

Should the log expression data for each gene be scaled to have mean zero and SD 1? Having the same scale ensures that L1-penalization functions properly; typically you would only set this to FALSE if you have already done your own scaling.
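
As a rough illustration of what per-gene scaling means for a genes * cells matrix, the base R sketch below standardizes each row to mean zero and SD 1. The matrix is simulated, and this is not necessarily how the package implements scaling internally.

```r
# Simulated genes * cells matrix (3 genes, 20 cells); illustrative only
set.seed(1)
x <- rbind(
  geneA = rnorm(20, mean = 2, sd = 1),
  geneB = rnorm(20, mean = 5, sd = 2),
  geneC = rnorm(20, mean = 8, sd = 3)
)

# scale() standardizes columns, so transpose to put genes in columns,
# scale, then transpose back to genes * cells
x_scaled <- t(scale(t(x)))

# Each gene now has mean ~0 and SD 1
print(round(rowMeans(x_scaled), 10))
print(apply(x_scaled, 1, sd))
```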

smooth

Should the data be smoothed over neighbours? This is done to denoise the data; if you have already done your own denoising, set this to FALSE.

min_expression

Cutoff for excluding genes that have non-zero expression in only a small proportion of cells; default is 0.01 (1% of cells).
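
A filter of this kind can be sketched in base R as follows. The toy matrix, the cutoff of 0.2 (chosen so that the effect is visible with only ten cells), and the use of >= rather than > at the boundary are all assumptions made for illustration.

```r
# Toy genes * cells matrix with 10 cells; illustrative only
x <- rbind(
  geneA = c(0, 0, 0, 0, 1.2, 0, 0, 0, 0, 0),           # expressed in 10% of cells
  geneB = c(2.1, 0, 3.3, 1.0, 0, 2.2, 0, 1.5, 0.7, 0), # expressed in 60% of cells
  geneC = rep(0, 10)                                   # never expressed
)

# Illustrative cutoff; the function's default is 0.01
cutoff <- 0.2

# Proportion of cells with non-zero expression, per gene
prop_expressed <- rowMeans(x > 0)

# Keep genes expressed in at least `cutoff` of cells (boundary handling assumed)
keep <- prop_expressed >= cutoff
x_filtered <- x[keep, , drop = FALSE]

print(prop_expressed)
print(rownames(x_filtered))
```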

penalization

Method of selecting the level of L1-penalization. 'best' uses the value of lambda giving the best cross-validation accuracy; '1se' uses the largest value of lambda within 1 standard error of the best, which increases sparsity with minimal increase in error (and is the default).
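
The two rules can be illustrated with a small worked example. The lambda values, mean cross-validation errors and standard errors below are made up; the sketch shows the general one-standard-error rule rather than the package's internal code.

```r
# Made-up cross-validation results for five lambda values (decreasing order)
lambdas <- c(1, 0.5, 0.25, 0.125, 0.0625)
cv_mean <- c(0.90, 0.62, 0.50, 0.48, 0.49)  # mean CV error (lower is better)
cv_se   <- c(0.05, 0.04, 0.03, 0.03, 0.03)  # standard error of the CV error

# 'best': the lambda with the lowest mean CV error
best_idx    <- which.min(cv_mean)
lambda_best <- lambdas[best_idx]

# '1se': the largest lambda whose error is within 1 SE of the best error
threshold  <- cv_mean[best_idx] + cv_se[best_idx]
lambda_1se <- max(lambdas[cv_mean <= threshold])

print(lambda_best)
print(lambda_1se)
```

A larger lambda means a stronger penalty, so the '1se' choice here keeps fewer genes than the 'best' choice.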

method

Statistical model used for ordinal logistic regression, one of 'proportional', 'forward' and 'backward', corresponding to cumulative proportional odds, forward continuation ratio and backward continuation ratio.

score

Cross-validated score used to select the model. May take values 'xentropy' (default) or 'class_error', corresponding to cross-entropy and classification error respectively. Cross-entropy is a smooth measure, while classification error is based on discrete labels and tends to be a bit 'lumpy'.
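
To make the difference between the two scores concrete, the sketch below computes both from a made-up matrix of predicted label probabilities. Averaging over cells (rather than summing) is an assumption, and the numbers are purely illustrative.

```r
# Made-up predicted probabilities: 3 cells (rows) * 3 ordered labels (columns)
probs <- rbind(
  c(0.7, 0.2, 0.1),
  c(0.1, 0.6, 0.3),
  c(0.2, 0.5, 0.3)
)
truth <- c(1, 2, 3)  # index of the true label for each cell

# Cross-entropy: mean negative log probability assigned to the true label
p_true   <- probs[cbind(seq_along(truth), truth)]
xentropy <- -mean(log(p_true))

# Classification error: proportion of cells whose most probable label is wrong
pred        <- max.col(probs, ties.method = "first")
class_error <- mean(pred != truth)

print(xentropy)
print(class_error)
```

Small changes in the probabilities always move the cross-entropy, whereas the classification error only changes when a predicted label flips, which is why it tends to be less smooth.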

n_folds

Number of folds to use for cross-validation; default is 5.

test_propn

Proportion of data to hold out for testing, separate to the cross-validation; default is 0.1 (10%).

lambdas

User-specified sequence of lambda values. Should be in decreasing order.

max_iters

Maximum number of iterations to run in glmnet.

seed

Random seed used when assigning cells to cross-validation folds and to the held-out test set.
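
The role of seed, test_propn and n_folds can be sketched in base R as below. This is one simple way to hold out test cells and assign folds reproducibly, assuming 100 cells; the package may split the data differently (for example, stratifying by label).

```r
# Illustrative settings, mirroring the defaults listed above
seed       <- 1234
n_cells    <- 100   # assumed number of cells
test_propn <- 0.1
n_folds    <- 5

set.seed(seed)

# Hold out a proportion of cells for testing
test_idx  <- sample(n_cells, size = round(test_propn * n_cells))
train_idx <- setdiff(seq_len(n_cells), test_idx)

# Assign the remaining cells to cross-validation folds of equal size
fold_id <- sample(rep(seq_len(n_folds), length.out = length(train_idx)))

print(length(test_idx))
print(table(fold_id))
```

Re-running with the same seed reproduces the same split, which keeps cross-validation results comparable between runs.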

Value

A psupertime object.


wmacnair/psupertime documentation built on July 10, 2020, 8:12 p.m.