runLineagePulse: LineagePulse wrapper: Differential expression analysis on...


Description

This function performs all steps of longitudinal or discrete differential expression analysis, either along a continuous covariate (such as pseudotime) or according to a grouping (such as clusters or conditions).

Usage

runLineagePulse(counts, dfAnnotation = NULL, vecConfoundersMu = NULL,
  vecConfoundersDisp = NULL, strMuModel = "splines",
  strDispModelFull = "constant", strDispModelRed = "constant",
  strDropModel = "logistic", strDropFitGroup = "PerCell",
  scaDFSplinesMu = 6, scaDFSplinesDisp = 3, matPiConstPredictors = NULL,
  vecNormConstExternal = NULL, boolEstimateNoiseBasedOnH0 = TRUE,
  scaMaxEstimationCycles = 20, scaNProc = 1, boolVerbose = TRUE,
  boolSuperVerbose = FALSE)

Arguments

counts

(matrix genes x cells (sparseMatrix or standard), SummarizedExperiment or file) One of:
Matrix: Count data of all cells; unobserved entries are NA.
SummarizedExperiment or SingleCellExperiment: Count data of all cells in assay(counts); annotation data can be supplied as colData(counts) or separately via dfAnnotation.
file: Path to a .mtx file from which the count matrix is read.
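As a minimal sketch of preparing counts as a sparse genes x cells matrix, the example below uses the Matrix package to build a toy matrix and round-trip it through a Matrix Market (.mtx) file. The gene and cell names, the values and the temporary file path are hypothetical. Note that the .mtx format stores only the numbers, so names have to be reattached after reading; how runLineagePulse itself assigns gene and cell IDs when given a file is not described on this page.

```r
library(Matrix)

# Hypothetical toy data: 3 genes x 4 cells with a few non-zero counts.
matCounts <- sparseMatrix(
  i = c(1, 2, 3, 1, 3),
  j = c(1, 1, 2, 3, 4),
  x = c(5, 2, 7, 1, 3),
  dims = c(3, 4),
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Write to a temporary Matrix Market file and read it back.
strMtxFile <- tempfile(fileext = ".mtx")
writeMM(matCounts, file = strMtxFile)
matRead <- as(readMM(strMtxFile), "CsparseMatrix")

# .mtx files carry no dimnames, so reattach gene and cell IDs.
dimnames(matRead) <- dimnames(matCounts)
```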

dfAnnotation

(data frame cells x meta characteristics) [Default NULL] Annotation table which contains meta data on cells. This data frame may instead be supplied as colData(counts) if counts is a SummarizedExperiment or SingleCellExperiment object. May contain the following columns:
cell: Cell IDs.
continuous: Pseudotemporal coordinates of cells.
Confounder1: Batch labels of cells with respect to the first confounder. The name is arbitrary: it could for example be "patient" with batch labels patientA, patientB, patientC.
Confounder2: As Confounder1 for another confounding variable. ... ConfounderX.
population: Fixed population assignments (for strMuModel="MM"). Cells not assigned have to be NA.
groups: Discrete grouping of cells (e.g. clusters or experimental conditions) which is used as population structure if strMuModel, strDispModelFull or strDispModelRed is "groups".
rownames: Must be the IDs from column cell.
Remaining entries in the table are ignored.
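The following is a hedged sketch of an annotation table with the columns described above: cell IDs, pseudotime in continuous, one confounder (named "patient" here, as in the description) and a discrete groups column. All values are hypothetical; only the column roles and the rownames requirement come from this page.

```r
# Hypothetical annotation for 6 cells: pseudotime, one batch confounder and a grouping.
dfAnnotation <- data.frame(
  cell = paste0("cell", 1:6),
  continuous = c(0.1, 0.5, 1.2, 2.0, 2.8, 3.5),   # pseudotemporal coordinates
  patient = c("patientA", "patientA", "patientB",
              "patientB", "patientC", "patientC"), # confounder with batch labels
  groups = c("early", "early", "early", "late", "late", "late"),
  stringsAsFactors = FALSE)

# rownames must be the IDs from column "cell".
rownames(dfAnnotation) <- dfAnnotation$cell
```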

vecConfoundersMu

(vector of strings, one per confounder on the mean) [Default NULL] Confounders to correct for in the mu batch correction model; must be a subset of the column names of dfAnnotation that describe confounding variables.

vecConfoundersDisp

(vector of strings, one per confounder on the dispersion) [Default NULL] Confounders to correct for in the dispersion batch correction model; must be a subset of the column names of dfAnnotation that describe confounding variables.
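Because both confounder vectors must name columns of dfAnnotation, it can help to check this before a long run. The helper checkConfounders below is hypothetical (it is not part of LineagePulse) and only applies the subset rule stated above.

```r
# Hypothetical helper: stop early if a requested confounder is not a column of dfAnnotation.
checkConfounders <- function(vecConfounders, dfAnnotation) {
  if (is.null(vecConfounders)) return(invisible(TRUE))  # NULL means no correction
  vecMissing <- setdiff(vecConfounders, colnames(dfAnnotation))
  if (length(vecMissing) > 0) {
    stop("Confounders not found in dfAnnotation: ",
         paste(vecMissing, collapse = ", "))
  }
  invisible(TRUE)
}

dfAnnotation <- data.frame(cell = paste0("cell", 1:4),
                           continuous = c(0, 1, 2, 3),
                           patient = c("A", "A", "B", "B"))
checkConfounders(c("patient"), dfAnnotation)  # passes silently
```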

strMuModel

(str) One of "constant", "groups", "MM", "splines", "impulse" [Default "splines"] Model according to which the mean parameter of each gene is fit as a function of population structure in the alternative model (H1).

strDispModelFull

(str) One of "constant", "groups", "splines" [Default "constant"] Model according to which the dispersion parameter of each gene is fit as a function of population structure in the alternative model (H1).

strDispModelRed

(str) One of "constant", "groups", "splines" [Default "constant"] Model according to which the dispersion parameter of each gene is fit as a function of population structure in the null model (H0).
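The alternative (H1) and null (H0) models form a nested pair per gene. This page does not state which test compares them; assuming a likelihood ratio test, which is the standard choice for nested maximum-likelihood models, a gene-wise p-value could be computed as in the sketch below. The function lrtPValue and the loglikelihoods and parameter counts in the usage line are hypothetical.

```r
# Likelihood ratio test between a full model (H1) and a nested reduced model (H0).
lrtPValue <- function(scaLogLikFull, scaLogLikRed, scaNParamFull, scaNParamRed) {
  scaDeviance <- 2 * (scaLogLikFull - scaLogLikRed)  # test statistic
  scaDF <- scaNParamFull - scaNParamRed              # difference in free parameters
  pchisq(scaDeviance, df = scaDF, lower.tail = FALSE)
}

# Hypothetical gene: H1 has 7 free parameters, H0 has 2.
scaP <- lrtPValue(-1200.4, -1210.9, scaNParamFull = 7, scaNParamRed = 2)
```

Across many genes, such p-values would typically be corrected for multiple testing, for example with p.adjust(..., method = "BH").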

strDropModel

(str) One of "logistic_ofMu", "logistic" [Default "logistic"] Definition of the drop-out model. "logistic_ofMu": include the fitted mean in the linear model of the drop-out rate, in addition to the offset and matPiConstPredictors. "logistic": only use the offset and matPiConstPredictors.
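To make the two drop-out definitions concrete, here is a sketch of a logistic drop-out rate for one cell across genes: a linear predictor built from an offset and the constant gene-wise predictors, optionally plus a term in the fitted mean, mapped to (0, 1) with the inverse logit. The function evalDropoutRate and all coefficient values are hypothetical, and entering the mean on the log scale is an assumption of this sketch rather than something stated on this page.

```r
# Hypothetical drop-out model for one cell, evaluated for each gene.
# "logistic":      logit(pi) = offset + matPiConstPredictors %*% coefficients
# "logistic_ofMu": additionally + scaMuCoef * log(mu)  (log scale is an assumption)
evalDropoutRate <- function(scaOffset, matPiConstPredictors, vecPredCoef,
                            vecMu = NULL, scaMuCoef = NULL) {
  vecLinPred <- scaOffset + as.vector(matPiConstPredictors %*% vecPredCoef)
  if (!is.null(vecMu)) {
    vecLinPred <- vecLinPred + scaMuCoef * log(vecMu)
  }
  plogis(vecLinPred)  # inverse logit: drop-out probabilities in (0, 1)
}

# One constant predictor per gene, e.g. a log mean count as in the example below.
matPred <- matrix(c(0.5, 2.0, 4.0), ncol = 1,
                  dimnames = list(c("g1", "g2", "g3"), "log_means"))
vecPiLogistic <- evalDropoutRate(1, matPred, vecPredCoef = -0.8)
vecPiOfMu <- evalDropoutRate(1, matPred, vecPredCoef = -0.8,
                             vecMu = c(2, 10, 50), scaMuCoef = -0.5)
```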

strDropFitGroup

(str) "PerCell", "AllCells" [Default "PerCell"] Definition of the groups of cells on which separate drop-out model parameterisations are fit. "PerCell": one parameterisation (fit) per cell. "AllCells": one parameterisation (fit) for all cells.

scaDFSplinesMu

(sca) [Default 6] If strMuModel=="splines", the degrees of freedom of the natural cubic spline to be used as a mean parameter model.

scaDFSplinesDisp

(sca) [Default 3] If strDispModelFull=="splines" or strDispModelRed=="splines", the degrees of freedom of the natural cubic spline to be used as a dispersion parameter model.
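The degrees of freedom set the number of columns of the natural cubic spline basis over pseudotime. The sketch below builds such bases with splines::ns and, purely to illustrate a spline mean trajectory, fits it with a Poisson GLM on simulated counts. The Poisson GLM is a swapped-in stand-in, not LineagePulse's zero-inflated negative binomial fit, and the pseudotime grid and trajectory are hypothetical.

```r
library(splines)

vecPseudotime <- seq(0, 10, length.out = 50)    # hypothetical pseudotime grid
matBasisMu <- ns(vecPseudotime, df = 6)         # as with scaDFSplinesMu = 6
matBasisDisp <- ns(vecPseudotime, df = 3)       # as with scaDFSplinesDisp = 3

# Simulate counts from a smooth log-mean trajectory (illustration only).
set.seed(1)
vecLogMu <- log(20) + sin(vecPseudotime / 3)
vecCounts <- rpois(length(vecPseudotime), exp(vecLogMu))

# Stand-in fit: log mean modelled as intercept plus spline basis.
objFit <- glm(vecCounts ~ matBasisMu, family = poisson())
```

Larger degrees of freedom allow more wiggly trajectories at the cost of more parameters per gene.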

matPiConstPredictors

(numeric matrix genes x number of constant gene-wise drop-out predictors) [Default NULL] Predictors for the logistic drop-out fit other than the offset and the mean parameter, i.e. externally supplied parameters which are constant for all observations of a gene. Set to NULL if no constant predictors are supplied.

vecNormConstExternal

(numeric vector, one entry per cell) [Default NULL] Model scaling factors, one per cell. These factors linearly scale the mean model when the loglikelihood is evaluated. Must be named according to the column names of counts.
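One simple way to obtain named per-cell scaling factors is library-size normalisation, sketched below. Using column sums scaled to mean 1 is an assumption of this example, not a recommendation from this page, and the toy count matrix is hypothetical. The key point is that colSums keeps the column names, so the factors are named by cell ID as required.

```r
# Hypothetical toy counts: 3 genes x 4 cells (filled column by column).
matCounts <- matrix(c(10, 0, 5,
                      20, 4, 6,
                       5, 1, 0,
                      15, 3, 2),
                    nrow = 3,
                    dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Library-size factors scaled to mean 1; names are inherited from colnames(matCounts).
vecLibSize <- colSums(matCounts)
vecNormConstExternal <- vecLibSize / mean(vecLibSize)
```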

boolEstimateNoiseBasedOnH0

(bool) [Default TRUE] Whether to co-estimate the logistic drop-out model with the constant null model (TRUE) or with the alternative model (FALSE). Co-estimating the drop-out model with the expression model typically increases the run-time of this estimation step substantially. The drop-out model is more accurate if it is estimated together with a more realistic expression model (the alternative model), but speed can be traded for accuracy by estimating the drop-out model based on the constant null expression model (TRUE).

scaMaxEstimationCycles

(integer) [Default 20] Maximum number of estimation cycles performed in fitZINB(). One cycle contains one estimation of each parameter of the zero-inflated negative binomial model, as in coordinate ascent.
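To illustrate what one coordinate of such a cycle looks like, the sketch below writes a zero-inflated negative binomial loglikelihood for a single gene and then updates only the mean, holding dispersion and drop-out rate fixed. The function zinbLogLik is hypothetical; it assumes the dispersion is the size parameter of R's dnbinom, that per-cell scaling factors multiply the mean (as described for vecNormConstExternal), and that NA counts are skipped. It is not LineagePulse's internal implementation.

```r
# Hypothetical ZINB loglikelihood for one gene across cells.
zinbLogLik <- function(vecCounts, vecMu, vecDisp, vecPi, vecNormConst = 1) {
  vecMuScaled <- vecMu * vecNormConst                # linear scaling of the mean
  vecNBZero <- dnbinom(0, size = vecDisp, mu = vecMuScaled)
  vecLik <- ifelse(vecCounts == 0,
                   vecPi + (1 - vecPi) * vecNBZero,  # drop-out or NB zero
                   (1 - vecPi) * dnbinom(vecCounts, size = vecDisp, mu = vecMuScaled))
  sum(log(vecLik), na.rm = TRUE)                     # unobserved (NA) entries skipped
}

# One coordinate-ascent step: maximise over the (constant) mean only,
# with dispersion = 20 and drop-out rate = 0.1 held fixed.
vecCounts <- c(0, 3, 7, 0, 5, 2)
objOpt <- optimize(function(scaLogMu) zinbLogLik(vecCounts, exp(scaLogMu), 20, 0.1),
                   interval = c(log(0.01), log(1000)), maximum = TRUE)
scaMuHat <- exp(objOpt$maximum)
```

A full cycle would repeat such updates for the dispersion and drop-out parameters, and cycles would stop at convergence or after scaMaxEstimationCycles.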

scaNProc

(scalar) [Default 1] Number of processes for parallelisation.

boolVerbose

(bool) [Default TRUE] Whether to follow convergence of the iterative parameter estimation with one report per cycle.

boolSuperVerbose

(bool) [Default FALSE] Whether to follow convergence of the iterative parameter estimation in high detail, with local convergence flags and step-by-step loglikelihood computation.

Details

This function is the wrapper function for the LineagePulse algorithm, which performs differential expression analysis along pseudotime or between discrete groups of cells. LineagePulse has many input parameters, but only a few will be relevant for most analyses; the remaining ones can usually be left at their defaults. See the vignette for further details on specific input parameters.

Value

dfDEAnalysis (data frame genes x reported variables) Summary of differential expression analysis:

Author(s)

David Sebastian Fischer

Examples

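library(LineagePulse)

# Simulate 100 cells and 30 genes: 10 constant, 10 linear and 10 impulse-shaped genes.
lsSimulatedData <- simulateContinuousDataSet(
    scaNCells = 100,
    scaNConst = 10,
    scaNLin = 10,
    scaNImp = 10,
    scaMumax = 100,
    scaSDMuAmplitude = 3,
    vecNormConstExternal = NULL,
    vecDispExternal = rep(20, 30),
    vecGeneWiseDropoutRates = rep(0.1, 30))
# Constant gene-wise drop-out predictor: log of the mean count of each gene.
matDropoutPredictors <- as.matrix(data.frame(
    log_means = log(rowMeans(lsSimulatedData$counts) + 1)))
# Fit spline mean models (H1) and the null models (H0), then test each gene.
objLP <- runLineagePulse(
    counts = lsSimulatedData$counts,
    dfAnnotation = lsSimulatedData$annot,
    strMuModel = "splines", scaDFSplinesMu = 6,
    strDropModel = "logistic",
    matPiConstPredictors = matDropoutPredictors)
# Inspect the last rows of the results table.
tail(objLP$dfResults)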

YosefLab/LineagePulse documentation built on May 6, 2019, 2:19 p.m.