flowReMix: Fitting a Mixture of Mixed Effect Models for Binomial Data


Description

flowReMix fits a mixture of mixed effect models to binomial or over-dispersed binomial data. The package was specifically designed for analyzing flow-cytometry cell-count data but may be suitable for other purposes as well.

Usage

flowReMix(formula, subject_id, cell_type = NULL, cluster_variable,
  data = parent.frame(), cluster_assignment = TRUE, weights = NULL,
  covariance = c("sparse", "dense", "diagonal"), ising_model = c("sparse",
  "dense", "none"), regression_method = c("betabinom", "binom", "sparse",
  "robust", "firth"), iterations = 80, parallel = TRUE, verbose = FALSE,
  control = NULL, keepSamples = FALSE, newSampler = FALSE)

Arguments

formula

an object of class formula. The response should be a two-column matrix, with the first column containing the counts of the cell subsets of interest and the second column the difference between the reference count and the cell count.

subject_id

a vector identifying the subjects.

cell_type

a factor vector identifying which cell type each row in the data set refers to.

cluster_variable

a variable with respect to which clustering will be done. See Details for more information.

data

a data frame containing the variables in the model. It is advisable to include the subject_id, cell_type and cluster_variable variables in the data frame.

cluster_assignment

an optional matrix of known cluster assignments. Must include all subject/cell_type combinations. See Details for more information.

weights

an optional vector of weights.

covariance

the method to be used for estimating the covariance structure of the random effects. pdsoft.cv will be used by default.

ising_model

a method for estimating the Ising model. Sparse neighborhood selection will be used by default.

regression_method

the regression method to be used. Default option is the glm function with family = "binomial".

iterations

the number of stochastic-EM iterations to perform.

parallel

logical. Use parallel processing to fit the model. Default TRUE.

verbose

logical. Whether to print information regarding the fitting process as the optimization algorithm runs.

control

an optional object of flowReMix_control class.

keepSamples

logical. Whether to keep all the samples; if so, the fitted object takes more memory. Default FALSE.

newSampler

logical. Whether to use the new sampler, which may or may not work. Default FALSE.

Details

flowReMix fits a mixture of mixed effects regression models for binomial data. Accordingly, the response supplied in the formula must be a two-column matrix, the first column of which is the number of successes and the second the number of failures. In the context of flow-cytometry counts, the left column would be the cell counts and the right column the parent counts minus the cell counts. The right side of the formula may include any number of fixed effects. For details on how the function processes the formula object see, for example, the documentation for the glm function.
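The response layout described above can be sketched in base R. This is only an illustration on simulated data: the data frame toy and its columns (subject, subset, stim, count, parentcount) are hypothetical names, and glm is used here solely because it accepts the same two-column response convention.

```r
# Hypothetical toy data: one row per subject / cell subset / stimulation.
set.seed(1)
toy <- data.frame(
  subject     = factor(rep(1:4, each = 4)),
  subset      = factor(rep(rep(c("CD4", "CD8"), each = 2), times = 4)),
  stim        = factor(rep(c("control", "antigen"), times = 8),
                       levels = c("control", "antigen")),
  parentcount = 10000L
)
# Simulated cell counts; the antigen rows get a higher success probability.
toy$count <- rbinom(nrow(toy), size = toy$parentcount,
                    prob = ifelse(toy$stim == "antigen", 0.02, 0.01))

# Two-column response: successes (cell counts) first,
# failures (parent count minus cell count) second.
response <- cbind(toy$count, toy$parentcount - toy$count)

# The same left-hand side written directly in a formula, here fit with glm.
fit <- glm(cbind(count, parentcount - count) ~ stim,
           family = binomial, data = toy)
```

Each row of response sums to the parent count, which is a quick sanity check that the second column holds failures rather than totals.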

The model fit by the function is a hierarchical one, assuming the existence of subjects and one or more cell-types for each subject. The subject_id variable identifies different rows in the dataset as corresponding to measurements taken from specific subjects. The model assumes the existence of a random intercept for each cell_type.

The cluster_variable identifies which of the covariates is the variable with respect to which clustering should be performed. The model assumes that the effect of the cluster variable (and of any corresponding interactions) is either zero or non-zero. For flow-cytometry experiments the cluster_variable will typically be an indicator for whether the stimulation introduced into the blood sample was an antigen or a control. A response status (zero or non-zero) is estimated for each subject/cell-type combination. The dependence between the cell-subsets is modeled via an Ising model.

cluster_assignment is an optional variable which allows the user to pre-specify some known cluster assignments. For example, in vaccine studies we could expect all subjects who received a placebo treatment to be non-responders across all cell-subsets. This variable should be a three-column matrix: the first column should contain all unique values of subject_id, the second column should contain all unique values of cell_type, and in total the matrix should include all subject_id and cell_type combinations. The third column is an integer which takes the value 0 if the cell-type/subject combination is a non-response, 1 if it is a response and -1 if the response status is unknown and must be estimated.
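A table with this layout could be assembled along the following lines. This is a sketch under assumptions: the subject and cell-type labels are hypothetical, and a data frame is used because a base R matrix cannot mix character labels with integer codes; whether flowReMix expects a data frame or a matrix here is not confirmed by this page.

```r
# Hypothetical labels; S3 is assumed to be a placebo recipient.
subjects   <- c("S1", "S2", "S3")
cell_types <- c("CD4", "CD8")

# One row per subject_id / cell_type combination.
assignment <- expand.grid(subject_id = subjects,
                          cell_type  = cell_types,
                          stringsAsFactors = FALSE)

# Third column: -1 = unknown (to be estimated), 0 = non-response, 1 = response.
assignment$status <- -1L
assignment$status[assignment$subject_id == "S3"] <- 0L
```

Starting from -1 everywhere and overwriting only the known cases keeps every combination present, as the description above requires.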

The fitting algorithm uses one of three methods for estimating the covariance structure of the random effects. A diagonal covariance structure will be estimated if covariance = "diagonal". A dense covariance structure with no penalization will be estimated if covariance = "dense". This may produce a singular covariance structure if the number of subjects is smaller than the number of cell-types. By default, a sparse covariance matrix is estimated via the pdsoft.cv function.
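The singularity warning for the dense option follows from a general property of sample covariance matrices, which the base R sketch below illustrates. It does not use flowReMix; the matrix effects is simulated and stands in, hypothetically, for per-subject random effects with one column per cell-type.

```r
set.seed(42)
n_subjects   <- 5
n_cell_types <- 8

# Hypothetical stand-in for random effects: rows are subjects, columns cell-types.
effects <- matrix(rnorm(n_subjects * n_cell_types),
                  nrow = n_subjects, ncol = n_cell_types)

# Unpenalized sample covariance across cell-types.
S <- cov(effects)

# The rank is at most n_subjects - 1, so S cannot be inverted
# when there are fewer subjects than cell-types.
covariance_rank <- qr(S)$rank
is_singular     <- covariance_rank < n_cell_types
```

With fewer subjects than cell-types, is_singular is TRUE regardless of the simulated values, which is why a diagonal or sparse estimate is the safer choice in that setting.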

The Ising model describing the dependence between the response/non-response status of the different cell-types can be estimated via three methods. If ising_model is set to "none" then an independence model is assumed. If ising_model is set to "dense" then the Ising model is estimated via a set of Firth regressions (logistf), one for each node in the graph. The default option is "sparse", where neighborhood selection with eBIC will be used.

regression_method specifies which function should be used for estimating the regression coefficients conditionally on the values of the random effects and cluster assignments. If the default option "binom" is chosen then a binomial model is fit using the glm function. If the "betabinom" option is selected then a beta-binomial regression model is estimated with the gamlss function. If the number of subjects is small and the number of predictors is large, we recommend the "sparse" method, which uses the cv.glmnet procedure.

Value

flowReMix returns an object of class flowReMix which contains the following variables:


RGLab/flowReMix documentation built on May 8, 2019, 5:55 a.m.