flowReMix fits a mixture of mixed-effects models to binomial or over-dispersed binomial data. The package was specifically designed for analyzing flow-cytometry cell-count data but may be suitable for other purposes as well.
flowReMix(formula, subject_id, cell_type = NULL, cluster_variable,
  data = parent.frame(), cluster_assignment = TRUE, weights = NULL,
  covariance = c("sparse", "dense", "diagonal"),
  ising_model = c("sparse", "dense", "none"),
  regression_method = c("betabinom", "binom", "sparse", "robust", "firth"),
  iterations = 80, parallel = TRUE, verbose = FALSE,
  control = NULL, keepSamples = FALSE, newSampler = FALSE)
Argument | Description
formula | an object of class formula specifying the model to be fitted (see Details).
subject_id | a vector identifying the subjects.
cell_type | a factor vector identifying which cell type each row in the data set refers to.
cluster_variable | the variable with respect to which clustering will be done. See Details.
data | a data frame containing the variables in the model. It is advisable to include the …
cluster_assignment | an optional matrix of known cluster assignments. Must include all subject/cell_type combinations. See Details.
weights | an optional vector of weights.
covariance | the method used to estimate the covariance structure of the random effects.
ising_model | the method used to estimate the Ising model. Sparse neighborhood selection is used by default.
regression_method | the regression method to be used. The default is "betabinom", the first value listed in the usage.
iterations | the number of stochastic-EM iterations to perform.
parallel |
verbose | whether to print information about the fitting process as the optimization algorithm runs.
control | an optional object of …
keepSamples |
newSampler |
flowReMix fits a mixture of mixed-effects regression models for binomial data. Accordingly, the response supplied in the formula must be a two-column matrix. The first column is the number of successes and the second column is the number of failures. For flow-cytometry counts, the first column would be the cell counts and the second column would be the parent counts minus the cell counts. The right-hand side of the formula can include any number of fixed effects. For details on how the function processes the formula object, see, for example, the documentation for the glm function.
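The base-R sketch below illustrates the response format. It builds the two-column response for a toy data set and fits it with an ordinary binomial glm, not with flowReMix. The data frame toy and its columns count, parent and stim are hypothetical. The same cbind(count, parent - count) left-hand side is the shape the flowReMix formula expects.

```r
set.seed(1)
# Hypothetical toy data: two rows (control and antigen) per subject,
# each with a parent count; column names are illustrative only
toy <- data.frame(
  subject = rep(sprintf("s%02d", 1:10), each = 2),
  stim    = factor(rep(c("ctrl", "antigen"), times = 10),
                   levels = c("ctrl", "antigen")),
  parent  = rpois(20, 5000)
)
# Simulated cell counts out of each parent count
toy$count <- rbinom(20, size = toy$parent,
                    prob = ifelse(toy$stim == "antigen", 0.004, 0.002))

# Two-column binomial response:
# successes = cell count, failures = parent count minus cell count
fit <- glm(cbind(count, parent - count) ~ stim, family = binomial, data = toy)
coef(fit)
```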
The model fit by the function is hierarchical: it assumes a set of subjects with one or more cell types measured for each subject. The subject_id variable identifies which rows in the data set correspond to measurements taken from each subject. The model assumes a random intercept for each cell_type.
The cluster_variable identifies which of the covariates is the variable with respect to which clustering should be performed. The model assumes that, for each subject/cell-type combination, the effect of the cluster variable (and the corresponding interactions) is either zero or non-zero. In flow-cytometry experiments, cluster_variable will typically indicate whether the stimulation introduced into the blood sample was an antigen or a control. A response status (zero or non-zero) is estimated for each subject/cell-type combination. The dependence between the cell subsets is modeled via an Ising model.
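The sketch below simulates data from a simplified version of this model to make the structure concrete. Each subject/cell-type combination gets a random intercept and a 0/1 response status. The stimulation indicator, which plays the role of cluster_variable, shifts the log-odds only for responders. The subset names and parameter values are hypothetical. The response statuses are drawn independently rather than from an Ising model. This illustrates the model's assumptions, not the package's code.

```r
set.seed(2)
n_subj     <- 30
cell_types <- c("CD4_IL2", "CD4_IFNg", "CD8_IFNg")  # hypothetical subsets
beta0      <- -6   # baseline log-odds (hypothetical value)
beta_stim  <- 1    # stimulation effect among responders (hypothetical value)

# One row per subject/cell-type combination
grid <- expand.grid(subject = seq_len(n_subj), cell_type = cell_types,
                    stringsAsFactors = FALSE)
# Response status per combination (independent here; flowReMix instead
# links the statuses of different cell subsets through an Ising model)
grid$z <- rbinom(nrow(grid), 1, 0.4)
# Random intercept per combination
grid$u <- rnorm(nrow(grid), sd = 0.5)

# Each combination is measured under control (0) and antigen (1) stimulation;
# merge() with no shared columns returns the Cartesian product
sim <- merge(grid, data.frame(stim = c(0, 1)))
sim$parent <- rpois(nrow(sim), 10000)
# The stimulation term contributes only when z == 1 (responders)
eta <- beta0 + sim$u + sim$z * beta_stim * sim$stim
sim$count <- rbinom(nrow(sim), sim$parent, plogis(eta))
```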
cluster_assignment is an optional variable that lets the user pre-specify some known cluster assignments. For example, in vaccine studies we would expect all subjects who received a placebo to be non-responders across all cell subsets. This variable should be a three-column matrix. The first column should contain all unique values of subject_id and the second column all unique values of cell_type, so that together the rows cover every subject_id and cell_type combination. The third column is an integer that takes the value 0 if the subject/cell-type combination is a non-response, 1 if it is a response, and -1 if the response status is unknown and must be estimated.
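Here is a minimal sketch of building cluster_assignment for the placebo example, using hypothetical subject identifiers and cell-subset names. The documentation asks for a three-column matrix. The sketch keeps a data frame so the status column stays integer, so convert it if your installed version of flowReMix requires a matrix.

```r
# Hypothetical design: subjects s01-s06, of whom s05 and s06 received placebo
subjects   <- sprintf("s%02d", 1:6)
placebo    <- c("s05", "s06")
cell_types <- c("CD4_IL2", "CD4_IFNg", "CD8_IFNg")

# Every subject/cell-type combination appears exactly once
assign <- expand.grid(subject_id = subjects, cell_type = cell_types,
                      stringsAsFactors = FALSE)
# 0 = known non-response, 1 = known response, -1 = unknown (to be estimated)
assign$status <- ifelse(assign$subject_id %in% placebo, 0L, -1L)
assign
```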
The fitting algorithm uses one of three methods to estimate the covariance structure of the random effects:
- covariance = "diagonal" estimates a diagonal covariance structure.
- covariance = "dense" estimates a dense covariance structure with no penalization. This may produce a singular covariance estimate if the number of subjects is smaller than the number of cell types.
- covariance = "sparse", the default, estimates a sparse covariance matrix via the pdsoft.cv function.
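The singularity warning for covariance = "dense" can be checked directly. An unpenalized sample covariance computed from fewer subjects than cell types has rank at most one less than the number of subjects. The sketch below uses random numbers as hypothetical stand-ins for random-effect estimates and compares the result with a diagonal estimate. It does not reproduce the pdsoft.cv estimator used by the sparse option.

```r
set.seed(3)
n_subjects   <- 5
n_cell_types <- 10
# Hypothetical random-effect estimates: one row per subject, one column per cell type
re <- matrix(rnorm(n_subjects * n_cell_types), nrow = n_subjects)

S <- cov(re)          # unpenalized, "dense"-style estimate
qr(S)$rank            # at most n_subjects - 1, so S is singular
D <- diag(diag(S))    # "diagonal"-style estimate keeps only the variances
qr(D)$rank            # full rank whenever all variances are positive
```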
The Ising model describing the dependence between the response/non-response statuses of the different cell types can be estimated via three methods:
- ising_model = "none" assumes an independence model.
- ising_model = "dense" estimates the Ising model via a set of Firth regressions (logistf), one for each node in the graph.
- ising_model = "sparse", the default, uses neighborhood selection with eBIC.
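Nodewise estimation of an Ising model can be sketched as one logistic regression per node, with each node regressed on all the others. The sketch below makes three simplifying assumptions:
- It uses base glm in place of the Firth regressions (logistf) used by the "dense" option.
- It performs no eBIC-based selection.
- It symmetrizes the two estimates of each edge by averaging, which is one common convention.

The response statuses are simulated so that subsets A and B are dependent and C is independent of both.

```r
set.seed(4)
# Hypothetical binary response statuses: rows = subjects, columns = cell subsets
n  <- 300
z1 <- rbinom(n, 1, 0.5)
z2 <- rbinom(n, 1, ifelse(z1 == 1, 0.8, 0.2))  # B depends on A
z3 <- rbinom(n, 1, 0.4)                        # C is independent of A and B
Z  <- cbind(A = z1, B = z2, C = z3)

p <- ncol(Z)
W <- matrix(0, p, p, dimnames = list(colnames(Z), colnames(Z)))
for (j in seq_len(p)) {
  # Regress node j on all remaining nodes (plain logistic regression)
  fit <- glm(Z[, j] ~ Z[, -j], family = binomial)
  W[j, -j] <- coef(fit)[-1]   # drop the intercept, keep edge weights
}
# Each edge is estimated twice (once from each endpoint); average the two
W_sym <- (W + t(W)) / 2
round(W_sym, 2)
```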
regression_method specifies which function is used to estimate the regression coefficients conditionally on the values of the random effects and the cluster assignments:
- "binom" fits a binomial model using the glm function.
- "betabinom", the first value listed in the usage and therefore the default, estimates a beta-binomial regression model with the gamlss function.
- "sparse" uses the cv.glmnet procedure. We recommend it when the number of subjects is small and the number of predictors is large.
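Over-dispersion motivates the beta-binomial option, and it can be diagnosed with a quasi-binomial glm. For counts generated with row-specific success probabilities, the estimated dispersion is well above 1, the value a pure binomial model implies. This is a diagnostic sketch on simulated data, not the gamlss beta-binomial fit that flowReMix performs.

```r
set.seed(5)
n      <- 200
parent <- rep(1000L, n)
# Beta-binomial counts: each row draws its own success probability
p      <- rbeta(n, 2, 50)
count  <- rbinom(n, parent, p)

# Quasi-binomial fit estimates a dispersion parameter instead of fixing it at 1
fit <- glm(cbind(count, parent - count) ~ 1, family = quasibinomial)
summary(fit)$dispersion   # well above 1 for these over-dispersed counts
```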
flowReMix returns an object of class flowReMix, which contains the following components:
coefficients: a list, each component of which is a vector of regression coefficients corresponding to a single cell type.
posteriors: a matrix containing the posterior probabilities of response computed for each subject/cell-type combination.
levelProbs: a vector of the estimated marginal probabilities of response for each cell subset.
randomEffects: the estimated random effects for each subject/cell-type combination.
covariance: the estimated covariance structure of the random effects.
isingCov: the estimated covariance structure of the Ising model.
dispersion: the over-dispersion estimated for each cell subset. If regression_method is not "betabinom", this is a vector of large constants.
assignmentList: a list of matrices containing the posterior cluster assignments sampled for each subject at the last iteration of the stochastic EM algorithm.
data: the input data frame.
subject_id: the value of the subject_id argument used in the call.
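Here is a sketch of turning posterior response probabilities into responder calls. It assumes posteriors can be treated as a numeric matrix with one row per subject and one column per cell subset. If your fitted object stores subject_id alongside the probabilities, drop that column first. The matrix below is random and hypothetical, and the 0.5 cutoff is an arbitrary choice.

```r
set.seed(6)
# Hypothetical stand-in for a posteriors matrix: rows = subjects, columns = cell subsets
post <- matrix(runif(8 * 3), nrow = 8,
               dimnames = list(sprintf("s%02d", 1:8),
                               c("CD4_IL2", "CD4_IFNg", "CD8_IFNg")))

calls <- post > 0.5   # TRUE = called a responder at the chosen cutoff
colMeans(calls)       # fraction of subjects called responders in each subset
```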