COMPASS: Fit the COMPASS Model
In COMPASS: Combinatorial Polyfunctionality Analysis of Single Cells


This function fits the COMPASS model.

COMPASS(
  data,
  treatment,
  control,
  subset = NULL,
  category_filter = function(x) colSums(x > 5) > 2,
  filter_lowest_frequency = 0,
  filter_specific_markers = NULL,
  model = "discrete",
  iterations = 40000,
  replications = 8,
  keep_original_data = FALSE,
  verbose = TRUE,
  dropDegreeOne = FALSE,
  init_with_fisher = FALSE,
  ...
)

`data`	An object of class `COMPASSContainer`.
`treatment`	An R expression, evaluated within the metadata, that returns `TRUE` for those samples that should belong to the treatment group. For example, if the samples that received a positive stimulation were named `"92TH023 Env"` within a variable in `meta` called `Stim`, you could write `Stim == "92TH023 Env"`. The expression should have the name of the stimulation vector on the left-hand side.
`control`	An R expression, evaluated within the metadata, that returns `TRUE` for those samples that should belong to the control group. See above for details.
`subset`	An expression used to subset the data. We keep only the samples for which the expression evaluates to `TRUE` in the metadata.
`category_filter`	A filter for the categories that are generated. This is a function that will be applied to the treatment counts matrix generated from the intensities. Only categories meeting the `category_filter` criteria will be kept.
`filter_lowest_frequency`	A number specifying how many of the least expressed markers should be removed.
`filter_specific_markers`	Similar to `filter_lowest_frequency`, but lets you explicitly exclude markers.
`model`	A string denoting which model to fit; currently, only the discrete model (`"discrete"`) is available.
`iterations`	The number of iterations (per 'replication') to perform.
`replications`	The number of 'replications' to perform. In order to conserve memory, we only keep the model estimates from the last replication.
`keep_original_data`	Boolean; if `TRUE`, the original `COMPASSContainer` is kept as part of the `COMPASS` output. Leave this as `FALSE` if memory or disk space is an issue.
`verbose`	Boolean; if `TRUE` we output progress information.
`dropDegreeOne`	Boolean; if `TRUE` we drop degree one categories and merge them with the negative subset.
`init_with_fisher`	Boolean; if `TRUE`, initialize from Fisher's exact test. Any subset and subject with lower 95 [...]. Otherwise, initialize every subject and subset as a responder except those where `ps <= pu`.
`...`	Other arguments; currently unused.
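
The `treatment`, `control`, and `subset` arguments are unevaluated expressions that are resolved against the metadata columns. The following is a minimal base R sketch of that idea, not the package's internal code: the helper `select_samples()`, the toy `meta` data frame, and the control label `"negctrl"` are all hypothetical names introduced for illustration.

```r
# Toy metadata; the columns `PTID` and `Stim` and the label "negctrl"
# are illustrative assumptions, not values taken from the package.
meta <- data.frame(
  PTID = c("p1", "p1", "p2", "p2"),
  Stim = c("92TH023 Env", "negctrl", "92TH023 Env", "negctrl"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: capture the expression unevaluated, then evaluate it
# with the metadata columns in scope (the same idea base::subset() uses).
select_samples <- function(meta, expr) {
  e <- substitute(expr)
  keep <- eval(e, envir = meta, enclos = parent.frame())
  stopifnot(is.logical(keep), length(keep) == nrow(meta))
  keep
}

is_treatment <- select_samples(meta, Stim == "92TH023 Env")
is_control   <- select_samples(meta, Stim == "negctrl")

meta[is_treatment, ]  # rows that would form the treatment group
meta[is_control, ]    # rows that would form the control group
```

Each expression should yield one logical value per metadata row, which is why the sketch checks the type and length of the result.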

A COMPASSResult is a list with the following components:

`fit`	A list of various fitted parameters resulting from the `COMPASS` model fitting procedure.
`data`	The data used as input to the `COMPASS` fitting procedure – in particular, the counts matrices generated for the selected categories, `n_s` and `n_u`, can be extracted from here.
`orig`	If `keep_original_data` was set to `TRUE` in the `COMPASS` fit, then this will be the `COMPASSContainer` passed in. This is primarily kept for easier running of the Shiny app.

The fit component is a list with the following components:

`alpha_s`	The hyperparameter shared across all subjects under the stimulated condition. It is updated through the `COMPASS` model fitting process.
`A_alphas`	The acceptance rate of `alpha_s`, as computed through the MCMC sampling process in `COMPASS`.
`alpha_u`	The hyperparameter shared across all subjects under the unstimulated condition. It is updated through the `COMPASS` model fitting process.
`A_alphau`	The acceptance rate of `alpha_u`, as computed through the MCMC sampling process in `COMPASS`.
`gamma`	An array of dimensions `I x K x T`, where `I` denotes the number of individuals, `K` denotes the number of categories / subsets, and `T` denotes the number of iterations. Each cell in a matrix for a given iteration is either zero or one, reflecting whether individual `i` is responding to the stimulation for subset `k`.
`mean_gamma`	A matrix of mean response rates. Each cell denotes the mean response of individual `i` and subset `k`.
`A_gamma`	The acceptance rate for `gamma`. Each element corresponds to the number of times an individual's `gamma` vector was updated.
`categories`	The category matrix, showing which categories entered the model.
`model`	The type of model called.
`posterior`	Posterior measures from the sample fit.
`call`	The matched call used to generate the model fit.
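
To make the relationship between `gamma` and `mean_gamma` concrete, here is a small base R sketch on simulated indicators. It assumes that `mean_gamma` is the average of `gamma` over the iteration dimension; whether the package averages over every stored iteration is an assumption here, and the dimensions and variable names below are illustrative.

```r
# Simulated 0/1 response indicators: I individuals x K subsets x iterations.
# These values are random toy data, not output from a COMPASS fit.
set.seed(1)
n_ind  <- 3   # I: individuals
n_cat  <- 2   # K: categories / subsets
n_iter <- 5   # T: iterations

gamma <- array(
  rbinom(n_ind * n_cat * n_iter, size = 1, prob = 0.5),
  dim = c(n_ind, n_cat, n_iter)
)

# Average over the third (iteration) dimension to get an I x K matrix:
# the fraction of iterations in which individual i responded for subset k.
mean_gamma <- apply(gamma, c(1, 2), mean)

# rowMeans() with dims = 2 computes the same averages more efficiently.
mean_gamma_fast <- rowMeans(gamma, dims = 2)
```

Under this reading, each entry of `mean_gamma` lies between 0 and 1 and can be interpreted as an estimated probability of response.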

The data component is a list with the following components:

`n_s`	The counts matrix for stimulated samples.
`n_u`	The counts matrix for unstimulated samples.
`counts_s`	Total cell counts for stimulated samples.
`counts_u`	Total cell counts for unstimulated samples.
`categories`	The categories matrix used to define which categories will enter the model.
`meta`	The metadata. Note that only individual-level metadata will be kept; sample-specific metadata is dropped.
`sample_id`	The name of the vector in the metadata used to identify the samples.
`individual_id`	The name of the vector in the metadata used to identify the individuals.

The `orig` component (included if `keep_original_data` is `TRUE`) is the `COMPASSContainer` object used in the model fit.

The category filter is used to exclude categories (combinations of markers expressed for a particular cell) that are expressed very rarely. It is applied to the treatment counts matrix, which is an N samples by K categories matrix. Those categories which are mostly unexpressed can be excluded here. For example, the default criterion,

category_filter=function(x) colSums(x > 5) > 2

indicates that we should only retain categories for which at least three samples had at least six cells expressing that particular combination of markers.
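
The effect of the default filter can be checked on a small counts matrix. The sketch below uses made-up counts and the placeholder category names `A`, `B`, and `C`; only the filter function itself comes from the usage above.

```r
# Toy treatment counts matrix: 4 samples (rows) by 3 categories (columns).
# The counts and the labels A, B, C are invented for illustration.
n_s <- matrix(
  c(10, 0, 6,
     8, 1, 6,
     7, 0, 2,
     0, 9, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("sample", 1:4), c("A", "B", "C"))
)

# The default filter: keep a category when more than 2 samples
# (that is, at least 3) have more than 5 cells (at least 6) in it.
category_filter <- function(x) colSums(x > 5) > 2

keep <- category_filter(n_s)

# Subset the columns; drop = FALSE keeps the result a matrix
# even when only one category survives.
filtered <- n_s[, keep, drop = FALSE]
```

Here only `A` is retained: three samples exceed five cells in `A`, while `B` has one such sample and `C` has two.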

See also `COMPASSContainer`, for constructing the data object required by `COMPASS`.

data(COMPASS) ## loads the COMPASSContainer 'CC'
fit <- COMPASS(CC,
  category_filter=NULL,
  treatment=trt == "Treatment",
  control=trt == "Control",
  verbose=FALSE,
  iterations=100 ## set higher for a real analysis
)