get_cluster_features: Calculate cluster features for model building
In ParkerICI/kumquat: Identification of stratifying subpopulations in Flow Cytometry Data (based on the Citrus package)


This function takes input data and (optionally) a table of sample metadata, and rearranges the data into a matrix of features to be used for model building.

get_cluster_features(
  tab,
  predictors = NULL,
  metadata.tab = NULL,
  variable.var = "variable",
  value.var = "value",
  endpoint.grouping = NULL,
  sample.col = "sample.id"
)

`tab`	A `data.frame` of data in "molten" format (see Details)
`predictors`	Columns in `tab` (after optionally merging with `metadata.tab`) that identify predictors. The data will be processed using `reshape2::dcast` according to the formula `endpoint.grouping1 + endpoint.grouping2 + ... ~ variable.var + predictors1 + predictors2 + ...`
`metadata.tab`	Optional. A `data.frame` containing sample-level metadata to be merged with `tab` (see Details)
`variable.var`	The column in `tab` that identifies the variable
`value.var`	The column in `tab` that identifies the value
`endpoint.grouping`	Columns in `tab` (after optionally merging with `metadata.tab`) that identify the grouping of the endpoint (see Details). The data will be processed using `reshape2::dcast` according to the formula `endpoint.grouping1 + endpoint.grouping2 + ... ~ variable.var + predictors1 + predictors2 + ...`. The combination of `predictors` and `endpoint.grouping` must uniquely identify every value in `tab`. The function will throw an error if this is not the case.
`sample.col`	Optional, only used if `metadata.tab` is provided. The name of the column that will be used to merge `tab` with `metadata.tab`

The input table needs to be in molten format (see reshape2::melt), with the variable.var and value.var columns identifying variables and their values (for instance cell population abundances). The metadata.tab, if provided, must contain a column (identified by the sample.col function argument) which matches the names of the samples in tab (i.e. the part after the @, for example "sample1"). The remaining columns in metadata.tab represent file-level metadata, which is used to identify the data corresponding to a given combination of predictors (see below).

An example will help clarify how this function works. Suppose you have collected data from multiple patients at multiple timepoints and under multiple stimulation conditions. In this case the metadata.tab would contain the following columns:

sample.id This is used to merge sample metadata with the input data (see above)
timepoint The timepoint information
condition The stimulation condition
subject The subject each file was derived from
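
As a rough illustration of the inputs described above, the following sketch builds a small metadata.tab and a molten tab, then joins them on sample.id. All sample names and values are made up for illustration, and the use of base merge here is only an assumption about how the function combines the two tables internally.

```r
# Hypothetical sample-level metadata (illustrative values only)
metadata.tab <- data.frame(
  sample.id = c("s1", "s2", "s3", "s4"),
  timepoint = c("t0", "t1", "t0", "t1"),
  condition = c("stim", "stim", "stim", "stim"),
  subject   = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical molten input: one row per sample and variable
tab <- data.frame(
  sample.id = rep(c("s1", "s2", "s3", "s4"), each = 2),
  variable  = rep(c("cluster_1", "cluster_2"), times = 4),
  value     = c(0.10, 0.90, 0.20, 0.80, 0.30, 0.70, 0.40, 0.60),
  stringsAsFactors = FALSE
)

# Assumed equivalent of the merge step: attach metadata to every row of tab
merged <- merge(tab, metadata.tab, by = "sample.id")
```

After the merge, every row of the molten table carries the timepoint, condition and subject of its sample, so those columns can be named in predictors or endpoint.grouping.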

Let's assume a few different scenarios.

You have subject level information (e.g. "responder" vs "non-responder") and you want to predict whether any combination of the timepoint and condition information predicts this outcome. In this case you would call the function with predictors = c("condition", "timepoint") and endpoint.grouping = "sample". The features in the resulting output would look like cluster_1_feature1_condition_timepoint
You have subject and timepoint level information, and you want to see if any of the stimulation conditions predicts it. In this case you would call the function with predictors = c("condition") and endpoint.grouping = c("sample", "timepoint"). The features in the resulting output would look like cluster_1_feature1_condition

Internally this function uses reshape2::dcast to structure the data in the appropriate format with the following formula (see the reshape2::dcast documentation for details on how the formula is interpreted): endpoint.grouping1 + endpoint.grouping2 + ... ~ variable.var + predictors1 + predictors2 + ...
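
The reshaping step can be approximated directly with reshape2::dcast. The sketch below is not the package's implementation: the data frame df, its values, and the choice of subject and timepoint as grouping columns are assumptions made purely to show how the formula is assembled and what the resulting column names look like.

```r
library(reshape2)

# Hypothetical molten data, already merged with sample metadata
df <- expand.grid(
  subject   = c("A", "B"),
  timepoint = c("t0", "t1"),
  condition = c("stim", "unstim"),
  variable  = c("cluster_1", "cluster_2"),
  stringsAsFactors = FALSE
)
df$value <- seq_len(nrow(df)) / 100

endpoint.grouping <- c("subject", "timepoint")
predictors <- "condition"

# Build: endpoint.grouping1 + ... ~ variable.var + predictors1 + ...
f <- as.formula(paste(
  paste(endpoint.grouping, collapse = " + "),
  "~",
  paste(c("variable", predictors), collapse = " + ")
))

# One row per subject/timepoint, one column per variable/condition pair
wide <- dcast(df, f, value.var = "value")
```

Here wide has one row per subject and timepoint, with feature columns such as cluster_1_stim. Note that plain dcast falls back to aggregating with length when the formula does not uniquely identify each value, whereas get_cluster_features is documented to throw an error in that case.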

Returns a matrix where each row corresponds to a combination of the levels of the variables specified in endpoint.grouping, and the columns are numeric features corresponding to combinations of the levels of the predictors.

ParkerICI/kumquat documentation built on Dec. 18, 2021, 6:40 a.m.
