get_cluster_features: Calculate cluster features for model building


View source: R/kumquat.R

Description

This function takes input data and (optionally) a table of sample metadata, and rearranges the data into a matrix of features to be used for model building.

Usage

get_cluster_features(
  tab,
  predictors = NULL,
  metadata.tab = NULL,
  variable.var = "variable",
  value.var = "value",
  endpoint.grouping = NULL,
  sample.col = "sample.id"
)

Arguments

tab

A data.frame of data in "molten" format (see Details)

predictors

Columns in tab (after optionally merging with metadata.tab) that identify predictors. The data will be processed using reshape2::dcast according to the formula endpoint.grouping1 + endpoint.grouping2 + ... ~ variable.var + predictors1 + predictors2 + ...

metadata.tab

Optional. A data.frame containing sample-level metadata to be merged with tab (see Details)

variable.var

The column in tab that identifies the variable

value.var

The column in tab that identifies the value

endpoint.grouping

Columns in tab (after optionally merging with metadata.tab) that identify the grouping of the endpoint (see Details). The data will be processed using reshape2::dcast according to the formula endpoint.grouping1 + endpoint.grouping2 + ... ~ variable.var + predictors1 + predictors2 + ... The combination of predictors and endpoint.grouping must uniquely identify every value in tab. The function will throw an error if this is not the case.

sample.col

Optional, only used if metadata.tab is provided. The name of the column that will be used to merge tab with metadata.tab
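The uniqueness requirement on predictors and endpoint.grouping can be checked before calling the function. The following is a minimal base R sketch, not the package's own check: the table, its column names and the choice to include the variable.var column in the key are all assumptions made for illustration.

```r
# Hypothetical molten table: one value per (sample, condition, variable)
tab <- data.frame(
  sample    = c("s1", "s1", "s2", "s2"),
  condition = c("stim", "unstim", "stim", "unstim"),
  variable  = "cluster_1",
  value     = c(0.10, 0.20, 0.30, 0.40),
  stringsAsFactors = FALSE
)

# Assumed key: variable.var + predictors + endpoint.grouping
key.cols <- c("variable", "condition", "sample")

# TRUE when no combination of the key columns occurs more than once
is.unique <- !any(duplicated(tab[, key.cols]))

# Repeating a row introduces a duplicated combination
tab.dup <- rbind(tab, tab[1, ])
has.dup <- any(duplicated(tab.dup[, key.cols]))
```

If has.dup is TRUE for your own data, the chosen predictors and endpoint.grouping do not identify each value uniquely, and more columns are probably needed in one of the two arguments.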

Details

The input table needs to be in molten format (see reshape2::melt), with variable.var and value.var columns identifying variables and their values (for instance cell population abundances).

The metadata.tab, if provided, must contain a column (identified by the sample.col function argument) which matches the names of the samples in tab (i.e. the part after the @, "sample1" in the above example). The rest of the columns in metadata.tab represent file-level metadata, which is used to identify the data corresponding to a given combination of predictors (see below).

An example will help clarify the working of this function. Suppose you have collected data from multiple patients at multiple timepoints and under multiple stimulation conditions. In this case the metadata.tab would look like this:
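As a minimal sketch of such inputs, the code below builds a hypothetical molten tab and metadata.tab and merges them on the sample.col column with base R. All column names and values are illustrative assumptions, and the sketch assumes that tab already carries a sample.id column; it is not the package's own example table or its internal merge.

```r
# Hypothetical molten table: one row per (file, variable) pair
tab <- data.frame(
  sample.id = rep(c("f1", "f2", "f3", "f4"), each = 2),
  variable  = rep(c("cluster_1", "cluster_2"), times = 4),
  value     = c(0.10, 0.90, 0.20, 0.80, 0.30, 0.70, 0.40, 0.60),
  stringsAsFactors = FALSE
)

# Hypothetical file-level metadata, keyed by the same sample.id column
metadata.tab <- data.frame(
  sample.id = c("f1", "f2", "f3", "f4"),
  sample    = c("p1", "p1", "p2", "p2"),
  timepoint = c("t1", "t1", "t1", "t1"),
  condition = c("stim", "unstim", "stim", "unstim"),
  stringsAsFactors = FALSE
)

# Attach the metadata columns to every row of the molten table
merged <- merge(tab, metadata.tab, by = "sample.id")
```

After the merge, columns such as sample, timepoint and condition are available to be named in predictors and endpoint.grouping.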

Let's assume a few different scenarios.

  1. You have subject-level information (e.g. "responder" vs "non-responder") and you want to predict whether any combination of the timepoint and condition information predicts this outcome. In this case you would call the function with predictors = c("condition", "timepoint") and endpoint.grouping = "sample". The features in the resulting output would look like cluster_1_feature1_condition_timepoint.

  2. You have subject- and timepoint-level information, and you want to see if any of the stimulation conditions predicts it. In this case you would call the function with predictors = c("condition") and endpoint.grouping = c("sample", "timepoint"). The features in the resulting output would look like cluster_1_feature1_condition.

Internally this function uses reshape2::dcast to structure the data in the appropriate format with the following formula (see the reshape2::dcast documentation for details on how the formula is interpreted): endpoint.grouping1 + endpoint.grouping2 + ... ~ variable.var + predictors1 + predictors2 + ...
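The two scenarios above can be sketched by calling reshape2::dcast directly with the corresponding formulas. The data below are hypothetical, and the sketch only illustrates how the formula shapes rows and columns; it does not reproduce any other processing that get_cluster_features may perform.

```r
library(reshape2)

# Hypothetical molten table (already merged with metadata):
# 2 samples x 2 timepoints x 2 conditions, a single variable
tab <- expand.grid(
  sample    = c("p1", "p2"),
  timepoint = c("t1", "t2"),
  condition = c("stim", "unstim"),
  variable  = "cluster_1_feature1",
  stringsAsFactors = FALSE
)
tab$value <- seq_len(nrow(tab)) / 10

# Scenario 1: endpoint.grouping = "sample",
# predictors = c("condition", "timepoint")
wide1 <- dcast(tab, sample ~ variable + condition + timepoint,
               value.var = "value")

# Scenario 2: endpoint.grouping = c("sample", "timepoint"),
# predictors = "condition"
wide2 <- dcast(tab, sample + timepoint ~ variable + condition,
               value.var = "value")

# Feature columns are named by joining the right-hand-side levels with "_"
names(wide1)
names(wide2)
```

In the first call each sample becomes one row with one column per condition and timepoint combination; in the second, each sample and timepoint pair becomes a row and only the condition is folded into the feature names.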

Value

Returns a matrix where each row corresponds to a combination of the levels of the variables specified in endpoint.grouping, and the columns are numeric features corresponding to combinations of the levels of the predictors.


ParkerICI/kumquat documentation built on Dec. 18, 2021, 6:40 a.m.