standardize_metadata: standardize_metadata
In ahnjedid/MetaConIdentifier: Metadata Confounding Identifier


View source: R/standardize_metadata.R

Description

A function that standardizes metadata by subsetting it to clinically relevant variables, specifying each variable's type, and converting all missing values to a single representation (NA).
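
The three steps described above can be sketched in base R. This is a hypothetical helper, `standardize_sketch()`, not the package's implementation: it ignores identifier handling, and it assumes that "numeric" maps to a numeric column, "categorical" to a factor, and "ordinal" to an ordered factor whose levels are sorted alphabetically. The documentation does not state any of these mappings.

```r
# Hypothetical base-R sketch of the three steps; NOT the package's code.
standardize_sketch <- function(metadata, variable_subset, variable_type_vec,
                               missing_value_lst = NULL) {
  # Keep only the requested variables (identifier handling is omitted)
  out <- metadata[, variable_subset, drop = FALSE]
  for (v in variable_subset) {
    x <- as.character(out[[v]])
    # Convert every listed missing-value code for this variable to NA
    codes <- missing_value_lst[[v]]
    if (!is.null(codes)) x[x %in% codes] <- NA
    # Coerce to the declared type (assumed type-to-class mapping)
    out[[v]] <- switch(variable_type_vec[[v]],
      numeric     = as.numeric(x),
      categorical = factor(x),
      ordinal     = factor(x, ordered = TRUE),  # levels sorted alphabetically
      stop("unknown variable type: ", variable_type_vec[[v]])
    )
  }
  out
}

meta <- data.frame(
  age   = c("52", "not reported", "67"),
  sex   = c("female", "male", "unknown"),
  stage = c("II", "I", "III"),
  batch = c("b1", "b1", "b2"),
  stringsAsFactors = FALSE
)
res <- standardize_sketch(meta,
  variable_subset   = c("age", "sex", "stage"),
  variable_type_vec = c(age = "numeric", sex = "categorical", stage = "ordinal"),
  missing_value_lst = list(age = "not reported", sex = "unknown"))
str(res)
```

Converting each column to character before matching lets the same missing-value codes apply whether the input column is stored as strings or as a factor.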

Usage

```r
standardize_metadata(
  metadata,
  first_column_as_id = TRUE,
  variable_subset,
  variable_type_vec,
  missing_value_lst = NULL
)
```

Arguments

`metadata`: The metadata corresponding to a gene count matrix.

`first_column_as_id`: Logical. If TRUE (the default), the first column of `metadata` is treated as the identifier/key; if FALSE, the row names are assumed to hold the identifiers.

`variable_subset`: A character vector naming the metadata variables to keep. These should be the most clinically and population-relevant variables, such as age, sex, and race.

`variable_type_vec`: A named character vector giving the type of each variable. Three types are supported: "categorical", "numeric", and "ordinal".

`missing_value_lst`: A named list giving, for each variable, the value(s) that encode missing data, if any. Defaults to NULL.
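
Before calling `standardize_metadata()`, it can help to check that the arguments agree with one another. The helpers below, `id_column_to_rownames()` and `check_standardize_args()`, are hypothetical base-R sketches, not part of MetaConIdentifier. They assume that the names of `variable_type_vec` and `missing_value_lst` refer to variables in `variable_subset`, which the argument descriptions imply but do not state. The first helper moves an identifier column into the row names, the layout described for `first_column_as_id = FALSE`.

```r
# Hypothetical helpers; not part of MetaConIdentifier.

# Move the first (identifier) column into the row names, giving the layout
# that first_column_as_id = FALSE describes.
id_column_to_rownames <- function(metadata) {
  rownames(metadata) <- as.character(metadata[[1]])
  metadata[, -1, drop = FALSE]
}

# Check that the arguments are mutually consistent (assumed requirements).
check_standardize_args <- function(metadata, variable_subset, variable_type_vec,
                                   missing_value_lst = NULL) {
  allowed <- c("categorical", "numeric", "ordinal")
  absent <- setdiff(variable_subset, colnames(metadata))
  if (length(absent) > 0)
    stop("not in metadata: ", paste(absent, collapse = ", "))
  untyped <- setdiff(variable_subset, names(variable_type_vec))
  if (length(untyped) > 0)
    stop("no type given for: ", paste(untyped, collapse = ", "))
  bad <- setdiff(unique(variable_type_vec), allowed)
  if (length(bad) > 0)
    stop("unsupported type(s): ", paste(bad, collapse = ", "))
  extra <- setdiff(names(missing_value_lst), variable_subset)
  if (length(extra) > 0)
    stop("missing_value_lst names unknown variable(s): ",
         paste(extra, collapse = ", "))
  invisible(TRUE)
}

meta <- data.frame(sample_id = c("s1", "s2"), age = c("52", "67"),
                   sex = c("female", "unknown"), stringsAsFactors = FALSE)
meta_rn <- id_column_to_rownames(meta)
check_standardize_args(meta_rn,
  variable_subset   = c("age", "sex"),
  variable_type_vec = c(age = "numeric", sex = "categorical"),
  missing_value_lst = list(sex = "unknown"))
```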

Value

A data.frame of the cleaned metadata in which each column's class reflects its specified variable type and all missing values are converted to NA. The returned object also carries the class metaStandard in addition to data.frame.
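
One way to inspect a value of this kind is to tabulate each column's class and NA count. The helper `summarize_columns()` below is a hypothetical base-R sketch. The toy data frame stands in for real output and assumes that categorical, ordinal, and numeric variables become factor, ordered factor, and numeric columns, and that the class vector is ordered c("metaStandard", "data.frame"). The documentation specifies neither.

```r
# Hypothetical helper: report the class and number of NAs in each column.
summarize_columns <- function(df) {
  data.frame(
    variable = names(df),
    class    = unname(vapply(df, function(x) class(x)[1], character(1))),
    n_na     = unname(vapply(df, function(x) sum(is.na(x)), integer(1))),
    stringsAsFactors = FALSE
  )
}

# Toy stand-in for a standardized result (assumed column types and class order).
toy <- data.frame(age   = c(52, NA),
                  sex   = factor(c("female", NA)),
                  stage = factor(c("I", "II"), ordered = TRUE))
class(toy) <- c("metaStandard", "data.frame")

summarize_columns(toy)
inherits(toy, "metaStandard")
```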

Examples

```r
# Uses the TCGA example metadata included in the package.
library(MetaConIdentifier)
tcga_meta_new <- standardize_metadata(tcga_meta_original,
  first_column_as_id = FALSE,
  variable_subset = tcga_variable_subset,
  variable_type_vec = tcga_variable_type_vec,
  missing_value_lst = NULL)

# The standardized metadata should have two classes: data.frame and metaStandard.
class(tcga_meta_new)
```