standardize_metadata: standardize_metadata


View source: R/standardize_metadata.R

Description

Standardizes metadata by restricting it to a subset of clinically relevant variables, assigning each variable a type, and converting all missing-value codes to a single format (NA).
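The three steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the MetaConIdentifier implementation: the helper standardize_sketch() and the toy data frame are hypothetical, and the mapping of types to R classes (categorical to factor, numeric to numeric, ordinal to ordered factor) is assumed rather than taken from the package.

```r
# Hypothetical sketch, NOT the MetaConIdentifier source.
# Assumed class mapping: categorical -> factor, numeric -> numeric,
# ordinal -> ordered factor.
standardize_sketch <- function(metadata, first_column_as_id = TRUE,
                               variable_subset, variable_type_vec,
                               missing_value_lst = NULL) {
  # Use the first column as the identifier (row names), if requested
  if (first_column_as_id) {
    rownames(metadata) <- as.character(metadata[[1]])
    metadata <- metadata[, -1, drop = FALSE]
  }
  # Step 1: keep only the requested variables
  metadata <- metadata[, variable_subset, drop = FALSE]
  for (v in variable_subset) {
    x <- metadata[[v]]
    # Step 3 is done first so that type coercion sees clean values:
    # replace this variable's missing-value codes (if any) with NA
    if (!is.null(missing_value_lst[[v]])) {
      x[as.character(x) %in% missing_value_lst[[v]]] <- NA
    }
    # Step 2: coerce the column according to its declared type
    metadata[[v]] <- switch(variable_type_vec[[v]],
      categorical = factor(x),
      numeric     = as.numeric(as.character(x)),
      ordinal     = factor(x, ordered = TRUE),  # levels sorted alphabetically
      stop("unknown variable type: ", variable_type_vec[[v]])
    )
  }
  metadata
}

# Toy metadata (hypothetical values)
meta <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  age       = c("54", "-999", "61"),
  sex       = c("F", "M", "Unknown"),
  stage     = c("II", "I", "III"),
  batch     = c("b1", "b1", "b2"),
  stringsAsFactors = FALSE
)
out <- standardize_sketch(
  meta,
  first_column_as_id = TRUE,
  variable_subset   = c("age", "sex", "stage"),
  variable_type_vec = c(age = "numeric", sex = "categorical", stage = "ordinal"),
  missing_value_lst = list(age = "-999", sex = "Unknown")
)
str(out)
```

Because factor(x, ordered = TRUE) sorts levels alphabetically, a real ordinal variable usually needs its level order supplied explicitly; the sketch leaves that out for brevity.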

Usage

standardize_metadata(
  metadata,
  first_column_as_id = TRUE,
  variable_subset,
  variable_type_vec,
  missing_value_lst = NULL
)

Arguments

metadata

The corresponding metadata for a gene count matrix.

first_column_as_id

Logical value specifying whether the first column of the metadata is the identifier (key). If FALSE, the row names are assumed to be the identifiers.

variable_subset

A character vector naming the metadata variables to keep. These should be the most clinically and population-relevant variables, such as age, sex, and race.

variable_type_vec

A named character vector giving the type of each variable. Three types are supported: categorical, numeric, and ordinal.

missing_value_lst

A named list giving the value(s) that denote missing data in each variable, if any. Defaults to NULL.
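Since variable_subset, variable_type_vec, and missing_value_lst presumably need to agree with each other and with the metadata, a quick consistency check before calling standardize_metadata() can catch mistakes early. The sketch below assumes variable_type_vec is a named character vector and missing_value_lst is a named list or NULL, as described above; check_standardize_args() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not part of MetaConIdentifier): checks that the
# arguments described above are mutually consistent before standardizing.
check_standardize_args <- function(metadata, variable_subset,
                                   variable_type_vec, missing_value_lst = NULL) {
  allowed <- c("categorical", "numeric", "ordinal")
  # Every requested variable must exist in the metadata
  absent <- setdiff(variable_subset, colnames(metadata))
  if (length(absent) > 0)
    stop("not in metadata: ", paste(absent, collapse = ", "))
  # Every requested variable needs a declared type
  untyped <- setdiff(variable_subset, names(variable_type_vec))
  if (length(untyped) > 0)
    stop("no type given for: ", paste(untyped, collapse = ", "))
  # Only the three documented types are accepted
  bad <- setdiff(unique(unname(variable_type_vec)), allowed)
  if (length(bad) > 0)
    stop("unknown type(s): ", paste(bad, collapse = ", "))
  # Assumption: missing-value codes should refer only to retained variables
  extra <- setdiff(names(missing_value_lst), variable_subset)
  if (length(extra) > 0)
    stop("missing-value codes for unselected variables: ",
         paste(extra, collapse = ", "))
  invisible(TRUE)
}

# Toy metadata (hypothetical values)
meta <- data.frame(age = c(54, 61), sex = c("F", "M"), stringsAsFactors = FALSE)
ok <- check_standardize_args(meta, c("age", "sex"),
                             c(age = "numeric", sex = "categorical"),
                             list(sex = "Unknown"))
msg <- tryCatch(
  check_standardize_args(meta, c("age", "sex"), c(age = "numeric")),
  error = function(e) conditionMessage(e)
)
print(msg)  # [1] "no type given for: sex"
```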

Value

A data.frame of the cleaned metadata, in which each column's class reflects its variable type and all missing values are converted to NA. The returned object also carries the class metaStandard (see Examples).

Examples

# Using the example TCGA metadata (tcga_meta_original) shipped with the package.
library(MetaConIdentifier)
tcga_meta_new <- standardize_metadata(tcga_meta_original,
  first_column_as_id = FALSE,
  variable_subset = tcga_variable_subset,
  variable_type_vec = tcga_variable_type_vec,
  missing_value_lst = NULL)

# The cleaned metadata should carry 2 classes: data.frame and metaStandard.
class(tcga_meta_new)

# Inspect the class assigned to each retained column.
sapply(tcga_meta_new, class)

ahnjedid/MetaConIdentifier documentation built on Dec. 18, 2021, 11:26 p.m.