harmonize_pheno_data: Harmonize phenotype data

View source: R/functions_pheno_data.R

harmonize_pheno_data    R Documentation

Description

Extends the original phenotype data with new variables that have harmonized names and values, including sample and subject identifiers. Values of numeric variables are converted to numeric. Values of character variables are mapped to pre-specified values. For time variables, numbers are extracted and converted to days. If a study contains only patients with a specific disease, that disease can be set globally for all samples via the disease argument.
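To make the handling of time variables concrete, the following is a minimal sketch of "extract the number and convert to days". It is not the package's implementation: the helper name extract.days, the unit keywords and the conversion factors (7 days per week, 30 per month, 365 per year) are all assumptions made for illustration.

```r
# Hypothetical helper, NOT part of harmonizeGeneExprData.
# Assumption: the unit is written in the value itself ("week", "month", "year");
# values without a recognized unit are treated as days.
extract.days = function(x) {
  # take the first number in each string; strings without a number become NA
  num = suppressWarnings(as.numeric(
    sub("^[^0-9]*([0-9]+(\\.[0-9]+)?).*$", "\\1", x)))
  # assumed conversion factors to days
  fac = ifelse(grepl("week", x, ignore.case = TRUE), 7,
        ifelse(grepl("month", x, ignore.case = TRUE), 30,
        ifelse(grepl("year", x, ignore.case = TRUE), 365, 1)))
  num * fac
}

days = extract.days(c("12 weeks", "3 days", "baseline"))
```

Under these assumptions "12 weeks" is converted to 84 and "3 days" to 3, while "baseline" contains no number and stays NA.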

Usage

harmonize_pheno_data(
  project,
  pheno,
  info.var,
  col.id,
  ind.use.id = NULL,
  cols.use,
  disease = NULL
)

Arguments

project

[character(1)] name of project (used as prefix for all harmonized variables)

pheno

[data.frame] original phenotype data (e.g. as returned by extract_pheno_data)

info.var

[list] project level information about variables that should be harmonized (see Details)

col.id

[character(1)] column with information about subject identifiers

ind.use.id

[numeric(1)] index of the part of the subject identifier that should be kept after splitting at " ", "_" or "-" (default: NULL)

cols.use

[vector(n)] columns of pheno used for harmonization, named by the variable names given in info.var

disease

[character(1)] disease that should be set for all samples (default: NULL)
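The effect of ind.use.id can be pictured with a small base R sketch. The helper keep.id.part and the example identifiers are hypothetical, and returning the identifier unchanged when ind.use.id is NULL is an assumption, not documented behaviour.

```r
# Hypothetical helper, NOT part of harmonizeGeneExprData.
keep.id.part = function(id, ind.use.id = NULL) {
  # assumption: NULL means the identifier is used as is
  if (is.null(ind.use.id)) return(id)
  # split at space, underscore or hyphen
  parts = strsplit(id, "[ _-]")
  # keep the requested part (NA if an identifier has fewer parts)
  vapply(parts, function(p) p[ind.use.id], character(1))
}

ids = keep.id.part(c("patient_12", "patient-7 A"), ind.use.id = 2)
```

Here ids contains "12" and "7", i.e. the second part of each made-up identifier.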

Details

info.var contains information about all variables that should be harmonized within a project, i.e. across studies. The list needs to be named by the names of the harmonized variables, and each element contains:

  • type: either "character", "numeric" or "time"

  • values: list whose names are the final (harmonized) values and whose elements are the original values that should be mapped to them (can include regular expressions)
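How a values list could be applied to a character variable is sketched below. This is only an illustration under the assumption that patterns are matched with grepl and that later entries overwrite earlier ones. The helper map.values is hypothetical and not the package's own code.

```r
# Hypothetical helper, NOT part of harmonizeGeneExprData.
map.values = function(x, values) {
  # unmatched original values stay NA
  out = rep(NA_character_, length(x))
  for (final in names(values)) {
    for (pattern in values[[final]]) {
      # each pattern is treated as a regular expression
      out[grepl(pattern, x)] = final
    }
  }
  out
}

sex = map.values(
  c("female", "male", "unknown"),
  list(female = "female", male = "^male"))
```

In this sketch the anchor in "^male" matters: an unanchored "male" would also match "female" and overwrite it, whereas the anchored pattern yields "female", "male" and NA for the unmatched value.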

Value

[data.frame] harmonized phenotype data

Examples

library(harmonizeGeneExprData)

# example study
study.id = "GSE67785"

# extract phenotype data from GEO
pheno.original = extract_pheno_data(
  study.id = study.id)

# prepare information about variables to be harmonized
info.var = list(
  lesional = list(
      type = "character",
      values = list(
         lesional = "PP",
         nonlesional = "PN")),
  sex = list(
      type = "character",
      values = list(
         female = "female",
         male = "^male")),
  tissue = list(
      type = "character",
      values = list(skin = "skin")))

# define columns that should be harmonized
cols.use = c(
  lesional = "group:ch1",
  tissue = "source_name_ch1",
  sex = "gender:ch1")

pheno = harmonize_pheno_data(
  project = "project",
  pheno = pheno.original,
  info.var = info.var,
  col.id = "patient:ch1",
  cols.use = cols.use)

# inspect the first six columns of the harmonized phenotype data
head(pheno[, 1:6])

szymczak-lab/harmonizeGeneExprData documentation built on Dec. 1, 2022, 9:07 p.m.