convert_biodata: Format biological data

View source: R/convert_biodata.R

convert_biodata    R Documentation

Format biological data

Description

Merges gene and cell datasets that share TCGA sample identifiers, splits the samples into two groups according to whether the expression level of a selected gene is below or above a chosen statistic (the mean by default), and formats the result as a 3-column data frame: expression group, cell type, and estimated cell-type abundance.
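The three steps described above (merge, split, reshape) can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy data, the column names sample, study, T_cells and B_cells, and the convention that high is TRUE for samples above the statistic are all made up for the example.

```r
# Toy stand-ins for the `genes` and `cells` inputs: two identifier columns
# followed by numeric columns. All names and values here are hypothetical.
genes <- data.frame(
  sample = c("S1", "S2", "S3", "S4"),
  study  = "BRCA",
  ICOS   = c(1.0, 2.0, 3.0, 6.0)
)
cells <- data.frame(
  sample  = c("S1", "S2", "S3", "S4"),
  study   = "BRCA",
  T_cells = c(0.10, 0.20, 0.30, 0.40),
  B_cells = c(0.05, 0.15, 0.25, 0.35)
)

# 1. Merge on the shared identifier columns.
merged <- merge(genes[, c("sample", "study", "ICOS")], cells,
                by = c("sample", "study"))

# 2. Split samples around the chosen statistic (here the mean).
#    Assumption: TRUE marks samples strictly above the statistic.
merged$high <- merged$ICOS > mean(merged$ICOS)

# 3. Reshape to long format: one row per sample and cell type.
cell_cols <- c("T_cells", "B_cells")
formatted <- data.frame(
  high  = rep(merged$high, times = length(cell_cols)),
  cells = factor(rep(cell_cols, each = nrow(merged)), levels = cell_cols),
  value = unlist(merged[cell_cols], use.names = FALSE)
)
formatted
```

With four samples and two cell types, the sketch yields eight rows, one per sample and cell type.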

Usage

convert_biodata(
  genes,
  cells,
  select = colnames(genes)[3],
  stat = "mean",
  disease = NULL,
  tissue = NULL
)

Arguments

genes

data frame whose first two columns contain identifiers and whose remaining columns contain numeric gene expression values.

cells

data frame whose first two columns contain identifiers and whose remaining columns contain numeric cell-type abundance estimates.

select

character, the name of the column in genes used to split the samples. Defaults to the third column of genes.

stat

character, the statistic used to split the samples: one of "mean", "median" or "quantile". Defaults to "mean".

disease

character for the type of TCGA cancer (see the list in extdata/disease_names.csv).

tissue

character for the type of TCGA tissue, one of: 'Additional - New Primary', 'Additional Metastatic', 'Metastatic', 'Primary Blood Derived Cancer - Peripheral Blood', 'Primary Tumor', 'Recurrent Tumor', 'Solid Tissue Normal'.

Details

The disease and tissue arguments are used for the title of plot.biodata(). Supply them only if the genes argument does not already carry this information in its attributes.
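The precedence between attributes and arguments can be pictured with a small base R sketch. The helper resolve_label() and the attribute names "disease" and "tissue" are hypothetical, chosen only for illustration; the package may store and look up this metadata differently.

```r
# Hypothetical helper: prefer metadata stored on `genes` as an attribute,
# otherwise fall back to the value passed as an argument.
resolve_label <- function(genes, name, fallback = NULL) {
  stored <- attr(genes, name, exact = TRUE)
  if (!is.null(stored)) stored else fallback
}

# Toy data frame carrying a "disease" attribute but no "tissue" attribute.
genes <- data.frame(sample = c("S1", "S2"), ICOS = c(1.5, 2.5))
attr(genes, "disease") <- "Breast invasive carcinoma"

# The stored attribute wins over the argument.
resolve_label(genes, "disease", fallback = "Lung adenocarcinoma")

# No attribute is stored, so the argument is used.
resolve_label(genes, "tissue", fallback = "Primary Tumor")
```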

Value

data frame with the following columns:

  • high (logical): the expression group of the selected gene, TRUE for samples above the chosen statistic and FALSE for samples below it.

  • cells (factor): cell types.

  • value (float): the estimated abundance of each cell type.
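A data frame in this long format lends itself to grouped summaries. The sketch below builds a toy data frame with the three documented columns (the cell-type names and values are made up) and computes the median abundance per cell type within each expression group using base R.

```r
# Toy data frame with the three documented columns (values are made up).
df <- data.frame(
  high  = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  cells = factor(c("T_cells", "B_cells", "T_cells",
                   "B_cells", "T_cells", "B_cells")),
  value = c(0.4, 0.1, 0.2, 0.3, 0.6, 0.5)
)

# Median abundance per cell type within each expression group.
medians <- aggregate(value ~ cells + high, data = df, FUN = median)
medians
```

Because each row holds a single observation, the same formula interface works with other summary functions such as mean or length.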

Examples

data(tcga)
(df_formatted <- convert_biodata(tcga$genes, tcga$cells$Cibersort, "ICOS"))

tcgaViz documentation built on April 4, 2023, 5:14 p.m.