concatenate_protein: Concatenate GDC files into a single matrix and prepare the...


View source: R/concatenate_protein.R

Description

concatenate_protein concatenates GDC files into a single matrix in which columns correspond to patient codes and rows correspond to data names.
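The layout described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper `concatenate_sketch` and the toy per-patient tables are hypothetical, and it assumes each file yields a two-column table (data name, value).

```r
# Hypothetical helper, not part of DOAGDC: combine per-patient tables into one
# matrix with data names as rows and patient codes as columns.
concatenate_sketch <- function(tables) {
  # Collect every data name seen in any patient table
  data_names <- sort(unique(unlist(lapply(tables, function(x) as.character(x$name)))))
  # Pre-fill with NA so names missing from a patient stay missing
  out <- matrix(NA_real_,
    nrow = length(data_names), ncol = length(tables),
    dimnames = list(data_names, names(tables))
  )
  for (patient in names(tables)) {
    tab <- tables[[patient]]
    out[as.character(tab$name), patient] <- tab$value
  }
  out
}

# Toy input standing in for two downloaded files (codes and values are made up)
tables <- list(
  "TCGA-AA-0001-01" = data.frame(
    name = c("HER2", "Caspase-8-M-E"), value = c(0.5, -1.2),
    stringsAsFactors = FALSE
  ),
  "TCGA-AA-0002-11" = data.frame(
    name = "HER2", value = 0.1,
    stringsAsFactors = FALSE
  )
)
protein_matrix <- concatenate_sketch(tables)
```

Pre-filling with NA keeps the matrix rectangular even when a data name is absent from some patient files.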

Usage

concatenate_protein(
  name,
  data_base,
  work_dir,
  tumor,
  tumor_data = TRUE,
  only_filter = FALSE,
  tumor_type = 1,
  normal_type = 11,
  env,
  save_data = FALSE
)

Arguments

name

A character string indicating the values to be used in the next analysis. For instance, "HIF3A" in the legacy gene expression matrix, "mir-1307" in the miRNA quantification matrix, or "HER2" in the protein quantification matrix.

data_base

A character string specifying "GDC" for GDC Data Portal or "legacy" for GDC Legacy Archive.

work_dir

A character string specifying the path to the working directory.

tumor

A character string containing one of the 33 tumors available in the TCGA project. For instance, "BRCA" stands for breast cancer.

tumor_data

Logical value where TRUE restricts the analysis to tumor tissue files only. When set to FALSE, the function creates two matrices, one containing tumor data and the other containing data from non-tumor tissue. The default is TRUE.

only_filter

Logical value where TRUE indicates that the matrix is already concatenated and the function should select a different name without concatenating all the files again. The default is FALSE.

tumor_type

Numerical value(s) corresponding to barcode sample types:

Tumor codes:

  • 1: Primary Solid Tumor

  • 2: Recurrent Solid Tumor

  • 3: Primary Blood Derived Cancer - Peripheral Blood

  • 4: Recurrent Blood Derived Cancer - Bone Marrow

  • 5: Additional - New Primary

  • 6: Metastatic

  • 7: Additional Metastatic

  • 8: Human Tumor Original Cells

  • 9: Primary Blood Derived Cancer - Bone Marrow

The default is 1.

normal_type

Numerical value(s) corresponding to barcode sample types:

Normal codes:

  • 10: Blood Derived Normal

  • 11: Solid Tissue Normal

  • 12: Buccal Cell Normal

  • 13: EBV Immortalized Normal

  • 14: Bone Marrow Normal

  • 15: sample type 15

  • 16-19: sample type 16

or

Control codes:

  • use '20:29' without quotes

The default is 11.
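These codes correspond to the sample-type field of a TCGA barcode: the two digits that open the fourth hyphen-separated field, for example `01` in `TCGA-02-0001-01C-01D-0182-01`. The following is a minimal sketch of selecting matrix columns by code, assuming the columns are named by barcodes. `sample_type_code` and `split_by_type` are hypothetical helpers, not DOAGDC functions, and the package may filter differently.

```r
# Hypothetical helper: extract the numeric sample-type code from TCGA barcodes
sample_type_code <- function(barcodes) {
  fields <- strsplit(barcodes, "-", fixed = TRUE)
  # The fourth field looks like "01A"; its first two characters are the code
  as.integer(vapply(fields, function(f) substr(f[4], 1, 2), character(1)))
}

# Hypothetical helper: split matrix columns into tumor and normal subsets
split_by_type <- function(mat, tumor_type = 1, normal_type = 11) {
  codes <- sample_type_code(colnames(mat))
  list(
    tumor = mat[, codes %in% tumor_type, drop = FALSE],
    normal = mat[, codes %in% normal_type, drop = FALSE]
  )
}

# Toy matrix with made-up barcodes and values
mat <- matrix(1:6, nrow = 2, dimnames = list(
  c("HER2", "Caspase-8-M-E"),
  c("TCGA-XX-0001-01A", "TCGA-XX-0002-11A", "TCGA-XX-0003-06A")
))

# Keep primary solid tumor (1) and metastatic (6) as tumor, 11 as normal
parts <- split_by_type(mat, tumor_type = c(1, 6), normal_type = 11)
```

Using `%in%` lets either argument hold several codes at once, which mirrors the "value(s)" wording of the argument descriptions.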

env

A character string containing the name of the environment to be used. If none has been set yet, the function creates one in the global environment following the standard naming pattern:

  • '<tumor>_<data_base>_protein_tumor_data' or

    '<tumor>_<data_base>_protein_both_data' (for tumor and non-tumor data in separate matrices).
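The naming convention can be sketched as follows. This is only an illustration: `get_or_create_env` is a hypothetical helper that is not part of DOAGDC, and it assumes the name elements are joined with underscores.

```r
# Hypothetical helper: build the environment name and create the environment
# in `where` (the global environment by default) if it does not exist yet.
get_or_create_env <- function(tumor, data_base, tumor_data = TRUE,
                              where = globalenv()) {
  suffix <- if (tumor_data) "protein_tumor_data" else "protein_both_data"
  # Assumption: elements are separated by underscores
  env_name <- paste(tumor, data_base, suffix, sep = "_")
  if (!exists(env_name, envir = where, inherits = FALSE)) {
    assign(env_name, new.env(parent = emptyenv()), envir = where)
  }
  get(env_name, envir = where, inherits = FALSE)
}

# A scratch environment is used here to avoid touching the global environment
scratch <- new.env()
e <- get_or_create_env("CHOL", "legacy", where = scratch)
```

Calling the helper a second time with the same arguments returns the existing environment instead of creating a new one.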

save_data

Logical value where TRUE indicates that the concatenated and filtered matrix should be saved to local storage. The default is FALSE.

Value

A matrix with data names in rows and patient codes in columns.

Examples

library(DOAGDC)

# Concatenating protein quantification data into a single matrix
# data already downloaded using the 'download_gdc' function
concatenate_protein(
    name = "Caspase-8-M-E",
    data_base = "legacy",
    tumor = "CHOL",
    work_dir = "~/Desktop"
)

Facottons/DOAGDC documentation built on April 7, 2020, 3:17 a.m.