concatenate_methylation: Concatenate GDC files into a single matrix and prepare the...

View source: R/concatenate_methylation.R

Description

concatenate_methylation concatenates GDC files into a single matrix in which columns correspond to patient codes and rows correspond to data names.

Usage

concatenate_methylation(
  name,
  data_base,
  work_dir,
  tumor,
  tumor_data = TRUE,
  cutoff_beta_na = 0.25,
  cutoff_betasd = 0.005,
  only_filter = FALSE,
  tumor_type = 1,
  normal_type = 11,
  platform = "",
  env,
  save_data = FALSE
)

Arguments

name

A character string indicating the values to be used in the next analysis. For instance, "HIF3A" in the legacy gene expression matrix, "mir-1307" in the miRNA quantification matrix, or "HER2" in the protein quantification matrix.

data_base

A character string specifying "GDC" for GDC Data Portal or "legacy" for GDC Legacy Archive.

work_dir

A character string specifying the path to work directory.

tumor

A character string containing one of the 33 tumors available in the TCGA project. For instance, "BRCA" stands for breast cancer.

tumor_data

Logical value. TRUE restricts the work to tumor tissue files only. FALSE creates two matrices, one containing tumor data and the other containing data from non-tumor tissue. The default is TRUE.

cutoff_beta_na

Numerical value giving the maximum tolerated proportion (as a decimal fraction) of NA beta values (methylation data) in a row. Rows above this threshold are removed. The default is 0.25.
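
The NA cutoff can be illustrated with a minimal sketch on a toy matrix. This is not the package's internal code: the probe and patient names are made up, and treating the cutoff as inclusive (`<=`) is an assumption.

```r
# Toy beta-value matrix: rows are probes, columns are patients.
# All names and values here are hypothetical.
beta <- matrix(
  c(0.10, 0.20, NA,   0.40,
    NA,   NA,   0.55, 0.60,
    0.80, 0.82, 0.79, 0.81),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("probe_a", "probe_b", "probe_c"),
                  c("pt1", "pt2", "pt3", "pt4"))
)

cutoff_beta_na <- 0.25

# Proportion of missing beta values in each row.
na_fraction <- rowMeans(is.na(beta))

# Keep rows whose NA proportion does not exceed the cutoff
# (inclusive comparison is an assumption of this sketch).
beta_kept <- beta[na_fraction <= cutoff_beta_na, , drop = FALSE]
rownames(beta_kept)
```

Here probe_b has half of its values missing and is dropped, while probe_a sits exactly at the cutoff and is retained under the inclusive assumption.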

cutoff_betasd

Numerical value indicating the standard deviation threshold of beta values (methylation data). It keeps only rows that have standard deviation of beta values higher than the threshold. The default is 0.005.
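
The standard deviation filter can be sketched the same way. The matrix below is invented for illustration, and ignoring NA values with `na.rm = TRUE` is an assumption rather than documented package behaviour.

```r
# Toy beta-value matrix with one nearly constant probe, one variable probe,
# and one constant probe. All names and values are hypothetical.
beta <- matrix(
  c(0.100, 0.101, 0.099, 0.100,
    0.200, 0.450, 0.700, 0.900,
    0.500, 0.500, 0.500, 0.500),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("probe_a", "probe_b", "probe_c"),
                  c("pt1", "pt2", "pt3", "pt4"))
)

cutoff_betasd <- 0.005

# Per-row standard deviation of beta values (NA handling is an assumption).
row_sd <- apply(beta, 1, sd, na.rm = TRUE)

# Keep only rows whose standard deviation is higher than the threshold.
beta_variable <- beta[row_sd > cutoff_betasd, , drop = FALSE]
rownames(beta_variable)
```

Only probe_b varies enough across patients to pass the default threshold in this toy case.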

only_filter

Logical value where TRUE indicates that the matrix is already concatenated and the function should choose a different name, without concatenating all the files again. The default is FALSE.

tumor_type

Numerical value(s) corresponding to barcode data types:

Tumor codes:

  • 1: Primary Solid Tumor

  • 2: Recurrent Solid Tumor

  • 3: Primary Blood Derived Cancer - Peripheral Blood

  • 4: Recurrent Blood Derived Cancer - Bone Marrow

  • 5: Additional - New Primary

  • 6: Metastatic

  • 7: Additional Metastatic

  • 8: Human Tumor Original Cells

  • 9: Primary Blood Derived Cancer - Bone Marrow

The default is 1.

normal_type

Numerical value(s) corresponding to barcode data types:

Normal codes:

  • 10: Blood Derived Normal

  • 11: Solid Tissue Normal

  • 12: Buccal Cell Normal

  • 13: EBV Immortalized Normal

  • 14: Bone Marrow Normal

  • 15: sample type 15

  • 16-19: sample type 16

or

Control codes:

  • use '20:29' without quotes

The default is 11.
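
As a rough illustration of how these codes relate to samples: in a TCGA barcode the two-digit sample type occupies characters 14 and 15. The sketch below uses made-up barcodes and is not the package's own selection logic.

```r
# Hypothetical TCGA-style barcodes (invented for illustration).
barcodes <- c(
  "TCGA-AA-0001-01A-11D-0000-05",  # 01: Primary Solid Tumor
  "TCGA-AA-0002-11A-11D-0000-05",  # 11: Solid Tissue Normal
  "TCGA-AA-0003-06A-11D-0000-05"   # 06: Metastatic
)

# The sample-type code is stored in characters 14-15 of the barcode.
sample_code <- as.integer(substr(barcodes, 14, 15))

tumor_type  <- 1    # default described above
normal_type <- 11   # default described above

# Flag samples matching the requested tumor and normal codes.
is_tumor  <- sample_code %in% tumor_type
is_normal <- sample_code %in% normal_type

# Several codes can be supplied at once, e.g. c(1, 6) for primary and
# metastatic tumors, or 20:29 for the control range.
is_control <- sample_code %in% 20:29
```

With the defaults, only the first barcode counts as tumor and only the second as normal; the metastatic sample is ignored unless 6 is added to tumor_type.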

platform

A character string indicating the platform name for methylation, exon quantification, miRNA, and mutation data.

  • For mutation and exon quantification data: "Illumina GA", "Illumina HiSeq" or "all".

  • For methylation data: "Illumina Human Methylation 450", "Illumina Human Methylation 27" or "all".

  • For miRNA data: "Illumina GA", "Illumina HiSeq", "H-miRNA_8x15K" (for GBM tumor), "H-miRNA_8x15Kv2" (for OV tumor), or "all".

The default for all data_type cited is "all" (when downloading data).

env

A character string containing the environment name that should be used. If none has been set yet, the function will create one in the global environment following the standard criteria:

  • '<tumor>_<data_base>_methylation_tumor_data' or

    '<tumor>_<data_base>_methylation_both_data' (for tumor and non-tumor data in separate matrices).
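
A small sketch of how such a name could be assembled is shown below. It assumes the parts are joined with underscores, and it creates the environment inside a throwaway container instead of the global environment; the `sandbox` variable is hypothetical and not part of the package.

```r
tumor     <- "CHOL"
data_base <- "legacy"

# Assumed naming pattern: <tumor>_<data_base>_methylation_tumor_data
env_name <- paste(tumor, data_base, "methylation_tumor_data", sep = "_")

# Create an environment under that name in a sandbox container
# (hypothetical; the function itself is described as using the global one).
sandbox <- new.env()
assign(env_name, new.env(), envir = sandbox)

env_name
exists(env_name, envir = sandbox)
```

Passing an existing name through the env argument would, by the description above, reuse that environment instead of creating a new one.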

save_data

Logical value where TRUE indicates that the concatenated and filtered matrix should be saved in local storage. The default is FALSE.

Value

A matrix with data names in rows and patient codes in columns.

Examples

library(DOAGDC)

# Concatenating methylation data into a single matrix
# data already downloaded using the 'download_gdc' function
concatenate_methylation(
    name = "HIF3A",
    data_base = "legacy",
    platform = "Illumina Human Methylation 450",
    tumor = "CHOL",
    work_dir = "~/Desktop"
)

Facottons/DOAGDC documentation built on April 7, 2020, 3:17 a.m.