concatenate_mutation: Concatenate GDC files into a single matrix and prepare the...

View source: R/concatenate_mutation.R

Description

concatenate_mutation concatenates GDC files into a single matrix in which the columns correspond to patient codes and the rows correspond to data names.

Usage

concatenate_mutation(
  name,
  data_base,
  work_dir,
  tumor,
  workflow_type,
  tumor_data = TRUE,
  platform = "",
  env,
  save_data = FALSE
)

Arguments

name

A character string indicating the desired values to be used in the next analysis. For instance, "HIF3A" in the legacy gene expression matrix, "mir-1307" in the miRNA quantification matrix, or "HER2" in the protein quantification matrix.

data_base

A character string specifying "GDC" for GDC Data Portal or "legacy" for GDC Legacy Archive.

work_dir

A character string specifying the path to work directory.

tumor

A character string containing one of the 33 tumors available in the TCGA project. For instance, "BRCA" stands for breast cancer.

workflow_type

A character string specifying the workflow type for mutation data in "gdc", where:

  • "varscan" - VarScan2 Variant Aggregation and Masking

  • "mutect" - MuTect2 Variant Aggregation and Masking

  • "muse" - MuSE Variant Aggregation and Masking

  • "somaticsniper" - SomaticSniper Variant Aggregation and Masking

  • "all" - Concatenate all workflows into a single matrix.
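
The accepted values above can be summarized as a lookup table. The following is only a sketch: `workflow_labels` and `expand_workflow` are hypothetical names introduced for illustration and are not part of DOAGDC, and the package may validate this argument differently.

```r
# Hypothetical lookup table built from the list above; not part of DOAGDC.
workflow_labels <- c(
  varscan       = "VarScan2 Variant Aggregation and Masking",
  mutect        = "MuTect2 Variant Aggregation and Masking",
  muse          = "MuSE Variant Aggregation and Masking",
  somaticsniper = "SomaticSniper Variant Aggregation and Masking"
)

# Hypothetical helper: expand a workflow_type value into the labels it covers.
expand_workflow <- function(workflow_type) {
  # match.arg() stops with an error if the value is not one of the choices.
  workflow_type <- match.arg(
    tolower(workflow_type),
    c(names(workflow_labels), "all")
  )
  if (workflow_type == "all") {
    return(workflow_labels)
  }
  workflow_labels[workflow_type]
}

expand_workflow("mutect")
names(expand_workflow("all"))
```

Keeping "all" outside the table means it expands to every named workflow rather than needing a label of its own.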

tumor_data

A logical value. TRUE restricts the work to tumor tissue files only. When set to FALSE, the function creates two matrices, one containing tumor data and the other containing data from non-tumor tissue. The default is TRUE.
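
One way to picture the tumor versus non-tumor split is through the sample type code in a TCGA barcode, where codes 01 to 09 denote tumor samples and 10 to 19 denote normal samples. The sketch below assumes this is the criterion; the page does not say how concatenate_mutation separates the files, and the barcodes are hypothetical.

```r
# Hypothetical barcodes used only for illustration.
barcodes <- c(
  "TCGA-AA-0001-01A",
  "TCGA-AA-0001-11A",
  "TCGA-AA-0002-01A"
)

# Characters 14-15 of a TCGA barcode hold the sample type code.
sample_type <- as.integer(substr(barcodes, 14, 15))

# 01-09: tumor samples; 10-19: normal (non-tumor) samples.
is_tumor <- sample_type >= 1 & sample_type <= 9
is_normal <- sample_type >= 10 & sample_type <= 19

tumor_barcodes <- barcodes[is_tumor]
not_tumor_barcodes <- barcodes[is_normal]
```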

platform

A character string indicating the platform name for methylation, exon quantification, miRNA, and mutation data.

  • For mutation and exon quantification data: "Illumina GA", "Illumina HiSeq" or "all".

  • For methylation data: "Illumina Human Methylation 450", "Illumina Human Methylation 27" or "all".

  • For miRNA data: "Illumina GA", "Illumina HiSeq", "H-miRNA_8x15K" (for GBM tumor), "H-miRNA_8x15Kv2" (for OV tumor), or "all".

The default for all data types cited is "all" (when downloading data).

env

A character string containing the name of the environment that should be used. If none has been set yet, the function will create one in the global environment following the standard criteria:

  • '<tumor>_<data_base>_mutation_tumor_data' or

    '<tumor>_<data_base>_mutation_both_data' (for tumor and non-tumor data in separate matrices).
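
As a rough illustration of the naming scheme, the sketch below builds such a name and creates the environment only when it does not exist yet. It assumes the components are joined with single underscores and that the values are used exactly as passed; neither detail is confirmed by this page, and `env_name` and `data_env` are names introduced only for this example.

```r
# Assumption: components are joined with single underscores, as passed.
tumor <- "CHOL"
data_base <- "legacy"
env_name <- paste(tumor, data_base, "mutation_tumor_data", sep = "_")

# Create the environment in the global environment only if it is missing.
if (!exists(env_name, envir = globalenv(), inherits = FALSE)) {
  assign(env_name, new.env(parent = emptyenv()), envir = globalenv())
}

# Retrieve it later by name.
data_env <- get(env_name, envir = globalenv())
is.environment(data_env)
```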

save_data

A logical value. TRUE indicates that the concatenated and filtered matrix should be saved in local storage. The default is FALSE.

Value

A matrix with data names in rows and patient codes in columns.
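
To make the orientation concrete, here is a toy matrix with the same layout. The row names, column names, and values are hypothetical placeholders and do not come from any GDC file.

```r
# Hypothetical values and labels, only to show the orientation.
toy <- matrix(
  c(0, 1, 1, 0, 0, 1),
  nrow = 2,
  dimnames = list(
    c("data_name_1", "data_name_2"),          # rows: data names
    c("patient_A", "patient_B", "patient_C")  # columns: patient codes
  )
)

toy["data_name_1", ]  # one data name across all patients
toy[, "patient_B"]    # all data names for one patient
```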

Examples

library(DOAGDC)

# Concatenating mutation data into a single matrix
# data already downloaded using the 'download_gdc' function
concatenate_mutation(
    name = "HIF3A",
    workflow_type = "varscan",
    platform = "Illumina GA",
    data_base = "legacy",
    tumor = "CHOL",
    work_dir = "~/Desktop"
)

Facottons/DOAGDC documentation built on April 7, 2020, 3:17 a.m.