concatenate_exon: Concatenate GDC files into a single matrix and prepare the...
In Facottons/DOAGDC: A Package to Download, Organize and Analyze Genomic Data Commons (GDC) Data


concatenate_exon concatenates GDC files into a single matrix, where the columns are patient codes and the rows are data names.

concatenate_exon(
  data_type,
  normalization = TRUE,
  name,
  data_base,
  htseq = NULL,
  work_dir,
  tumor,
  workflow_type,
  tumor_data = TRUE,
  only_filter = FALSE,
  tumor_type = 1,
  normal_type = 11,
  platform = "",
  env,
  save_data = FALSE
)

`data_type`	Type of data. It can be `"methylation", "mutation", "clinical_supplement", "biospecimen", "gene", or "clinical" (biotab)`. Only present in the "Legacy" database: `"protein", "Exon quantification", "miRNA gene quantification", "miRNA isoform quantification", "isoform", and "image"`. Only present in the "GDC" database: `"miRNA Expression Quantification", and "Isoform Expression Quantification" (miRNA)`.
`normalization`	Logical value where `TRUE` restricts the function to normalized files only. When `FALSE`, do not forget to set the `env` argument in the second run. This argument is only applicable to gene and isoform expression data from the GDC Legacy Archive. The default is `TRUE`.
`name`	A character string indicating the desired values to be used in the next analysis. For instance, "HIF3A" in the legacy gene expression matrix, "mir-1307" in the miRNA quantification matrix, or "HER2" in the protein quantification matrix.
`data_base`	A character string specifying `"GDC"` for GDC Data Portal or `"legacy"` for GDC Legacy Archive.
`htseq`	A character string indicating which htseq workflow data should be downloaded (only applied to "GDC" gene expression): "Counts", "FPKM" or "FPKM-UQ".
`work_dir`	A character string specifying the path to the working directory.
`tumor`	A character string containing one of the 33 tumors available in the TCGA project. For instance, `"BRCA"` stands for breast cancer.
`workflow_type`	A character string specifying the workflow type for mutation data in "gdc": "varscan" (VarScan2 Variant Aggregation and Masking), "mutect" (MuTect2 Variant Aggregation and Masking), "muse" (MuSE Variant Aggregation and Masking), "somaticsniper" (SomaticSniper Variant Aggregation and Masking), or "all" to concatenate all workflows into a single matrix.
`tumor_data`	Logical value where `TRUE` restricts the function to tumor tissue files only. When set to `FALSE`, it creates two matrices, one containing tumor data and the other containing data from non-tumor tissue. The default is `TRUE`.
`only_filter`	Logical value where `TRUE` indicates that the matrix is already concatenated and the function should select a different `name` without concatenating all the files again. The default is `FALSE`.
`tumor_type`	Numerical value(s) corresponding to barcode sample types. Tumor codes: 1: Primary Solid Tumor; 2: Recurrent Solid Tumor; 3: Primary Blood Derived Cancer - Peripheral Blood; 4: Recurrent Blood Derived Cancer - Bone Marrow; 5: Additional - New Primary; 6: Metastatic; 7: Additional Metastatic; 8: Human Tumor Original Cells; 9: Primary Blood Derived Cancer - Bone Marrow. The default is 1.
`normal_type`	Numerical value(s) corresponding to barcode sample types. Normal codes: 10: Blood Derived Normal; 11: Solid Tissue Normal; 12: Buccal Cell Normal; 13: EBV Immortalized Normal; 14: Bone Marrow Normal; 15: sample type 15; 16-19: sample type 16. Control codes: use '20:29' without quotes. The default is 11.
`platform`	A character string indicating the platform name for methylation, exon quantification, miRNA, and mutation data. For mutation and exon quantification data: `"Illumina GA", "Illumina HiSeq" or "all"`. For methylation data: `"Illumina Human Methylation 450", "Illumina Human Methylation 27" or "all"`. For miRNA data: `"Illumina GA", "Illumina HiSeq", "H-miRNA_8x15K" (for GBM tumor), "H-miRNA_8x15Kv2" (for OV tumor), or "all"`. The default for all the data types cited is `"all"` (when downloading data).
`env`	A character string containing the name of the environment that should be used. If none has been set yet, the function will create one in the global environment following the standard criteria: 'tumor_data_base_data_type_tumor_data' or 'tumor_data_base_data_type_both_data' (for tumor and non-tumor data in separate matrices).
`save_data`	Logical value where `TRUE` indicates that the concatenated and filtered matrix should be saved in local storage. The default is `FALSE`.
`cutoff_beta_na`	Numerical value indicating the maximum proportion (in decimal form) of NA beta values tolerated in a row; rows exceeding it are removed (methylation data). The default is 0.25.
`cutoff_betasd`	Numerical value indicating the standard deviation threshold of beta values (methylation data). It keeps only rows that have standard deviation of beta values higher than the threshold. The default is `0.005`.
`use_hg19_mirbase20`	Logical value where `TRUE` indicates that only hg19.mirbase20 should be used. This parameter is needed when using `data_base = "legacy"` and one of the available miRNA `data_type` in "legacy" ("miRNA gene quantification" and "miRNA isoform quantification"). The default is FALSE.
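
The two methylation cutoffs described above can be illustrated with a small base R sketch. This is not the package's internal code: the helper `filter_beta` and the toy matrix are hypothetical, and it assumes that a row whose NA proportion equals `cutoff_beta_na` is kept and that the standard deviation is computed after ignoring the remaining NA values.

```r
# Hypothetical toy beta-value matrix: rows are probes, columns are patients.
beta <- matrix(
    c(0.10, 0.12, 0.11, 0.13,   # complete row, modest variability
      0.20, NA,   NA,   0.80,   # 50% of the values are missing
      0.05, 0.95, 0.40, NA,     # 25% missing, high variability
      0.50, 0.50, 0.50, 0.50),  # constant row, standard deviation 0
    nrow = 4, byrow = TRUE,
    dimnames = list(
        c("cg01", "cg02", "cg03", "cg04"),
        c("P1", "P2", "P3", "P4")
    )
)

# Hypothetical helper mirroring the documented defaults (an assumption,
# not the function used inside DOAGDC).
filter_beta <- function(beta, cutoff_beta_na = 0.25, cutoff_betasd = 0.005) {
    # Proportion of missing values in each row.
    na_fraction <- rowMeans(is.na(beta))
    # Keep rows whose NA proportion does not exceed the cutoff.
    kept <- beta[na_fraction <= cutoff_beta_na, , drop = FALSE]
    # Row-wise standard deviation, ignoring the remaining NA values.
    row_sd <- apply(kept, 1, sd, na.rm = TRUE)
    # Keep only rows that vary more than the standard deviation cutoff.
    kept[row_sd > cutoff_betasd, , drop = FALSE]
}

filtered <- filter_beta(beta)
print(rownames(filtered))
```

With these toy values, the half-missing row and the constant row are dropped, so only `cg01` and `cg03` remain.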

A matrix with data names in rows and patient codes in columns.
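
As a rough illustration of this layout, the base R sketch below binds per-sample vectors into one matrix and then splits the columns by sample type code, in the spirit of `tumor_type` and `normal_type`. The barcodes, values, and the helper `sample_code` are hypothetical, and the sketch assumes every sample lists the same data names in the same order; it is not how the package itself reads or stores the files.

```r
# Hypothetical per-sample vectors standing in for values read from
# individual downloaded files; vector names are the data names.
per_sample <- list(
    "TCGA-AA-0001-01A" = c(HIF3A = 5.2, TP53 = 10.1),
    "TCGA-AA-0001-11A" = c(HIF3A = 7.9, TP53 = 9.8),
    "TCGA-AA-0002-01A" = c(HIF3A = 4.4, TP53 = 11.3)
)

# One column per sample, one row per data name.
# Assumes all vectors share the same names in the same order.
full_matrix <- do.call(cbind, per_sample)

# Hypothetical helper: the sample type code is taken from the first two
# digits of the fourth barcode field (e.g. "01A" gives 1, "11A" gives 11).
sample_code <- function(barcode) {
    fields <- strsplit(barcode, "-", fixed = TRUE)
    as.integer(vapply(fields, function(f) substr(f[4], 1, 2), character(1)))
}

codes <- sample_code(colnames(full_matrix))

# Split into tumor (code 1) and non-tumor (code 11) matrices.
tumor_matrix <- full_matrix[, codes %in% 1, drop = FALSE]
normal_matrix <- full_matrix[, codes %in% 11, drop = FALSE]

print(dim(full_matrix))
print(colnames(tumor_matrix))
print(colnames(normal_matrix))
```

Using `drop = FALSE` keeps the result a matrix even when only one sample matches a code, which preserves the rows-by-patients shape.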

library(DOAGDC)

# Concatenating gene expression data into a single matrix
# data already downloaded using the 'download_gdc' function
concatenate_exon("gene",
    name = "HIF3A",
    data_base = "legacy",
    tumor = "CHOL",
    work_dir = "~/Desktop"
)

Facottons/DOAGDC documentation built on April 7, 2020, 3:17 a.m.
