tcgaCompare: Compare mutation load against TCGA cohorts


View source: R/tcgacompare.R

Description

Compares the mutation load of the input MAF against all 33 TCGA cohorts derived from the MC3 project.

Usage

tcgaCompare(
  maf,
  capture_size = NULL,
  tcga_capture_size = 35.8,
  cohortName = NULL,
  tcga_cohorts = NULL,
  primarySite = FALSE,
  col = c("gray70", "black"),
  bg_col = c("#EDF8B1", "#2C7FB8"),
  medianCol = "red",
  decreasing = FALSE,
  logscale = TRUE,
  rm_hyper = FALSE,
  rm_zero = TRUE,
  cohortFontSize = 0.8,
  axisFontSize = 0.8
)

Arguments

maf

MAF object(s) generated by read.maf

capture_size

capture size for the input MAF in MB. Default NULL. If provided, the plot is scaled to mutations per MB. The TCGA capture size is assumed to be 35.8 MB.

tcga_capture_size

capture size for TCGA cohort in MB. Default 35.8. Do NOT change. See details for more information.

cohortName

name for the input MAF cohort. Default "Input"

tcga_cohorts

restrict TCGA data to these cohorts. Default NULL (all cohorts are used).

primarySite

If TRUE uses primary site of cancer as labels instead of TCGA project IDs. Default FALSE.

col

color vector of length 2 for the TCGA cohorts and the input MAF cohort. Default gray70 and black.

bg_col

background colors. Default '#EDF8B1', '#2C7FB8'

medianCol

color for median line. Default red.

decreasing

Default FALSE. Cohorts are arranged in order of increasing mutation burden.

logscale

Plot mutation burden on a log10 scale? Default TRUE

rm_hyper

Remove hyper mutated samples (outliers)? Default FALSE

rm_zero

Remove samples with zero mutations? Default TRUE

cohortFontSize

Font size for cohort labels. Default 0.8

axisFontSize

Font size for axis labels. Default 0.8
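The arguments above can be combined in a single call. The following is a minimal sketch rather than a recommended configuration: the capture size of 50 MB and the choice of three cohorts are illustrative assumptions, and the call is guarded so the snippet does nothing when maftools is not installed.

```r
# Sketch only: capture_size = 50 and the cohort selection are assumed values.
res <- NULL
if (requireNamespace("maftools", quietly = TRUE)) {
  # Example LAML MAF shipped with maftools
  laml.maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
  laml <- maftools::read.maf(maf = laml.maf)

  res <- maftools::tcgaCompare(
    maf = laml,
    cohortName = "AML",                        # label for the input cohort
    capture_size = 50,                         # hypothetical capture size in MB
    tcga_cohorts = c("LAML", "BRCA", "SKCM"),  # restrict the comparison
    logscale = TRUE
  )
}
```

Leave tcga_capture_size at its default, as noted above.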

Details

Tumor mutation burden (TMB) for the TCGA cohorts is obtained from the TCGA MC3 study. For consistency, TMB is estimated by restricting variants to those within the Agilent SureSelect capture kit, which has a size of 35.8 MB.
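As a worked illustration of the scaling, the sketch below converts raw mutation counts to mutations per MB. It assumes the scaling is a plain division of the count by the capture size; the helper mutations_per_mb, the 50 MB input capture size and the input count of 25 are hypothetical and not part of maftools.

```r
# Hypothetical helper: mutations per megabase, assuming simple division.
mutations_per_mb <- function(n_mutations, capture_size_mb) {
  stopifnot(is.numeric(n_mutations), capture_size_mb > 0)
  n_mutations / capture_size_mb
}

tcga_capture_size <- 35.8  # MB, as stated in Details
capture_size <- 50         # MB, assumed capture size of the input cohort

# Illustrative median mutation counts for three TCGA cohorts
tcga_tmb <- mutations_per_mb(c(LAML = 9, BRCA = 40, SKCM = 406.5),
                             tcga_capture_size)

# An input sample with 25 mutations (assumed count)
input_tmb <- mutations_per_mb(25, capture_size)

print(round(tcga_tmb, 3))
print(input_tmb)
```

Dividing each cohort by its own capture size puts the input and TCGA samples on a common per-MB scale, which is why the two sizes are kept as separate arguments.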

Value

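A list of data.tables: median_mutation_burden (median mutations per cohort), mutation_burden_perSample (mutation count per sample) and pairwise_t_test (p-values from pairwise comparisons between cohorts).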

Source

The TCGA MC3 file was obtained from https://api.gdc.cancer.gov/data/1c8cfe5f-e52d-41ba-94da-f15ea1337efc. See the TCGAmutations R package for more details. The downstream script used to estimate TMB for each sample can be found in ‘inst/scripts/estimate_tcga_tmb.R’.

References

Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines. Kyle Ellrott, Matthew H. Bailey, Gordon Saksena, et al. Cell Syst. 2018 Mar 28; 6(3): 271–281.e7. https://doi.org/10.1016/j.cels.2018.03.002

Examples

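library(maftools)
# Load the example LAML MAF bundled with maftools
laml.maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml <- read.maf(maf = laml.maf)
# Compare mutation burden against the TCGA cohorts, labelling the input as "AML"
tcgaCompare(maf = laml, cohortName = "AML")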

Example output

-Reading
-Validating
-Silent variants: 475 
-Summarizing
-Processing clinical data
--Missing clinical data
-Finished in 0.635s elapsed (0.546s cpu) 
Performing pairwise t-test for differences in mutation burden..
$median_mutation_burden
    Cohort Cohort_Size Median_Mutations Median_Mutations_log10
 1:   LAML         137              9.0              0.9542425
 2:   PCPG         183              9.0              0.9542425
 3:    AML         192              9.0              0.9542425
 4:   THCA         499             10.0              1.0000000
 5:    UVM          80             11.5              1.0606978
 6:   TGCT         133             13.0              1.1139434
 7:   THYM         123             14.0              1.1461280
 8:   KICH          66             19.5              1.2900346
 9:    ACC          92             25.5              1.4065402
10:    LGG         524             27.0              1.4313638
11:   MESO          82             27.0              1.4313638
12:   PRAD         495             27.0              1.4313638
13:   PAAD         176             35.0              1.5440680
14:   BRCA        1025             40.0              1.6020600
15:   SARC         239             40.0              1.6020600
16:   CHOL          36             40.5              1.6074550
17:    UCS          57             46.0              1.6627578
18:    GBM         398             51.0              1.7075702
19:   KIRC         370             52.0              1.7160033
20:   KIRP         282             65.0              1.8129134
21:     OV         411             66.0              1.8195439
22:   UCEC         531             75.0              1.8750613
23:   LIHC         365             82.0              1.9138139
24:   CESC         291             86.0              1.9344985
25:   READ         150             88.5              1.9469433
26:   ESCA         185            103.0              2.0128372
27:   HNSC         509            106.0              2.0253059
28:   DLBC          37            110.0              2.0413927
29:   STAD         438            114.5              2.0588055
30:   COAD         406            115.0              2.0606978
31:   BLCA         411            169.0              2.2278867
32:   LUAD         516            198.0              2.2966652
33:   LUSC         485            229.0              2.3598355
34:   SKCM         468            406.5              2.6090605
    Cohort Cohort_Size Median_Mutations Median_Mutations_log10

$mutation_burden_perSample
               Tumor_Sample_Barcode total cohort
    1: TCGA-AB-2808-03B-01W-0728-08   741   LAML
    2: TCGA-AB-2828-03B-01W-0728-08   718   LAML
    3: TCGA-AB-2826-03B-01W-0728-08   616   LAML
    4: TCGA-AB-2806-03B-01W-0728-08   491   LAML
    5: TCGA-AB-2833-03B-01W-0728-08   445   LAML
   ---                                          
10388: TCGA-FR-A2OS-01A-11D-A21A-08    13   SKCM
10389: TCGA-BF-AAP8-01A-11D-A401-08    11   SKCM
10390: TCGA-EB-A4IQ-01A-12D-A25O-08     9   SKCM
10391: TCGA-EB-A4OZ-01A-12D-A25O-08     9   SKCM
10392: TCGA-D3-A8GE-06A-11D-A372-08     6   SKCM

$pairwise_t_test
     Cohort1 Cohort2         Pval
  1:    UCEC    BRCA 1.841336e-98
  2:    UCEC    THCA 6.058054e-84
  3:    UCEC     LGG 9.059399e-80
  4:    UCEC    PRAD 4.641347e-79
  5:    UCEC    KIRC 4.898988e-66
 ---                             
557:    PAAD     GBM 9.945562e-01
558:    MESO    KICH 9.945562e-01
559:    THCA    TGCT 9.945562e-01
560:     UVM    TGCT 9.945562e-01
561:     UVM    THCA 9.945562e-01

Warning message:
In FUN(X[[i]], ...) : Removed 1 samples with zero mutations.
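
The relationship between the three tables in the output can be sketched in base R on toy data. This is an illustration under stated assumptions, not the package's implementation: the per-sample counts are simulated, the column names mirror those shown above, and the use of pairwise.t.test on raw counts with FDR adjustment is an assumption about how such a comparison could be done.

```r
# Toy per-sample table using the column names shown in the output above.
set.seed(1)
per_sample <- data.frame(
  Tumor_Sample_Barcode = sprintf("S%03d", 1:90),
  total  = c(rpois(30, 10), rpois(30, 40), rpois(30, 400)),
  cohort = rep(c("A", "B", "C"), each = 30)
)

# Median mutation count per cohort, ordered by increasing burden.
med <- aggregate(total ~ cohort, data = per_sample, FUN = median)
med <- med[order(med$total), ]
med$total_log10 <- log10(med$total)

# Pairwise t-tests between cohorts (FDR adjustment is an assumed choice).
pt <- pairwise.t.test(x = per_sample$total, g = per_sample$cohort,
                      p.adjust.method = "fdr")

# Reshape the lower-triangular p-value matrix into a long table.
pvals <- as.data.frame(as.table(pt$p.value))
pvals <- pvals[!is.na(pvals$Freq), ]
names(pvals) <- c("Cohort1", "Cohort2", "Pval")
pvals <- pvals[order(pvals$Pval), ]

print(med)
print(pvals)
```

With k cohorts the long table has k(k-1)/2 rows, which matches the 561 rows reported above for 34 cohorts.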

maftools documentation built on Feb. 6, 2021, 2 a.m.