TCGAanalyze_SurvivalKM: survival analysis (SA) univariate with Kaplan-Meier (KM)...

Description Usage Arguments Value Examples

View source: R/analyze.R

Description

TCGAanalyze_SurvivalKM perform an univariate Kaplan-Meier (KM) survival analysis (SA). It performed Kaplan-Meier survival univariate using complete follow up with all days taking one gene a time from Genelist of gene symbols. For each gene according its level of mean expression in cancer samples, defining two thresholds for quantile expression of that gene in all samples (default ThreshTop=0.67,ThreshDown=0.33) it is possible to define a threshold of intensity of gene expression to divide the samples in 3 groups (High, intermediate, low). TCGAanalyze_SurvivalKM performs SA between High and low groups using following functions from survival package

  1. survival::Surv

  2. survival::survdiff

  3. survival::survfit

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
TCGAanalyze_SurvivalKM(
  clinical_patient,
  dataGE,
  Genelist,
  Survresult = FALSE,
  ThreshTop = 0.67,
  ThreshDown = 0.33,
  p.cut = 0.05,
  group1,
  group2
)

Arguments

clinical_patient

is a data.frame using function 'clinic' with information related to barcode / samples such as bcr_patient_barcode, days_to_death , days_to_last_follow_up , vital_status, etc

dataGE

is a matrix of Gene expression (genes in rows, samples in cols) from TCGAprepare

Genelist

is a list of gene symbols where perform survival KM.

Survresult

is a parameter (default = FALSE) if is TRUE will show KM plot and results.

ThreshTop

is a quantile threshold to identify samples with high expression of a gene

ThreshDown

is a quantile threshold to identify samples with low expression of a gene

p.cut

p.values threshold. Default: 0.05

group1

a string containing the barcode list of the samples in in control group

group2

a string containing the barcode list of the samples in in disease group

Value

table with survival genes pvalues from KM.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
 # Selecting only 20 genes for example
 dataBRCAcomplete <- log2(dataBRCA[1:20,] + 1)

 # clinical_patient_Cancer <- GDCquery_clinic("TCGA-BRCA","clinical")
 clinical_patient_Cancer <- data.frame(
      bcr_patient_barcode = substr(colnames(dataBRCAcomplete),1,12),
      vital_status = c(rep("alive",3),"dead",rep("alive",2),rep(c("dead","alive"),2)),
      days_to_death = c(NA,NA,NA,172,NA,NA,3472,NA,786,NA),
      days_to_last_follow_up = c(3011,965,718,NA,1914,423,NA,5,656,1417)
 )

 group1 <- TCGAquery_SampleTypes(colnames(dataBRCAcomplete), typesample = c("NT"))
 group2 <- TCGAquery_SampleTypes(colnames(dataBRCAcomplete), typesample = c("TP"))

 tabSurvKM <- TCGAanalyze_SurvivalKM(clinical_patient_Cancer,
                                     dataBRCAcomplete,
                                     Genelist = rownames(dataBRCAcomplete),
                                     Survresult = FALSE,
                                     p.cut = 0.4,
                                     ThreshTop = 0.67,
                                     ThreshDown = 0.33,
                                     group1 = group1, # Control group
                                     group2 = group2) # Disease group

 # If the groups are not specified group1 == group2 and all samples are used
 ## Not run: 
 tabSurvKM <- TCGAanalyze_SurvivalKM(clinical_patient_Cancer,
                                     dataBRCAcomplete,
                                     Genelist = rownames(dataBRCAcomplete),
                                     Survresult = TRUE,
                                     p.cut = 0.2,
                                     ThreshTop = 0.67,
                                     ThreshDown = 0.33)

## End(Not run)

daniel615212950/TCGAbiolinks documentation built on Dec. 19, 2021, 8:06 p.m.