In ehfhcrc/ImmSigPkg: Run HIPC ImmuneSignatures Pipeline from ImmuneSpace data

PACKAGE INFORMATION

Created: January 2017

Study Authors: HIPC ImmuneSignature Collaborators

Package Developer: Evan Henrich (Gottardo Lab - FHCRC)

Contact: ehenrich@fredhutch.org

GENERAL NOTES:

This markdown file outputs the results of the HIPC ImmuneSignatures Project using the original parameters developed by the collaborators. Interested users may re-run the full pipeline using the hipc_full_pipeline() command with different parameters by following the prompts and seeing the different sections below for parameter options. For example, during the raw data pre-processing step the user may select different gene annotation methods that can affect the final results. Once the full pipeline has been run once through, you may also use the hipc_meta_analysis() command to re-run only the meta-analysis with different parameters.

library(ImmSigPkg)
options(width = 999)
knitr::opts_chunk$set(echo=FALSE, 
               cache=FALSE, 
               warning=FALSE, 
               message=FALSE, 
               tidy=TRUE,
               fig.height=7, 
               fig.width=7,
               fig.align = "center")

# Setup initial directory structure and options for Preprocessing step

studies <- c("SDY212", "SDY63", "SDY404", "SDY400", "SDY80", "SDY67")

ImmSig_dir <- file.path(getwd(),"ImmSig_Analysis")
dir.create(ImmSig_dir)

pp_dir <- file.path(ImmSig_dir,"PreProc_Data")
dir.create(path = pp_dir)

hai_dir <- file.path(pp_dir,"HAI")
dir.create(path = hai_dir)

ge_dir <- file.path(pp_dir,"GE")
dir.create(path = ge_dir)

rawdata_dir <- file.path(pp_dir,"rawdata")
dir.create(path = rawdata_dir)

yale.anno <- "original"
sdy80.anno <- "original"
sdy80.norm <- FALSE

# 1. Extract and pre-process data from ImmuneSpace, ImmPort, or GEO databases    
for(sdy in studies){
    makeGE(sdy,
           yale.anno = yale.anno,
           sdy80.anno = sdy80.anno,
           sdy80.norm = sdy80.norm,
           output_dir = ge_dir,
           rawdata_dir = rawdata_dir)
    makeHAI(sdy, output_dir = hai_dir)
    makeDemo(sdy, output_dir = ge_dir)
  }

PRE-PROCESSING

PARAMETERS:

Yale Studies Gene Expression Annotation: r yale.anno
SDY80/CHI-nih Gene Expression Annotation: r sdy80.anno
SDY80/CHI-nih GE normalization via pipeleine: r sdy80.norm

OUTPUT DIRECTORY:

r ImmSig_dir

NOTES:

Gene expression annotation methods:
original = using the original pre-processed GE table from HIPC collaborators to generate a hash of Probe IDs as keys and Gene Symbols as values. For the Yale Studies, this method means that the annotation is based on the GE table provided from the Illumina sequencing machine at the time of the run. For SDY80/CHI-nih, the information is from the Affymetrix annotation done at the time.
library = the use of a bioconductor library. For Yale Studies, this was illuminahumanv4.db. For SDY80, it was hugene10sttranscriptcluster.db.
manifest = Uses the manifest available from the Illumina website which consists of all historic probeIDs. In the case of the Yale Studies, this generates complete mapping, whereas the original table has only partial mapping and the library even less.
SDY80 / CHI-nih normalization:
In the original code used to develop the manuscript, The study's rawdata from the GEO database has been log2 transformed, but is not normalized in the same was as the discovery studies (i.e. preprocessCore::normalize.quantiles). Therefore sdy80.norm is FALSE by default. However, the user may set it to TRUE and use the same normalization procedure as the other studies.
SDY212 GE data removal:
SDY212 had two sets of GE data with the same biosample ID. These sets were removed from analysis because it was unclear why there was a duplicate. Also, one gene did not have a full set of expression information in the ImmPort file and was also removed. Questions should be directed to the study authors directly.

# Setup for Rds Generation (BioConductor eset)
rds_dir <- file.path(ImmSig_dir, "Rds_data")
dir.create(path = rds_dir)

combined_hai <- combine_hai_data(hai_dir, output_dir = rds_dir)
for(sdy in studies){
  make_rds(sdy, ge_dir, combined_hai, output_dir = rds_dir)
}

# Setup for Meta Analyis script
data("geneSetDB")
FDR.cutoff <- 0.5
pvalue.cutoff <- 0.01
endPoint <- "fc_res_max_d30"
adjusted <- FALSE
baselineOnly <- TRUE
indiv_rds <- FALSE
output_dir <- "Placeholder"

META-ANALYSIS

YOUNG COHORT RESULTS

yng_res_dfs <- meta_analysis(geneSetDB = geneSetDB,
                  rds_data_dir = rds_dir,
                  cohort = "young",
                  FDR.cutoff = FDR.cutoff,
                  pvalue.cutoff = pvalue.cutoff,
                  endPoint = endPoint,
                  adjusted = adjusted,
                  baselineOnly = baselineOnly,
                  indiv_rds = indiv_rds,
                  markdown = T,
                  output_dir = output_dir)

Discovery Group Significant Pathways Data

DT::datatable(yng_res_dfs$dsc)

Validation Study Significant Pathways Data

DT::datatable(yng_res_dfs$val)

OLD COHORT RESULTS

old_res_dfs <- meta_analysis(geneSetDB = geneSetDB,
                  rds_data_dir = rds_dir,
                  cohort = "old",
                  FDR.cutoff = FDR.cutoff,
                  pvalue.cutoff = pvalue.cutoff,
                  endPoint = endPoint,
                  adjusted = adjusted,
                  baselineOnly = baselineOnly,
                  indiv_rds = indiv_rds,
                  markdown = T,
                  output_dir = output_dir)

Discovery Group Significant Pathways Data

DT::datatable(old_res_dfs$dsc)

Validation Study Significant Pathways Data

DT::datatable(old_res_dfs$val)

PARAMETERS:

FDR.cutoff: r FDR.cutoff
pvalue.cutoff: r pvalue.cutoff
endPoint: r endPoint
adjusted: r adjusted
baselineOnly: r baselineOnly
indiv_rds: r indiv_rds
geneSetDB: BTM_for_GSEA_20131008.gmt

PARAMETERS DEFINITION:

Expected Input Data Types in [brackets]

FDR.cutoff = [FLOAT] Cut off point for q-values, in code 'combined.q <- p.adjust(combined.p, method="BH")'.
pvalue.cutoff = [FLOAT] Cut off point for p-values when selecting gene pathways as 'significant' from the discovery studies. In code, 'combined.p <- pdf.pval(combinePDFs(quSageObjList, n.points = 2^14))'.
endPoint = [INT] The integer decides which HAI column should be used for categorizing"responders" and "non-responders" to a vaccine. The user has the option of '30', which uses 'fc_res_max_d30', or 20, which uses 'fc_res_max_d20'. These columns are calculated using either the 30% / 70% points or 20% / 80% for discretization.
adjusted = [BOOL] If TRUE, selects the age-adjusted gene expression values for use in calculations.
baselineOnly = [BOOL] If TRUE, uses only day zero gene expression values in calculations.
indiv_rds = [BOOL] If TRUE, outputs the rds files for the individual studies as part of the output.
geneSetDB = [STRING] The file path used for gene set definition. Currently file is loaded with data().

ehfhcrc/ImmSigPkg documentation built on May 16, 2019, 12:16 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ehfhcrc/ImmSigPkg
Run HIPC ImmuneSignatures Pipeline from ImmuneSpace data

In ehfhcrc/ImmSigPkg: Run HIPC ImmuneSignatures Pipeline from ImmuneSpace data

PACKAGE INFORMATION

Created: January 2017

Study Authors: HIPC ImmuneSignature Collaborators

Package Developer: Evan Henrich (Gottardo Lab - FHCRC)

Contact: ehenrich@fredhutch.org

GENERAL NOTES:

PRE-PROCESSING

PARAMETERS:

OUTPUT DIRECTORY:

NOTES:

META-ANALYSIS

YOUNG COHORT RESULTS

Discovery Group Significant Pathways Data

Validation Study Significant Pathways Data

OLD COHORT RESULTS

Discovery Group Significant Pathways Data

Validation Study Significant Pathways Data

PARAMETERS:

PARAMETERS DEFINITION:

R Package Documentation

Browse R Packages

We want your feedback!

ehfhcrc/ImmSigPkg Run HIPC ImmuneSignatures Pipeline from ImmuneSpace data

In ehfhcrc/ImmSigPkg: Run HIPC ImmuneSignatures Pipeline from ImmuneSpace data

PACKAGE INFORMATION

Created: January 2017

Study Authors: HIPC ImmuneSignature Collaborators

Package Developer: Evan Henrich (Gottardo Lab - FHCRC)

Contact: ehenrich@fredhutch.org

GENERAL NOTES:

PRE-PROCESSING

PARAMETERS:

OUTPUT DIRECTORY:

NOTES:

META-ANALYSIS

YOUNG COHORT RESULTS

Discovery Group Significant Pathways Data

Validation Study Significant Pathways Data

OLD COHORT RESULTS

Discovery Group Significant Pathways Data

Validation Study Significant Pathways Data

PARAMETERS:

PARAMETERS DEFINITION:

R Package Documentation

Browse R Packages

We want your feedback!

ehfhcrc/ImmSigPkg
Run HIPC ImmuneSignatures Pipeline from ImmuneSpace data