PACKAGE INFORMATION

Created: January 2017

Study Authors: HIPC ImmuneSignature Collaborators

Package Developer: Evan Henrich (Gottardo Lab - FHCRC)

Contact: ehenrich@fredhutch.org


GENERAL NOTES:

This markdown file outputs the results of the HIPC ImmuneSignatures Project using the original parameters developed by the collaborators. Interested users may re-run the full pipeline using the hipc_full_pipeline() command with different parameters by following the prompts and seeing the different sections below for parameter options. For example, during the raw data pre-processing step the user may select different gene annotation methods that can affect the final results. Once the full pipeline has been run once through, you may also use the hipc_meta_analysis() command to re-run only the meta-analysis with different parameters.

library(ImmSigPkg)
options(width = 999)
knitr::opts_chunk$set(echo=FALSE, 
               cache=FALSE, 
               warning=FALSE, 
               message=FALSE, 
               tidy=TRUE,
               fig.height=7, 
               fig.width=7,
               fig.align = "center")
# Setup initial directory structure and options for Preprocessing step

studies <- c("SDY212", "SDY63", "SDY404", "SDY400", "SDY80", "SDY67")

ImmSig_dir <- file.path(getwd(),"ImmSig_Analysis")
dir.create(ImmSig_dir)

pp_dir <- file.path(ImmSig_dir,"PreProc_Data")
dir.create(path = pp_dir)

hai_dir <- file.path(pp_dir,"HAI")
dir.create(path = hai_dir)

ge_dir <- file.path(pp_dir,"GE")
dir.create(path = ge_dir)

rawdata_dir <- file.path(pp_dir,"rawdata")
dir.create(path = rawdata_dir)

yale.anno <- "original"
sdy80.anno <- "original"
sdy80.norm <- FALSE
# 1. Extract and pre-process data from ImmuneSpace, ImmPort, or GEO databases    
for(sdy in studies){
    makeGE(sdy,
           yale.anno = yale.anno,
           sdy80.anno = sdy80.anno,
           sdy80.norm = sdy80.norm,
           output_dir = ge_dir,
           rawdata_dir = rawdata_dir)
    makeHAI(sdy, output_dir = hai_dir)
    makeDemo(sdy, output_dir = ge_dir)
  }

PRE-PROCESSING

PARAMETERS:

OUTPUT DIRECTORY:

NOTES:

  1. Gene expression annotation methods:
  2. original = using the original pre-processed GE table from HIPC collaborators to generate a hash of Probe IDs as keys and Gene Symbols as values. For the Yale Studies, this method means that the annotation is based on the GE table provided from the Illumina sequencing machine at the time of the run. For SDY80/CHI-nih, the information is from the Affymetrix annotation done at the time.
  3. library = the use of a bioconductor library. For Yale Studies, this was illuminahumanv4.db. For SDY80, it was hugene10sttranscriptcluster.db.
  4. manifest = Uses the manifest available from the Illumina website which consists of all historic probeIDs. In the case of the Yale Studies, this generates complete mapping, whereas the original table has only partial mapping and the library even less.

  5. SDY80 / CHI-nih normalization:

  6. In the original code used to develop the manuscript, The study's rawdata from the GEO database has been log2 transformed, but is not normalized in the same was as the discovery studies (i.e. preprocessCore::normalize.quantiles). Therefore sdy80.norm is FALSE by default. However, the user may set it to TRUE and use the same normalization procedure as the other studies.

  7. SDY212 GE data removal:

  8. SDY212 had two sets of GE data with the same biosample ID. These sets were removed from analysis because it was unclear why there was a duplicate. Also, one gene did not have a full set of expression information in the ImmPort file and was also removed. Questions should be directed to the study authors directly.

# Setup for Rds Generation (BioConductor eset)
rds_dir <- file.path(ImmSig_dir, "Rds_data")
dir.create(path = rds_dir)
combined_hai <- combine_hai_data(hai_dir, output_dir = rds_dir)
for(sdy in studies){
  make_rds(sdy, ge_dir, combined_hai, output_dir = rds_dir)
}
# Setup for Meta Analyis script
data("geneSetDB")
FDR.cutoff <- 0.5
pvalue.cutoff <- 0.01
endPoint <- "fc_res_max_d30"
adjusted <- FALSE
baselineOnly <- TRUE
indiv_rds <- FALSE
output_dir <- "Placeholder"

META-ANALYSIS

YOUNG COHORT RESULTS

yng_res_dfs <- meta_analysis(geneSetDB = geneSetDB,
                  rds_data_dir = rds_dir,
                  cohort = "young",
                  FDR.cutoff = FDR.cutoff,
                  pvalue.cutoff = pvalue.cutoff,
                  endPoint = endPoint,
                  adjusted = adjusted,
                  baselineOnly = baselineOnly,
                  indiv_rds = indiv_rds,
                  markdown = T,
                  output_dir = output_dir)

Discovery Group Significant Pathways Data

DT::datatable(yng_res_dfs$dsc)

Validation Study Significant Pathways Data

DT::datatable(yng_res_dfs$val)

OLD COHORT RESULTS

old_res_dfs <- meta_analysis(geneSetDB = geneSetDB,
                  rds_data_dir = rds_dir,
                  cohort = "old",
                  FDR.cutoff = FDR.cutoff,
                  pvalue.cutoff = pvalue.cutoff,
                  endPoint = endPoint,
                  adjusted = adjusted,
                  baselineOnly = baselineOnly,
                  indiv_rds = indiv_rds,
                  markdown = T,
                  output_dir = output_dir)

Discovery Group Significant Pathways Data

DT::datatable(old_res_dfs$dsc)

Validation Study Significant Pathways Data

DT::datatable(old_res_dfs$val)

PARAMETERS:

PARAMETERS DEFINITION:

Expected Input Data Types in [brackets]




ehfhcrc/ImmSigPkg documentation built on May 16, 2019, 12:16 a.m.