Utilities for processing and analyzing HIPC gene expression data in ImmuneSpace.
analysis_dir
├── analysis
│ └── exprs_matrices
│ ├── matrix_name.tsv
│ ├── matrix_name.tsv.raw
│ ├── matrix_name.tsv.summary
│ └── matrix_name.tsv.summary.orig
└── rawdata
└── gene_expression
├── create-matrix
│ └── matrix_name
│ └── create-matrix-vars.tsv
├── RAW_FILES_FROM_IMMPORT
└── supp_files
└── sampletype_armaccession
└── SDY1630_raw_expression.txt
analysis_dir/rawdata
analysis_dir/rawdata/supp_files/sampletype_armaccession
. create-matrix/matrix_name
Methods for downstream analysis used in ImmuneSpace are also maintained in this
package. They are accessible through the HMX
object, which is an R6 object
that extends the ISConn
object defined in ImmuneSpaceR
.
To create an HMX
object:
con <- HMX$new("SDY269")
You can then perform analyses on the matrices available through the connection object. For example, to run differential expression analysis:
de_results <- con$run_de_analysis()
The Immune Response Predictors method finds genes which predict immune response, with methods to fit a model to those features and compare observed vs predicted results. These methods are used in the Immune Response Predictor module in ImmuneSpace.
Predictor genes are identified using elastic net regression using the glmnet
package.
The train_immune_response_predictors
method will identify genes which predict
immune response, measured either by HAI, Neutralizing Antibody Titer, or ELISA
results. For example, to find genes whose change in expression from baseline to
day 7 are associated with HAI response, and fit a model:
con <- HMX$new("SDY269")
fit <- con$train_immune_response_predictors(
cohorts = "LAIV group 2008_PBMC",
timepoint = 7
)
Under the hood, this function: 1. Identifies genes differentially expressed from baseline to the selected timepoint. 1. Calculates fold-change from baseline to selected timepoint for those genes. 1. Calculates immune response using the selected assay (default is HAI) for all participants in the selected cohort. 1. Selects features whose fold-change value is predictive of HAI response using elastic net regression. 1. Fits a linear model to the data using the selected features.
You can also return just the selected features instead of the whole model
with the return_type
option:
features <- con$train_immune_response_predictors(
cohorts = "LAIV group 2008_PBMC",
timepoint = 7,
return_type = "features"
)
train_immune_response_predictors
also has the option to dichotomize response
values using a threshold, using the dichotomize
and dichotomize_thresh
options.
In this case, features will be selected and the model will be fit using logistic
regression instead of linear.
Once a model has been fit, fitted values can be derived using the predict_response
method, which returns predicted values for each participant in the selected cohort.
This package provides tools for ensuring that the Gene Symbols used throughout
the ImmuneSpace portal are consistent and up to date. The UpdateAnno
package
previously managed gene annotation in ImmuneSpace.
There are a number of places where Gene Symbols play an important role:
- Gene Expression Explorer - Pulls Gene Symbols from the FeatureAnnotation query
- Gene Set Enrichment Analysis - Pulls Gene Symbols from the static gene set
module data objects in this package, e.g. chaussabel or emory.
- Differential Gene Expression Analysis - Uses the GeneExpressionAnalysisResults
query to display differentially expressed genes.
When loading a new matrix onto ImmuneSpace, it must be associated with a
feature annotation set, saved in the Feature Annotation Set table. If the
annotation for the matrix is not already loaded from a previous study which
used the same platform, you must create a new one. Each feature annotation
table and the source code to create it is saved in HIPCMatrix/inst/FeatureAnnotationSetDev
.
Please refer to the Notion Documentation for more details on how to create and upload a FAS.
The function create_gene_alias_map()
pulls the most recent annotation from
the latest HUGO Gene Nomenclature Committee dataset.
This resource is used instead of the NCBI database via biomaRt or the equivalent
org.Hs.eg.db
as those data sources contained mappings that were deemed incorrect
(such as "ACTB" > "POTEF") during the ImmuneSignatures 2 project.
This function handles 2 edge cases: 1. An alias maps to itself as a symbol as well as other symbols. In this case, we have selected to use the self-mapping and removed any other mappings. Our rationale is that the other-mapping symbols are historic artifacts and no longer accurate. 2. An alias maps to multiple symbols that do not include itself. In this case, we drop these aliases from the matrix because we do not have a good way to know which symbol is the most accurate.
This mapping table, as well as gene sets with the most current mapping, are saved in HIPCMatrix package data to be easily accessible from many locations.
The HMX
object includes methods for updating annotation in ImmuneSpace,
using the gene symbol mapping table installed with the package:
To update feature annotation sets associated with matrices associated with a connection:
con <- HMX$new("")
con$updateFAS()
To update summary.tsv expressionsets associated with a connection:
con$updateEMs()
To update these mappings and apply it throughout ImmuneSpace:
HIPCMatrix/data-raw/update_hgnc_mapping.R
.
This will update the HGNC mapping table which is part of the HIPCMatrix
package data, then use this new mapping to update the gene signatures
which are also part of the package data. ssh rsT
s
R
devtools::install("RGLab/HIPCMatrix@dev")
On the server, go to data integration module at Studies level:
Run Update Anno ETL. This will:
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.