knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Data is downloaded from GEO using getGEO function in GEOquery library. Expression matrix with probeset IDs, age of the samples and covarietes to be included in the analysis are extracted from geo object. Please note that this tutorial is just to demonstrate the functionality of this package and is not a proper gene expression analysis tutorial. Thus we skip many potential QC steps and probeset -> gene ID mapping.
library(GEOquery) geo <- getGEO('GSE30272',destdir = '~/temp/')[[1]] pd <- pData(geo) expmat <- exprs(geo) ages <- setNames(as.numeric(pd$`age:ch1`), pd$geo_accession) covs <- list(array = as.factor(setNames(pd$`array batch:ch1`, pd$geo_accession)), bbs = as.factor(setNames(pd$`brain bank source:ch1`, pd$geo_accession)), sex = as.factor(setNames(pd$`Sex:ch1`, pd$geo_accession)), race = as.factor(setNames(pd$`race:ch1`, pd$geo_accession))) ages <- ages[ages >= 20] expmat <- expmat[, names(ages)] covs <- lapply(covs, function(x)x[names(ages)])
We can simply use calc.het
function in our package to have all relevant results:
library(hetAge) resx <- calc.het(exprmat = expmat[sample(rownames(expmat),1000),], # expression matrix age = ages, # age vector age_in = 'years', # scale of the age vector age_to = 'pw-0.25', # scale of ages to be used batch_corr = 'SVA+QN', # batch correction method modex = 'linear', # model for the change in expression level tr_log2 = F, # log2 transform input matrix sc_features = T, # scale genes / probesets covariates = covs, # covariates het_change_met = 'spearman', # method to measure change in heterogeneity padj_met = 'fdr') # multiple test correction method
The resulting object is a list with several fields.
head(resx$sampleID)
resx$input_expr[1:5,1:5]
head(resx$input_age)
Here we wanted the ages to be in fourth root scale.
head(resx$usedAge)
library(tidyverse) data.frame(age = resx$input_age, age0.25 = resx$usedAge) %>% ggplot(aes(x = age, y = age0.25)) + geom_point() + geom_smooth( method = 'loess') + theme_bw()
These are the expression values for the probesets after accounting for the predicting confounding factors, i.e. SVs
resx$sva_res$correctedExp[1:5,1:5]
head(resx$sva_res$SVs)
head(resx$sva_res$SV_cov_corr) %>% knitr::kable()
This is the matrix used to model age-related expression change - the reason it is different from corrected matrix is because we may further scale the expression level for the features ('sc_features' parameter.)
resx$usedMat[1:5,1:5]
Absolute values of this matrix can be considered as the level of heterogeneity for each gene, each sample.
resx$residMat[1:5,1:5]
colnames(resx$feature_result) head(resx$feature_result)
options(width = 100) sessioninfo::session_info()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.