OMA - Omics Meta-Analysis

Install from the github repo via

> devtools::install_github("PfizerRD/OMA-pfz")

or from a local copy via

> install.packages("location/with/OMA", repos = NULL, type = "source")

OMA is applied in three stages: (1) data preparation and checking, (2) differential gene expression (DGE) analysis in each dataset, and (3) meta-analysis of the DGE results from stage 2.

Data prep:

library(OMA)

sample_filters <- rlang::quo(
  grepl("brain|white matter|neocortex", tissue) & 
  (((disease_state == "multiple sclerosis (MS)") & (sample_pathology != "non-lesional")) | (disease_state == "normal control"))
)

dge_ready <- prepare_for_dge(
  exp_data,
  dataset_names,
  contrast = list(
    variable = "disease_state",
    active = "multiple sclerosis (MS)",
    reference = "normal control"
  ),
  covariates = "gender",
  scrutinize_covariates = TRUE,
  sample_filters = sample_filters
)

Let's break this down.

Taking the arguments of prepare_for_dge() in order, the first is the expression data itself, here exp_data. This must be a list of datasets, where each dataset is a list with two NAMED fields, "gene_expression_data" and "sample_data". "sample_data" is a data frame containing all the necessary sample information, including the response variable of interest. It must contain a column called "sample_name" whose values correspond to the row names of "gene_expression_data".

WARNING - it is very important that "gene_expression_data" is a data frame or matrix whose row names correspond to the sample names in the "sample_name" column of "sample_data". The columns of "gene_expression_data" must be named after the genes they correspond to. Note that this typically means (for example, for standard microarray data) that the number of columns >> the number of rows.

Next, dataset_names is a vector of the same length as the exp_data list, with the names of the datasets in exp_data.
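As an illustration of the layout described above, here is a minimal sketch that builds a one-dataset exp_data list and the matching dataset_names vector in base R. All sample names, gene names, values, and the dataset name are made up for the example; only the field names "gene_expression_data", "sample_data", and "sample_name" come from the description above.

```r
# Toy data only: names and values below are hypothetical.
set.seed(1)
sample_names <- c("S1", "S2", "S3", "S4")
gene_names <- c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F")

# Samples in rows, genes in columns (so usually far more columns than rows).
gene_expression_data <- matrix(
  rnorm(length(sample_names) * length(gene_names)),
  nrow = length(sample_names),
  dimnames = list(sample_names, gene_names)
)

# One row per sample; "sample_name" must match the row names above.
sample_data <- data.frame(
  sample_name = sample_names,
  disease_state = c("multiple sclerosis (MS)", "multiple sclerosis (MS)",
                    "normal control", "normal control"),
  gender = c("F", "M", "F", "M"),
  stringsAsFactors = FALSE
)

# Each dataset is a list with the two named fields.
dataset_1 <- list(
  gene_expression_data = gene_expression_data,
  sample_data = sample_data
)

# exp_data is a list of datasets; dataset_names has the same length.
exp_data <- list(dataset_1)
dataset_names <- c("toy_dataset_1")

# Sanity check: row names and the sample_name column line up.
stopifnot(identical(
  rownames(exp_data[[1]]$gene_expression_data),
  exp_data[[1]]$sample_data$sample_name
))
```

Running a check like the final stopifnot() on each dataset before calling prepare_for_dge() is a cheap way to catch a transposed expression matrix early.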

contrast should be a three-member list with elements named "variable", "active", and "reference". "variable" holds the name of the variable in "sample_data" used to build the contrast, "active" is the level of this variable that serves as the active level of the contrast, and "reference" is the level used as the contrast reference.
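The following sketch shows one way to build such a contrast list and confirm that it matches a sample table before running anything. The sample table is toy data, and contrast_is_usable() is a hypothetical helper written for this example; it is not part of OMA.

```r
# The contrast from the call above, as a named three-member list.
contrast <- list(
  variable = "disease_state",
  active = "multiple sclerosis (MS)",
  reference = "normal control"
)

# Hypothetical sample table for illustration.
sample_data <- data.frame(
  sample_name = c("S1", "S2", "S3"),
  disease_state = c("multiple sclerosis (MS)", "normal control",
                    "normal control"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not an OMA function): TRUE when the contrast
# variable is a column and both levels occur in that column.
contrast_is_usable <- function(contrast, sample_data) {
  contrast$variable %in% colnames(sample_data) &&
    all(c(contrast$active, contrast$reference) %in%
          sample_data[[contrast$variable]])
}

usable <- contrast_is_usable(contrast, sample_data)
```

Level names must match the values in the column exactly, including details such as the "(MS)" suffix.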

covariates is either a vector of covariates that will be used in all datasets (in which case scrutinize_covariates can be set to TRUE to ensure that the run won't fail if a given dataset lacks some of the covariates), or a list whose elements have the same names as the datasets, each holding the vector of covariates to apply in that dataset.
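The two accepted forms of covariates can be sketched as follows. The dataset names and the "age" covariate are hypothetical and serve only to show the shape of each form.

```r
# Hypothetical dataset names for illustration.
dataset_names <- c("dataset_A", "dataset_B")

# Form 1: one vector of covariates shared by all datasets.
covariates_shared <- c("gender", "age")

# Form 2: a list named by dataset, each element holding that
# dataset's own covariates.
covariates_per_dataset <- list(
  dataset_A = c("gender", "age"),
  dataset_B = "gender"
)

# The list names should match the dataset names.
stopifnot(setequal(names(covariates_per_dataset), dataset_names))
```

The per-dataset form is the more explicit choice when datasets record different sample annotations.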

sample_filters needs to be an R quosure - a filter that will be passed directly into dplyr's filter() and that needs to be enclosed in rlang::quo().
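To see what such a quosure selects, it can be applied to a sample table by hand with dplyr::filter() and the !! unquoting operator. This is only a sketch on made-up data, assuming the dplyr and rlang packages are installed; it illustrates the filter expression and does not show OMA's internal code.

```r
library(dplyr)

# The same filter as in the example call above.
sample_filters <- rlang::quo(
  grepl("brain|white matter|neocortex", tissue) &
  (((disease_state == "multiple sclerosis (MS)") &
      (sample_pathology != "non-lesional")) |
     (disease_state == "normal control"))
)

# Hypothetical sample table for illustration.
sample_data <- data.frame(
  sample_name = c("S1", "S2", "S3", "S4"),
  tissue = c("brain", "white matter", "blood", "neocortex"),
  disease_state = c("multiple sclerosis (MS)", "normal control",
                    "normal control", "multiple sclerosis (MS)"),
  sample_pathology = c("chronic active plaque", "not applicable",
                       "not applicable", "non-lesional"),
  stringsAsFactors = FALSE
)

# Unquote the quosure inside filter() to apply it.
filtered <- dplyr::filter(sample_data, !!sample_filters)

# S3 is dropped by the tissue pattern and S4 as a non-lesional MS sample,
# so S1 and S2 remain.
filtered$sample_name
```

Previewing the filter this way on each dataset's "sample_data" helps confirm that both contrast levels still have samples after filtering.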