interactions is a R package for classifying interaction effects from combinatorial stimuli with gene expression data. This package primarily utilizes functions in limma and edgeR to implement the theoretical framework and classification algorithm discussed in Combinatorial code governing cellular response to complex stimuli (Cappuccio et. al). Sets of gene expression profiles can be classified into well-characterized, mathematically defined definitions, potentially offering a more detailed snapshot of the transcriptional responses to combinations of stimuli.
Installing from GitHub:
# install.packages(devtools) library(devtools) devtools::install_github("taylo5jm/interactions") library(interactions)
Installing from source:
install.packages("path/to/interactions.tar.gz", repos = NULL) library(interactions)
Note, this README highlights the usage of the main functions in interactions. We will tend to avoid discussing details related to the statistics in this document, but it is highly encouraged that one reads the background information before interpreting results output with interactions.
Suppose we have a set of gene expression outputs for 4 conditions:
1) Control 2) Signal X 3) Signal Y 4) Signal X + Y
We might be interested in predicting the interaction effects between X and Y, where X and Y could be represent conditions such as drug, sex, etc.
First, we will need to assemble three objects:
1) counts
2) genes
3) metadata
counts
is a matrix where each row represents a gene and each column represents a library. Libraries for all 4 conditions should be present in this matrix.
genes
is a data.frame where each row represents a gene and each column represents a variable. For the moment, interactions might need a column named hg19_gene_id with the respective gene ID.
metadata
is another data.frame where each row represents a library and each column represents a variable. We will discuss what information should be provided. treatment_name
should be the name of the condition. For example, a library that is part of the control group might be called "None".
Please return to this section, as this will change in the near future.
The design of the classification algorithm is an optional parameter at the time of this writing. If design
is not provided, interactions will try to construct a design matrix from the metadata
data.frame.
In this case, four conditions would map to the first four columns of a design matrix. The columns following the fourth represent donors (e.g. the linear model is attempting to account for biological variation between donors), and is a function of the number of libraries provided.
If you do not pass your own design matrix to the functions discussed below, always check the design matrix constructed by interactions.
To provide your own design matrix, the number of columns should be equal to 4 (unique conditions) + n unique libraries - 1. The number of rows should be equal to the number of libraries in the counts matrix and metadata. For more help, see the help page for stats::model.matrix
or the Limma Users Guide.
To classify interaction modes with interactions, we use edgeRModeClassifier
.
interaction_modes <- edgeRModeClassifier(counts, genes, metadata, design, alpha = 0.01)
A set of gene expression outputs for 4 conditions (interaction profile) can be represented as a 6-d outcome vector, where the values are $-1$, $0$, or $1$ depending on the results of pairwise comparisons between the mean outputs. For more information on outcome vectors, see the help page [Th Outcome].
Alpha can be adjusted for a change in the frequency of type I and type II errors.
edgeRModeClassifier
returns a list of notable objects. interaction.tbl
is a data.frame where each row is a gene that was classified into a non-additive interaction mode. This table has fold changes.
voom
and DGEList
are the data with precision weights and normalized data estimated by the limma voom
and edgeR negative binomial, models respectively. modes
is a factor vector where each element maps to a row in one of the voom
or DGEList
objects.
Once interaction modes can be classified, we could consider the genes classified into a particular mode as our gene set of interest (rows in interaction.tbl
with our classified mode(s)) and all of the genes expressed (rows in DGEList
or voom
) as our universe. We find that some GO terms are enriched when we consider a set of genes classified into a particular interaction mode.
To run a topGO gene set enrichment analysis with Fisher's Exact Test, we pass a DGEList
and interaction.tbl
to getGOTable
. We also tell the function what ontology and what mode we are interested in.
go_data <- getGOTable(DGEList, interaction.tbl, ontology = "MF", filter.mode = "Pos.Syn") # molecular function for Positive synergy
If we are interested in retrieving data for all modes for each ontology (MF, BP, and CC), we can use getGOTermsForAllModes
all_go_data <- getGOTermsForAllModes(DGEList, interaction.tbl)
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.