knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(MultiDataAnalysis)
This package facilitates combining data from multiple files, experiments, or datasets into one analysis. In particular, this package has two main functionalities: 1. Merging files that are split by sample into one data set by experiment/measurement type, and 2. Applying a regression model to all rows of one or more matrices using data from multiple data sets. In this tutorial, we will explore different applications and parameters for the function ModelMultiData. We will be using an example of a differential RNA Editing analysis to demonstrate.
To test for associations between RNA Editing and a trait at every edited site, we can use the ModelMultiData function from the MultiDataAnalysis package. The ModelMultiData function applies a regression model across all rows of one or more matrices with inputs from multiple data sets and returns the summary regression coefficients.
First, let's test if RNA Editing frequency at any site is associated with age using a linear model, adjusting for batch. To do this, we supply the RNA Editing Frequency matrix to x and the RNAEdSampleInfo data.frame to groups. Because we don't want to include status as a covariate, we will add that to the excludeVars parameter.
out <- ModelMultiData(x = RNAEdDataQCed$Frequency, groups = RNAEdSampleInfo, excludeVars = "status") out <- out[order(out$fdr), ] head(out)
By default, if no formula is provided, ModelMultiData will build a formula using the provided data. If groups is defined and y is not, then the left-hand variable defaults to the second column of groups after applying excludeVars and includeVars. The remaining columns in groups post-includsion/exclusion criteria are included as covariates in the model. We can also supply the formula directly, as seen below.
While ModelMultiData applies a linear model by default, we can also test binary traits with a logistic regression by changing the FUN parameter to glm and supplying family = binomial(). Here, we are interested in testing for associations between RNA Editing Frequency and case-control status, so we supply the formula status ~ frequency and note that "frequency" refers to the rows of our x matrix (RNAEdDataQCed$Frequency) by setting xName.
out <- ModelMultiData(formula = status ~ frequency, x = RNAEdDataQCed$Frequency, groups = RNAEdSampleInfo, xName = "frequency", FUN = glm, family = binomial()) out <- out[order(out$fdr), ] head(out)
In addition to testing for associations with external traits using the groups parameter, we may want to test if the Frequency of RNA Editing tends to be strongly associated with Coverage at each site. In this case, we can supply the Coverage matrix to the y parameter. Because the supplied x matrix and y matrix both share the same the same rownames, ModelMultiData will by default pair rows with the same name and test for associations (if x and y do not have the same rownames, ModelMultiData will test all possible combinations unless comparisons parameter is supplied).
out <- ModelMultiData(x = RNAEdDataQCed$Frequency, y = RNAEdDataQCed$`Coverage-q25`) out <- out[order(out$fdr), ] head(out)
While it does not appear that coverage is strongly associated with frequency, we may still want to include coverage at each site as a covariate while testing for the association between case control status and frequency.
To do this, we can supply a list of matrices to the x parameter. If we don't supply xName, the names of the list items should correspond with the respective variables in the formula, in this case, frequency and coverage (otherwise, by default, x matrices will be named x1, x2,..., xn). Like the example above, this function will pair all rows in x1 and x2 that share the same name. This means that we can adjust the frequency of editing at every site by the coverage at those sites.
In addition, we may want to change the variable coefficients that are returned. By default, ModelMultiData returns model coefficients for variables in x. However, to include all coefficients (including the intercept), we change the returnVars parameter to "*".
out <- ModelMultiData(formula = status ~ frequency + coverage + age, x = list(frequency = RNAEdDataQCed$Frequency, coverage = RNAEdDataQCed$`Coverage-q25`), groups = RNAEdSampleInfo, FUN = glm, family = binomial(), returnVars = "*") out <- out[order(out$fdr), ] head(out)
So far, we have tested for associations between RNA editing and traits with cases and control combined, but we might want to conduct a stratified analysis to test for associations within each group. We can use the "by" parameter to conduct a stratified analysis, splitting data by one or more variables in groups. We supply a character vector specifying the names of the column or columns in groups that will be used to split data if stratified analysis desired.
out <- ModelMultiData(formula = age ~ frequency + coverage + batch, x = list(frequency = RNAEdDataQCed$Frequency, coverage = RNAEdDataQCed$`Coverage-q25`), groups = RNAEdSampleInfo, by = "status") out <- lapply(out, function(x) x[order(x$fdr), ]) lapply(out, head)
To conduct an RNA Editing QTL analysis, we can supply our RNA Editing frequency matrix to the y parameter and a genotypes matrix to x.
RNAEdGenotypes
However, we may want to restrict the comparisons we are making to only test for associations between SNPs and editing sites that are within the same gene, etc. To do this, we can supply our desired comparisons to the comparisons parameter.
head(RNAEdCombinations)
The comparisons parameter allows users to pre-define which combinations of rows in x and y should be tested. Column names should match data set names provided in xName (or default xName values) and in yName if y is provided. Each row should include the respective rownames or row numbers from each dataset in x and y that should be tested together.
out <- ModelMultiData(formula = y ~ x1 + age, x = RNAEdGenotypes, y = RNAEdDataQCed$Frequency, groups = RNAEdSampleInfo, comparisons = RNAEdCombinations) out <- out[order(out$fdr), ] head(out)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.