knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Outline

actor uses isoform expression from an experimental dataset and compares the splicing patterns to a reference panel. actor provides a gene level comparison and identifies collections of genes which relate to the reference panel in a similar way called gene classes. This vignette provides code for conducting your own analysis as well as how to interpret the results. actor provides GTEx as a reference panel with the tissue of origin as the reference group. The data for this vignette is the motor neuron dataset described in the paper and the FASTQ files can be found here. Note due to speed and size constraints, only a subset of 100 genes were used in this analysis. actor also allows for the incorporation of external reference panels. See the last section of this document for information on how this is done.

Preparing Data

actor takes in as input the isoform counts from an experimental dataset. Each row of the matrix should be a transcript with the columns corresponding to the samples. The first column should be the gene id and be labled as gene_id. The second column should be the transcript id and labeled as feature_id. Counts should be scaled to account for read length but still on the same scale of counts. tximport can directly calculate this from the salmon files by using the option countsFromAbundance="scaledTPM".

The code below loads the package and prepares the data for modelling. isoData is the isoform expression data for the experimental dataset. reduceTissue=TRUE reduces the model to only the top 5 relevant tissues. m1 is the stored model.

library(actor)
set.seed(1994)
m1=actor(isoData,reduceTissue = TRUE)

Users can also modify the model parameters prior to running actorFit by changing the values in m1 as seen below. The number of iterations is set to be 10 and the number of gene classes to identify is set to be 1.

m1$numIter=10
m1$numClass=2

The model is then run using the function actorFit.

fit1=actorFit(m1)

Interpreting Results

Results of the model are stored in the object fit1. actor also supplies several ways to visualize the results. dirEta shows the Dirichlet estimates for the gene classes. This identifies which tissue groups are represented in each of the gene classes. "TissueMem" provides a heatmap of the posterior estimates of the tissue membership where genes are on the columns and the GTEx tissue groups are on the rows. "ClassMem" provides a heatmap of the posterior estimates for the gene class membership with the genes on the columns and the gene classes on the rows.

fit1$dirEta
plot(fit1,plotType = "TissueMem")
plot(fit1,plotType = "ClassMem")

Investigating genes

Plots can be made to observe the splicing patterns for an individual gene. The original data gets past in as an argument to isoData and a list of genes can be passed into geneList. This plots can also be subset to a subset of tissues and output to a pdf. Read the documentation for makeGenePlots for further information. Note: This plot is useful when the number of samples in the experimental dataset (isoData) is relatively small. The more samples present the harder this plot is to read. The list of tissues can also be specified through the tissueList parameter. Provided tissues must be a supset of the ones within the actor object. This can be found using the availableTissues function.

makeGenePlots(actorObj=fit1,isoData=isoData,geneList="ENSG00000122484.8")

availableTissues(actorObj=fit1)

makeGenePlots(actorObj=fit1,isoData=isoData,geneList="ENSG00000122484.8",tissueList = c("Brain_Amygdala","Brain_Cerebellar_Hemisphere"))

Pre-specifing the tissues to model

actor can also reduce to a subset of the tissues to consider in the modelling procedure. This is done by specifying the a vector of the tissues in tissueList. A list of all available GTEx tissues can be found by running availableTissues with no parameters.

availableTissues()

You can then create a vector of tissues you wish to model and pass it into the tissueList parameter and follow the steps for running the model and interpreting the analysis as before.

tissueList=c("Brain_Amygdala","Brain_Cerebellum","Pancreas","Nerve_Tibial","Brain_Spinal_cord_(cervical_c_1)")


m2=actor(isoData,reduceTissue = FALSE,tissueList=tissueList)
fit2=actorFit(m2)
m2$numClass=2
plot(fit1,plotType = "TissueMem")
plot(fit2,plotType="ClassMem")

External Reference Panels

Users can also provide their own reference panels through the parameter refPanel. The first two columns of the dataset must be labeled gene_id and feature_id respectively and correspond to the gene and transcript labels. Each remaining column of the dataset corresponds to each reference group and the matrix is populated by the precomputed Dirichlet parameters for that reference group. If a gene is not expressed for a specific reference group, NA should be entered for all transcripts of that gene. Below is a small portion of the GTEx reference panel. Note that gene ENSG00000000005.5 was not expressed in the adrenal gland and thus the Dirichlet parameters were replaces with NA values.

load("C:/Sean/UNC/Mike Love/GTEx/rPackages/actor/R/sysdata.rda")
 head(gtexRefPanel[,c(1,2,5)])

In the below example, the external reference panel is titled gtexRefPanel and all subsequent analyses can be run identically as before.

set.seed(1994)
m1=actor(isoData,reduceTissue = TRUE,refPanel = gtexRefPanel)


mccabes292/actor documentation built on March 1, 2020, 1:23 a.m.