knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

This module will cover how to use functions from isotwas to train an multivariate isotwas model for isoform-level expression. Along the way, we'll be covering some practical considerations for computation and feature selection.

library(isotwas)

Train predictive model

To train a multivariate model, we need:

  1. a matrix of dosages of SNPs, and

  2. a matrix of isoform-level expressions for a given gene.

Isoform expression estimated from pseudo-alignment methods like salmon and kallisto provide bootstrapped estimates expression across a number of iterations. This information can be included into model training, as illustrated below.

[Practical consideration]{.underline}: Ideally, to reduce your memory usage, we recommend storing your genetic data as PLINK format data (ideally as .bed/bim/fam or .pgen/.pvar/.psam). For a given gene, use PLINK to select only SNPs on the same chromosome and within 1 Megabase of the gene body. Then, you can use the bigsnpr package read in this "condensed" file set. Make sure your isoform expression matrix is log-transformed and residualized by covariates that are normally used in a QTL analysis (i.e., age, sex, principal components of the genotype matrix, PEER factors or hidden covariates with prior for the full expression matrix). In order to ensure maximal overlap between the model and the GWAS we will eventually use for trait mapping, we suggest restricting the SNPs in the QTL panel to only those present in the LD reference panel or to those annotated by the HapMap3 Project. This will ensure proper standard error calculations.

For this vignette about training an model, we'll use some toy genetic and transcriptomic data provided.

train_bed = system.file("extdata", "train.bed", package = "isotwas")
snps = bigsnpr::snp_attach(bigsnpr::snp_readBed2(train_bed,
                                                 backingfile = tempfile()))
dim(snps$genotypes[])

isoform_mat = readRDS(system.file("extdata", "isoform_exp.RDS", package = "isotwas"))
dim(isoform_mat)

This reads in a SNP matrix with dimensions $300 \times 4542$, with samples along the rows and SNPs along the columns and a isoform-level expression matrix of dimension $300 \times 3$, with isoforms along the columns.

[Practical consideration]{.underline}: Though this is not a requirement for the isotwas pipeline, we recommend a feature selection step based on cis-heritability of isoform-level expression. Prior of model training for isoforms of a given gene, we recommend that users compute the heritability of all expressed isoforms of a gene using GCTA (ideally the gcta-nr-robust executable in FUSION). We recommend users only train models for isoforms that have positive heritability at nominal $P < 0.05$ or $0.01$.

[Practical consideration]{.underline}: Prior to heritability analysis and model training, it is imperative that your expression data is residualized by relevant covariates (i.e. principal components of genotype matrix, hidden covariates or PEER factors of the expression matrix, clinical/demographic covariates). A general rule of thumb is if you would include the covariate in an eQTL/isoform-level eQTL analysis, you should include it in the residualization process.

Let's jitter our expression matrix a little to generate 10 bootstrapped estimates of the matrix. Keep in mind that these replicates, theoretically, should not affect the effect size estimates but will help in seeding the estimation process of the correlation between isoforms/error in the multivariate model.

boot_list = lapply(1:10,function(i){jitter(isoform_mat)}) 
isoform_mat_rep = rlist::list.rbind(boot_list) 
rownames(isoform_mat_rep) = rep(paste0('Sample',1:nrow(snps$fam)),10) 
colnames(isoform_mat_rep) = paste0('Isoform',1:ncol(isoform_mat_rep)) 
dim(isoform_mat_rep)

Now, we can train the model using the compute_isotwas() function. Make sure the SNP matrix has clear labels for the SNP identifiers that are relevant to your study. This is a large SNP matrix which will take a few minutes to run on your local machine.

snp_mat = as.matrix(snps$genotypes[])
colnames(snp_mat) = snps$map$marker.ID
rownames(snp_mat) = snps$fam$sample.ID
isotwas_model = compute_isotwas(X = snp_mat, 
                                Y = isoform_mat, 
                                Y.rep = isoform_mat_rep,
                                R = 10, 
                                id = rownames(isoform_mat_rep), 
                                omega_est = 'replicates', 
                                omega_nlambda = 5, 
                                method = c('mrce_lasso', 
                                           'multi_enet', 
                                           'univariate',
                                           'joinet',
                                           'spls'),
                                predict_nlambda = 10, 
                                family = 'gaussian', 
                                scale = FALSE, 
                                alpha = 0.5, 
                                nfolds = 5, 
                                verbose = TRUE, 
                                tx_names = paste0('Isoform',1:3), 
                                seed = 1789, 
                                run_all = FALSE, 
                                return_all = TRUE)

A few notes on the options in the function:

We can convert the model from a list (the output from compute_isotwas()) to a tibble or data.frame using convert_model(isotwas_model).

Let's take a look at what a sample model looks like:

model_isotwas = readRDS(system.file("extdata", "model_example.RDS", package = "isotwas"))
class(model_isotwas)
length(model_isotwas)
names(model_isotwas)
model_isotwas$Model[[1]]

This model object is a list with a Model and a R2 slot. The Model slot stores a list of models for each isoform.

We can convert this model from a list to a tibble using the convert_model() function. Make sure you have an annotation object to have position and REF/ALT allele information.

model_tsv = convert_model(model_isotwas,
                          snp_annot = snps$map,
                          snp_var = 'marker.ID')
model_tsv


bhattacharya-a-bt/isoTWAS documentation built on Jan. 9, 2025, 10:25 p.m.