In this vignette we illustrate how to use fsusieR to fine map seq data (ATACsaq, RNAseq, CHIPseq). In this example we focus on RNAseq data, but the process is similar for other seq data

knitr::opts_chunk$set(comment = "#",collapse = TRUE,results = "hold",
                      fig.align = "center",dpi = 120)

Let's first load the data

library(fsusieR)
data("RNA_seq_example")
Y=RNAseqData$Y
X=RNAseqData$X
base_pair=RNAseqData$pos

This are some synthetic data generated to mimic the RNAseq read in ENSG00000168710 in cortex and the corresponding SNPs effect ont the reads. the $Y$ matrix contains the RNAseq information of 96 individuals along that ENSG00000168710 spans over 39077 base pairs. The $X$ matrix contains information about the individual genotype level

print(dim(Y))
plot( base_pair, Y[1,], main=" Observed base pair count for individual 1")

To limit the computational burden we "bin" the data by the dividing the gene body into 1024 equally large bins. For each individual we sum up the individual counts over each the bin using the bin_Y function. The output of the function returns, the binned Y data The corresponding bin position ** the bon size

Y_binned= bin_Y(Y,base_pair = base_pair)
Y=Y_binned$binned_data
print( dim( Y))
pos = Y_binned$pos
print(pos[1:10])
bin_size=Y_binned$bin_size
print( bin_size)

Here the corresponding binned count

plot(pos, Y[1,], main=" Observed binned count for individual 1")

Normalization

Now we want to assert the effect of the genotype of the RNAsea profile We scale individual count profile by normalizing in a gene-wise fashion as follow. Thus allowing to assess the change of the individual profile We asses the effect of the SNP using the $f(x) =log(1+x)$ transform

size_factor=rowSums(Y)/ mean(rowSums(Y))
Y_cor =   (  Y /as.vector(size_factor))

plot(pos, log1p(Y[1,] ))

Univariate regression

the fsusieR package provide a series of function to gauge the effect of genotype of the observed counts. The simplest function to start with is the univariate_functional_regression that allow you to compute the conditional count distribution given a covariate.

Here we look at the effect of SNP 261 on the observed log1p counts.

We fit the data using an HMM model, you can also use wavelet re

uni_res= univariate_functional_regression(Y=log1p(Y_cor),
                                          X=as.matrix(X[,261],
                                                      ncol=1),
                                          method="smash",
                                          alpha = 0.1
                                          )
plot(pos, uni_res$effect_estimate, 
     ylab = "estimated effect",
     xlab="base pair",
     type="l",
     lwd=2,
     col="royalblue",
     ylim=c (min(uni_res$cred_band),max(uni_res$cred_band)))
lines( pos,uni_res$cred_band[1,], 
       lty=2,
       lwd=1.4,
       col="royalblue")
lines(pos, uni_res$cred_band[2,], 
        lty=2,
       lwd=1.4,
       col="royalblue")
abline(h=0)

Visualize effect in context

Great you have estimate the effect of a SNP on some count. Let's improve this visualization a bit so we can get some biological interpretation. Below a quick tutorial to put this results in context using GViz

library(AnnotationHub)
library(org.Hs.eg.db)
library(GenomicRanges)
library(Gviz)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)


txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
pos0 = pos[1]
pos1= pos[length(pos)]+bin_size
positions=pos
chr=RNAseqData$chr

Create the effect track

effect=effect=rbind(uni_res$effect_estimate,
             uni_res$cred_band,
             rep(0,length(uni_res$effect_estimate)))
group_colors <- c("black" ,"royalblue","royalblue","royalblue" )
group_lwd= c(1,2,1,1)
x=X[,261]
read_counts <-  rbind( colMeans(log1p(Y_cor[x == 1,])),
                       colMeans(log1p(Y_cor[x == 0,])) 
                           )
obs_effect=rbind (read_counts[2,]-read_counts[1,],
                      rep(0, length(read_counts[2,] )) ) 

ylim= c( min( c(effect,obs_effect)), max(c(  effect,obs_effect)))

group_cred= c(1:3,0)  
effect_track <-
    DataTrack(range = GRanges(seqnames = chr,
                              ranges = IRanges(start = positions,
                                                 end = positions + 1)),
              ylim=ylim,
              data = effect,
              genome = "hg38",
               groups= group_cred,
              name = "Effect SNP 261",
              type = "l",   
              lwd= group_lwd,  
              col = group_colors,
              track.margin = 0.05,
              cex.title = 0.6,
              cex.axis = 0.6,
              col.axis = "black",
              col.title = "black",
              fontface = "plain",
              background.title = "white",
              fontface.title = 1, 
              legend=FALSE)

Create the gene track

  # Create a "gene region" track.
  gene_track <- GeneRegionTrack(txdb,
                                genome = "hg38",
                                chromosome = chr,
                                pos0 = pos0,
                                pos1 = pos1,
                                name = "",
                                showId = TRUE,
                                geneSymbol = TRUE,
                                col.axis = "black",
                                col.title = "black",
                                transcriptAnnotation = "symbol",
                                rotation.title = 0,
                                cex.title = 0.6,
                                col = "salmon",
                                fill = "salmon",
                                background.title = "white")

Create the observed data point track

x       <- X[,261]

read_counts <-  rbind( colMeans(log1p(Y_cor[x == 1,])),
                       colMeans(log1p(Y_cor[x == 0,])) 
                           )

groups <- c("SNP 261 = 0",
                "SNP 261 = 1" )
geno_colors <- c("turquoise", "navyblue" )
lab_y = "avg. log1p count" 
  data_track <- DataTrack(range = GRanges(seqnames = chr,
                                          ranges = IRanges(start = positions,
                                                           end = positions + 1)),
                          data = read_counts,
                          genome = "hg38",
                          groups = groups,
                          name = lab_y  , 
                          type = "p" , #"p",#type = "l",
                          col = geno_colors  ,
                          track.margin = 0.05,
                          cex.title = 0.6,
                          cex.axis = 0.6,
                          col.axis = "black",
                          col.title = "black",
                          fontface = "plain",
                          background.title = "white",
                          fontface.title = 1,
                          cex.legend = 0.6,
                          cex=0.2)
obs_diff_track <-
    DataTrack(range = GRanges(seqnames = chr,
                                          ranges = IRanges(start = positions,
                                                           end = positions + 1)),
                          data =  read_counts[1,]- read_counts[2,],
              genome = "hg38",
              ylim=ylim,             
              name = "observed difference \n between genotype",
              type = "p", 
              lwd= 2,
              col = "blue",
              track.margin = 0.05,
              cex.title = 0.6,
              cex.axis = 0.6,
              col.axis = "black",
              col.title = "black",
              fontface = "plain",
              background.title = "white",
              fontface.title = 1, 
              legend=FALSE)
obs_diff_track2 <-
    DataTrack(range = GRanges(seqnames = chr,
                              ranges = IRanges(start = positions,
                                               end = positions + 1)),
              data =  rep(0, 1024), 
              genome = "hg38",
              ylim= ylim, 
              name ="Observed difference ",
              type = c("l"  ),
              col = c("black" ),
              track.margin = 0.05,
              cex.title =  0.6,
              cex.axis =  0.6,
              col.axis = "black",
              col.title = "black",
              fontface = "plain",
              background.title = "white",
              fontface.title = 1,
              legend = FALSE )
  obs_diff_track =OverlayTrack(trackList = list(obs_diff_track,
                                                  obs_diff_track2),background.title = "white")

Now you can stack the track to visualize them together

 tracks <- c( 
      effect_track,
      obs_diff_track,
      data_track,
      gene_track 
    )

plotTracks(tracks, from = pos0, to = pos1, sizes = c(  3,3, 3, 1 ))

Ze might have an effect of SNP 261 but it not a given, now let's fine map this data using fsusie

Fine mapping SNP effect on RNA seq

The example above is nice, however, this naive approach cannot asses if the effect of SNP 261 4 is real or do to linkage disequilbrium with another SNP.

To solve this problem with developed susiF. It can be run as follow. Note that it will take a bit of time so you can load the results instead if you are not patient

library(fsusieR)
result= susiF(Y=log1p(Y_cor),X=as.matrix(X), L=20 , post_processing = "smash"             )
#result= RNAseqData$results

Notice that our candidate in not part of the SNP being selected by susiF

result$cs

Congrats you have fine map the SNP on our mock RNAseq data. You can visualize the results using our custom functions plot_susiF to plot both the PIP and the fitted functio, or plot_susiF_pip for th PIPS only and plot_susiF_effect for visualizing just the effect

plot_susiF(result
           )
res_HMM = change_fit   ( result,
                      log1p(Y_cor),
                      X, 
                      to="HMM"
)

You can also acces the results directly as follow

cs=2 
pos=  result$outing_grid
plot(x= pos ,
     y= result$fitted_func[[cs
                         ]], 
     ylab = paste("estimated effect for CS  ",cs),
     xlab="base pair",
     type="l",
     lwd=2,
     col="royalblue",
     ylim=c (min(result$cred_band[[cs]]),max(result$cred_band[[cs]])))
lines( x= pos ,
       y= result$cred_band[[cs]][1,], 
       lty=2,
       lwd=1.4,
       col="royalblue")
lines(  x= pos ,
        y= result$cred_band[[cs]][2,], 
        lty=2,
       lwd=1.4,
       col="royalblue")
abline(h=0)
tt= do.call( rbind , result$lBF)

plot(apply(tt,2,max), col =ifelse(result$pip>0.2,2,3), pch=20)

We also developped a custom function to visualize the fSuSiE results using GViZ

You can use the



stephenslab/susiF.alpha documentation built on June 11, 2025, 1:09 p.m.