knitr::opts_chunk$set(
    collapse = TRUE,
    dpi=60
)

Synopsis

chip - tee - snee

r if(FALSE){Githubpkg("jrboyd/chiptsne")}

This package is in active development.

Future updates will be focused on decreasing RAM usage.

Features

Functions

Installation and Loading

From github

dir.create("~/test_chipstne_install")
.libPaths("~/test_chipstne_install", include.site = FALSE)
if(!require("ssvQC")) devtools::install_github("FrietzeLabUVM/ssvQC")
if(!require("chiptsne")) devtools::install_github("FrietzeLabUVM/chiptsne")

Load the library

library(chiptsne)

Load optional useful libraries

theme_set(theme_classic())

Running t-sne

Set parameters

These parameters will determine how multiple functions behave

options("mc.cores" = 20)
color_mapping = c("H3K4me3" = "forestgreen",
                  "H3K27me3" = "firebrick1")

QcConf  

options("mc.cores" = 2)
perplexity = 15

The two critical inputs are:

  1. data.table containing bigWig files and cell and mark metadata.
  2. GRanges of regions to retrieve data for and perform t-sne on.
# input_dirs = c(
#     dir("/slipstream/galaxy/uploads/working/qc_framework/output/", 
#         pattern = "K4.+_pooled$", full.names = TRUE),
#     dir("/slipstream/galaxy/uploads/working/qc_framework/output/", 
#         pattern = "K27.+_pooled$", full.names = TRUE)
# )
# #assemble peak calls and sample
# peaks = dir(input_dirs, pattern = "pooled_peaks.narrowPeak", full.names = TRUE)
# olaps = ssvOverlapIntervalSets(easyLoad_narrowPeak(peaks), ext = 100)
# olaps = olaps[rowSums(as.data.frame(mcols(olaps))) > 3] 
# length(olaps)
# set.seed(0)
# # query_gr = sample(olaps, 100)
# query_gr = olaps
# query_gr = resize(query_gr, 5000, fix = "center")
# query_gr$id = names(query_gr)
# #create config from bws
# bws = dir(input_dirs, pattern = "pooled_FE.bw", full.names = TRUE)
data("query_gr")
bw_files = dir(system.file('extdata', package = "chiptsne"), 
               full.names = TRUE, pattern = ".bw$")
cfg_dt = data.table(file = bw_files)
cfg_dt[, c("cell", "mark") := tstrsplit(basename(file), "_", keep = 1:2)]

t-sne requires a wide-matrix view where each row is a point (observation) and each column is a dimension (attribute) of the data.

tsne_input = stsFetchTsneInput(cfg_dt, query_gr)

There's a bit of art to running t-sne, see this hands-on explanation to get oriented.

Be sure to set mc.cores using options() or provide the n_cores parameter to speed up processing time.

tsne_res = stsRunTsne(tsne_input$bw_dt, perplexity = perplexity)

Plotting

Describing the space

A basic plot facetted by the 3 cell lines.

ggplot(tsne_res, aes(x = tx, y = ty, color = tall_var)) + 
    geom_point() +
    facet_wrap("tall_var")

Interpreting the t-sne landscape can be difficult but it is critical for effectiveness of the visualization. We can plot representative profiles in the t-sne landscape.

The resolution of these images is controlled by n_points. Use xrng and yrng to specify a region to view in detail. A strength of t-sne is that it maps both broad trends and finer relationships effectively.

summary_dt = prep_summary(tsne_input$bw_dt, tsne_res, x_points = n_points)
img = prep_images(summary_dt, x_points = n_points,
                    line_color_mapping = color_mapping)
img_res = plot_summary_raster(img$image_dt, x_points = n_points, min_size = 0)
img_res

The same idea as before but now facetted by cell.

summary_dt_byCell = prep_summary(tsne_input$bw_dt, tsne_res, x_points = n_points, facet_by = "tall_var")
img_byCell = prep_images(summary_dt_byCell, 
                         x_points = n_points,
                         line_color_mapping = color_mapping, 
                         facet_by  = "tall_var")
img_bytall_var_res = plot_summary_raster_byCell(img_byCell$image_dt, x_points = n_points)
img_bytall_var_res

Velocity

Arrows identity how regions behave between two groups. Each arrow is drawn from the postion in tall_var_a to position in tall_var_b with color mapped to angle to aid in pattern detection.

vel_dt = prep_velocity(tsne_res, tall_var_a = "HUES48", tall_var_b = "HUES64")
vel_res = plot_velocity_arrows(vel_dt)
vel_res

Individual arrows can be overwhelming, we can also employ a binning strategy to summarize the average destination in t-sne space for groups of points. Each arrow here is drawn to the average desination (tall_var_b) for all nearby points (tall_var_a).

plot_regional_velocity(tsne_dt, tall_var_a = "HUES48", tall_var_b = "HUES64", x_points = n_points)

We can also layer the binned arrows on top of the binned profiles.

plot_regional_velocity(tsne_dt, tall_var_a = "HUES48", tall_var_b = "HUES64", 
                       x_points = n_points, p = img_res)#, p = img_res$plot)

Other

Individual profiles can also be viewed.

plot_profiles_selected(tsne_input$bw_dt, 
                       qtall_vars = c("HUES48", "HUES64"), 
                       id_to_plot = query_gr$id[1:5], 
                       color_mapping = color_mapping)


jrboyd/seqtsne documentation built on Nov. 5, 2022, 6:37 a.m.