In AskExplain/GeneCodeR: Transforms between image and gene expression modalities from Spatial Transcriptomics

GeneCodeR analysis of Spatial Transcriptomic data

Imaging tissue contains genes spatially organised by the extracellular matrix, individual cell types and groups of cell compartments. Using linear algebra and vectors, the expected patterns of the histology images are learned (example patterns of (1): extracellular matrix stained in eosin, and (2): nuclei of cells stained in hematoxylin). Assigning each sample image to a learned pattern (by the Generative Encoder algorithm) is similar to assigning an image weight to a pattern vector. Accordingly, genes are also assigned weights to the same set of expected image patterns. This encodes the information into the same space.

The library that wraps around the Generative Encoder algorithm (gcode) is called GeneCodeR and provides a more comprehensive interface to load data from path, run gcode, extract relevant transformations, and validate the learned patterns.

# Main libraries for analysis
library(GeneCodeR)
library(gcode)

# Main libraries for plotting
library(ggplot2)
library(grid)
library(gridExtra)

Please replace these paths

main_path <- "~/Documents/main_files/AskExplain/Q4_2022/gcode/"

# Please replace these paths
path_to_save <- paste(main_path,"./temp_save_dir/",sep="")
path_to_work <- paste(main_path,"./temp_work_dir/",sep="")

Download data and format

dir.create(path_to_save)
dir.create(path_to_work)

setwd(path_to_work)
curl::curl_download(url = "https://prod-dcd-datasets-cache-zipfiles.s3.eu-west-1.amazonaws.com/29ntw7sh4r-5.zip", destfile = paste(path_to_work,"he_et_al.zip",sep=""))
utils::unzip(zipfile = paste(path_to_work,"he_et_al.zip",sep=""), exdir = path_to_work)
path_to_work <- paste(path_to_work,"29ntw7sh4r-5",sep="")

setwd(path_to_work)

main_files <- list.files(path_to_work, full.names = T)
new_files <- gsub(pattern = "BC", replacement = "BT", x = main_files)
file.rename(from = main_files, to = new_files)

GeneCodeR takes in three input files from a generic Spatial Transcriptomic dataset, per tissue slide:

1 - coordinates 2 - gene expression 3 - images

# Input list of files

file_path_list <- GeneCodeR::extract_path_framework(F)

file_path_list$meta$path <- list.files(path_to_work,pattern = "*Coords.tsv.gz",full.names = T)
file_path_list$coord$path <- list.files(path_to_work,pattern = "*csv.gz",full.names = T)
file_path_list$gex$path <- list.files(path_to_work,pattern = "*stdata.tsv.gz",full.names = T)
file_path_list$pixel$path <- list.files(path_to_work,pattern = "*jpg",full.names = T)

set.seed(1)
train_ids <- sample(c(1:length(file_path_list$coord$path)),round(length(file_path_list$coord$path)*0.9,0))
test_ids <- c(1:length(file_path_list$coord$path))[-train_ids]

train_file_path_list <- lapply(file_path_list,function(X){
  list(path=X$path[train_ids])
})
test_file_path_list <- lapply(file_path_list,function(X){
  list(path=X$path[-train_ids])
})

The GeneCodeR configuration file takes in several parameters.

The first set of parameters is information to extract spots - such as spot size, and any augmentations such as displacement of the coordinates, or rotation of the image.

The second set of parameters is for after the gcode model is learned (via GeneCodeR wrapper). This is to transfrom from one modality, to another modality (for example, from image to gene expression).

# Set up genecoder configuration parameters

genecoder.config <- GeneCodeR::extract_config_framework(F)

The meta information file takes in several parameters per coordinate, gene expression and image.

The first set of parameters is to read in the file. Here, example parameters such as the file type (for example, is it a .csv, .tsv, .jpg, .png, market matrix file (.mtx)?), and other important parameters based on file type (for example, does it contain a header, what is the separator ...).

Importantly, another set of parameters is the factor identifier in the coordinate file - which columns contain the x-coordinates, y-coordinates and the label identifier for the spots? For example, the label identifiers, are given by the rownames indicated by a 0.

# Set up meta information

meta_info_list <- GeneCodeR::extract_meta_framework(F)
meta_info_list$meta$read_file$file_type <- "tsv"
meta_info_list$coord$read_file$file_type <- "csv"
meta_info_list$gex$read_file$file_type <- "tsv"
meta_info_list$pixel$read_file$file_type <- "image"

meta_info_list$meta$read_file$meta$sep <- "\t"
meta_info_list$coord$read_file$meta$header <- meta_info_list$gex$read_file$meta$header <- meta_info_list$meta$read_file$meta$header <- T
meta_info_list$coord$read_file$meta$sep <- ","
meta_info_list$gex$read_file$meta$sep <- "\t"
meta_info_list$meta$read_file$meta$row.names <- 1
meta_info_list$coord$read_file$meta$quote <- meta_info_list$gex$read_file$meta$quote <- meta_info_list$meta$read_file$meta$quote <- ""
meta_info_list$coord$read_file$meta$row.names <- 1
meta_info_list$gex$read_file$meta$row.names <- 1

meta_info_list$coord$factor_id$labels <- 0
meta_info_list$coord$factor_id$coord_x <- 1
meta_info_list$coord$factor_id$coord_y <- 2

To run the analysis appropriately, a set of common genes are required for all spots.

common_genes <- lapply(c(1:length(file_path_list$gex$path)),function(X){
  print(X)
  colnames(GeneCodeR::read_file(path = file_path_list$gex$path[X], meta_info = meta_info_list$gex$read_file)[[1]])
})

meta_info_list$gex$factor$common_genes <- Reduce("intersect",common_genes)

Coordinate labels also need to be extracted.

meta_info_list$coord$factor$labels <- lapply(c(1:length(train_file_path_list$coord$path)),function(X){
  row.names(GeneCodeR::read_file(path = train_file_path_list$coord$path[X], meta_info = meta_info_list$coord$read_file)[[1]])
})

Gene expression data is extracted using the path list, meta information, and configuration parameters.

# Extract gex data
gex_data <- GeneCodeR::prepare_gex(file_path_list = train_file_path_list,meta_info_list = meta_info_list,config = genecoder.config)

Importantly, labels for the meta information from the gene expression data extraction is required for extracting relevant spots that contain gene expression greater than background.

# Store meta data
meta_info_list$gex$factor$labels <- gex_data$labels


# Extract spot data

genecoder.config$extract_spots$window_size <- 30

spot_data <- GeneCodeR::prepare_spot(file_path_list = train_file_path_list,meta_info_list = meta_info_list,config = genecoder.config, gex_data = gex_data$gex)

Relevant frameworks are set up to begin learning the Generative Encoder model.

Firstly, set up the data list, which stores the gene expression and spot data.

# Set up data

data_list <- list(spot_data$spot,as.matrix(spot_data$gex))

The join framework is set up, creating the model structure that learns (1): a similar function for "all" datasets, or (2): separate functions across gene expression and image pixel spots.

Here, only the incode and the beta parameters are set up to learn the individual patterns between genes and images. Given the samples are extracted from the same tissue site (image and genes at the same spot), the sample patterns are shared. The feature patterns and the sample patterns (see first paragraph - genes assigned to expected image patterns of hematoxylin and eosin stains) are mapped to a shared latent space via the alpha and beta codes. For example, the use of image patterns is analogous to pattern matching to the example image spot, the latent codes un-rotate any example image spots until it best fits the image pattern.

# Set up join information

join <- gcode::extract_join_framework(F)
join$complete$data_list <- c(1,2)
join$complete$alpha_sample <- c(1,1)
join$complete$beta_sample <- c(1,2)
join$complete$alpha_signal <- c(1,1)
join$complete$beta_signal <- c(1,2)
join$complete$code <- c(1,1)
join$covariance <- c(0,0)

reference <- gcode::extract_references_framework(F)
reference$data_list <- c(1)

Set up configuration information - the dimensions of the encoding space for the samples (i_dim), and the features (j_dim), as well as the dimensions of the latent code (k_dim) are set up. The initialisation for these parameters are set up via the irlba library using partial Singular Value Decomposition.

# Set up gcode config
gcode.config <- gcode::extract_config(F)
gcode.config$init <- list(alpha_sample="irlba",beta_sample="irlba")
gcode.config$i_dim <- 500
gcode.config$j_dim <- 500
gcode.config$dimension_reduction <- F
gcode.config$max_iter <- 100
gcode.config$seed <- 1
gcode.config$tol <- 1
gcode.config$n.cores <- 5
gcode.config$learn_rate <- 1e-1
gcode.config$batch_size <- 500

Run the Generative Encoder model

# Set up gcode model

genecoder.model <- GeneCodeR::learn_model(data_list = data_list, config = gcode.config, join = join, reference = reference)

Now begins the testing validation phase...

First extract the test set of the coordinate factor labels.

# Evaluate on test set

meta_info_list$coord$factor$labels <- lapply(c(1:length(test_file_path_list$coord$path)),function(X){
  row.names(GeneCodeR::read_file(path = test_file_path_list$coord$path[X], meta_info = meta_info_list$coord$read_file)[[1]])
})

Next extract the test set of the gene expression data.

# Extract test gex data
test_gex_data <- GeneCodeR::prepare_gex(file_path_list = test_file_path_list,meta_info_list = meta_info_list,config = genecoder.config)
meta_info_list$gex$factor$labels <- test_gex_data$labels

Save relevant information, including the model, the test meta information, the test file list and the test gene expression data. This is to analyse results later.

base_test_spot_data <- GeneCodeR::prepare_spot(file_path_list = test_file_path_list,meta_info_list = meta_info_list,config = genecoder.config, gex_data = test_gex_data$gex)
non_zero_markers <- base_test_spot_data$gex>0


save.image(file = paste(sep="",path_to_save,"all_genecoder.RData"))

AskExplain/GeneCodeR documentation built on Jan. 3, 2023, 3:52 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

AskExplain/GeneCodeR
Transforms between image and gene expression modalities from Spatial Transcriptomics

In AskExplain/GeneCodeR: Transforms between image and gene expression modalities from Spatial Transcriptomics

GeneCodeR analysis of Spatial Transcriptomic data

R Package Documentation

Browse R Packages

We want your feedback!

AskExplain/GeneCodeR Transforms between image and gene expression modalities from Spatial Transcriptomics

In AskExplain/GeneCodeR: Transforms between image and gene expression modalities from Spatial Transcriptomics

GeneCodeR analysis of Spatial Transcriptomic data

R Package Documentation

Browse R Packages

We want your feedback!

AskExplain/GeneCodeR
Transforms between image and gene expression modalities from Spatial Transcriptomics