In RaikOtto/ArtDeco: Alikeness Based on Regression on Transciptome DECOnvolution

artdeco R package

A R package that determine the differentiation stage of pancreatic neuroendocrinal neoplasia (PANnen) based on bulk RNA-seq of or mRNA array exepriments on cancer-tissue.

1 How to make it work: Quickstart

Installing artdeco

Start an R session e.g. using RStudio

if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager")

BiocManager::install("artdeco")

Test run

57 representative PANnen bulk RNA-seq samples are shipped along with the R package. These will in the following be analyzed by the artdeco algorithm to demonstrate the utilization of artdeco.

library("artdeco")

# obtain path to testdata
path_transcriptome_file = system.file(
    "/Data/Expression_data/PANnen_Test_Data.tsv",
    package="artdeco"
)

# testrun
deconvolution_results_absolute = Determine_differentiation_stage(
    transcriptome_file_path = path_transcriptome_file,
    baseline = "absolute"
)

# show results
deconvolution_results_absolute[1:2,]

The call returns a dataframe that contains the differentiation stage similarity predictions.

Explanation test run results

Percent-wise similarities of the testdata samples to all scRNA derived training samples are shown, both as written interpretation e.g. "significant similarity (to this cell type)" or "no significant similarity (to this cell type)" as well as percent scalars.

Explanation percent similarities

The percent value quantifies to what extend a given transcriptome can be reconstructed utilizing a given reference cell type e.g. by the insulin producing 'beta' cell type as basis. The nu-SVM regression based similarity determination has the limitation that the returned unit-less scalar is challenging to interpret: It is not clearly decidable when a scaler indicates a high or low similarity. Thus, artdeco interprets the returned scalar. Measurements may be calculated based on a trained model (absolute) or relative to all analyzed of within a given run (relative). The default 'absolute' mode interprets the return scalar for a given samples relative to the maximal similarity measured for each model subtype during training. I.e. the returned scaler is divided by the maximal similarity observed for training set samples. E.g. when a scRNA beta cell received the similarity 5.7 to the beta set, than all subsequent beta similarities are divided by 5.7 to show how similar they are relative to the self-identification of cell subtypes. The 'relative' mode divides by the maximal similarity measured for a given set of query data, i.e. the maximally measured similarity is taken as divisor.

# testrun in relative mode
deconvolution_results_relative = Determine_differentiation_stage(
    transcriptome_file_path = path_transcriptome_file,
    baseline = "relative"
)

# show results
deconvolution_results_relative[1:2,]

2 Visualization of similarity predictions

It is important to interpret the simialrity predictions given the clustering of the data. In order to detect clusters, a correlation heatmap can be used along with an overlay of similarity predictions. The rational is, that samples that show a comparable differentiation state should cluster together, which is why a heatmap creator is included in the artdeco package and can be utilized as follows. First the analyzed transcriptome data is load and afterwards the similarity predictions from step one are being passed on to the visualization function.

visualization_data_path = system.file(
    "/Data/Expression_data/Visualization_PANnen.tsv",
    package="artdeco")

create_heatmap_differentiation_stages(
    visualization_data_path,
    deconvolution_results_relative
)

create_heatmap_differentiation_stages(
    visualization_data_path,
    deconvolution_results_absolute
)

3 Adding and removing models

Adding a new model requires 1) training data and 2) a labeling vector of the training data specifying the cell type of each training sample

meta_data_path = system.file("Data/Meta_information/Meta_information.tsv", package = "artdeco")
meta_data      = read.table(
    meta_data_path, sep ="\t", header = TRUE,
    stringsAsFactors = FALSE
)
subtype_vector = meta_data$Test_Subtype # extract the training sample subtype labels

subtype_vector[1:6] # show subtype definition

training_data_path = system.file(
    "Data/Expression_data/PANnen_Test_Data.tsv", package = "artdeco")

add_deconvolution_training_model(
    transcriptome_data_path = training_data_path,
    model_name = "My_model",
    subtype_vector = subtype_vector,
    marker_gene_list = list(),
    training_nr_marker_genes = 5,
    training_p_value_threshold = 0.05,
    training_nr_permutations = 100
)

Note that the training in the current version (1.0.0) of artdeco has to have HGNC symbols as rownames.

Remove models is fairly straight forward:

models = list.files(system.file("Models/", package = "artdeco"))
print(models)
model_to_be_removed = models[length(models)]
remove_model(
    model_name = model_to_be_removed
)

4 Important: data format

Please note that the expression data always has to have HGNC symbols as rownames and that the first entry of the column names indicates the hgnc symbols, not the first sample name.

training_data_path = system.file(
    "Data/Expression_data/PANnen_Test_Data.tsv", package = "artdeco")
expression_matrix = read.table(
    training_data_path,
    sep ="\t",
    header = TRUE,
    row.names = 1)

expression_matrix[1:5,1:5]