knitr::opts_chunk$set( message = FALSE, warning = FALSE, collapse = TRUE, comment = "#>" )
creedenzymatic is a pipeline R package to combine, standardize and visualize kinomic analyses from different tools (KRSA, UKA, KEA3, PTM-SEA)
# install.packages("devtools") #devtools::install_github("kalganem/creedenzymatic")
library(creedenzymatic) library(tidyverse) theme_set(theme_bw())
The different kinomic tools (KRSA, UKA, KEA3, PTM-SEA) produce different outputs with various scoring metrics. To standardize the input in this package, we want to extract only two variables from each tool: Kinase and Score.
Kinase: will be the main identifier used in each tool. For example, in KRSA this variable will be indicating kinase family, but with UKA it will be individual kinases.
Score: The main metric used in the corresponding tool. For example, in KRSA this will be a Z score, in UKA it will be either kinase statistic or kinase score.
The format of the input is a dataframe with two columns: Kinase and Score. Many of these tools give outputs with many columns beyond these two. So, we need to convert the output from these tools to fit the input format needed here. An example from KRSA and UKA is shown below.
Usually the final KRSA output has many columns like: Kinase, SamplingAverage, SD, Z, ... etc. So, we need to read that file and only select Kinase and Z score columns, and then rename the Z score column to "Score"
# reading an example of KRSA output krsa_ex <- read_delim("data_files/KRSA_ex.txt", delim = "\t") krsa_ex
Now we select the Kinase and Z score columns, and then rename the Z score column to "Score"
# selecting the Kinase and Z columns, and renaming Z to Score krsa_ex %>% select(Kinase, Z) %>% rename(Score = Z) -> krsa_ex krsa_ex
We do the same with the UKA output, take raw output and select kinase and either the kinase statistic or final score as the scoring metric
# reading an example of UKA output uka_ex <- read_delim("data_files/UKA_ex.txt", delim = "\t") uka_ex
Let's say we want the final score as the metric we want to use, so we select the two columns: kinase and median final score
# selecting the Kinase and kinase statistic columns, and renaming kinase statistic to Score uka_ex %>% select(`Kinase Name`, `Median Final score`) %>% rename(Kinase = `Kinase Name`, Score = `Median Final score`) -> uka_ex uka_ex
There are dedicated functions to read and rank KRSA and UKA tables in the creedenzymatic package (read_krsa, read_uka). These two functions will check the input format (must include two columns: Kinase and Score) and then will the rank_kinases function with the specified arguments. The argument are trns and sort. The trns argument is for transformation of the score, the values accepted for this argument are abs and raw (abs: use absolute values of scores, raw: no transformation). The sort argument accepts either asc or desc (ascending and descending)
# read and rank the KRSA table and use absolute values and descending sorting read_krsa(krsa_ex, trns = "abs", sort = "desc") -> krsa_table_ranked # read and rank the UKA table and use absolute values and descending sorting read_uka(uka_ex, trns = "abs", sort = "desc") -> uka_table_ranked # preview of a ranked KRSA table krsa_table_ranked
Now we combine the ranked tables into one dataframe using the combine_tools() function.
# combine ranked tables combine_tools(KRSA_df = krsa_table_ranked, UKA_df = uka_table_ranked) -> combined_df combined_df # to save file # write_delim(combined_df,"ce_combined_ranked_file.txt", delim = "\t")
We can visualize the results using the quartile_figure() function (which will return a ggplot). Let's we want to plot all of the kinases found in quartile 3 or 4 either in KRSA or UKA.
# filter out kinases found in quartile 3 or 4 either in KRSA or UKA and use the quartile_figure() for visualization combined_df %>% filter(Qrt >= 3) %>% pull(hgnc_symbol) %>% unique() -> sig_kinases combined_df %>% filter(hgnc_symbol %in% sig_kinases) %>% quartile_figure()
To run KEA3, we need a subset of peptides as an input (this subset of peptides could represent top differentially phosphorylated peptides, peptides mapped to a recombinant kinase, ... etc). Both KRSA and UKA calculate log2 fold change (LFC) values of peptide phosphorylation levels. These LFC values could be used to determine the top peptides using either an LFC cutoff or rank order of peptides.
# an example of file with peptide IDs and LFC kea_ex <- read_delim("data_files/KEA3_ex.txt", delim = "\t") kea_ex %>% select(Peptide, Score = totalMeanLFC) -> kea_ex # run kea3, using the peptide list with a cutoff value read_kea(kea_ex, filter = T, cutoff = 0.4, cutoff_abs = T, sort = "asc", trns = "abs", rm_duplicates = T, method = "MeanRank", lib = "kinase-substrate") -> kea_table_ranked # quartile figure using kinase family as the grouping factor combine_tools(KRSA_df = krsa_table_ranked, UKA_df = uka_table_ranked, KEA3_df = kea_table_ranked) %>% filter(hgnc_symbol %in% sig_kinases) %>% quartile_figure() # quartile figure using kinase groups as the grouping factor (more useful in STK runs) combine_tools(KRSA_df = krsa_table_ranked, UKA_df = uka_table_ranked, KEA3_df = kea_table_ranked) %>% filter(hgnc_symbol %in% sig_kinases) %>% quartile_figure(grouping = "group")
To run PTM-SEA, we need the LFC table of peptides as an input (Same input as KEA3)
read_ptmsea(kea_ex) -> ptm_table_ranked # quartile figure using kinase family as the grouping factor combine_tools(KRSA_df = krsa_table_ranked, UKA_df = uka_table_ranked, KEA3_df = kea_table_ranked, PTM_SEA_df =ptm_table_ranked ) %>% filter(hgnc_symbol %in% sig_kinases) %>% quartile_figure()
# load("data_files/runData.RData") # # aFull %>% # select(ClassName, SetStat) %>% # group_by(ClassName) %>% # mutate(meanSetStat = mean(SetStat,na.rm = T)) %>% ungroup() %>% # ggplot(aes(reorder(ClassName, meanSetStat), SetStat)) + # geom_boxplot(aes(fill = meanSetStat)) + # coord_flip()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.