This script is designed to process MS-CETSA melt curve result from TMT labelling and quantification scheme.
The main tasks include:
1. Read in the quantitative CETSA data (Search result txt.file from Proteome Discoverer, PD) into R/Rstudio
2. Clean up data, mainly to remove the entries without quantitative information
3. Normalize data based on the principle that the overall median of relative protein fold changes across the
heating temperature range should follow a sigmoidal trend
4. Try to fit each entry of data into a temperature-response relationship and capture the fitting parameters
5. Visualize the data using customizable plot layout, for either all the data or a subset
For better organization of data analysisk, it is highly recommended to use the Project management feature in Rstudio.
For each project, you are suggested to create a brand new working folder as your local working directory.
Save the PD derived data files to this working directory folder.
At the right upper corner of RStudio, click Project -> New Project -> Existing Directory -> browse and choose
the designated working directory folder.
By establishing a specfic project for each experiment, all the data content and execution history will be saved
in the project for future re-analysis even when R session is terminated.
Activate mineCETSA
package by library it.
library("mineCETSA")
ms_rawread()
is designed for read in one or many files in a vector of file names.PDversion=21
assume the data is searched using PD2.1; otherwise, specify the exact PD version used.temp
(short for temperature) vector that was the heat challenge the samples/cells were exposed to in CETSA treatment, corresponding to the labelling done from 126 up to 131, of which the function default is temp=c(37,40,43,46,49,52,55,58,61,64)
. LY <- ms_rawread(c("LY_K562_Ctrl_rep1_PD21_Proteins.txt","LY_K562_Ctrl_rep2_PD21_Proteins.txt","LY_K562_Treatment_rep1_PD21_Proteins.txt","LY_K562_Treatment_rep2_PD21_Proteins.txt"))
ms_conditionrename()
.incondition
corresponses to the vector of current condition naming, whereas outcondition
correponses to vector of new condition naming, both order and length should match exactly. LY1 <- ms_conditionrename(LY, incondition=c("Treatmentrep1.3", "Treatmentrep2.4"), outcondition=c("Treatment.1","Treatment.2"))
ms_clean()
.remsinglecondprot=TRUE
, these orphan proteins will be excluded from downstream analysis. LY_cleaned <- ms_clean(LY1)
ms_isoform_resolve()
.ms_isoform_consolidate()
. LY_cleaned1 <- ms_isoform_resolve(LY_cleaned) # The next isoform cleaning step is optional. You are encouraged to double check and adjust the match table entries. LY_cleaned2 <- ms_isoform_consolidate(LY_cleaned1, matchtable="./subfolder/tobe_consolidated.txt")
ms_scaling()
or ms_forcescaling()
.ms_scaling()
function achieves: The medians of soluble protein ratios are fitted into a sigmoidal function of the temperature rangeLY_scaled <- ms_scaling(LY_cleaned2)
The ms_forcescaling()
fucntion achieves a systematic scaling by force scaling the medians to be same as the medians from a reference subset/data. However, make sure you have enough good reason to perform forcescaling.
LY_fs_all <- ms_forcescaling(LY_cleaned, refcondition="all") LY_fs <- ms_forcescaling(LY_cleaned, refcondition=c("Ctrl.1", "Ctrl.2"))
ms_fitting()
.Tm
, R2
, Slope
, Plateau
(bottom plateau)).LY_fitted <- ms_fitting(LY_scaled)
ms_ggplotting()
.ms_ggplotting(LY_scaled)
ms_ggplotting_rep()
.nread=10
, how many readings per curve (from lowest heating temperature) were used for plotting, default is 10 levelvector
, a vector of conditions to be specified, preferably starting with Ctrl or Reference sets remsinglecondprot=TRUE
, whether orphan proteins to be plotted, default to exclude them from plotting nreplicate=2
, how many replicates there are for each condition, default is duplicate thus 2, change the value accordingly, this is relavent to automatic coloring scheme, up to 4 is possible for now presetcolor=TRUE
, whether automatically apply the preset coloring scheme for each condition, otherwise, provide a vector for example colorpanel=c("black", "red", "blue", "orange")
to customize the colors pfdatabase=FALSE
, only set to TRUE when the input dataset is searched with malaria database layout=c(5,5)
, the default multiplot layout is 5 times 5 in each page pdfheight=12
, set the height of pdf file pdfwidth=12
, set the width of pdf file ms_ggplotting_rep(LY_scaled, levelvector=c("Ctrl.1", "Ctrl.2", "Treatment.1","Treatment.2")) ms_ggplotting_rep(LY_fitted, levelvector=c("Ctrl.1", "Ctrl.2", "Treatment.1","Treatment.2"))
ms_filewrite()
to a local txt file. ms_filewrite(LY_scaled, "CETSA_data_scaled.txt") ms_filewrite(LY_fitted, "CETSA_data_with_fitting_parameters.txt")
ms_fileread()
from a saved txt file. MyCETSAdata <- ms_fileread("./subfolder_name/myCETSAdata_to_readin.txt") # . indicate the current working directory # Here the subfolder_name refer to a first level folder name directly under current working directory # Remember that typically now you are still working in the main working directory # The subfolder should contain the data file to read in
ms_subsetting()
.isfile=TRUE
, whether the provided hitidlist is in a txt. file, default set to TRUE allisoform=TRUE
, whethre to retrive other isoforms of the same parental Uniprot ID, default set to TRUE revsel=FALSE
, short for reverse selection, when set to TRUE, the output is the dataset after removing the targets, default set to FALSE CETSAdata_subset <- ms_subsetting(LY_fitted, hitidlist="hit_list.txt", isfile=TRUE) CETSAdata_subset <- ms_subsetting(LY_fitted, hitidlist=c("P00000", "P12345-6"), isfile=FALSE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.