Denoise, classify, and evaluate variables (biomarkers) from time course data such as proteomics and other high-throughput technologies.
CV.Signature.TCP(
dat,
timepoints = NULL,
center.dat = TRUE,
scale.dat = FALSE,
denoise = c("smooth.spline", "pca", "none"),
denoise.parameter = c("cv", "cv.global"),
dist.method = c("euclidean", "cor.diss", "dtw"),
cluster.method = c("kmeans", "hclust"),
K,
evaluate = TRUE,
verbose = FALSE,
seed = NULL
)
dat | a data matrix with variables (e.g., genes, proteins) as rows and observations taken at different time points as columns.
timepoints | a vector of time points for columns of dat.
center.dat | a logical specifying whether to center the input and denoised data. By default, TRUE.
scale.dat | a logical specifying whether to scale the input and denoised data. By default, FALSE.
denoise | a denoising method. By default, "smooth.spline", which fits a cubic spline.
denoise.parameter | a parameter for a denoising method, such as the degrees of freedom in smooth.spline or the number of significant PCs in PCA.
dist.method | a distance method for time course data, resulting in a distance matrix.
cluster.method | a clustering method.
K | the number of clusters.
evaluate | a logical specifying whether to evaluate the cluster membership with the jackstraw tests. By default, TRUE.
verbose | a logical specifying whether to print the computational progress. By default, FALSE.
seed | a seed for the random number generator.
... | optional arguments.
This function combines multiple steps. For more options and fine-tuning, please use the individual functions in the 'CV.Signature.TCP' package.
This attempts to identify temporal dynamics by clustering denoised and/or time-warped data.
This requires the user to input the data (dat), where the rows and columns are variables (e.g., genes, proteins) and observations taken at different time points, respectively.
Correspondingly, timepoints is a vector of actual time points (e.g., hours, days) corresponding to the columns of dat.
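As a minimal sketch of the expected input layout, the toy data below puts variables in rows and time points in columns. The variable names, time points, and values are hypothetical. The row-wise standardization mirrors the preprocessing line in the Examples; whether center.dat and scale.dat apply exactly this transformation internally is an assumption.

```r
# Toy input: 4 variables (rows) measured at 6 time points (columns).
timepoints <- c(1, 3, 5, 7, 10, 14)   # hypothetical days
set.seed(1)
dat <- matrix(rnorm(4 * length(timepoints)), nrow = 4,
              dimnames = list(paste0("protein", 1:4),
                              as.character(timepoints)))

# There must be exactly one time point per column of dat.
stopifnot(ncol(dat) == length(timepoints))

# Time points can be recovered from numeric column names, as in the Examples.
days <- as.numeric(colnames(dat))

# Row-wise centering and scaling: each variable gets mean 0 and
# standard deviation 1 across its time points.
dat.std <- t(scale(t(dat), center = TRUE, scale = TRUE))
```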
This function goes through the following steps:
denoise temporal data using cubic splines (denoise_spline) or PCA (denoise_pca).
When denoise = "smooth.spline" is chosen, individual degrees of freedom can be chosen by cross-validation by setting denoise.parameter = "cv". If there should be only one tuning parameter for all variables, set denoise.parameter = "cv.global".
cluster variables (proteins, genes, etc.) using cluster.method based on dist.method and K clusters.
When using dynamic time warping (DTW), dist.method = "dtw", hierarchical clustering is applied.
K-means clustering (cluster.method = "kmeans") does not return a distance matrix.
evaluate the cluster memberships of variables (e.g., proteins or genes) by the jackstraw tests. The jackstraw returns p-values and posterior probabilities that variables should be included in their given clusters.
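The denoising and clustering steps above can be sketched with base R alone. This is a minimal illustration on simulated data, not the package's implementation: the toy trends, the use of 1 - correlation as a dissimilarity, and average linkage are all assumptions made for the sketch, and the jackstraw evaluation step is omitted.

```r
set.seed(1)
timepoints <- c(0, 1, 3, 5, 7, 10, 14, 21)   # hypothetical days
m <- 30                                      # number of variables
K <- 3                                       # number of clusters

# Simulated data (assumption): three groups of 10 variables with
# rising, falling, and peaked trends plus Gaussian noise.
trend <- rbind(timepoints / 21,
               1 - timepoints / 21,
               exp(-(timepoints - 7)^2 / 20))
truth <- rep(1:3, each = 10)
dat <- trend[truth, ] +
  matrix(rnorm(m * length(timepoints), sd = 0.1), nrow = m)

# Step 1: denoise each variable with a cubic smoothing spline.
# cv = TRUE selects the smoothing level per variable by
# leave-one-out cross-validation.
denoised <- t(apply(dat, 1, function(y) {
  fit <- smooth.spline(timepoints, y, cv = TRUE)
  predict(fit, timepoints)$y
}))

# Step 2: a correlation-based dissimilarity between variables
# (1 - Pearson correlation; an assumption for this sketch).
dat.dist <- as.dist(1 - cor(t(denoised)))

# Step 3a: hierarchical clustering operates on the distance matrix.
hc <- hclust(dat.dist, method = "average")
membership.hc <- cutree(hc, k = K)

# Step 3b: k-means operates on the denoised data directly,
# so no distance matrix is involved.
km <- kmeans(denoised, centers = K, nstart = 20)
membership.km <- km$cluster
```

Comparing membership.hc or membership.km against truth with table() gives a quick check of how well the simulated groups are recovered.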
This work is motivated by identifying reliable molecular signatures from time-series proteomics data of oxidative post-translational modification (O-PTM) occupancies in the cardiovascular mouse model (see Wang et al. (2018)).
Last but not least, modeling and classifying high-dimensional temporal data is notoriously challenging. This package aims to provide an analysis pipeline that is relatively robust and non-parametric, while accounting for typical -omic studies involving complex phenotypes. For further implementations of related methods, see TSclust and TSdist.
CV.Signature.TCP returns a list consisting of
denoised |
dat.dist |
cluster.obj | an object returned from clustering the denoised data.
membership | a vector of length equal to the number of variables (rows of dat).
evaluated | an object returned from applying the jackstraw tests for clusters.
Neo Christopher Chung nchchung@gmail.com
Identifying temporal molecular signatures underlying cardiovascular diseases. In preparation.
J Wang, H Choi, NC Chung, Q Cao, DCM Ng, B Mirza, SB Scruggs, D Wang, AO Garlid, P Ping (2018). Integrated dissection of the cysteine oxidative post-translational modification proteome during cardiac hypertrophy. Journal of Proteome Research.
NC Chung (2020). Statistical significance of cluster membership for unsupervised evaluation of single cell identities. Bioinformatics.
## Not run:
data(cys_optm)
meta <- cys_optm[,1:4]
optm <- log(cys_optm[meta$Select,5:10])
optm <- t(scale(t(optm), scale=TRUE, center=TRUE))
days <- as.numeric(colnames(optm))
output <- CV.Signature.TCP(optm,
timepoints = days,
center.dat = TRUE,
scale.dat = TRUE,
denoise = c("smooth.spline"),
denoise.parameter=c("cv"),
dist.method = "cor.diss",
cluster.method = c("kmeans"),
K = 5,
evaluate = TRUE,
verbose = TRUE,
seed = 1
)
# see the elbow plot
cluster.elbow(dat=output$denoised, FUNcluster=kmeans, method="wss", k.max=10, linecolor="black")
# make the cluster figure
optm.fig <- vis_cluster(output$denoised, group=output$membership)
# to modify/polish the figure (ggplot2 object); labs(), ylim(), and facet_wrap() come from ggplot2
library(ggplot2)
optm.fig <- optm.fig + labs(y="Log-transformed Occupancy Ratio", x="Time (day)", title="All O-PTMs") + ylim(-2,2) + facet_wrap(~ cluster,nrow=1,ncol=6)
# filter the data based on jackstraw PIP and make a figure
library(jackstraw)
optm.pip <- pip(output$evaluated$p.F, pi0=sum(output$evaluated$p.F > .05)/length(output$evaluated$p.F))
hist(optm.pip,100,col="black")
optm.pip.fig <- vis_cluster(output$denoised[optm.pip > .9,], group=output$membership[optm.pip > .9])
optm.pip.fig <- optm.pip.fig + labs(y="Log-transformed Occupancy Ratio", x="Time (day)", title="O-PTMs with PIP > 0.9") + ylim(-2,2) + facet_wrap(~ cluster,nrow=1,ncol=6)
# stack the two figures vertically with cowplot
library(cowplot)
plot_grid(optm.fig, optm.pip.fig, nrow = 2)
## End(Not run)