Please install the asmbPLS R package before using it.
library(asmbPLS) ## load the R package
set.seed(123) ## set seed to generate the same results
To see the list of functions in asmbPLS, use ?asmbPLS.
To see the detailed description of a function, use ? followed by the function name, for example ?asmbPLS.cv.
data(asmbPLS.example) ## load the example data set for asmbPLS
asmbPLS.example contains 8 components:
1) X.matrix, a matrix with 100 samples (rows) and 400 features (columns; 1-200 are microbial taxa, 201-400 are metabolites);
2) X.matrix.new, a matrix to be predicted, with 100 samples (rows) and 400 features (columns; 1-200 are microbial taxa, 201-400 are metabolites);
3) Y.matrix, a matrix with 100 samples (rows) and 1 column (log-transformed survival time);
4) X.dim, dimension of the two blocks in X.matrix;
5) PLS.comp, selected number of PLS components;
6) quantile.comb, selected quantile combinations;
7) quantile.comb.table.cv, pre-defined quantile combinations for cross validation;
8) Y.indicator, a vector containing the event indicator for each sample.
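To make the idea of pre-defined quantile combinations more concrete, the sketch below enumerates a small candidate grid in base R. The quantile values and the layout (one row per combination, one column per block) are assumptions made for illustration; the actual layout of quantile.comb.table.cv can be checked with dim() and head() on the example object.

```r
## Hypothetical candidate quantiles for each block (illustrative values only)
q.block1 <- c(0.90, 0.95, 0.99)
q.block2 <- c(0.90, 0.95, 0.99)

## Every pairing of block-1 and block-2 quantiles, one combination per row
## (assumed layout: rows = combinations, columns = blocks)
quantile.grid <- as.matrix(expand.grid(block1 = q.block1, block2 = q.block2))

dim(quantile.grid)       ## number of combinations and number of blocks
head(quantile.grid, 3)   ## first few candidate combinations
```

A finer grid gives the cross validation more combinations to choose from, at the cost of a longer run time.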
Pre-processing has been applied to the two different types of data in X.matrix.
Different types of omics data require specific pre-processing steps tailored to their unique characteristics.
## show the first 5 microbial taxa and the first 5 metabolites for the first 5 samples.
asmbPLS.example$X.matrix[1:5, c(1:5, 201:205)]
## show the outcome for the first 5 samples.
asmbPLS.example$Y.matrix[1:5,]
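The column indexing above relies on the two blocks sitting side by side in X.matrix. As a minimal sketch, assuming X.dim is a vector holding the number of features in each block (here c(200, 200)), the blocks could be separated in base R as follows. The toy matrix and the names X.toy, X.dim.toy and X.blocks are hypothetical stand-ins, not part of the package.

```r
set.seed(1)
## Toy stand-in for X.matrix: 100 samples, 400 features
X.toy <- matrix(rnorm(100 * 400), nrow = 100, ncol = 400)
## Assumed layout of X.dim: number of features in each block
X.dim.toy <- c(200, 200)

## First and last column index of each block
block.end <- cumsum(X.dim.toy)
block.start <- block.end - X.dim.toy + 1

## Split the combined matrix into a list of per-block matrices
X.blocks <- lapply(seq_along(X.dim.toy), function(b) {
  X.toy[, block.start[b]:block.end[b], drop = FALSE]
})
sapply(X.blocks, dim)   ## rows and columns of each block
```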
The 5-fold CV with 5 repetitions is implemented to help find the best quantile combination for each PLS component as well as the optimal number of PLS components.
X.matrix = asmbPLS.example$X.matrix
X.matrix.new = asmbPLS.example$X.matrix.new
Y.matrix = asmbPLS.example$Y.matrix
PLS.comp = asmbPLS.example$PLS.comp
X.dim = asmbPLS.example$X.dim
quantile.comb.table.cv = asmbPLS.example$quantile.comb.table.cv
Y.indicator = asmbPLS.example$Y.indicator

## cv to find the best quantile combinations for model fitting
cv.results <- asmbPLS.cv(X.matrix = X.matrix,
                         Y.matrix = Y.matrix,
                         PLS.comp = PLS.comp,
                         X.dim = X.dim,
                         quantile.comb.table = quantile.comb.table.cv,
                         Y.indicator = Y.indicator,
                         k = 5,
                         ncv = 5)

## obtain the best quantile combination for each PLS component
quantile.comb <- cv.results$quantile_table_CV[,1:length(X.dim)]

## obtain the optimal number of PLS components
n.PLS <- cv.results$optimal_nPLS
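For readers unfamiliar with repeated k-fold CV, the following base-R sketch shows one way that k = 5 folds with ncv = 5 repetitions could be assigned to 100 samples. It only illustrates the resampling scheme; asmbPLS.cv handles the splitting internally and may do it differently (for example, it might balance folds by event indicator, which this sketch does not).

```r
set.seed(123)
n <- 100    ## number of samples
k <- 5      ## number of folds
ncv <- 5    ## number of repetitions

## For each repetition, randomly assign every sample to one of k folds
fold.id <- replicate(ncv, sample(rep(seq_len(k), length.out = n)))

dim(fold.id)          ## one row per sample, one column per repetition
table(fold.id[, 1])   ## fold sizes in the first repetition

## Validation and training indices for fold 1 of repetition 1
val.idx <- which(fold.id[, 1] == 1)
train.idx <- setdiff(seq_len(n), val.idx)
```

Each candidate quantile combination would then be fitted on the training indices and scored on the validation indices, with the scores averaged over all folds and repetitions.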
The selected quantile combination for each PLS component and the optimal number of PLS components can be used as input for the asmbPLS.fit function to fit the final model.
asmbPLS.results <- asmbPLS.fit(X.matrix = X.matrix,
                               Y.matrix = Y.matrix,
                               PLS.comp = n.PLS,
                               X.dim = X.dim,
                               quantile.comb = quantile.comb)
Once the model is fitted, you can use it to predict the outcome for the new data set (X.matrix.new).
Y.pred <- asmbPLS.predict(asmbPLS.results, X.matrix.new, n.PLS)
head(Y.pred$Y_pred)
You can also run the prediction on the original data set X.matrix to check the model fit.
## prediction for original data to check the data fit
Y.fit <- asmbPLS.predict(asmbPLS.results, X.matrix, n.PLS)
check.fit <- cbind(Y.matrix, Y.fit$Y_pred)
head(check.fit)
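Beyond inspecting the first rows, a two-column table such as check.fit can be summarised numerically. The sketch below uses simulated values as a stand-in (column 1 observed, column 2 predicted, mirroring how check.fit is built above) and computes the root mean squared error and the Pearson correlation. These two measures are our choice for illustration, not something the package prescribes, and for censored samples the observed time is only a lower bound, so such summaries are approximate.

```r
set.seed(1)
## Toy stand-in for check.fit: column 1 = observed, column 2 = predicted
observed <- rnorm(100, mean = 5)
predicted <- observed + rnorm(100, sd = 0.5)
check.toy <- cbind(observed, predicted)

## Root mean squared error between observed and predicted values
rmse <- sqrt(mean((check.toy[, 1] - check.toy[, 2])^2))
## Pearson correlation between observed and predicted values
r <- cor(check.toy[, 1], check.toy[, 2])

c(RMSE = rmse, correlation = r)
```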
data(asmbPLSDA.example) ## load the example data set for asmbPLS-DA
asmbPLSDA.example contains 8 components:
1) X.matrix, a matrix with 100 samples (rows) and 400 features; features 1-200 are from block 1 and features 201-400 are from block 2;
2) X.matrix.new, a matrix to be predicted, with 100 samples (rows) and 400 features; features 1-200 are from block 1 and features 201-400 are from block 2;
3) Y.matrix.binary, a matrix with 100 samples (rows) and 1 column;
4) Y.matrix.morethan2levels, a matrix with 100 samples (rows) and 3 columns (3 levels);
5) X.dim, dimension of the two blocks in X.matrix;
6) PLS.comp, selected number of PLS components;
7) quantile.comb, selected quantile combinations;
8) quantile.comb.table.cv, pre-defined quantile combinations for cross validation.
## show the first 5 features from block 1 and the first 5 features from block 2 for the first 5 samples.
asmbPLSDA.example$X.matrix[1:5, c(1:5, 201:205)]
## show the binary outcome for the first 5 samples.
asmbPLSDA.example$Y.matrix.binary[1:5,]
## show the multiclass outcome for the first 5 samples.
asmbPLSDA.example$Y.matrix.morethan2levels[1:5,]
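The multiclass outcome is stored as a matrix with one column per level. Assuming it is a 0/1 indicator matrix in which each row has a single 1 in the column of its class, a matrix of this shape could be built from a factor of class labels as sketched below. The labels and the names group and Y.onehot are hypothetical.

```r
## Hypothetical class labels for 6 samples
group <- factor(c("A", "B", "C", "A", "C", "B"), levels = c("A", "B", "C"))

## Indicator (one-hot) encoding: one column per level, no intercept column
Y.onehot <- model.matrix(~ 0 + group)
colnames(Y.onehot) <- levels(group)
Y.onehot

## Recover the class index of each sample from the indicator matrix
class.index <- max.col(Y.onehot, ties.method = "first")
levels(group)[class.index]
```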
The example data set includes both a binary outcome and a multiclass outcome.
Similarly, the 5-fold CV with 5 repetitions is implemented to help find the best quantile combination for each PLS component as well as the optimal number of PLS components.
You can use different decision rules (the method argument; the default is fixed_cutoff for the binary outcome and Max_Y for the multiclass outcome) and different measures (the measure argument; the default is balanced accuracy, B_accuracy) for the CV.
Also, note that you need to set outcome.type according to the type of outcome.
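As a reminder of what the default measure captures, balanced accuracy is the average of sensitivity and specificity. The base-R sketch below computes it for a toy binary example. The helper name balanced_accuracy and the cutoff of 0.5 used to turn scores into classes are assumptions for illustration; they are not taken from the package, whose fixed_cutoff rule may use a different threshold.

```r
## Hypothetical helper: balanced accuracy for 0/1 labels
balanced_accuracy <- function(truth, pred) {
  sens <- sum(pred == 1 & truth == 1) / sum(truth == 1)   ## true positive rate
  spec <- sum(pred == 0 & truth == 0) / sum(truth == 0)   ## true negative rate
  (sens + spec) / 2
}

## Toy data: 4 positives and 6 negatives with made-up predicted scores
truth <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
score <- c(0.9, 0.8, 0.4, 0.7, 0.2, 0.6, 0.1, 0.3, 0.2, 0.4)

## Assumed decision rule for this sketch: classify as 1 when score >= 0.5
pred <- as.integer(score >= 0.5)

ba <- balanced_accuracy(truth, pred)
ba
```

Because the two classes are weighted equally regardless of their sizes, this measure is less sensitive to class imbalance than plain accuracy.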
Extract the components from the example data list:
X.matrix = asmbPLSDA.example$X.matrix
X.matrix.new = asmbPLSDA.example$X.matrix.new
Y.matrix.binary = asmbPLSDA.example$Y.matrix.binary
Y.matrix.multiclass = asmbPLSDA.example$Y.matrix.morethan2levels
X.dim = asmbPLSDA.example$X.dim
PLS.comp = asmbPLSDA.example$PLS.comp
quantile.comb.table.cv = asmbPLSDA.example$quantile.comb.table.cv
CV for the binary outcome:
## cv to find the best quantile combinations for model fitting (binary outcome)
cv.results.binary <- asmbPLSDA.cv(X.matrix = X.matrix,
                                  Y.matrix = Y.matrix.binary,
                                  PLS.comp = PLS.comp,
                                  X.dim = X.dim,
                                  quantile.comb.table = quantile.comb.table.cv,
                                  outcome.type = "binary",
                                  k = 5,
                                  ncv = 5)
quantile.comb.binary <- cv.results.binary$quantile_table_CV[,1:length(X.dim)]
n.PLS.binary <- cv.results.binary$optimal_nPLS
CV for the multiclass outcome:
## cv to find the best quantile combinations for model fitting
## (categorical outcome with more than 2 levels)
cv.results.multiclass <- asmbPLSDA.cv(X.matrix = X.matrix,
                                      Y.matrix = Y.matrix.multiclass,
                                      PLS.comp = PLS.comp,
                                      X.dim = X.dim,
                                      quantile.comb.table = quantile.comb.table.cv,
                                      outcome.type = "multiclass",
                                      k = 5,
                                      ncv = 5)
quantile.comb.multiclass <- cv.results.multiclass$quantile_table_CV[,1:length(X.dim)]
n.PLS.multiclass <- cv.results.multiclass$optimal_nPLS
The asmbPLSDA.fit function is used to fit the final model for both the binary and the multiclass outcome.
Model fit for the binary outcome:
## asmbPLSDA fit using the selected quantile combination (binary outcome)
asmbPLSDA.fit.binary <- asmbPLSDA.fit(X.matrix = X.matrix,
                                      Y.matrix = Y.matrix.binary,
                                      PLS.comp = n.PLS.binary,
                                      X.dim = X.dim,
                                      quantile.comb = quantile.comb.binary,
                                      outcome.type = "binary")
Model fit for the multiclass outcome:
## asmbPLSDA fit (categorical outcome with more than 2 levels)
asmbPLSDA.fit.multiclass <- asmbPLSDA.fit(X.matrix = X.matrix,
                                          Y.matrix = Y.matrix.multiclass,
                                          PLS.comp = n.PLS.multiclass,
                                          X.dim = X.dim,
                                          quantile.comb = quantile.comb.multiclass,
                                          outcome.type = "multiclass")
The asmbPLSDA.predict function is used to classify new samples into sample groups.
## classification for the new data based on the asmbPLS-DA model with the binary outcome.
Y.pred.binary <- asmbPLSDA.predict(asmbPLSDA.fit.binary,
                                   X.matrix.new,
                                   PLS.comp = n.PLS.binary)
## classification for the new data based on the asmbPLS-DA model with the multiclass outcome.
Y.pred.multiclass <- asmbPLSDA.predict(asmbPLSDA.fit.multiclass,
                                       X.matrix.new,
                                       PLS.comp = n.PLS.multiclass)
When we have multiple models using different decision rules, we can use the vote functions to combine the classification results.
For example, for the binary outcome, we have already built the asmbPLS-DA model with the fixed cutoff as our decision rule. We want to build two more models with the decision rules Euclidean_distance_X and Mahalanobis_distance_X and then combine the results using the vote function.
cv.results.cutoff <- cv.results.binary
quantile.comb.cutoff <- cv.results.cutoff$quantile_table_CV

## Cross validation using Euclidean distance of X super score
cv.results.EDX <- asmbPLSDA.cv(X.matrix = X.matrix,
                               Y.matrix = Y.matrix.binary,
                               PLS.comp = PLS.comp,
                               X.dim = X.dim,
                               quantile.comb.table = quantile.comb.table.cv,
                               outcome.type = "binary",
                               method = "Euclidean_distance_X",
                               k = 5,
                               ncv = 5)
quantile.comb.EDX <- cv.results.EDX$quantile_table_CV

## Cross validation using Mahalanobis distance of X super score
cv.results.MDX <- asmbPLSDA.cv(X.matrix = X.matrix,
                               Y.matrix = Y.matrix.binary,
                               PLS.comp = PLS.comp,
                               X.dim = X.dim,
                               quantile.comb.table = quantile.comb.table.cv,
                               outcome.type = "binary",
                               method = "Mahalanobis_distance_X",
                               k = 5,
                               ncv = 5)
quantile.comb.MDX <- cv.results.MDX$quantile_table_CV
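The two distance-based rules can be pictured as assigning a sample to the group whose centre is closest in the space of X super scores. The base-R sketch below illustrates this general idea with simulated two-dimensional scores. It is not the package's internal implementation; the simulated data, the per-group covariance and all variable names are assumptions made for the example.

```r
set.seed(1)
## Toy 2-D "super scores" for two groups of 50 training samples each
scores <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
                matrix(rnorm(100, mean = 3), ncol = 2))
group <- rep(c(0, 1), each = 50)
new.score <- c(2.5, 2.8)   ## scores of one new sample

## Group centroids in the score space
center0 <- colMeans(scores[group == 0, ])
center1 <- colMeans(scores[group == 1, ])

## Euclidean distance from the new sample to each centroid
d.euc <- c(sqrt(sum((new.score - center0)^2)),
           sqrt(sum((new.score - center1)^2)))

## Squared Mahalanobis distance, using each group's own covariance (assumption)
d.mah <- c(mahalanobis(new.score, center0, cov(scores[group == 0, ])),
           mahalanobis(new.score, center1, cov(scores[group == 1, ])))

## Assign the new sample to the nearest group (0 or 1) under each rule
c(Euclidean = which.min(d.euc) - 1, Mahalanobis = which.min(d.mah) - 1)
```

The Mahalanobis version rescales distances by the spread of each group, so the two rules can disagree when the groups have very different shapes.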
Put the selected quantile combinations, together with the corresponding measures, from the different models in one list:
#### vote list ####
cv.results.list = list(fixed_cutoff = quantile.comb.cutoff,
                       Euclidean_distance_X = quantile.comb.EDX,
                       Mahalanobis_distance_X = quantile.comb.MDX)
Use the asmbPLSDA.vote.fit function to fit the vote model. The order of nPLS should correspond to the order of the decision rules in cv.results.list.
You can also try a different vote function; the default is method = "weighted".
vote.fit <- asmbPLSDA.vote.fit(X.matrix = X.matrix,
                               Y.matrix = Y.matrix.binary,
                               X.dim = X.dim,
                               nPLS = c(cv.results.cutoff$optimal_nPLS,
                                        cv.results.EDX$optimal_nPLS,
                                        cv.results.MDX$optimal_nPLS),
                               cv.results.list = cv.results.list,
                               outcome.type = "binary",
                               method = "weighted")
Final classification using the vote function:
## classification
vote.predict <- asmbPLSDA.vote.predict(vote.fit, X.matrix.new)
head(vote.predict)
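To show how a weighted vote can differ from a simple majority vote, here is a small base-R sketch that combines made-up 0/1 predictions from three models. The predictions and the weights are hypothetical, and the weighting scheme shown (a weighted share of votes compared against 0.5) is only one plausible choice; how asmbPLSDA.vote.fit derives and applies its weights is not shown here and may differ.

```r
## Hypothetical 0/1 predictions from three models (rows = samples)
pred <- cbind(fixed_cutoff           = c(1, 0, 1, 0),
              Euclidean_distance_X   = c(1, 1, 0, 0),
              Mahalanobis_distance_X = c(0, 1, 0, 0))

## Hypothetical weights, e.g. reflecting each model's CV performance
w <- c(0.60, 0.25, 0.15)

## Unweighted majority vote: class 1 if more than half of the models say 1
vote.unweighted <- as.integer(rowMeans(pred) > 0.5)

## Weighted vote: class 1 if the weighted share of "1" votes exceeds 0.5
vote.weighted <- as.integer(as.vector(pred %*% w) / sum(w) > 0.5)

cbind(pred, vote.unweighted, vote.weighted)
```

In this toy example the second and third samples are classified differently by the two schemes, because the first model carries more than half of the total weight.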
You can use the plotCor function to visualize correlations between PLS components from different blocks, using the model fitted by the asmbPLSDA.fit function. Here, we use the first block score from each block to make the plot.
block.name should be a vector containing a character name for each block. The names must be in the same order as the blocks.
group.name should be a vector containing a character name for each sample group. For the binary outcome, the first group name matches Y.matrix = 0 and the second group name matches Y.matrix = 1. For the multiclass outcome, the ith group name matches the ith column of Y.matrix = 1.
## custom block.name and group.name
plotCor(asmbPLSDA.fit.binary,
        ncomp = 1,
        block.name = c("mRNA", "protein"),
        group.name = c("control", "case"))
You can use the plotPLS function to visualize clusters of samples using the super scores of different PLS components. It can only be used with the output of the asmbPLSDA.fit function.
## custom group.name
plotPLS(asmbPLSDA.fit.binary,
        comp.X = 1,
        comp.Y = 2,
        group.name = c("control", "case"))
You can use the plotRelevance function to visualize the most relevant features (relevant to the outcome) in each block. Both fitted asmbPLS and asmbPLS-DA models can be used as input.
plotRelevance(asmbPLSDA.fit.binary, n.top = 5, block.name = c("mRNA", "protein"))