Performs replications of sPLS-DA on random subsamples of the data.
X |
Input matrix of dimension n * p; each row is an observation vector. |
Y |
Factor with at least q>2 levels. |
near.zero.var |
Logical. If TRUE, a pre-screening step is performed to remove predictors with near-zero variance. See |
many |
How many replications of the sPLS-DA analysis are to be done? |
ncomp |
How many components are to be included in the sPLS-DA analysis? |
dist |
Indicates the distance that is used to classify the samples. One of "max.dist", "centroids.dist", "mahalanobis.dist". Default is "max.dist" |
save.file |
Full path of a file in which to save the outputs. If provided, the outputs are saved at the end of each replication. Convenient if you run this function on a cluster and the job is killed before completion, e.g. because the requested time was too short. |
ratio |
Number between 0 and 1. It is the proportion of the n samples that are put aside and considered as an internal testing set. The (1-ratio)*n samples are used as a training set and the |
kCV |
Number of folds for the cross-validation. Default is 10. |
grid |
A vector of values for the tuning of the |
cpus |
Number of cpus to use when running the code in parallel. |
nrepeat |
Number of times the Cross-Validation process is repeated for each of the |
showProgress |
Logical. If TRUE, shows the progress of the algorithm. It also gives a list of which variables are selected on each component. |
Performs replications of tune.splsda on random subsamples of the data and records which variables are selected on which subsample. It also gives a confusion matrix for each component and for each subsample.
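The subsampling step described above can be sketched in base R. This is only an illustration under stated assumptions: it uses a simple random split (whether the package stratifies by class is not stated here), and the names `learning`, `n_test` and `test_idx` are hypothetical, not taken from the package.

```r
set.seed(1)                        # reproducible illustration only
n <- 20; many <- 5; ratio <- 0.3   # hypothetical sizes

# learning[i, k] is TRUE when sample i belongs to the internal training set
# of replication k (assumed analogue of the `learning.sample` output)
learning <- matrix(FALSE, nrow = n, ncol = many)
for (k in seq_len(many)) {
  n_test   <- round(ratio * n)                 # samples put aside for internal testing
  test_idx <- sample(n, n_test)                # simple random split (assumption)
  train_idx <- setdiff(seq_len(n), test_idx)   # the remaining (1 - ratio) * n samples
  learning[train_idx, k] <- TRUE
}

colSums(learning)   # number of training samples in each replication
```

Each column of `learning` then marks the training set on which one replication of the tuning would be run.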
A 'bootsPLS' object is returned, for which plot, fit.model and prediction are available.
ClassifResult |
A 4-dimensional array. The first two dimensions consist of the confusion matrix. The third dimension is relative to the number of components |
loadings.X |
A 3-dimensional array. Loadings vector of X, for each component and each replication. |
selection.variable |
A 3-dimensional array. Gives the selected variables for each component and each replication. It is obtained by replacing each non zero value in |
frequency |
A matrix of size ncomp*p. Gives the frequency of selection for each variable on each component. It is obtained as a mean over the third dimension of |
nbr.var |
Matrix of size many*ncomp. Gives the number of variables that have been selected on each component for each replication. |
learning.sample |
Matrix of size n*many. Gives the samples that have been used in the internal training set over the |
prediction |
A 3-dimensional array of size n*many*ncomp. Gives the prediction for the chosen |
data |
A list of the input data X, Y and of the distance used to classify the sample ("max.dist", "centroids.dist" or "mahalanobis.dist"). |
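The relationship between the loadings, the selection indicator, the selection frequency and the number of selected variables can be sketched on toy data. This is a minimal base R illustration, not the package's code: the array layout (ncomp x p x many) is an assumption inferred from the stated size of `frequency`, the source of the non-zero values is assumed to be the X loadings, and the object names are hypothetical.

```r
set.seed(1)
ncomp <- 2; p <- 6; many <- 4      # hypothetical sizes

# Toy sparse loadings with an assumed layout of ncomp x p x many
loadings <- array(rnorm(ncomp * p * many), dim = c(ncomp, p, many))
loadings[sample(length(loadings), 30)] <- 0    # zero out entries to mimic sparsity

# 1 where a variable is selected (non-zero loading), 0 otherwise
selection <- (loadings != 0) * 1

# Selection frequency: mean over the third (replication) dimension, ncomp x p
frequency <- apply(selection, c(1, 2), mean)

# Number of selected variables per replication and component, many x ncomp
nbr_var <- t(apply(selection, c(1, 3), sum))
```

A variable with a frequency close to 1 on a component was selected in nearly every replication, which is the kind of stability information the `frequency` output is meant to convey.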
Rohart et al. (2016). A Molecular Classification of Human Mesenchymal Stromal Cells. PeerJ, DOI 10.7717/peerj.1845
splsda, plot.bootsPLS, fit.model, prediction
## Not run:
library(bootsPLS)

# load the example data set and inspect it
data(MSC)
X <- MSC$X
Y <- MSC$Y
dim(X)
table(Y)

# 5 replications of sPLS-DA with 3 components and 5-fold cross-validation
boot <- bootsPLS(X = X, Y = Y, ncomp = 3, many = 5, kCV = 5)

# saving the outputs in a Rdata file; the file is saved after each replication
# if used on a cluster, you can use the `cpus` argument as well
save.file <- file.path(getwd(), paste0("MSC.", Sys.getpid(), ".Rdata"))
boot <- bootsPLS(X = X, Y = Y, ncomp = 3, many = 5, kCV = 5, save.file = save.file)
## End(Not run)