PCA, PLS, and OPLS regression, classification, and cross-validation with the NIPALS algorithm
## S4 method for signature 'MultiDataSet'
opls(
x,
y = NULL,
fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
info.txtC = c("none", "interactive", "myfile.txt")[2],
...
)
## S4 method for signature 'ExpressionSet'
opls(x, y = NULL, ...)
## S4 method for signature 'data.frame'
opls(x, ...)
## S4 method for signature 'matrix'
opls(
x,
y = NULL,
predI = NA,
orthoI = 0,
algoC = c("default", "nipals", "svd")[1],
crossvalI = 7,
log10L = FALSE,
permI = 20,
scaleC = c("none", "center", "pareto", "standard")[4],
subset = NULL,
plotSubC = NA,
fig.pdfC = c("none", "interactive", "myfile.pdf")[2],
info.txtC = c("none", "interactive", "myfile.txt")[2],
printL = TRUE,
plotL = TRUE,
.sinkC = NULL,
...
)
x |
Numerical data frame or matrix (observations x variables; NAs are allowed); or ExpressionSet object with a non-empty exprs slot (for PCA) and a non-empty phenoData@data slot (for (O)PLS(-DA)) |
y |
Response to be modelled: either 1) 'NULL' for PCA (default), 2) a numerical vector (same length as the number of rows of 'x') for single-response (O)PLS, 3) a numerical matrix (same number of rows as 'x') for multiple-response PLS, 4) a factor (same length as the number of rows of 'x') for (O)PLS-DA, or 5) a character string giving the name of the phenoData@data column to be used, when 'x' is an ExpressionSet object. For convenience, character vectors are also accepted for (O)PLS-DA, as are single-column numerical matrices for (O)PLS and single-column character matrices for (O)PLS-DA. NAs are allowed in numeric responses. |
fig.pdfC |
Character: File name with '.pdf' extension for the figure; if 'interactive' (default), figures will be displayed interactively; if 'none', no figure will be generated |
info.txtC |
Character: File name with '.txt' extension for the printed results (call to 'sink()'); if 'interactive' (default), messages will be printed on the screen; if 'none', no messages will be printed |
... |
Currently not used. |
predI |
Integer: number of components (predictive components in case of PLS and OPLS) to extract; for OPLS, predI is (automatically) set to 1; if set to NA [default], autofit is performed: components are extracted, up to a maximum of 10, until (i) PCA case: the variance of the component is less than the mean variance of all components (note that this rule requires all components to be computed and can be quite time-consuming for large datasets) or (ii) PLS case: either R2Y of the component is < 0.01 (N4 rule), or Q2Y is < 0 (for more than 100 observations) or < 0.05 otherwise (R1 rule)
orthoI |
Integer: number of orthogonal components (for OPLS only); when set to 0 [default], PLS will be performed; otherwise OPLS will be performed; when set to NA, OPLS is performed and the number of orthogonal components is automatically computed by cross-validation (with a maximum of 9 orthogonal components).
algoC |
Default algorithm is 'svd' for PCA (in case of no missing values in 'x'; 'nipals' otherwise) and 'nipals' for PLS and OPLS; when asking to use 'svd' for PCA on an 'x' matrix containing missing values, NAs are set to half the minimum of non-missing values and a warning is generated |
crossvalI |
Integer: number of cross-validation segments (default is 7); the number of samples (rows of 'x') must be >= crossvalI
log10L |
Should the 'x' matrix be log10 transformed? Zeros are set to 1 prior to transformation |
permI |
Integer: number of random permutations of response labels to estimate R2Y and Q2Y significance by permutation testing [default is 20 for single response models (without train/test partition), and 0 otherwise] |
scaleC |
Character: either no centering nor scaling ('none'), mean-centering only ('center'), mean-centering and pareto scaling ('pareto'), or mean-centering and unit variance scaling ('standard') [default] |
subset |
Integer vector: indices of the observations to be used for training (in a classification scheme); use NULL [default] for no partition of the dataset; use 'odd' for a partition of the dataset in two equal sizes (with respect to the classes proportions) |
plotSubC |
Character: Graphic subtitle |
printL |
Logical: deprecated: use the 'info.txtC' argument instead |
plotL |
Logical: deprecated: use the 'fig.pdfC' argument instead |
.sinkC |
Character: deprecated: use the 'info.txtC' argument instead |
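The 'log10L' and 'scaleC' options described above can be illustrated with a small base-R sketch. This is not the package's internal code: 'preprocess_sketch' and the toy matrix are hypothetical names introduced here, and pareto scaling is assumed to follow its usual definition (division of each centered variable by the square root of its standard deviation).

```r
# Illustrative sketch only: 'preprocess_sketch' is a hypothetical helper,
# not a function exported by the package.
preprocess_sketch <- function(xMN,
                              log10L = FALSE,
                              scaleC = c("none", "center", "pareto", "standard")[4]) {
  if (log10L) {
    xMN[which(xMN == 0)] <- 1   # zeros are set to 1 before the transformation
    xMN <- log10(xMN)
  }
  meanVn <- colMeans(xMN, na.rm = TRUE)
  sdVn <- apply(xMN, 2, sd, na.rm = TRUE)
  centeredMN <- sweep(xMN, 2, meanVn, "-")
  switch(scaleC,
         none = xMN,                                  # no centering, no scaling
         center = centeredMN,                         # mean-centering only
         pareto = sweep(centeredMN, 2, sqrt(sdVn), "/"),  # assumed usual definition
         standard = sweep(centeredMN, 2, sdVn, "/"))  # unit variance
}

set.seed(123)
toyMN <- matrix(rexp(20), nrow = 5, dimnames = list(NULL, paste0("v", 1:4)))
toyStdMN <- preprocess_sketch(toyMN, scaleC = "standard")
toyParMN <- preprocess_sketch(toyMN, scaleC = "pareto")
colMeans(toyStdMN)      # numerically zero after centering
apply(toyStdMN, 2, sd)  # unit standard deviation for every variable
```

With 'pareto', large-variance variables keep more weight than with 'standard', since each column is only divided by the square root of its standard deviation.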
An S4 object of class 'opls' containing the following slots:
typeC Character: model type (PCA, PLS, PLS-DA, OPLS, or OPLS-DA)
descriptionMC Character matrix: Description of the data set (number of samples, variables, etc.)
modelDF Data frame with the model overview (number of components, R2X, R2X(cum), R2Y, R2Y(cum), Q2, Q2(cum), significance, iterations)
summaryDF Data frame with the model summary (cumulated R2X, R2Y and Q2); RMSEE is the square root of the mean squared error between the actual and the predicted responses
subsetVi Integer vector: Indices of observations in the training data set
pcaVarVn PCA: Numerical vector of variances of length: predI
vipVn PLS(-DA): Numerical vector of Variable Importance in Projection; OPLS(-DA): Numerical vector of Variable Importance for Prediction (VIP4,p from Galindo-Prieto et al, 2014)
orthoVipVn OPLS(-DA): Numerical vector of Variable Importance for Orthogonal Modeling (VIP4,o from Galindo-Prieto et al, 2014)
xMeanVn Numerical vector: variable means of the 'x' matrix
xSdVn Numerical vector: variable standard deviations of the 'x' matrix
yMeanVn (O)PLS: Numerical vector: variable means of the 'y' response (transformed into a dummy matrix in case it is of 'character' mode initially)
ySdVn (O)PLS: Numerical vector: variable standard deviations of the 'y' response (transformed into a dummy matrix in case it is of 'character' mode initially)
xZeroVarVi Numerical vector: indices of variables with variance < 2.22e-16 which were excluded from 'x' before building the model
scoreMN Numerical matrix of x scores (T; dimensions: nrow(x) x predI); X = TP' + E; Y = TC' + F
loadingMN Numerical matrix of x loadings (P; dimensions: ncol(x) x predI); X = TP' + E
weightMN (O)PLS: Numerical matrix of x weights (W; same dimensions as loadingMN)
orthoScoreMN OPLS: Numerical matrix of orthogonal scores (Tortho; dimensions: nrow(x) x number of orthogonal components)
orthoLoadingMN OPLS: Numerical matrix of orthogonal loadings (Portho; dimensions: ncol(x) x number of orthogonal components)
orthoWeightMN OPLS: Numerical matrix of orthogonal weights (same dimensions as orthoLoadingMN)
cMN (O)PLS: Numerical matrix of Y weights (C; dimensions: number of responses, or number of classes in case of a qualitative response, x number of predictive components); Y = TC' + F
coMN (O)PLS: Numerical matrix of Y orthogonal weights (dimensions: number of responses, or number of classes in case of a qualitative response with more than 2 classes, x number of orthogonal components)
uMN (O)PLS: Numerical matrix of Y scores (U; same dimensions as scoreMN); Y = UC' + G
weightStarMN Numerical matrix of projections (W*; same dimensions as loadingMN); whereas columns of weightMN are derived from successively deflated matrices, columns of weightStarMN relate to the original 'x' matrix: T = XW*; W* = W(P'W)^-1
suppLs List of additional objects to be used internally by the 'print', 'plot', and 'predict' methods
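The relation between weightMN and weightStarMN (T = XW* with W* = W(P'W)^-1) can be checked numerically. The sketch below uses a textbook single-response NIPALS PLS loop written in base R on simulated data; it is an illustration under these assumptions, not the package's implementation, and all object names are hypothetical.

```r
# Textbook PLS1 NIPALS on simulated, mean-centered data (illustration only)
set.seed(1)
rawMN <- matrix(rnorm(60), nrow = 10)
xMN <- sweep(rawMN, 2, colMeans(rawMN), "-")       # centered 'x' matrix
yVn <- drop(rawMN %*% c(1, -1, 0.5, 0, 0, 0)) + rnorm(10, sd = 0.1)
yVn <- yVn - mean(yVn)                             # centered response

ncompI <- 2
xDefMN <- xMN                                      # deflated copy of 'x'
yDefVn <- yVn                                      # deflated copy of 'y'
scoreMN <- matrix(0, nrow(xMN), ncompI)            # T
weightMN <- matrix(0, ncol(xMN), ncompI)           # W
loadingMN <- matrix(0, ncol(xMN), ncompI)          # P

for (h in seq_len(ncompI)) {
  wVn <- crossprod(xDefMN, yDefVn)                 # weights from the deflated matrix
  wVn <- wVn / sqrt(sum(wVn^2))
  tVn <- xDefMN %*% wVn                            # scores
  pVn <- crossprod(xDefMN, tVn) / sum(tVn^2)       # loadings
  cN <- sum(yDefVn * tVn) / sum(tVn^2)             # y weight
  xDefMN <- xDefMN - tVn %*% t(pVn)                # deflation of 'x'
  yDefVn <- yDefVn - drop(tVn) * cN                # deflation of 'y'
  scoreMN[, h] <- tVn
  weightMN[, h] <- wVn
  loadingMN[, h] <- pVn
}

# W* = W (P'W)^-1 maps the original (non-deflated) 'x' directly to the scores
weightStarMN <- weightMN %*% solve(crossprod(loadingMN, weightMN))
max(abs(xMN %*% weightStarMN - scoreMN))           # difference at machine precision
```

The first columns of W and W* coincide (no deflation has occurred yet); they differ from the second component onwards.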
Etienne Thevenot, etienne.thevenot@cea.fr
Eriksson et al. (2006). Multi- and Megavariate Data Analysis. Umetrics Academy.
Rosipal and Kramer (2006). Overview and recent advances in partial least squares.
Tenenhaus (1990). La regression PLS : theorie et pratique. Technip.
Wehrens (2011). Chemometrics with R. Springer.
Wold et al. (2001). PLS-regression: a basic tool of chemometrics.
## PCA
data(foods) ## see Eriksson et al. (2001); presence of 3 missing values (NA)
head(foods)
foodMN <- as.matrix(foods[, colnames(foods) != "Country"])
rownames(foodMN) <- foods[, "Country"]
head(foodMN)
foo.pca <- opls(foodMN)
## PLS with a single response
data(cornell) ## see Tenenhaus, 1998
head(cornell)
cornell.pls <- opls(as.matrix(cornell[, grep("x", colnames(cornell))]),
cornell[, "y"])
## Complementary graphics
plot(cornell.pls, typeVc = c("outlier", "predict-train", "xy-score", "xy-weight"))
## PLS with multiple (quantitative) responses
data(lowarp) ## see Eriksson et al. (2001); presence of NAs
head(lowarp)
lowarp.pls <- opls(as.matrix(lowarp[, c("glas", "crtp", "mica", "amtp")]),
as.matrix(lowarp[, grepl("^wrp", colnames(lowarp)) |
grepl("^st", colnames(lowarp))]))
## PLS-DA
data(sacurine)
attach(sacurine)
sacurine.plsda <- opls(dataMatrix, sampleMetadata[, "gender"])
## OPLS-DA
sacurine.oplsda <- opls(dataMatrix, sampleMetadata[, "gender"], predI = 1, orthoI = NA)
## Application to an ExpressionSet
sacSet <- Biobase::ExpressionSet(assayData = t(dataMatrix),
phenoData = new("AnnotatedDataFrame",
data = sampleMetadata),
featureData = new("AnnotatedDataFrame",
data = variableMetadata),
experimentData = new("MIAME",
title = "sacurine"))
sacPlsda <- opls(sacSet, "gender")
sacSet <- getEset(sacPlsda)
head(Biobase::pData(sacSet))
head(Biobase::fData(sacSet))
detach(sacurine)
## Application to a MultiDataSet
# Loading the 'NCI60_4arrays' from the 'omicade4' package
data("NCI60_4arrays", package = "omicade4")
# Selecting two of the four datasets
setNamesVc <- c("agilent", "hgu95")
# Creating the MultiDataSet instance
nciMset <- MultiDataSet::createMultiDataSet()
# Adding the two datasets as ExpressionSet instances
for (setC in setNamesVc) {
# Getting the data
exprMN <- as.matrix(NCI60_4arrays[[setC]])
pdataDF <- data.frame(row.names = colnames(exprMN),
cancer = substr(colnames(exprMN), 1, 2),
stringsAsFactors = FALSE)
fdataDF <- data.frame(row.names = rownames(exprMN),
name = rownames(exprMN),
stringsAsFactors = FALSE)
# Building the ExpressionSet
eset <- Biobase::ExpressionSet(assayData = exprMN,
phenoData = new("AnnotatedDataFrame",
data = pdataDF),
featureData = new("AnnotatedDataFrame",
data = fdataDF),
experimentData = new("MIAME",
title = setC))
# Adding to the MultiDataSet
nciMset <- MultiDataSet::add_eset(nciMset, eset, dataset.type = setC,
GRanges = NA, warnings = FALSE)
}
# Summary of the MultiDataSet
nciMset
# Principal Component Analysis of each data set
nciPca <- ropls::opls(nciMset)
# Coloring the Score plot according to cancer types
ropls::plot(nciPca, y = "cancer", typeVc = "x-score")
# Getting the updated MultiDataSet (now including scores and loadings)
nciMset <- ropls::getMset(nciPca)
# Restricting to the 'ME' and 'LE' cancer types
sampleNamesVc <- Biobase::sampleNames(nciMset[["agilent"]])
cancerTypeVc <- Biobase::pData(nciMset[["agilent"]])[, "cancer"]
nciMset <- nciMset[sampleNamesVc[cancerTypeVc %in% c("ME", "LE")], ]
# Building PLS-DA models for the cancer type, and getting back the updated MultiDataSet
nciPlsda <- ropls::opls(nciMset, "cancer", predI = 2)
nciMset <- ropls::getMset(nciPlsda)
# Viewing the new variable metadata (including VIP and coefficients)
lapply(Biobase::fData(nciMset), head)