d.spls.calval | R Documentation |
The function d.spls.calval divides the data X into a calibration set and a validation set.
It uses a variation of the Kennard and Stone strategy CalValXy, dividing the observations into groups (see Details for more explanations).
d.spls.calval(X,pcal=NULL,Datatype=NULL,y=NULL,ncells=10,Listecal=NULL,
center=TRUE,method="euclidean",pc=0.9)
X |
a numeric matrix of predictors values. |
pcal |
a positive integer between 0 and 100. |
Datatype |
a vector of indices specifying the group to which each observation belongs.
Default value is |
y |
a numeric vector of responses. Default value is |
ncells |
a positive integer. |
Listecal |
a numeric vector specifying how many observations from each group should be selected for calibration.
Default value is |
center |
logical value indicating whether the matrix |
method |
the method and norm used for the distance computation. It is by default equal to "euclidean" which means
original |
pc |
a positive real value indicating the number of components to consider when applying the SVD transformation or the PCA.
If |
The algorithm selects samples by applying the classical Kennard and Stone procedure to
each group of observations in turn. It starts by selecting the point that is the furthest away from the centroid.
This point is assigned to the calibration set and is removed from the list of candidates. Then, the algorithm identifies
the group to which this first observation belongs and considers the group g that comes after.
It computes the distance \delta_{P_{i,g}} between each remaining point P_{i,g} belonging to the group g
and the assigned calibration point. The point with the largest \delta_{P_{i,g}}
is selected and removed from the set of candidates, then the procedure moves on to the group that comes after.
When there is more than one calibration sample, the procedure computes the distance between each P_{i,g}
from the group under consideration and each P_{i,cal} from the calibration set.
The minimal distance for each P_{i,g} is denoted distmin(P_{i,g}).
The final selected candidate satisfies the following equation:
P_{selected}=\{ P_{i,g} \mid distmin(P_{i,g}) = \max_{i} distmin(P_{i,g}) \}
Once every element of the vector Listecal is zero, the procedure is done.
The algorithm for only one group corresponds to the classical Kennard and Stone algorithm.
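The selection loop described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name ks_by_group_sketch and its arguments are hypothetical, distances are plain Euclidean distances on the raw X (no centering, SVD or PCA step), groups are visited cyclically in increasing order of their index, and each entry of n_per_group is assumed to lie between 1 and the size of its group.

```r
# Hypothetical helper, NOT part of dual.spls: group-wise max-min selection.
ks_by_group_sketch <- function(X, groups, n_per_group) {
  X <- as.matrix(X)
  group_ids <- sort(unique(groups))
  remaining <- as.integer(n_per_group)   # points still to pick in each group
  D <- as.matrix(dist(X))                # pairwise Euclidean distances

  # First calibration point: the observation furthest from the centroid.
  centroid <- colMeans(X)
  d_centroid <- sqrt(rowSums(sweep(X, 2, centroid)^2))
  cal <- which.max(d_centroid)
  g_pos <- match(groups[cal], group_ids)
  remaining[g_pos] <- remaining[g_pos] - 1L

  while (any(remaining > 0)) {
    # Move on to the group that comes after (assumed cyclic order).
    g_pos <- g_pos %% length(group_ids) + 1L
    if (remaining[g_pos] <= 0) next
    candidates <- setdiff(which(groups == group_ids[g_pos]), cal)
    if (length(candidates) == 0) {
      remaining[g_pos] <- 0L
      next
    }
    # distmin: distance from each candidate to its nearest calibration point.
    distmin <- apply(D[candidates, cal, drop = FALSE], 1, min)
    # Keep the candidate whose distmin is the largest.
    cal <- c(cal, candidates[which.max(distmin)])
    remaining[g_pos] <- remaining[g_pos] - 1L
  }
  list(indcal = cal, indval = setdiff(seq_len(nrow(X)), cal))
}

# Toy usage on simulated data (values are illustrative only).
set.seed(1)
X_toy <- matrix(rnorm(60), nrow = 30, ncol = 2)
groups_toy <- rep(1:3, each = 10)
res <- ks_by_group_sketch(X_toy, groups_toy, n_per_group = c(7, 8, 6))
table(groups_toy[res$indcal])
```

The counter vector plays the role that Listecal plays in the description: the loop stops once no group has any points left to pick.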
If Datatype is not specified, the function divides the observations into ncells groups. First, the observations
are sorted according to the values of y. Second, the observations are divided into ncells equal cells according to the
cumulative empirical probabilities.
Finally, each observation whose value of y belongs to a given sub-interval is assigned the number of the corresponding cell.
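One way to picture this grouping step is with empirical quantiles of y as cell boundaries. The sketch below is an assumption about how such a division could be done, not the code used by the package; the helper name cells_from_y_sketch is hypothetical.

```r
# Hypothetical helper, NOT part of dual.spls: label observations by y-quantile cell.
cells_from_y_sketch <- function(y, ncells = 10) {
  # Cell boundaries at equally spaced cumulative empirical probabilities.
  breaks <- quantile(y, probs = seq(0, 1, length.out = ncells + 1), names = FALSE)
  # Index of the sub-interval containing each y; the maximum goes to the last cell.
  findInterval(y, breaks, rightmost.closed = TRUE, all.inside = TRUE)
}

# Toy usage (values are illustrative only).
set.seed(42)
y_toy <- rnorm(100)
Datatype_sketch <- cells_from_y_sketch(y_toy, ncells = 4)
table(Datatype_sketch)
```

A vector built this way has the same shape as a user-supplied Datatype: one group index per observation.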
A list with the following elements:
indcal |
a numeric vector giving the row indices of the input data selected for calibration. |
indval |
a numeric vector giving the row indices of the remaining observations. |
Louna Alsouki, François Wahl
Kennard, Ronald W, and Larry A Stone. 1969. “Computer Aided Design of Experiments.” Technometrics 11 (1): 137–48.
d.spls.split, d.spls.type, d.spls.listecal
### load dual.spls library
library(dual.spls)
### parameters
n <- 100
p <- 50
nondes <- 20
sigmaondes <- 0.5
data <- d.spls.simulate(n=n,p=p,nondes=nondes,sigmaondes=sigmaondes)
X <- data$X
y <- data$y
###calibration parameters for split1
pcal <- 70
ncells <- 3
split1 <- d.spls.calval(X=X,pcal=pcal,y=y,ncells=ncells)
###plotting split1
plot(X[split1$indcal,1],X[split1$indcal,2],xlab="Variable 1",
ylab="Variable 2",pch=19,col="red",main="Calibration and validation split1")
points(X[split1$indval,1],X[split1$indval,2],pch=19,col="green")
legend("topright", legend = c("Calibration points", "Validation points"),
cex = 0.8, col = c("red","green"), pch = c(19,19))
###calibration parameters for split2
ncells <- 3
dimtype <- floor(n/3)
# type of observations
Datatype <- c(rep(1,dimtype),rep(2,dimtype),rep(3,(n-dimtype*2)))
# how many observations of each type are to be selected in the calibration set
L1 <- floor(0.7*sum(Datatype==1))
L2 <- floor(0.8*sum(Datatype==2))
L3 <- floor(0.6*sum(Datatype==3))
Listecal <- c(L1,L2,L3)
split2 <- d.spls.calval(X=X,y=y,Datatype=Datatype,Listecal=Listecal)
###plotting split2
plot(X[split2$indcal,1],X[split2$indcal,2],xlab="Variable 1",
ylab="Variable 2",pch=19,col="red",main="Calibration and validation split2")
points(X[split2$indval,1],X[split2$indval,2],pch=19,col="green")
legend("topright", legend = c("Calibration points", "Validation points"),
cex = 0.8, col = c("red","green"), pch = c(19,19))