d.spls.simulate | R Documentation |
The function d.spls.simulate simulates G mixtures of nondes Gaussians, from which it builds a data set of predictors X and a response y such that X can be divided into G groups and the values of y depend on the values of X.
d.spls.simulate(n=200,p=100,nondes=50,sigmaondes=0.05,sigmay=0.5,int.coef=1:5)
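n | a positive integer, the number of observations. |
p | a numeric vector of length G, the number of variables of each predictor sub-matrix. |
nondes | a numeric vector of length G, the number of Gaussians in each mixture. |
sigmaondes | a numeric vector of length G, the standard deviation of the Gaussians in each mixture. |
sigmay | a real value, the uncertainty (noise level) on y. |
int.coef | a numeric vector of the coefficients of the linear combination used to construct the response vector y. |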
The predictor matrix X is the concatenation of G predictor sub-matrices. Each sub-matrix is computed from a mixture of Gaussians, i.e. by summing Gaussians of the form
A \exp{(-\frac{(\textrm{xech}-\mu)^2}{2 \sigma^2})},
where A is a numeric vector of random values between 0 and 1, xech is an element of the sequence of p(g) equally spaced values from 0 to 1, p(g) is the number of variables of sub-matrix g, for g \in \{1, \dots, G\}, \mu is a random value in [0,1] representing the mean of the Gaussian, and \sigma is a positive real value specified by the user (sigmaondes) representing the standard deviation of the Gaussians.
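The construction above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name simulate_block is hypothetical, and the choices to draw one amplitude per observation and one mean per Gaussian are guesses about details the description leaves open.

```r
# Hypothetical helper (not part of dual.spls): builds one n x p sub-matrix
# as a sum of `nondes` Gaussian bumps evaluated on a grid in [0, 1].
simulate_block <- function(n, p, nondes, sigma) {
  xech <- seq(0, 1, length.out = p)      # p equally spaced values from 0 to 1
  X <- matrix(0, nrow = n, ncol = p)
  for (j in seq_len(nondes)) {
    A  <- runif(n)                       # assumption: one amplitude per observation
    mu <- runif(1)                       # assumption: one random mean per Gaussian
    bump <- exp(-(xech - mu)^2 / (2 * sigma^2))
    X <- X + outer(A, bump)              # row i receives A[i] * bump
  }
  X
}

set.seed(1)
X1 <- simulate_block(n = 100, p = 50,  nondes = 20, sigma = 0.05)
X2 <- simulate_block(n = 100, p = 100, nondes = 30, sigma = 0.02)
X  <- cbind(X1, X2)                      # concatenate G = 2 sub-matrices
dim(X)                                   # 100 150
```

A smaller sigma gives narrower peaks, which is why the two groups look different when plotted side by side.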
The response vector y is a linear combination of the predictors to which noise of standard deviation sigmay is added. It is computed as follows:
y_i = \sigma_y \times V_i + \sum_{g=1}^G \sum_{k=1}^K \textrm{int.coef}_k \times \textrm{sum}X^{g}_{ik},
where G is the number of predictor sub-matrices, i is the index of the observation, V is a normally distributed vector with zero mean and unit standard deviation, K is the length of the vector int.coef, and \textrm{sum}X^{g} is a matrix of n rows and K columns. The columns of X^g are split into K equal parts, and column k of \textrm{sum}X^{g} contains, for each row of X^g, the sum of the values in the k-th part.
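The response formula can be illustrated with a short base R sketch. It is a reading of the description rather than the package's code: the helper block_sums is hypothetical, the split into K contiguous parts of (nearly) equal size is an assumption, and uniform random matrices stand in for the sub-matrices X^g.

```r
# Hypothetical helper (not part of dual.spls): returns the n x K matrix whose
# column k is the row-wise sum over the k-th contiguous part of Xg's columns.
block_sums <- function(Xg, K) {
  p <- ncol(Xg)
  part <- ceiling(seq_len(p) * K / p)    # part index (1..K) of each column
  vapply(seq_len(K),
         function(k) rowSums(Xg[, part == k, drop = FALSE]),
         numeric(nrow(Xg)))
}

set.seed(2)
n <- 100
# Stand-ins for the sub-matrices X^1 and X^2 (assumption: any numeric matrices work here)
blocks <- list(matrix(runif(n * 50), n, 50),
               matrix(runif(n * 100), n, 100))
int.coef <- 1:5
sigmay <- 0.5

# Noise-free part: sum over g of sumX^g %*% int.coef
y0 <- Reduce("+", lapply(blocks, function(Xg) {
  drop(block_sums(Xg, length(int.coef)) %*% int.coef)
}))
V <- rnorm(n)                            # standard normal noise
y <- y0 + sigmay * V                     # noisy response
```

With int.coef = 1:5, the last fifth of each sub-matrix's columns weighs five times as much as the first fifth, so the later variables of every group drive the response most.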
A list with the following elements:
X | the concatenated predictors matrix. |
y | the response vector. |
y0 | the response vector without noise. |
sigmay | the uncertainty on y. |
sigmaondes | the standard deviation of the Gaussians. |
G | the number of groups. |
Louna Alsouki, François Wahl
### load the dual.spls library
library(dual.spls)

### one predictor matrix
## parameters
n <- 100
p <- 50
nondes <- 20
sigmaondes <- 0.5
data1 <- d.spls.simulate(n=n, p=p, nondes=nondes, sigmaondes=sigmaondes)
Xa <- data1$X
ya <- data1$y
## plot each observation (row of Xa) as a curve
plot(Xa[1,], type='l', ylim=c(0,max(Xa)), main='Data', ylab='Xa', col=1)
for (i in 2:n){ lines(Xa[i,], col=i) }

### two predictor matrices
## parameters
n <- 100
p <- c(50,100)
nondes <- c(20,30)
sigmaondes <- c(0.05,0.02)
data2 <- d.spls.simulate(n=n, p=p, nondes=nondes, sigmaondes=sigmaondes)
Xb <- data2$X
X1 <- Xb[,1:p[1]]                    # first group: columns 1 to p[1]
X2 <- Xb[,(p[1]+1):(p[1]+p[2])]      # second group: the remaining p[2] columns
yb <- data2$y
## plot the concatenated matrix Xb
plot(Xb[1,], type='l', ylim=c(0,max(Xb)), main='Data', ylab='Xb', col=1)
for (i in 2:n){ lines(Xb[i,], col=i) }
## plot the first group X1
plot(X1[1,], type='l', ylim=c(0,max(X1)), main='Data X1', ylab='X1', col=1)
for (i in 2:n){ lines(X1[i,], col=i) }
## plot the second group X2
plot(X2[1,], type='l', ylim=c(0,max(X2)), main='Data X2', ylab='X2', col=1)
for (i in 2:n){ lines(X2[i,], col=i) }