d.spls.simulate: Simulation of a data set

View source: R/d.spls.simulate.R

d.spls.simulate    R Documentation

Simulation of a data set

Description

The function d.spls.simulate simulates G mixtures of nondes Gaussians and uses them to build a data set of predictors X and a response y, such that X can be divided into G groups and the values of y depend on the values of X.

Usage

d.spls.simulate(n=200,p=100,nondes=50,sigmaondes=0.05,sigmay=0.5,int.coef=1:5)

Arguments

n

a positive integer. n is the number of observations. Default value is 200.

p

a numeric vector of length G. p is the number of variables in each group. Default value is 100.

nondes

a numeric vector of length G. nondes is the number of Gaussians in each mixture. Default value is 50.

sigmaondes

a numeric vector of length G. sigmaondes is the standard deviation of the Gaussians for each group g. Default value is 0.05.

sigmay

a real value. sigmay is the uncertainty on y. Default value is 0.5.

int.coef

a numeric vector of the coefficients of the linear combination used to construct the response vector y. Default value is 1:5.

Details

The predictor matrix X is a concatenation of G predictor sub matrices. Each is computed using a mixture of Gaussians, i.e. by summing Gaussians of the following form:

A \exp{(-\frac{(\textrm{xech}-\mu)^2}{2 \sigma^2})}.

Where

  • A is a numeric vector of random values between 0 and 1,

  • xech is an element from the sequence of p(g) equally spaced values from 0 to 1. p(g) is the number of variables of the sub matrix g, for g \in \{1, \dots, G\},

  • \mu is a random value in [0,1] representing the mean of the Gaussians,

  • \sigma is a positive real value specified by the user and representing the standard deviation of the Gaussians.
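The construction above can be sketched in base R for a single row of one sub matrix. This is only an illustration, not the package's code: simulate_row is a hypothetical helper, and drawing A and \mu from a uniform distribution on [0,1] is an assumption, since the text only says they are random values in that range.

```r
# Sketch only: one simulated curve built as a sum of Gaussian bumps.
# simulate_row is a hypothetical helper, not part of dual.spls.
simulate_row <- function(p, nondes, sigma) {
  xech <- seq(0, 1, length.out = p)  # p equally spaced values from 0 to 1
  A    <- runif(nondes)              # amplitudes (assumed uniform on [0, 1])
  mu   <- runif(nondes)              # means (assumed uniform on [0, 1])
  # Evaluate every Gaussian at every xech value: a p x nondes matrix
  bumps <- outer(xech, seq_len(nondes), function(x, j) {
    A[j] * exp(-(x - mu[j])^2 / (2 * sigma^2))
  })
  rowSums(bumps)                     # sum over the nondes Gaussians
}

set.seed(1)
x_row <- simulate_row(p = 50, nondes = 20, sigma = 0.05)
```

Repeating this for n rows would give one n by p(g) sub matrix under these assumptions. A smaller sigma gives narrower peaks, so the curves look more irregular.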

The response vector y is a linear combination of the predictors plus a noise term of standard deviation sigmay. It is computed as follows:

y_i = \sigma_y \times V_i + \sum_{g=1}^G \sum_{k=1}^K \textrm{int.coef}_k \times \textrm{sum}X^{g}_{ik}

Where

  • G is the number of predictor sub matrices,

  • i is the index of the observation,

  • V is a normally distributed vector with zero mean and unit standard deviation,

  • K is the length of the vector int.coef,

  • \textrm{sum}X^{g} is a matrix of n rows and K columns. Column k holds, for each row of the sub matrix X^g, the sum over a selected part of that row. The columns of X^g are divided into equal parts, one for each of the K columns of \textrm{sum}X^{g}.
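The formula for y can be illustrated with a short base R sketch for a single sub matrix (G = 1). It assumes that the p columns are cut into K contiguous blocks of equal size, which is one reading of the description above and may not match the package's exact selection. The matrix Xg is a random stand-in, not output of d.spls.simulate.

```r
# Sketch only: response built from block sums of one predictor sub matrix.
# Assumption: the p columns are cut into K contiguous, equally sized blocks.
set.seed(1)
n <- 5
p <- 10
int.coef <- 1:5
K <- length(int.coef)
sigmay <- 0.5

Xg <- matrix(runif(n * p), nrow = n)    # stand-in for a sub matrix X^g
block <- ceiling(seq_len(p) * K / p)    # block index (1..K) of each column

# Sum the columns of Xg within each block: an n x K matrix (sumX^g)
sumXg <- t(rowsum(t(Xg), group = block))

y0 <- drop(sumXg %*% int.coef)          # noise-free response
V  <- rnorm(n)                          # standard normal noise
y  <- y0 + sigmay * V                   # noisy response
```

With several groups, the noise-free part would be the sum of the corresponding terms over all G sub matrices.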

Value

A list with the following elements:

X

the concatenated predictors matrix.

y

the response vector.

y0

the response vector without the added noise.

sigmay

the uncertainty on y.

sigmaondes

the standard deviation of the Gaussians.

G

the number of groups.

Author(s)

Louna Alsouki, François Wahl

Examples

### load the dual.spls library
library(dual.spls)
### one predictor matrix
### parameters
n <- 100
p <- 50
nondes <- 20
sigmaondes <- 0.5
data1 <- d.spls.simulate(n=n,p=p,nondes=nondes,sigmaondes=sigmaondes)

Xa <- data1$X
ya <- data1$y

### plotting the data, one curve per observation
plot(Xa[1,],type='l',ylim=c(0,max(Xa)),main='Data', ylab='Xa',col=1)
for (i in 2:n){ lines(Xa[i,],col=i) }

### two predictor matrices
### parameters (one value per group)
n <- 100
p <- c(50,100)
nondes <- c(20,30)
sigmaondes <- c(0.05,0.02)
data2 <- d.spls.simulate(n=n,p=p,nondes=nondes,sigmaondes=sigmaondes)

Xb <- data2$X
X1 <- Xb[,(1:p[1])]
X2 <- Xb[,(p[1]+1):(p[1]+p[2])]
yb <- data2$y

### plotting the full concatenated matrix Xb
plot(Xb[1,],type='l',ylim=c(0,max(Xb)),main='Data', ylab='Xb',col=1)
for (i in 2:n){ lines(Xb[i,],col=i) }

### plotting the first sub matrix X1
plot(X1[1,],type='l',ylim=c(0,max(X1)),main='Data X1', ylab='X1',col=1)
for (i in 2:n){ lines(X1[i,],col=i) }

### plotting the second sub matrix X2
plot(X2[1,],type='l',ylim=c(0,max(X2)),main='Data X2', ylab='X2',col=1)
for (i in 2:n){ lines(X2[i,],col=i) }

dual.spls documentation built on April 19, 2023, 1:07 a.m.