fitKDSN: Fit kernel deep stacking network with random Fourier transformation


Description

Estimates the kernel deep stacking network with identity link function (Gaussian) for the response in each level. Sparsity may be introduced by dropout (experimental), variable pre-selection and internal selection (experimental), as well as an L1 penalty on the coefficients.

Usage

fitKDSN(y, X, levels=1, Dim=rep(round(sqrt(dim(X)[1]) / 2), levels), 
             sigma=rep(1, levels), lambdaRel=rep(0.5, levels), 
             alpha=rep(0, levels), info=FALSE, seedW=NULL, standX=TRUE, standY=FALSE,
             varSelect=rep(FALSE, levels), varRanking=NULL, varExFreq=rep(NA, levels),
             dropHidden=rep(FALSE, levels), 
             dropHiddenProb=rep(NA, levels), 
             seedDrop=NULL, baggingInd=NULL, randSubsetInd=NULL,
             maxItOpt=10000)

Arguments

y

Response variable as numeric vector.

X

Design matrix. All factors must be encoded, e.g. dummy coding.

levels

Number of levels of the kernel deep stacking network (integer scalar).

Dim

Dimension of the random Fourier transformations in each level (integer vector). The first entry corresponds to the first level, the second entry to the second level, and so on (a short sketch of this per-level convention follows the argument list).

sigma

Variance of the Gaussian kernel in each level (numeric vector). The higher the variance, the more dissimilar the transformed observations will be on average.

lambdaRel

Relative ridge regularization parameter of each level (numeric vector). The relative ridge regularization parameter is limited to the interval [0, 1]: 0 corresponds to no shrinkage and 1 to the maximum penalty, which shrinks all coefficients except the intercept to zero. The default of 0.5 lies midway between these extremes.

alpha

Weight parameter between lasso and ridge penalty of each level (numeric vector). The default of 0 corresponds to the ridge penalty and 1 equals the lasso penalty.

info

Determines whether additional information about the level computations is displayed during training (logical value). Default is FALSE.

seedW

Random seeds for drawing the Fourier transformation weights from the multivariate normal distribution (integer vector). Each entry in the vector corresponds to one level.

standX

Should the design matrix be standardized by median and median absolute deviation? Default is TRUE.

standY

Should the response be standardized by median and median absolute deviation? Default is FALSE.

varSelect

Should unimportant variables be excluded in each level (logical vector)? Default is that all available variables are used.

varRanking

Defines a variable ranking in increasing order of importance (integer vector): the first variable is the least important and the last is the most important.

varExFreq

Gives the relative frequency of variables to drop in each level, within the interval [0, 1] (numeric vector). Latent variables are always included in the model.

dropHidden

Should dropout be applied to the random Fourier transformation? Each entry corresponds to one level. Default is no dropout in any level (logical vector).

dropHiddenProb

Probability of including a cell of the random Fourier transformation (numeric vector). Each entry corresponds to one level.

seedDrop

Specifies the seeds of the random dropouts in the calculation of the random Fourier transformation per level (integer vector). Default is random.

baggingInd

Gives the indices of the bootstrap samples in list format. Each entry of the list (integer vector) corresponds to one level.

randSubsetInd

Gives the indices of the random variable subsets in list format. Each entry of the list (integer vector) corresponds to one level.

maxItOpt

Gives the maximum number of iterations in the optimization procedure of the elastic net regression. For details see glmnet.
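
The arguments Dim, sigma, lambdaRel, alpha, seedW, varSelect, varExFreq, dropHidden, dropHiddenProb and seedDrop all follow the same per-level convention: one entry per level of the network. A minimal sketch of a call using this convention is given below; the chosen values are arbitrary and purely illustrative, and y and X are assumed to be defined as in the Examples section.

# Sketch of the per-level argument convention (illustrative values only;
# y and X are assumed to be defined as in the Examples section)
nLevels <- 3
fitObj <- fitKDSN(y=y, X=X, levels=nLevels,
             Dim=c(30, 15, 5),            # Fourier dimension per level
             sigma=c(1, 2, 4),            # Gaussian kernel variance per level
             lambdaRel=c(0.5, 0.5, 0.5),  # relative ridge penalty per level
             seedW=c(11, 22, 33))         # one random seed per level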

Details

Note that the implementation of dropout and variable selection is still experimental.

A kernel deep stacking network is an approximation of an artificial neural network with multiple levels. In each level a random Fourier transformation of the data is applied, based on the Gaussian kernel, which can be represented as an infinite series. The random Fourier transform converges exponentially fast to the Gaussian kernel matrix as its dimension increases. Then kernel ridge regression is applied to the transformed data. The predictions of all previous levels are included as covariates in the next level.
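
The approximation property can be illustrated directly in base R. The following sketch is independent of the package internals (the scaling conventions in randomFourierTrans may differ, and all names such as gamma, W, b and Z are illustrative): it draws random frequencies and phases, builds cosine features and compares the resulting inner products with the exact Gaussian kernel matrix.

# Minimal base-R sketch of random Fourier features approximating a
# Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2); illustrative only,
# not the package's internal implementation
set.seed(1)
n <- 50; p <- 5; D <- 2000                     # observations, covariates, Fourier dimension
X <- matrix(rnorm(n * p), n, p)
gamma <- 0.5

W <- matrix(rnorm(D * p, sd=sqrt(2 * gamma)), D, p)               # random frequencies
b <- runif(D, 0, 2 * pi)                                          # random phases
Z <- sqrt(2 / D) * cos(X %*% t(W) + matrix(b, n, D, byrow=TRUE))  # feature map

Kapprox <- Z %*% t(Z)                          # approximate kernel matrix
Kexact  <- exp(-gamma * as.matrix(dist(X))^2)  # exact Gaussian kernel matrix
max(abs(Kapprox - Kexact))                     # shrinks as D grows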

Value

Note

It is recommended to standardize the design matrix before training. Otherwise the resulting values of the Fourier transform can be either very small or very large, which results in poor prediction performance.
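
If standX is set to FALSE, a robust standardization can be carried out manually before calling fitKDSN, for example with column-wise median and median absolute deviation. The helper below is a simple base-R sketch with a hypothetical name (robustScale); the package's own robustStandard function, listed under See Also, serves the same purpose.

# Robust column-wise standardization by median and MAD
# (robustScale is an illustrative helper, not a package function)
robustScale <- function(X) {
  centers <- apply(X, 2, median)
  scales  <- apply(X, 2, mad)   # mad() applies a consistency constant by default
  sweep(sweep(X, 2, centers, "-"), 2, scales, "/")
}
Xstand <- robustScale(X)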

Author(s)

Thomas Welchowski welchow@imbie.meb.uni-bonn.de

References

Po-Sen Huang, Li Deng, Mark Hasegawa-Johnson and Xiaodong He (2013). Random Features for Kernel Deep Convex Network. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).

D. S. Broomhead and David Lowe (1988). Radial Basis Functions, Multi-Variable Functional Interpolation and Adaptive Networks. Controller HMSO, London.

Jerome Friedman, Trevor Hastie and Rob Tibshirani (2008). Regularization Paths for Generalized Linear Models via Coordinate Descent. Department of Statistics, Stanford University.

See Also

predict.KDSN, randomFourierTrans, robustStandard, glmnet

Examples

####################################
# Example with binary outcome

# Generate covariate matrix
sampleSize <- 100
X <- matrix(0, nrow=100, ncol=5)
for(j in 1:5) {
  set.seed(j)
  X[, j] <- rnorm(sampleSize)
}

# Generate Bernoulli response
rowSumsX <- rowSums(X)
logistVal <- exp(rowSumsX) / (1 + exp(rowSumsX))
set.seed(-1)
y <- sapply(1:100, function(x) rbinom(n=1, size=1, prob=logistVal[x]))

# Fit kernel deep stacking network with three levels
fitKDSN1 <- fitKDSN(y=y, X=X, levels=3, Dim=c(20, 10, 5), 
             sigma=c(6, 8, 10), lambdaRel=c(1, 0.1, 0.01), 
             alpha=rep(0, 3), info=TRUE, 
             seedW=c(1882335773, -1640543072, -931660653))

# Generate new test data
sampleSize <- 100
Xtest <- matrix(0, nrow=100, ncol=5)
for(j in 1:5) {
  set.seed(j + 50)
  Xtest[, j] <- rnorm(sampleSize)
}
rowSumsXtest <- rowSums(Xtest)
logistVal <- exp(rowSumsXtest) / (1 + exp(rowSumsXtest))
set.seed(-1)
ytest <- sapply(1:100, function(x) rbinom(n=1, size=1, prob=logistVal[x]))

# Evaluate on test data with auc
library(pROC)
preds <- predict(fitKDSN1, Xtest)
auc(response=ytest, predictor=c(preds))

####################################
# Example with continuous outcome

# Generate covariate matrix
sampleSize <- 100
X <- matrix(0, nrow=100, ncol=5)
for(j in 1:5) {
  set.seed(j)
  X[, j] <- rnorm(sampleSize)
}

# Generate Gaussian random variable conditional on the covariates
linPred <- 1*X[,1] + 2*X[,2] + 0.5*X[,3] + 0.5
set.seed(-1)
y <- sapply(1:100, function(x) rnorm(n=1, mean=linPred[x], sd=2))

# Fit kernel deep stacking network with five levels
fitKDSN1 <- fitKDSN(y=y, X=X, levels=5, Dim=c(40, 20, 10, 7, 5), 
             sigma=c(0.125, 0.25, 0.5, 1, 2), lambdaRel=c(1, 0.5, 0.1, 0.01, 0.001), 
             alpha=rep(0, 5), info=TRUE, 
             seedW=c(-584973296, -251589341, -35931358, 313178052, -1322344272))

# Generate new data
sampleSize <- 100
Xtest <- matrix(0, nrow=100, ncol=5)
for(j in 1:5) {
  set.seed(j + 10)
  Xtest[, j] <- rnorm(sampleSize)
}
linPred <- 1*Xtest[,1] + 2*Xtest[,2] + 0.5*Xtest[,3] + 0.5
set.seed(-10)
ytest <- sapply(1:100, function(x) rnorm(n=1, mean=linPred[x], sd=2))

# Predictions on new data and compute root mean squared error
preds <- predict(obj=fitKDSN1, newx=Xtest)
RMSE <- sqrt(mean((ytest-c(preds))^2))
RMSE
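
####################################
# Additional sketch (not from the original examples): using the experimental
# dropout arguments. The per-level values below are arbitrary and purely
# illustrative; y, X and Xtest are reused from the continuous example above.

fitKDSN2 <- fitKDSN(y=y, X=X, levels=3, Dim=c(20, 10, 5),
             sigma=c(1, 2, 4), lambdaRel=c(0.5, 0.1, 0.01),
             alpha=rep(0, 3), info=FALSE,
             seedW=c(1, 2, 3),
             dropHidden=rep(TRUE, 3),     # enable dropout in all three levels
             dropHiddenProb=rep(0.8, 3),  # inclusion probability per level
             seedDrop=c(10, 20, 30))
preds2 <- predict(obj=fitKDSN2, newx=Xtest)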
