panelNNET.default: Semi-parametric fixed-effects models for panel data, using...


Description

Function fits semiparametric models in which the nonparametric part is represented by a neural network. Fixed-effects are available to represent repeated observations of cross-sectional units.

Usage

function(y, X, hidden_units, fe_var, maxit, lam, time_var, param,
                          parapen, penalize_toplayer, parlist, verbose,
                          report_interval, gravity, convtol, RMSprop, start_LR,
                          activation, batchsize, maxstopcounter, OLStrick, OLStrick_interval,
                          initialization, dropout_hidden, dropout_input, convolutional,
                          LR_slowing_rate, return_best, stop_early, ...)

Arguments

y

The response data

X

Variables to enter non-parametrically, the "inputs". This may be a data frame, matrix, or list of data frames or matrices. If a list, panelNNET will fit an "additive net", in which separate neural networks are concatenated at the top layer.
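For illustration, a list input might be built as follows. The matrices here are hypothetical; the corresponding hidden_units argument would then also be a list, as described below.

  # Hypothetical inputs for an additive net: one matrix per sub-network,
  # concatenated by panelNNET at the top layer.
  X1 <- matrix(rnorm(100 * 3), ncol = 3)
  X2 <- matrix(rnorm(100 * 5), ncol = 5)
  X_additive <- list(X1, X2)
  # hidden_units would then be, e.g., list(c(5, 3), c(4, 2))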

hidden_units

Integer vector of hidden units within hidden layers, or list of integer vectors for the additive nets case. First entry is the lowest layer, subsequent entries are higher layers.

fe_var

A factor indicating the cross-sectional unit. At present only a single cross-sectional grouping variable is supported.

maxit

Maximum number of epochs

lam

Lambda, the L2 penalty (or "weight decay") factor. Non-L2 penalties not yet supported.

time_var

Numeric vector of the time variable

param

Terms to enter parametrically, at the top layer

parapen

Numeric vector multiplying the penalties for the parametric terms. Defaults to a vector of zeros, so that parametric terms are unpenalized.

penalize_toplayer

Defaults to TRUE. If FALSE, the top layer of the nonparametric part of the model will not be penalized; this is rarely useful.

parlist

A list of starting values for the parameters. Chosen randomly if omitted (see the "initialization" argument). Useful when restarting from where another net left off.

verbose

If TRUE, progress will be printed to the console and plots of the algorithm's progress will be drawn.

gravity

The learning rate will be multiplied by this factor after each step in which the loss decreases.

convtol

Convergence tolerance. When maxstopcounter successive iterations fail to improve the MSE by at least this amount, gradient descent exits.

RMSprop

Gradient descent by RMSprop. If FALSE, step size is equal along each dimension.
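For reference, the RMSprop update can be sketched as follows. This is a generic illustration of the method, not the package's internal code, and the learning rate, decay rate, and epsilon values are arbitrary.

  # Generic RMSprop step for a parameter vector theta with gradient g
  # (illustrative only; not panelNNET internals).
  rmsprop_step <- function(theta, g, v, lr = 0.01, rho = 0.9, eps = 1e-8) {
    v <- rho * v + (1 - rho) * g^2           # running mean of squared gradients
    theta <- theta - lr * g / sqrt(v + eps)  # per-dimension scaled step
    list(theta = theta, v = v)
  }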

start_LR

The initial learning rate a.k.a. step size

activation

"tanh", "logistic", "relu", or "lrelu" (for "leaky ReLU")

batchsize

Size of batches for minibatch gradient descent. Defaults to nrow(X), which is batch gradient descent.

maxstopcounter

The number of times the learning rate will be halved, following epochs that increase the MSE, before panelNNET exits.

OLStrick

At the end of every OLStrick_interval-th epoch, find the closed-form solution for the top layer of the network that minimizes the penalized loss function.
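Conceptually, this amounts to a ridge-regression solve on the top-layer pseudodata. A hand-rolled sketch follows, with hypothetical Z, y, and lam; the package's actual computation also handles the parametric terms and parapen.

  # Illustrative closed-form (ridge) solution for top-layer weights,
  # using hypothetical pseudodata Z, response y, and penalty lam.
  Z   <- matrix(rnorm(100 * 3), ncol = 3)
  y   <- rnorm(100)
  lam <- 1
  C_hat <- solve(crossprod(Z) + lam * diag(ncol(Z)), crossprod(Z, y))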

OLStrick_interval

The number of epochs between applications of the OLS trick.

initialization

If "HRZS", the weight initialization scheme proposed by He, Zhang, Ren, and Jian (2015). If "XG", the weight initialization scheme proposed by Glorot and Bengio (2010). Otherwise, draws from a uniform distribution with bounds of -.7 and .7, following recommndations in Hastie, Tibshirani, and Friedman.

dropout_hidden

Proportion of hidden-layer units to keep during each epoch. Values below 1 correspond to varying degrees of dropout regularization.

dropout_input

Proportion of input units to keep during each epoch. Values below 1 correspond to varying degrees of dropout regularization.
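As an illustration of what a keep proportion means, dropout can be thought of as multiplying a layer's activations by a random binary mask. A generic sketch, not the package's internal code:

  # Illustrative dropout mask: keep each unit with probability 0.5
  # (not the package's internal implementation).
  keep_prob <- 0.5
  hidden_activations <- matrix(rnorm(10 * 4), ncol = 4)  # hypothetical layer output
  mask <- matrix(rbinom(length(hidden_activations), 1, keep_prob),
                 nrow = nrow(hidden_activations))
  dropped <- hidden_activations * mask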

convolutional

When not NULL, a list with the following elements: topology, an integer vector indicating when the variables were measured (for example, days in a season); span, the width of local connectivity, in units of the topology; and step, the distance between the centers of successive spans.
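A hypothetical specification for inputs measured daily over a 90-day season, with 10-day-wide locally connected windows whose centers are 5 days apart:

  # Hypothetical convolutional specification for 90 daily measurements.
  conv <- list(
    topology = 1:90,  # day of season on which each input column was measured
    span     = 10,    # width of local connectivity, in days
    step     = 5      # distance between centers of successive spans, in days
  )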

LR_slowing_rate

After an iteration in which the loss increases, the learning rate is decreased by a factor of gravity raised to the power of LR_slowing_rate.

return_best

If TRUE, return the parlist that minimizes loss over the course of training, rather than the most recent parlist.

stop_early

A list with elements (1) "check_every", defaulting to 20: check test-set performance after this number of iterations; (2) "y_test": a vector of responses in the test set; (3) "X_test": a matrix (or list of matrices) of test-set variables for the nonparametric part of the model; (4) "P_test": a matrix of test-set variables for the parametric part of the model; (5) "fe_test": a vector of fixed-effect labels for the test set.
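A hypothetical construction of this list, using placeholder test-set objects:

  # Hypothetical early-stopping specification built from placeholder test data.
  y_te  <- rnorm(50)
  X_te  <- matrix(rnorm(50 * 20), ncol = 20)
  P_te  <- cbind(1:50, (1:50)^2)
  id_te <- factor(rep(1:10, 5))
  stop_early <- list(
    check_every = 20,    # evaluate test-set performance every 20 iterations
    y_test      = y_te,  # test-set responses
    X_test      = X_te,  # test-set nonparametric inputs (matrix or list)
    P_test      = P_te,  # test-set parametric terms
    fe_test     = id_te  # test-set fixed-effect labels
  )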

Details

The function fits a model of the form

y_it = u_i + P_it B + Z_it C + e_it

Z_it = activation(X_it D)

in the single-layer case, generalized accordingly in the multi-layer case. The parameters B, C, and D are fit by gradient descent (optionally using RMSprop), subject to regularization governed by the parameter lam. Because the top layer is linear, estimation is facilitated by the "within" transformation: subtraction of the group-wise mean, which eliminates the fixed effects u_i.
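A minimal sketch of the single-layer forward pass and the within transformation, using hand-rolled objects rather than package internals:

  # Illustrative single-layer forward pass and within transformation
  # (hand-rolled; not the package's internal code).
  set.seed(1)
  n  <- 100
  X  <- matrix(rnorm(n * 4), ncol = 4)  # nonparametric inputs
  P  <- matrix(rnorm(n * 2), ncol = 2)  # parametric terms
  fe <- factor(rep(1:10, each = 10))    # cross-sectional unit
  D  <- matrix(rnorm(4 * 3), ncol = 3)  # lowest-layer weights
  Z  <- tanh(X %*% D)                   # Z_it = activation(X_it D)
  # Within transformation: subtract group-wise means to sweep out u_i
  demean <- function(M, g) M - apply(M, 2, ave, g)
  Z_w <- demean(Z, fe)
  P_w <- demean(P, fe)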

Value

yhat

The fitted values

parlist

The estimated parameters

fe

Estimates of the fixed effects, for each observation

converged

TRUE or FALSE

mse

mean squared error (in-sample)

loss

final value of the loss function

lam

The supplied penalty

hidden_units

The network architecture

time_var

The time variable supplied

X

The data supplied to enter non-parametrically

y

The supplied outcome

param

The data supplied to enter linearly

fe_var

The supplied cross-sectional unit

hidden_layers

The pseudodata at each layer

final_improvement

The last improvement to MSE at exit

msevec

The evolution of MSE over the iterations

RMSprop

Whether RMSprop was used

convtol

The convergence tolerance used

grads

The gradients at exit

activation

The activation function used

parapen

The factor that multiplies lambda for the parametric terms

batchsize

User-supplied

initialization

User-supplied

Note

This package is in active development.

Author(s)

Andrew Crane-Droesch

References

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. The Elements of Statistical Learning. Springer Series in Statistics. New York: Springer, 2001.

Examples

  library(MASS)  # for mvrnorm
  set.seed(1)
  #Fake dataset
  N <- 1000
  p <- 20
  X <- as.data.frame(mvrnorm(N, rep(0, p), diag(rep(1, p))))
  id <- factor(0:(N-1)%%20+1)
  id.eff <- rnorm(nlevels(id), sd = 5)
  time <- 0:(N - 1)%/%20+1
  u <- rnorm(N, sd = 5)
  y <- sin(3*X$V1) - cos(4*X$V2) + 3*tanh((-2*X$V1+X$V2+X$V4)*X$V3) + X$V6/(X$V7+8) + id.eff[id] +
     .5*time - .005*time^2 + u
  hist(y)


  #Parametric and nonparametric terms: X enters nonparametrically,
  #time and time^2 enter parametrically
  P <- cbind(time, time^2)

  #Training and test sets
  tr <- time<35
  te <- tr == FALSE

  #Fitting a two-layer neural net with 5 and 3 hidden units
  pnn <- panelNNET(y[tr], X[tr,], hidden_units = c(5,3)
    , fe_var = id[tr], lam = 1
    , time_var = time[tr], param = P[tr,], verbose = FALSE
    , gravity = 1.01
    , RMSprop = TRUE, convtol = 1e-5, maxit = 10000
    , activation = 'tanh', parapen = c(0,0)
  )

  plot(pnn)
  summary(pnn) #Approx inference
