s_soda: S-SODA algorithm for general index model variable selection


View source: R/pure_soda.R

Description

S-SODA is an extension of SODA that performs variable selection for general index models with a continuous response. S-SODA first evenly discretizes the continuous response into H slices and then applies SODA to the discretized response. Compared with existing variable selection methods based on Sliced Inverse Regression (SIR), S-SODA requires neither the linearity condition nor the constant variance condition and is much more robust.
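
The slicing step can be illustrated with a minimal sketch. It assumes that "evenly" means slices holding roughly equal numbers of observations; `slice_response` is a hypothetical helper written for illustration and is not part of sodavis, whose internal slicing may differ.

```r
# Hypothetical helper (not part of sodavis): split a continuous response
# into H slices with roughly equal numbers of observations.
slice_response <- function(y, H = 5) {
  n <- length(y)
  # Rank the observations, then map ranks 1..n onto slice labels 1..H.
  r <- rank(y, ties.method = "first")
  ceiling(r * H / n)
}

set.seed(1)
y <- rnorm(100)
slices <- slice_response(y, H = 5)
table(slices)  # counts of observations per slice label
```

The resulting labels play the role of a categorical response, which is the kind of input the description says SODA is then applied to.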

Usage

s_soda(x, y, H = 5, gam = 0, minF = 3, norm = F, debug = F)

Arguments

x

The design matrix, of dimensions n * p, without an intercept. Each row is an observation vector.

y

The response vector of dimension n * 1.

H

The number of slices.

gam

EBIC penalization coefficient parameter for SODA.

minF

Minimum number of steps in forward interaction screening. Default is minF=3.

norm

If TRUE, S-SODA first marginally quantile-normalizes each predictor to the standard normal distribution.

debug

If TRUE, print debug information.
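
As a rough illustration of what the norm argument describes, the sketch below maps each predictor column to approximate standard normal scores through its ranks. `quantile_normalize` is a hypothetical helper, and the rank offset of 0.5 is an assumption; the transformation used inside sodavis may differ in detail.

```r
# Hypothetical helper (not part of sodavis): marginal quantile
# normalization of each column of x to the standard normal distribution.
quantile_normalize <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  apply(x, 2, function(col) {
    # (rank - 0.5) / n lies strictly inside (0, 1), so qnorm stays finite.
    qnorm((rank(col, ties.method = "average") - 0.5) / n)
  })
}

set.seed(1)
x <- cbind(a = rexp(200), b = runif(200, -3, 3))
z <- quantile_normalize(x)
round(colMeans(z), 3)       # column means of the normal scores
round(apply(z, 2, sd), 3)   # column standard deviations
```

Because the transformation depends only on ranks, it preserves the ordering of each predictor while removing skewness and heavy tails from its marginal distribution.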

Value

BIC

Trace of extended Bayesian information criterion (EBIC) score.

Var

Trace of selected variables.

Term

Trace of selected main and interaction terms.

best_BIC

EBIC score of the final selected term set.

best_Var

Final selected variables.

best_Term

Final selected main and interaction terms.

Examples

# # (uncomment the code to run)
# # Simulation:  x1 / exp(x2^2) example
# N = 500
# x1 = runif(N, -3, +3)
# x2 = runif(N, -3, +3)
# x3 = x1 / exp(x2^2) + rnorm(N, 0, 0.2)
# ss = s_soda_model(cbind(x1,x2), x3, H=25)
# 
# # true surface in grid
# MM = 50
# xx1 = seq(-3, +3, length.out = MM)
# xx2 = seq(-3, +3, length.out = MM)
# yyy = matrix(0, MM, MM)
# for(i in 1:MM)
#   for(j in 1:MM)
#     yyy[i,j] = xx1[i] / exp(xx2[j]^2)
# 
# # predicted surface
# ppp = s_soda_pred_grid(xx1, xx2, ss, po=1)
# 
# par(mfrow=c(1, 2), mar=c(1.75, 3, 1.25, 1.5))
# persp(xx1, xx2, yyy, theta=-45, xlab="X1", ylab="X2", zlab="Y")
# persp(xx1, xx2, ppp, theta=-45, xlab="X1", ylab="X2", zlab="Pred")
# 
# # Pumadyn dataset
# #data(pumadyn);
# #s_soda(pumadyn_isample_x, pumadyn_isample_y, H=25, gam=0)

sodavis documentation built on May 2, 2019, 12:38 p.m.
