binsamp: Bin-Samples Strategic Knot Indices


View source: R/binsamp.R

Description

Breaks the predictor domain into a user-specified number of disjoint subregions, and randomly samples a user-specified number of observations from each (nonempty) subregion.
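As a rough illustration of the idea described above, the following base-R sketch bins each predictor on an equidistant grid and samples from every nonempty cell. The helper name binsamp_sketch and all of its internals are assumptions made for illustration only; it is not the package's implementation, and it ignores xrng, alg, and nominal predictors.

```r
# Hypothetical helper (NOT the package's binsamp): base-R sketch of bin-sampling
binsamp_sketch <- function(x, nmbin = 11, nsamp = 1) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  # recycle the first element when nmbin is shorter than the number of columns
  if (length(nmbin) < p) nmbin <- rep(nmbin[1], p)
  nmbin <- nmbin[seq_len(p)]
  # marginal bin label (1..b_j) of every observation on an equidistant grid
  bins <- matrix(1L, n, p)
  for (j in seq_len(p)) {
    rng <- range(x[, j])
    if (rng[1] < rng[2]) {
      brk <- seq(rng[1], rng[2], length.out = nmbin[j] + 1)
      # all.inside = TRUE keeps the column maximum in the last bin
      bins[, j] <- findInterval(x[, j], brk, all.inside = TRUE)
    }
  }
  # combine the marginal labels into one cell id per observation
  radix <- c(1, cumprod(nmbin)[-p])
  cell <- as.vector((bins - 1L) %*% radix) + 1
  # sample up to nsamp row indices from each nonempty cell
  picks <- lapply(split(seq_len(n), cell), function(i) {
    i[sample.int(length(i), min(nsamp, length(i)))]
  })
  sort(unlist(picks, use.names = FALSE))
}

set.seed(1)
x <- cbind(runif(1000), runif(1000))
idx <- binsamp_sketch(x, nmbin = c(5, 4))   # at most 5 * 4 = 20 indices
knots <- x[idx, , drop = FALSE]
```

Empty cells never appear in the split, so they contribute no indices, which is why the number of returned rows can fall below the number of cells.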

Usage

binsamp(x,xrng=NULL,nmbin=11,nsamp=1,alg=c("new","old"))

Arguments

x

Matrix of predictors \mathbf{X}=\{x_{ij}\}_{n \times p} where n is the number of observations, and p is the number of predictors.

xrng

Optional matrix of predictor ranges: \mathbf{R}=\{r_{kj}\}_{2 \times p} where r_{1j}=\min_{i}x_{ij} and r_{2j}=\max_{i}x_{ij}.

nmbin

Vector \mathbf{b}=(b_{1},…,b_{p})', where b_{j} \geq 1 is the number of marginal bins to use for the j-th predictor. If length(nmbin)<ncol(x), then nmbin[1] is used for all columns. Default is nmbin=11 marginal bins for each dimension.

nsamp

Scalar s \geq 1 giving the number of observations to sample from each bin. Default is nsamp=1, i.e., one observation is sampled from each bin.

alg

Bin-sampling algorithm. The new algorithm forms an equidistant grid, whereas the old algorithm forms an approximately equidistant grid. The new algorithm is the default for versions 1.0-1 and later.
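The xrng and nmbin arguments described above can be prepared in base R before calling the function. The snippet below is a minimal sketch with made-up data; the variable names are illustrative, and the final call is left commented out because it assumes the package providing binsamp is loaded.

```r
set.seed(1)
x <- cbind(runif(50), rnorm(50), rexp(50))   # illustrative n = 50, p = 3

# 2 x p matrix of ranges: row 1 holds column minima, row 2 holds column maxima
xrng <- apply(x, 2, range)

# a short nmbin is expanded from its first element, mirroring the stated rule
nmbin <- 8
if (length(nmbin) < ncol(x)) nmbin <- rep(nmbin[1], ncol(x))

# xind <- binsamp(x, xrng = xrng, nmbin = nmbin)   # requires the package
```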

Value

Returns an index vector indicating the rows of x that were bin-sampled.

Warnings

If x_{ij} is nominal with g levels, the function requires b_{j}=g and x_{ij}\in\{1,…,g\} for i\in\{1,…,n\}.
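One way to meet this requirement, sketched here with a made-up character variable, is to convert the nominal predictor to a factor and use its integer codes and level count. The variable names grp, codes, and g are illustrative assumptions.

```r
grp <- c("low", "high", "mid", "high", "low")   # hypothetical nominal variable
f <- factor(grp)

codes <- as.integer(f)   # integer codes in 1..g (levels sorted alphabetically)
g <- nlevels(f)          # number of levels, to be used as b_j

set.seed(1)
x <- cbind(runif(length(codes)), codes)   # continuous x1, nominal x2
nmbin <- c(10, g)                         # b_2 = g for the nominal column
```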

Note

The number of returned knots will depend on the distribution of the covariate scores. The maximum number of possible bin-sampled knots is s\prod_{j=1}^{p}b_{j}, but fewer knots will be returned if one (or more) of the bins is empty (i.e., if there is no data in one or more bins).
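The bound above is simple to evaluate, and the effect of empty bins can be seen by counting occupied bins for a skewed predictor. This is a base-R sketch with made-up data; the equidistant breaks are built by hand and are only an approximation of what the function does internally.

```r
# upper bound on the number of knots: nsamp times the product of nmbin
nmbin <- c(10, 10, 2)
nsamp <- 1
max_knots <- nsamp * prod(nmbin)   # 1 * 10 * 10 * 2 = 200

# occupied bins for a skewed single predictor on an equidistant 11-bin grid
set.seed(1)
x <- rexp(200)
brk <- seq(min(x), max(x), length.out = 11 + 1)
nonempty <- length(unique(findInterval(x, brk, all.inside = TRUE)))
```

With a right-skewed sample like this, bins in the upper tail may hold no observations, so nonempty can be smaller than 11.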

Author(s)

Nathaniel E. Helwig

Examples

##########   EXAMPLE 1   ##########

# create 2-dimensional predictor (both continuous)
set.seed(123)
xmat <- cbind(runif(10^6),runif(10^6))

# Default use:
#   11 marginal bins for each predictor
#   sample 1 observation from each subregion
xind <- binsamp(xmat)

# get the corresponding knots
bknots <- xmat[xind,]

# compare to the same number of randomly-sampled knots
rknots <- xmat[sample(1:(10^6),length(xind)),]
par(mfrow=c(1,2))
plot(bknots,main="bin-sampled")
plot(rknots,main="randomly sampled")



##########   EXAMPLE 2   ##########

# create 2-dimensional predictor (continuous and nominal)
set.seed(123)
xmat <- cbind(runif(10^6),sample(1:3,10^6,replace=TRUE))

# use 10 marginal bins for x1 and 3 marginal bins for x2 
# and sample one observation from each subregion
xind <- binsamp(xmat,nmbin=c(10,3))

# get the corresponding knots
bknots <- xmat[xind,]

# compare to randomly-sampled knots
rknots <- xmat[sample(1:(10^6),30),]
par(mfrow=c(1,2))
plot(bknots,main="bin-sampled")
plot(rknots,main="randomly sampled")



##########   EXAMPLE 3   ##########

# create 3-dimensional predictor (continuous, continuous, nominal)
set.seed(123)
xmat <- cbind(runif(10^6),runif(10^6),sample(1:2,10^6,replace=TRUE))

# use 10 marginal bins for x1 and x2, and 2 marginal bins for x3 
# and sample one observation from each subregion
xind <- binsamp(xmat,nmbin=c(10,10,2))

# get the corresponding knots
bknots <- xmat[xind,]

# compare to randomly-sampled knots
rknots <- xmat[sample(1:(10^6),200),]
par(mfrow=c(2,2))
plot(bknots[bknots[,3]==1,1:2],main="bin-sampled, x3=1")
plot(bknots[bknots[,3]==2,1:2],main="bin-sampled, x3=2")
plot(rknots[rknots[,3]==1,1:2],main="randomly sampled, x3=1")
plot(rknots[rknots[,3]==2,1:2],main="randomly sampled, x3=2")

taylerablake/thin-plate-splines documentation built on Sept. 19, 2017, 9:45 a.m.