binsamp: Bin-Samples Strategic Knot Indices (bigsplines: Smoothing Splines for Large Samples)

Description

Breaks the predictor domain into a user-specified number of disjoint subregions, and randomly samples a user-specified number of observations from each (nonempty) subregion.
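The description above can be sketched in base R as follows. This is a hypothetical helper, `bin_sample_sketch`, written only to illustrate the idea. It is not the bigsplines implementation. It assumes equal-width marginal bins that span each predictor's observed range, and it assumes that a subregion with fewer than `nsamp` observations contributes all of them.

```r
# Hypothetical sketch of bin sampling in base R; NOT the bigsplines implementation.
# Assumption: equal-width marginal bins spanning each predictor's observed range.
bin_sample_sketch <- function(x, nmbin = 10, nsamp = 1) {
  x <- as.matrix(x)
  p <- ncol(x)
  nmbin <- rep_len(as.integer(nmbin), p)  # assumption: recycle bin counts across predictors

  # Marginal bin index (1..nmbin[j]) of every observation for each predictor
  bins <- vapply(seq_len(p), function(j) {
    brk <- seq(min(x[, j]), max(x[, j]), length.out = nmbin[j] + 1)
    findInterval(x[, j], brk, rightmost.closed = TRUE, all.inside = TRUE)
  }, integer(nrow(x)))

  # Combine the marginal bins into one subregion label (mixed-radix index)
  mult <- cumprod(c(1, nmbin[-p]))
  cell <- as.vector((bins - 1L) %*% mult) + 1

  # split() keeps only nonempty subregions; draw up to nsamp rows from each
  groups <- split(seq_len(nrow(x)), cell)
  picked <- lapply(groups, function(i) i[sample.int(length(i), min(nsamp, length(i)))])
  unlist(picked, use.names = FALSE)
}

# Usage: 10 x 10 subregions, one row sampled from each nonempty subregion
set.seed(123)
xmat <- cbind(runif(10^4), runif(10^4))
xind <- bin_sample_sketch(xmat, nmbin = 10, nsamp = 1)
length(xind)  # at most 10 * 10 = 100
```

Indexing with `i[sample.int(...)]` rather than `sample(i, ...)` avoids R's behavior of sampling from `1:i` when a subregion holds a single row.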

Usage

```r
binsamp(x, xrng = NULL, nmbin = 11, nsamp = 1, alg = c("new", "old"))
```

Arguments

x  Matrix of predictors \mathbf{X}=\{x_{ij}\}_{n \times p}, where n is the number of observations and p is the number of predictors.

xrng  Optional matrix of predictor ranges \mathbf{R}=\{r_{kj}\}_{2 \times p}, where r_{1j}=\min_{i}x_{ij} and r_{2j}=\max_{i}x_{ij}.

nmbin  Vector \mathbf{b}=(b_{1},\ldots,b_{p})', where b_{j}\geq 1 is the number of marginal bins to use for the j-th predictor. If length(nmbin)
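A range matrix matching the definition of \mathbf{R} (column minima in row 1, column maxima in row 2) can be built with base R's `apply()`. The data below are illustrative only.

```r
# Illustrative data: n = 100 observations of p = 2 predictors
set.seed(123)
xmat <- cbind(runif(100), runif(100))

# R = {r_kj} (2 x p): row 1 = column minima, row 2 = column maxima
xrng <- apply(xmat, 2, range)
dim(xrng)  # 2 2
```

The resulting matrix can then be passed as `binsamp(xmat, xrng = xrng)`.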

Value

Returns an index vector indicating the rows of x that were bin-sampled.

Warnings

If the j-th predictor is nominal with g levels, the function requires b_{j}=g and x_{ij}\in\{1,\ldots,g\} for all i\in\{1,\ldots,n\}.
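A nominal predictor stored as labels can be recoded to the integer codes 1, ..., g with `factor()` and `as.integer()`, and its entry of `nmbin` set to the number of levels. The labels and sizes below are hypothetical; the `binsamp()` call is left as a comment so the block runs without bigsplines installed.

```r
# Hypothetical nominal predictor stored as character labels
set.seed(123)
grp <- sample(c("control", "low", "high"), 1000, replace = TRUE)

# Recode to integer codes 1, ..., g (level order fixes the coding)
grp_f <- factor(grp, levels = c("control", "low", "high"))
grp_code <- as.integer(grp_f)
g <- nlevels(grp_f)

# Continuous x1 plus the recoded nominal x2, with b_2 = g
xmat <- cbind(runif(1000), grp_code)
nmbin <- c(10, g)
# xind <- bigsplines::binsamp(xmat, nmbin = nmbin)
```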

Note

The number of returned knots depends on the distribution of the covariate scores. At most s\prod_{j=1}^{p}b_{j} knots can be bin-sampled, where s is the number of observations sampled from each subregion (argument nsamp); fewer knots are returned if one or more bins contain no data.
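As a worked check of the bound, Example 3 uses s = 1 and \mathbf{b}=(10,10,2)', giving at most 1 \times 10 \times 10 \times 2 = 200 knots. The sketch below computes that bound and, for a deliberately small sample, counts nonempty subregions. The helper `max_knots` is hypothetical, and counting with `cut()` assumes equal-width bins, which may not match the boundaries bigsplines uses.

```r
# Hypothetical helper: upper bound s * prod(b_j) on the number of bin-sampled knots
max_knots <- function(nsamp, nmbin) nsamp * prod(nmbin)
max_knots(1, c(10, 10, 2))  # 200, the bound for Example 3

# With only 50 observations, at most 50 of the 10 x 10 = 100 bins can be nonempty
set.seed(123)
xs <- cbind(runif(50), runif(50))
b1 <- cut(xs[, 1], breaks = 10, labels = FALSE)  # equal-width marginal bins (assumption)
b2 <- cut(xs[, 2], breaks = 10, labels = FALSE)
n_nonempty <- length(unique(paste(b1, b2)))
n_nonempty  # strictly fewer than max_knots(1, c(10, 10))
```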

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

Examples

```r
library(bigsplines)

########## EXAMPLE 1 ##########

# create 2-dimensional predictor (both continuous)
set.seed(123)
xmat <- cbind(runif(10^6), runif(10^6))

# Default use (nmbin = 11):
# 11 marginal bins for each predictor
# sample 1 observation from each subregion
xind <- binsamp(xmat)

# get the corresponding knots
bknots <- xmat[xind, ]

# compare to randomly-sampled knots
rknots <- xmat[sample(1:(10^6), 100), ]
par(mfrow = c(1, 2))
plot(bknots, main = "bin-sampled")
plot(rknots, main = "randomly sampled")

########## EXAMPLE 2 ##########

# create 2-dimensional predictor (continuous and nominal)
set.seed(123)
xmat <- cbind(runif(10^6), sample(1:3, 10^6, replace = TRUE))

# use 10 marginal bins for x1 and 3 marginal bins for x2
# and sample one observation from each subregion
xind <- binsamp(xmat, nmbin = c(10, 3))

# get the corresponding knots
bknots <- xmat[xind, ]

# compare to randomly-sampled knots
rknots <- xmat[sample(1:(10^6), 30), ]
par(mfrow = c(1, 2))
plot(bknots, main = "bin-sampled")
plot(rknots, main = "randomly sampled")

########## EXAMPLE 3 ##########

# create 3-dimensional predictor (continuous, continuous, nominal)
set.seed(123)
xmat <- cbind(runif(10^6), runif(10^6), sample(1:2, 10^6, replace = TRUE))

# use 10 marginal bins for x1 and x2, and 2 marginal bins for x3
# and sample one observation from each subregion
xind <- binsamp(xmat, nmbin = c(10, 10, 2))

# get the corresponding knots
bknots <- xmat[xind, ]

# compare to randomly-sampled knots
rknots <- xmat[sample(1:(10^6), 200), ]
par(mfrow = c(2, 2))
# select knots by the level of x3 rather than by row position
plot(bknots[bknots[, 3] == 1, 1:2], main = "bin-sampled, x3=1")
plot(bknots[bknots[, 3] == 2, 1:2], main = "bin-sampled, x3=2")
plot(rknots[rknots[, 3] == 1, 1:2], main = "randomly sampled, x3=1")
plot(rknots[rknots[, 3] == 2, 1:2], main = "randomly sampled, x3=2")
```

bigsplines documentation built on May 2, 2019, 9:27 a.m.