# bin.sample: Bin Sample a Vector, Matrix, or Data Frame

In npreg: Nonparametric Regression via Smoothing Splines

## Description

Bin elements of a vector (or rows of a matrix/data frame) and randomly sample a specified number of elements from each bin. Returns sampled data and (optionally) indices of sampled data and/or breaks for defining bins.

## Usage

```r
bin.sample(x, nbin = 5, size = 1, equidistant = FALSE,
           index.return = FALSE, breaks.return = FALSE)
```

## Arguments

| Argument | Description |
|---|---|
| `x` | Vector, matrix, or data frame to bin sample. Factors are allowed. |
| `nbin` | Number of bins for each variable (defaults to 5 bins for each dimension of `x`). If `length(nbin) != ncol(x)`, then `nbin[1]` is used for each variable. |
| `size` | Size of the sample to randomly draw from each bin (defaults to 1). |
| `equidistant` | Should bins be defined equidistantly for each predictor? If `FALSE` (default), sample quantiles define the bins for each predictor. If `length(equidistant) != ncol(x)`, then `equidistant[1]` is used for each variable. |
| `index.return` | If `TRUE`, returns the (row) indices of the bin sampled observations. |
| `breaks.return` | If `TRUE`, returns the (lower bounds of the) breaks for the binning. |
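The difference between the two settings of `equidistant` can be illustrated with base R alone. The sketch below is an assumption about how such breaks might be formed, not the code used inside `bin.sample`; the names `breaks_equi`, `breaks_quan`, `bin_equi`, and `bin_quan` are hypothetical.

```r
# Illustration only: two ways to derive break points for nbin = 5 bins.
# Base R sketch; NOT the internal implementation of bin.sample().
set.seed(1)
x <- rexp(200)   # skewed data, so the two schemes differ visibly
nbin <- 5

# Equidistant: equal-width intervals spanning the range of x
breaks_equi <- seq(min(x), max(x), length.out = nbin + 1)

# Quantile-based: intervals holding roughly equal numbers of observations
breaks_quan <- quantile(x, probs = seq(0, 1, length.out = nbin + 1),
                        names = FALSE)

# Assign each observation to a bin code in 1..nbin
bin_equi <- .bincode(x, breaks_equi, include.lowest = TRUE)
bin_quan <- .bincode(x, breaks_quan, include.lowest = TRUE)

table(bin_equi)  # unbalanced counts for skewed data
table(bin_quan)  # balanced counts
```

With skewed data, equal-width bins leave the upper bins sparsely populated, whereas quantile bins keep the per-bin counts balanced, which is presumably why quantiles are the default.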

## Details

For a single variable, the unidimensional bins are defined using the `.bincode` function. For multiple variables, the multidimensional bins are defined using the algorithm described in the appendix of Helwig et al. (2015), which combines the unidimensional bins (calculated via `.bincode`) into a multidimensional bin code.
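The idea of combining unidimensional bin codes into one multidimensional code can be sketched as follows. This is a minimal illustration that assumes a simple mixed-radix encoding; the algorithm in the appendix of Helwig et al. (2015) may differ in its details, and the names `codes`, `cell`, and `ix` are hypothetical.

```r
# Hypothetical sketch: combine per-variable bin codes into a single cell code,
# then sample rows within each cell. NOT the package's actual implementation.
set.seed(1)
x <- data.frame(x1 = runif(500), x2 = runif(500))
nbin <- c(5, 5)
size <- 1

# Step 1: unidimensional bin codes from quantile breaks (one column per variable)
codes <- sapply(seq_along(x), function(j) {
  br <- quantile(x[[j]], probs = seq(0, 1, length.out = nbin[j] + 1),
                 names = FALSE)
  .bincode(x[[j]], br, include.lowest = TRUE)
})

# Step 2: mixed-radix encoding maps each (bin1, bin2) pair to a unique integer
cell <- (codes[, 1] - 1) + nbin[1] * (codes[, 2] - 1) + 1   # values in 1..25

# Step 3: draw up to `size` row indices at random from each occupied cell
ix <- unlist(lapply(split(seq_len(nrow(x)), cell), function(rows) {
  rows[sample.int(length(rows), min(size, length(rows)))]
}), use.names = FALSE)

xs <- x[ix, ]   # the bin sampled observations
```

Sampling row indices rather than values keeps the rows of a matrix or data frame intact and makes it straightforward to return the indices alongside the sampled data.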

## Value

If `index.return = FALSE` and `breaks.return = FALSE`, returns the bin sampled `x` observations.

If `index.return = TRUE` and/or `breaks.return = TRUE`, returns a list with elements:

| Element | Description |
|---|---|
| `x` | Bin sampled `x` observations. |
| `ix` | Row indices of bin sampled observations (if `index.return = TRUE`). |
| `bx` | Lower bounds of breaks defining bins (if `breaks.return = TRUE`). |

## Note

For factors, the number of bins is automatically defined to be the number of levels.
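One way to picture this behavior is to treat the integer codes of the factor levels as the bin codes. The base R sketch below is an assumption made for illustration, not the package's internal code; the names `f`, `bin`, and `ix` are hypothetical.

```r
# Hypothetical sketch: for a factor, each level acts as its own bin.
set.seed(1)
f <- factor(rep(c("low", "mid", "high"), times = c(10, 20, 30)))

bin <- as.integer(f)   # integer codes 1..nlevels(f) serve as bin codes

# Draw one index at random from each level
ix <- sapply(split(seq_along(f), bin), function(rows) {
  rows[sample.int(length(rows), 1)]
})

f[ix]   # one observation per level
```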

## Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

## References

Helwig, N. E., Gao, Y., Wang, S., & Ma, P. (2015). Analyzing spatiotemporal trends in social media data via smoothing spline analysis of variance. Spatial Statistics, 14(C), 491-504. doi: 10.1016/j.spasta.2015.09.002

## See Also

`.bincode` for binning a numeric vector

## Examples

```r
##########   EXAMPLE 1   ##########
### unidimensional binning

# generate data
x <- seq(0, 1, length.out = 101)

# bin sample (default)
set.seed(1)
bin.sample(x)

# bin sample (return indices)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE)
xs$x             # sampled data
x[xs$ix]         # indexing sampled data

# bin sample (return indices and breaks)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE, breaks.return = TRUE)
xs$x             # sampled data
x[xs$ix]         # indexing sampled data
xs$bx            # breaks


##########   EXAMPLE 2   ##########
### bidimensional binning

# generate data
x <- expand.grid(x1 = seq(0, 1, length.out = 101),
                 x2 = seq(0, 1, length.out = 101))

# bin sample (default)
set.seed(1)
bin.sample(x)

# bin sample (return indices)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE)
xs$x             # sampled data
x[xs$ix, ]       # indexing sampled data

# bin sample (return indices and breaks)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE, breaks.return = TRUE)
xs$x             # sampled data
x[xs$ix, ]       # indexing sampled data
xs$bx            # breaks

# plot breaks and 25 bins
plot(xs$bx, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "x1", ylab = "x2", main = "25 bidimensional bins")
grid()
text(xs$bx + 0.1, labels = 1:25)
```