bin.sample: Bin Sample a Vector, Matrix, or Data Frame

View source: R/bin.sample.R

bin.sampleR Documentation

Bin Sample a Vector, Matrix, or Data Frame

Description

Bin elements of a vector (or rows of a matrix/data frame) and randomly sample a specified number of elements from each bin. Returns sampled data and (optionally) indices of sampled data and/or breaks for defining bins.

Usage

bin.sample(x, nbin = 5, size = 1, equidistant = FALSE, 
           index.return = FALSE, breaks.return = FALSE)

Arguments

x

Vector, matrix, or data frame to bin sample. Factors are allowed.

nbin

Number of bins for each variable (defaults to 5 bins for each dimension of x). If length(bins) != ncol(x), then nbin[1] is used for each variable.

size

Size of sample to randomly draw from each bin (defaults to 1).

equidistant

Should bins be defined equidistantly for each predictor? If FALSE (default), sample quantiles define bins for each predictor. If length(equidistant) != ncol(x), then equidistant[1] is used for each variable.

index.return

If TRUE, returns the (row) indices of the bin sampled observations.

breaks.return

If TRUE, returns the (lower bounds of the) breaks for the binning.

Details

For a single variable, the unidimensional bins are defined using the .bincode function. For multiple variables, the multidimensional bins are defined using the algorithm described in the appendix of Helwig et al. (2015), which combines the unidimensional bins (calculated via .bincode) into a multidimensional bin code.

Value

If index.return = FALSE and breaks.return = FALSE, returns the bin sampled x observations.

If index.return = TRUE and/or breaks.return = TRUE, returns a list with elements:

x

bin sampled x observations.

ix

row indices of bin sampled observations (if index.return = TRUE).

bx

lower bounds of breaks defining bins (if breaks.return = TRUE).

Note

For factors, the number of bins is automatically defined to be the number of levels.

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Helwig, N. E., Gao, Y., Wang, S., & Ma, P. (2015). Analyzing spatiotemporal trends in social media data via smoothing spline analysis of variance. Spatial Statistics, 14(C), 491-504. doi: 10.1016/j.spasta.2015.09.002

See Also

.bincode for binning a numeric vector

Examples

##########   EXAMPLE 1   ##########
### unidimensional binning

# generate data
x <- seq(0, 1, length.out = 101)

# bin sample (default)
set.seed(1)
bin.sample(x)

# bin sample (return indices)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE)
xs$x             # sampled data
x[xs$ix]         # indexing sampled data

# bin sample (return indices and breaks)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE, breaks.return = TRUE)
xs$x             # sampled data
x[xs$ix]         # indexing sampled data
xs$bx            # breaks



##########   EXAMPLE 2   ##########
### bidimensional binning

# generate data
x <- expand.grid(x1 = seq(0, 1, length.out = 101),
                 x2 = seq(0, 1, length.out = 101))

# bin sample (default)
set.seed(1)
bin.sample(x)

# bin sample (return indices)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE)
xs$x             # sampled data
x[xs$ix,]        # indexing sampled data

# bin sample (return indices and breaks)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE, breaks.return = TRUE)
xs$x             # sampled data
x[xs$ix,]        # indexing sampled data
xs$bx            # breaks

# plot breaks and 25 bins
plot(xs$bx, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "x1", ylab = "x2", main = "25 bidimensional bins")
grid()
text(xs$bx + 0.1, labels = 1:25)


npreg documentation built on July 21, 2022, 1:06 a.m.

Related to bin.sample in npreg...