bin.sample | R Documentation |
Bin elements of a vector (or rows of a matrix/data frame) and randomly sample a specified number of elements from each bin. Returns sampled data and (optionally) indices of sampled data and/or breaks for defining bins.
bin.sample(x, nbin = 5, size = 1, equidistant = FALSE,
index.return = FALSE, breaks.return = FALSE)
x |
Vector, matrix, or data frame to bin sample. Factors are allowed. |
nbin |
Number of bins for each variable (defaults to 5 bins for each dimension of x). |
size |
Size of sample to randomly draw from each bin (defaults to 1). |
equidistant |
Should bins be defined equidistantly for each predictor? If |
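index.return |
If TRUE, the row indices of the bin sampled observations are returned. |
breaks.return |
If TRUE, the lower bounds of the breaks defining the bins are returned. |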
For a single variable, the unidimensional bins are defined using the .bincode function. For multiple variables, the multidimensional bins are defined using the algorithm described in the appendix of Helwig et al. (2015), which combines the unidimensional bins (calculated via .bincode) into a multidimensional bin code.
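The following base R sketch illustrates the general idea of binning each variable with .bincode and merging the per-variable codes into one multidimensional code. It is not the package's implementation: the helper bin_codes is hypothetical, the use of sample quantiles as breaks in the non-equidistant case is an assumption, and the simple mixed-radix encoding may differ from the algorithm in Helwig et al. (2015).

```r
# Hypothetical helper (not part of the package): unidimensional bin codes
bin_codes <- function(v, nbin = 5, equidistant = FALSE) {
  if (equidistant) {
    # equally spaced breaks over the range of v
    breaks <- seq(min(v), max(v), length.out = nbin + 1)
  } else {
    # assumption: quantile-based breaks when bins are not equidistant
    breaks <- quantile(v, probs = seq(0, 1, length.out = nbin + 1),
                       names = FALSE)
  }
  .bincode(v, breaks, right = TRUE, include.lowest = TRUE)
}

set.seed(1)
x1 <- runif(200)
x2 <- runif(200)
nbin <- 5

# unidimensional bin codes, each in 1..nbin
b1 <- bin_codes(x1, nbin)
b2 <- bin_codes(x2, nbin)

# assumed encoding: one code in 1..nbin^2 per (b1, b2) pair
code <- b1 + nbin * (b2 - 1)

# draw one row index at random from each non-empty bin
idx_by_bin <- split(seq_along(code), code)
ix <- vapply(idx_by_bin, function(i) i[sample.int(length(i), 1)], integer(1))

# sampled observations, one per occupied bin
cbind(x1 = x1[ix], x2 = x2[ix])
```

Sampling through sample.int on the index positions avoids the surprise that sample(i, 1) treats a length-one i as the range 1:i.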
If index.return = FALSE and breaks.return = FALSE, returns the bin sampled x observations.
If index.return = TRUE and/or breaks.return = TRUE, returns a list with elements:
x |
bin sampled x observations |
ix |
row indices of bin sampled observations (if index.return = TRUE) |
bx |
lower bounds of breaks defining bins (if breaks.return = TRUE) |
For factors, the number of bins is automatically defined to be the number of levels.
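As a rough illustration of the note above, a factor's integer codes can serve directly as bin codes, so the number of bins equals the number of levels. This is a base R sketch under that assumption, not the package's own code.

```r
# toy factor with three levels
f <- factor(c("a", "b", "a", "c", "b", "c", "a"))

# integer codes 1..nlevels(f) act as the bin codes
codes <- as.integer(f)
nlevels(f)  # number of bins

# draw one index at random from each level
set.seed(1)
ix <- vapply(split(seq_along(codes), codes),
             function(i) i[sample.int(length(i), 1)], integer(1))

# one sampled element per level
f[ix]
```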
Nathaniel E. Helwig <helwig@umn.edu>
Helwig, N. E., Gao, Y., Wang, S., & Ma, P. (2015). Analyzing spatiotemporal trends in social media data via smoothing spline analysis of variance. Spatial Statistics, 14(C), 491-504. doi:10.1016/j.spasta.2015.09.002
.bincode for binning a numeric vector
########## EXAMPLE 1 ##########
### unidimensional binning
# generate data
x <- seq(0, 1, length.out = 101)
# bin sample (default)
set.seed(1)
bin.sample(x)
# bin sample (return indices)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE)
xs$x # sampled data
x[xs$ix] # indexing sampled data
# bin sample (return indices and breaks)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE, breaks.return = TRUE)
xs$x # sampled data
x[xs$ix] # indexing sampled data
xs$bx # breaks
########## EXAMPLE 2 ##########
### bidimensional binning
# generate data
x <- expand.grid(x1 = seq(0, 1, length.out = 101),
                 x2 = seq(0, 1, length.out = 101))
# bin sample (default)
set.seed(1)
bin.sample(x)
# bin sample (return indices)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE)
xs$x # sampled data
x[xs$ix,] # indexing sampled data
# bin sample (return indices and breaks)
set.seed(1)
xs <- bin.sample(x, index.return = TRUE, breaks.return = TRUE)
xs$x # sampled data
x[xs$ix,] # indexing sampled data
xs$bx # breaks
# plot breaks and 25 bins
plot(xs$bx, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "x1", ylab = "x2", main = "25 bidimensional bins")
grid()
text(xs$bx + 0.1, labels = 1:25)