Do random sampling from an Xdf file
## S3 method for class 'RxXdfData'
sample_n(tbl, size = 1, replace = FALSE,
  weight = NULL, .env = NULL)

## S3 method for class 'RxXdfData'
sample_frac(tbl, size = 1, replace = FALSE,
  weight = NULL, .env = NULL)

## S3 method for class 'grouped_tbl_xdf'
sample_n(tbl, size = 1, replace = FALSE,
  weight = NULL, .env = NULL)

## S3 method for class 'grouped_tbl_xdf'
sample_frac(tbl, size = 1, replace = FALSE,
  weight = NULL, .env = NULL)
tbl | An Xdf file or a tbl wrapping the same.
size | For sample_n, the number of rows to select; for sample_frac, the fraction of rows to select.
replace, weight, .env | Not used.
Sampling from Xdf files is slightly more limited than sampling from a data frame. Only unweighted sampling without replacement is supported, and attempts to specify otherwise will result in a warning. Unlike the other single-table dplyr verbs, sample_n and sample_frac do not delete tbl inputs, because a sample is unlikely to be intended as a complete replacement for the input data.
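For comparison, the kind of sampling these methods support (unweighted, without replacement) can be sketched on an ordinary data frame in base R. This is only an analogue meant to illustrate the semantics, not how the Xdf methods are implemented. The helper names sample_rows and sample_rows_frac are hypothetical, and rounding the fraction to a whole number of rows is an assumption made for the sketch.

```r
# Data-frame analogue of unweighted sampling without replacement.
# sample_rows() and sample_rows_frac() are hypothetical helpers for illustration.
sample_rows <- function(df, size) {
  stopifnot(size <= nrow(df))         # cannot draw more rows than exist without replacement
  idx <- sample.int(nrow(df), size)   # replace = FALSE by default, equal probabilities
  df[idx, , drop = FALSE]
}

sample_rows_frac <- function(df, frac) {
  # Rounding to a whole number of rows is an assumption of this sketch
  sample_rows(df, round(frac * nrow(df)))
}

set.seed(42)
s10 <- sample_rows(mtcars, 10)
nrow(s10)                      # 10
anyDuplicated(rownames(s10))   # 0: no row is drawn twice

# Grouped case: sample within each level of vs, then recombine
by_vs <- split(mtcars, mtcars$vs)
s_grp <- do.call(rbind, lapply(by_vs, sample_rows_frac, frac = 0.5))
table(s_grp$vs)
```

Note that the input data frame is left untouched here as well, which matches the behaviour described above for tbl inputs.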
Currently sampling on HDFS data works in the local compute context (on the edge node) but not in the Hadoop or Spark compute contexts.
An Xdf tbl.
sample_frac and sample_n in package dplyr, sample
library(dplyr)
library(dplyrXdf)  # provides as_xdf and the Xdf methods for the dplyr verbs

mtx <- as_xdf(mtcars, overwrite=TRUE)

# exact sample of 10 rows
tbl <- sample_n(mtx, 10)
nrow(tbl)

# exact sample of half the rows
tbl2 <- sample_frac(mtx, 0.5)
nrow(tbl2)

# sample half the rows within each group of vs
tbl3 <- group_by(mtx, vs) %>% sample_frac(0.5)
nrow(tbl3)

# to get an _approximate_ sample, use filter()
tbl4 <- filter(mtx, runif(.rxNumRows) < 0.4) # keep roughly 40% of rows in the data
nrow(tbl4)
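The difference between the approximate filter() approach and an exact-size sample can be seen on a plain data frame. The following base R sketch is an illustration only: it does not touch Xdf files, and the 0.4 fraction simply mirrors the filter() example. Keeping each row independently with probability 0.4 gives a row count that generally varies from draw to draw, whereas drawing a fixed number of row indices always returns the same count.

```r
set.seed(1)
n <- nrow(mtcars)

# Approximate: keep each row independently with probability 0.4
approx_sizes <- replicate(5, sum(runif(n) < 0.4))
approx_sizes   # row counts generally differ between draws

# Exact: always draw the same number of rows, without replacement
k <- round(0.4 * n)
exact_sizes <- replicate(5, nrow(mtcars[sample.int(n, k), ]))
exact_sizes    # every draw has exactly k rows
```

The approximate form needs no knowledge of the total row count beyond the current chunk, which is why it fits naturally into a filter() expression; the exact form is what sample_n and sample_frac are for.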