sample: Do random sampling from an Xdf file
In RevolutionAnalytics/dplyrXdf: Tools for working with Microsoft R Server Xdf files and the dplyr package

Description

Do random sampling from an Xdf file

Usage

## S3 method for class 'RxXdfData'
sample_n(tbl, size = 1, replace = FALSE,
  weight = NULL, .env = NULL)

## S3 method for class 'RxXdfData'
sample_frac(tbl, size = 1, replace = FALSE,
  weight = NULL, .env = NULL)

## S3 method for class 'grouped_tbl_xdf'
sample_n(tbl, size = 1, replace = FALSE,
  weight = NULL, .env = NULL)

## S3 method for class 'grouped_tbl_xdf'
sample_frac(tbl, size = 1, replace = FALSE,
  weight = NULL, .env = NULL)

Arguments

`tbl`	An Xdf file, or a tbl wrapping one.
`size`	For `sample_n`, the number of rows to select. For `sample_frac`, the fraction of rows to select. For a grouped dataset, `size` applies to each group separately.
`replace, weight, .env`	Not used. Only unweighted sampling without replacement is supported.
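
To make the per-group meaning of `size` concrete, here is a minimal base R sketch on an ordinary data frame. It is an analogy only and does not go through the Xdf methods. The helper name `sample_frac_by_group` is hypothetical, and the rounding rule `round(frac * n)` is an assumption; the Xdf implementation may round differently.

```r
# Illustration only: plain data frame, base R, not the dplyrXdf code path.
set.seed(42)

# Hypothetical helper: sample a fraction of rows within each group,
# without replacement and without weights.
sample_frac_by_group <- function(df, group_col, frac) {
  pieces <- split(df, df[[group_col]])
  sampled <- lapply(pieces, function(g) {
    k <- round(frac * nrow(g))  # assumed rounding rule
    g[sample.int(nrow(g), k), , drop = FALSE]
  })
  do.call(rbind, sampled)
}

half <- sample_frac_by_group(mtcars, "vs", 0.5)
table(half$vs)  # each group contributes half of its own rows
```

Because the fraction is applied within each group, the groups keep their relative sizes in the sample.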

Details

Sampling from Xdf files is more limited than sampling from data frames. Only unweighted sampling without replacement is supported; attempting to specify otherwise results in a warning. Unlike the other single-table dplyr verbs, `sample_n` and `sample_frac` do not delete tbl inputs, because a sample is unlikely to be intended as a complete replacement for the input data.

Currently, sampling on HDFS data works in the local compute context (on the edge node) but not in the Hadoop or Spark compute contexts.

Value

An Xdf tbl containing the sampled rows.

See Also

`sample_frac` and `sample_n` in package dplyr; `sample` in base R

Examples

library(dplyr)
library(dplyrXdf)

# write the mtcars data frame out as an Xdf file
mtx <- as_xdf(mtcars, overwrite=TRUE)

tbl <- sample_n(mtx, 10)
nrow(tbl)

tbl2 <- sample_frac(mtx, 0.5)
nrow(tbl2)

tbl3 <- group_by(mtx, vs) %>% sample_frac(0.5)
nrow(tbl3)

# to get an _approximate_ sample, use filter(): each row is kept independently,
# so the number of rows returned varies from run to run
tbl4 <- filter(mtx, runif(.rxNumRows) < 0.4)  # keep roughly 40% of rows in the data
nrow(tbl4)
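
The difference between an exact and an approximate sample can be seen on an ordinary data frame. The following base R sketch is only an analogy: it works on the in-memory `mtcars` data frame and uses neither Xdf files nor `.rxNumRows`.

```r
# Illustration only: base R on an in-memory data frame.
set.seed(123)
n <- nrow(mtcars)

# Exact sample: always 10 distinct rows, like sample_n(mtx, 10).
exact <- mtcars[sample.int(n, 10), ]

# Approximate sample: each row is kept independently with probability 0.4,
# so the row count varies from run to run around 0.4 * n.
approx <- mtcars[runif(n) < 0.4, ]

nrow(exact)
nrow(approx)
```

The exact form is the better choice when a fixed sample size matters; the filter form is enough when only a rough proportion is needed.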

RevolutionAnalytics/dplyrXdf documentation built on June 3, 2019, 9:08 p.m.
