sample-methods: Methods for sampling rows of data from a table/view randomly

sample-methods    R Documentation

Methods for sampling rows of data from a table/view randomly

Description

This method draws a random sample of rows from a table/view. The sampled rows are stored in a temporary table.

Usage

## S4 method for signature 'db.obj'
sample(x, size, replace = FALSE, prob = NULL, ...)

Arguments

x

A db.obj object, which is the wrapper to the data table.

size

An integer. The number of rows to sample. When replace is FALSE, size must be smaller than the total number of rows in the data table/view.

replace

A logical value, default is FALSE. When it is TRUE, the data is sampled with replacement, which means a row may be sampled multiple times. When it is FALSE, each row can be sampled at most once.

prob

A vector of double values, default is NULL. The probability of sampling each row. Not implemented yet.

...

Extra parameters. Not implemented.

Details

When replace is FALSE, the rows are sorted in random order (see sort,db.obj-method) and then size rows are selected, which is similar to sort(x, FALSE, "random"). When replace is TRUE, the table has to be scanned multiple times so that the same row can be selected more than once.
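
The two modes described above can be illustrated locally with a small base R sketch. This is not the package's implementation and does not touch a database: sample_rows is a hypothetical helper that works on an ordinary in-memory data.frame, and it only mirrors the semantics (a random sort followed by a selection when replace is FALSE, independent draws when replace is TRUE).

```r
# Local base-R analogue of the two sampling modes. sample_rows() is a
# hypothetical helper for illustration only; it is NOT part of GreenplumR
# and does not reflect how the package queries the database.
sample_rows <- function(df, size, replace = FALSE) {
  n <- nrow(df)
  if (!replace) {
    # Mirror the documented constraint: size must be smaller than the row count.
    stopifnot(size < n)
    # "Sort randomly": order the rows by a random key, then keep `size`
    # of them, so no row can appear more than once.
    shuffled <- df[order(runif(n)), , drop = FALSE]
    shuffled[seq_len(size), , drop = FALSE]
  } else {
    # With replacement: draw row indices independently, so repeats can occur
    # and size may exceed the number of rows.
    idx <- sample.int(n, size, replace = TRUE)
    df[idx, , drop = FALSE]
  }
}

set.seed(1)
df <- data.frame(id = 1:10, value = (1:10)^2)

a <- sample_rows(df, 5)                   # 5 distinct rows
b <- sample_rows(df, 20, replace = TRUE)  # 20 rows drawn from 10, so ids repeat
```

Because b draws 20 rows from a 10-row table, at least one id must appear more than once, which is the behaviour that replace = TRUE permits and replace = FALSE rules out.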

Value

A db.data.frame object, which is a wrapper to a temporary table. The table contains the sampled data.

Author(s)

Author: Predictive Analytics Team at Pivotal Inc.

Maintainer: Frank McQuillan, Pivotal Inc. fmcquillan@pivotal.io

Examples

## Not run: 


## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)

y <- as.db.data.frame(abalone, conn.id = cid, verbose = FALSE)
lk(y, 10)

dim(y)

a <- sample(y, 20)

dim(a)

lookat(a)

b <- sample(y, 40, replace = TRUE)

dim(b)

lookat(b)

delete(b)

db.disconnect(cid, verbose = FALSE)

## End(Not run)

greenplum-db/GreenplumR documentation built on Sept. 2, 2023, 8:09 a.m.