Description Usage Arguments Details Examples
View source: R/computeSample.R
Draws a sample of rows from the table randomly. The function offers two sampling approaches and three stratum strategies. Sampling approaches by
sample fraction: a simple binomial (Bernoulli) sampling on a row-by-row basis with given sample rate(s) (see sampleFraction
)
sample size: sampling a given number of rows without replacement (see sampleSize
)
Stratum strategies:
single stratum: the whole table or its subset (defined using where
).
by column values: using conditionColumn
and conditionValues
arguments define stratum per value in the table column.
by SQL expression: using conditionStratum
and conditionValues
arguments define stratum using SQL
expression (with SQL CASE
function but not necessarily) per value.
1 2 3 |
channel |
connection object as returned by |
tableName |
table name |
sampleFraction |
one or more sample fractions to use in the sampling of data. Multipe
sampling fractions are applicable only in combination with the arguments |
sampleSize |
total sample size (applies only when |
conditionColumn |
if you use this argument, you must also use the |
conditionStratum |
if you use this argument, you must also use the |
conditionValues |
see argument |
include |
a vector of column names to include. Output never contains attributes other than in the list. |
except |
a vector of column names to exclude. Output never contains attributes from the list. |
where |
specifies criteria to satisfy by the table rows before applying
computation. The creteria are expressed in the form of SQL predicates (inside
|
... |
additional arguments to be passed to |
test |
logical: if TRUE show what would be done, only (similar to parameter |
The sampling can be applied to the entire table or can be refined with either conditionColumn
or conditionStratum
.
In each case a subset of the table defined with where
argument may apply too. The resulting stratum models are:
Single Sample Fraction: provide only one value in sampleFraction
, this single fraction is
targeted throughout the whole population or across all the strata defined by the sample conditions
conditionColumn
or conditionStrata
in combination with conditionValues
.
Variable Sample Fractions: provide multiple values in sampleFraction
, each of them is used for
sampling a particular stratum defined by the conditionColumn
or conditionStratum
arguments in
combination with conditionValues
. Number of values in sampleFraction
and conditionValues
must
be the same.
Total Sample Size: provide only one value in sampleSize
for the total sample size for the
entire population. If in addition you specify the conditionColumn
or conditionStratum
arguments,
the function proportionally generates sample units for each stratum defined by the conditionColumn
or conditionStratum
arguments in combination with conditionValues
.
Variable Sample Sizes: provide multiple sizes in sampleSize
so that each size corresponds
to a stratum defined by conditionColumn
or conditionStratum
arguments in combination with conditionValues
.
The sample function generates sample units directly for each stratum based on the supplied sizes. Number of values
in sampleSize
and conditionValues
must be the same.
No columns returned as character and not excluded by as.is
are
converted to factors by default, i.e. stringsAsFactors = FALSE
when calling
sqlQuery
if not specified when calling this function.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 | if(interactive()){
# initialize connection to Lahman baseball database in Aster
conn = odbcDriverConnect(connection="driver={Aster ODBC Driver};
server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>")
batters = computeSample(conn, "batting", sampleFraction=0.01)
dim(batters)
pitchersAL = computeSample(conn, "pitching", sampleSize=1000, where="lgid = 'AL'")
dim(ptichersAL)
battersByDecadesSingleSize = computeSample(conn, tableName = 'batting_enh',
sampleSize=1000,
conditionColumn = 'decadeid',
conditionValues = c(1990,2000,2010))
dim(battersByDecadesSingleSize)
battersByDecades = computeSample(conn, tableName = 'batting_enh',
sampleFraction = c(0.01,0.01,0.02),
conditionColumn = 'decadeid', conditionValues = c(1990,2000,2010))
dim(battersByDecades)
battersByOddEvenYears = computeSample(channel=NULL, tableName = 'batting_enh',
sampleFraction = c(0.01,0.02),
include = c('decadeid','g','ab','r','h'),
conditionStratum = "yearid % 2",
conditionValues = c('0','1'),
where = "lgid = 'NL'")
dim(battersByOddEvenYears)
battersBeforeAfter1960 = computeSample(channel=NULL, tableName = 'batting_enh',
sampleSize = c(200, 200),
conditionStratum = "CASE WHEN yearid <- 1960 THEN 'before'
ELSE 'after'
END",
conditionValues = c('before','after'))
dim(battersBeforeAfter1960)
}
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.