
Adds to a data frame of survey data the replicate weights calculated according to the *"Delete-A-Group Jackknife"* (DAGJK) method.


| Argument | Description |
| --- | --- |
| `data` | Data frame of survey data. |
| `ids` | Formula identifying clusters selected at subsequent sampling stages (PSUs, SSUs, ...). |
| `strata` | Formula identifying the stratification variable; |
| `weights` | Formula identifying the initial weights for the sampling units. |
| `nrg` | Number of "random groups" (and replicate weights) you want to create. |
| `self.rep.str` | Formula identifying self-representing strata (SR), if any; |
| `check.data` | Boolean ( |
| `aux` | If |

This function creates an object of class `kott.design`. A `kott.design` object combines the replicated survey data with the metadata describing the sampling design. The metadata (stored as attributes of the object) enable and guide the processing and analyses provided by other functions in the EVER package (such as `kottcalibrate`, `kottby`, `desc`, ...).

The `data`, `ids`, `weights` and `nrg` arguments are mandatory, while the `strata`, `check.data` and `aux` arguments are optional. The `data` variables that are referenced by `ids`, `weights` and `strata` (if specified) must not contain any missing value (`NA`).
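
The missing-value requirement can be verified before calling `kottdesign`. The following is a minimal base-R sketch, not code from the EVER package: the toy data frame and its column names (`id.PSU`, `id.SSU`, `stratum`, `weight`) are assumptions made for illustration only.

```r
# Toy survey data; all column names here are illustrative, not from EVER.
survey <- data.frame(
  id.PSU  = c(1, 1, 2, 2, 3),
  id.SSU  = c(11, 12, 21, 22, 31),
  stratum = factor(c("A", "A", "A", "B", "B")),
  weight  = c(10, 10, 12, NA, 8)
)

# Collect the variable names referenced by the design formulas.
design.vars <- unique(c(all.vars(~id.PSU + id.SSU),
                        all.vars(~stratum),
                        all.vars(~weight)))

# Count missing values per design variable.
na.count <- colSums(is.na(survey[, design.vars, drop = FALSE]))

# Names of the design variables that would violate the requirement.
has.na <- names(na.count)[na.count > 0]
has.na
```

Here `has.na` flags the `weight` column, so the offending rows would have to be fixed or dropped before building the design object.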

The `ids` argument specifies the cluster identifiers. A multi-stage sampling design can be specified by a formula listing the identifiers of the clusters selected at subsequent sampling stages. For example, `ids=~id.PSU+id.SSU` declares a two-stage sampling design in which the first-stage units are identified by the `id.PSU` variable and the second-stage units by the `id.SSU` variable.

The `strata` argument identifies the stratification variable. The `data` variable referenced by `strata` (if specified) must be a `factor`. By default the sample is assumed to be non-stratified.

The `weights` argument identifies the initial (or direct) weights for the units included in the sample. The `data` variable referenced by `weights` must be `numeric`.

The `nrg` argument selects the number of "random groups" (and replicate weights) you want to create by means of the DAGJK method [Kott 98-99-01]. The value of `nrg` must be greater than 1 and less than or equal to the number of sampled PSUs (otherwise the function stops and prints an error message). If `nrg` equals the number of sampled PSUs, the DAGJK method "reduces" to (that is, it provides identical results to) the traditional stratified jackknife method. The advantage of the DAGJK method over the traditional jackknife is that, unlike the latter, it remains computationally manageable even when dealing with "complex and big" surveys (tens of thousands of PSUs arranged in a large number of strata with widely varying sizes). In fact, the DAGJK method is known to provide, for a broad range of sampling designs and estimators, (near) unbiased standard error estimates even with a "small" number (e.g. a few tens) of replicate weights.
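
To make the idea concrete, here is a simplified base-R sketch of a delete-a-group jackknife for a one-stage, unstratified sample. It is not the algorithm implemented by `kottdesign` (which handles strata and the extended method); the helper names `dagjk.weights` and `dagjk.var.mean` and the toy data are hypothetical.

```r
set.seed(1)  # fixed only so that the sketch is repeatable

# Toy one-stage, unstratified sample: one row per PSU (hypothetical data).
n   <- 12
psu <- data.frame(y = rnorm(n, mean = 50, sd = 10), w = rep(20, n))

# Build replicate weights for a simplified, unstratified DAGJK.
dagjk.weights <- function(w, nrg) {
  n <- length(w)
  stopifnot(nrg > 1, nrg <= n)
  # Randomly allocate the PSUs to nrg groups of (nearly) equal size.
  group <- sample(rep_len(seq_len(nrg), n))
  # Replicate r: zero weight for group r, survivors scaled by n / (n - n.r).
  sapply(seq_len(nrg), function(r) {
    n.r <- sum(group == r)
    ifelse(group == r, 0, w * n / (n - n.r))
  })
}

# Variance of a weighted mean computed from the replicate weights.
dagjk.var.mean <- function(y, w, rep.w) {
  nrg   <- ncol(rep.w)
  est   <- sum(w * y) / sum(w)
  est.r <- colSums(rep.w * y) / colSums(rep.w)
  (nrg - 1) / nrg * sum((est.r - est)^2)
}

# A few random groups versus one group per PSU.
v.few <- dagjk.var.mean(psu$y, psu$w, dagjk.weights(psu$w, nrg = 4))
v.all <- dagjk.var.mean(psu$y, psu$w, dagjk.weights(psu$w, nrg = n))
```

With `nrg = n` every group holds a single PSU, so the sketch coincides with the ordinary delete-one jackknife; for equal weights `v.all` then equals `var(psu$y) / n`. With `nrg = 4` only four replicate estimates are needed, which is where the computational saving comes from.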

When dealing with a multistage, stratified sampling design that includes *self-representing (SR) strata* (i.e. strata containing PSUs selected with probability 1), the main contribution to the variance of the SR strata arises from the second stage units (*"variance PSUs"*). In this instance, the user can exploit the `self.rep.str` argument to specify, by a formula, the `data` variable identifying the SR strata: as a result the function will build the variance PSUs and take care of them. When choosing this option, the user must ensure that the variable referenced by `self.rep.str` is `logical` (with value `TRUE` for SR strata and `FALSE` otherwise) or `numeric` (with value `1` for SR strata and `0` otherwise).

As an alternative, users can build the appropriate identifiers for the sampling units in `ids` themselves. To be precise, the identifier for the PSUs (say `id.PSU`) must have, in the SR strata, values in 1:1 correspondence with those of the SSU identifier (say `id.SSU`); for example, they can be equal, provided this does not cause undesired duplications.
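
One way such identifiers might be built by hand is sketched below in base R. The data frame, the `sr` flag and the new `id.varPSU` column are illustrative assumptions; the prefixes are just one device to avoid the undesired duplications mentioned above.

```r
# Toy two-stage sample (hypothetical names and values).
survey <- data.frame(
  id.PSU = c(1, 1, 2, 2, 3, 3),
  id.SSU = c(101, 102, 201, 202, 301, 302),
  sr     = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)
)

# In SR strata promote each SSU to a "variance PSU"; elsewhere keep the PSU.
# The prefixes keep the two identifier spaces from colliding.
survey$id.varPSU <- ifelse(survey$sr,
                           paste0("SSU.", survey$id.SSU),
                           paste0("PSU.", survey$id.PSU))
survey
```

The new column could then take the place of the PSU identifier in the `ids` formula (for example `ids=~id.varPSU+id.SSU`), without using `self.rep.str`.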

The optional argument `check.data` allows checking the correct nesting of the `data` clusters (PSUs, SSUs, ...). If `check.data=TRUE` the function checks that every unit selected at stage `k+1` is associated with one and only one unit selected at stage `k`. For a stratified design the function also checks the correct nesting of clusters within strata.
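
The nesting condition can be illustrated with a small base-R helper. This is only a sketch of the kind of check described, not the code used by `kottdesign`; the function name `is.nested` and the toy data are hypothetical.

```r
# TRUE if every inner unit maps to exactly one outer unit.
is.nested <- function(inner, outer) {
  all(tapply(outer, inner, function(x) length(unique(x))) == 1)
}

# Correctly nested: each SSU belongs to a single PSU.
ok  <- data.frame(id.PSU = c(1, 1, 2, 2), id.SSU = c(11, 12, 21, 22))
# Badly nested: SSU 11 appears under both PSU 1 and PSU 2.
bad <- data.frame(id.PSU = c(1, 2, 2, 2), id.SSU = c(11, 11, 21, 22))

nested.ok  <- is.nested(ok$id.SSU, ok$id.PSU)
nested.bad <- is.nested(bad$id.SSU, bad$id.PSU)
```

The same helper would apply to clusters within strata by passing the PSU identifier as `inner` and the stratification variable as `outer`.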

The optional argument `aux` can usually be ignored: its default value selects the standard behaviour of the function. Invoking `kottdesign` with `aux=TRUE` can, on the other hand, prove useful for any user who wants to fully understand how the DAGJK method builds the replicate weights. If `aux=TRUE`, the output data frame contains auxiliary columns that provide: the number of PSUs per stratum, the number of PSUs per stratum and random group, and the multiplicative coefficients that transform the initial weights into replicate weights.

An object of class `kott.design`. The data frame it contains includes (in addition to the original survey data):

- A new column named

- The replicate weights columns (one per random group,

The `kott.design` class is a specialisation of the `data.frame` class; this means that an object created by `kottdesign` inherits from the `data.frame` class, so every method defined for that class can be used on it.
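
The effect of this inheritance can be seen on a toy stand-in, built here by hand in base R. The object below only mimics the class structure described above and is an assumption for illustration: a real `kott.design` object is created by `kottdesign` and also carries the design metadata as attributes.

```r
# Toy stand-in for a kott.design object: a data frame with an extra class
# on top (the real object also stores design metadata as attributes).
toy <- structure(
  data.frame(y = c(3, 5, 8), weight = c(10, 10, 20)),
  class = c("kott.design", "data.frame")
)

# Data frame tools dispatch on the inherited class.
is.df  <- inherits(toy, "data.frame")
n.rows <- nrow(toy)
heavy  <- subset(toy, weight > 10)
```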

The EVER package implements the extended version of the DAGJK method [Kott 99-01]. It guarantees unbiased estimates of standard errors even when the number of PSUs sampled in some strata is small (that is, less than `nrg`).

The rigorous results of [Kott 98-99-01] were derived under the hypothesis of with-replacement selection of PSUs. This means that the DAGJK method cannot include finite population corrections (*fpc*): this restriction is fully reflected in the EVER package.

If only one PSU (*lonely PSU*) has been selected in some non-self-representing (NSR) strata, the `kottdesign` function issues a warning rather than an error. In fact, the extended DAGJK method automatically removes the contribution of strata containing lonely PSUs from the estimation of standard errors (obviously, their contribution remains when calculating the estimates). This is all users have to remember if they come across the warning message produced by `kottdesign`. Whenever this behaviour is undesirable, a viable alternative for eliminating the lonely PSUs is to collapse strata in a suitable manner. In that case, the price to pay is a possible over-estimation of the standard errors. The current version of the EVER package does not provide any support for choosing a strata collapsing strategy.
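
Strata containing a lonely PSU can be located in advance with a few lines of base R. The sketch below uses a hypothetical data frame and column names; it only counts distinct PSUs per stratum and does not reproduce any EVER internals.

```r
# Toy stratified sample (hypothetical names and values).
survey <- data.frame(
  stratum = factor(c("A", "A", "A", "B", "B", "C")),
  id.PSU  = c(1, 1, 2, 3, 3, 4)
)

# Number of distinct PSUs sampled in each stratum.
psu.per.stratum <- tapply(survey$id.PSU, survey$stratum,
                          function(x) length(unique(x)))

# Strata with a single PSU are the "lonely PSU" strata.
lonely <- names(psu.per.stratum)[psu.per.stratum == 1]
lonely
```

Strata listed in `lonely` are the candidates for collapsing into a coarser stratification variable, should the default treatment be unwanted.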

Unlike the conventional jackknife method, the DAGJK is a stochastic replication method. For a fixed sampling design and number of replicates, applying it repeatedly to the same sample data frame generally yields a different composition of the random groups each time. This means that repeated invocations of the `kottdesign` function, even with identical arguments, generate different `kott.design` objects (and, consequently, different standard error estimates). This obviously does not apply when `nrg` equals the number of sampled PSUs. If needed, exactly the same results can be reproduced across subsequent calls to `kottdesign` by fixing the seed of **R**'s random number generator (using the `set.seed` function).
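
The role of the seed can be shown without the package at all. In the base-R sketch below, `assign.groups` is a hypothetical stand-in for the random allocation of PSUs to groups; it is not the routine used inside `kottdesign`.

```r
# Stand-in for the random allocation of PSUs to groups (not EVER code).
assign.groups <- function(n.psu, nrg) sample(rep_len(seq_len(nrg), n.psu))

# Same seed before each call: identical allocations.
set.seed(2024)
g1 <- assign.groups(30, 5)
set.seed(2024)
g2 <- assign.groups(30, 5)

# No reseeding: the allocation will, in general, differ from g1.
g3 <- assign.groups(30, 5)

same.with.seed <- identical(g1, g2)
```

In the same way, calling `set.seed` with a fixed value immediately before each `kottdesign` call should make the resulting replicate weights repeatable.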

Diego Zardetto.

Kott, Phillip S. (1998) *"Using the Delete-A-Group Jackknife Variance Estimator in NASS Surveys"*, RD Research Report No. RD-98-01, USDA, NASS: Washington, DC.

Kott, Phillip S. (1999) *"The Extended Delete-A-Group Jackknife"*. Bulletin of the International Statistical Institute. 52nd Session. Contributed Papers. Book 2, pp. 167-168.

Kott, Phillip S. (2001) *"The Delete-A-Group Jackknife"*. Journal of Official Statistics, Vol.17, No.4, pp. 521-526.

`desc` for a concise description of `kott.design` objects, `kottby`, `kott.ratio`, `kott.regcoef`, `kott.quantile` and `kottby.user` for calculating estimates and standard errors, `kottcalibrate` for calibrating replicate weights.

```
library(EVER)  # provides kottdesign, desc and the example data

# Creation of kott.design objects starting with survey data sampled
# with different sampling designs (actually the survey data frame is
# always the same: the examples serve the purpose of illustrating
# the syntax).
data(data.examples)

# Two-stage stratified cluster sampling design (notice the presence of
# lonely PSUs):
kdes <- kottdesign(data = example, ids = ~towcod + famcod, strata = ~stratum,
                   weights = ~weight, nrg = 15)
desc(kdes)

# The same using collapsed strata (SUPERSTRATUM variable) to remove
# lonely PSUs:
kdes <- kottdesign(data = example, ids = ~towcod + famcod,
                   strata = ~SUPERSTRATUM, weights = ~weight, nrg = 15)
desc(kdes)

# Same design, but using the self.rep.str argument to identify
# the SR strata (actually towcod identifies the
# "variance PSUs" by construction):
kdes <- kottdesign(data = example, ids = ~towcod + famcod,
                   strata = ~SUPERSTRATUM, weights = ~weight, nrg = 15,
                   self.rep.str = ~sr)
desc(kdes)

# Two-stage cluster sampling (no stratification):
kdes <- kottdesign(data = example, ids = ~towcod + famcod,
                   weights = ~weight, nrg = 15)
desc(kdes)

# One-stage stratified cluster sampling:
kdes <- kottdesign(data = example, ids = ~towcod, strata = ~SUPERSTRATUM,
                   weights = ~weight, nrg = 15)
desc(kdes)

# Stratified independent sampling design:
kdes <- kottdesign(data = example, ids = ~key, strata = ~SUPERSTRATUM,
                   weights = ~weight, nrg = 15)
desc(kdes)
```
