Calculates estimates, standard errors and confidence intervals for user-defined estimators (even non-analytic) in subpopulations.
deskott 
Object of class 
by 
Formula specifying the variables that define the "estimation domains". If 
user.estimator 
R function to compute the value of the desired estimator on the original survey sample (see also 'Details' and 'Defining a user estimator function'). 
na.replace 
Value to be used to replace any 
vartype 

conf.int 
Boolean ( 
conf.lev 
Probability specifying the desired confidence level: the default value is 
df 
Degrees of freedom for the t distribution used to build confidence intervals (see 'Details'). 
... 
Additional parameters (if any) to be passed to the 
The kottby.user function is designed to fully exploit the versatility of the DAGJK [Kott 99-01] replication method. It is intended to provide a user-friendly tool for calculating estimates, standard errors and confidence intervals for estimators defined by the users themselves. Weighted estimates for the "user-defined estimator" are computed using weights that depend on the class of deskott: calibrated weights for class kott.cal.design and direct weights otherwise.
The optional argument by specifies the variables that define the "estimation domains", that is the subpopulations for which the estimates are to be calculated. If by=NULL (the default option), the estimates produced by kottby.user refer to the whole population. Estimation domains must be defined by a formula: for example the statement by=~B1:B2 selects as estimation domains the subpopulations determined by crossing the modalities of variables B1 and B2. The deskott variables referenced by by (if any) must be of class factor and must not contain any missing value (NA).
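As a rough illustration of what "crossing the modalities" means, the following base R sketch lists the domains that a formula such as by=~B1:B2 would identify and checks the two requirements stated above. It does not call kottby.user; the toy data frame and the object names are invented for the example.

```r
# Toy data (hypothetical): two classification variables stored as factors.
toy <- data.frame(
  B1 = factor(c("a", "a", "b", "b", "b")),
  B2 = factor(c("x", "y", "x", "x", "y"))
)

# Names of the variables referenced by the 'by' formula.
by.vars <- all.vars(~B1:B2)

# Requirements stated above: 'by' variables must be factors without NAs.
stopifnot(all(sapply(toy[by.vars], is.factor)),
          !any(is.na(toy[by.vars])))

# Domains obtained by crossing the levels of B1 and B2.
domains <- interaction(toy$B1, toy$B2, sep = ":", drop = TRUE)
table(domains)
```

Each non-empty cell of the resulting table corresponds to one subpopulation for which an estimate would be requested.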
The mandatory argument user.estimator is used to specify the calculation method for the "user-defined estimator". In more precise terms: the value bound to the formal argument user.estimator must be a function (an R object of class function, even anonymous) able to compute the value of the required estimator on the sample data frame contained in deskott. It is not necessary for the user.estimator function's return value to be a single numerical value (it can be a vector, a matrix, an array, ...). In any case, it will be tacitly coerced to array by kottby.user. More detailed indications on how the user.estimator function must be constructed can be found in the 'Defining a user estimator function' section below.
The optional argument na.replace makes it possible to specify a value to be used to replace any missing values generated by user.estimator in the kottby.user function output. By default na.replace=NULL and the missing values are returned as NAs.
The conf.int argument lets the user request confidence intervals for the estimates. By default conf.int=FALSE, that is, the confidence intervals are not provided.
Whenever confidence intervals are requested (i.e. conf.int=TRUE), the desired confidence level can be specified by means of the conf.lev argument. The conf.lev value must represent a probability (0<=conf.lev<=1) and its default is chosen to be 0.95.
Given an input kott.design object with nrg random groups, by default kottby.user builds the confidence intervals making use of a t distribution with nrg-1 degrees of freedom. Indeed the argument df has a default value of nrg-1. Notice, however, that this default value should be used only when the user-defined function user.estimator estimates a univariate parameter of interest. As an example, if user.estimator were designed to estimate regression coefficients for a multiple linear regression with p predictors and no intercept, the right choice would be df = nrg-p.
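The role of df can be sketched in base R as follows. The sketch assumes the usual symmetric interval, estimate plus or minus a t quantile times the standard error; the page does not spell out the exact formula used internally, and the estimate and standard error below are invented numbers.

```r
# Assumed inputs (illustrative only).
nrg      <- 15      # number of random groups
conf.lev <- 0.95    # desired confidence level
est      <- 42.0    # invented point estimate
se       <- 3.5     # invented standard error

# Default degrees of freedom for a univariate parameter.
df <- nrg - 1

# Critical value of the t distribution for a two-sided interval.
t.crit <- qt(1 - (1 - conf.lev) / 2, df = df)

# Assumed form of the interval: estimate +/- t * standard error.
ci <- c(lower = est - t.crit * se, upper = est + t.crit * se)
ci

# For regression coefficients with p predictors and no intercept,
# the text above suggests df = nrg - p instead.
p <- 3
t.crit.reg <- qt(1 - (1 - conf.lev) / 2, df = nrg - p)
```

With fewer degrees of freedom the critical value grows, so the interval for the regression case is wider for the same standard error.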
The special argument ... (dot-dot-dot) allows additional parameters to be passed to the user-defined user.estimator function.
The return value depends on the value of the input parameters. In the most general case, the function returns an object of class list (typically a list made up of data frames).
In order to be correctly invoked by kottby.user, the function that codifies the "user-defined estimator" must comply with specific syntactical restrictions. On the other hand, there is no constraint (at least in principle) on the semantics of the function, that is on "what it calculates".
The fundamental constraint is that the function's formal arguments list meets some minimal requirements. Suppose, for simplicity, that the function bound to the user.estimator formal argument is named user.estfun; then its structure must necessarily be of the following type:
user.estfun = function(data, weights, etc){body}    [1]
The structure [1] has to be interpreted as follows: the body of user.estfun must contain all the instructions needed to compute the required estimator on the sample data contained in the data data frame, using the weights contained in its weights column. The "etc" symbol represents in [1] any other formal arguments of user.estfun whose actual values can be specified, when invoking kottby.user, using its special argument ... (dot-dot-dot).
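A minimal sketch of a function complying with [1] is given below, written in base R and run on an invented toy data frame. The names wmean and invoke are hypothetical: invoke is only a stand-in showing how extra arguments can be forwarded through ..., not the actual code of kottby.user. As in the package examples, the weights argument is assumed to identify the weights column of the data frame.

```r
# Hypothetical estimator following structure [1]:
# 'data' is a data frame, 'weights' identifies its weights column,
# 'y' plays the role of an "etc" argument.
wmean <- function(data, weights, y) {
  w <- data[, weights]
  sum(w * data[, y]) / sum(w)
}

# Toy sample (invented values).
toy <- data.frame(income = c(10, 20, 40), wgt = c(1, 2, 1))

# Direct call: (10*1 + 20*2 + 40*1) / 4
direct <- wmean(toy, "wgt", y = "income")

# Stand-in caller (NOT kottby.user): forwards any extra argument
# to the estimator through the '...' mechanism.
invoke <- function(data, weights, user.estimator, ...) {
  user.estimator(data, weights, ...)
}
forwarded <- invoke(toy, "wgt", wmean, y = "income")
```

Both calls return the same weighted mean, since the extra argument y reaches wmean unchanged in either case.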
Sometimes users may need to employ "global" quantities in the body of the user.estfun function, that is, quantities that, even when dealing with subpopulation estimates, should not be recalculated for the subpopulations themselves (the latter being the standard kottby.user behaviour). This need is met by the global function: the user has only to reference, wherever the need arises, the user.estfun input data frame by means of the expression global(data) rather than the standard one, data.
The global function only accepts kott.design class objects and can only be used within functions invoked by user.estfun. An example that clearly illustrates the utility of global is provided by the calculation of poverty estimates (see the poverty function documented in the 'Examples' section below).
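The difference between a quantity computed "locally" and one computed "globally" can be sketched in base R without the package. The code below does not use the global function: it emulates the idea by passing the full data frame explicitly through a hypothetical ref argument. The toy data and all names are invented.

```r
# Toy sample (invented): four units with unit weights in two regions.
toy <- data.frame(
  income = c(10, 20, 30, 100),
  wgt    = c(1, 1, 1, 1),
  region = factor(c("A", "A", "B", "B"))
)

# Percentage of units below 0.6 times the mean income.
# 'ref' (hypothetical argument) is the data used for the threshold:
# by default the domain itself, i.e. the threshold is recomputed locally.
pov.rate <- function(d, w, y, ref = d) {
  th <- 0.6 * sum(ref[, w] * ref[, y]) / sum(ref[, w])
  100 * sum(d[d[, y] < th, w]) / sum(d[, w])
}

# One data frame per domain.
parts <- split(toy, toy$region)

# Threshold recomputed inside each region.
local.est <- sapply(parts, pov.rate, w = "wgt", y = "income")

# Threshold computed once on the whole sample.
global.est <- sapply(parts, pov.rate, w = "wgt", y = "income", ref = toy)

rbind(local = local.est, global = global.est)
```

With the whole-sample threshold (24) both units in region A are poor and none in region B; with region-specific thresholds (9 and 39) the picture is reversed, which is the kind of distortion a global quantity is meant to avoid.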
The freedom granted to the user in developing the user.estimator function has important consequences that are worth highlighting. The key point is that, since only the user knows the semantics of user.estimator, the user must vouch for its correct functioning. In particular:
(i) The kottby.user function must be able to invoke the user.estimator function on the deskott sample data frame and, if necessary, on its subsets defined by the by variables. Consequently, when developing the function, the user must make sure that the instructions in its body refer to variables that are actually contained in that data frame. The kottby.user caller function could perform this check only at the expense of limiting the user's freedom in constructing user.estimator;
(ii) In the same way, due to the user's freedom in developing user.estimator, the kottby.user function cannot prevent the generation of missing values in its output. The usefulness of the na.replace parameter must, therefore, be considered as purely "cosmetic".
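Both points can be illustrated with a small base R sketch. The function safe.ratio is a hypothetical estimator that checks, inside its own body, that the variables it needs exist; the replacement step at the end only mimics the "cosmetic" effect described for na.replace and is not the package's code.

```r
# Hypothetical ratio estimator with a defensive check on its inputs.
safe.ratio <- function(data, weights, num, den) {
  needed <- c(weights, num, den)
  absent <- setdiff(needed, names(data))
  if (length(absent) > 0) {
    stop("variables not found in data: ", paste(absent, collapse = ", "))
  }
  sum(data[, weights] * data[, num]) / sum(data[, weights] * data[, den])
}

# Toy domain (invented) where numerator and denominator totals are zero.
toy <- data.frame(y1 = c(0, 0), x1 = c(0, 0), wgt = c(1, 1))

# 0/0 yields NaN, which R treats as a missing value.
est <- safe.ratio(toy, "wgt", num = "y1", den = "x1")

# Mimic of the replacement step: only the displayed value changes,
# the underlying problem (an undefined ratio) remains.
na.replace <- 0
shown <- est
if (!is.null(na.replace)) shown[is.na(shown)] <- na.replace
```

Calling safe.ratio with a variable name that is not in the data frame stops with an explicit error message instead of failing obscurely.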
Diego Zardetto
Kott, Phillip S. (1999) "The Extended Delete-A-Group Jackknife". Bulletin of the International Statistical Institute. 52nd Session. Contributed Papers. Book 2, pp. 167-168.
Kott, Phillip S. (2001) "The Delete-A-Group Jackknife". Journal of Official Statistics, Vol.17, No.4, pp. 521-526.
kottby for estimating totals and means, kott.ratio for estimating ratios between totals, kott.quantile for estimating quantiles and kott.regcoef for estimating regression coefficients.
# Some examples of user-defined estimators and illustration
# of their use via kottby.user. Remember that R functions
# expressing user-defined estimators must comply with the
# condition indicated in [1]. The 3 functions that appear
# in the following examples ('ones', 'ratio' and 'poverty')
# are contained in the data.examples file.
# The 'poverty' function (also) illustrates the correct use
# of the 'global' function.
data(data.examples)
# Creation of a kott.design object:
kdes <- kottdesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
                   weights=~weight,nrg=15)
# 1) Estimator of the number of final units in the population.
# Use the name 'ones' to refer to the R function that
# expresses the estimator and define it as follows:
# ones <- function (d, w)
# ######################################
# # Number of final units estimator. #
# ######################################
# {
# sum(d[, w])
# }
# Now using kottby.user is easy, for instance:
kottby.user(kdes,user.estimator=ones)
# 2) Estimator of ratios between totals (or means) for 2
# quantitative variables. Use the name 'ratio' to refer
# to the R function that expresses the estimator and
# define it as follows (notice the use of the etc
# arguments in [1]):
# ratio <- function (d, w, num, den)
# ###########################################
# # Ratio estimator for totals (or means) #
# # of quantitative variables. #
# ###########################################
# {
# sum(d[, w] * d[, num])/sum(d[, w] * d[, den])
# }
# Calculating ratio estimates and standard errors
# is easy (notice the use of the '...' argument
# of kottby.user):
kottby.user(kdes,user.estimator=ratio,num="y1",den="x1")
# 3) A non-analytic estimator: population percentage
# with income below the poverty threshold (defined,
# for the sake of simplicity, as 0.6 times the
# average income for the whole population).
# Call 'poverty' the estimator and define it as follows:
# poverty <- function (d, w, y, threshold)
# ####################################################################
# # Population percentage with income below the poverty threshold. #
# # Suppose poverty threshold is defined as 0.6 times the average #
# # income for the whole population. #
# ####################################################################
# {
# if (missing(threshold)) {
# # if I do want to take into account the variance of the poverty
# # threshold letting it be recalculated replicate by replicate.
# d.global = global(d)
# th.value = 0.6 * sum(d.global[, w] * d.global[, y])/sum(d.global[, w])
# }
# else {
# # if I do not want to take into account the variance of the poverty
# # threshold, I will supply its point estimate to the 'threshold' argument.
# th.value = threshold
# }
# est = 100 * sum(d[d[, y] < th.value, w])/sum(d[, w])
# est
# }
# 3.1) First use: neglect the variance of the poverty threshold
# and supply to 'threshold' (by means of the '...' argument
# of kottby.user) its point estimate obtained using kottby:
pov.line <- 0.6*kottby(kdes,~income,estimator="mean")$mean
kottby.user(kdes,user.estimator=poverty,y="income",threshold=pov.line)
# 3.2) Second use: do take into account the variance of the poverty
# threshold letting it be recalculated replicate by replicate
# (thus not supplying any actual value to 'threshold'):
kottby.user(kdes,user.estimator=poverty,y="income")
# Notice that the standard error estimate for the 'poverty' estimator
# obtained in 3.2) cannot be calculated analytically by Taylor
# linearization.
# Notice the use of the 'global' function in the body of 'poverty':
# since the poverty status of each final unit depends on a global
# value (that is, the average income for the whole population)
# 'global' is used to prevent, whenever a subpopulation poverty
# estimate is needed, this global value being calculated locally
# i.e. within the subpopulation itself.
# In fact:
pov.line <- 0.6*kottby(kdes,~income,estimator="mean")$mean
kdes2 <- kott.addvars(kdes,pov.status=as.factor(ifelse(income<pov.line,
                      "poor","notpoor")))
kottby.user(kdes2,by=~pov.status,user.estimator=poverty,y="income")
# If the 'global' function were not used in 'poverty'
# the poverty threshold would be calculated relative to
# each individual subpopulation:
poverty2 <- function (d, w, y, threshold)
###############################################
# Without relying on the 'global' function   #
###############################################
{
  if (missing(threshold)) {
    # threshold recomputed on the data actually received,
    # i.e. on each subpopulation when 'by' is used
    th.value = 0.6 * sum(d[, w] * d[, y])/sum(d[, w])
  }
  else {
    th.value = threshold
  }
  est = 100 * sum(d[d[, y] < th.value, w])/sum(d[, w])
  est
}
kottby.user(kdes2,by=~pov.status,user.estimator=poverty2,y="income")
# This means that without 'global' a non-null fraction of poor
# would be paradoxically estimated for the "non-poor" subpopulation
# (and, conversely, a non-null fraction of non-poor among the "poor").
