svystatQ | R Documentation |
Calculates estimates, standard errors and confidence intervals for Quantiles of numeric variables in subpopulations.
svystatQ(design, y, probs = c(0.25, 0.5, 0.75), by = NULL,
         vartype = c("se", "cv", "cvpct", "var"),
         conf.lev = 0.95, na.rm = FALSE,
         ties = c("discrete", "rounded"))
## S3 method for class 'svystatQ'
coef(object, ...)
## S3 method for class 'svystatQ'
SE(object, ...)
## S3 method for class 'svystatQ'
VAR(object, ...)
## S3 method for class 'svystatQ'
cv(object, ...)
## S3 method for class 'svystatQ'
confint(object, ...)
design |
Object of class |
y |
Formula defining the interest variable. |
probs |
Vector of probability values to be used to calculate the quantiles estimates. The default value selects estimates of quartiles. |
by |
Formula specifying the variables that define the "estimation domains". If NULL (the default option), estimates refer to the whole population. |
vartype |
|
conf.lev |
Probability specifying the desired confidence level: the default value is 0.95. |
na.rm |
Should missing values (if any) be removed from the variable of interest? The default is FALSE. |
ties |
How should duplicated observed values be treated? Select |
object |
An object of class |
... |
Additional arguments to |
This function calculates weighted estimates for the quantiles of a quantitative variable using suitable weights depending on the class of design: calibrated weights for class cal.analytic and direct weights otherwise.
Standard errors are calculated using the so-called "Woodruff method" [Woodruff 52][Sarndal, Swensson, Wretman 92]: (i) first a confidence interval (at a given confidence level 1-a) is constructed for the relative frequency of units with values below the estimated quantile, (ii) then the inverse of the estimated cumulative relative frequency distribution (ECDF) is used to map this interval to a confidence interval for the quantile, (iii) lastly the desired standard error is estimated by dividing the length of the obtained confidence interval by the value 2*qnorm(1-a/2). Notice that the procedure above builds, in general, asymmetric confidence intervals around the estimated quantiles.
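The three steps above can be sketched in base R. This is only an illustration under simplifying assumptions, not the code used by svystatQ: the helper name woodruff_quantile is hypothetical, the sample is treated as a single-stage with-replacement design, and the standard error of the estimated proportion comes from a plain linearization.

```r
# Hypothetical helper: Woodruff-style SE for a weighted quantile.
# Assumes a single-stage, with-replacement design (a simplification).
woodruff_quantile <- function(x, w, p, conf.lev = 0.95) {
  a <- 1 - conf.lev
  z <- qnorm(1 - a / 2)

  # Weighted ECDF evaluated at the distinct observed values
  ux <- sort(unique(x))
  Fx <- cumsum(as.numeric(rowsum(w, x))) / sum(w)

  # Step-function inverse of the ECDF (probabilities clamped to [0, 1])
  inv <- function(prob) {
    prob <- min(max(prob, 0), 1)
    ux[which(Fx >= prob - 1e-12)[1]]
  }
  q <- inv(p)

  # (i) confidence interval for the share of units at or below q
  ind  <- as.numeric(x <= q)
  phat <- sum(w * ind) / sum(w)
  n    <- length(x)
  u    <- w * (ind - phat) / sum(w)        # linearized variable, sums to 0
  se_p <- sqrt(n / (n - 1) * sum(u^2))
  p_ci <- c(p - z * se_p, p + z * se_p)

  # (ii) map the interval through the inverse ECDF
  q_ci <- c(inv(p_ci[1]), inv(p_ci[2]))

  # (iii) SE = interval length divided by 2 * qnorm(1 - a/2)
  list(quantile = q, ci = q_ci, se = diff(q_ci) / (2 * z))
}

set.seed(1)
x <- rexp(500, rate = 1 / 1000)   # simulated skewed variable
w <- runif(500, 1, 5)             # simulated sampling weights
res <- woodruff_quantile(x, w, p = 0.5)
```

Because the interval is obtained by inverting the ECDF, res$ci need not be symmetric around res$quantile, which matches the remark above.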
The mandatory argument y identifies the variable of interest, that is the variable for which estimates of quantiles have to be calculated. The design variable referenced by y must be numeric.
The optional argument probs specifies the probability values (0.001<=probs[i]<=0.999) corresponding to the quantiles one wants to estimate; the default option selects quartiles.
The optional argument by specifies the variables that define the "estimation domains", that is the subpopulations for which the estimates are to be calculated. If by=NULL (the default option), the estimates produced by svystatQ refer to the whole population. Estimation domains must be defined by a formula: for example the statement by=~B1:B2 selects as estimation domains the subpopulations determined by crossing the modalities of variables B1 and B2. Notice that a formula like by=~B1+B2 will be automatically translated into the factor-crossing formula by=~B1:B2: if you need to compute estimates for domains B1 and B2 separately, you have to call svystatQ twice. The design variables referenced by by (if any) should be of type factor, otherwise they will be coerced.
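The difference between crossed and separate domains can be illustrated in base R. This is a sketch on made-up data, not a call to svystatQ: the data frame d and the helpers wquantile and by_domain are hypothetical, and only point estimates (weighted medians) are computed.

```r
# Hypothetical helper: weighted quantile via the step-function inverse ECDF
wquantile <- function(x, w, p) {
  o  <- order(x)
  cw <- cumsum(w[o]) / sum(w)
  x[o][which(cw >= p - 1e-12)[1]]
}

# Made-up data: two factors, an interest variable and weights
d <- data.frame(
  B1 = factor(c("a", "a", "a", "b", "b", "b", "a", "b")),
  B2 = factor(c("u", "v", "u", "v", "u", "v", "v", "u")),
  y  = c(10, 20, 30, 40, 50, 60, 70, 80),
  w  = c(1, 2, 1, 2, 1, 2, 1, 2)
)

# Weighted median within each level of a domain variable
by_domain <- function(d, dom) {
  sapply(split(d, dom, drop = TRUE), function(g) wquantile(g$y, g$w, 0.5))
}

# Analogue of by=~B1:B2: one estimate per combination of B1 and B2
crossed <- by_domain(d, interaction(d$B1, d$B2, sep = ":"))

# Analogue of two separate calls, one with by=~B1 and one with by=~B2
only_B1 <- by_domain(d, d$B1)
only_B2 <- by_domain(d, d$B2)
```

Here crossed has four entries (one per B1 by B2 cell), while only_B1 and only_B2 have two entries each, which is why separate domains require separate calls.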
The conf.int argument lets you request confidence intervals for the estimates. By default conf.int=FALSE, that is, confidence intervals are not provided.
Whenever confidence intervals are requested (i.e. conf.int=TRUE), the desired confidence level can be specified by means of the conf.lev argument. The conf.lev value must represent a probability (0<=conf.lev<=1) and its default is chosen to be 0.95.
Missing values (NA) in interest variables should be avoided. If na.rm=FALSE (the default) they generate NAs in estimates (or even an error, if design is calibrated). If na.rm=TRUE, observations containing NAs are dropped, and estimates are computed on non-missing values only. This implicitly assumes that missing values hit interest variables completely at random: should this not be the case, computed estimates would be biased.
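The effect of na.rm on a point estimate can be mimicked with a small base R sketch. The function wquantile below is a hypothetical stand-in, not part of the package, and it only reproduces the "NA in, NA out" behaviour described above (not the error raised for calibrated designs).

```r
# Hypothetical helper: weighted quantile with an na.rm switch
wquantile <- function(x, w, p, na.rm = FALSE) {
  if (anyNA(x)) {
    if (!na.rm) return(NA_real_)     # missing values propagate to the estimate
    keep <- !is.na(x)                # drop incomplete observations...
    x <- x[keep]
    w <- w[keep]                     # ...together with their weights
  }
  o  <- order(x)
  cw <- cumsum(w[o]) / sum(w)
  x[o][which(cw >= p - 1e-12)[1]]
}

y <- c(12, NA, 7, 30, 18)   # made-up interest variable with one NA
w <- c(2, 1, 3, 1, 2)       # made-up weights

med_keep <- wquantile(y, w, 0.5)                 # NA
med_drop <- wquantile(y, w, 0.5, na.rm = TRUE)   # median of complete cases
```

Note that med_drop describes only the units with observed values; it estimates the population median only if missingness is unrelated to y.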
Argument ties addresses the problem of how to treat duplicated observed values (if any) when computing the ECDF. Option 'discrete' (the default) is appropriate when the variable of interest is genuinely discrete, while 'rounded' is a better choice for a continuous variable, i.e. when duplicates stem from rounding. In the first case the ECDF will show a vertical step corresponding to a duplicated value, in the second a smoother shape will be achieved by linear interpolation.
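A minimal base R sketch of the two treatments follows. The exact interpolation rule used by svystatQ is not spelled out here, so the function inverse_ecdf is a hypothetical illustration: 'discrete' inverts a step-function ECDF, 'rounded' interpolates linearly between the ECDF values at the distinct observations.

```r
# Hypothetical helper: invert a weighted ECDF under two tie treatments
inverse_ecdf <- function(x, w, p, ties = c("discrete", "rounded")) {
  ties <- match.arg(ties)
  ux <- sort(unique(x))                              # distinct values
  Fx <- cumsum(as.numeric(rowsum(w, x))) / sum(w)    # ECDF at distinct values
  if (ties == "discrete") {
    # Step function: smallest value whose cumulative share reaches p
    ux[which(Fx >= p - 1e-12)[1]]
  } else {
    # Linear interpolation between consecutive (ECDF, value) pairs
    approx(Fx, ux, xout = p, rule = 2, ties = "ordered")$y
  }
}

x <- c(1, 2, 2, 2, 3, 4)   # the value 2 is heavily duplicated
w <- rep(1, 6)

q_discrete <- inverse_ecdf(x, w, 0.5, "discrete")
q_rounded  <- inverse_ecdf(x, w, 0.5, "rounded")
```

With these data the discrete median sits exactly on the duplicated value, while the rounded median falls between 1 and 2, reflecting the assumption that the tied observations were spread over an interval before rounding.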
An object inheriting from the data.frame class, whose detailed structure depends on input parameters' values.
Diego Zardetto
Woodruff, R.S. (1952) “Confidence Intervals for Medians and Other Position Measures”, Journal of the American Statistical Association, Vol. 47, No. 260, pp. 635-646.
Sarndal, C.E., Swensson, B., Wretman, J. (1992) “Model Assisted Survey Sampling”, Springer Verlag.
Estimators of Totals and Means svystatTM, Ratios between Totals svystatR, Shares svystatS, Ratios between Shares svystatSR, Multiple Regression Coefficients svystatB, Complex Analytic Functions of Totals and/or Means svystatL, and all of the above svystat.
# Creation of a design object:
data(data.examples)
des <- e.svydesign(data=example, ids=~towcod+famcod, strata=~SUPERSTRATUM,
                   weights=~weight)
# Estimate of the deciles of the income variable for
# the whole population:
svystatQ(des,~income,probs=seq(0.1,0.9,0.1),ties="rounded")
# Another design object:
data(sbs)
des <- e.svydesign(data=sbs, ids=~id, strata=~strata, weights=~weight,
                   fpc=~fpc)
# Estimation of the median value added
# for economic activity macro-sectors:
svystatQ(des, ~va.imp2, probs=0.5, by=~nace.macro,
         ties="rounded", vartype="cvpct")
# Estimation of the Interquartile Range (IQR) of the number
# of employees for economic activity macro-sectors:
apply(svystatQ(des,~emp.num,probs=c(0.25,0.75),by=~nace.macro)[,2:3],1,diff)