geo_ratios: Compute rates or ratios for a set of geographic entities over...
In sae2: Small Area Estimation: Time-Series Models

View source: R/geo_ratios.R

geo_ratios

R Documentation

Compute rates or ratios for a set of geographic entities over a set of years

Description

The function computes rates or ratios by a geographic code and the variable Year. If designvars is specified, the function also returns a data frame with linear substitutes to compute Taylor series variances.

Usage

geo_ratios(data,  geocode,  numerators,  denominators,  geonames, 
           new.names,  designvars)

Arguments

`data`	A data frame with the required variables, including a variable named `Year`.
`geocode`	A character variable with the name of the geographic variable for which separate estimates are of interest.
`numerators`	A character vector listing the names in `data` of the numerators of the ratios.
`denominators`	A character vector listing the names in `data` of the denominators of the ratios. If a single value is given, it will be used for all of the ratios.
`geonames`	An optional data frame containing `geocode` and one or more geographic variables, such as names, that will be merged into the results. There should be only one row for each value of `geocode`.
`new.names`	An optional character vector of the same length as `numerators` naming the resulting ratios. If `new.names` is not specified, the output ratios will have the same names as `numerators`.
`designvars`	Optional. If given, a character vector naming one or more survey design variables in `data` to use in forming linear substitutes for variance calculation of the ratios cross-classified by `Year` and the variable named by `geocode`. The vector should not include the `geocode` variable or `Year`.

Details

For programming simplicity, the function enforces the requirement that names should not be repeated in either numerators or new.names. Names may be repeated in denominators.

Rather than a typical survey file, the function expects the data frame data to contain weighted estimates for each analytic variable. As a simple example, to find the variance of the weighted mean of y with weights w, data should contain w and y * w. For convenience, the weighted estimates can still be assigned their original names in data, such as y. In this case,

numerators = y, denominators=w

would create the appropriate linear substitutes for the variance of the weighted mean.

This design of the function allows complex possibilities, such as estimating the variance of a rate where the numerator is based on one weight and the denominator is based on another. For example, estimation for the National Crime Victimization Survey requires this capability.

Value

If designvars is not specified, a named list with one element, a data frame containing the ratios sorted by geocode and Year.

If designvars is specified, a second element is added to the list, a data frame giving the totals of the linear substitutes by Year, geocode, and designvars. The elements of the list are named estimates and linear.subs.

Author(s)

Robert E. Fay

References

- Woodruff, R.S. (1971). A simple method for approximating the variance of a complex estimate. Journal of the American Statistical Association 66, 411-414.

Examples

require(survey)
require(MASS)
D <- 20 # number of domains
T <- 5 # number of years
samp <- 16 # number of sample cases per domain
set.seed(1)
# use conditional.mean=TRUE to generate true small area values
# without sampling error
Y.list <- mvrnormSeries(D=D, T=T, rho.dyn=.9, sigma.v.dyn=1, 
   sigma.u.dyn=.19, sigma.e=diag(5), conditional.mean=TRUE)
# generate sampling errors
e <- rnorm(samp * T * D, mean=0, sd=4)
Y <- Y.list[[2]] + tapply(e, rep(1:100, each=16), mean) 
data <- data.frame(Y=Y, X=rep(1:T, times=D))
# model fit with the true sampling variances
result.dyn  <- eblupDyn(Y ~ X, D, T, vardir = diag(100), data=data)
# individual level observations consistent with Y
Y2 <- rep(Y.list[[2]], each=16) + e 
data2 <- data.frame(Y=Y2, X=rep(rep(1:T, each=samp), times=D), 
                   Year=rep(rep(1:T, each=samp), times=D), 
                   weight=rep(1, times=samp*T*D),
                   d=rep(1:D, each=samp*T), 
                   strata=rep(1:(D*T), each=samp),
                   ids=1:(D*T*samp))
# geo_ratios with designvars specified
geo.results <- geo_ratios(data2, geocode="d", numerators="Y",
                          denominators="weight",
                          designvars=c("strata", "ids"))
# illustrative check                          
max(abs(geo.results[[1]]$Y - Y))
vcov.list <- vcovgen(geo.results[[2]], year.list=1:5, geocode="d", 
      designvars=c("strata", "ids"))
vcov.list[[1]]
# model fitted with directly estimated variance-covariances
result2.dyn <- eblupDyn(Y ~ X, D, T, vardir=vcov.list, data=data)
cor(result.dyn$eblup, result2.dyn$eblup)

sae2 documentation built on Aug. 23, 2023, 5:07 p.m.