depvar: Variance Estimates for Means with Constrained Dependent...

Description

Computes variance estimates for a mean under general constraints on the degree of dependence among the observations.

Usage

	depvar(y, theta, d, GR = NULL, case = "heteroskedastic", solver = "glpk", approximate = 1)

Arguments

y

a vector with the outcome variable.

theta

a vector with the coefficients or weights.

d

a vector with the observed degree; if length(d) = 1, then d is the total number of non-zero (off-diagonal) elements of the adjacency matrix A (this solves a more relaxed version of the problem); if length(d) = nrow(v), then d[i] is the number of non-zero (off-diagonal) elements in row i of A.
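As a minimal sketch of the two forms of d described above, the following base R snippet derives both from a small adjacency matrix. The matrix A is a hypothetical example and is not part of the package or its data.

```r
# Hypothetical symmetric 0/1 adjacency matrix for four observations.
A <- matrix(c(0, 1, 1, 0,
              1, 0, 0, 0,
              1, 0, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

# Only off-diagonal entries count as dependencies.
diag(A) <- 0

# Per-row form: number of non-zero off-diagonal elements in each row of A.
d_row <- rowSums(A != 0)

# Scalar form: total number of non-zero off-diagonal elements of A.
d_total <- sum(A != 0)
```

Either d_row or d_total could then be passed as d, depending on which version of the problem is to be solved.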

GR

a matrix with the known dependencies. The default is GR = NULL.

case

a string with the variance estimate to be calculated. If case = "heteroskedastic", then (1) in Section 3.1 of Aronow et al. (2016) is calculated. If case = "homoskedastic" and GR is not NULL, then (7) in Section 3.2 of the same paper is calculated. Finally, if case = "homoskedastic" and GR is NULL, (9) is calculated.

solver

a string with the optimization solver to be used. The options are "cplex", "glpk" and "gurobi". The default solver is glpk, but cplex and gurobi are much faster. Note that cplex and gurobi require a license, which is free for people affiliated with universities, but not otherwise. Of the two, the gurobi interface for R is much easier to install.

approximate

a scalar that determines whether an exact solution is found by solving the original integer programming problem (approximate = 0), or an approximate solution is obtained by solving the relaxed problem via linear programming (approximate = 1). The approximate solution is faster to obtain but results in more conservative variance estimates. The default is approximate = 1.
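To see why a relaxed problem can only give a larger (more conservative) optimum, consider this toy degree-constrained problem in base R. It is an illustration under stated assumptions, not the package's actual formulation: the triangle graph, the unit weights w, and the helper node_degree are all hypothetical.

```r
# Toy problem: choose edges of a triangle to maximize total weight,
# subject to each node having degree at most 1 (all values hypothetical).
edges <- rbind(c(1, 2), c(1, 3), c(2, 3))
w <- c(1, 1, 1)
d <- c(1, 1, 1)

# Degree of each node implied by an edge vector a (binary or fractional).
node_degree <- function(a) {
  sapply(1:3, function(i) sum(a[edges[, 1] == i | edges[, 2] == i]))
}

# Integer problem: enumerate all 2^3 binary edge vectors, keep feasible ones.
grid <- as.matrix(expand.grid(0:1, 0:1, 0:1))
feasible <- apply(grid, 1, function(a) all(node_degree(a) <= d))
integer_opt <- max(grid[feasible, , drop = FALSE] %*% w)

# Relaxed problem: entries may lie in [0, 1], so half of every edge is feasible.
a_frac <- c(0.5, 0.5, 0.5)
stopifnot(all(node_degree(a_frac) <= d))
relaxed_val <- sum(a_frac * w)

# The relaxed value is at least as large as the integer optimum.
relaxed_val >= integer_opt
```

Any two edges of a triangle share a node, so the integer problem can select only one edge, while the fractional point uses half of each. Because every integer-feasible solution is also feasible for the relaxation, the relaxed optimum can never fall below the exact one.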

Value

A list with the following objects:

V_hat

the variance estimate;

A_max

adjacency matrix that maximizes the variance;

obj_val

the objective value of the graph optimization problem at the optimum;

time

time elapsed to find the optimal solution.

Author(s)

Peter M. Aronow <peter.aronow@yale.edu>, Forrest W. Crawford <forrest.crawford@yale.edu>, Jose R. Zubizarreta <zubizarreta@columbia.edu>.

References

Aronow, P. M., Crawford, F. W., and Zubizarreta, J. R. (2017), "Confidence intervals for linear unbiased estimators under constrained dependence," submitted, X, X-X.

Examples

################################# 
# Data
#################################

# Example with 100 nodes
data(example)

# Total number of observations
n = nrow(dat)

# Observed data
# Observed (known) dependencies a_ij; this is A_R in (5) in the paper.
# GR is assumed to be loaded by data(example), so this assignment is a no-op kept for readability.
GR = GR
# Observed degrees
d = dat$degree
# Observed outcomes
y = dat$hiv
# Coefficients or weights
theta = rep(1, length(y))

################################# 
# Heteroskedastic case, solve (5) in the paper
#################################

# No known dependencies (so in (5) in the paper A_R (the matrix of known dependencies) is the n times n matrix of 0's)
depvar(y, theta, d, NULL, case = "heteroskedastic", solver = "glpk", approximate = 1)$V_hat

# Some known dependencies
depvar(y, theta, d, GR, case = "heteroskedastic", solver = "glpk", approximate = 1)$V_hat

################################# 
# Homoskedastic case, solve (8) in the paper
#################################

# Some known dependencies
depvar(y, theta, d, GR, case = "homoskedastic", solver = "glpk", approximate = 1)$V_hat

# Compare to the more conservative variance estimate from (9) in the paper
depvar(y, theta, d, NULL, case = "homoskedastic")$V_hat
var(y * theta)/n * (1 + sum(pmin(d, n-1))/n) 
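The closed-form expression on the last line of the example can be tried without the package or a solver. The sketch below applies it to simulated data; the sample size, outcome distribution, and degree distribution are arbitrary assumptions chosen only for illustration.

```r
set.seed(1)

# Simulated inputs (hypothetical; not the package's example data).
n <- 50
y <- rbinom(n, 1, 0.3)               # binary outcome
theta <- rep(1, n)                   # equal weights
d <- sample(0:5, n, replace = TRUE)  # per-observation degrees

# Variance of the mean if the observations were independent.
v_iid <- var(y * theta) / n

# Same expression as in the example: inflate by one plus the average degree,
# with each degree capped at n - 1.
v_bound <- v_iid * (1 + sum(pmin(d, n - 1)) / n)

# With no dependencies at all (d = 0) the inflation factor is exactly 1.
v_none <- v_iid * (1 + sum(pmin(rep(0, n), n - 1)) / n)
```

Since the degrees are non-negative, the inflation factor is at least 1, so v_bound is never smaller than v_iid.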
	

jrzubizarreta/depinf documentation built on May 20, 2019, 2:07 a.m.