csranks | R Documentation |

Marginal and simultaneous confidence sets for ranks.

```
csranks(
x,
Sigma,
coverage = 0.95,
cstype = "two-sided",
stepdown = TRUE,
R = 1000,
simul = TRUE,
indices = NA,
na.rm = FALSE,
seed = NA
)
```

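| Argument | Description |
|---|---|
| `x` | vector of estimates containing the estimated features by which the populations are ranked. |
| `Sigma` | estimated covariance matrix of `x`. |
| `coverage` | nominal coverage of the confidence set. Default is 0.95. |
| `cstype` | type of confidence set. Default is `"two-sided"`. |
| `stepdown` | logical; if `TRUE` (default), the stepwise (stepdown) procedure is used; if `FALSE`, a single-step procedure is used. |
| `R` | number of bootstrap replications. Default is 1000. |
| `simul` | logical; if `TRUE` (default), simultaneous confidence sets are computed, which jointly contain the true ranks of the populations in `indices` with probability approximately equal to `coverage`; if `FALSE`, marginal confidence sets are computed, each of which contains the true rank of its population with probability approximately equal to `coverage`. |
| `indices` | vector of indices of `x` for whose ranks the confidence sets are computed. `NA` (default) means all populations. |
| `na.rm` | logical; if `TRUE`, missing values (`NA`) are removed before the computation. Default is `FALSE`. |
| `seed` | seed for the bootstrap random draws. If set to `NA` (default), no seed is set. |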

A `csranks` object, which is a list with three items:

- `L`: lower bounds of the confidence sets for the ranks indicated in `indices`.
- `rank`: estimated ranks, computed by `irank` with default parameters.
- `U`: upper bounds of the confidence sets.

Suppose `p` populations, indexed by `j=1,\ldots,p` (e.g., schools, hospitals, political parties, countries), are to be ranked according to some measure `\theta=(\theta_1,\ldots,\theta_p)`. We do not observe the true values `\theta_1,\ldots,\theta_p`. Instead, for each population, we have data from which we have estimated these measures, `\hat{\theta}=(\hat{\theta}_1,\ldots,\hat{\theta}_p)`. The values `\hat{\theta}_1,\ldots,\hat{\theta}_p` are estimates of the true values `\theta_1,\ldots,\theta_p` and are therefore subject to statistical uncertainty. Consequently, a ranking of the populations by the estimates `\hat{\theta}_1,\ldots,\hat{\theta}_p` is also uncertain and need not coincide with the true ranking by `\theta_1,\ldots,\theta_p`.
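
To illustrate this point, the following base-R simulation (all values hypothetical) draws noisy estimates around known true values and compares the two rankings. It is a sketch that assumes the convention that rank 1 goes to the largest value, i.e. the rank of population `j` is `1 + #{k : theta_k > theta_j}`; the helper `rank_desc` is introduced here for illustration only and is not part of csranks.

```r
# Hypothetical true measures and standard errors (assumptions of the sketch).
set.seed(1)
p <- 8
theta <- seq(0.1, 0.8, length.out = p)
se <- rep(0.15, p)
# Estimates = truth + sampling noise, approximately N(theta_j, se_j^2).
thetahat <- theta + rnorm(p, sd = se)

# Assumed convention: rank 1 = largest value, rank_j = 1 + #{k : v_k > v_j}.
# ties.method = "min" gives tied values the same (smallest) rank.
rank_desc <- function(v) rank(-v, ties.method = "min")

true_rank <- rank_desc(theta)
est_rank <- rank_desc(thetahat)
print(data.frame(theta, thetahat = round(thetahat, 3), true_rank, est_rank))
cat("Populations whose estimated rank differs from their true rank:",
    sum(true_rank != est_rank), "\n")
```

With standard errors of this size relative to the gaps between the true values, the estimated ranking typically swaps several neighbouring populations.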

The command computes confidence sets for the rank of one, several, or all of the populations; `indices` indicates which of the populations `1,\ldots,p` are of interest. `x` is a vector containing the estimates `\hat{\theta}_1,\ldots,\hat{\theta}_p`, and `Sigma` is an estimate of the covariance matrix of `x`.

The method assumes that the estimates are asymptotically normal and that the sample sizes of the datasets are large enough that `\hat{\theta}-\theta` is approximately distributed as `N(0,\Sigma)`. The argument `Sigma` should contain an estimate of the covariance matrix `\Sigma`. For instance, if for each population `j`, `\sqrt{n_j} (\hat{\theta}_j-\theta_j) \to_d N(0, \sigma_j^2)`, where `n_j` is the sample size of the dataset for population `j`, and the datasets for the different populations are drawn independently of each other, then `Sigma` is the diagonal matrix `\mathrm{diag}(\hat{\sigma}_1^2/n_1,\ldots,\hat{\sigma}_p^2/n_p)`, whose entries are estimates of the asymptotic variances divided by the respective sample sizes. More generally, the estimates in `x` may be dependent, in which case `Sigma` must be an estimate of their full covariance matrix, including the off-diagonal terms.
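
The two cases can be sketched in base R as follows. The data are simulated and all names (`samples`, `Sigma_indep`, `Sigma_dep`, etc.) are hypothetical; the sketch assumes each `\theta_j` is estimated by a sample mean, so that `\hat{\sigma}_j^2` is the sample variance. The dependent case mirrors the `cov(X) / n` construction used in the simulated example further down.

```r
set.seed(2)
p <- 3
n <- c(50, 80, 120)   # hypothetical sample sizes n_j
# Independent datasets, one per population (simulated for illustration).
samples <- lapply(seq_len(p), function(j) rnorm(n[j], mean = j, sd = j))

# theta_j estimated by a sample mean; its variance is estimated by sigma_j^2 / n_j.
thetahat <- vapply(samples, mean, numeric(1))
sigma2hat <- vapply(samples, var, numeric(1))
# Diagonal Sigma because the datasets are independent.
# (nrow = p keeps diag() from building an identity matrix when p = 1.)
Sigma_indep <- diag(sigma2hat / n, nrow = p)

# Dependent estimates: all p means computed from one m x p data matrix whose
# columns share a common component, so the off-diagonal covariances are non-zero.
m <- 200
common <- rnorm(m)
X <- matrix(rnorm(m * p), m, p) + common  # 'common' is recycled down each column
thetahat_dep <- colMeans(X)
Sigma_dep <- cov(X) / m                   # full covariance matrix of colMeans(X)
print(round(Sigma_dep, 5))
```

In the first case, passing `Sigma_indep` would correspond to the diagonal formula above; in the second, dropping the off-diagonal entries of `Sigma_dep` would misstate the uncertainty of the pairwise differences that the procedure relies on.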

Marginal confidence sets (`simul=FALSE`) are such that the confidence set for a population `j` contains the true rank of that population with probability approximately equal to the nominal coverage level. Simultaneous confidence sets (`simul=TRUE`), on the other hand, cover the true ranks of all populations indicated in `indices` simultaneously with probability approximately equal to the nominal coverage level. For instance, in the PISA example below, the marginal confidence set for a country `j` covers the true rank of country `j` with probability approximately equal to 0.95, whereas the simultaneous confidence sets for all countries cover the true ranks of all countries simultaneously with probability approximately equal to 0.95.
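
One way to see why simultaneous sets are typically wider is to compare bootstrap critical values. The sketch below is a simplified illustration, not the csranks implementation: it assumes a two-sided max statistic over the pairwise comparisons `|Z_k - Z_j| / se_{jk}` with `Z ~ N(0, Sigma)`. A marginal critical value takes the maximum over `k` for one fixed `j`; a simultaneous one also takes the maximum over all populations `j`, so it can never be smaller. `Sigma`, `B` and `max_stat` are hypothetical.

```r
set.seed(3)
p <- 5
Sigma <- diag(seq(0.01, 0.05, length.out = p))  # hypothetical covariance matrix
B <- 2000                                      # number of bootstrap draws
alpha <- 0.05                                  # 1 - coverage

# Parametric bootstrap: each row of Z is a draw from N(0, Sigma).
Z <- matrix(rnorm(B * p), B, p) %*% chol(Sigma)

# For population j and each draw: max over k != j of |Z_k - Z_j| / se(k, j).
max_stat <- function(j) {
  others <- setdiff(seq_len(p), j)
  se <- sqrt(Sigma[j, j] + diag(Sigma)[others] - 2 * Sigma[j, others])
  D <- abs(Z[, others, drop = FALSE] - Z[, j])
  apply(sweep(D, 2, se, "/"), 1, max)
}

stats <- sapply(seq_len(p), max_stat)  # B x p matrix, column j = population j
# Marginal: one critical value per population.
crit_marginal <- apply(stats, 2, quantile, probs = 1 - alpha)
# Simultaneous (all populations): quantile of the max over populations as well.
crit_simul <- quantile(apply(stats, 1, max), probs = 1 - alpha)
print(round(crit_marginal, 2))
print(round(crit_simul, 2))
```

A larger critical value means fewer pairwise comparisons are declared significant, which translates into wider rank intervals.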

The command implements the procedures developed and described in more detail in Mogstad, Romano, Shaikh, and Wilhelm (2023). The procedure is based on testing a large family of hypotheses for pairwise comparisons. Stepwise methods can improve the power of the procedure by potentially rejecting more hypotheses without violating the desired coverage property of the resulting confidence set; they are employed when `stepdown=TRUE`. In practice, `stepdown=TRUE` is computationally more demanding but often yields tighter confidence sets.
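
The stepdown idea can be sketched with a generic max-statistic stepdown procedure in the style of Romano and Wolf. This is an illustration under stated assumptions, not the csranks code: the hypotheses are abstract (one statistic each, rejected for large values), and `stepdown_reject`, `single_step_reject`, `t_obs` and `t_boot` are hypothetical names. After each round of rejections, the critical value is recomputed from the maximum over the hypotheses that remain, which can only shrink it, so the stepdown rejections always include the single-step ones.

```r
# t_obs:  observed test statistics, one per hypothesis (reject for large values).
# t_boot: B x m matrix of bootstrap statistics approximating their null distribution.
stepdown_reject <- function(t_obs, t_boot, alpha = 0.05) {
  rejected <- rep(FALSE, length(t_obs))
  repeat {
    active <- which(!rejected)
    if (length(active) == 0) break
    # Critical value from the max over the hypotheses not yet rejected.
    crit <- quantile(apply(t_boot[, active, drop = FALSE], 1, max), 1 - alpha)
    new_rej <- active[t_obs[active] > crit]
    if (length(new_rej) == 0) break
    rejected[new_rej] <- TRUE
  }
  rejected
}

# Single-step version: one critical value from the max over all hypotheses.
single_step_reject <- function(t_obs, t_boot, alpha = 0.05) {
  t_obs > quantile(apply(t_boot, 1, max), 1 - alpha)
}

# Hypothetical example: 6 one-sided statistics with independent N(0, 1) nulls.
set.seed(4)
t_boot <- matrix(rnorm(5000 * 6), 5000, 6)
t_obs <- c(4, 2.6, 2.3, 1, 0.5, -1)
rbind(single_step = single_step_reject(t_obs, t_boot),
      stepdown = stepdown_reject(t_obs, t_boot))
```

The repeated quantile computations are what make the stepdown variant more expensive than the single-step one.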

The procedure relies on a parametric bootstrap based on the approximate multivariate normal distribution above.
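
The following is a simplified, single-step sketch of how a parametric bootstrap can produce a marginal two-sided confidence set for the rank of one population `j`. It is not the package's implementation and omits stepdown, `cstype` and `simul`. Assumptions: rank 1 goes to the largest value; `Sigma` is positive definite (required by `chol()`); `marginal_rank_cs` is a hypothetical helper. A population `k` counts as significantly larger (smaller) than `j` when its standardized difference exceeds (falls below minus) the bootstrap critical value; the lower bound is 1 plus the number of significantly larger populations, and the upper bound is `p` minus the number of significantly smaller ones.

```r
marginal_rank_cs <- function(x, Sigma, j, coverage = 0.95, R = 1000) {
  p <- length(x)
  others <- setdiff(seq_len(p), j)
  # Standard errors of x_k - x_j implied by Sigma.
  se <- sqrt(Sigma[j, j] + diag(Sigma)[others] - 2 * Sigma[j, others])
  # Parametric bootstrap: rows of Z are draws from N(0, Sigma)
  # (chol() requires Sigma to be positive definite).
  Z <- matrix(rnorm(R * p), R, p) %*% chol(Sigma)
  # Two-sided max statistic over all comparisons of j with k != j.
  D <- sweep(abs(Z[, others, drop = FALSE] - Z[, j]), 2, se, "/")
  crit <- quantile(apply(D, 1, max), probs = coverage)
  tstat <- (x[others] - x[j]) / se
  n_better <- sum(tstat > crit)   # k significantly larger than j
  n_worse <- sum(tstat < -crit)   # k significantly smaller than j
  c(L = 1 + n_better, U = p - n_worse)
}

set.seed(5)
x <- c(10, 5, 4.9, 0)
Sigma <- diag(0.01, 4)
marginal_rank_cs(x, Sigma, j = 2)   # expected: L = 2, U = 3
```

Here population 2 is clearly below population 1 and clearly above population 4, but cannot be distinguished from population 3, so its rank is only pinned down to the interval from 2 to 3.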

Mogstad, Romano, Shaikh, and Wilhelm (2023), "Inference for Ranks with Applications to Mobility across Neighborhoods and Academic Achievements across Countries", forthcoming at Review of Economic Studies; cemmap working paper. doi:10.1093/restud/rdad006

```
library(csranks)

# simple simulated example:
set.seed(101)
n <- 100
p <- 10
# each row is one observation; column j has mean j/p
X <- matrix(rep(1:p, n) / p, ncol = p, byrow = TRUE) + matrix(rnorm(n * p), n, p)
thetahat <- colMeans(X)
Sigmahat <- cov(X) / n
csranks(thetahat, Sigmahat)

# PISA example (the pisa dataset is provided by csranks):
math_cov_mat <- diag(pisa$math_se^2)
# marginal confidence set for each country:
csranks(pisa$math_score, math_cov_mat, simul = FALSE)
# simultaneous confidence set for all countries:
csranks(pisa$math_score, math_cov_mat, simul = TRUE)
```
