# CA: Empirical classification analysis (CA) and inference

In yuqimemeda/SortedEffects: Estimation and Inference Methods for Sorted Partial Effects and Classification Analysis

## Description

`CA` conducts classification analysis (CA) estimation and inference on user-specified objects of interest: the first (weighted) moment or the (weighted) distribution. Use `t` to specify the variables of interest. When the object of interest is a moment, use `cl` to specify linear combinations for hypothesis testing. By default (`bc = TRUE`), all estimates are bias-corrected, and all confidence bands are monotonized. The bootstrap procedures follow Algorithm 2.2 in Chernozhukov, Fernandez-Val and Luo (2018).
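The idea behind classifying observations into least and most affected groups can be sketched in base R. This is only an illustration on simulated data, not the package's implementation: it omits bias correction, the bootstrap, and sampling weights, and the names `sim`, `pe`, `least`, `most`, and `ca_means` are made up for this sketch.

```r
set.seed(1)
n <- 2000

# Simulated data (illustrative only): binary variable d and covariate x
d <- rbinom(n, 1, 0.3)
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 0.8 * d + 0.5 * x + 0.6 * d * x))
sim <- data.frame(y = y, d = d, x = x)

# Logit model with an interaction, so the effect of d varies with x
fit <- glm(y ~ d * x, family = binomial(link = "logit"), data = sim)

# Partial effect of d per observation: P(y = 1 | d = 1, x) - P(y = 1 | d = 0, x)
pe <- unname(
  predict(fit, newdata = transform(sim, d = 1), type = "response") -
  predict(fit, newdata = transform(sim, d = 0), type = "response")
)

# Classify the u-least and u-most affected observations
u <- 0.1
least <- pe <= quantile(pe, u, names = FALSE)
most  <- pe >= quantile(pe, 1 - u, names = FALSE)

# First moment (mean) of each characteristic within the two groups
ca_means <- rbind(
  least = colMeans(sim[least, c("d", "x")]),
  most  = colMeans(sim[most,  c("d", "x")])
)
print(ca_means)
```

In this simulated design the effect of `d` grows with `x` over most of its range, so the most affected group should show a higher mean of `x` than the least affected group.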

## Usage

```r
CA(fm, data, method = "ols", var.type = "binary", var.T, compare,
   subgroup = NULL, samp_weight = NULL, taus = c(1:9)/10, u = 0.1,
   cl = matrix(c(1, 0), nrow = 2), t = c(1, 1, rep(0, dim(data)[2] - 2)),
   interest = "moment", cat = NULL, alpha = 0.1, B = 10, ncores = 1,
   seed = 1, bc = TRUE, range.cb = c(0.5:99.5)/100, boot.type = "nonpar")
```

## Arguments

• `fm` Regression formula.

• `data` The data in use (full sample or subpopulation of interest).

• `method` Model used for estimating partial effects. Four options: `"logit"` (binary response), `"probit"` (binary response), `"ols"` (interactive linear with additive errors), `"QR"` (linear model with non-additive errors). Default is `"ols"`.

• `var.type` The type of the parameter of interest. Three options: `"binary"`, `"categorical"`, `"continuous"`. Default is `"binary"`.

• `var.T` Variable T of interest. Should be a character.

• `compare` If the parameter of interest is categorical, the user needs to specify which two categories to compare. Should be a 1 by 2 character vector. For example, if the two levels to compare are 1 and 3, then `compare = c("1", "3")`, which calculates the partial effect from 1 to 3. To use this option, users first need to specify `var.T` as a factor variable.

• `subgroup` Subgroup of interest. Default is `NULL`. The specification should be a logical variable. For example, suppose `data` contains an indicator variable for women (female if 1, male if 0). If users are interested in the SPE for women, they should specify `subgroup = data[, "female"] == 1`.

• `samp_weight` Sampling weight of the data. If `NULL`, the function implements the empirical bootstrap. If the data come with sampling weights, supply them here and the function implements the weighted (i.i.d. exponential weights) bootstrap. Default is `NULL`.

• `taus` Indexes for quantile regression. Default is `c(1:9)/10`.

• `u` Percentile defining the most and least affected groups. Default is 0.1.

• `cl` A pre-specified linear combination. Must be specified as a 2 by L matrix, where the l-th column denotes the l-th hypothesis; for `interest = "moment"`, L is the number of hypotheses. Default is `matrix(c(1,0), nrow=2)`.

• `t` An index for the CA object. Should be a 1 by `ncol(data)` indicator vector. Users can either (1) specify names of the variables of interest directly, or (2) use 1 to indicate a variable of interest. For example, if the total number of variables is 5 and the 1st and 3rd variables are of interest, specify `t = c(1, 0, 1, 0, 0)`.

• `interest` Generic objects in the least and most affected subpopulations. Two options: (1) `"moment"`: weighted mean of Z in the u-least/most affected subpopulation. (2) `"dist"`: distribution of Z in the u-least/most affected subpopulation. Default is `interest = "moment"`.

• `cat` P-values in classification analysis are adjusted for multiplicity to account for joint testing of zero coefficients on all variables within a category. Specify all variables of interest in a list, using numbers to denote relative positions. For example, if the variables of interest are "educ", "male", "female", "low income", "middle income", and "high income", `cat` should be specified as `cat = list(a=1, b=c(2,3), c=c(4,5,6))`. Default is `NULL`.

• `alpha` Size for the confidence interval. Should be between 0 and 1. Default is 0.1.

• `B` Number of bootstrap draws. Default is 10. For more accurate results, we recommend 500.

• `ncores` Number of cores for computation. Default is 1. For large datasets, parallel computing is highly recommended since the bootstrap is time-consuming.

• `seed` Seed for pseudo-random number generation, for reproducibility. Default is 1.

• `bc` Whether the estimate should be bias-corrected. Default is `TRUE`. If `FALSE`, the uncorrected estimate and corresponding confidence bands are reported.

• `range.cb` When `interest = "dist"`, the variables of interest are sorted and deduplicated to estimate the weighted CDF. For large datasets, storing very many observations can cause memory problems, so users can provide quantile indexes through `range.cb`, and the package will sort and deduplicate based on the corresponding weighted quantiles. To turn this feature off, set `range.cb = NULL`. Default is `c(0.5:99.5)/100`. To see how `range.cb` makes a difference in the plot, refer to the examples in the companion vignette.

• `boot.type` Type of bootstrap. Default is `boot.type = "nonpar"`, for which the package implements the nonparametric bootstrap. The alternative is `boot.type = "weighted"`, for which the package implements the weighted bootstrap.
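As a rough sketch of how the `t`, `cl`, `subgroup`, and `cat` inputs described above might be assembled, the following base R snippet builds each one for a small hypothetical data frame. The data frame `dat`, the helper names, and the numbers in `group_means` are all made up for illustration; the reading of `cl` (first row weights the least affected group, second row the most affected group) is an assumption based on the comment in the Examples section.

```r
# Hypothetical data frame standing in for `data` (values are made up)
dat <- data.frame(
  deny   = c(0, 1, 0, 1),
  black  = c(0, 1, 1, 0),
  p_irat = c(0.30, 0.45, 0.28, 0.50),
  female = c(1, 0, 1, 1)
)

# t: indicator vector of length ncol(data); 1 marks a variable of interest
vars_of_interest <- c("black", "p_irat")
t_index <- as.numeric(names(dat) %in% vars_of_interest)

# cl: 2 x L matrix with one column per hypothesis
# (assumed meaning: least affected mean, most affected mean, most minus least)
cl <- cbind(c(1, 0), c(0, 1), c(-1, 1))

# Arithmetic of a linear combination on placeholder group means
group_means <- c(least = 0.2, most = 0.5)   # made-up numbers
combos <- drop(t(cl) %*% group_means)

# subgroup: logical vector selecting the rows of interest
subgroup <- dat[, "female"] == 1

# cat: positions (among the variables of interest) grouped into categories
cat_groups <- list(a = 1, b = 2)

print(t_index)
print(combos)
```

Building `t` from variable names rather than typing the 0/1 pattern by hand avoids miscounting columns when `data` is wide.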

## Value

If `subgroup = NULL`, all outputs are for the whole sample; otherwise the outputs are subgroup results. When `interest = "moment"`, the output is a list containing:

• `est` Estimates for the variables of interest.

• `bse` Bootstrap standard errors.

• `joint_p` P-values that are adjusted for multiplicity to account for joint testing for all variables.

If users have also specified `cat` (i.e., `!is.null(cat)`), the output has a fourth component:

• `p_cat` P-values that are adjusted for multiplicity to account for joint testing for all variables within a category.
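To give a feel for the two resampling schemes behind the bootstrap standard errors (`boot.type = "nonpar"` and `boot.type = "weighted"`), here is a minimal base R sketch for a simple mean. It is not the package's procedure: the data are simulated, the statistic is a plain mean rather than a CA estimate, and the names `boot_np` and `boot_w` are invented for this sketch. The weighted scheme assumes i.i.d. standard exponential weights, as described for `samp_weight`.

```r
set.seed(1)
z <- rnorm(200, mean = 2)   # simulated data, illustrative only
n <- length(z)
B <- 500                    # number of bootstrap draws

# Nonparametric (empirical) bootstrap: resample observations with replacement
boot_np <- replicate(B, mean(z[sample.int(n, n, replace = TRUE)]))

# Weighted bootstrap: keep the sample fixed, draw i.i.d. exponential weights
boot_w <- replicate(B, {
  w <- rexp(n)
  sum(w * z) / sum(w)
})

# Bootstrap standard errors, compared with the textbook formula for a mean
se_np <- sd(boot_np)
se_w  <- sd(boot_w)
se_analytic <- sd(z) / sqrt(n)
print(c(nonpar = se_np, weighted = se_w, analytic = se_analytic))
```

For a statistic this simple, both schemes should land close to the analytic standard error; the practical difference is that the weighted scheme combines naturally with sampling weights.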

When `interest = "dist"`, the output is a list of two components:

• `infresults` A list that stores estimates and upper and lower confidence bounds for all variables of interest, for the least and most affected groups.

• `sortvar` A list that stores the sorted and deduplicated variables of interest.

We recommend using the `CAplot` command for result visualization.
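For `interest = "dist"`, the estimated object is a weighted CDF evaluated on a grid of sorted, deduplicated values. The base R sketch below shows the general idea and how a quantile grid such as the `range.cb` default can shrink the number of evaluation points. It is an illustration under simplifying assumptions: the data are simulated, the helper `weighted_ecdf` is hypothetical, and the grid uses unweighted quantiles for brevity, whereas the documentation above refers to weighted quantiles.

```r
# Weighted empirical CDF of z, evaluated at each point of `grid`
weighted_ecdf <- function(z, w, grid) {
  w <- w / sum(w)                                   # normalize weights
  vapply(grid, function(g) sum(w[z <= g]), numeric(1))
}

set.seed(1)
z <- round(rnorm(5000), 3)   # simulated variable of interest
w <- rexp(5000)              # simulated positive weights

# Full grid: every distinct value of z (can be very long for large data)
full_grid <- sort(unique(z))

# Thinned grid: values of z at the quantile indexes 0.005, 0.015, ..., 0.995
probs <- c(0.5:99.5) / 100
thin_grid <- sort(unique(quantile(z, probs, names = FALSE)))

cdf_thin <- weighted_ecdf(z, w, thin_grid)
print(c(full = length(full_grid), thinned = length(thin_grid)))
```

The thinned grid has at most 100 points regardless of sample size, which is the memory saving that motivates `range.cb`.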

## Examples

```r
data("mortgage")
fm <- deny ~ black + p_irat
t <- c(rep(1, 2), rep(0, 14)) # Specify variables in interest
cl <- matrix(c(1,0,0,1), nrow=2) # Meaning: show variables in interest for both groups
CA <- CA(fm = fm, data = mortgage, var.T = "black", method = "logit",
         cl = cl, t = t)
# Tabulate the results
est <- matrix(CA$est, ncol=2)
se <- matrix(CA$bse, ncol=2)
Table <- matrix(0, ncol=4, nrow=2)
Table[, 1] <- est[, 1] # Least Affected Bias-corrected estimate
Table[, 2] <- se[, 1] # Corresponding SE
Table[, 3] <- est[, 2] # Most affected
Table[, 4] <- se[, 2] # Corresponding SE
rownames(Table) <- colnames(CA$est)[1:2] # assign names to each row
colnames(Table) <- rep(c("Estimate", "SE"), 2)
```

yuqimemeda/SortedEffects documentation built on May 23, 2019, 9:51 a.m.