ca: Empirical Classification Analysis (CA) and Inference
In SortedEffects: Estimation and Inference Methods for Sorted Causal Effects and Classification Analysis

View source: R/ca.R

ca	R Documentation

Empirical Classification Analysis (CA) and Inference

Description

ca conducts CA estimation and inference on user-specified objects of interest: the first (weighted) moment or the (weighted) distribution. Use t to specify the variables of interest. When the object of interest is the moment, use cl to specify whether to report the averages of the two groups or the difference between them.
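The idea behind classification analysis can be sketched in base R on simulated data. This is only an illustration under stated assumptions, not the package's implementation: the data, the variable names (`sim`, `d`, `x`, `pe`) and the model are hypothetical, and no inference is performed.

```r
# Conceptual sketch of classification analysis (base R only, simulated data).
set.seed(1)
n <- 2000
x <- rnorm(n)                 # hypothetical covariate
d <- rbinom(n, 1, 0.3)        # hypothetical binary variable of interest
y <- rbinom(n, 1, plogis(-1 + 0.8 * d + 0.5 * x + 0.6 * d * x))
sim <- data.frame(y = y, d = d, x = x)

# Fit a logit and predict each observation's partial effect of d:
# P(y = 1 | d = 1, x) - P(y = 1 | d = 0, x)
fit <- glm(y ~ d * x, family = binomial(link = "logit"), data = sim)
pe <- predict(fit, newdata = transform(sim, d = 1), type = "response") -
  predict(fit, newdata = transform(sim, d = 0), type = "response")

# Classify the u-least and u-most affected observations
u <- 0.1
least <- pe <= quantile(pe, u)
most <- pe >= quantile(pe, 1 - u)

# First moments of the characteristics in each group, and their difference
means <- rbind(least = colMeans(sim[least, ]), most = colMeans(sim[most, ]))
diff_most_least <- means["most", ] - means["least", ]
```

Here `means` plays the role of the `cl = "both"` output and `diff_most_least` that of `cl = "diff"`; ca additionally provides bootstrap standard errors and p-values.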

Usage

ca(
  fm,
  data,
  method = c("ols", "logit", "probit", "QR"),
  var_type = c("binary", "continuous", "categorical"),
  var,
  compare,
  subgroup = NULL,
  samp_weight = NULL,
  taus = c(5:95)/100,
  u = 0.1,
  interest = c("moment", "dist"),
  t = c(1, 1, rep(0, dim(data)[2] - 2)),
  cl = c("both", "diff"),
  cat = NULL,
  alpha = 0.1,
  b = 500,
  parallel = FALSE,
  ncores = detectCores(),
  seed = 1,
  bc = TRUE,
  range_cb = c(1:99)/100,
  boot_type = c("nonpar", "weighted")
)

Arguments

`fm`	Regression formula
`data`	The data in use: full sample or subpopulation of interest
`method`	Models to be used for estimating partial effects. Four options: `"logit"` (binary response), `"probit"` (binary response), `"ols"` (interactive linear with additive errors), `"QR"` (linear model with non-additive errors). Default is `"ols"`.
`var_type`	The type of the variable of interest. Three options: `"binary"`, `"categorical"`, `"continuous"`. Default is `"binary"`.
`var`	Variable T of interest. Should be a character.
`compare`	If the variable of interest is categorical, the user needs to specify which two categories to compare. Should be a 1 by 2 character vector. For example, if the two levels to compare are 1 and 3, then `compare = c("1", "3")`, which calculates the partial effect of moving from 1 to 3. To use this option, users first need to specify `var` as a factor variable.
`subgroup`	Subgroup of interest. Default is `NULL`. The specification should be a logical vector. For example, suppose the data contain an indicator variable for women (female if 1, male if 0). If users are interested in the SPE for women, they should specify `subgroup = data[, "female"] == 1`.
`samp_weight`	Sampling weight of data. Input should be a n by 1 vector, where n denotes sample size. Default is `NULL`.
`taus`	Indexes for quantile regression. Default is `c(5:95)/100`.
`u`	Percentile that defines the most and least affected groups. Default is 0.1.
`interest`	Generic objects in the least and most affected subpopulations. Two options: (1) `"moment"`: weighted mean of Z in the u-least/most affected subpopulation. (2) `"dist"`: distribution of Z in the u-least/most affected subpopulation. Default is `interest = "moment"`.
`t`	Index of the variables of interest for the CA. Users can either (1) specify the names of the variables of interest directly, or (2) supply a 1 by ncol(data) indicator vector in which 1 marks a variable of interest. For example, if the total number of variables is 5 and the 1st and 3rd variables are of interest, then specify `t = c(1, 0, 1, 0, 0)`.
`cl`	If `interest = "moment"`, `cl` allows the user to get the variables of interest (specified in the `t` option) for the most and least affected groups. The default is `"both"`, which shows the variables for the two groups; the alternative is `"diff"`, which shows the difference between the two groups. The user can use `summary.ca` to tabulate the results, which also contain the standard errors and p-values. If `interest = "dist"`, this option has no bearing and the user can leave it at the default value.
`cat`	P-values in classification analysis are adjusted for multiplicity to account for joint testing of zero coefficients for all variables within a category. Suppose we have specified 3 variables of interest: `t = c("a", "b", "c")`. Without loss of generality, assume `"a"` is not a factor, while `"b"` and `"c"` are two factors. Then users need to specify `cat = c("b", "c")`. Default is `NULL`.
`alpha`	Significance level for confidence intervals. Should be between 0 and 1. Default is 0.1.
`b`	Number of bootstrap draws. Default is 500.
`parallel`	Whether the user wants to use parallel computation. The default is `FALSE` and only 1 CPU will be used. The other option is `TRUE`, and user can specify the number of CPUs in the `ncores` option.
`ncores`	Number of cores for computation. Default is `detectCores()`, a function from package `parallel` that detects the number of CPUs on the current host. For large datasets, parallel computing is highly recommended since the bootstrap is time-consuming.
`seed`	Seed for pseudo-random number generation, for reproducibility. Default is 1.
`bc`	Whether the estimate should be bias-corrected. Default is `TRUE`. If `FALSE`, the uncorrected estimate and corresponding confidence bands are reported.
`range_cb`	When `interest = "dist"`, the package sorts the variables of interest and keeps their unique values to estimate the weighted CDF. For large datasets, storing very many observations can cause memory problems, so users can provide a vector of quantile indices, and the package will sort and take unique values based on the weighted quantiles at those indices. If users don't want this feature, set `range_cb = NULL`. Default is `c(1:99)/100`.
`boot_type`	Type of bootstrap. Default is `"nonpar"`, and the package implements nonparametric bootstrap. The alternative is `"weighted"`, and the package implements weighted bootstrap.
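As a minimal sketch of how the `t`, `subgroup` and `samp_weight` inputs described above can be built, consider the toy data frame below. The data frame `toy` and its values are hypothetical and serve only to illustrate the shapes of the inputs.

```r
# Hypothetical toy data; column names and values are illustrative only.
toy <- data.frame(
  deny = c(0, 1, 0, 1, 0),
  black = c(1, 0, 0, 1, 0),
  p_irat = c(0.30, 0.45, 0.28, 0.50, 0.33),
  female = c(1, 0, 1, 1, 0),
  single = c(0, 1, 1, 0, 0)
)

# Option (1) for t: name the variables of interest directly.
t_names <- c("deny", "p_irat")

# Option (2) for t: a 0/1 indicator vector with one entry per column.
t_index <- as.numeric(names(toy) %in% t_names)

# subgroup: a logical vector with one entry per row.
sub_women <- toy[, "female"] == 1

# samp_weight: one weight per row (equal weights in this sketch).
w <- rep(1, nrow(toy))
```

With this column order, `t_index` marks the 1st and 3rd columns, matching the `t = c(1, 0, 1, 0, 0)` example.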

Details

Estimates are bias-corrected by default (bc = TRUE), and all confidence bands are monotonized. The bootstrap procedures follow Algorithm 2.2 in Chernozhukov, Fernandez-Val and Luo (2018).
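The difference between the two bootstrap types can be sketched in base R for a simple statistic, the sample mean. This is not a reproduction of Algorithm 2.2: the data `z` are simulated, and the choice of standard exponential weights for the weighted bootstrap is an assumption made for illustration.

```r
# Sketch: nonparametric vs. weighted bootstrap standard errors of a mean.
set.seed(1)
z <- rexp(200, rate = 0.5)   # hypothetical characteristic
b <- 500                     # number of bootstrap draws

# Nonparametric bootstrap: resample observations with replacement.
boot_nonpar <- replicate(b, mean(sample(z, replace = TRUE)))

# Weighted bootstrap: keep the sample fixed and draw i.i.d. positive
# weights (standard exponential here, an assumption for this sketch).
boot_weighted <- replicate(b, {
  w <- rexp(length(z))
  weighted.mean(z, w)
})

# Bootstrap standard errors under each scheme
bse <- c(nonpar = sd(boot_nonpar), weighted = sd(boot_weighted))
```

Both schemes should give standard errors close to the analytical value `sd(z) / sqrt(length(z))`.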

Value

If subgroup = NULL, all outputs are for the whole sample. Otherwise, the outputs are subgroup results. When interest = "moment", the output is a list containing:

est Estimates of variables in interest.
bse Bootstrap standard errors.
joint_p P-values that are adjusted for multiplicity to account for joint testing for all variables.
pointwise_p P-values that are not adjusted for joint testing.

If users have further specified cat (i.e., !is.null(cat)), the fourth component will be replaced with p_cat: P-values that are adjusted for multiplicity to account for joint testing for all variables within a category. Users can use summary.ca to tabulate the results.

When interest = "dist", the output is a list of two components:

infresults A list that stores estimates, upper and lower confidence bounds for all variables in interest for least and most affected groups.
sortvar A list that stores sorted and unique variables in interest.

We recommend using the plot.ca command to visualize the results.
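To illustrate what a weighted CDF evaluated on a thinned grid looks like when interest = "dist", here is a base R sketch. It is not the package's code: the helpers `wquantile` and `wcdf`, the simulated `z` and the weights `w` are hypothetical, and the weighted quantile rule used (smallest value whose cumulative weight share reaches the index) is one simple convention among several.

```r
# Sketch: weighted empirical CDF on a grid of weighted quantiles.
set.seed(1)
z <- round(rnorm(1000, mean = 50, sd = 10), 1)   # hypothetical variable
w <- runif(1000, 0.5, 1.5)                       # hypothetical weights

# Weighted quantile: smallest z whose cumulative weight share reaches p.
wquantile <- function(z, w, probs) {
  o <- order(z)
  cw <- cumsum(w[o]) / sum(w)
  sapply(probs, function(p) z[o][which(cw >= p)[1]])
}

# Weighted empirical CDF evaluated at the given points.
wcdf <- function(points, z, w) {
  sapply(points, function(p) sum(w[z <= p]) / sum(w))
}

# Full support: every sorted unique value (can be large).
support_full <- sort(unique(z))

# Thinned support: sorted unique weighted quantiles at range_cb.
range_cb <- c(1:99) / 100
support_grid <- sort(unique(wquantile(z, w, range_cb)))

cdf_grid <- wcdf(support_grid, z, w)
```

The thinned grid has at most `length(range_cb)` points, which is the memory saving the `range_cb` argument is meant to provide.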

Examples

library(SortedEffects)
data("mortgage")
### Regression specification
fm <- deny ~ black + p_irat + hse_inc + ccred + mcred + pubrec +
  ltv_med + ltv_high + denpmi + selfemp + single + hischl
### Specify characteristics of interest
t <- c("deny", "p_irat", "black", "hse_inc", "ccred", "mcred", "pubrec",
       "denpmi", "selfemp", "single", "hischl", "ltv_med", "ltv_high")
### Issue the ca command (b = 50 bootstrap draws keeps the example fast)
CA <- ca(fm = fm, data = mortgage, var = "black", method = "logit",
         cl = "diff", t = t, b = 50, bc = TRUE)

SortedEffects documentation built on March 22, 2022, 9:05 a.m.
