# ddc: Data Defect Correlation In kuriwaki/ddi: The Data Defect Index for Samples that May not be IID

## Description

The data defect correlation (ddc) is the correlation between response (the indicator for selection into the sample) and group membership (the outcome of interest). When sample selection is independent across members of the population, the ddc is zero. Currently both variables must be binary. The data defect index (ddi) is the square of the ddc; the squared form is more useful for characterizing the asymptotics of the MSE.
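As a hedged sketch of where this quantity comes from, the identity in Meng (2018) decomposes the estimation error into three factors. The symbols $\rho$ (the ddc) and $\sigma$ (the population standard deviation of the binary outcome) are notation introduced here for illustration; `mu`, `muhat`, `N`, and `n` correspond to the function arguments. This covers the unweighted case only and is not a statement of the package's exact internals.

$$
\hat{\mu} - \mu \;=\; \rho \times \sqrt{\frac{N - n}{n}} \times \sigma,
\qquad \sigma = \sqrt{\mu(1 - \mu)}
$$

Solving for the correlation gives

$$
\rho \;=\; \frac{\hat{\mu} - \mu}{\sqrt{\dfrac{N - n}{n}}\,\sqrt{\mu(1 - \mu)}},
\qquad \text{ddi} = \rho^{2}.
$$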

## Usage

```
ddc(mu, muhat, N, n, cv = NULL)
```

## Arguments

| Argument | Description |
|----------|-------------|
| `mu` | Vector of the population quantity of interest |
| `muhat` | Vector of the sample estimate |
| `N` | Vector of population sizes |
| `n` | Vector of sample sizes |
| `cv` | Coefficient of variation of the weights, if survey weights exist and `muhat` is the weighted proportion. The coefficient of variation is a summary statistic computed by `sd(weights) / mean(weights)`. |
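A minimal sketch of how the `cv` argument could be prepared, using the `sd(weights) / mean(weights)` formula from the argument description. The weight vector `w` is made-up illustrative data, not taken from the package.

```r
# Hypothetical survey weights (illustrative values only)
w <- c(0.5, 1.0, 1.5, 2.0)

# Coefficient of variation of the weights, as described for `cv`
cv <- sd(w) / mean(w)

# Equal weights have no variation, so their coefficient of variation is 0
cv_equal <- sd(rep(1, 4)) / mean(rep(1, 4))
```

The resulting value would then be passed as `cv = cv`, with `muhat` set to the weighted proportion.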

## Value

A vector of ddc values of the same length as the inputs, or a scalar if all inputs are scalars.
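To illustrate the scalar-in, scalar-out and vector-in, vector-out behavior, here is a rough sketch that computes the unweighted ddc by hand from the identity in Meng (2018). The helper `ddc_sketch` is hypothetical and is not the package's implementation; it ignores `cv`, and the input values are invented.

```r
# Hypothetical helper, NOT the ddi package's own code.
# Assumes a binary outcome and no survey weights.
ddc_sketch <- function(mu, muhat, N, n) {
  f <- n / N                     # sampling fraction
  sigma <- sqrt(mu * (1 - mu))   # population SD of a binary outcome
  # Meng (2018): muhat - mu = ddc * sqrt((1 - f) / f) * sigma
  (muhat - mu) / (sqrt((1 - f) / f) * sigma)
}

# Scalar inputs return a scalar
x <- ddc_sketch(mu = 0.5, muhat = 0.45, N = 1e6, n = 1e3)

# Vector inputs return one value per element
y <- ddc_sketch(mu = c(0.5, 0.4), muhat = c(0.45, 0.42),
                N = c(1e6, 2e6), n = c(1e3, 5e3))
```

Because every operation is element-wise, the same function body handles both cases, which is also why `ddc` can be called inside `with()` or `transmute()` on a data frame of per-state values.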

## References

Meng, Xiao-Li (2018) <doi:10.1214/18-AOAS1161SF>, "Statistical Paradises and Paradoxes in Big Data (I): Law of Large Populations, Big Data Paradox, and the 2016 US Presidential Election." Annals of Applied Statistics 12:2, 685–726.

## Examples

```
library(ddi)     # provides ddc() and the g2016 dataset
library(tibble)
library(dplyr)

data(g2016)

# 1. scalar input
select(g2016, cces_pct_djt_vv, cces_n_vv, tot_votes, votes_djt) %>%
  summarize_all(sum)

## plug those numbers in
ddc(mu = 62984824/136639786, muhat = 12284/35829, N = 136639786, n = 35829)

# 2. vector input using "with"
with(g2016, ddc(mu = pct_djt_voters, muhat = cces_pct_djt_vv, N = tot_votes, n = cces_n_vv))

# 3. vector input in tidy tibble
transmute(g2016, st, ddc = ddc(mu = pct_djt_voters, muhat = cces_pct_djt_vv, N = tot_votes, n = cces_n_vv))
```

kuriwaki/ddi documentation built on May 16, 2020, 12:47 p.m.