checkData: Detailed unit data check and screener by data availability


View source: R/coin_datacheck.R

Description

Gives detailed tables of data availability, and optionally screens units based on a data availability threshold and presence of zeros. Units can be optionally "forced" to be included or excluded, making exceptions for the data availability threshold.

Usage

checkData(
  COIN,
  dset = NULL,
  ind_thresh = NULL,
  zero_thresh = NULL,
  unit_screen = "none",
  Force = NULL,
  out2 = "COIN"
)

Arguments

COIN

The COIN object

dset

The data set to be checked/screened

ind_thresh

A data availability threshold, specified as a fraction. Units with data availability below this value are flagged as having low data, and are screened out if unit_screen is set to screen by data availability. Default 0.66.

zero_thresh

As ind_thresh but for non-zero values. Defaults to 0.05, i.e. it will flag any units with less than 5% non-zero values (equivalently more than 95% zero values).

unit_screen

Specifies whether and how to screen units based on data availability or zero values.

  • If set to "none" (default), does not screen any units.

  • If set to "byNA", screens out units with data availability below ind_thresh.

  • If set to "byzeros", screens out units with a fraction of non-zero values below zero_thresh.

  • If set to "byNAandzeros", screens out units for which either of the previous two criteria is true.

  • If you simply want to force a unit or units to be excluded (without any other screening), use the Force argument and set unit_screen = TRUE.

Any setting other than "none" outputs a new data set .$Data$Screened.
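The screening options above can be sketched in base R as follows. This is only an illustration of the documented rules, not COINr's internal code: the helper screen_out() and the toy vectors avail and nonzero are hypothetical names introduced here.

```r
# Hypothetical helper (not part of COINr): returns TRUE for each unit that
# would be screened out under the documented unit_screen options.
screen_out <- function(avail, nonzero, unit_screen = "none",
                       ind_thresh = 0.66, zero_thresh = 0.05) {
  low_data       <- avail < ind_thresh      # data availability too low
  too_many_zeros <- nonzero < zero_thresh   # too few non-zero values
  switch(unit_screen,
    none         = rep(FALSE, length(avail)),
    byNA         = low_data,
    byzeros      = too_many_zeros,
    byNAandzeros = low_data | too_many_zeros,
    stop("unknown unit_screen option: ", unit_screen)
  )
}

# Toy fractions of available and non-zero values for three made-up units
avail   <- c(A = 0.95, B = 0.50, C = 0.80)
nonzero <- c(A = 0.90, B = 0.70, C = 0.02)

screen_out(avail, nonzero, "byNA")          # only B falls below ind_thresh
screen_out(avail, nonzero, "byNAandzeros")  # B (low data) and C (mostly zeros)
```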

Force

A data frame listing any units (e.g. countries) whose inclusion or exclusion should be forced. The first column is "UnitCode". The second column, "Status", is either "Include" or "Exclude" for each unit to force.
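A minimal sketch of a Force data frame with the two columns described above. The unit codes "AAA" and "BBB" are placeholders, not codes from any real data set, and the commented call at the end only shows where the data frame would be passed.

```r
# Placeholder unit codes; replace with codes that exist in your COIN
Force <- data.frame(
  UnitCode = c("AAA", "BBB"),
  Status   = c("Include", "Exclude"),
  stringsAsFactors = FALSE
)

# Basic sanity check on the Status column before passing it on
stopifnot(all(Force$Status %in% c("Include", "Exclude")))

# Assumed usage, following the signature in the Usage section:
# checkData(COIN, dset = "Raw", unit_screen = "byNA", Force = Force)
```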

out2

Where to output the results. If "COIN" (default for COIN input), appends the results to the updated COIN. If "list", outputs the results as a list.

Details

The two main criteria of interest are NA values and zeros. The summary table gives, for each unit, the percentage of NA values across indicators and the percentage of zero values (as a percentage of non-NA values). Each unit is flagged as having low data or too many zeros based on the thresholds.
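To make these quantities concrete, the following base R sketch builds a comparable summary for a toy indicator table. It is an illustration under stated assumptions rather than the function's actual implementation: the column names PrcNA, PrcZero, LowData and TooManyZeros are made up here, and the non-zero fraction is assumed to be taken over non-NA values.

```r
# Toy indicator data: three made-up units, four indicators
ind <- data.frame(
  UnitCode = c("A", "B", "C"),
  x1 = c(1, NA, 0),
  x2 = c(2, NA, 0),
  x3 = c(0, 5, 0),
  x4 = c(4, NA, NA)
)

vals   <- as.matrix(ind[, -1])
n_ind  <- ncol(vals)                       # number of indicators
n_na   <- rowSums(is.na(vals))             # missing values per unit
n_zero <- rowSums(vals == 0, na.rm = TRUE) # zeros per unit
n_obs  <- n_ind - n_na                     # non-NA values per unit

summ <- data.frame(
  UnitCode = ind$UnitCode,
  PrcNA    = 100 * n_na / n_ind,   # % NA across indicators
  PrcZero  = 100 * n_zero / n_obs, # % zeros among non-NA values
  stringsAsFactors = FALSE
)

# Flags using the documented default thresholds
summ$LowData      <- (1 - n_na / n_ind) < 0.66
summ$TooManyZeros <- (1 - n_zero / n_obs) < 0.05
print(summ)
```

In this toy table, unit B is flagged for low data (only one of four indicators is available) and unit C for too many zeros (all of its non-NA values are zero).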

This function currently only supports COINs as inputs, not data frames.

Value

An updated COIN with data frames showing missing data in .$Analysis and, if unit_screen != "none", a new data set .$Data$Screened. If out2 = "list", the missing data statistics and the screened data set are instead returned together in a list.

Examples

library(COINr)
# build ASEM COIN
ASEM <- assemble(IndData = ASEMIndData, IndMeta = ASEMIndMeta, AggMeta = ASEMAggMeta)
# screen units by data availability (threshold 90%), returning the
# data availability stats and the screened data set as a list
ScreenedData <- checkData(ASEM, dset = "Raw", unit_screen = "byNA",
                          ind_thresh = 0.9, out2 = "list")
# See which units were removed
print(ScreenedData$RemovedUnits)

COINr documentation built on Nov. 30, 2021, 9:06 a.m.