Unit Screening

Unit screening filters units according to data availability rules. As with indicators (columns), when a unit (row) has very few data points available it may make sense to remove it, which avoids drawing conclusions about units on the basis of very little data. Removing such units also generally increases the percentage data availability of each indicator.
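To make the idea concrete, here is a minimal base R sketch of an availability rule, independent of COINr. The toy data frame, its values and the 75% threshold are assumptions for illustration only; this is not how Screen() is implemented internally.

```r
# Toy data: 5 units, 4 indicators (hypothetical values, not from ASEM_iData)
toy <- data.frame(
  uCode = c("A", "B", "C", "D", "E"),
  x1 = c(1, NA, 3, NA, 5),
  x2 = c(2, NA, 1, 4, 3),
  x3 = c(NA, NA, 2, 6, 1),
  x4 = c(4, 7, 5, NA, 2)
)
ind_cols <- setdiff(names(toy), "uCode")

# Fraction of non-missing indicator values for each unit (row)
unit_avail <- rowMeans(!is.na(toy[ind_cols]))
names(unit_avail) <- toy$uCode

# Keep units with at least 75% data availability (assumed threshold)
screened <- toy[unit_avail >= 0.75, ]

# Indicator-level (column) availability before and after screening
avail_before <- colMeans(!is.na(toy[ind_cols]))
avail_after <- colMeans(!is.na(screened[ind_cols]))

unit_avail
rbind(avail_before, avail_after)
```

In this toy case the units with one or two observed values are dropped, and no indicator ends up with lower availability than it had before screening.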

The COINr function Screen() is a generic function with methods for data frames, coins and purses. It is a building function in that, when applied to a coin, it creates a new data set in .$Data as its output.

Data frames

We begin with data frames. Let's take a subset of the inbuilt example data for demonstration. I cherry-pick some rows and columns which have some missing values.

library(COINr)

# example data
iData <- ASEM_iData[40:51, c("uCode", "Research", "Pat", "CultServ", "CultGood")]

iData

The data has four indicators, plus an identifier column "uCode". Looking at each unit, the data availability is variable. We have 12 units in total.

Now let's use Screen() to screen out some of these units. Specifically, we will remove any units that have less than 75% data availability, i.e. we keep only units with non-NA values for at least 3 of the 4 indicators:

l_scr <- Screen(iData, unit_screen = "byNA", dat_thresh = 0.75)

The output of Screen() is a list:

str(l_scr, max.level = 1)

We can see already that the "RemovedUnits" entry tells us that three units were removed based on our specifications. We now have our new screened data set:

l_scr$ScreenedData

And we have a summary of data availability and some other things:

head(l_scr$DataSummary)

This table is in fact generated by get_data_avail(); some more details can be found in the Analysis vignette.

Other than data availability, units can also be screened based on the presence of zeros, or on both; this is specified by the unit_screen argument. Use the Force argument (Luke. Sorry.) to override the screening rules for specified units if required (either to force inclusion or force exclusion).
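The logic of combining a missing-data rule, a zero rule and a manual override can be sketched in base R as follows. Everything here is an assumption for illustration: the toy data, both thresholds, the choice to measure non-zero values as a share of observed values, and the layout of the hypothetical force_tab override table. Check ?Screen for the argument names and definitions that COINr actually uses.

```r
# Toy data: 4 units, 4 indicators (hypothetical values)
toy <- data.frame(
  uCode = c("A", "B", "C", "D"),
  x1 = c(0, 2, 0, 1),
  x2 = c(0, 3, NA, 5),
  x3 = c(0, 1, 4, NA),
  x4 = c(7, 0, 2, NA)
)
X <- toy[setdiff(names(toy), "uCode")]

# Share of observed (non-NA) values per unit
frac_avail <- rowMeans(!is.na(X))
# Share of non-zero values per unit, among observed values (an assumption)
frac_nonzero <- rowMeans(X != 0, na.rm = TRUE)

# Assumed thresholds: 75% availability and 50% non-zero
pass_na <- frac_avail >= 0.75
pass_zero <- frac_nonzero >= 0.5

# Screening on both criteria at once
keep_rules <- pass_na & pass_zero

# Hypothetical override table: force unit D in and unit B out
force_tab <- data.frame(uCode = c("D", "B"), Include = c(TRUE, FALSE))
keep <- keep_rules
keep[match(force_tab$uCode, toy$uCode)] <- force_tab$Include

toy$uCode[keep_rules]  # units kept by the rules alone
toy$uCode[keep]        # units kept after applying the overrides
```

The override is applied last, so it takes precedence over both rules in either direction.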

Coins

Screening coins is very similar to screening data frames, because the coin method extracts the relevant data set, passes it to the data frame method, and then puts the output back as a new data set. This means the arguments are almost the same. The only differences are that you specify which data set to screen, the name to give the new data set, and whether to output a coin or a list.

We'll build the example coin, then screen the raw data set with a threshold of 85% data availability, giving the new data set a name other than the default, "Screened":

# build example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

# screen units from raw dset
coin <- Screen(coin, dset = "Raw", unit_screen = "byNA", dat_thresh = 0.85, write_to = "Filtered_85pc")

# some details about the coin by calling its print method
coin

The printed summary shows that the new data set only has 48 units, compared to the raw data set with 51. We can find which units were filtered because this is stored in the coin's "Analysis" sub-list:

coin$Analysis$Filtered_85pc$RemovedUnits

The Analysis sub-list also contains the data availability table that is output by Screen(). As with the data frame method, we can also choose to screen units by presence of zeros, or a combination of zeros and missing values.
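As a quick check, the screened data set can be pulled out of the coin and compared with the raw one. The sketch below rebuilds the coin so that it runs on its own, and assumes that get_dset() is the accessor for retrieving a named data set from a coin:

```r
library(COINr)

# rebuild and screen the example coin, as above
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)
coin <- Screen(coin, dset = "Raw", unit_screen = "byNA",
               dat_thresh = 0.85, write_to = "Filtered_85pc")

# retrieve the raw and screened data sets as data frames
X_raw <- get_dset(coin, dset = "Raw")
X_filtered <- get_dset(coin, dset = "Filtered_85pc")

# compare the number of units (rows) before and after screening
nrow(X_raw)
nrow(X_filtered)

# list what was recorded for this operation in the Analysis sub-list
names(coin$Analysis$Filtered_85pc)
```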

Purses

For completeness we also demonstrate the purse method. Like most purse methods, this simply applies the coin method to each coin in the purse, without any special features. Here, we perform the same example as in the coin section, but on a purse of coins:

# build example purse
purse <- build_example_purse(up_to = "new_coin", quietly = TRUE)

# screen units in all coins to 85% data availability
purse <- Screen(purse, dset = "Raw", unit_screen = "byNA",
                dat_thresh = 0.85, write_to = "Filtered_85pc")
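To see which units were removed from each coin, one option is to loop over the coins in the purse. This sketch assumes that the purse is a data frame with a Time column and a list column named coin, and that each screened coin stores its removed units in the same Analysis entry as in the coin example; verify both against your installed version of COINr.

```r
library(COINr)

# rebuild and screen the example purse, as above
purse <- build_example_purse(up_to = "new_coin", quietly = TRUE)
purse <- Screen(purse, dset = "Raw", unit_screen = "byNA",
                dat_thresh = 0.85, write_to = "Filtered_85pc")

# extract the removed units from each coin (assumed purse structure)
removed_by_time <- lapply(purse$coin, function(coin) {
  coin$Analysis$Filtered_85pc$RemovedUnits
})
names(removed_by_time) <- purse$Time

removed_by_time
```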


COINr documentation built on Oct. 9, 2023, 5:07 p.m.