treat: Treatment of outliers


View source: R/coin_treat.R

Description

Takes a COIN object (or a data frame) and Winsorises indicators where necessary or as specified, or else applies a log transformation or similar. Indicators are treated one at a time.

Usage

treat(
  COIN,
  dset = NULL,
  winmax = NULL,
  winchange = NULL,
  deflog = NULL,
  boxlam = NULL,
  t_skew = NULL,
  t_kurt = NULL,
  individual = NULL,
  indiv_only = NULL,
  bypass_all = NULL
)

Arguments

COIN

The COIN object, or a data frame of indicator data

dset

The name of the data set to treat, e.g. "Raw"

winmax

The maximum number of points to Winsorise for each indicator. If NA, Winsorisation continues until the skewness and kurtosis thresholds are achieved (but this is likely to cause errors).

winchange

Logical: if TRUE (default), Winsorisation can change direction from one iteration to the next. If FALSE, the direction of Winsorisation cannot change.

deflog

The type of transformation to apply if Winsorisation fails. Options are:

  • "log" Simple log(x) transform (note: indicators containing negative values will be skipped).

  • "CTlog" log(x - min(x) + a), where a <- 0.01*(max(x) - min(x)), similar to the transformation used in the COIN Tool.

  • "CTlog_orig" Exactly the COIN Tool log transformation, log(x - min(x) + 1).

  • "GIIlog" The GII log transformation.

  • "boxcox" A Box-Cox transformation. In this case, boxlam should also be specified.

  • "none" Returns the indicator untreated.

boxlam

The lambda parameter of the Box-Cox transform.

t_skew

Absolute skew threshold (default 2)

t_kurt

Kurtosis threshold (default 3.5)

individual

A data frame specifying individual treatment for each indicator, with each row corresponding to one indicator to be treated. Columns are:

  • IndCode The code of the indicator to be treated.

  • Treat The type of treatment to apply, one of "win" (Winsorise), "log" (log), "GIIlog" (GII log), "CTlog" (COIN Tool log), "boxcox" (Box-Cox), or "None" (no treatment).

  • Winmax The maximum number of points to Winsorise. Ignored if the corresponding entry in "Treat" is not "win".

  • Thresh Either NA, which means that Winsorisation will continue up to winmax with no checks on skew and kurtosis, or "thresh", which uses the skew and kurtosis thresholds specified in t_skew and t_kurt.

  • boxlam The lambda parameter for the Box-Cox transformation.

indiv_only

Logical: if TRUE, only the indicators specified in "individual" are treated. If FALSE, all indicators are treated: any not listed in "individual" receive the default treatment.

bypass_all

Logical: if TRUE, bypasses all data treatment and returns the original data. This is useful for sensitivity analysis and comparing the effects of turning data treatment on and off.
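The individual argument expects a data frame with the columns listed above. The following is a minimal sketch of how such a table might be built in base R. The indicator codes "Ind1" to "Ind3" are hypothetical placeholders, and the values chosen (a Winsorisation limit of 3, a Box-Cox lambda of 0.5) are illustrative assumptions, not recommended settings.

```r
# Hypothetical indicator codes: replace with codes from your own indicator metadata
indiv <- data.frame(
  IndCode = c("Ind1", "Ind2", "Ind3"),
  Treat   = c("win", "boxcox", "None"),  # one treatment type per indicator
  Winmax  = c(3, NA, NA),                # only used when Treat is "win"
  Thresh  = c("thresh", NA, NA),         # "thresh": check skew/kurtosis thresholds; NA: Winsorise up to Winmax
  boxlam  = c(NA, 0.5, NA),              # only used when Treat is "boxcox"
  stringsAsFactors = FALSE
)
str(indiv)

# With COINr loaded and a COIN assembled, the table would be passed as, e.g.:
# ASEM <- treat(ASEM, dset = "Raw", individual = indiv, indiv_only = FALSE)
```

With indiv_only = FALSE, indicators not listed in the table would still receive the default treatment.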

Details

Outliers are identified according to skewness and kurtosis thresholds. The algorithm attempts to reduce the absolute skew and kurtosis by successively Winsorising points up to a specified limit (winmax). If this limit is reached and the thresholds are still exceeded, it applies a nonlinear transformation instead (as specified by deflog).
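The Winsorisation loop can be illustrated with a minimal base-R sketch. This is not COINr's internal code: the moment-based skewness and excess-kurtosis estimators, and the rule that an indicator is flagged only when both thresholds are exceeded, are assumptions made for illustration. The function names are hypothetical. The direction of Winsorisation is recomputed from the sign of the skew at every iteration, mirroring winchange = TRUE.

```r
# Moment-based sample skewness (assumption: COINr may use a different estimator)
skew_m <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based excess kurtosis (also an assumption)
kurt_m <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Assumed outlier rule: both |skew| and kurtosis exceed their thresholds
has_outliers <- function(x, t_skew = 2, t_kurt = 3.5) {
  if (length(unique(x[!is.na(x)])) < 2) return(FALSE)  # constant data: nothing to treat
  abs(skew_m(x)) > t_skew && kurt_m(x) > t_kurt
}

# Hypothetical helper: Winsorise one point at a time until the flag clears or winmax is hit
winsorise_sketch <- function(x, winmax = 5, t_skew = 2, t_kurt = 3.5) {
  limit <- if (is.na(winmax)) Inf else winmax  # NA: keep going until thresholds are met
  n_win <- 0
  while (n_win < limit && has_outliers(x, t_skew, t_kurt)) {
    ok <- !is.na(x)
    if (skew_m(x) > 0) {
      # positive skew: pull the highest value(s) down to the next-highest value
      top <- max(x[ok])
      x[ok & x == top] <- max(x[ok & x < top])
    } else {
      # negative skew: pull the lowest value(s) up to the next-lowest value
      bottom <- min(x[ok])
      x[ok & x == bottom] <- min(x[ok & x > bottom])
    }
    n_win <- n_win + 1
  }
  list(x = x, n_win = n_win, success = !has_outliers(x, t_skew, t_kurt))
}

res <- winsorise_sketch(c(1:20, 100), winmax = 5)
res$n_win    # 1: the single outlier (100) is pulled down to 20
res$success  # TRUE
```

If success were FALSE after winmax points, the algorithm described above would fall back to the transformation chosen in deflog. Tied extreme values are clipped together in one step here, which is a simplification.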

The process is detailed in the COINr online documentation.
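The fallback transformations described under deflog can be sketched in base R from the formulas given above. This is a hypothetical helper, not COINr's implementation. "GIIlog" is omitted because its formula is not stated on this page. The Box-Cox branch uses the standard definition (x^lambda - 1)/lambda, with log(x) when lambda is 0, which is an assumption about how COINr parameterises it. Skipping zeros as well as negative values in the "log" option is also an assumption, since log(0) is -Inf.

```r
# Hypothetical helper sketching the deflog options documented above
apply_deflog <- function(x, deflog = c("log", "CTlog", "CTlog_orig", "boxcox", "none"),
                         boxlam = NULL) {
  deflog <- match.arg(deflog)
  rng <- range(x, na.rm = TRUE)  # rng[1] = min(x), rng[2] = max(x)
  switch(deflog,
    # simple log; the indicator is returned untreated if any value is not positive
    log = if (any(x <= 0, na.rm = TRUE)) x else log(x),
    # log(x - min(x) + a), with a = 0.01 * (max(x) - min(x))
    CTlog = log(x - rng[1] + 0.01 * (rng[2] - rng[1])),
    # original COIN Tool version: log(x - min(x) + 1)
    CTlog_orig = log(x - rng[1] + 1),
    # standard Box-Cox transform; requires positive values and a lambda
    boxcox = {
      if (is.null(boxlam)) stop("boxlam must be specified for deflog = \"boxcox\"")
      if (any(x <= 0, na.rm = TRUE)) stop("Box-Cox requires positive values")
      if (boxlam == 0) log(x) else (x^boxlam - 1) / boxlam
    },
    none = x
  )
}

apply_deflog(c(0, 10), "CTlog")                 # log(0.1), log(10.1)
apply_deflog(c(1, 2, 3), "boxcox", boxlam = 1)  # 0 1 2
```

Note that "CTlog" and "CTlog_orig" shift the data so the minimum maps to log(a) or log(1) = 0, which is why they can handle negative values where the simple log cannot.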

Value

If the input is a COIN, outputs an updated COIN with a new treated data set at .$Data$Treated, as well as information about the data treatment in .$Analysis$Treated. If the input is a data frame, returns a list containing both the treated data set and the information about the data treatment.

See Also

Examples

library(COINr)
# assemble ASEM COIN from the example data shipped with COINr
ASEM <- assemble(IndData = ASEMIndData, IndMeta = ASEMIndMeta, AggMeta = ASEMAggMeta)
# treat raw data set, Winsorise up to a maximum of five points
ASEM <- treat(ASEM, dset = "Raw", winmax = 5)
# inspect what was done
ASEM$Analysis$Treated$TreatSummary
# check whether skew and kurtosis now within limits
ASEM$Analysis$Treated$StatTable$SK.outlier.flag

COINr documentation built on Nov. 30, 2021, 9:06 a.m.