impute_dataset: Imputation to make a dataset complete

impute_dataset                R Documentation

Imputation to make a dataset complete

Description

Missing values at the start and at the end of the time span can be handled in two ways: the corresponding years can be dropped ("cut"), or the nearest observed value can be propagated ("constant"). All other missing values within the dataset are filled by deterministic linear imputation, so that the resulting data are complete.
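As an illustration of what deterministic linear imputation means for interior gaps, here is a minimal sketch in base R. It is not the package's implementation: the helper name interpolate_interior is hypothetical, and stats::approx is used only as a convenient stand-in for linear interpolation along the time axis.

```r
# Hypothetical helper, not part of convergEU: linearly interpolate the
# interior NAs of one series against its time index.
interpolate_interior <- function(time, values) {
  observed <- !is.na(values)
  # approx() returns NA outside the observed range by default (rule = 1),
  # so gaps at either end of the series are left untouched here.
  stats::approx(x = time[observed], y = values[observed], xout = time)$y
}

time <- c(1988, 1989, 1990, 1991)
uk   <- c(998, NA, 1150, 1600)
it   <- c(332, NA, NA, 802)

# 1989 becomes the midpoint of 998 and 1150, that is 1074.
uk_filled <- interpolate_interior(time, uk)
# 1989 and 1990 fall on the straight line from 332 to 802.
it_filled <- interpolate_interior(time, it)
```

The imputation is deterministic in the sense that the same input always yields the same filled values; no random draw is involved.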

Usage

impute_dataset(
  myTB,
  countries,
  timeName = "time",
  tailMiss = c("cut", "constant")[2],
  headMiss = c("cut", "constant")[1]
)

Arguments

myTB

a dataset (tibble) in time-by-countries format for a given indicator, sorted by time. Note that the times corresponding to missing data must be present as rows in the dataset.

countries

the collection of labels representing countries to process.

timeName

the string that represents the name of the time variable.

tailMiss

what should be done with consecutive missing values starting at the oldest year: cut those years, or impute constant values equal to the value of the first observed year.

headMiss

what should be done with consecutive missing values ending at the last year: cut those years, or impute constant values equal to the value of the last observed year.
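The difference between "cut" and "constant" at the two ends of a series can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the helper fill_edges and its arguments leading and trailing are hypothetical, it works on a single series rather than on a whole tibble, and it assumes at least one observed value.

```r
# Hypothetical helper, not part of convergEU: handle gaps at the two
# ends of a single series, either dropping them or holding a constant.
fill_edges <- function(time, values, leading = "constant", trailing = "cut") {
  observed <- which(!is.na(values))
  stopifnot(length(observed) > 0)  # assumption: the series is not all NA
  first_obs <- min(observed)
  last_obs  <- max(observed)
  n    <- length(values)
  keep <- rep(TRUE, n)

  if (first_obs > 1) {
    lead_idx <- seq_len(first_obs - 1)
    if (leading == "constant") {
      values[lead_idx] <- values[first_obs]  # hold the first observed value
    } else {
      keep[lead_idx] <- FALSE                # drop the leading years
    }
  }
  if (last_obs < n) {
    trail_idx <- seq(last_obs + 1, n)
    if (trailing == "constant") {
      values[trail_idx] <- values[last_obs]  # hold the last observed value
    } else {
      keep[trail_idx] <- FALSE               # drop the trailing years
    }
  }
  data.frame(time = time[keep], value = values[keep])
}

time <- 1988:1992
x    <- c(NA, 868, 978, NA, NA)

held    <- fill_edges(time, x, leading = "constant", trailing = "constant")
trimmed <- fill_edges(time, x, leading = "cut", trailing = "cut")
```

With "constant" the series keeps all five years; with "cut" only the years 1989 and 1990 remain.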

Value

a list with three components: "res", the dataset (tibble) without missing values; "msg"; and "err".
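A caller will typically want to check the returned list before using "res". The sketch below shows one way to do that. It rests on an assumption that this page does not confirm, namely that "err" is NULL when the call succeeded; the list is built by hand so that the snippet runs without the package, and get_result is a hypothetical helper.

```r
# Stand-in for the list returned by impute_dataset(), built by hand so
# the snippet runs without the package. Field contents are assumptions.
out <- list(
  res = data.frame(time = c(1988, 1989), UK = c(998, 1074)),
  msg = NULL,
  err = NULL
)

# Hypothetical helper. Assumed convention: "err" is NULL on success.
get_result <- function(out) {
  if (!is.null(out$err)) {
    stop("imputation failed: ", paste(out$err, collapse = "; "))
  }
  out$res
}

imputed <- get_result(out)
```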

References

https://www.eurofound.europa.eu/system/files/2022-04/introduction-to-the-convergeu-package-0.6.4-tutorial-v2-apr2022.pdf

Examples



# Example 1
# Dataset in the format time by countries with missing values:
myTB2 <- tibble::tribble(
    ~time, ~UK,  ~DE,  ~IT,
    1988,  998,  1250, 332,
    1989,  NA,   868,  NA,
    1990,  1150, 978,  NA,
    1991,  1600, NA,   802
    )
toBeProcessed <- c("UK", "DE", "IT")
# Simplest Imputation using option "cut":
resImpu <- impute_dataset(myTB2, countries=toBeProcessed,
                         timeName = "time",
                         tailMiss = c("cut", "constant")[1],
                         headMiss = c("cut", "constant")[1])


# Imputation using option "constant":
resImpu1 <- impute_dataset(myTB2, countries=toBeProcessed,
    timeName = "time",
    tailMiss = c("cut", "constant")[2],
    headMiss = c("cut", "constant")[2])

# Imputation using both options "cut" and "constant":
resImput <- impute_dataset(myTB2, countries=toBeProcessed,
    timeName = "time",
    tailMiss = c("cut", "constant")[2],
    headMiss = c("cut", "constant")[1])

# Example 2
# Dataset in the format time by countries for the indicator "JQIintensity_i":
myTB <- extract_indicator_EUF(
    indicator_code = "JQIintensity_i", # Code_in_database
    fromTime = 1965,
    toTime = 2016,
    gender = c("Total", "Females", "Males")[1],
    countries = convergEU_glb()$EU27$memberStates$codeMS)

# Imputation of missing values, option "cut":
myTBinp <- impute_dataset(myTB$res, timeName = "time",
    countries=convergEU_glb()$EU27$memberStates$codeMS,
    tailMiss = c("cut", "constant")[1],
    headMiss = c("cut", "constant")[1])

# Imputation of missing values, option "constant":
myTBinp1 <- impute_dataset(myTB$res, timeName = "time",
    countries=convergEU_glb()$EU27$memberStates$codeMS,
    tailMiss = c("cut", "constant")[2],
    headMiss = c("cut", "constant")[2])


convergEU documentation built on May 29, 2024, 11:15 a.m.