ImputeRegression2: Imputation of a single variable (y) by a regression model...

View source: R/ImputeRegression2.R

ImputeRegression2 R Documentation

Imputation of a single variable (y) by a regression model using a primary explanatory variable (x1) and a secondary explanatory variable (x2) for cases where the primary is missing.

Description

Imputation of a single variable (y) by a regression model using a primary explanatory variable (x1), with a secondary explanatory variable (x2) used for cases where the primary is missing.

Usage

ImputeRegression2(
  data,
  idName = names(data)[1],
  strataName = NULL,
  x1Name = names(data)[3],
  x2Name = names(data)[4],
  yName = names(data)[5],
  method1 = "ordinary",
  method2 = "ordinary",
  limitModel = 2.5,
  limitIterate = 4.5,
  limitImpute = 50,
  returnSameType = TRUE
)

ImputeRegression2NewNames(
  ...,
  oldNames = c("yImputed", "Ntotal", "nImputedTotal", "AnImputedTotal", "BnImputedTotal",
    "estimateTotal", "AyHat", "ByHat", "AestimateOrig", "cvTotal"),
  newNames = c("estimate", "N", "nImputed", "AnImputed", "BnImputed", "estimate",
    "AestimateYHat", "BestimateYHat", "y", "cv")
)

ImputeRegression2Tall(..., iD = TalliD())

ImputeRegression2TallSmall(
  ...,
  iD = TalliD(),
  keep = c("ID", "estimate", "cv", "nImputed")
)

ImputeRegression2Wide(
  ...,
  addName = WideAddName(),
  sep = WideSep(),
  idNames = c("", "strata", ""),
  addLast = FALSE
)

ImputeRegression2WideSmall(
  ...,
  keep = c("id", "strata", "estimate", "cv", "nImputed"),
  addName = WideAddName(),
  sep = WideSep(),
  idNames = c("", "strata", ""),
  addLast = FALSE
)

Arguments

data

Input data set of class data.frame

idName

Name of id-variable(s)

strataName

Name of strata-variable. A single stratum is used when NULL (default).

x1Name

Name of x1-variable

x2Name

Name of x2-variable

yName

Name of y-variable

method1

The method (model and weight) coded as a string: "ordinary" (default), "ratio", "noconstant", "mean" or "ratioconstant". In addition, "ratio2" and "ratioconstant2" are alternatives where the weights are based on the other x-variable (x1<->x2).

method2

Similar to method1 above.

limitModel

Limit for studentized residuals. Observations above this limit are placed in group 2 (correct but not representative).

limitIterate

Studentized residuals limit for iterative calculation of studentized residuals.

limitImpute

Limit for studentized residuals. Observations above this limit are placed in group 3 (wrong).

returnSameType

When TRUE (default) and when the type of input y variable(s) is integer, the output type of yImputed/estimate/estimateTotal is also integer. Estimates/sums are then calculated from rounded imputed values.
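The method strings combine a model formula with a weighting scheme. As a rough illustration in base R, and assuming that "ratio" corresponds to a no-intercept model weighted by 1/x (an assumption made here, not confirmed by this page), the fitted slope is the classical ratio estimator sum(y)/sum(x). The variable names and the reading of "ratio2" below are hypothetical.

```r
# Toy data (hypothetical): x1 is the primary and x2 the secondary explanatory variable.
x1 <- c(2, 4, 5, 8, 10)
x2 <- c(3, 4, 6, 7, 11)
y  <- c(5, 9, 14, 17, 26)

# Assumed reading of method = "ratio": no intercept, weights 1/x1.
# Weighted least squares then gives slope = sum(y) / sum(x1).
fitRatio <- lm(y ~ x1 - 1, weights = 1 / x1)
slopeRatio <- unname(coef(fitRatio))

# Assumed reading of method = "ratio2": same model, weights taken from the other x-variable.
fitRatio2 <- lm(y ~ x1 - 1, weights = 1 / x2)
slopeRatio2 <- unname(coef(fitRatio2))

# Assumed reading of method = "ordinary": intercept and slope, no weights.
fitOrdinary <- lm(y ~ x1)
```

With weights 1/x1 the slope depends only on the two totals, which is why this kind of model is sensitive to zeros in the weighting variable.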

Details

Imputations are initially performed by running method1 using x1 within each stratum. Division into three groups is based on studentized residuals. Studentized residuals are calculated by iteratively removing observations from the model fitting. Values that could not be imputed because x1 is missing are thereafter imputed by running method2 using x2 within each stratum. Combined estimates of seRobust, seEstimate and cv are calculated.
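The residual screening described above can be sketched in base R for a single stratum and an ordinary regression. This is only an illustration under stated assumptions, not the package's implementation: the helper name StudentizedGroups is hypothetical, one observation is dropped per iteration, and residuals for dropped observations are computed from the prediction error of the model fitted without them.

```r
# Hypothetical helper (not part of the Kostra package): iterative screening for y ~ x.
StudentizedGroups <- function(x, y, limitModel = 2.5, limitIterate = 4.5,
                              limitImpute = 50) {
  d <- data.frame(x = x, y = y)
  inModel <- rep(TRUE, nrow(d))
  repeat {
    fit <- lm(y ~ x, data = d[inModel, ])
    rStud <- numeric(nrow(d))
    # Externally studentized residuals for observations used in the fit
    rStud[inModel] <- rstudent(fit)
    if (any(!inModel)) {
      # Outside-model residuals, scaled by the prediction standard error
      p <- predict(fit, newdata = d[!inModel, ], se.fit = TRUE)
      rStud[!inModel] <- (d$y[!inModel] - p$fit) /
        sqrt(p$residual.scale^2 + p$se.fit^2)
    }
    # Drop the most extreme in-model observation while it exceeds limitIterate
    candidates <- ifelse(inModel, abs(rStud), -Inf)
    worst <- which.max(candidates)
    if (candidates[worst] <= limitIterate || sum(inModel) <= 4) break
    inModel[worst] <- FALSE
  }
  # Group 1: within limitModel, group 2: above limitModel, group 3: above limitImpute
  category123 <- 1L + (abs(rStud) > limitModel) + (abs(rStud) > limitImpute)
  data.frame(rStud = rStud, category123 = category123, inModel = inModel)
}

# Toy data: one moderate outlier (row 10) and one gross error (row 20)
x <- 1:30
y <- 2 + 3 * x + sin(1:30)
y[10] <- y[10] + 15
y[20] <- 5000
res <- StudentizedGroups(x, y)
res[c(10, 20), ]
```

In this toy run both unusual rows are removed from the model fitting, but only the gross error exceeds limitImpute and would be replaced by an imputed value.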

Value

Output of the alternative variants of the function is constructed similarly to the variants of ImputeRegression.

Output of ImputeRegression2 and ImputeRegression2NewNames (the latter using the names given after "or" below) is a list of three data sets. micro has as many rows as the input, aggregates has one row for each stratum and total has a single row. Variables from the two imputations are named using "A" and "B". The individual variables (dropping "A" and "B") are:

micro consists of the following elements:

id

id from input

x1

The input x1 variable

x2

The input x2 variable

strata

The input strata variable (can be NULL)

category123

The three imputation groups: representative (1), correct but not representative (2), wrong (3).

yHat or estimateYHat

Fitted values

yImputed or estimate

Imputed y-data

rStud

The final studentized residuals

dffits

The final DFFITS statistic

hii

The final leverages (diagonal elements of hat matrix)

leaveOutResid

The final outside-model residual

aggregates consists of the following elements:

N

Number of observations in each stratum

nImputed

Number of imputed observations in each stratum

estimate

Total estimates from imputed data

cv

Coefficient of variation = seEstimate/estimate

estimateYhat

Total estimate based on model fits

estimateOrig or y

Estimate based on original data with missing set to zero

coef

The final first model coefficient

coefB

The final second model coefficient, or zeros when the model has only one coefficient.

n

The final number of observations in model.

sigmaHat

The final square root of the estimated variance parameter

seEstimate

The final standard error estimate of the total estimate from imputed data

seRobust

Robust variant of seEstimate (experimental)

total consists of the following elements:

Ntotal or N

Number of observations

nImputedTotal or nImputed

Total number of imputed observations

estimateTotal or estimate

Total estimate for all strata

cvTotal or cv

Total cv for all strata
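As a small worked illustration of how the per-stratum and total quantities relate, the sketch below computes cv = seEstimate/estimate for each stratum and then a total cv. The combination rule used for the total (independent strata, so variances add) is an assumption made for this example and is not stated on this page. All numbers are made up.

```r
# Made-up aggregates for three strata (not real Kostra output)
agg <- data.frame(
  strata     = c("a", "b", "c"),
  estimate   = c(1000, 2500, 500),
  seEstimate = c(20, 40, 40)
)

# Per-stratum coefficient of variation, as defined above
agg$cv <- agg$seEstimate / agg$estimate

# Total estimate is the sum over strata
estimateTotal <- sum(agg$estimate)

# ASSUMPTION: strata are independent, so squared standard errors add
seTotal <- sqrt(sum(agg$seEstimate^2))
cvTotal <- seTotal / estimateTotal
```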

Examples


rateData <- KostraData("rateData")                        # Real Kostra data set
w <- rateData$data[, c(17, 19, 3, 16, 5)]                 # Data with id, strata, x1, x2 and y
w <- w[is.finite(w[, "Ny.kostragruppe"]), ]               # Remove Longyearbyen
ImputeRegression2(w, strataName = names(w)[2])            # Works without combining strata
w[w[, "Ny.kostragruppe"] > 13, "Ny.kostragruppe"] <- 13   # Combine small strata
ImputeRegression2(w, strataName = names(w)[2])            # Ordinary regressions
ImputeRegression2(w, strataName = names(w)[2], x1Name = names(w)[4], method1 = "ratio") # x1=x2 and no imputation in round 2
ImputeRegression2(w, strataName = names(w)[2], method1 = "ratio2", method2 = "ratio")   # ratio2 needed since x1=0
ImputeRegression2(w, strataName = names(w)[2], method1 = "ratioconstant2", method2 = "ratioconstant")
ImputeRegression2Tall(w, strataName = names(w)[2])
ImputeRegression2TallSmall(w, strataName = names(w)[2])
ImputeRegression2Wide(w, strataName = names(w)[2])
ImputeRegression2WideSmall(w, strataName = names(w)[2])
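The two-round logic (x1 first, x2 where x1 is missing) can also be tried without the Kostra data. The following base-R sketch uses a made-up data frame, ordinary regressions and a single stratum; it skips the residual screening and is not the package's implementation. The column names source and yImputed are chosen for this example only.

```r
# Made-up data: y is missing in rows 6-8, x1 is also missing in rows 6 and 8
d <- data.frame(
  id = 1:8,
  x1 = c(10, 20, 30, 40, 50, NA, 70, NA),
  x2 = c(12, 19, 33, 41, 48, 61, 69, 80),
  y  = c(21, 39, 61, 80, 99, NA, NA, NA)
)

# Round A: model on x1; lm() drops rows with missing values when fitting
fitA <- lm(y ~ x1, data = d)
yHatA <- predict(fitA, newdata = d)   # NA where x1 is missing

# Round B: model on x2, used only where round A could not produce a value
fitB <- lm(y ~ x2, data = d)
yHatB <- predict(fitB, newdata = d)

# Record where each value comes from, then build the imputed variable
d$source <- ifelse(!is.na(d$y), "observed", ifelse(!is.na(yHatA), "A", "B"))
d$yImputed <- ifelse(d$source == "observed", d$y,
                     ifelse(d$source == "A", yHatA, yHatB))
d
```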

statisticsnorway/Kostra documentation built on July 8, 2023, 5:58 a.m.