LmImpute: INTERNAL FUNCTION: Regression imputation.
In statisticsnorway/Kostra: Functions for Kostra

LmImpute

R Documentation

INTERNAL FUNCTION: Regression imputation.

Description

Imputation by weighted regression, using lm, allowing multiple explanatory variables and multiple response variables. Missing and wrong values (category 3) are imputed by a model fitted to representative data (category 1). Some data are considered correct but not representative (category 2).
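
The three categories can be illustrated with base R alone. The following is a minimal one-pass sketch, not the package implementation: it assumes externally studentized residuals from `rstudent`, skips the iterative recalculation controlled by `limitIterate` and `maxiter`, and ignores missing x. The data frame `d` and its values are hypothetical.

```r
# Toy data (hypothetical values): row 5 has a missing y, row 9 is a gross outlier
d <- data.frame(
  x = 1:10,
  y = c(2.1, 3.9, 6.2, 8.1, NA, 12.2, 13.8, 16.1, 40, 20.1)
)
limitModel  <- 2.5   # same defaults as in the Usage section
limitImpute <- 50

# Studentized residuals from a fit on all rows with observed y
ok <- !is.na(d$y)
rStud <- rep(NA_real_, nrow(d))
rStud[ok] <- rstudent(lm(y ~ x, data = d[ok, ]))

# Classify: 1 = model data, 2 = kept but left out of the model, 3 = to be imputed
category <- rep(1L, nrow(d))
category[ok & abs(rStud) > limitModel] <- 2L
category[!ok | (ok & abs(rStud) > limitImpute)] <- 3L

# Refit on category 1 only and impute category 3 from that model
fit1 <- lm(y ~ x, data = d[category == 1L, ])
yImputed <- d$y
yImputed[category == 3L] <- predict(fit1, newdata = d[category == 3L, ])

data.frame(d, category, yImputed)
```

Category 2 rows keep their observed value but do not influence the model used for imputation, which is the distinction the description draws between "correct" and "representative".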

Usage

LmImpute(
  data,
  model = "y~x",
  weights = NULL,
  limitModel = 2.5,
  limitIterate = 4.5,
  limitImpute = 50,
  maxiter = 10,
  returnIter = TRUE,
  returnYHat = FALSE,
  returnFirst = FALSE,
  returnLast = TRUE,
  returnFinal = FALSE,
  MultiFuction = function(x) {
    max(abs(x))
  },
  estimationGroup = TRUE,
  unfoldCoef = FALSE,
  category123 = NULL,
  forceCategory2 = rep(FALSE, N),
  BackTransform = NULL,
  warningEstimate = "estimate: Missing yImputed replaced by zero",
  removeEmpty = FALSE,
  NArStudHandling = warning,
  cvPercent = TRUE,
  returnSameType = FALSE
)
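
The default `MultiFuction` shown in the signature reduces a set of values to the largest absolute value. A small sketch of what that means for two response variables follows; it assumes the function is applied once per observation across the responses, and the residual matrix is made up for illustration.

```r
# Default from the Usage section: largest absolute value
MultiFuction <- function(x) {
  max(abs(x))
}

# Hypothetical studentized residuals: one row per observation,
# one column per response variable
rStud <- cbind(y14 = c(0.3, -2.9, 1.1),
               y15 = c(-0.8, 1.2, -4.0))

# One positive value per observation, comparable against a single limit
rStudCombined <- apply(rStud, 1, MultiFuction)
rStudCombined
```

With this choice an observation is flagged as soon as any one of its responses is extreme; a function such as a root mean square would instead flag observations that are moderately unusual on all responses.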

Arguments

`data`	Input data set (data.frame, data.table or list)
`model`	String with model formula
`weights`	NULL or string with weight expression
`limitModel`	Studentized residuals limit. Above limit -> category 2.
`limitIterate`	Studentized residuals limit for iterative calculation of studentized residuals.
`limitImpute`	Studentized residuals limit. Above limit -> category 3. No imputation when 0.
`maxiter`	Maximum number of iterations.
`returnIter`	When TRUE, the iteration in which each observation was thrown out is included in output.
`returnYHat`	When TRUE, fitted values and corresponding estimates in output.
`returnFirst`	When TRUE, studentized residuals from first iteration in output.
`returnLast`	When TRUE, some results from last iteration in output.
`returnFinal`	When TRUE, extra results from final model in output.
`MultiFuction`	Transforming rStud for several responses into a single positive value.
`estimationGroup`	Total estimates will be computed within each group. Default (and TRUE) is a single group (estimationGroup <- rep(1, N) ).
`unfoldCoef`	When TRUE, several elements of coef will be split into several output elements. unfoldCoef=2 is a specialised variant used to ensure two coefficients in output (extra coefficient zero).
`category123`	When non-NULL, this is used directly with no iteration.
`forceCategory2`	Force category 2 (can be useful for elements imputed by another method)
`BackTransform`	When model contains transformation of y (e.g: "log(y)~x") a function (e.g: exp) can be supplied to transform back to original scale before calculation of leaveOutResid, yHat, yImputed, estimate, estimateYHat, estimateOrig and seRobust.
`warningEstimate`	Warning text used when missing values occur (see the default text). Use NULL to suppress the warning.
`removeEmpty`	When TRUE empty elements will be removed from output.
`NArStudHandling`	Function (warning or stop) taking a message as input. Used when rStud in model (category 1) is missing.
`cvPercent`	When TRUE (default) cv output is in percent
`returnSameType`	When TRUE and when the type of input y variable(s) is integer, the output type of yImputed and estimate is also integer. Estimates/sums are then calculated from rounded imputed values.
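
Both `model` and `weights` are given as strings. The sketch below shows one way such strings can be turned into an `lm` call in base R, using the ratio model from the Examples section. How LmImpute evaluates the strings internally is an assumption here; the `eval(parse(...))` step is illustrative only.

```r
z <- data.frame(
  x = c(1.1, 2.2, 3.3, 4.4, 5.5),
  y = c(2.3, 3.1, 3.2, 3.7, 4.5)
)
model   <- "y~x-1"   # regression through the origin
weights <- "1/x"     # weight expression written in terms of data columns

# Assumption: the weight expression is evaluated with the data columns in scope
z$w <- eval(parse(text = weights), envir = z)

fit <- lm(as.formula(model), data = z, weights = w)
slope <- coef(fit)[["x"]]

# With weights 1/x and no intercept, weighted least squares reduces to
# the classical ratio estimator sum(y) / sum(x)
c(slope = slope, ratio = sum(z$y) / sum(z$x))
```

The equality follows from minimising the weighted sum of squares: setting the derivative of sum((y - b*x)^2 / x) with respect to b to zero gives sum(y - b*x) = 0.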

Value

A list with separate elements. Each element can be a scalar, vector or a matrix. Possible elements are:

`x`	The input x variable
`y`	The input y variable
`strata`	The input strata variable
`category123`	The three imputation groups: representative (1), correct but not representative (2), wrong (3) and zero when x is missing.
`yHat`	Fitted values
`yImputed`	Imputed y-data
`rStudFirst`	Initial studentized residuals
`rStud`	The final (or last) studentized residuals
`dffits`	The final (or last) DFFITS statistic
`hii`	The final (or last) leverages (diagonal elements of hat matrix)
`leaveOutResid`	The final (or last) outside-model residual
`iter`	Iteration when observation was thrown out
`N`	Total number of observations (rows in data)
`nImputed`	Number of imputed observations
`estimate`	Total estimate from imputed data
`cv`	Coefficient of variation = seEstimate/estimate. In percent when cvPercent=TRUE (default)
`estimateYhat`	Total estimate based on model fits
`estimateOrig`	Estimate based on original data with missing set to zero
`coef`	The final (or last) model coefficient(s). Several variables when several parameters ("coef..Intercept.", "coef.x").
`nModel`	The final (or last) number of observations in model.
`sigmaFirst`	Initial square root of the estimated variance parameter
`sigmaHat`	The final (or last) square root of the estimated variance parameter
`seEstimate`	The final (or last) standard error estimate of the total estimate from imputed data
`seRobust`	Robust variant of seEstimate (experimental)
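
Several of the elements above have direct counterparts in base R for an ordinary `lm` fit. The sketch below computes them for the small example data set. It assumes that `rStud` corresponds to externally studentized residuals and that `leaveOutResid` corresponds to the leave-one-out prediction residual; neither is confirmed by this page.

```r
z <- data.frame(
  x = c(1.1, 2.2, 3.3, 4.4, 5.5),
  y = c(2.3, 3.1, 3.2, 3.7, 4.5)
)
fit <- lm(y ~ x, data = z)

hii      <- hatvalues(fit)       # leverages (diagonal of the hat matrix)
rStud    <- rstudent(fit)        # externally studentized residuals
dff      <- dffits(fit)          # DFFITS statistic
sigmaHat <- summary(fit)$sigma   # square root of the estimated variance

# Leave-one-out residual through the leverage shortcut e_i / (1 - h_ii)
leaveOutResid <- residuals(fit) / (1 - hii)

# Check the shortcut against an explicit refit without observation 1
fitWithout1 <- lm(y ~ x, data = z[-1, ])
direct <- z$y[1] - predict(fitWithout1, newdata = z[1, ])

cbind(hii, rStud, dffits = dff, leaveOutResid)
```

The shortcut avoids refitting the model once per observation, which matters when the diagnostics are recomputed in every iteration.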

Examples

z = data.frame(  # Same example as in Thorud et al. (2010).
        x = c(1.1, 2.2, 3.3, 4.4, 5.5),
        y = c(2.3, 3.1, 3.2, 3.7, 4.5))
LmImpute(z)  # Simple regression
LmImpute(z, model = "y~x-1", weights = "1/x")  # Ratio model

rateData <- KostraData("rateData")               # Real Kostra data set
w <- rateData$data[, c(16, 5, 14, 15, 19)]
w <- w[is.finite(w[,"Ny.kostragruppe"]), ]       # Remove Longyearbyen
w[w[,"Ny.kostragruppe"]>13,"Ny.kostragruppe"]=13 # Combine small strata
names(w) = c("x", "y", "y14", "y15", "k")

# Ratio model within each stratum assuming common variance
LmImpute(w, "y~x:factor(k)-1", weights = "1/x", estimationGroup = w$k)

# Similar to above, but two y variables
LmImpute(w, "cbind(y14,y15)~x:factor(k)-1", weights = "1/x", estimationGroup = w$k)

# Using transformation and "BackTransform"
LmImpute(w, "sqrt(y)~x",BackTransform = function(x) x^2,returnYHat = TRUE)

# Direct imputation of x
LmImpute(w, "I(y-x)~0",weights = "1/x",
  BackTransform = function(y){return(y+dynGet("data")$x)},
  limitModel = Inf, limitIterate = Inf, limitImpute = Inf,
  returnYHat = TRUE)
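
The role of `BackTransform` can be shown without the package. This is a minimal sketch under the assumption that the function is simply applied to fitted values from the transformed model; the data are hypothetical and no outlier handling is included.

```r
# Hypothetical data with one missing y
d <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                y = c(1.2, 3.8, 9.5, 15.6, 25.3, NA))
BackTransform <- function(x) x^2   # inverse of sqrt for non-negative fits

fit <- lm(sqrt(y) ~ x, data = d)        # the row with missing y is dropped by lm
yHatSqrt <- predict(fit, newdata = d)   # fitted values on the sqrt scale
yHat <- BackTransform(yHatSqrt)         # fitted values on the original scale

# Keep observed values and fill the gap with the back-transformed fit
yImputed <- ifelse(is.na(d$y), yHat, d$y)
data.frame(d, yHat, yImputed)
```

Back-transforming a fitted value gives a prediction on the original scale, but not in general an unbiased estimate of the mean on that scale, so totals based on such values should be read with that in mind.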

statisticsnorway/Kostra documentation built on Nov. 2, 2024, 6:40 p.m.