LmImpute: INTERNAL FUNCTION: Regression imputation.

View source: R/LmImpute.r

LmImpute R Documentation

INTERNAL FUNCTION: Regression imputation.

Description

Imputation by weighted regression using lm, allowing multiple explanatory variables and multiple response variables. Missing and wrong values (category 3) are imputed by a model fitted to representative data (category 1). Some data are considered correct but not representative (category 2).
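The three categories can be illustrated with a minimal single-pass sketch in base R. This is not the LmImpute source code: it skips the iteration controlled by limitIterate and maxiter, uses no weights, and assumes that absolute studentized residuals are compared against the limits. The sixth row (missing y) is added here for illustration only.

```r
# Illustrative sketch only; not the package implementation.
z <- data.frame(x = c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6),
                y = c(2.3, 3.1, 3.2, 3.7, 4.5, NA))   # last y is missing
limitModel  <- 2.5   # above this limit: category 2
limitImpute <- 50    # above this limit: category 3

observed <- complete.cases(z)
fit   <- lm(y ~ x, data = z[observed, ])
rStud <- abs(rstudent(fit))            # studentized residuals of observed rows

# Missing y is always category 3; observed rows are classified by rStud
category123 <- rep(3L, nrow(z))
category123[observed] <- ifelse(rStud > limitImpute, 3L,
                                ifelse(rStud > limitModel, 2L, 1L))

# Fit the model on representative data (category 1) and impute category 3
fitModel <- lm(y ~ x, data = z[category123 == 1L, ])
toImpute <- category123 == 3L
yImputed <- z$y
yImputed[toImpute] <- predict(fitModel, newdata = z[toImpute, , drop = FALSE])
```

Category 2 rows keep their original y but are excluded from the model fit, which is the distinction the description draws between "correct" and "representative".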

Usage

LmImpute(
  data,
  model = "y~x",
  weights = NULL,
  limitModel = 2.5,
  limitIterate = 4.5,
  limitImpute = 50,
  maxiter = 10,
  returnIter = TRUE,
  returnYHat = FALSE,
  returnFirst = FALSE,
  returnLast = TRUE,
  returnFinal = FALSE,
  MultiFuction = function(x) {
    max(abs(x))
  },
  estimationGroup = TRUE,
  unfoldCoef = FALSE,
  category123 = NULL,
  forceCategory2 = rep(FALSE, N),
  BackTransform = NULL,
  warningEstimate = "estimate: Missing yImputed replaced by zero",
  removeEmpty = FALSE,
  NArStudHandling = warning,
  cvPercent = TRUE,
  returnSameType = FALSE
)

Arguments

data

Input data set (data.frame, data.table or list)

model

String with model formula

weights

NULL or string with weight expression

limitModel

Studentized residuals limit. Above limit -> category 2.

limitIterate

Studentized residuals limit for iterative calculation of studentized residuals.

limitImpute

Studentized residuals limit. Above limit -> category 3. No imputation when 0.

maxiter

Maximum number of iterations.

returnIter

When TRUE, the iteration at which each observation was thrown out is included in the output.

returnYHat

When TRUE, fitted values and corresponding estimates in output.

returnFirst

When TRUE, studentized residuals from first iteration in output.

returnLast

When TRUE, some results from last iteration in output.

returnFinal

When TRUE, extra results from final model in output.

MultiFuction

Transforming rStud for several responses into a single positive value.
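As a small sketch of what the default does, the function below is applied to each row of a matrix of studentized residuals with one column per response. The residual values are made up, and the row-wise application via apply is an assumption about how the function is used, not taken from the package source.

```r
# Made-up studentized residuals for two responses (columns) and three rows
rStud <- cbind(y14 = c(0.3, -2.9, 1.1),
               y15 = c(-0.8, 1.2, -4.0))

combine <- function(x) max(abs(x))   # same body as the default MultiFuction

# One positive value per observation: the largest absolute residual
rStudCombined <- apply(rStud, 1, combine)
```

A different function, for example one based on the sum of squares, would make the category limits respond to all responses jointly rather than to the worst one.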

estimationGroup

Total estimates are computed within each group. The default, TRUE, gives a single group (estimationGroup <- rep(1, N)).

unfoldCoef

When TRUE, the elements of coef are split into several output elements. unfoldCoef=2 is a specialised variant used to ensure two coefficients in output (extra coefficient zero).

category123

When non-NULL, this is used directly with no iteration.

forceCategory2

Force category 2 (can be useful for elements imputed by another method)

BackTransform

When the model contains a transformation of y (e.g. "log(y)~x"), a function (e.g. exp) can be supplied to transform back to the original scale before calculation of leaveOutResid, yHat, yImputed, estimate, estimateYHat, estimateOrig and seRobust.
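The idea of back-transformation can be sketched with plain lm: the model is fitted on the transformed scale and the fitted values are mapped back with the inverse function. This only illustrates the principle; the variable names are local to the sketch and the package may handle the details differently.

```r
z <- data.frame(x = c(1.1, 2.2, 3.3, 4.4, 5.5),
                y = c(2.3, 3.1, 3.2, 3.7, 4.5))

fit <- lm(sqrt(y) ~ x, data = z)     # model on the square-root scale
yHatSqrt <- fitted(fit)              # fitted values of sqrt(y)

BackTransform <- function(x) x^2     # inverse of sqrt
yHat <- BackTransform(yHatSqrt)      # fitted values on the original y scale
```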

warningEstimate

Warning text issued when missing values of yImputed are replaced by zero in the estimate. Use NULL to avoid the warning.

removeEmpty

When TRUE, empty elements will be removed from the output.

NArStudHandling

Function (warning or stop) taking a message as input. Used when rStud in model (category 1) is missing.

cvPercent

When TRUE (default), cv output is in percent.

returnSameType

When TRUE and when the type of input y variable(s) is integer, the output type of yImputed and estimate is also integer. Estimates/sums are then calculated from rounded imputed values.

Value

A list with separate elements. Each element can be a scalar, vector or a matrix. Possible elements are:

x

The input x variable

y

The input y variable

strata

The input strata variable

category123

The imputation category of each observation: representative (1), correct but not representative (2) or wrong (3). Zero when x is missing.

yHat

Fitted values

yImputed

Imputed y-data

rStudFirst

Initial studentized residuals

rStud

The final (or last) studentized residuals

dffits

The final (or last) DFFITS statistic

hii

The final (or last) leverages (diagonal elements of hat matrix)

leaveOutResid

The final (or last) outside-model residual

iter

Iteration when observation was thrown out

N

Total number of observations (rows in data)

nImputed

Number of imputed observations

estimate

Total estimate from imputed data

cv

Coefficient of variation = seEstimate/estimate. In percent when cvPercent=TRUE (default)
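As a worked example of the formula, with hypothetical numbers that do not come from any real data set:

```r
estimate   <- 1250   # hypothetical total estimate
seEstimate <- 25     # hypothetical standard error of the total

cvFraction <- seEstimate / estimate   # cv as a fraction
cvInPercent <- 100 * cvFraction       # cv as reported when cvPercent = TRUE
```

Here the fraction is 0.02, so the percent form is 2.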

estimateYhat

Total estimate based on model fits

estimateOrig

Estimate based on original data with missing set to zero

coef

The final (or last) model coefficient(s). Several variables when several parameters ("coef..Intercept.", "coef.x").

nModel

The final (or last) number of observations in model.

sigmaFirst

Initial square root of the estimated variance parameter

sigmaHat

The final (or last) square root of the estimated variance parameter

seEstimate

The final (or last) standard error estimate of the total estimate from imputed data

seRobust

Robust variant of seEstimate (experimental)

Examples

z = data.frame(  # Same example as in Thorud et al. (2010).
        x = c(1.1, 2.2, 3.3, 4.4, 5.5),
        y = c(2.3, 3.1, 3.2, 3.7, 4.5))
LmImpute(z)  # Simple regression
LmImpute(z, model = "y~x-1", weights = "1/x")  # Ratio model

rateData <- KostraData("rateData")               # Real Kostra data set
w <- rateData$data[, c(16, 5, 14, 15, 19)]
w <- w[is.finite(w[,"Ny.kostragruppe"]), ]       # Remove Longyearbyen
w[w[,"Ny.kostragruppe"]>13,"Ny.kostragruppe"]=13 # Combine small strata
names(w) = c("x", "y", "y14", "y15", "k")

# Ratio model within each stratum assuming common variance
LmImpute(w, "y~x:factor(k)-1", weights = "1/x", estimationGroup = w$k)

# Similar to above, but two y variables
LmImpute(w, "cbind(y14,y15)~x:factor(k)-1", weights = "1/x", estimationGroup = w$k)

# Using transformation and "BackTransform"
LmImpute(w, "sqrt(y)~x", BackTransform = function(x) x^2, returnYHat = TRUE)

# Direct imputation of x
LmImpute(w, "I(y-x)~0", weights = "1/x",
  BackTransform = function(y) {return(y + dynGet("data")$x)},
  limitModel = Inf, limitIterate = Inf, limitImpute = Inf,
  returnYHat = TRUE)




statisticsnorway/Kostra documentation built on Nov. 2, 2024, 6:40 p.m.