ImputeRegression: Imputation of a single variable (y) by a regression model...


ImputeRegression    R Documentation

Imputation of a single variable (y) by a regression model using a single explanatory variable (x).

Description

Impute missing and wrong values (group 3) using a model fitted to representative data (group 1). Some data are considered correct but not representative (group 2).

Usage

ImputeRegression(
  data,
  idName = names(data)[1],
  strataName = NULL,
  xName = names(data)[3],
  yName = names(data)[4],
  method = "ordinary",
  limitModel = 2.5,
  limitIterate = 4.5,
  limitImpute = 50,
  returnSameType = TRUE,
  ...
)

ImputeRegressionNewNames(
  ...,
  Fun = ImputeRegression,
  oldNames = c("yImputed", "Ntotal", "nImputedTotal", "estimateTotal", "yHat",
    "estimateOrig", "cvTotal"),
  newNames = c("estimate", "N", "nImputed", "estimate", "estimateYHat", "y", "cv"),
  iD = NULL,
  keep = NULL
)

ImputeRegressionTall(..., iD = TalliD())

ImputeRegressionTallSmall(
  ...,
  iD = TalliD(),
  keep = c("ID", "estimate", "cv", "nImputed")
)

ImputeRegressionWide(
  ...,
  addName = WideAddName(),
  sep = WideSep(),
  idNames = c("", "strata", ""),
  addLast = FALSE
)

ImputeRegressionWideSmall(
  ...,
  keep = c("id", "strata", "estimate", "cv", "nImputed"),
  addName = WideAddName(),
  sep = WideSep(),
  idNames = c("", "strata", ""),
  addLast = FALSE
)

Arguments

data

Input data set of class data.frame

idName

Name of id-variable(s)

strataName

Name of strata variable. Single stratum when NULL (default).

xName

Name of x-variable

yName

Name of y-variable

method

The method (model and weight) coded as a string: "ordinary" (default), "ratio", "noconstant", "mean" or "ratioconstant".

limitModel

Limit for studentized residuals. Observations above the limit are assigned to group 2.

limitIterate

Studentized residuals limit for iterative calculation of studentized residuals.

limitImpute

Limit for studentized residuals. Observations above the limit are assigned to group 3.

returnSameType

When TRUE (default) and when the type of input y variable(s) is integer, the output type of yImputed/estimate/estimateTotal is also integer. Estimates/sums are then calculated from rounded imputed values.

...

Simplified specification of the above arguments and possibly the five arguments below. Can also be used to specify additional variable names that will be included in output (micro).

Fun

Function as input to ImputeRegressionNewNames for more general applications.

oldNames

Vector of output names to be changed.

newNames

Corresponding vector of new names.

iD

When non-NULL a new variable ID will be created (see details).

keep

When non-NULL, only variables listed in keep will be kept. This is input to ImputeRegressionNewNames, and for more general applications keep applies to the first three list elements.

addName

NULL or vector of strings used to name columns according to origin frame.

sep

A character string used as separator when addName applies.

idNames

Names of an id variable within each data frame.

addLast

When TRUE, addName will be placed at the end.

Details

Imputations are performed by running an imputation model within each stratum. Division into three groups is based on studentized residuals. Calculations of studentized residuals are performed by iteratively throwing out observations from the model fitting.
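The grouping idea can be illustrated in base R for a single stratum. This is a minimal sketch under stated assumptions, not the package's implementation: the helper name classify_groups is hypothetical, an ordinary linear model (lm) stands in for the model, all observations above limitIterate are dropped at once in each pass, and observations outside the fit get a studentized prediction error as their residual. The data are made up.

```r
# Hypothetical helper (not part of Kostra): classify one stratum into groups 1-3
classify_groups <- function(x, y, limitModel = 2.5, limitIterate = 4.5,
                            limitImpute = 50, maxIter = 10) {
  d <- data.frame(x = x, y = y)
  inModel <- is.finite(d$x) & is.finite(d$y)       # start with complete cases
  for (i in seq_len(maxIter)) {
    fit <- lm(y ~ x, data = d[inModel, ])
    r <- abs(rstudent(fit))                         # studentized residuals
    big <- which(r > limitIterate)
    if (length(big) == 0) break
    inModel[which(inModel)[big]] <- FALSE           # throw out and refit
  }
  fit <- lm(y ~ x, data = d[inModel, ])             # final model
  rStud <- rep(NA_real_, nrow(d))
  rStud[inModel] <- rstudent(fit)
  out <- !inModel & is.finite(d$y)                  # observed but outside the model
  if (any(out)) {
    p <- predict(fit, newdata = d[out, ], se.fit = TRUE)
    rStud[out] <- (d$y[out] - p$fit) / sqrt(p$se.fit^2 + p$residual.scale^2)
  }
  group <- rep(1L, nrow(d))                         # representative
  group[which(abs(rStud) > limitModel)] <- 2L       # correct but not representative
  group[is.na(rStud) | abs(rStud) > limitImpute] <- 3L  # missing or wrong
  yHat <- as.vector(predict(fit, newdata = d))
  data.frame(d, rStud = rStud, yHat = yHat, category123 = group,
             yImputed = ifelse(group == 3L, yHat, d$y))
}

# Made-up data: one missing value, one moderate outlier, one gross outlier
x <- 1:20
y <- 2 * x + sin(1:20)
y[5] <- NA
y[10] <- y[10] + 15
y[15] <- 10000
res <- classify_groups(x, y)
res[c(5, 10, 15), ]
```

Only group 3 values are replaced by fitted values in this sketch; group 2 observations keep their original y but do not take part in the model fit.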

Below (Value), the names before "or" are unique, and the names after "or" can be used to combine the data by stacking (rbind). The latter is the basis for the Tall/Wide/Small functions, which have a single data frame as output.

More specifically ImputeRegressionNewNames is a wrapper to ImputeRegression and the Tall/Wide/Small functions are wrappers to ImputeRegressionNewNames.

The last four parameters (addName, sep, idNames, addLast) are parameters to CbindIdMatch used by ImputeRegressionWide and ImputeRegressionWideSmall.

The parameter iD is used by ImputeRegressionTall and ImputeRegressionTallSmall. A character variable ID is created using the input names ("id", "strata" and "Landet"). If an input name corresponds to a variable name, that variable is used. If not, the input name is used directly (possibly replicated).
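The ID rule and the stacking it enables can be sketched in base R. The helper make_id and the three small data frames below are hypothetical and made up for illustration. They mimic the renamed micro, aggregates and total outputs and are not produced by the package.

```r
# Hypothetical helper illustrating the ID rule described above
make_id <- function(df, name) {
  if (name %in% names(df)) {
    as.character(df[[name]])   # the name matches a variable: use that variable
  } else {
    rep(name, nrow(df))        # otherwise use the name itself, replicated
  }
}

# Made-up frames mimicking micro, aggregates and total after renaming
micro <- data.frame(id = c("a", "b", "c"), strata = c("s1", "s1", "s2"),
                    estimate = c(10, 12, 7), stringsAsFactors = FALSE)
aggregates <- data.frame(strata = c("s1", "s2"), estimate = c(22, 7),
                         stringsAsFactors = FALSE)
total <- data.frame(estimate = 29)

frames <- list(micro, aggregates, total)
iD <- c("id", "strata", "Landet")   # the input names quoted in the text

# Build ID per frame, keep the shared column, then stack with rbind
tall <- do.call(rbind, Map(function(df, nm) {
  data.frame(ID = make_id(df, nm), estimate = df$estimate,
             stringsAsFactors = FALSE)
}, frames, iD))
tall
```

Since total has no variable called "Landet", that string itself becomes the ID of the single total row.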

Value

Output of ImputeRegression and ImputeRegressionNewNames (using the names after "or" below) is a list of three data sets. micro has as many rows as the input, aggregates has one row for each stratum and total has a single row. The individual variables are:

micro consists of the following elements:

id

id from input

x

The input x variable

y

The input y variable

strata

The input strata variable (can be NULL)

category123

The three imputation groups: representative (1), correct but not representative (2), wrong (3).

yHat or estimateYHat

Fitted values

yImputed or estimate

Imputed y-data

rStud

The final studentized residuals

dffits

The final DFFITS statistic

hii

The final leverages (diagonal elements of hat matrix)

leaveOutResid

The final outside-model residual

aggregates consists of the following elements:

N

Number of observations in each stratum

nImputed

Number of imputed observations in each stratum

estimate

Total estimates from imputed data

cv

Coefficient of variation = seEstimate/estimate

estimateYhat

Total estimate based on model fits

estimateOrig or y

Estimate based on original data with missing set to zero

coef

The final first model coefficient

coefB

The final second model coefficient or zeros when only one coefficient in model.

nModel

The final number of observations in model.

sigmaHat

The final square root of the estimated variance parameter

seEstimate

The final standard error estimate of the total estimate from imputed data

seRobust

Robust variant of seEstimate (experimental)

total consists of the following elements:

Ntotal or N

Number of observations

nImputedTotal or nImputed

Total number of imputed observations

estimateTotal or estimate

Total estimate for all strata

cvTotal or cv

Total cv for all strata
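How the three data sets relate can be sketched in base R from a made-up micro table. The sketch assumes that nImputed counts the group 3 rows and that estimate is the per-stratum sum of yImputed. It does not reproduce cv or the standard errors, which depend on the fitted model.

```r
# Made-up micro data: two strata, with one missing and one wrong value
micro <- data.frame(
  strata      = c("s1", "s1", "s1", "s2", "s2"),
  y           = c(10, NA, 12, 7, 900),
  yImputed    = c(10, 11, 12, 7, 8),
  category123 = c(1, 3, 1, 1, 3),
  stringsAsFactors = FALSE
)

s  <- micro$strata
y0 <- ifelse(is.na(micro$y), 0, micro$y)   # original data with missing set to zero

# One row per stratum (tapply orders the strata alphabetically)
aggregates <- data.frame(
  strata       = sort(unique(s)),
  N            = as.vector(tapply(y0, s, length)),
  nImputed     = as.vector(tapply(micro$category123 == 3, s, sum)),
  estimate     = as.vector(tapply(micro$yImputed, s, sum)),
  estimateOrig = as.vector(tapply(y0, s, sum)),
  stringsAsFactors = FALSE
)

# A single row summing over all strata
total <- data.frame(
  Ntotal        = sum(aggregates$N),
  nImputedTotal = sum(aggregates$nImputed),
  estimateTotal = sum(aggregates$estimate)
)
aggregates
total
```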

Author(s)

Øyvind Langsrud

Examples


z = cbind(id=1:34,KostraData("ratioTest")[,c(3,1,2)])
ImputeRegression(z,strataName="k")

# Data set with kjonn (sex) as an extra id
zkjonn  <- rbind(cbind(z,kjonn="mann"),cbind(z,kjonn="kvinne"))
zkjonn$y[1:34] <- zkjonn$y[1:34]  + 1:34

# Run where id is not actually used. kjonn is included in the output.
ImputeRegression(zkjonn,idName="id",strataName= "k",kjonnOutput="kjonn")

# Run where id is coded as id in a list. Data with unique id (first match) are then created without an error message or warning (may be changed)
ImputeRegression(zkjonn,idName=list(id="id"),strataName= "k",kjonnOutput="kjonn")

# Run with composite id, also including the individual variables in the output.
ImputeRegression(zkjonn,idName=c("id","kjonn"),strataName= "k",kjonnOutput="kjonn",idOutput="id")

# Run with composite id and composite strata, also including the individual variables in the output.
ImputeRegression(zkjonn,idName=c("id","kjonn"),strataName= c("k","kjonn"),kjonnOutput="kjonn",idOutput="id")

# The equivalent using lists
ImputeRegression(zkjonn,idName=list(id=c("id","kjonn")),strataName= list(c("k","kjonn")),kjonnOutput=list("kjonn"))

# Use lists to restrict to a single sex
ImputeRegression(zkjonn,idName=list(id="id",kjonn="mann"),strataName= list("k",kjonn="mann"),kjonnOutput=list("kjonn",kjonn="mann"))

ImputeRegression(z,strataName="k",method="ratio")
ImputeRegressionNewNames(z,strataName="k",method="ratio")
ImputeRegressionTall(z,strataName="k",method="ratio")
ImputeRegressionTallSmall(z,strataName="k",method="ratio")
ImputeRegressionWide(z,strataName="k",method="ratio")
ImputeRegressionWideSmall(z,strataName="k",method="ratio")


rateData <- KostraData("rateData")               # Real Kostra data set
w <- rateData$data[, c(17,19,16,5)]              # Data with id, strata, x and y
w <- w[is.finite(w[,"Ny.kostragruppe"]), ]       # Remove Longyearbyen
ImputeRegression(w, strataName = names(w)[2])    # Works without combining strata
w[w[,"Ny.kostragruppe"]>13,"Ny.kostragruppe"]=13 # Combine small strata
ImputeRegression(w, strataName = names(w)[2], method="ratio")
ImputeRegressionTallSmall(w, strataName = names(w)[2], method="ratio")
ImputeRegressionWideSmall(w, strataName = names(w)[2], method="ratio")


statisticsnorway/Kostra documentation built on Nov. 2, 2024, 6:40 p.m.