ImputeRegression: Imputation of a single variable (y) by a regression model...
In statisticsnorway/Kostra: Functions for Kostra

ImputeRegression

R Documentation

Imputation of a single variable (y) by a regression model using a single explanatory variable (x).

Description

Impute missing and wrong values (group 3) using a model fitted to representative data (group 1). Some data are considered correct but not representative (group 2).

Usage

ImputeRegression(
  data,
  idName = names(data)[1],
  strataName = NULL,
  xName = names(data)[3],
  yName = names(data)[4],
  method = "ordinary",
  limitModel = 2.5,
  limitIterate = 4.5,
  limitImpute = 50,
  returnSameType = TRUE,
  ...
)

ImputeRegressionNewNames(
  ...,
  Fun = ImputeRegression,
  oldNames = c("yImputed", "Ntotal", "nImputedTotal", "estimateTotal", "yHat",
    "estimateOrig", "cvTotal"),
  newNames = c("estimate", "N", "nImputed", "estimate", "estimateYHat", "y", "cv"),
  iD = NULL,
  keep = NULL
)

ImputeRegressionTall(..., iD = TalliD())

ImputeRegressionTallSmall(
  ...,
  iD = TalliD(),
  keep = c("ID", "estimate", "cv", "nImputed")
)

ImputeRegressionWide(
  ...,
  addName = WideAddName(),
  sep = WideSep(),
  idNames = c("", "strata", ""),
  addLast = FALSE
)

ImputeRegressionWideSmall(
  ...,
  keep = c("id", "strata", "estimate", "cv", "nImputed"),
  addName = WideAddName(),
  sep = WideSep(),
  idNames = c("", "strata", ""),
  addLast = FALSE
)

Arguments

`data`	Input data set of class data.frame
`idName`	Name of id-variable(s)
`strataName`	Name of strata variable. Single stratum when NULL (default)
`xName`	Name of x-variable
`yName`	Name of y-variable
`method`	The method (model and weight) coded as a string: "ordinary" (default), "ratio", "noconstant", "mean" or "ratioconstant".
`limitModel`	Studentized residuals limit. Above limit -> group 2.
`limitIterate`	Studentized residuals limit for iterative calculation of studentized residuals.
`limitImpute`	Studentized residuals limit. Above limit -> group 3.
`returnSameType`	When TRUE (default) and when the type of input y variable(s) is integer, the output type of yImputed/estimate/estimateTotal is also integer. Estimates/sums are then calculated from rounded imputed values.
`...`	Simplified specification of the above arguments and possibly the five arguments below. Can also be used to specify additional variable names that will be included in output (micro).
`Fun`	Function as input to ImputeRegressionNewNames for more general applications.
`oldNames`	Vector of output names to be changed.
`newNames`	Corresponding vector of new names.
`iD`	When non-NULL, a new variable ID will be created (see Details).
`keep`	When non-NULL, only variables listed in keep will be kept. This is input to ImputeRegressionNewNames; in more general applications, keep applies to the first three list elements.
`addName`	NULL or vector of strings used to name columns according to origin frame.
`sep`	A character string used as separator when addName applies
`idNames`	Names of the id variable within each data frame
`addLast`	When TRUE, addName will be placed at the end
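
The method argument combines a model with a weighting scheme. As a rough illustration in base R (not the package's own code), the sketch below contrasts two such combinations. It assumes that "ordinary" corresponds to y ~ x with equal weights and that "ratio" corresponds to a regression through the origin with weights 1/x; that mapping is an assumption, and the toy vectors x and y are made up.

```r
# Toy data, invented for illustration only
x <- c(2, 4, 5, 8, 10)
y <- c(3, 9, 9, 17, 19)

# Assumed reading of method = "ordinary": intercept + slope, equal weights
ordinaryFit <- lm(y ~ x)

# Assumed reading of method = "ratio": no intercept, variance proportional to x,
# hence weights 1/x in the weighted least squares fit
ratioFit <- lm(y ~ x - 1, weights = 1 / x)

ordinaryCoef <- unname(coef(ordinaryFit))  # intercept and slope
ratioCoef <- unname(coef(ratioFit))        # single slope
```

With weights 1/x, the slope of the no-intercept model equals sum(y)/sum(x), the classical ratio estimator, which is why such a model is commonly called a ratio model.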

Details

Imputations are performed by running an imputation model within each stratum. The division into three groups is based on studentized residuals. Studentized residuals are calculated by iteratively removing observations from the model fitting.
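
The grouping idea can be sketched in base R for a single stratum. This is a simplified reading of the description above and not the package's implementation: the helper ClassifyByResiduals and its maxIter argument are hypothetical, only the "ordinary" model y ~ x is considered, and the way the three limits are applied is an assumption.

```r
# Hypothetical helper for illustration; not a function in the Kostra package.
ClassifyByResiduals <- function(x, y, limitModel = 2.5, limitIterate = 4.5,
                                limitImpute = 50, maxIter = 20) {
  d <- data.frame(x = x, y = y)
  complete <- is.finite(d$x) & is.finite(d$y)
  inModel <- complete                 # start with all complete observations
  rStud <- rep(NA_real_, nrow(d))
  for (iter in seq_len(maxIter)) {
    fit <- lm(y ~ x, data = d[inModel, ])
    rStud[inModel] <- rstudent(fit)   # leave-one-out studentized residuals
    out <- complete & !inModel
    if (any(out)) {                   # same scale for observations outside the fit
      p <- predict(fit, newdata = d[out, ], se.fit = TRUE)
      rStud[out] <- (d$y[out] - p$fit) / sqrt(p$residual.scale^2 + p$se.fit^2)
    }
    keepInModel <- complete & abs(rStud) <= limitIterate
    if (identical(keepInModel, inModel)) break  # stop when the model set is stable
    inModel <- keepInModel
  }
  # Group 3: missing or extreme; group 2: above limitModel; group 1: the rest
  category <- ifelse(!complete | abs(rStud) > limitImpute, 3L,
                     ifelse(abs(rStud) > limitModel, 2L, 1L))
  finalFit <- lm(y ~ x, data = d[category == 1L, ])  # model from group 1 only
  yHat <- unname(predict(finalFit, newdata = d))
  yImputed <- ifelse(category == 3L, yHat, d$y)      # impute group 3 only
  data.frame(x = x, y = y, category123 = category, rStud = rStud,
             yHat = yHat, yImputed = yImputed)
}

# Made-up data: one gross error and one missing value
set.seed(1)
x <- 1:20
y <- 2 + 3 * x + rnorm(20)
y[5] <- 500
y[10] <- NA
res <- ClassifyByResiduals(x, y)
```

In this sketch the gross error and the missing value end up in group 3 and receive fitted values, while all other observations keep their original y.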

In the Value section below, the names before "or" are unique, whereas the names after "or" are shared across the data sets so that the data can be combined by stacking (rbind). The latter is the basis for the Tall/Wide/Small functions, which have a single data frame as output.

More specifically, ImputeRegressionNewNames is a wrapper around ImputeRegression, and the Tall/Wide/Small functions are wrappers around ImputeRegressionNewNames.

The last four parameters (addName, sep, idNames, addLast) are parameters to CbindIdMatch, which is used by ImputeRegressionWide and ImputeRegressionWideSmall.

The parameter iD is used by ImputeRegressionTall and ImputeRegressionTallSmall. A character variable ID is created using the input names ("id", "strata" and "Landet"). If an input name corresponds to a variable name, that variable is used. If not, the input name is used directly (possibly replicated).
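
The ID rule and the stacking it enables can be sketched as follows. This is an illustration in base R under stated assumptions, not the package's code: the helper MakeID and the three toy data frames are hypothetical, and only an estimate column is carried along.

```r
# Hypothetical helper: use the variable called `name` when it exists in `df`,
# otherwise use the string itself, replicated to the number of rows.
MakeID <- function(df, name) {
  if (name %in% names(df)) as.character(df[[name]]) else rep(name, nrow(df))
}

# Toy stand-ins for the three list elements (values invented for illustration)
micro <- data.frame(id = c("a", "b", "c"), strata = c("s1", "s1", "s2"),
                    estimate = c(10, 20, 30))
aggregates <- data.frame(strata = c("s1", "s2"), estimate = c(30, 30))
total <- data.frame(estimate = 60)

parts <- list(micro, aggregates, total)
iD <- c("id", "strata", "Landet")

# Give each part an ID column, keep the shared column, then stack with rbind
tall <- do.call(rbind, Map(function(df, name) {
  data.frame(ID = MakeID(df, name), estimate = df$estimate,
             stringsAsFactors = FALSE)
}, parts, iD))
```

Here "id" and "strata" match variables in micro and aggregates, so those variables become ID, whereas "Landet" matches nothing in total and is used as a literal label for the single total row.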

Value

Output of ImputeRegression and ImputeRegressionNewNames (using the names after "or" below) is a list of three data sets. micro has as many rows as the input, aggregates has one row for each stratum and total has a single row. The individual variables are:

micro consists of the following elements:

`id`	id from input
`x`	The input x variable
`y`	The input y variable
`strata`	The input strata variable (can be NULL)
`category123`	The three imputation groups: representative (1), correct but not representative (2), wrong (3).
`yHat` or `estimateYHat`	Fitted values
`yImputed` or `estimate`	Imputed y-data
`rStud`	The final studentized residuals
`dffits`	The final DFFITS statistic
`hii`	The final leverages (diagonal elements of hat matrix)
`leaveOutResid`	The final outside-model residual

aggregates consists of the following elements:

`N`	Number of observations in each stratum
`nImputed`	Number of imputed observations in each stratum
`estimate`	Total estimates from imputed data
`cv`	Coefficient of variation = seEstimate/estimate
`estimateYhat`	Total estimate based on model fits
`estimateOrig` or `y`	Estimate based on original data with missing set to zero
`coef`	The final first model coefficient
`coefB`	The final second model coefficient or zeros when only one coefficient in model.
`nModel`	The final number of observations in model.
`sigmaHat`	The final square root of the estimated variance parameter
`seEstimate`	The final standard error estimate of the total estimate from imputed data
`seRobust`	Robust variant of seEstimate (experimental)
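
How the count and sum columns of aggregates relate to micro can be sketched in base R. This is only an illustration under assumptions: the toy micro data are invented, nImputed is assumed to count the observations in group 3, and the model-based columns (coef, sigmaHat, seEstimate and so on) are left out.

```r
# Toy micro data, invented for illustration
micro <- data.frame(
  strata = c("s1", "s1", "s1", "s2", "s2"),
  y = c(10, NA, 30, 5, 7),
  yImputed = c(10, 20, 30, 5, 7),
  category123 = c(1, 3, 1, 1, 2)
)

# One value per stratum for each quantity
N <- tapply(micro$y, micro$strata, length)
nImputed <- tapply(micro$category123 == 3, micro$strata, sum)  # assumed: group 3
estimate <- tapply(micro$yImputed, micro$strata, sum)
estimateOrig <- tapply(ifelse(is.na(micro$y), 0, micro$y), micro$strata, sum)

aggregates <- data.frame(
  strata = names(N),
  N = as.vector(N),
  nImputed = as.vector(nImputed),
  estimate = as.vector(estimate),
  estimateOrig = as.vector(estimateOrig)
)

# Corresponding single-row totals over all strata
total <- data.frame(Ntotal = sum(aggregates$N),
                    nImputedTotal = sum(aggregates$nImputed),
                    estimateTotal = sum(aggregates$estimate))
```

The difference between estimate and estimateOrig in a stratum shows how much the imputed values contribute to that stratum's total.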

total consists of the following elements:

`Ntotal` or `N`	Number of observations
`nImputedTotal` or `nImputed`	Total number of imputed observations
`estimateTotal` or `estimate`	Total estimate for all strata
`cvTotal` or `cv`	Total cv for all strata

Author(s)

Øyvind Langsrud

Examples


z = cbind(id=1:34,KostraData("ratioTest")[,c(3,1,2)])
ImputeRegression(z,strataName="k")

# Data set with kjonn (sex) as an extra id
zkjonn  <- rbind(cbind(z,kjonn="mann"),cbind(z,kjonn="kvinne"))
zkjonn$y[1:34] <- zkjonn$y[1:34]  + 1:34

# Run where id is not actually used. kjonn is included in the output.
ImputeRegression(zkjonn,idName="id",strataName= "k",kjonnOutput="kjonn")

# Run where id is specified with id in a list. Data with unique id (first match) are then created without error or warning (this may change)
ImputeRegression(zkjonn,idName=list(id="id"),strataName= "k",kjonnOutput="kjonn")

# Run with composite id + includes the individual variables in the output.
ImputeRegression(zkjonn,idName=c("id","kjonn"),strataName= "k",kjonnOutput="kjonn",idOutput="id")

# Run with composite id and composite strata + includes the individual variables in the output.
ImputeRegression(zkjonn,idName=c("id","kjonn"),strataName= c("k","kjonn"),kjonnOutput="kjonn",idOutput="id")

# The equivalent using lists
ImputeRegression(zkjonn,idName=list(id=c("id","kjonn")),strataName= list(c("k","kjonn")),kjonnOutput=list("kjonn"))

# Uses lists to restrict the data to a single sex (kjonn = "mann")
ImputeRegression(zkjonn,idName=list(id="id",kjonn="mann"),strataName= list("k",kjonn="mann"),kjonnOutput=list("kjonn",kjonn="mann"))

ImputeRegression(z,strataName="k",method="ratio")
ImputeRegressionNewNames(z,strataName="k",method="ratio")
ImputeRegressionTall(z,strataName="k",method="ratio")
ImputeRegressionTallSmall(z,strataName="k",method="ratio")
ImputeRegressionWide(z,strataName="k",method="ratio")
ImputeRegressionWideSmall(z,strataName="k",method="ratio")


rateData <- KostraData("rateData")               # Real Kostra data set
w <- rateData$data[, c(17,19,16,5)]              # Data with id, strata, x and y
w <- w[is.finite(w[,"Ny.kostragruppe"]), ]       # Remove Longyearbyen
ImputeRegression(w, strataName = names(w)[2])    # Works without combining strata
w[w[,"Ny.kostragruppe"]>13,"Ny.kostragruppe"]=13 # Combine small strata
ImputeRegression(w, strataName = names(w)[2], method="ratio")
ImputeRegressionTallSmall(w, strataName = names(w)[2], method="ratio")
ImputeRegressionWideSmall(w, strataName = names(w)[2], method="ratio")

statisticsnorway/Kostra documentation built on Nov. 2, 2024, 6:40 p.m.