ImputeRegressionMulti: Imputation of several variables (y's) by a regression model...

View source: R/ImputeRegressionMulti.R

ImputeRegressionMulti    R Documentation

Imputation of several variables (y's) by a regression model using a single explanatory variable (x).

Description

Impute missing and wrong values (group 3) using a model fitted to representative data (group 1). Some data are considered correct but not representative (group 2). This grouping is common to all y-variables.

Usage

ImputeRegressionMulti(
  data,
  idName = names(data)[1],
  strataName = NULL,
  xName = names(data)[3],
  yNames = names(data)[4:NCOL(data)],
  ySelect = 1:length(yNames),
  methodOneComp = "mean",
  method = "ordinary",
  limitModel = 2.5,
  limitIterate = 4.5,
  limitImpute = 50,
  returnSameType = TRUE
)

ImputeRegressionMultiNewNames(...)

ImputeRegressionMultiTall(..., iD = TalliD())

ImputeRegressionMultiTallSmall(..., iD = TalliD(), keep = c("ID", "nImputed"))

ImputeRegressionMultiWide(
  ...,
  addName = WideAddName(),
  sep = WideSep(),
  idNames = c("", "strata", ""),
  addLast = FALSE
)

ImputeRegressionMultiWideSmall(
  ...,
  keep = c("id", "strata", "nImputed"),
  addName = WideAddName(),
  sep = WideSep(),
  idNames = c("", "strata", ""),
  addLast = FALSE
)

Arguments

data

Input data set of class data.frame

idName

Name of id-variable(s)

strataName

Name of strata-variable. Single stratum when NULL (default)

xName

Name of x-variable

yNames

Names of y-variables

ySelect

Indices of yNames to extract a single component from

methodOneComp

Method used to extract a single component, coded as a string: "mean" (default), "pca", "pcaMedian" or "pcaStd"

method

The method (model and weight) coded as a string: "ordinary" (default), "ratio", "noconstant", "mean" or "ratioconstant".

limitModel

Studentized residuals limit. Above limit -> group 2.

limitIterate

Studentized residuals limit for iterative calculation of studentized residuals.

limitImpute

Studentized residuals limit. Above limit -> group 3.

returnSameType

When TRUE (default) and when the type of input y variables is integer, the output type of imputations and estimates from the final run is also integer. Those estimates/sums are then calculated from rounded imputed values.
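The effect of returnSameType on totals can be illustrated with a small base-R sketch. The numbers are made up for illustration, and the use of round() is an assumption about how the rounding is done; the point is only that a sum of rounded imputed values can differ from a rounded sum.

```r
# Hypothetical imputed values for an integer-valued y-variable
yImputed <- c(10.4, 20.4, 30.4)

# Total calculated from rounded imputed values, as described for returnSameType = TRUE
# (round() is an assumption; the package may round differently)
sumOfRounded <- sum(as.integer(round(yImputed)))

# Rounding the unrounded total instead, shown only for comparison
roundedSum <- round(sum(yImputed))

print(c(sumOfRounded = sumOfRounded, roundedSum = roundedSum))
```

With integer output, the reported totals are consistent with the reported imputed values, at the cost of a small rounding difference relative to the unrounded model.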

Details

Multivariate imputations are performed by running an imputation model within each stratum. The three groups (category123) are the same for all y-variables. Division into the three groups is based on studentized residuals from an initial run with a single variable. Studentized residuals are calculated by iteratively throwing out observations from the model fitting. The single initial variable can be an original variable (ySelect is a single number) or a component extracted according to methodOneComp.

Missing x-values are not allowed in this version.
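As a rough illustration of the grouping described above, the following base-R sketch classifies observations for a single y-variable in a single stratum. It is one plausible reading of the description, not the package's implementation: the function name classify123, the ordinary lm() model y ~ x, the rule of dropping every observation above limitIterate in each pass, and the way residuals of out-of-model observations are standardized are all assumptions.

```r
# Sketch only (hypothetical helper, not part of the Kostra package)
classify123 <- function(x, y, limitModel = 2.5, limitIterate = 4.5,
                        limitImpute = 50) {
  d <- data.frame(x = x, y = y)
  ok <- is.finite(d$y)      # observed y-values; missing y always ends in group 3
  inModel <- ok             # start with all observed values in the model
  repeat {
    fit <- lm(y ~ x, data = d[inModel, ])
    rStud <- rep(NA_real_, nrow(d))
    rStud[inModel] <- rstudent(fit)   # studentized residuals inside the model
    out <- ok & !inModel
    if (any(out)) {
      # Assumption: standardize prediction errors of thrown-out observations
      p <- predict(fit, newdata = d[out, ], se.fit = TRUE)
      rStud[out] <- (d$y[out] - p$fit) / sqrt(p$residual.scale^2 + p$se.fit^2)
    }
    drop <- inModel & !is.na(rStud) & abs(rStud) > limitIterate
    if (!any(drop)) break             # nothing more to throw out
    inModel <- inModel & !drop
  }
  category123 <- rep(1L, nrow(d))
  category123[which(abs(rStud) > limitModel)] <- 2L   # correct, not representative
  category123[which(abs(rStud) > limitImpute)] <- 3L  # wrong
  category123[!ok] <- 3L                              # missing
  # Final model from group 1 only; impute group 3 with fitted values
  fit1 <- lm(y ~ x, data = d[category123 == 1L, ])
  yHat <- unname(predict(fit1, newdata = d))
  yImputed <- ifelse(category123 == 3L, yHat, d$y)
  data.frame(x = x, y = y, category123, rStud, yHat, yImputed)
}

# Made-up data: a linear relation with one missing value and one gross error
x <- 1:20
y <- 2 + 3 * x + rep(c(0.3, -0.2, 0.1, -0.3, 0.1), 4)
y[5] <- NA
y[12] <- 1000
res <- classify123(x, y)
print(res[c(5, 12), ])
```

In this sketch the missing value and the gross error both end up in group 3 and receive fitted values, while the remaining observations stay in group 1.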

Value

The output of the alternative variants of the function (Tall, Wide, Small) is constructed similarly to that of the corresponding variants of ImputeRegression.

The output of ImputeRegressionMulti and ImputeRegressionMultiNewNames (the latter using the names given after "or" below) is a list whose first three elements are output from the initial run with a single variable: micro has as many rows as the input, aggregates has one row for each stratum, and total has a single row. The individual variables of these three elements are:

micro consists of the following elements:

id

id from input

x

The input x variable

y

The input y variable

strata

The input strata variable (can be NULL)

category123

The three imputation groups: representative (1), correct but not representative (2), wrong (3).

yHat or estimateYHat

Fitted values

yImputed or estimate

Imputed y-data

rStud

The final studentized residuals

dffits

The final DFFITS statistic

hii

The final leverages (diagonal elements of hat matrix)

leaveOutResid

The final outside-model residual

aggregates consists of the following elements:

N

Number of observations in each stratum

nImputed

Number of imputed observations in each stratum

estimate

Total estimates from imputed data

cv

Coefficient of variation = seEstimate/estimate

estimateYhat

Total estimate based on model fits

estimateOrig or y

Estimate based on original data with missing set to zero

coef

The final first model coefficient

coefB

The final second model coefficient, or zeros when the model has only one coefficient.

n

The final number of observations in model.

sigmaHat

The final square root of the estimated variance parameter

seEstimate

The final standard error estimate of the total estimate from imputed data

seRobust

Robust variant of seEstimate (experimental)

total consists of the following elements:

Ntotal or N

Number of observations

nImputedTotal or nImputed

Total number of imputed observations

estimateTotal or estimate

Total estimate for all strata

cvTotal or cv

Total cv for all strata

The other output elements are from the final run with all y-variables. These elements are:

MyImputed

Matrix of imputed y-data

Mestimate

Matrix of total estimates from imputed data

Mcv

Matrix of coefficients of variation = seEstimate/estimate

MestimateTotal

Matrix of total estimates for all strata (a single row)

McvTotal

Matrix of total cvs for all strata (a single row)
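The list structure described above can be inspected as in the following sketch. It reuses the test data from the Examples section and assumes that the Kostra package is installed and that the list elements can be accessed by the names listed above; the block does nothing if the package is not available.

```r
out <- NULL
if (requireNamespace("Kostra", quietly = TRUE)) {
  z <- Kostra::KostraData("ratioTest")
  z2 <- cbind(id = 10 * (1:NROW(z)), z[, c(3, 1, 2)], y2 = z$y + z$x)
  out <- Kostra::ImputeRegressionMulti(z2, strataName = "k", method = "ratio")

  print(names(out))            # names of all list elements
  print(head(out$micro))       # initial run: one row per input row
  print(out$aggregates)        # initial run: one row per stratum
  print(out$total)             # initial run: a single row
  print(head(out$MyImputed))   # final run: imputed values for all y-variables
  print(out$MestimateTotal)    # final run: totals over all strata
}
```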

Examples


# Test data: put an id first so that the defaults pick column 3 as x
# and columns 4 onward as y-variables; y2 is an extra y-variable
z <- KostraData("ratioTest")
z2 <- cbind(id = 10 * (1:NROW(z)), z[, c(3, 1, 2)], y2 = z$y + z$x)
ImputeRegressionMulti(z2, strataName = "k", method = "ratio")
ImputeRegressionMultiNewNames(z2, strataName = "k")
ImputeRegressionMultiTall(z2, strataName = "k")
ImputeRegressionMultiTallSmall(z2, strataName = "k")
ImputeRegressionMultiWide(z2, strataName = "k")
ImputeRegressionMultiWideSmall(z2, strataName = "k")

rateData <- KostraData("rateData")               # Real Kostra data set
w <- rateData$data[, c(17,19,16,5:10)]           # Data with id, strata, x and many ys
w <- w[is.finite(w[,"Ny.kostragruppe"]), ]       # Remove Longyearbyen
w[w[,"Ny.kostragruppe"]>13,"Ny.kostragruppe"]=13 # Combine small strata
ImputeRegressionMulti(w, strataName = names(w)[2])
names(w)[4:9] = paste("y",1:6,sep="")           # rename for nicer output
ImputeRegressionMulti(w, strataName = names(w)[2], method="ratio")
ImputeRegressionMulti(w, method="ratioconstant") # No strata

statisticsnorway/Kostra documentation built on Sept. 25, 2024, 10:37 a.m.