RBtest: Test of missing data mechanism using complete data
In RBtest: Regression-Based Approach for Testing the Type of Missing Data

Description Usage Arguments Value Author(s) Examples

View source: R/TestMD.R

This function tests the missing completely at random (MCAR) vs missing at random (MAR) by using the complete variables only.

1	RBtest(data)

data

Dataset with at least one complete variable. The variables could be either continuous, categorical or a mix of both.

A list of the following elements:

abs.nbrMD The absolute number of missing data per variable.
rel.nbrMD The percentage of missing data per variable.
type Vector of the same length than the number of variables of the dataset, where '0' is for variables with MCAR data, '1' is for variables with MAR data and '-1' is for complete variables.

Serguei Rouzinov rouzinovs@gmail.com and

André Berchtold Andre.Berchtold@unil.ch

Maintainer: Serguei Rouzinov rouzinovs@gmail.com

set.seed(60)
n<-100 # sample size
r<-5 # number of variables
mis<-0.2 # frequency of missing data
mydata<-matrix(NA, nrow=n, ncol=r) # mydata is a matrix of r variables
# following a U(0,1) distribution
for (i in c(1:r)){
	mydata[,i]<-runif(n,0,1)
}
bin.var<-sample(LETTERS[1:2],n,replace=TRUE, prob=c(0.3,0.7)) # binary variable [A,B].
# The probability of being in one of the categories is 0.3.
cat.var<-sample(LETTERS[1:3],n,replace=TRUE, prob=c(0.5,0.3,0.2)) # categorical variable [A,B,C].
num.var<-runif(n,0,1) # Additional continuous variable following a U(0,1) distribution
mydata<-cbind.data.frame(mydata,bin.var,cat.var,num.var,stringsAsFactors = TRUE)
# dataframe with r+3 variables
colnames(mydata)=c("v1","v2","X1","X2","X3","X4","X5", "X6") # names of columns
# MCAR on X1 and X4 by using v1 and v2. MAR on X3 and X5 by using X2 and X6.
mydata$X1[which(mydata$v1<=sort(mydata$v1)[mis*n])]<-NA # X1: (mis*n)% of MCAR data.
# All data above the (100-mis)th percentile in v1 are selected
# and the corresponding observations in X1 are replaced with missing data.
mydata$X3[which(mydata$X2<=sort(mydata$X2)[mis*n])]<-NA # X3: (mis*n)% of MAR data.
# All data above the (100-mis)th percentile in X2 are selected
# and the corresponding observations in X3 are replaced with missing data.
mydata$X4[which(mydata$v2<=sort(mydata$v2)[mis*n])]<-NA # X4: (mis*n)% of MCAR data.
# All data above the (100-mis)th percentile in v2 are selected
# and the corresponding observations in X4 are replaced with missing data.
mydata$X5[which(mydata$X6<=sort(mydata$X6)[mis*n])]<-NA # X5: (mis*n)% of MAR data.
# All data above the (100-mis)th percentile in X6 are selected
# and the corresponding observations in X5 are replaced with missing data.
mydata$v1=NULL
mydata$v2=NULL
RBtest(mydata)