R/MNAR.R

Defines functions MNAR.data

Documented in MNAR.data

#' @title Insert Missing Not at Random (MNAR) values into data sets
#' @description Adds MNAR missing values to data sets generated by the sim.skewed() or sim.normal() functions.
#' Under MNAR, whether a value is missing depends on the value itself.
#' To create MNAR missingness, the values of each selected item are sorted in decreasing order. Based on the requested percent of missingness, 90 percent of the missing values are drawn at random from the top of the sorted variable, and the remaining 10 percent are drawn at random from the rest of the variable.
#' For example, suppose the sample size is 300 and 20 percent missingness is wanted, giving 300 x 0.20 = 60 missing values. The values of the item are sorted in decreasing order. Then 54 of the 60 largest values (60 x 0.90 = 54) are set to missing at random, and the remaining 6 missing values (60 - 54, i.e. 10 percent) are assigned at random among the other 240 values (300 - 60 = 240).
#' Missing values are written as "NA" in the data files. Each data set with missing values is saved as a new file named "MNAR_i.dat", where i is the number of the data set.
#' In each data file, the first column contains the sample (case) numbers, and the remaining columns contain the values of each item.
#' A file named "MNAR_List.dat" is also written; it lists the names of the data sets that contain missing values.
#'
#' @author Fatih Orcan
#' @importFrom utils read.table write.table
#' @param misg A vector of 0s and 1s with one element per item. 1 marks an item that receives missing values and 0 marks an item that is left complete. If misg is not given (NULL), all items receive missing values.
#' @param perct Percent of missingness. The default is 10 percent.
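#' @param dataList Name of the file that lists the data sets to be processed, one file name per line. The data sets can be generated earlier either with the package functions or with any other software. The default is "Data_List.dat".
#' @param f.loc File location: the directory that contains the data sets and the dataList file. The new "MNAR_i.dat" files and "MNAR_List.dat" are written to the same directory.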
#' @export
#' @examples
#'
#' #   The data sets are generated in the first step.
#'
#' fc<-fcors.value(nf=3, cors=c(1,.5,.6,.5,1,.4,.6,.4,1))
#' fl<-loading.value(nf=3, fl.loads=c(.5,.5,.5,0,0,0,0,0,0,0,0,.6,.6,.6,0,0,0,0,0,0,0,0,.4,.4))
#' floc<-tempdir()
#' sim.normal(nd=10, ss=100, fcors=fc, loading=fl, f.loc=floc)
#'
#' #   Missing values are added in the second step.
#'
#' mis.items<-c(1,1,1,0,0,0,0,0)
#' dl<-"Data_List.dat"  # must be located in f.loc (floc here).
#' MNAR.data(misg = mis.items, perct = 20, dataList = dl, f.loc=floc)

MNAR.data<-function(misg=NULL, perct=10, dataList="Data_List.dat", f.loc){

  if(perct < 0 || perct > 100){
    stop("perct must be between 0 and 100")}

  # Names of the data files to process (one per row of the dataList file).
  data.names<-as.matrix(read.table(file.path(f.loc, dataList), header = FALSE))
  misg.names<-data.names
  nd<-nrow(data.names)

  for(i in 1:nd){
    veri<-read.table(file.path(f.loc, data.names[i,1]))
    nitem<-ncol(veri)-1   # the first column holds the sample (case) numbers
    ss<-nrow(veri)

    # If misg is not given, every item receives missing values.
    if(is.null(misg)){misg<-rep(1, nitem)}
    if(length(misg)!=nitem){
      stop("misg must contain one 0 or 1 for each item")}
    if(!all(misg %in% c(0,1))){
      stop("Please use only 0s or 1s to indicate missingness")}

    # Missing count per item, split 90/10 between the largest values and the rest.
    mis.ss<-round((perct/100)*ss)
    n.top<-round(mis.ss*.9)
    n.rest<-mis.ss-n.top

    mnar<-veri
    for(j in which(misg==1)){
      ord<-order(veri[,j+1], decreasing = TRUE)  # row indices, largest value first
      top<-ord[seq_len(mis.ss)]
      rest<-setdiff(ord, top)
      # sample.int() avoids sample()'s 1:x behaviour when a pool has length 1.
      mnar[top[sample.int(length(top), n.top)], j+1]<-NA
      mnar[rest[sample.int(length(rest), n.rest)], j+1]<-NA
    }

    misg.names[i,1]<-paste("MNAR_",i,".dat", sep="")
    write.table(mnar, file= file.path(f.loc, misg.names[i,1]), sep = "\t",
                col.names = FALSE, row.names = FALSE, quote = FALSE)
    message(paste("MNAR_",i,".dat was completed", sep=""))
  }
  write.table(misg.names,file=file.path(f.loc, "MNAR_List.dat"),
              col.names = FALSE, row.names = FALSE, quote = FALSE)
  message("Done!...")
}
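The 90/10 selection rule from the documentation above can be checked on a single vector. The sketch below uses a hypothetical standalone helper, mnar_vector(), which is not part of MonteCarloSEM. It assumes the missing count is rounded to the nearest integer, and it reproduces the 300-case, 20 percent example.

```r
# Hypothetical helper (not part of MonteCarloSEM): applies the 90/10 MNAR
# scheme described above to a single numeric vector.
mnar_vector <- function(x, perct = 10) {
  ss     <- length(x)
  n.mis  <- round(ss * perct / 100)   # total number of values to delete
  n.top  <- round(n.mis * 0.9)        # drawn from the n.mis largest values
  n.rest <- n.mis - n.top             # drawn from the remaining values
  ord  <- order(x, decreasing = TRUE) # indices of x, largest value first
  top  <- ord[seq_len(n.mis)]
  rest <- setdiff(ord, top)
  drop <- c(top[sample.int(length(top), n.top)],
            rest[sample.int(length(rest), n.rest)])
  x[drop] <- NA
  x
}

set.seed(1)
x <- rnorm(300)
y <- mnar_vector(x, perct = 20)
top60 <- order(x, decreasing = TRUE)[1:60]
sum(is.na(y))          # 60 missing values in total
sum(is.na(y[top60]))   # 54 of them among the 60 largest values
sum(is.na(y[-top60]))  # 6 among the other 240 values
```

Because most of the deleted values come from the top of the distribution, the mean of the observed values falls below the mean of the complete vector, which is what makes this missingness depend on the values themselves.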

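After MNAR.data() has run, the output can be checked by reading MNAR_List.dat and counting the NA values in each item column. The helper count_missing() below is hypothetical and not part of the package. To keep the sketch self-contained, it first writes a small stand-in MNAR_1.dat with made-up values in the layout described in the documentation above: no header, case numbers in column 1, and "NA" for missing values.

```r
# Hypothetical helper: reads MNAR_List.dat from f.loc and returns, for each
# listed file, the number of NA values in every item column (column 1 = IDs).
count_missing <- function(f.loc) {
  files <- read.table(file.path(f.loc, "MNAR_List.dat"),
                      header = FALSE, stringsAsFactors = FALSE)[[1]]
  counts <- lapply(files, function(f) {
    d <- read.table(file.path(f.loc, f), header = FALSE)
    colSums(is.na(d[, -1, drop = FALSE]))
  })
  names(counts) <- files
  counts
}

# Stand-in output in the same layout (tab-separated, no header, IDs first);
# the values are made up for illustration.
f.loc <- file.path(tempdir(), "mnar_demo")
dir.create(f.loc, showWarnings = FALSE)
demo <- data.frame(ID = 1:4, x1 = c(NA, 0.2, 1.1, NA), x2 = c(0.5, NA, 0.3, 0.9))
write.table(demo, file.path(f.loc, "MNAR_1.dat"), sep = "\t",
            col.names = FALSE, row.names = FALSE, quote = FALSE)
write.table("MNAR_1.dat", file.path(f.loc, "MNAR_List.dat"),
            col.names = FALSE, row.names = FALSE, quote = FALSE)

res <- count_missing(f.loc)
res
```

Reading with the default whitespace separator accepts both the tab-separated MNAR files and space-separated input, so the same helper can also be pointed at other data sets written in this layout.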
MonteCarloSEM documentation built on May 29, 2024, 11:56 a.m.