fillNASVD: Function to Use SVD to Impute the Missing Values for Training...

Description

Singular value decomposition (SVD) is used to impute the missing values in the training dataset. For each monitoring location, the multivariate time series at that location is used to impute its missing values.
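To make the idea concrete, the following is a minimal base R sketch of iterative low-rank SVD imputation on a single numeric matrix. It is not the code used by fillNASVD: the helper name `svd_impute` and its parameters `rank`, `max_iter` and `tol` are hypothetical, and it assumes every column has at least one observed value. Applying something like it per monitoring location would additionally need a split by the location identifier.

```r
# Hypothetical helper (not part of sptemExp): iterative rank-k SVD imputation.
svd_impute <- function(x, rank = 1, max_iter = 100, tol = 1e-6) {
  x <- as.matrix(x)
  miss <- is.na(x)
  if (!any(miss)) return(x)

  # Initial guess: replace each missing cell by the mean of its column
  col_means <- colMeans(x, na.rm = TRUE)
  filled <- x
  filled[miss] <- col_means[col(x)[miss]]

  for (i in seq_len(max_iter)) {
    s <- svd(filled)
    k <- seq_len(min(rank, length(s$d)))
    # Rank-k reconstruction from the leading singular triplets
    approx <- s$u[, k, drop = FALSE] %*%
      diag(s$d[k], nrow = length(k)) %*%
      t(s$v[, k, drop = FALSE])
    change <- sqrt(mean((filled[miss] - approx[miss])^2))
    # Only missing cells are overwritten; observed values stay fixed
    filled[miss] <- approx[miss]
    if (change < tol) break
  }
  filled
}

# Usage on synthetic data (values are made up for illustration)
set.seed(1)
u <- rnorm(50)
truth <- cbind(a = u, b = 2 * u, c = -u)
holed <- truth
holed[sample(length(holed), 10)] <- NA
recovered <- svd_impute(holed, rank = 1)
```

The choice of `rank` controls how much structure is shared across columns when filling gaps; a small rank is a common starting point when the columns are strongly correlated.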

Usage

fillNASVD(dset, cols, idF, dateF)

Arguments

dset

A data frame containing the missing values to be imputed.

cols

A character vector of the column names (including the columns with missing values) used to impute the missing values.

idF

Name of the column holding the unique location identifier.

dateF

Name of the date column, if any.

Value

A data frame based on the input dset, with the missing values filled in.

Examples

# Use the covariates for PM2.5 data as an example:
library(sptemExp)  # provides fillNASVD and the trainsample data

data("trainsample")
cols=c("ndvi","aod","wnd_avg","monthAv")
n=nrow(trainsample)
p=0.05
pn=as.integer(p*n)
trainsample2missed=trainsample
for(col in cols){
  index=sample(n,pn)
  trainsample2missed[index,col]=NA
}
trainsample2filled=fillNASVD(trainsample2missed,cols,"siteid","date")

#Examine the accuracy:
for(col in cols){
  index=which(is.na(trainsample2missed[,col]))
  obs=trainsample[index,col]
  missed=trainsample2missed[index,]
  sindex=match(interaction(missed$siteid,missed$date),
               interaction(trainsample2filled$siteid,trainsample2filled$date))
  pre=trainsample2filled[sindex,col]
  print(paste(col," missing value correlation: ",round(cor(obs,pre),2)))
  print(paste(col," missing value cv rmse: ",round(rmse(obs,pre),2)))
}
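The accuracy check above reports a correlation and an RMSE between the held-out observations and the imputed values. Assuming `rmse` there denotes the usual root mean squared error, a base R equivalent could look like the sketch below; the name `rmse_sketch` and the choice to drop incomplete pairs are assumptions made for illustration, not the package's definition.

```r
# Hypothetical stand-in for rmse(obs, pre): root mean squared error
# over the pairs where both the observed and predicted value are present.
rmse_sketch <- function(obs, pre) {
  ok <- !(is.na(obs) | is.na(pre))
  sqrt(mean((obs[ok] - pre[ok])^2))
}

# Example with made-up numbers
err <- rmse_sketch(c(0, 0), c(3, 4))
```

A lower RMSE together with a correlation near 1 would indicate that the imputed values track the deliberately removed observations closely.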

sptemExp documentation built on July 7, 2019, 9:02 a.m.