fillNASVD: Function to Use SVD to Impute the Missing Values for Training...


View source: R/fillNASVD.R

Description

Singular value decomposition (SVD) is used to impute missing values in the training dataset. For each monitoring location, the multivariate time series of the selected columns is used to impute the missing values.
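The general idea of SVD-based imputation can be sketched in base R. This is only an illustration of the technique, not the code used by fillNASVD: the helper name svd_impute_sketch, its arguments (rank, max_iter, tol), and the column-mean starting values are all assumptions made for this example. It also assumes every column has at least one observed value.

```r
# Hypothetical helper (not part of sptemUS): iterative low-rank SVD imputation.
# Missing cells start at the column mean and are then repeatedly replaced by
# the values of a rank-k SVD reconstruction until they stop changing.
svd_impute_sketch <- function(x, rank = 1, max_iter = 100, tol = 1e-6) {
  x <- as.matrix(x)
  miss <- is.na(x)
  if (!any(miss)) return(x)            # nothing to impute

  # Initial guess: column means of the observed entries.
  col_means <- colMeans(x, na.rm = TRUE)
  filled <- x
  filled[miss] <- col_means[col(x)[miss]]

  for (i in seq_len(max_iter)) {
    s <- svd(filled)
    k <- seq_len(min(rank, length(s$d)))
    # Rank-k reconstruction of the currently completed matrix.
    approx <- s$u[, k, drop = FALSE] %*%
      diag(s$d[k], nrow = length(k)) %*%
      t(s$v[, k, drop = FALSE])
    # Root mean squared change in the imputed cells for this iteration.
    change <- sqrt(mean((filled[miss] - approx[miss])^2))
    filled[miss] <- approx[miss]       # observed cells are never overwritten
    if (change < tol) break
  }
  filled
}

# Usage: a rank-1 matrix with one entry removed (its true value is 1).
x <- outer(1:6, c(1, 2, 3))
x[1, 1] <- NA
filled <- svd_impute_sketch(x, rank = 1)
filled[1, 1]
```

The rank controls how much structure is shared across columns; a low rank is a common choice when the columns are strongly correlated, as covariates at one monitoring location often are.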

Usage

fillNASVD(dset, cols, idF, dateF)

Arguments

dset

The data frame containing the missing values to be imputed.

cols

A character vector of the column names (including the columns with missing values) used to impute the missing values.

idF

Name of the column holding the unique location identifier (for example, "siteid").

dateF

Name of the date column, if any (for example, "date").

Value

A data frame based on the input dset, with the missing values filled in.

Examples

# Use the covariates for PM2.5 data as an example:

# Assumption: the package is installed; it provides trainsample and fillNASVD.
library(sptemUS)
data("trainsample")
cols=c("ndvi","aod","wnd_avg","monthAv")
n=nrow(trainsample)
p=0.05                  # proportion of values to blank out per column
pn=as.integer(p*n)
set.seed(1)             # added so the random masking is reproducible
trainsample2missed=trainsample
# Randomly set pn values to NA in each covariate column:
for(col in cols){
  index=sample(n,pn)
  trainsample2missed[index,col]=NA
}
trainsample2filled=fillNASVD(trainsample2missed,cols,"siteid","date")

# Examine the accuracy on the values that were blanked out:
for(col in cols){
  index=which(is.na(trainsample2missed[,col]))
  obs=trainsample[index,col]
  missed=trainsample2missed[index,]
  # Match rows by site and date in case the filled data is ordered differently:
  sindex=match(interaction(missed$siteid,missed$date),
               interaction(trainsample2filled$siteid,trainsample2filled$date))
  pre=trainsample2filled[sindex,col]
  print(paste(col," missing value correlation: ",round(cor(obs,pre),2)))
  print(paste(col," missing value cv rmse: ",round(rmse(obs,pre),2)))
}
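
The accuracy check above calls rmse(), which is not a base R function; it is presumably supplied by the package. If it is not available in your session, a minimal stand-in could look like the following. The name rmse_sketch and the na.rm argument are assumptions for this illustration, not the package's definition.

```r
# Hypothetical stand-in for rmse(): root mean squared error between
# observed and predicted values, ignoring pairs with missing values.
rmse_sketch <- function(obs, pre, na.rm = TRUE) {
  sqrt(mean((obs - pre)^2, na.rm = na.rm))
}

# Usage with made-up numbers:
rmse_sketch(c(1, 2, 3), c(1, 2, 5))
```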

lspatial/sptemUS documentation built on May 29, 2019, 3:42 a.m.