Description

fillNASVD imputes missing values in a training dataset using singular value decomposition (SVD). For each monitoring location, the multivariate time series formed by the selected columns is used to impute that location's missing values.
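The description does not spell out the SVD procedure itself. As one illustration of the general idea, the sketch below implements a common iterative low-rank SVD ("hard-impute") scheme on a single location's matrix of dates × variables. The helper svd_impute_matrix(), the rank k, the per-column standardization and the convergence settings are all assumptions made for illustration; this is not necessarily how fillNASVD works internally.

```r
# Hypothetical helper: generic iterative low-rank SVD imputation
# ("hard-impute" style). NOT necessarily the algorithm used by fillNASVD.
svd_impute_matrix <- function(x, k = 2, max_iter = 100, tol = 1e-6) {
  x <- as.matrix(x)
  miss <- is.na(x)
  if (!any(miss)) return(x)
  if (any(colSums(!miss) == 0)) stop("every column needs at least one observed value")
  # Standardize each column with its observed mean and sd so that
  # variables on different scales contribute comparably to the SVD.
  mu <- colMeans(x, na.rm = TRUE)
  sdev <- apply(x, 2, sd, na.rm = TRUE)
  sdev[!is.finite(sdev) | sdev == 0] <- 1
  z <- sweep(sweep(x, 2, mu, "-"), 2, sdev, "/")
  # Initial guess for missing cells: the standardized column mean, i.e. 0.
  z[miss] <- 0
  k <- min(k, nrow(z), ncol(z))
  for (i in seq_len(max_iter)) {
    s <- svd(z, nu = k, nv = k)
    # Rank-k reconstruction from the leading singular triplets.
    approx <- s$u %*% diag(s$d[seq_len(k)], nrow = k) %*% t(s$v)
    change <- sqrt(sum((z[miss] - approx[miss])^2))
    # Overwrite only the missing cells; observed cells stay fixed.
    z[miss] <- approx[miss]
    if (change < tol) break
  }
  # Back-transform and write imputed values into a copy of the input,
  # so observed entries are returned exactly as given.
  filled <- sweep(sweep(z, 2, sdev, "*"), 2, mu, "+")
  out <- x
  out[miss] <- filled[miss]
  out
}

# Usage on synthetic data: 60 "dates" x 4 "variables" built from two factors.
set.seed(42)
truth <- outer(rnorm(60), c(1, -0.5, 2, 0.8)) +
  outer(rnorm(60), c(0.3, 1, -0.2, 0.5))
x <- truth
x[sample(length(x), 20)] <- NA
filled <- svd_impute_matrix(x, k = 2)
# Compare a few imputed cells against the values that were removed.
print(round(head(cbind(truth = truth[is.na(x)], imputed = filled[is.na(x)])), 3))
```

Only the missing cells are ever overwritten during the iterations, which keeps the observed data as a fixed anchor for the low-rank fit; the choice of k trades flexibility against overfitting when a location has few dates.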
Usage

fillNASVD(dset, cols, idF, dateF)
Arguments

dset
The data frame containing the missing values. Data format:
cols
A character vector of the column names used for imputation, including the columns that contain missing values.
idF
Name of the column holding the unique location identifier (for example "siteid").
dateF
Name of the date column, if any (for example "date").
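To make the roles of cols, idF and dateF concrete, here is a hypothetical sketch of a per-location workflow: rows are grouped by the idF column, ordered by the dateF column, and the cols matrix of each group is handed to an imputer. The names fill_by_location() and mean_fill() are illustrative only, and the simple column-mean filler merely stands in for the SVD step to keep the example short.

```r
# Hypothetical helper illustrating how dset, cols, idF and dateF could
# fit together; this is not the package's implementation.
fill_by_location <- function(dset, cols, idF, dateF = NULL, impute_fun) {
  out <- dset
  # Row indices of each location.
  groups <- split(seq_len(nrow(dset)), dset[[idF]])
  for (rows in groups) {
    # Order this location's rows in time when a date column is given.
    if (!is.null(dateF)) rows <- rows[order(dset[[dateF]][rows])]
    m <- as.matrix(dset[rows, cols, drop = FALSE])
    out[rows, cols] <- impute_fun(m)
  }
  out
}

# Stand-in imputer: replaces NAs with the location's column means.
mean_fill <- function(m) {
  for (j in seq_len(ncol(m))) {
    m[is.na(m[, j]), j] <- mean(m[, j], na.rm = TRUE)
  }
  m
}

toy <- data.frame(
  siteid = rep(c("A", "B"), each = 3),
  date = as.Date("2020-01-01") + c(0:2, 0:2),
  ndvi = c(0.2, NA, 0.4, 0.6, 0.7, NA),
  aod = c(NA, 0.3, 0.5, 0.1, NA, 0.3)
)
res <- fill_by_location(toy, c("ndvi", "aod"), "siteid", "date", mean_fill)
print(res)
```

Passing the imputer as a function keeps the grouping logic separate from the imputation method, so an SVD-based matrix imputer could be plugged in without changing the loop.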
Value

A data frame based on the input dset, with the missing values filled in.
Examples

# Use the covariates of the PM2.5 training data as an example:
data("trainsample")
cols <- c("ndvi", "aod", "wnd_avg", "monthAv")
n <- nrow(trainsample)
p <- 0.05                  # fraction of values to blank out in each column
pn <- as.integer(p * n)
set.seed(1)                # make the random masking reproducible
trainsample2missed <- trainsample
for (col in cols) {
  index <- sample(n, pn)
  trainsample2missed[index, col] <- NA
}
trainsample2filled <- fillNASVD(trainsample2missed, cols, "siteid", "date")

# Examine the accuracy on the artificially removed values:
for (col in cols) {
  index <- which(is.na(trainsample2missed[, col]))
  obs <- trainsample[index, col]
  missed <- trainsample2missed[index, ]
  # Locate the same site/date rows in the filled output
  sindex <- match(interaction(missed$siteid, missed$date),
                  interaction(trainsample2filled$siteid, trainsample2filled$date))
  pre <- trainsample2filled[sindex, col]
  print(paste0(col, " missing value correlation: ",
               round(cor(obs, pre, use = "complete.obs"), 2)))
  print(paste0(col, " missing value RMSE: ",
               round(sqrt(mean((obs - pre)^2, na.rm = TRUE)), 2)))
}