pcafill: PCA-based missing-value filling

View source: R/pcafill.R

pcafill R Documentation

PCA-based missing-value filling

Description

Fills missing (station) values by predicting them with multiple regression. The predictors are the principal components (PCs) from a PCA of the same group of stations, computed after excluding the series that have missing data. This makes sense for station data where a few leading modes account for most of the variability. The method is not expected to be useful when there are many large data gaps.
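The idea described above can be illustrated with a minimal base-R sketch. It is not the esd implementation: the helper name fill_by_pc_regression() and the synthetic data are assumptions made for illustration, and it works on a plain numeric matrix (times in rows, stations in columns) rather than on a station object.

```r
# Hypothetical helper: a sketch of PCA-regression gap filling, not esd's pcafill()
fill_by_pc_regression <- function(X, ip = 1:2) {
  X <- as.matrix(X)
  # Series (columns) without missing values are the only input to the PCA
  complete <- colSums(is.na(X)) == 0
  stopifnot(sum(complete) >= max(ip))
  pca <- prcomp(X[, complete, drop = FALSE], center = TRUE, scale. = FALSE)
  # Selected principal components (time x modes), named PC1, PC2, ...
  pcs <- as.data.frame(pca$x[, ip, drop = FALSE])
  for (j in which(!complete)) {
    gap <- is.na(X[, j])
    # Calibrate a multiple regression on the times where the series has data
    cal <- data.frame(y = unname(X[!gap, j]), pcs[!gap, , drop = FALSE])
    fit <- lm(y ~ ., data = cal)
    # Predict the missing times from the PCs
    X[gap, j] <- predict(fit, newdata = pcs[gap, , drop = FALSE])
  }
  X
}

# Synthetic example: six series sharing one common signal plus weak noise
set.seed(1)
n <- 40
signal <- cumsum(rnorm(n))
X <- sapply(1:6, function(k) k * signal + rnorm(n, sd = 0.1))
truth <- X
X[c(5, 17, 30), 6] <- NA            # introduce three gaps in the last series
filled <- fill_by_pc_regression(X, ip = 1:2)
```

Because one mode dominates the synthetic data, the regression recovers the removed values closely. With many large gaps, few complete series remain to build the PCA, which is why the method is not expected to work well in that case.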

Usage

pcafill(
  X,
  insertmiss = 0,
  ip = 1:4,
  mnv = 0,
  complete = FALSE,
  test = FALSE,
  verbose = FALSE
)

Arguments

X

station data (group of stations)

insertmiss

Used for testing and evaluation: the number of missing values to introduce artificially in order to test the predictive capability (default 0, none).

ip

Indices of the EOFs/principal components to include in the filling (default 1:4, the four leading modes). In many cases it is useful to keep this to a small set of values.

mnv

Minimum number of valid data points required at any given time. Can be used to work around the problem of too much missing data.

complete

Use pattern projection between the PCA patterns and the original data to get a complete record. Otherwise the result covers only the subset of times with sufficient data.

test

Extra test - debugging

verbose

Print diagnostics - debugging

N

Number of runs in Monte-Carlo simulation

max.miss

Maximum NAs to insert (insertmiss) in Monte-Carlo simulations

x

time series for calibrating regression analysis

y

PC input for regression analysis
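The testing arguments above (insertmiss, N, max.miss) suggest a Monte-Carlo evaluation in which known values are removed at random, refilled, and compared with the originals. The sketch below shows one way such a test could look in base R. It is not esd's pcafill.test(): the helpers pc_fill() and mc_test() are hypothetical, the data are synthetic, and only the argument names N, max.miss and ip are borrowed from this page.

```r
set.seed(42)

# Hypothetical stand-in filler: regress each gappy column on PCs of the complete columns
pc_fill <- function(X, ip = 1:2) {
  ok <- colSums(is.na(X)) == 0
  pcs <- as.data.frame(prcomp(X[, ok, drop = FALSE])$x[, ip, drop = FALSE])
  for (j in which(!ok)) {
    gap <- is.na(X[, j])
    fit <- lm(y ~ ., data = data.frame(y = X[!gap, j], pcs[!gap, , drop = FALSE]))
    X[gap, j] <- predict(fit, newdata = pcs[gap, , drop = FALSE])
  }
  X
}

# Hypothetical Monte-Carlo test: N runs, each removing up to max.miss known values
mc_test <- function(X, N = 20, max.miss = 5, ip = 1:2) {
  pairs <- NULL
  for (i in seq_len(N)) {
    nmiss <- sample(max.miss, 1)        # number of values to remove in this run
    col <- sample(ncol(X), 1)           # one randomly chosen series
    rows <- sample(nrow(X), nmiss)      # random times within that series
    Xm <- X
    Xm[rows, col] <- NA
    Y <- pc_fill(Xm, ip = ip)
    # Keep the (true, filled) pairs for the removed values
    pairs <- rbind(pairs, cbind(true = X[rows, col], filled = Y[rows, col]))
  }
  pairs
}

# Synthetic, gap-free test data: six series sharing one common signal
n <- 50
signal <- sin(seq_len(n) / 4)
X <- sapply(1:6, function(k) k * signal + rnorm(n, sd = 0.1))
res <- mc_test(X, N = 20, max.miss = 5)
skill <- cor(res[, "true"], res[, "filled"])
```

A correlation close to 1 between the removed and refilled values indicates that the leading PCs carry enough information to reconstruct the gaps. A low value would suggest that fewer or different modes in ip, or a different method, are needed.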

Details

This function is handy for the downscaling of PCAs. See Benestad, R.E., D. Chen, A. Mezghani, L. Fan, K. Parding, On using principal components to represent stations in empirical-statistical downscaling, Tellus A 28326, accepted.

Value

A station object of the same kind as the input, with the missing values filled in.

See Also

PCA, allgood

Examples

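library(esd)
data('Tx.Norway')
## Annual means, requiring at least 200 valid values per year (nmin)
X <- annual(Tx.Norway, FUN='mean', nmin=200)
## Number of valid data points at each time step
ok <- apply(X, 1, nv)
## Keep only the times that have at least one valid value
X <- subset(X, it=ok > 0)
Y <- pcafill(X)

plot(PCA(Y))
## Compare the filled values with the original ones
plot(c(coredata(Y)), c(coredata(X)))

## Monte-Carlo test with random selection of data points set to NA:
Y.test <- pcafill.test(X, max.miss=10, ip=1:3)
cor(Y.test)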



metno/esd documentation built on March 9, 2024, 11:21 a.m.