impute: Impute data from a NN model
In henkelstone/NPEL.Classification: Classification and data-handling routines for NPEL Caribou Project


This function takes an input raster, that is, an image of a categorical variable, and generates an image (or images) of other data, using the input raster as a key or index variable. It is used to render maps that would not otherwise be possible, such as maps of continuous data derived from Nearest Neighbour algorithms.

impute(inRdata, iData, outFilename, fx = NULL, x = NULL, y = NULL)

`inRdata`	a raster* object with the spatial data—typically a map of indices to the lookup table.
`iData`	or imputation data; the lookup table that takes map values (e.g. `siteID`) and converts them to some other value, e.g. a site characteristic such as cover, tree density, species composition, etc.
`outFilename`	the file in which to store the resulting output raster object.
`fx`	(optional) a formula object specifying which column(s) in `iData` to output; if omitted, it will either be computed from `x` and `y`, or assumed to be all columns in `iData`.
`x`	(optional) a list of columns from `iData` to output; will be computed from `fx` if omitted.
`y`	(optional) the pivot column that connects the model and the imputation data; if neither `fx` nor `y` is specified, it is assumed to be the first column in `iData`.

It is perhaps easiest to explain the concept of imputation using an example. Consider the case where the input raster holds, for each pixel, the ID of its nearest neighbour; environmental data can then be imputed by looking up each ID in the original dataset and assigning the pixel the value (of the environmental variable) found at that site. So, if a pixel is nearest (in phase space) to site No. 153, then we infer (impute) that it is also most likely to have similar environmental characteristics. This is significantly better than generating a map of classes and then inferring values from the class means, because a huge amount of information is lost in mapping N sites to k classes, especially where N >> k.

This function is functionally similar to the SQL/database command JOIN; that is, it joins two groups of data using a common column, such that every time a value of the pivot column y occurs in the first table (the raster), some or all of the additional columns x in the second table (the imputation data) are appended to the result. It is a glorified form of lookup table in which the vector of lookup values is all the pixels in the image.
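The join analogy can be sketched in base R without the raster package. This is only an illustration of the lookup idea, not the implementation used by `impute`; the names `lookup`, `ids` and `cover`, and all the values, are hypothetical.

```r
# Hypothetical lookup table: one row per site, keyed by siteID.
lookup <- data.frame(siteID = c(101, 153, 207),
                     cover  = c(0.20, 0.65, 0.90))

# Hypothetical 2 x 3 "image" whose pixels hold the ID of the nearest site.
ids <- matrix(c(153, 101, 207,
                153, 153, 101), nrow = 2, byrow = TRUE)

# match() finds, for every pixel, the row of the lookup table with that ID;
# indexing the cover column by those rows performs the join.
rows  <- match(ids, lookup$siteID)
cover <- matrix(lookup$cover[rows], nrow = nrow(ids))

print(cover)
```

Each pixel of `cover` now carries the value recorded at its nearest site; imputing a different variable only requires indexing a different column with the same `rows`.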

Of course it is, in principle, possible to use this function to impute data that has been generated by some other type of model; however, the other methods included in this package are all able to generate continuous variable output directly. The only benefit of imputation is that multiple outputs can be produced from a single rendering, simply by imputing a different variable (or suite of variables). This computational benefit may nevertheless be relevant for categorical data.

Note that the notation used for `fx` may not be intuitive: the y variable, usually the ‘dependent’ variable, is used here as the pivot, which can intuitively seem like the independent variable; likewise the x variables, which are usually the ‘independent’ variables, are the output here. Use caution when specifying the formula; the fact that this function expects only a single term on the left and multiple terms on the right is a good clue as to which variables should be where.
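A quick way to check which variable ends up where is to inspect the formula with base R before calling `impute`. The sketch below uses only `all.vars` and formula indexing; how `impute` parses `fx` internally is not shown in this documentation, so treat this as an aid for reading the formula rather than a description of the function.

```r
fx <- siteID ~ ecoType + bedrockD + parentMaterial

# For a two-sided formula, element 2 is the left-hand side and
# element 3 is the right-hand side.
pivot   <- all.vars(fx[[2]])   # the single key column (the "y")
outputs <- all.vars(fx[[3]])   # the columns to be imputed (the "x")

print(pivot)
print(outputs)
```

If `pivot` contains more than one name, the formula is probably written the wrong way round.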

A raster brick of the imputed data, with one layer for each output variable specified.

An analysis that can be useful is to look at the frequency each site is used as a nearest neighbour. This is straightforward using the output of the imputation map. Example code is given below.

In an effort to streamline usage, this function will attempt to coerce non-numeric data into something that can be written using the raster package. To this end, if the data is found to be other than numeric, it is converted to numeric using the command as.numeric(factor(x)), which, as has been observed elsewhere in this documentation, returns the indices of the factor levels (see the warning section for factor). It should be possible to recover the values from the indices using this same typecast; however, there is a risk of a glitch or error that results in a mis-mapping between factor indices and actual values.

It would be much safer to do your own typecast before passing the data to impute! 'Nuf said...
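A minimal sketch of doing the typecast yourself, in base R. The values are made up for illustration; the point is that converting a factor to a number yields level indices, so keeping the level table alongside the numeric codes lets the original values be recovered explicitly.

```r
# Hypothetical non-numeric column of imputation data.
ecoType <- c("bog", "upland", "fen", "bog")

# Convert explicitly and keep the levels as a key.
f     <- factor(ecoType)
codes <- as.integer(f)   # indices into levels(f), not the labels
key   <- data.frame(code = seq_along(levels(f)), ecoType = levels(f))

print(codes)
print(key)

# Recover the original values from the codes using the stored key.
recovered <- as.character(key$ecoType[match(codes, key$code)])
stopifnot(identical(recovered, ecoType))

# The same gotcha with number-like labels: the indices are not the labels.
ids <- factor(c("10", "20", "10"))
idx <- as.numeric(ids)                 # level indices
val <- as.numeric(as.character(ids))   # labels converted to numbers
print(idx)
print(val)
```

Passing `codes` to `impute` and storing `key` next to the output file keeps the mapping under your control rather than relying on the automatic coercion.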

factor and ecoGroup for more information on the factor index gotcha.
generateModels and writeTile for more information on building models for imputation purposes.
nnErrMap for outputting nearest neighbour distances, and generating accuracies from these.

library(NPEL.Classification)
library(raster)   # provides getValues() and the plot method for rasters

# Read the example tile shipped with the package
egTile <- readTile(file.path(system.file("extdata", "egTile", package = "NPEL.Classification"),''),
                   layers=c('base','grnns','wetns','brtns','dem','slp','asp','hsd'))

# Build the models, using siteID as the response so each pixel is assigned a site
fx <- formula('siteID ~ brtns + grnns + wetns + dem + slp + asp + hsd')
nnData <- cbind(siteID=factor(seq_len(nrow(siteData))),siteData)
nnData <- get_all_vars(fx, nnData)
models <- generateModels(nnData, suppModels[!suppModels %in% contModels], fx)

# Render the map of nearest-neighbour site IDs to a temporary file
fNN <- file.path(tempdir(), 'Tmp_nn.tif')
egData <- writeTile (models[[1]], egTile, fNN, layers='class')

# Impute site characteristics from the ID map
fImpute <- file.path(tempdir(), 'Tmp_nnImpute.tif')
egImpute <- impute (egData, nnData, fImpute, formula('siteID~ecoType+bedrockD+parentMaterial'))
plot (egImpute)

## Frequency/sensitivity of nearest neighbour site dependency
freq <- table(getValues(egData))
plot (freq/sum(freq), ylab='freq')
hiFreq <- freq[freq > 100]          # sites used by more than 100 pixels
index <- as.integer(names(hiFreq))
print (cbind(freq=hiFreq, nnData[nnData$siteID %in% index,]))

# Clean up the temporary files
unlink (fNN)
unlink (fImpute)