Contents: Description, Functions, Sample Data, Utility functions and Constants, Future Expansion, Warning, See Also
Tools to aid in the processing and classification of remotely sensed data, and in the rendering of imputed maps.
This package aims to aid and simplify the following tasks:
reading and writing multilayer raster TIFs (relying heavily on the raster package)
sampling locational data from these rasters, i.e. extracting raster data for field sites
grouping/lumping classes for reduced or simplified analysis, e.g. to increase sample size in each class
streamlining building classification models from several packages
analysing bundles of models, particularly accuracy metrics but also a limited variable importance (VIMP) metric
streamlining the rendering of output rasters derived from these classifiers and the input rasters
plotting these data
allowing the above functionality to be extended
providing education and examples for this type of analysis
These tasks fall into six main groups with the following functions associated with each task:
Read in a raster
readTile
– read a collection of TIF files from a folder and compile them into a single raster.stack.
Extract data from the raster
extractPoints
– given a collection of spatial points (maptools), extract the raster
data under them. This allows the construction of a model linking site characteristics, most notably
ecoSite, to remotely sensed variables.
Generate Models
generateModels
– given some data and a list of model types, build a list of models. The returned
list can be treated as a whole by many of the analytical functions.
Assess model accuracy and variable importance
npelVIMP
– generate variable importance data for a model. This was developed as a way of
finding VIMP for nearest neighbour models but has been expanded to generate VIMP data for all models included in
this package using the same leave-one-out technique.
npelVIF
– compute the variance inflation factor (VIF) for a model.
classAcc
– compute the accuracies for a specified model: class accuracies for categorical data, and R-squared
for regression models.
modelAccs
– report the accuracies and VIMP data for a list of models.
validate
– validate a specified model, that is, use a validation dataset to determine accuracy: class accuracies
for categorical data, and R-squared for regression models.
modelsValid
– validate a list of models.
nnErrMap
– produce a map of nearest neighbour distances or errors.
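The accuracy measures named above (class accuracies for categorical data, R-squared for regression) can be illustrated in base R. This is a minimal sketch with made-up data; it does not call classAcc or validate, and it is not claimed to be how the package computes them. The class names and values are hypothetical.

```r
# Hypothetical observed and predicted classes for six validation sites.
observed  <- factor(c("bog", "bog", "fen", "fen", "upland", "upland"))
predicted <- factor(c("bog", "fen", "fen", "fen", "upland", "bog"),
                    levels = levels(observed))

# Confusion matrix: rows are predictions, columns are observations.
confusion <- table(predicted = predicted, observed = observed)

# Overall accuracy: the proportion of sites on the diagonal.
overallAcc <- sum(diag(confusion)) / sum(confusion)

# Producer's accuracy: of the sites truly in a class, the share predicted correctly.
producerAcc <- diag(confusion) / colSums(confusion)

# User's accuracy: of the sites predicted as a class, the share that are correct.
userAcc <- diag(confusion) / rowSums(confusion)

# R-squared for a continuous response: 1 - residual SS / total SS.
y    <- c(1.0, 2.0, 3.0, 4.0)   # hypothetical observed values
yHat <- c(1.1, 1.9, 3.2, 3.8)   # hypothetical predicted values
rSquared <- 1 - sum((y - yHat)^2) / sum((y - mean(y))^2)
```

Computing the same quantities on a held-out dataset rather than the training data is the idea behind validation as described above.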
Render Output
writeTile
– generate output raster(s) from a collection of input rasters and a single model.
writeTiles
– generate a collection of output rasters from a set of input rasters and a list of models.
impute
– create a map of a variable via a lookup table and a second map.
Visualize the results
plotTile
– plot a single model; doesn't work yet???
plotTiles
– plot a list of models; doesn't work yet???
A small selection of data has been included in the package for didactic and testing purposes:
egTile
– a sample tile comprising an .rda file linked to a (small) collection of tifs
siteData
– a (small) subset of site data
ecoGroup
– an example transformation 'function' including labels and suggested colours
water
– an example of a water mask
There are several other common tasks that this package aims to streamline:
A few utility functions encapsulating common tasks:
sortLevels
– sort the levels of a factor so they are in order
trimLevels
– trim the levels of a factor so only levels that appear in the variable are present
mergeLevels
– merge the levels of two factor variables
factorValues
– as outlined in the Warning section of the help file for factors (?factor), there is a common gotcha when dealing with factors: converting numerical factors using as.numeric returns the factor indices, not the values as expected.
rad2deg
– convert radians to degrees; for slope, aspect, hillshade etc.
deg2rad
– convert degrees to radians; for slope, aspect, hillshade etc.
fx2vars
– convert a formula object to lists of names of x and y, and vice versa.
prob2class
– convert a matrix of probabilities into a factor of classes; each column is taken to represent a
different class and each row is a different datapoint.
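Several of the tasks these utilities wrap can be shown in base R. The sketch below uses only base functions and made-up data; it illustrates the underlying ideas (the factor gotcha, dropping unused levels, angle conversion, and picking the most probable class) and is not the package's own code.

```r
# The factor gotcha: as.numeric() on a factor returns level indices.
f <- factor(c("10", "20", "20", "40"))
indices <- as.numeric(f)               # 1 2 2 3, the level indices
values  <- as.numeric(levels(f))[f]    # 10 20 20 40, the idiom suggested in ?factor

# Dropping levels that no longer appear in the variable.
g <- factor(c("a", "b", "c"))[1:2]     # still carries the unused level "c"
trimmed <- droplevels(g)               # levels are now "a" and "b"

# Converting between radians and degrees.
deg <- (pi / 2) * 180 / pi             # radians to degrees
rad <- deg * pi / 180                  # degrees back to radians

# Turning a probability matrix into a factor of classes:
# one column per class, one row per data point.
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.3, 0.6),
                nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("bog", "fen", "upland")))
classes <- factor(colnames(probs)[max.col(probs, ties.method = "first")],
                  levels = colnames(probs))
```

Breaking ties with `ties.method = "first"` is a choice made for this sketch so that the result is deterministic; how prob2class resolves ties is not stated here.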
Object Oriented access to model internals: The various modelling packages used by NPEL.Classification all store their internals in different ways, and they do not all store the same data. In a few cases, the existing packages did not even wrap their models in classes; this has been done in this package so that access to the models can be standardized through S3 overloading. These OOP methods clean up the situation by providing a common interface for all the data, and in some cases by generating the data when necessary.
isCat
– was the model built with categorical data?
isCont
– was the model built with continuous data?
getData
– the data used to build this model.
getClasses
– the list of classes present in this model.
getProb
– the probability matrix from a model built with categorical data; if the parent package doesn't
natively support probabilities, a matrix will be generated in which the selected class has probability=1 and the others are 0.
getFormula
– the formula used to generate this model.
getArgs
– the arguments specific to this model type used when building this model.
getFitted
– the fitted data from the original dataset.
getVIMP
– variable importance data; generated for some model types.
buildModel
– a single interface for building a model from any of the supported packages; see the documentation
for more information on how the desired model type is passed to the function.
buildPredict
– build a function that can be used to predict new values for this model; again this function
standardizes the interface across all supported model types.
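The fallback described for getProb, in which the selected class gets probability 1 and all others get 0, can be sketched in base R. The helper name oneHotProb is hypothetical and is not part of the package; this only illustrates the shape of the resulting matrix.

```r
# Hypothetical helper (not a package function): build a probability matrix
# in which each row has a 1 in the column of the predicted class.
oneHotProb <- function(predicted) {
  stopifnot(is.factor(predicted))
  classes <- levels(predicted)
  probs <- matrix(0, nrow = length(predicted), ncol = length(classes),
                  dimnames = list(NULL, classes))
  # Matrix indexing: one (row, column) pair per prediction.
  probs[cbind(seq_along(predicted), as.integer(predicted))] <- 1
  probs
}

# Made-up predictions for three sites; "upland" is a class that is never chosen.
predicted <- factor(c("bog", "fen", "bog"), levels = c("bog", "fen", "upland"))
probs <- oneHotProb(predicted)
```

Each row sums to 1, so downstream code that expects a probability matrix can treat native and generated probabilities the same way.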
Global Package Constants: These constants define the scope of this package with respect to the models/packages it uses.
suppModels
– these are the model types that NPEL.Classification currently supports.
probModels
– these are the model types that can give probability outputs when built on categorical values.
contModels
– these are the model types that (natively) support continuous variables.
This section is more theoretical: a word on adding other modelling packages to this package. Given the object oriented (OOP) nature
of the implementation, adding a package should mostly be a matter of adding the relevant OOP code; that is, adding a method
for every overloaded function in the package. At the time of writing, all of these functions can be found in the file OOP_util.R.
Of course, return values need to be consistent with existing expectations. It is also necessary to update the global constants that record which packages are supported: see Constants for more information.
And finally, thorough testing is needed. Testing code could be added to the testing suite, but it could also remain outside the package if the new functionality is only to be used locally. For that matter, extra OOP code that expands the functionality need not be added to the package; it could remain external as long as the correct overloads are used!
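The mechanism behind this kind of extension is ordinary S3 dispatch. The sketch below is self-contained and deliberately does not touch the package: isCatSketch and getClassesSketch are stand-in generics invented for the example (the real generics and their signatures may differ), and majorityModel is a hypothetical new model type.

```r
# Stand-in generics so the sketch runs without the package loaded.
# In real use the methods would be written for the package's own generics.
isCatSketch      <- function(model, ...) UseMethod("isCatSketch")
getClassesSketch <- function(model, ...) UseMethod("getClassesSketch")

# Hypothetical new model type: always predicts the most common class.
buildMajority <- function(y) {
  stopifnot(is.factor(y))
  structure(list(classes  = levels(y),
                 majority = names(which.max(table(y)))),
            class = "majorityModel")
}

# One method per generic, named <generic>.<class>; S3 finds them by class.
isCatSketch.majorityModel      <- function(model, ...) TRUE
getClassesSketch.majorityModel <- function(model, ...) model$classes

model <- buildMajority(factor(c("bog", "fen", "fen")))
isCategorical <- isCatSketch(model)
modelClasses  <- getClassesSketch(model)
```

Because dispatch depends only on the class attribute, methods like these can live in a local script rather than inside the package, which is the point made above.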
Good luck and I hope this work does what you need it to do...
NPEL.Classification must be before raster on the search path; in particular, getData is a valid function in both packages. If you are getting very unusual errors, consider detaching and reattaching this package so it is found first.
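A minimal sketch of checking and repairing the search order, using only base R and utils functions. It assumes the packages are attached under their usual names, "package:NPEL.Classification" and "package:raster"; the reattach step is skipped when the package is not attached.

```r
pkg <- "package:NPEL.Classification"

# List every attached environment that defines getData, in search-path order.
# The first entry is the one an unqualified call to getData() will use.
whereFound <- find("getData")

# Positions of the two packages on the search path (NA if not attached).
positions <- match(c(pkg, "package:raster"), search())

# Detach and reattach so that NPEL.Classification is found first.
if (pkg %in% search()) {
  detach(pkg, character.only = TRUE)
  library(NPEL.Classification)
}
```

Another option, assuming getData is exported from the package namespace, is to qualify the call as NPEL.Classification::getData so that the search order no longer matters for that call.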
The code in this package depends heavily on the raster package. A couple of functions utilize maptools.
Currently supported modelling packages are: randomForest, randomForestSRC, FNN, class, kknn, and gbm.