Building modules

The "1, 2, 3" of module building

The process of making a module is essentially:

  1. Write an R function
  2. Run BuildModule with the function and metadata
  3. Optionally, upload it to the zoon modules repository

Each module type is written slightly differently, though the same three basic steps apply. Below we show an example of how to write each of the module types. We also link to pre-existing modules that you can use as templates.

How to build an occurrence module

The aim of an occurrence module is to return a data.frame of occurrence data which can be used for modelling a species distribution. The example I'm going to show gets data from a fictional survey we have undertaken. The data were saved as a .csv file and placed on Figshare so that they can be shared.

# Load zoon
library(zoon)

# Start building our function
Lorem_ipsum_UK <- function(){

In this case we have not given our function any arguments, as we simply want to return the online dataset. However, you could add arguments here to modify what your function returns (for an example, see the SpOcc module).

# First I retrieve the data from figshare
# Here is the URL
URL <- "https://ndownloader.figshare.com/files/2519918"

# Here is the data
out <- read.csv(URL)
head(out)
##    startDate latitude longitude
## 1 2014-06-25 51.98917 0.8917427
## 2 2014-06-25 51.98917 0.8917427
## 3 2007-08-28 52.21136 0.6602159
## 4       <NA> 51.97564 0.9833449
## 5 1973-01-01 52.34187 0.7142953
## 6 2013-04-12 52.23719 0.7877316

Now it is time to think about how we return our data. The output format for occurrence modules is very important: if you do not ensure that the format is correct, your module will not work properly when entered into a workflow. An occurrence module must return a data.frame with the columns longitude, latitude, value, type and fold; the details are given at the end of this document. The order of these columns is important. Another optional column is crs (Coordinate Reference System), which specifies the coordinate system of your points if they are not latitude/longitude (the default). This is specified in the proj4string format. You can also supply extra columns that might be used further down the workflow.
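To make the required layout concrete, here is a minimal base R sketch. The helpers format_occurrence() and has_occurrence_format() are hypothetical illustrations, not part of zoon, and the coordinates are made up; zoon's own checks in BuildModule are what actually decide whether a module is valid.

```r
# Hypothetical helper, not part of zoon: build a data.frame with the
# required occurrence columns in the required order
format_occurrence <- function(longitude, latitude, value = 1,
                              type = "presence", fold = 1, crs = NULL) {
  out <- data.frame(longitude = longitude,
                    latitude  = latitude,
                    value     = value,
                    type      = type,
                    fold      = fold,
                    stringsAsFactors = FALSE)
  # crs is optional: only add it when the points are not lat/long
  if (!is.null(crs)) out$crs <- crs
  out
}

# The required columns, in order
required_cols <- c("longitude", "latitude", "value", "type", "fold")

# Hypothetical check: TRUE if the required columns come first, in order
has_occurrence_format <- function(df) {
  is.data.frame(df) &&
    ncol(df) >= length(required_cols) &&
    identical(names(df)[seq_along(required_cols)], required_cols)
}

occ <- format_occurrence(longitude = c(0.892, 0.660),
                         latitude  = c(51.989, 52.211))
has_occurrence_format(occ)                      # TRUE
has_occurrence_format(occ[, c(2, 1, 3, 4, 5)])  # FALSE: latitude comes first
```

Extra columns (such as crs) are simply appended after the five required ones, so the check only looks at the leading columns.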

Our occurrence data does not have all of these columns, so we need to do a little reformatting to add them:

# Keep only Lat Long columns
out <- out[, c("longitude", "latitude")]

# Add in the columns we don't have
out$value <- 1 # all our data are presences
out$type <- 'presence'
out$fold <- 1 # we don't add any folds

# Now the data is in the correct format we can return it
return(out)

We have now written the R code for our occurrence module; this is what it looks like when you put it all together:

Lorem_ipsum_UK <- function(){

  # Get data
  URL <- "https://ndownloader.figshare.com/files/2519918"
  out <- read.csv(URL)
  out <- out[, c("longitude", "latitude")]

  # Add in the columns we don't have
  out$value <- 1 # all our data are presences
  out$type <- 'presence'
  out$fold <- 1 # we won't add any folds

  return(out)
}

Now that we have written our function, we can test it very simply in a workflow like this:

work1 <- workflow(occurrence = Lorem_ipsum_UK,
                   covariate = UKBioclim,
                   process = OneHundredBackground,
                   model = LogisticRegression,
                   output = PrintMap)
## Occurrence data does not have a "crs" column, zoon will assume it is in the same projection as the covariate data

[Figure: map produced by the PrintMap output module]

This is a nice way to debug your function and ensure you are getting the results you expect.

Once you are happy that your function is working as you expect, you can build your code into a module using the BuildModule function in zoon. This function adds metadata including the type of module, the authors' names, a brief description and documentation for the arguments it accepts (though this one doesn't accept any arguments). Once BuildModule has created your module, it will run it through checks to make sure it has the required features, outputs data in the correct format, etc. Checking can be turned off by setting the argument check = FALSE.

# Let's build our module
BuildModule(Lorem_ipsum_UK,
            type = 'occurrence',
            title = 'A dataset of Lorem ipsum occurrences',
            description = paste('The module retrieves a dataset of',
                                'Lorem ipsum records from figshare. This dataset contains',
                                'presence-only data and was collected between 1990 and',
                                '2000 by members of the Lorem ipsum appreciation society'),
            details = 'This dataset is fake, Lorem ipsum does not exist',
            version = 0.1,
            author = 'A.B. Ceidi',
            email = 'ABCD@anemail.com',
            dataType = 'presence-only')
## Starting checks...
## done
## [1] "Lorem_ipsum_UK"

This function is fairly self-explanatory; however, it is worth noting the dataType field. This must be one or more of 'presence-only', 'presence/absence', 'presence/background', 'abundance' or 'proportion'. This is important so that people using your module in the future will know what type of data it is going to output.
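Because dataType only accepts values from that fixed list, a small slip such as 'presence only' instead of 'presence-only' is easy to make. As a minimal sketch using only base R, the hypothetical helper check_data_type() below (not part of zoon, whose own checks may already catch this) rejects anything outside the list:

```r
# The dataType values listed above
valid_data_types <- c('presence-only', 'presence/absence',
                      'presence/background', 'abundance', 'proportion')

# Hypothetical helper, not part of zoon: stop with an informative
# message if any dataType value is not in the list above
check_data_type <- function(dataType) {
  bad <- setdiff(dataType, valid_data_types)
  if (length(bad) > 0) {
    stop("Unrecognised dataType value(s): ",
         paste0("'", bad, "'", collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

check_data_type('presence-only')                        # passes silently
check_data_type(c('presence-only', 'presence/absence')) # passes silently
# check_data_type('presence only')  # would stop: the hyphen is missing
```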

BuildModule has now written an R file in our working directory containing the function and metadata, so that it can be shared with others.

# First we remove the function from our workspace
rm(list = 'Lorem_ipsum_UK')

# This is how you would use a module that a colleague has sent you
LoadModule(module = 'Lorem_ipsum_UK.R')

work2 <- workflow(occurrence = Lorem_ipsum_UK,
                  covariate = UKAir,
                  process = OneHundredBackground,
                  model = LogisticRegression,
                  output = PrintMap)

Once we're happy with the module, we will hopefully upload it to the zoon repository. The repository is currently under development. Visit the development pages for more information.

How to write a covariate module

The aim of a covariate module is to provide spatial information that will help to explain the distribution of a species. For example, these data could be climate data, habitat data or topography.

A covariate module, like an occurrence module, does not have to take any arguments, but it must return a RasterLayer, RasterBrick or RasterStack.

In this example we will create a covariate module that can provide a number of different climate layers for the area covering Australia.

# Our function will take an argument to set the variables
# the user wants returned
AustraliaAir <- function(variables = 'rhum'){

When your module has arguments, as here, it is important to include defaults for all of them. This makes it easier for other users to use your modules and allows your module to be tested effectively when you upload it to the zoon repository.
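A quick way to catch a forgotten default before uploading is to inspect the function's formals(). The sketch below is a hypothetical helper, missing_defaults(), not part of zoon. It assumes, as this document describes for process, model and output modules, that zoon's own default arguments start with a '.', and skips them:

```r
# Hypothetical helper, not part of zoon: list user-facing arguments
# (those not starting with '.') that have no default value
missing_defaults <- function(fun) {
  fmls <- formals(fun)
  user_args <- as.character(names(fmls))
  user_args <- user_args[!startsWith(user_args, ".")]
  # An argument without a default is stored as the empty symbol
  no_default <- vapply(user_args,
                       function(nm) identical(fmls[[nm]], quote(expr = )),
                       logical(1))
  user_args[no_default]
}

missing_defaults(function(variables = 'rhum') NULL)  # character(0)
missing_defaults(function(variables) NULL)           # "variables"
```

An empty result means every user-facing argument has a default.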

The first step is to load the R packages that your code is going to need. It is important that you use the GetPackage function rather than library or require as it will also install the package if the user does not already have it installed.

In this example we do not need any external packages as the data we are downloading is a RasterStack object, and zoon already loads the raster package to deal with RasterStacks.

To share our covariate data, we have saved the raster object as an R data file and placed it on Figshare, attributing those who created the data.

In our module we download these data into R:

# Load in the data
URL <- "http://files.figshare.com/2527274/aus_air.rdata"
load(url(URL, method = 'libcurl')) # The object is called 'ras'

# Subset the data according to the variables parameter
ras <- subset(ras, variables)

return(ras)

We can test that our function works by running it in a workflow with other modules:

AustraliaAir <- function(variables = 'rhum'){

  URL <- "http://files.figshare.com/2527274/aus_air.rdata"
  load(url(URL, method = 'libcurl')) # The object is called 'ras'
  ras <- subset(ras, variables)
  return(ras)

}

# Select the variables we want
myVariables <- c('air','hgt','rhum','shum','omega','uwnd','vwnd')

work3 <- workflow(occurrence = SpOcc(extent = c(111, 157, -46, -6),
                                     species = 'Varanus varius',
                                     limit = 500),
                  covariate = AustraliaAir(variables = myVariables),
                  process = OneHundredBackground,
                  model = LogisticRegression,
                  output = PrintMap)

[Figure: map produced by the PrintMap output module]

Once we are happy with the function we have written, we need to use the BuildModule function to convert our function into a module by adding the necessary metadata. Once BuildModule has created your module, it will run it through checks to make sure it has the required features, outputs data in the correct format, etc. Checking can be turned off by setting the argument check = FALSE.

# Build our module
BuildModule(AustraliaAir,
            type = 'covariate',
            title = 'Australia Air data from NCEP',
            description = paste('This module provides access to the',
                                'NCEP air data for Australia provided by',
                                'NCEP and should be attributed to Climatic',
                                'Research Unit, University of East Anglia'),
            details = paste('These data are redistributed under the terms of',
                            'the Open Database License',
                            'http://opendatacommons.org/licenses/odbl/1.0/'),
            version = 0.1,
            author = 'Z.O. Onn',
            email = 'zoon@zoon-zoon.com',
            paras = list(variables = paste('A character vector of air variables',
                         'you wish to return. This can include any number of',
                         "the following: 'air','hgt','rhum','shum','omega',",
                         "'uwnd','vwnd'")))
## Starting checks...
## done
## [1] "AustraliaAir"

BuildModule is fairly self-explanatory, but it is worth noting the paras argument. This takes a named list of the parameters the module takes, with the following structure: list(parameterName = 'Parameter description.', anotherParameter = 'Another description.')
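A common slip is a paras entry whose name does not match the argument it documents. The hypothetical helper check_paras() below (not part of zoon) is a minimal base R sketch that compares the names in a paras list with a function's arguments. Following the convention described elsewhere in this document, it skips zoon's default arguments, which start with a '.':

```r
# Hypothetical helper, not part of zoon: compare a paras list with a
# function's arguments, skipping zoon defaults that start with '.'
check_paras <- function(fun, paras) {
  arg_names  <- as.character(names(formals(fun)))
  user_args  <- arg_names[!startsWith(arg_names, ".")]
  documented <- as.character(names(paras))
  list(undocumented = setdiff(user_args, documented),  # arguments missing from paras
       unknown      = setdiff(documented, user_args))  # paras entries with no argument
}

check_paras(function(variables = 'rhum') NULL,
            list(variables = 'A character vector of air variables.'))
# both elements are character(0): every argument is documented
```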

Once BuildModule has been run there will be an R file in our working directory that represents our module and can be shared with others. This R script can be used as follows.

# remove the original function from our environment
rm(list = 'AustraliaAir')

# Load the module script
LoadModule('AustraliaAir.R')

work4 <- workflow(occurrence = SpOcc(extent = c(111, 157, -46, -6),
                                     species = 'Varanus varius',
                                     limit = 500),
                  covariate = AustraliaAir,
                  process = OneHundredBackground,
                  model = LogisticRegression,
                  output = PrintMap)

Once we're happy with the module, we will hopefully upload it to the zoon repository. The repository is currently under development. Visit the development pages for more information.

How to write a process module

The aim of a process module is to modify the occurrence data and/or the covariate data prior to modelling. Examples include adding background points or adding folds for cross-validation.

A process module returns data in exactly the same format as it accepts. It takes and returns a list of two elements. The first element, df, is a data.frame with the columns value, type, fold, longitude and latitude (see Occurrence module output), plus additional covariate columns. The covariate columns are added internally by the zoon workflow by combining the occurrence data with the output of the covariate module. The data.frame has an attribute, covCols, which records which columns these are. The second element, ras, is a RasterBrick, RasterLayer or RasterStack as returned by a covariate module.
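For quick experiments you can also build a small stand-in for this list by hand. In the sketch below the values are made up, and ras is set to NULL as a placeholder because a real Raster* object would need the raster package. A real process module always receives a RasterLayer, RasterBrick or RasterStack in that slot.

```r
# Occurrence data with one covariate column, 'layer' (values made up)
df <- data.frame(value     = c(1, 1, 0, 0),
                 type      = c("presence", "presence",
                               "background", "background"),
                 fold      = 1,
                 longitude = c(1.01, -0.16, -2.83, -3.53),
                 latitude  = c(52.4, 51.6, 53.4, 56.0),
                 layer     = c(271.5, 272.3, 271.6, 271.3))

# Record which columns hold covariate values
attr(df, "covCols") <- "layer"

# Placeholder for the list passed to a process module
.data <- list(df = df, ras = NULL)

# Select the covariate columns via the covCols attribute
covs <- .data$df[attr(.data$df, "covCols")]
names(covs)  # "layer"
```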

In this example we are going to create a process module that cuts down our occurrence data to a user supplied extent.

When writing a module it is useful to have example input to test with. One way to do this is to run a similar workflow and use the outputs of that workflow to test yours. Here is an example:

# We run a very simple workflow so that we can get example input
# for our module
work5 <- workflow(occurrence = UKAnophelesPlumbeus,
                  covariate  = UKAir,
                  process    = OneHundredBackground,
                  model      = LogisticRegression,
                  output     = PrintMap)
## Occurrence data does not have a "crs" column, zoon will assume it is in the same projection as the covariate data
## There are fewer than 100 cells in the environmental raster.
## Using all available cells (81) instead

[Figure: map produced by the PrintMap output module]

# The output from a process module is in the same format as the 
# input, so we can use the output of OneHundredBackground as the testing
# input for our module. Note that this object should be called
# .data
.data <- Process(work5)

str(.data, 2)
## List of 2
##  $ df :'data.frame': 269 obs. of  6 variables:
##   ..$ value    : num [1:269] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ type     : Factor w/ 2 levels "background","presence": 2 2 2 2 2 2 2 2 2 2 ...
##   ..$ fold     : num [1:269] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ longitude: num [1:269] 1.01 -0.16 -2.83 -0.63 -3.53 ...
##   ..$ latitude : num [1:269] 52.4 51.6 53.4 51.6 56 ...
##   ..$ layer    : num [1:269] 271 272 272 272 271 ...
##   ..- attr(*, "call_path")=List of 1
##   ..- attr(*, "covCols")= chr "layer"
##  $ ras:Formal class 'RasterLayer' [package "raster"] with 12 slots
##  - attr(*, "call_path")=List of 3
##   ..$ occurrence: chr "UKAnophelesPlumbeus"
##   ..$ covariate : chr "UKAir"
##   ..$ process   : chr "OneHundredBackground"

It is important to note that the list object that is passed into a process module is named .data, and so when writing our module we need to adhere to this convention.

First, let's have a look at the input:

# The first element is the occurrence data
head(.data$df)
##   value     type fold   longitude latitude    layer
## 1     1 presence    1  1.01287600 52.37696 271.4658
## 2     1 presence    1 -0.16003467 51.57146 272.2655
## 3     1 presence    1 -2.83497900 53.40813 271.6481
## 4     1 presence    1 -0.62955210 51.55540 272.2655
## 5     1 presence    1 -3.52534680 56.04848 271.2964
## 6     1 presence    1  0.01144066 51.58168 272.2655
# The attribute covCols gives the covariate columns
attr(.data$df, 'covCols')
## [1] "layer"
# If we want to modify these covariate columns in
# our process module we can select them using this
# attribute
head(.data$df[attr(.data$df, 'covCols')])
##      layer
## 1 271.4658
## 2 272.2655
## 3 271.6481
## 4 272.2655
## 5 271.2964
## 6 272.2655
# The second element is the raster
plot(.data$ras)

[Figure: plot of the covariate raster .data$ras]

Let's start writing our new process module.

# Start writing our module
ClipOccurrence <- function(.data, extent = c(-180, 180, -180, 180)){

Here we have remembered to include .data as an argument, as this is the default argument for process modules. In addition, we have supplied an argument for the extent and set its default to the entire globe (i.e. no clipping). It is important that all of your arguments have defaults (even if the default might not be a good idea in practice), as this allows the zoon system to perform automatic testing on your modules when you share them online.

# Write the body of our function
# extract the occurrence data from the .data object
occDF <- .data$df

# Subset by longitude
occSub <- occDF[occDF$longitude >= extent[1] &
                occDF$longitude <= extent[2], ]

# Subset by latitude
occSub <- occSub[occSub$latitude >= extent[3] &
                 occSub$latitude <= extent[4], ]

# assign this data.frame back to the .data object
.data$df <- occSub

# return the modified .data object
return(.data)

So our simple process function looks like this:

ClipOccurrence <- function(.data, extent = c(-180, 180, -180, 180)){

  # Write the body of our function
  # extract the occurrence data from the .data object
  occDF <- .data$df

  occSub <- occDF[occDF$longitude >= extent[1] &
                  occDF$longitude <= extent[2], ]

  occSub <- occSub[occSub$latitude >= extent[3] &
                   occSub$latitude <= extent[4], ]

  .data$df <- occSub

  return(.data)

}
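Because ClipOccurrence is plain R, you can check its logic on a tiny hand-made .data object before running a full workflow. The block below repeats the function so that it runs on its own. The toy values are made up, and ras is a NULL placeholder (an assumption that is only safe here because ClipOccurrence never touches it):

```r
# Copy of ClipOccurrence from above, so this block runs on its own
ClipOccurrence <- function(.data, extent = c(-180, 180, -180, 180)){
  occDF <- .data$df
  occSub <- occDF[occDF$longitude >= extent[1] &
                  occDF$longitude <= extent[2], ]
  occSub <- occSub[occSub$latitude >= extent[3] &
                   occSub$latitude <= extent[4], ]
  .data$df <- occSub
  return(.data)
}

# A tiny hand-made .data object (values made up)
toy <- list(df = data.frame(value     = c(1, 1, 0),
                            type      = c("presence", "presence", "background"),
                            fold      = 1,
                            longitude = c(1.0, -4.0, 0.5),
                            latitude  = c(52.0, 51.0, 56.0)),
            ras = NULL)

# Only the first point lies inside longitude -3 to 2 and latitude 50 to 53
clipped <- ClipOccurrence(toy, extent = c(-3, 2, 50, 53))
nrow(clipped$df)              # 1

# The default extent keeps every point
nrow(ClipOccurrence(toy)$df)  # 3
```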

Our next step is to test that this function will work in a workflow. Once we have read in our function so that it is available in our working environment we can then include it in a workflow as we would a normal module.

# Run a workflow with our new process
# In this example we first add background points, then clip the data
work6 <- workflow(occurrence = UKAnophelesPlumbeus,
                  covariate  = UKAir,
                  process    = Chain(OneHundredBackground,
                                     ClipOccurrence(extent = c(-3, 2, 50, 53))),
                  model      = LogisticRegression,
                  output     = PrintMap)

[Figure: map produced by the PrintMap output module, showing the clipped data]

The map printed by the output module shows that the data have been clipped to the extent we specified.

The next stage is to turn this function into a module which is shareable. To do this we need to add metadata to our function using the BuildModule function. Once BuildModule has created your module, it will run it through checks to make sure it has the required features, outputs data in the correct format, etc. Checking can be turned off by setting the argument check = FALSE.

# Build our module
BuildModule(ClipOccurrence,
            type = 'process',
            title = 'Clip occurrence data to extent',
            description = paste('This process module clips the occurrence',
                                'data that is returned from the occurrence',
                                'module to a user defined extent'),
            details = paste('The extent is a rectangular region which denotes the',
                            'area within which observations will be kept.',
                            'All data that falls outside of the extent will',
                            'be removed and will not be used in the',
                            'modelling process'),
            version = 0.1,
            author = 'Z.O. Onn',
            email = 'zoon@zoon-zoon.com',
            paras = list(extent = paste('A numeric vector of length four',
                                        'giving (in this order) the minimum',
                                        'longitude, maximum longitude, minimum',
                                        'latitude, maximum latitude.')),
            dataType = c('presence-only', 'presence/absence',
                         'presence/background', 'abundance',
                         'proportion'))
## Starting checks...
## done
## [1] "ClipOccurrence"

Much of how to use BuildModule is self-explanatory, but two parameters are worth mentioning here. The paras argument takes a named list of the parameters the module takes. It should have the structure list(parameterName = 'Parameter description.', anotherParameter = 'Another description.') and should not include the defaults (i.e. we do not include .data). dataType describes the types of occurrence data that this module will work with; certain modules might only work with presence-only data, for example. In our case, our module will work with any type of data, so we list all the data types in the dataType field.

Once BuildModule has been run there will be an R file in our working directory that represents our module and can be shared with others. This R script can be used as follows.

# remove the original function from our environment
rm(list = 'ClipOccurrence')

# Load the module script
LoadModule('ClipOccurrence.R')
## [1] "ClipOccurrence"
work7 <- workflow(occurrence = AnophelesPlumbeus,
                  covariate = UKBioclim,
                  process = Chain(OneHundredBackground,
                                  ClipOccurrence(extent = c(-5, 5, 50, 55))),
                  model = LogisticRegression,
                  output = PrintMap)
## 152 records found
## 0-
## 152 records downloaded
## Occurrence data does not have a "crs" column, zoon will assume it is in the same projection as the covariate data

[Figure: map produced by the PrintMap output module]

Once we're happy with the module, we will hopefully upload it to the zoon repository. The repository is currently under development. Visit the development pages for more information.

How to write a model module

Here is a simple function that will become our module. It is a model module that uses generalised additive models. We will work through it one element at a time.

First we start our function by declaring all the parameters we need, including all the defaults:

GamGam <- function(.df){

Since this is a model module, the only default argument is .df. To find out more about defaults, see the section Module IO definitions for module developers. .df is a data.frame with the columns value, type, fold, longitude and latitude, plus additional named columns giving the associated covariate values. The names of the covariate columns are given as an attribute of the table: attr(.df, 'covCols').

Next we specify the packages our function needs. These should be specified using the GetPackage function in the zoon package. This function will load the package if the user of your module already has it, or will install it from CRAN if they don't. For this reason, make sure your module only uses packages that are on CRAN.

# Specify the packages we need using the function
# GetPackage
zoon::GetPackage("gam")

Next we add the code that does our modelling. Here we create a simple GAM (Generalised Additive Model) using the gam package:

# Create a data.frame of covariate data
covs <- .df[colnames(.df) %in% attr(.df, 'covCols')]

# do a bit of copy-pasting to define smooth terms for each covariate
f <- sprintf('.df$value ~ s(%s)',
                    paste(colnames(covs),
                          collapse = ') + s('))

# Run our gam model
m <- gam::gam(formula = formula(f),
              data = covs,
              family = binomial)
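It can help to see exactly what the formula-building step produces. The sketch below runs it for two example covariate names (chosen for illustration only) and also shows an alternative that is not what GamGam uses: stats::reformulate() builds the formula object directly. With that version the response is the value column, so the model would need data = .df (which contains value) instead of data = covs.

```r
# Example covariate names, as they might appear in attr(.df, 'covCols')
cov_names <- c("bio1", "bio12")

# The string-building approach used in GamGam
f <- sprintf('.df$value ~ s(%s)',
             paste(cov_names, collapse = ') + s('))
f   # ".df$value ~ s(bio1) + s(bio12)"

# Alternative: build the formula object directly
f2 <- stats::reformulate(sprintf("s(%s)", cov_names), response = "value")
f2  # value ~ s(bio1) + s(bio12)
```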

The final stage of building a model module is to write some code within the function to create a ZoonModel object. This is important as it standardises all outputs from model modules and crucially enables zoon to make predictions from them in a predictable and standard way.

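We build a ZoonModel object by using the function ZoonModel. This takes three parameters: model, the fitted model object; code, a block of R code that uses model to make predictions for a data.frame called newdata and returns them as a vector; and packages, the names of the packages that this code needs.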

# Create a ZoonModel object to return.
# this includes our model, predict method
# and the packages we need.
ZoonModel(model = m,
          code = {

          # create empty vector of predictions
          p <- rep(NA, nrow(newdata))

          # omit NAs in new data
          newdata_clean <- na.omit(newdata)

          # get NA indices
          na_idx <- attr(newdata_clean, 'na.action')

          # if there are no NAs then the index should 
          # include all rows, else it should name the 
          # rows to ignore
          if (is.null(na_idx)){
            idx <- 1:nrow(newdata)
          } else {
            idx <- -na_idx
          }

          # Use the predict function in gam to predict
          # our new values
          p[idx] <- gam::predict.gam(model,
                                     newdata_clean,
                                     type = 'response')
          return (p)
        },
        packages = 'gam')
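The code block passed to ZoonModel fills in predictions only for the rows of newdata without missing values and leaves NA elsewhere. The same idea can be written with complete.cases() and tried outside zoon. In this sketch predict_with_na() is a hypothetical name, the toy data are randomly generated, and glm() with a binomial family stands in for the GAM so that the example needs only base R:

```r
# Hypothetical helper mirroring the NA handling in the code block above:
# predict for complete rows only and return NA for rows with missing values
predict_with_na <- function(model, newdata) {
  p <- rep(NA_real_, nrow(newdata))
  ok <- stats::complete.cases(newdata)
  if (any(ok)) {
    p[ok] <- stats::predict(model, newdata[ok, , drop = FALSE],
                            type = "response")
  }
  p
}

# Toy presence/background data with one covariate (randomly generated)
set.seed(42)
train <- data.frame(value = rbinom(60, 1, 0.5), layer = rnorm(60))

# glm() stands in for gam::gam() so the sketch runs with base R only
m <- glm(value ~ layer, data = train, family = binomial)

newdata <- data.frame(layer = c(0.1, NA, -0.5, 1.2))
predict_with_na(m, newdata)  # the second prediction is NA
```

Using a logical index from complete.cases() avoids having to treat the "no missing values" case separately, which the na.omit() version handles with its if/else.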

With all these elements in place we now have our module complete. All together it looks like this.

GamGam <- function(.df){

  # Specify the packages we need using the function
  # GetPackage
  zoon::GetPackage("gam")

  # Create a data.frame of covariate data
  covs <- .df[colnames(.df) %in% attr(.df, 'covCols')]


  # do a bit of copy-pasting to define smooth terms for each covariate
  f <- sprintf('.df$value ~ s(%s)',
                      paste(colnames(covs),
                            collapse = ') + s('))

  # Run our gam model
  m <- gam::gam(formula = formula(f),
                data = covs,
                family = binomial)

  # Create a ZoonModel object to return.
  # this includes our model, predict method
  # and the packages we need.
  ZoonModel(model = m,
            code = {

            # create empty vector of predictions
            p <- rep(NA, nrow(newdata))

            # omit NAs in new data
            newdata_clean <- na.omit(newdata)

            # get their indices
            na_idx <- attr(newdata_clean, 'na.action')

            # if there are no NAs then the index should 
            # include all rows, else it should name the 
            # rows to ignore
            if (is.null(na_idx)){
              idx <- 1:nrow(newdata)
            } else {
              idx <- -na_idx
            }

            # Use the predict function in gam to predict
            # our new values
            p[idx] <- gam::predict.gam(model,
                                       newdata_clean,
                                       type = 'response')
            return (p)
          },
          packages = 'gam')

}

We then run BuildModule on our function, adding the required metadata. As this module has no parameters other than .df, which is not user-specified, we don't need to set the paras argument, which would normally be used to document arguments. Default arguments such as .df all start with a '.' and don't need to be documented, as their documentation is written into the module automatically.

It is worth noting the dataType field. This must be one or more of 'presence-only', 'presence/absence', 'presence/background', 'abundance' or 'proportion'. This is important so that people using your module in the future will know what types of data can be used as inputs. Once BuildModule has created your module, it will run it through checks to make sure it has the required features, outputs data in the correct format, etc. Checking can be turned off by setting the argument check = FALSE.

BuildModule(object = GamGam,
            type = 'model',
            title = 'GAM sdm model',
            description = 'This is my mega cool new model.',
            details = paste('This module performs GAMs (Generalised Additive',
                            'Models) using the gam function from the package gam.'),
            version = 0.1,
            author = 'Z. Oon',
            email = 'zoon@zoon.com',
            dataType = c('presence-only', 'presence/absence'))
## Starting checks...
## done
## [1] "GamGam"

This is now a runnable module.

# remove the function in our workspace else
# this will cause problems
rm(GamGam)

# Load in the module we just built
LoadModule('GamGam.R')
## [1] "GamGam"
# Run a workflow using our module
work8 <- workflow(occurrence = UKAnophelesPlumbeus,
                  covariate = UKAir,
                  process  = OneHundredBackground,
                  model = GamGam,
                  output = PrintMap)

[Figure: map produced by the PrintMap output module]

Once we're happy with the module, we will hopefully upload it to the zoon repository. The repository is currently under development. Visit the development pages for more information.

How to write an output module

An output module is the last module in a zoon workflow and is an opportunity to summarise the model results, make predictions, or otherwise visualise the data or results. Its input is built from the outputs of earlier modules in the workflow, in particular the model and covariate modules, which makes many types of output possible.

In this example we will create an output module that uses the model output to predict the species occurrence in a new location given by a user-provided raster.

When writing a module it is useful to have example input to test with. One way to do this is to run a similar workflow and use the outputs of that workflow to test yours. Here is an example:

# We run a very simple workflow so that we can get example input
# for our module
work9 <- workflow(occurrence = UKAnophelesPlumbeus,
                  covariate  = UKAir,
                  process    = OneHundredBackground,
                  model      = LogisticRegression,
                  output     = PrintMap)

# The input to an output module is a combination of the output
# from the model module and the covariate module. We can recreate
# it for this workflow like this
.model <- Model(work9)
.ras <- Covariate(work9)

Both .model and .ras are default arguments for an output module, so it is important that you include them as arguments for your module, even if you don't use both of them. It is also important that you stick to the same naming conventions.

# Our output module takes the default parameters and a user-defined
# Raster* object that has the same structure as the raster layer output
# by the covariate module
PredictNewRasterMap <- function(.model, .ras, raster = .ras){

It is important to have default values for all user-defined parameters so that your module can be tested when you upload it to the zoon website. Here we set our default 'new area' raster to be the same as the raster used to create the model. Clearly this is not how we envisage the module being used in a real application (unless the user genuinely wants to predict back to the same area); however, it ensures that this module will always work with its default arguments, no matter what workflow it is placed in.

# The first step is to load in the packages we need
zoon::GetPackage("raster") 

# Then extract the covariate values
# from the user provided raster
vals <- data.frame(getValues(raster))
colnames(vals) <- names(raster)
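Predictions are only meaningful if the layers of the new raster match the covariates the model was fitted with. As a hedged sketch, the hypothetical helper check_layer_names() below (not part of zoon or raster) compares two character vectors of layer names; inside PredictNewRasterMap one could call it as check_layer_names(names(.ras), names(raster)) before predicting. Plain character vectors are used here so the example runs without the raster package:

```r
# Hypothetical helper: stop early if the new layers do not match the
# covariate layers used to fit the model (same names, same order)
check_layer_names <- function(model_layers, new_layers) {
  if (!identical(model_layers, new_layers)) {
    stop("Layers of the new raster do not match the model covariates.\n",
         "  expected: ", paste(model_layers, collapse = ", "), "\n",
         "  found:    ", paste(new_layers, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

check_layer_names(c("bio1", "bio12"), c("bio1", "bio12"))   # passes
# check_layer_names(c("bio1", "bio12"), c("bio12", "bio1")) # stops: wrong order
```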

Once we have these new values, we can make predictions using the ZoonPredict function. This function is very useful as it simplifies the process of making predictions from the output of a model module. See the InteractiveMap module for an innovative visualisation using predicted values.

# Make predictions to the new values
pred <- ZoonPredict(.model$model,
                    newdata = vals)

# Create a copy of the user's raster...
# (just a single layer)
pred_ras <- raster[[1]]

# ... and assign the predicted values to it
pred_ras <- setValues(pred_ras, pred)

Once we have the raster of predicted values we can plot it and return the results to the user.

# Plot the predictions as a map
plot(pred_ras)

# Return the raster of predictions
return (pred_ras)

Our function now looks like this:

PredictNewRasterMap <- function(.model, .ras, raster = .ras){

  zoon::GetPackage("raster")

  # Extract the values from the user provided raster
  vals <- data.frame(getValues(raster))
  colnames(vals) <- names(raster)

  # Make predictions to the new values
  pred <- ZoonPredict(.model$model,
                      newdata = vals)

  pred_ras <- raster[[1]]
  pred_ras <- setValues(pred_ras, pred)

  # Plot the predictions as a map
  plot(pred_ras)

  return(pred_ras)
}

Our next step is to test that this function will work in a workflow. Once we have read in our function so that it is available in our working environment we can then include it in a workflow as we would a normal module.

# Run it with the defaults
work10 <- workflow(occurrence = UKAnophelesPlumbeus,
                   covariate  = UKBioclim,
                   process    = OneHundredBackground,
                   model      = LogisticRegression,
                   output     = PredictNewRasterMap)

[Figure: prediction map produced by PredictNewRasterMap using the default raster]

# Now I'm going to run it with a different raster
library(raster)

# Get Bioclim data (using the getData function in the raster package,
# which zoon loads) ...
BioclimData <- getData('worldclim', var = 'bio', res = 5)
BioclimData <- BioclimData[[1:19]]

# ... and crop to Australia
cropped <- crop(BioclimData,
                c(109,155,-46,-7))

# Run it with my new raster
work11 <- workflow(occurrence = UKAnophelesPlumbeus,
                   covariate  = UKBioclim,
                   process    = OneHundredBackground,
                   model      = LogisticRegression,
                   output     = PredictNewRasterMap(raster = cropped))

[Figure: map of the predicted distribution across Australia]

# The prediction map should also be returned as a raster
class(Output(work11))
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"

The next stage is to turn this function into a shareable module. To do this we add metadata to our function using the BuildModule function.

# Build our module
BuildModule(PredictNewRasterMap,
            type = 'output',
            title = 'Predict to a new raster and map',
            description = paste('This output module predicts the species',
                                'distribution in a new area given a new',
                                'raster'),
            details = paste('The results are printed as a map and a raster is',
                            'returned with the predicted values. It is important',
                            'that the new raster has the same structure as the',
                            'raster provided by the covariate module.',
                            'It must have the same covariate columns in the',
                            'same order.'),
            version = 0.1,
            author = 'Z.O. On',
            email = 'zoon@zoon-zoon.com',
            paras = list(raster = paste('A RasterBrick, RasterLayer or RasterStack in',
                                        'the same format as the raster provided',
                                        'by the covariate module. Predicted values',
                                        'will be estimated for this raster using',
                                        'the results from the model module')),
            dataType = c('presence-only', 'presence/absence', 'abundance',
                         'proportion'))
## Starting checks...
## done
## [1] "PredictNewRasterMap"

Once BuildModule has created your module, it runs the module through checks to make sure it has the required features, outputs data in the correct format, and so on. Checking can be turned off by setting the argument check = FALSE. Most of BuildModule is self-explanatory, but two arguments are worth mentioning here. The paras argument takes a named list describing the module's parameters, with the structure list(parameterName = 'Parameter description.', anotherParameter = 'Another description.'). It should not include the default arguments (here, .model and .ras). The dataType argument describes the types of occurrence data the module works with; some modules might only work with presence-only data, for example. Our module works with any type of data, so we list all the data types in dataType.
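The rules for paras and dataType can be checked by hand before calling BuildModule. The helpers below are hypothetical and not part of zoon. check_paras compares the names in paras with the function's arguments, skipping the default dot-arguments named in this document (.model, .ras, .df, .data). check_dataType tests dataType against the four data types used in our example.

```r
# Hypothetical helper, not part of zoon: TRUE if paras documents exactly the
# module's own arguments. The default inputs (.model, .ras, .df, .data) are skipped.
check_paras <- function(fun, paras,
                        defaults = c(".model", ".ras", ".df", ".data")) {
  own_args <- setdiff(names(formals(fun)), defaults)
  setequal(own_args, names(paras))
}

# The four data types listed in our BuildModule call
valid_types <- c("presence-only", "presence/absence", "abundance", "proportion")

# Hypothetical helper, not part of zoon: TRUE if every entry is a known data type
check_dataType <- function(dataType) {
  length(dataType) > 0 && all(dataType %in% valid_types)
}

# A stub with the same arguments as PredictNewRasterMap
stub <- function(.model, .ras, raster = .ras) NULL

check_paras(stub, list(raster = "A raster in the covariate module's format"))  # TRUE
check_paras(stub, list(raster = "A raster", .model = "The model"))             # FALSE
check_dataType(c("presence-only", "abundance"))                                # TRUE
```

The second call returns FALSE because .model is one of the default inputs and should not appear in paras.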

Once BuildModule has been run there will be an R file in our working directory that represents our module and can be shared with others. This R script can be used as follows.

# remove the original function from our environment
rm(list = 'PredictNewRasterMap')

# Load the module script
LoadModule('PredictNewRasterMap.R')
## [1] "PredictNewRasterMap"
# Now I model a crop pest from Zimbabwe in its home
# range and in Australia by chaining together
# output modules
work12 <- workflow(occurrence = CWBZimbabwe,
                   covariate = Bioclim(extent = c(28, 38, -24, -16)),
                   process = NoProcess,
                   model = RandomForest,
                   output = Chain(PrintMap,
                                  PredictNewRasterMap(raster = cropped)))
## Occurrence data does not have a "crs" column, zoon will assume it is in the same projection as the covariate data

[Figures: maps of the predicted distribution in and around Zimbabwe (PrintMap) and across Australia (PredictNewRasterMap)]

Once we are happy with the module, we hope to upload it to the zoon repository. The repository is currently under development; visit the development pages for more information.

Module IO definitions for module developers

The default input arguments and return values of modules are strict. However, any module type can take additional named arguments, provided they have default values. Where a data frame is described as including '+ covariates', the number of covariate columns is flexible.
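The requirement that extra arguments have default values can be checked mechanically. The helper below is hypothetical and not part of zoon. It assumes that zoon's default inputs are exactly the arguments whose names start with "." (such as .model, .ras, .df and .data), and it reports any other argument that has no default.

```r
# Hypothetical helper, not part of zoon: names of a module's extra arguments
# that have no default value. Assumes zoon's default inputs are the arguments
# whose names start with "." (such as .model, .ras, .df and .data).
args_without_defaults <- function(fun) {
  fmls <- formals(fun)
  if (is.null(fmls)) return(character(0))
  extra <- as.list(fmls)[!startsWith(names(fmls), ".")]
  # An argument without a default is stored as the empty symbol
  no_default <- vapply(extra, function(x) identical(x, quote(expr = )),
                       logical(1))
  names(extra)[no_default]
}

good <- function(.model, .ras, raster = .ras) NULL
bad  <- function(.model, .ras, raster) NULL

args_without_defaults(good)  # character(0)
args_without_defaults(bad)   # "raster"
```

Note that a default can itself be a symbol, as in raster = .ras, so the check compares against the empty symbol rather than simply testing whether the default is a symbol.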

Occurrence

In: No default inputs

Out: data.frame with column names (in this order): longitude, latitude, value, type, fold. Optional columns such as crs may also be included.
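Because the order of the occurrence columns matters, a module author may want to check it before building the module. The helper below is a hypothetical sketch and not part of zoon. It checks that a data.frame starts with longitude, latitude, value, type and fold in that order and allows extra columns after them. The toy coordinates echo the Lorem_ipsum_UK data, while the value, type and fold entries are made up for illustration.

```r
# The five required occurrence columns, in the required order
required_occ_cols <- c("longitude", "latitude", "value", "type", "fold")

# Hypothetical helper, not part of zoon: TRUE if df starts with the required
# columns in the required order (extra columns after them are allowed)
check_occurrence_df <- function(df) {
  is.data.frame(df) &&
    ncol(df) >= length(required_occ_cols) &&
    identical(names(df)[seq_along(required_occ_cols)], required_occ_cols)
}

# Toy occurrence data; value, type and fold entries are illustrative only
occ <- data.frame(longitude = c(0.8917427, 0.6602159),
                  latitude  = c(51.98917, 52.21136),
                  value     = c(1, 1),
                  type      = c("presence", "presence"),
                  fold      = c(1, 1))

check_occurrence_df(occ)  # TRUE
```

Swapping longitude and latitude, or dropping fold, would make the check return FALSE.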

Covariate

In: No default inputs

Out: RasterLayer, RasterBrick or RasterStack object

Process

In: list named .data with 2 named elements:

Out: list with 2 elements

Model

In: data.frame from process called .df. .df has an attribute covCols naming the covariate columns.
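Because .df mixes coordinates, response and covariate columns, the covCols attribute tells a model module which columns to use as predictors. Below is a minimal base-R sketch, assuming a logistic regression fitted with glm. A real module would wrap the fit in a ZoonModel, which this sketch does not do. The data, the covariate names bio1 and bio12, and the type labels are all made up.

```r
# Toy .df as a process module might pass it on (all values are made up)
set.seed(1)
n <- 20
.df <- data.frame(longitude = runif(2 * n, 0.5, 1.0),
                  latitude  = runif(2 * n, 51.5, 52.5),
                  value     = rep(c(1, 0), each = n),
                  type      = rep(c("presence", "background"), each = n),
                  bio1      = c(rnorm(n, 10), rnorm(n, 9)),
                  bio12     = c(rnorm(n, 600, 50), rnorm(n, 575, 50)))
attr(.df, "covCols") <- c("bio1", "bio12")

# Build the model formula from the columns named in covCols,
# ignoring coordinates and other bookkeeping columns
covs <- attr(.df, "covCols")
form <- reformulate(covs, response = "value")

# Fit a logistic regression (a ZoonModel wrapper would be needed in a real module)
fit <- glm(form, data = .df, family = binomial)
coef(fit)
```

Reading the covariate names from the attribute, rather than hard-coding them, lets the same model module work with whatever covariate module is used upstream.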

Out: A ZoonModel object (built by the function ZoonModel). A list with three elements.

Output

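In: .model, a list whose model element is the fitted ZoonModel, and .ras, the raster returned by the covariate module.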

Out: Anything!

Pictorial description of inputs and outputs

[Figure: diagram of the inputs and outputs of the occurrence, covariate, process, model and output modules]




zoon documentation built on May 29, 2017, 10:45 a.m.