leptrine: Blue Mountains presence-absence data for the plant species...

leptrine R Documentation

Blue Mountains presence-absence data for the plant species Leptospermum trinervium.

Description

These are binary (presence-absence) data for the plant species Leptospermum trinervium, collected at 8,678 sites in the Blue Mountains region. The data set also contains environmental predictor variables recorded at each site. There are 1,751 absences and 6,927 presences. The 'leptrine' data are split into a training set and a test set, each with N = 4,339 sites. The nine environmental predictor variables given here have all been standardized.
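The counts quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in this description. The standardization step is an illustration on made-up values and assumes that "standardized" means centred to mean zero and scaled to unit standard deviation, which this page does not spell out.

```r
# Counts quoted in the description.
n_absent  <- 1751
n_present <- 6927

n_sites    <- n_absent + n_present   # Total number of sites.
n_per_set  <- n_sites / 2            # Sites per set, given two equally sized sets.
prevalence <- n_present / n_sites    # Proportion of presences (about 0.80).

# Illustration only: standardizing a hypothetical raw predictor.
# These values are made up and are not taken from the leptrine data.
raw <- c(12.1, 15.4, 9.8, 20.3, 17.6)
z   <- (raw - mean(raw)) / sd(raw)   # Centre to mean 0, scale to SD 1.
```

Because presences make up roughly four in five sites, a model that always predicts presence would already look fairly accurate on these data, which is worth keeping in mind when judging a fitted model.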

Usage

leptrine

Format

A list containing two matrices: the training data with 4,339 observations and 9 columns, and the test data with 4,339 observations and 10 columns (it includes a column of zeroes for the intercept term). The columns are defined as follows:

RAIN_DRY_QTR

Recorded rainfall for the driest quarter for each site.

FC

Recorded number of fire counts for each site.

TMP_MIN

Recorded minimum temperatures for each site.

TMP_SEAS

Recorded seasonal temperature for each site.

TMP_MN_WARM_QTR

Recorded minimum temperature for the warmest quarter for each site.

RAIN_WET_QTR

Recorded rainfall for the wettest quarter for each site.

TMP_MN_COLD_QTR

Recorded minimum temperature for the coldest quarter for each site.

TMP_MAX

Recorded maximum temperature for each site.

TMP_MN

Recorded minimum temperature for each site.

Y

Presence-absence for the plant species Leptospermum trinervium for each site.

Author(s)

Jakub Stoklosa and David I. Warton

Source

The Blue Mountains presence-absence data for the species Leptospermum trinervium were obtained from http://www.bionet.nsw.gov.au/. Environmental data for Blue Mountains region: DRYAD entry doi:10.5061/dryad.985s5.

References

Stoklosa, J. and Warton, D.I. (2018). A generalized estimating equation approach to multivariate adaptive regression splines. Journal of Computational and Graphical Statistics, 27, 245–253.

Examples

# Load the marge package and the data.

library(marge)
data(leptrine)

dat1 <- leptrine[[1]]  # Training data.
Y <- dat1$Y            # Response variable.
N <- length(Y)         # Sample size (number of clusters).
n <- 1                 # Cluster size.
id <- rep(1:N, each = n)  # The ID of each cluster.

X_pred <- dat1[, -c(3:10)]  # Design matrix using only two (of nine) predictors.

# Set MARGE tuning parameters.

family <- "binomial"   # The selected "exponential" family for the GLM/GEE.
is.gee <- FALSE        # Is the model a GEE?
corstr <- "independence"  # Working correlation structure for a GEE (assumed value).
nb <- FALSE            # Is this a negative binomial model?
tols_score <- 0.0001   # A set tolerance (stopping condition) in forward pass for MARGE.
M <- 21                # A set threshold for the maximum number of basis functions to be used.
print.disp <- FALSE    # Print ALL the output?
pen <- 2               # Penalty to be used in GCV.
minspan <- NULL        # A set minimum span value.

# Fit the MARGE model (takes about 30 seconds).

mod <- marge(X_pred, Y, N, n, id, family, corstr, pen, tols_score,
             M, minspan, print.disp, nb, is.gee)
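
The column selection in the example, dat1[, -c(3:10)], drops columns 3 to 10 and keeps the first two. The sketch below reproduces that step on a small stand-in object so it can be run without the package. The object toy and its values are hypothetical, and the column order is assumed to follow the order listed under Format, in which case the two retained predictors would be RAIN_DRY_QTR and FC.

```r
# Column names in the order listed under Format
# (assumed to match the column order of the real object).
cols <- c("RAIN_DRY_QTR", "FC", "TMP_MIN", "TMP_SEAS", "TMP_MN_WARM_QTR",
          "RAIN_WET_QTR", "TMP_MN_COLD_QTR", "TMP_MAX", "TMP_MN", "Y")

# Hypothetical stand-in for the training data: 6 made-up rows,
# 9 predictor columns plus the response Y.
set.seed(1)
toy <- as.data.frame(matrix(rnorm(6 * 9), nrow = 6))
toy$Y <- c(1, 0, 1, 1, 0, 1)
names(toy) <- cols

# Negative indexing drops columns 3 to 10 and keeps the first two.
X_toy <- toy[, -c(3:10)]
kept  <- names(X_toy)

# With a cluster size of 1, each observation is its own cluster.
N  <- nrow(toy)
id <- rep(1:N, each = 1)
```

Selecting by name, for example toy[, c("RAIN_DRY_QTR", "FC")], gives the same two columns and does not depend on the column order.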

JakubStats/marge documentation built on Feb. 25, 2024, 9:38 p.m.