leptrine | R Documentation |
These data are binary (presence-absence) records for a plant species at 8,678 sites in the Blue Mountains region, together with environmental predictor variables collected at each site. There are 1,751 absences and 6,927 presences. The 'leptrine' data are split into training and test sets, each of size N = 4,339. The nine environmental predictor variables given here have all been standardized.
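The source does not state how the predictors were standardized. A minimal sketch, assuming the usual convention of centring each variable to zero mean and scaling it to unit standard deviation, is given below. The values are simulated for illustration and are not taken from 'leptrine'.

```r
# Toy predictor values (simulated; NOT taken from the leptrine data).
set.seed(1)
x_raw <- rnorm(100, mean = 12, sd = 3)

# Standardize: subtract the sample mean, divide by the sample standard deviation.
# This is an assumed convention; the data source does not specify the method used.
x_std <- (x_raw - mean(x_raw)) / sd(x_raw)

# scale() applies the same centring and scaling column-wise to a matrix.
x_scaled <- as.numeric(scale(x_raw))

mean(x_std)                 # Approximately 0.
sd(x_std)                   # Approximately 1.
all.equal(x_std, x_scaled)  # The two approaches agree.
```

Under this convention, a one-unit change in a standardized predictor corresponds to a change of one standard deviation on the original scale.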
leptrine
A list containing two matrices: the training data with 4,339 observations and 9 columns, and the test data with 4,339 observations and 10 columns (it includes a column of zeroes for the intercept term). The columns are defined as follows:
RAIN_DRY_QTR
Recorded rainfall for the driest quarter for each site.
FC
Recorded number of fire counts for each site.
TMP_MIN
Recorded minimum temperatures for each site.
TMP_SEAS
Recorded seasonal temperature for each site.
TMP_MN_WARM_QTR
Recorded minimum temperature for the warmest quarter for each site.
RAIN_WET_QTR
Recorded rainfall for the wettest quarter for each site.
TMP_MN_COLD_QTR
Recorded minimum temperature for the coldest quarter for each site.
TMP_MAX
Recorded maximum temperature for each site.
TMP_MN
Recorded minimum temperature for each site.
Y
Presence-absence for the plant species Leptospermum trinervium for each site.
Jakub Stoklosa and David I. Warton
The Blue Mountains presence-absence data for the species Leptospermum trinervium were obtained from http://www.bionet.nsw.gov.au/. Environmental data for Blue Mountains region: DRYAD entry doi:10.5061/dryad.985s5.
Stoklosa, J. and Warton, D.I. (2018). A generalized estimating equation approach to multivariate adaptive regression splines. Journal of Computational and Graphical Statistics, 27, 245–253.
# Load the data.
data(leptrine)
dat1 <- leptrine[[1]] # Training data.
Y <- dat1$Y # Response variable.
N <- length(Y) # Sample size (number of clusters).
n <- 1 # Cluster size.
id <- rep(1:N, each = n) # The ID of each cluster.
X_pred <- dat1[, -c(3:10)] # Design matrix using only two (of nine) predictors.
# Set MARGE tuning parameters.
family <- "binomial" # The selected "exponential" family for the GLM/GEE.
is.gee <- FALSE # Is the model a GEE?
nb <- FALSE # Is this a negative binomial model?
tols_score <- 0.0001 # A set tolerance (stopping condition) in forward pass for MARGE.
M <- 21 # A set threshold for the maximum number of basis functions to be used.
print.disp <- FALSE # Print ALL the output?
pen <- 2 # Penalty to be used in GCV.
minspan <- NULL # A set minimum span value.
corstr <- "independence" # Working correlation structure for the GEE (assumed value).
# Fit the MARGE model (takes about 30 secs).
mod <- marge(X_pred, Y, N, n, id, family, corstr, pen, tols_score,
             M, minspan, print.disp, nb, is.gee)
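The cluster variables N, n and id in the example above describe independent observations (cluster size 1). As a small illustration of how the same `rep()` construction behaves for clustered data, the sketch below uses made-up sizes; the names `N_small`, `N_clus` and `n_clus` are hypothetical and are not part of the package.

```r
# Independent observations, as in the example above: every cluster has size 1.
N_small <- 4
id_indep <- rep(1:N_small, each = 1)
id_indep  # 1 2 3 4

# Hypothetical clustered layout: 3 clusters with 2 observations each.
N_clus <- 3
n_clus <- 2
id_clus <- rep(1:N_clus, each = n_clus)
id_clus   # 1 1 2 2 3 3

# Number of observations per cluster.
table(id_clus)
```

With a cluster size of 1, the ID vector is simply 1, ..., N, which is consistent with fitting a GLM rather than a GEE (is.gee set to FALSE).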