radish: Optimize a parameterized conductance surface
In nspope/radish: Fast gradient-based optimization of resistance surfaces

Description Usage Arguments Details Value Examples

Uses maximum likelihood to fit a parameterized conductance surface to genetic data, given a function mapping spatial data to conductance (a "conductance model") and a function mapping resistance distance (covariance) to genetic distance (a "measurement model").

radish(
  formula,
  data,
  conductance_model = radish::loglinear_conductance,
  measurement_model = radish::mlpe,
  nu = NULL,
  theta = NULL,
  leverage = TRUE,
  nonnegative = TRUE,
  conductance = TRUE,
  optimizer = c("newton", "bfgs"),
  control = NewtonRaphsonControl(verbose = TRUE, ctol = 1e-06, ftol = 1e-06),
  validate = FALSE
)

`formula`	A formula with the name of a matrix of observed genetic distances on the lhs, and covariates in the creation of `data` on the rhs
`data`	An object of class `radish_graph` (see `conductance_surface`)
`conductance_model`	A function of class `radish_conductance_model_factory` (see `radish_conductance_model_factory`)
`measurement_model`	A function of class `radish_measurement_model` (see `radish_measurement_model`)
`nu`	Number of genetic markers (potentially used by `measurement_model`)
`theta`	Starting values for optimization
`leverage`	Compute influence measures and leverage?
`nonnegative`	Force regression-like `measurement_model` to have nonnegative slope?
`conductance`	If `TRUE`, edge conductance is the sum of cell conductances; otherwise edge conductance is the inverse of the sum of cell resistances (unused; TODO)
`optimizer`	The optimization algorithm to use: `newton` uses the exact Hessian, with computational cost that grows linearly with the number of parameters; while `bfgs` uses an approximation with much reduced cost (but slower overall convergence)
`control`	A list containing options for the optimization routine (see `NewtonRaphsonControl` for list)
`validate`	Numerical validation of leverage via package `numDeriv` (very slow, use for debugging small examples)

By "parameterized conductance surface", what is meant is a model where per-vertex conductance (and thus resistance distance) is a function of spatial covariates. The choice of function is referred to in this package as the "conductance model". The inverse problem (and the purpose of this package) is to estimate the parameters of the conductance model, by relating the (unknown, modelled) resistance distance to observed genetic dissimilarity via a probability model (referred to as the "measurement model" throughout this package).

For example, a log-linear choice of conductance model is:

conductance[i] = exp(covariates[i,] %*% theta)

where theta are unknown parameters and covariates is a design matrix. radish estimates theta as well as any nuisance parameters associated with the measurement model.

If the fit is on the boundary (e.g. no spatial genetic structure) or is the null model of isolation-by-distance, the resulting object will not contain influence/leverage/gradient/hessian.

For handling of categorical covariates, see details of conductance_surface and examples below. The dummy coding of categorical covariates is created by conductance_model (e.g. radish::loglinear_conductance) for flexiblity. Currently, all the in-built conductance models in radish use the default contrast coding (e.g. for a categorical covariate with K factors, the K-1 mean differences against a reference category). For these conductance models, the intercept is excluded if it is not identifiable.

An object of class radish

library(raster)

data(melip)

# scaling spatial covariates helps avoid numeric overflow
covariates <- raster::stack(list(altitude = raster::scale(melip.altitude), 
                                 forestcover = raster::scale(melip.forestcover)))

# create parameterized conductance surface
surface <- conductance_surface(covariates, melip.coords, directions = 8)

fit_nnls <- radish(melip.Fst ~ altitude * forestcover, data = surface, 
                   radish::loglinear_conductance, radish::leastsquares)
summary(fit_nnls)

# a different "measurement_model" that incorporates dependence
# among pairwise measurements
fit_mlpe <- radish(melip.Fst ~ altitude * forestcover, data = surface, 
                   radish::loglinear_conductance, radish::mlpe)
summary(fit_mlpe)

# conductance surface with 95% CI
fitted_conductance <- conductance(surface, fit_mlpe, quantile = 0.95)

# test for an interaction using a likelihood ratio test
fit_mlpe_interaction <- radish(melip.Fst ~ forestcover * altitude, data = surface, 
                               radish::loglinear_conductance, radish::mlpe)
anova(fit_mlpe, fit_mlpe_interaction)

# test against null model of IBD using a LRT
fit_mlpe_ibd <- radish(melip.Fst ~ 1, data = surface, 
                       radish::loglinear_conductance, radish::mlpe)
anova(fit_mlpe, fit_mlpe_ibd)

# categorical covariates:
# rasters of categorical covariates must have an associated RAT, see ?raster::ratify 
# and 'details' section of ?conductance_surface
forestcover_class <- cut(raster::values(melip.forestcover), breaks = c(0, 1/3, 1/6, 1)) 
melip.forestcover_cat <- 
  raster::ratify(raster::setValues(melip.forestcover, as.numeric(forestcover_class)))

RAT <- levels(melip.forestcover_cat)[[1]]
RAT$VALUE <- levels(forestcover_class) #explicitly define level names
levels(melip.forestcover_cat) <- RAT

covariates_cat <- raster::stack(list(forestcover = melip.forestcover_cat,
                                     altitude = melip.altitude)) 

surface_cat <- conductance_surface(covariates_cat, melip.coords, directions = 8)

fit_mlpe_cat <- radish(melip.Fst ~ forestcover + altitude, surface_cat, 
                       radish::loglinear_conductance, radish::mlpe)
summary(fit_mlpe_cat)