radish: Optimize a parameterized conductance surface

Description Usage Arguments Details Value Examples

View source: R/radish_optimize.R

Description

Uses maximum likelihood to fit a parameterized conductance surface to genetic data, given a function mapping spatial data to conductance (a "conductance model") and a function mapping resistance distance (covariance) to genetic distance (a "measurement model").

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
radish(
  formula,
  data,
  conductance_model = radish::loglinear_conductance,
  measurement_model = radish::mlpe,
  nu = NULL,
  theta = NULL,
  leverage = TRUE,
  nonnegative = TRUE,
  conductance = TRUE,
  optimizer = c("newton", "bfgs"),
  control = NewtonRaphsonControl(verbose = TRUE, ctol = 1e-06, ftol = 1e-06),
  validate = FALSE
)

Arguments

formula

A formula with the name of a matrix of observed genetic distances on the lhs, and covariates in the creation of data on the rhs

data

An object of class radish_graph (see conductance_surface)

conductance_model

A function of class radish_conductance_model_factory (see radish_conductance_model_factory)

measurement_model

A function of class radish_measurement_model (see radish_measurement_model)

nu

Number of genetic markers (potentially used by measurement_model)

theta

Starting values for optimization

leverage

Compute influence measures and leverage?

nonnegative

Force regression-like measurement_model to have nonnegative slope?

conductance

If TRUE, edge conductance is the sum of cell conductances; otherwise edge conductance is the inverse of the sum of cell resistances (unused; TODO)

optimizer

The optimization algorithm to use: newton uses the exact Hessian, with computational cost that grows linearly with the number of parameters; while bfgs uses an approximation with much reduced cost (but slower overall convergence)

control

A list containing options for the optimization routine (see NewtonRaphsonControl for list)

validate

Numerical validation of leverage via package numDeriv (very slow, use for debugging small examples)

Details

By "parameterized conductance surface", what is meant is a model where per-vertex conductance (and thus resistance distance) is a function of spatial covariates. The choice of function is referred to in this package as the "conductance model". The inverse problem (and the purpose of this package) is to estimate the parameters of the conductance model, by relating the (unknown, modelled) resistance distance to observed genetic dissimilarity via a probability model (referred to as the "measurement model" throughout this package).

For example, a log-linear choice of conductance model is:

conductance[i] = exp(covariates[i,] %*% theta)

where theta are unknown parameters and covariates is a design matrix. radish estimates theta as well as any nuisance parameters associated with the measurement model.

If the fit is on the boundary (e.g. no spatial genetic structure) or is the null model of isolation-by-distance, the resulting object will not contain influence/leverage/gradient/hessian.

For handling of categorical covariates, see details of conductance_surface and examples below. The dummy coding of categorical covariates is created by conductance_model (e.g. radish::loglinear_conductance) for flexiblity. Currently, all the in-built conductance models in radish use the default contrast coding (e.g. for a categorical covariate with K factors, the K-1 mean differences against a reference category). For these conductance models, the intercept is excluded if it is not identifiable.

Value

An object of class radish

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
library(raster)

data(melip)

# scaling spatial covariates helps avoid numeric overflow
covariates <- raster::stack(list(altitude = raster::scale(melip.altitude), 
                                 forestcover = raster::scale(melip.forestcover)))

# create parameterized conductance surface
surface <- conductance_surface(covariates, melip.coords, directions = 8)

fit_nnls <- radish(melip.Fst ~ altitude * forestcover, data = surface, 
                   radish::loglinear_conductance, radish::leastsquares)
summary(fit_nnls)

# a different "measurement_model" that incorporates dependence
# among pairwise measurements
fit_mlpe <- radish(melip.Fst ~ altitude * forestcover, data = surface, 
                   radish::loglinear_conductance, radish::mlpe)
summary(fit_mlpe)

# conductance surface with 95% CI
fitted_conductance <- conductance(surface, fit_mlpe, quantile = 0.95)

# test for an interaction using a likelihood ratio test
fit_mlpe_interaction <- radish(melip.Fst ~ forestcover * altitude, data = surface, 
                               radish::loglinear_conductance, radish::mlpe)
anova(fit_mlpe, fit_mlpe_interaction)

# test against null model of IBD using a LRT
fit_mlpe_ibd <- radish(melip.Fst ~ 1, data = surface, 
                       radish::loglinear_conductance, radish::mlpe)
anova(fit_mlpe, fit_mlpe_ibd)

# categorical covariates:
# rasters of categorical covariates must have an associated RAT, see ?raster::ratify 
# and 'details' section of ?conductance_surface
forestcover_class <- cut(raster::values(melip.forestcover), breaks = c(0, 1/3, 1/6, 1)) 
melip.forestcover_cat <- 
  raster::ratify(raster::setValues(melip.forestcover, as.numeric(forestcover_class)))

RAT <- levels(melip.forestcover_cat)[[1]]
RAT$VALUE <- levels(forestcover_class) #explicitly define level names
levels(melip.forestcover_cat) <- RAT

covariates_cat <- raster::stack(list(forestcover = melip.forestcover_cat,
                                     altitude = melip.altitude)) 

surface_cat <- conductance_surface(covariates_cat, melip.coords, directions = 8)

fit_mlpe_cat <- radish(melip.Fst ~ forestcover + altitude, surface_cat, 
                       radish::loglinear_conductance, radish::mlpe)
summary(fit_mlpe_cat)

nspope/radish documentation built on July 12, 2020, 11:50 a.m.