disc: Identify Deme Inbreeding Spatial Coefficients in Continuous...

View source: R/main_vanilla.R

disc R Documentation

Identify Deme Inbreeding Spatial Coefficients in Continuous Space

Description

The purpose of this statistic is to identify an inbreeding coefficient, or degree of relatedness, for a given location in discrete space. We assume that locations in space can be represented as "demes," such that multiple individuals live in the same deme (i.e. samples are sourced from the same location). The expected pairwise relationship between two individuals, or samples, depends on the inbreeding coefficient of each sample's deme and on the geographic distance between the demes. The program assumes a symmetric distance matrix.

Usage

disc(
  discdat,
  start_params = NULL,
  lambda = 0.1,
  learningrate = 0.001,
  m_lowerbound = 0,
  m_upperbound = Inf,
  b1 = 0.9,
  b2 = 0.999,
  e = 1e-08,
  steps = 1000,
  thin = 1,
  normalize_geodist = TRUE,
  report_progress = TRUE,
  return_verbose = FALSE
)

Arguments

discdat

dataframe; The genetic-geographic data by deme (K)

start_params

named double vector; vector of start parameters.

lambda

double; A quadratic (L2) regularization, or penalty, parameter on the "m" parameter. Note, lambda is a scalar that enters the penalty as \lambda m^2.
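As a rough illustration of the penalty term, the sketch below adds \lambda m^2 to a placeholder cost. The helper penalized_cost and the base_cost value are hypothetical and not part of the package; how disc combines the penalty with its own loss is not shown here.

```r
# Hypothetical helper: add an L2 penalty on m to an unpenalized cost.
# base_cost stands in for whatever loss is being minimized (assumption).
penalized_cost <- function(base_cost, m, lambda = 0.1) {
  base_cost + lambda * m^2
}

# With m = 3 and lambda = 0.1 the penalty contributes 0.1 * 9 = 0.9.
penalized_cost(base_cost = 2, m = 3, lambda = 0.1)  # 2.9
```

Larger values of lambda pull "m" more strongly toward zero.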

learningrate

double; alpha parameter for how much each "step" is weighted in the gradient descent

m_lowerbound

double; lower limit value for the global "m" parameter; any "m" value encountered that is less than the lower bound will be replaced by the lower bound

m_upperbound

double; upper limit value for the global "m" parameter; any "m" value encountered that is greater than the upper bound will be replaced by the upper bound
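The bound handling described for m_lowerbound and m_upperbound amounts to clamping. A minimal sketch, assuming a hypothetical helper clamp_m that is not part of the package:

```r
# Hypothetical helper: replace out-of-range values of m with the nearest bound.
clamp_m <- function(m, m_lowerbound = 0, m_upperbound = Inf) {
  min(max(m, m_lowerbound), m_upperbound)
}

clamp_m(-0.5)                   # 0  (below the lower bound)
clamp_m(12, m_upperbound = 10)  # 10 (above the upper bound)
clamp_m(3)                      # 3  (within bounds, unchanged)
```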

b1

double; exponential decay rate for the first moment estimate in the Adam optimization algorithm

b2

double; exponential decay rate for the second moment estimate in the Adam optimization algorithm

e

double; epsilon, a small constant added for numerical stability in the Adam optimization algorithm

steps

integer; the number of steps as we move down the gradient

thin

integer; the thinning interval for the iterations kept as part of the output (i.e. if the user specifies 10, every 10th iteration will be kept)
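To make the effect of thin concrete, the sketch below lists which iterations would be kept, assuming iterations are counted from 1 and exactly every thin-th one is stored. Whether disc also stores the first or final iteration regardless of thin is not stated here, so treat this only as an illustration.

```r
steps <- 1000
thin  <- 10

# Indices of the iterations kept under the assumption above.
kept <- seq(from = thin, to = steps, by = thin)

length(kept)   # 100
head(kept, 3)  # 10 20 30
```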

normalize_geodist

boolean; whether geographic distances between demes should be normalized (i.e. Min-Max Feature Scaling: X' = \frac{X - X_{min}}{X_{max} - X_{min}}, which places the geodistances on the [0, 1] scale). Helps increase model stability at the expense of complicating the interpretation of the migration rate parameter.
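The min-max formula can be checked by hand on a small vector. The distances below are made up for illustration.

```r
# Made-up pairwise geographic distances (arbitrary units).
geodist <- c(12, 50, 88, 126)

# Min-max feature scaling: X' = (X - X_min) / (X_max - X_min)
geodist_scaled <- (geodist - min(geodist)) / (max(geodist) - min(geodist))

range(geodist_scaled)  # 0 1
```

Because the scaling divides by the observed range, a migration rate estimated on scaled distances is expressed relative to that range rather than in the original distance units.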

report_progress

boolean; whether or not a progress bar should be shown as you iterate through steps

return_verbose

boolean; whether the inbreeding coefficients and migration rate should be returned for every iteration or only for the final iteration. Users will typically not want to store every iteration, as this can be memory intensive

Details

The discdat dataframe must have the following column names: "smpl1"; "smpl2"; "deme1"; "deme2"; "gendist"; "geodist"; which correspond to: Sample 1 Name; Sample 2 Name; Sample 1 Location; Sample 2 Location; Pairwise Genetic Distance; Pairwise Geographic Distance. Note, the order of the columns does not matter but the names of the columns must match.
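A minimal sketch of an input dataframe with the required column names follows. All sample names, deme labels, and distance values are invented for illustration; the final check is a plain base R assertion, not a function from the package.

```r
# Toy input: three samples, two demes ("A" and "B"); all values are made up.
discdat <- data.frame(
  smpl1   = c("s1", "s1", "s2"),
  smpl2   = c("s2", "s3", "s3"),
  deme1   = c("A", "A", "A"),
  deme2   = c("A", "B", "B"),
  gendist = c(0.42, 0.10, 0.15),
  geodist = c(0, 25, 25),
  stringsAsFactors = FALSE
)

# Column order is free, but every required name must be present.
required_cols <- c("smpl1", "smpl2", "deme1", "deme2", "gendist", "geodist")
stopifnot(all(required_cols %in% names(discdat)))
```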

The start_params vector names must match the cluster names (i.e. clusters must have a name that we can match on for the starting relatedness parameters). In addition, you must provide a start parameter for "m".
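One way to build such a named vector is sketched below. The deme labels "A" and "B" and the starting values are hypothetical; suitable starting values depend on the data.

```r
# Hypothetical deme (cluster) names, as they would appear in deme1/deme2.
demes <- c("A", "B")

# One starting inbreeding coefficient per deme, plus a start value for "m".
start_params <- c(rep(0.2, length(demes)), 1e-3)
names(start_params) <- c(demes, "m")

start_params
```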

Note: The "f" inbreeding coefficients are not allowed to be negative; the code enforces this internally with a logit transformation.
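The effect of a logit transformation can be sketched with base R's qlogis (logit) and plogis (inverse logit). This only illustrates the general idea that updates made on the logit scale map back into (0, 1); the package's internal code is not reproduced here.

```r
# Example inbreeding coefficients on the natural (0, 1) scale.
f <- c(0.05, 0.5, 0.95)

# Move to the unconstrained logit scale, where any real value is allowed.
f_logit <- qlogis(f)

# Apply arbitrary (even large) updates on the logit scale...
f_logit_updated <- f_logit + c(-10, 0, 10)

# ...and map back: the results stay strictly between 0 and 1.
f_updated <- plogis(f_logit_updated)
all(f_updated > 0 & f_updated < 1)  # TRUE
```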

Gradient descent is performed using the Adam (adaptive moment estimation) optimization approach. Default values for moment decay rates, epsilon, and learning rates are taken from DP Kingma, 2014.
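For orientation, a single Adam update as described by Kingma can be sketched as follows. The helper adam_step and the toy objective are hypothetical and are not the package's implementation; the argument names learningrate, b1, b2, and e mirror the arguments of disc.

```r
# Hypothetical helper: one Adam update for a parameter vector theta.
# m1, m2 are the running first and second moment estimates; t is the step (>= 1).
adam_step <- function(theta, grad, m1, m2, t,
                      learningrate = 0.001, b1 = 0.9, b2 = 0.999, e = 1e-8) {
  m1 <- b1 * m1 + (1 - b1) * grad      # update biased first moment
  m2 <- b2 * m2 + (1 - b2) * grad^2    # update biased second moment
  m1_hat <- m1 / (1 - b1^t)            # bias-corrected first moment
  m2_hat <- m2 / (1 - b2^t)            # bias-corrected second moment
  theta <- theta - learningrate * m1_hat / (sqrt(m2_hat) + e)
  list(theta = theta, m1 = m1, m2 = m2)
}

# Toy objective (assumption): minimize (theta - 3)^2, gradient 2 * (theta - 3).
state <- list(theta = 0, m1 = 0, m2 = 0)
for (t in 1:1000) {
  grad  <- 2 * (state$theta - 3)
  state <- adam_step(state$theta, grad, state$m1, state$m2, t)
}
state$theta  # has moved from 0 toward 3
```

With the default learning rate each step changes the parameter by roughly 0.001 at most, so more steps or a larger learningrate may be needed on problems where the starting values are far from the optimum.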


nickbrazeau/discent documentation built on Feb. 10, 2025, 3:09 p.m.