run.spacemix.analysis: Runs a SpaceMix analysis

View source: R/space.mix.MCMC.R

run.spacemix.analysis    R Documentation

Runs a SpaceMix analysis

Description

This function runs a Markov chain Monte Carlo (MCMC) analysis to estimate a geogenetic map of your genotyped samples. A SpaceMix analysis can be run under any of four models:

  1. "no_movement" - populations do not choose their own locations, nor can they draw admixture. The only parameters estimated are the alpha parameters of the spatial covariance function and the nugget parameters.

  2. "source" - populations do not choose their own locations, but they do draw admixture. The parameters estimated are the alpha parameters of the spatial covariance function, the nugget parameters, the locations of the sources of admixture, and the strength of that admixture.

  3. "target" - populations choose their own locations but do not draw admixture. The parameters estimated are the alpha parameters of the spatial covariance function, the nugget parameters, and the population locations.

  4. "source_and_target" - populations choose their own locations AND draw admixture. The parameters estimated are the alpha parameters of the spatial covariance function, the nugget parameters, the population locations, the locations of the sources of admixture, and the strength of that admixture.
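The four model options differ only in which parameter groups are estimated. As a quick reference, the descriptions above can be transcribed into a named R list. The list name and the parameter-group labels are hypothetical; they are not objects exported by SpaceMix.

```r
# Hypothetical summary of the parameter groups each model estimates,
# transcribed from the model descriptions (not part of the SpaceMix package).
spacemix.model.parameters <- list(
  no_movement       = c("alpha", "nugget"),
  source            = c("alpha", "nugget",
                        "admixture.source.locations", "admixture.proportions"),
  target            = c("alpha", "nugget", "population.locations"),
  source_and_target = c("alpha", "nugget", "population.locations",
                        "admixture.source.locations", "admixture.proportions")
)

# Example: the parameter groups estimated under the "target" model
spacemix.model.parameters[["target"]]
```

Note that "source_and_target" estimates exactly the union of what "source" and "target" estimate.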

Usage

run.spacemix.analysis(n.fast.reps, fast.MCMC.ngen, fast.model.option,
  long.model.option, data.type, sample.frequencies = NULL,
  mean.sample.sizes = NULL, counts = NULL, sample.sizes = NULL,
  sample.covariance = NULL, target.spatial.prior.scale = NULL,
  source.spatial.prior.scale = NULL, spatial.prior.X.coordinates,
  spatial.prior.Y.coordinates, round.earth,
  long.run.initial.parameters = NULL, k, loci, ngen, printfreq,
  samplefreq, mixing.diagn.freq = 50, savefreq, directory = NULL,
  prefix = "MyRun")

Arguments

n.fast.reps

The number of short initial runs to perform.

fast.MCMC.ngen

The number of generations to run each initial MCMC analysis.

fast.model.option

The model to be used in the short runs: one of "no_movement", "source", "target", or "source_and_target".

long.model.option

The model to be used in the long run: one of "no_movement", "source", "target", or "source_and_target".

data.type

The data type to be used: one of "sample.covariance", "sample.frequencies", or "counts". Please see the vignette for a discussion of what these different data elements should look like.

sample.frequencies

Data to be specified if "sample.frequencies" is chosen as data.type.

mean.sample.sizes

Data to be specified if "sample.frequencies" or "sample.covariance" is chosen as data.type.

counts

Data to be specified if "counts" is chosen as data.type.

sample.sizes

Data to be specified if "counts" is chosen as data.type.

sample.covariance

Data to be specified if "sample.covariance" is chosen as data.type.
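Which data arguments must be supplied depends on data.type. The rule stated in the argument descriptions above can be written as a small pre-flight check. required.data.args() and check.data.args() are hypothetical helpers for illustration, not part of SpaceMix, and run.spacemix.analysis() may perform its own validation.

```r
# Hypothetical helper (not part of SpaceMix): the data arguments that must be
# non-NULL for each data.type, per the argument descriptions.
required.data.args <- function(data.type) {
  switch(data.type,
         sample.frequencies = c("sample.frequencies", "mean.sample.sizes"),
         sample.covariance  = c("sample.covariance", "mean.sample.sizes"),
         counts             = c("counts", "sample.sizes"),
         stop("unknown data.type: ", data.type))
}

# Hypothetical pre-flight check: stop with a message naming any missing argument.
check.data.args <- function(data.type, args) {
  needed <- required.data.args(data.type)
  missing <- needed[vapply(needed, function(a) is.null(args[[a]]), logical(1))]
  if (length(missing) > 0)
    stop("data.type = '", data.type, "' also requires: ",
         paste(missing, collapse = ", "))
  invisible(TRUE)
}
```

For example, check.data.args("counts", list(counts = my.counts, sample.sizes = NULL)), where my.counts stands for your count matrix, stops with a message naming sample.sizes.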

target.spatial.prior.scale

The variance of the spatial prior on population locations. The default is half the pairwise observed distance.

source.spatial.prior.scale

The variance of the spatial prior on the locations of sources of admixture. The default is twice the pairwise observed distance.

spatial.prior.X.coordinates

'Observed' sample longitude, or, if you want to examine the influence of the prior, random values.

spatial.prior.Y.coordinates

'Observed' sample latitude, or, if you want to examine the influence of the prior, random values.

round.earth

Option of whether you want to estimate locations on a plane (round.earth = FALSE) or a sphere (round.earth = TRUE).

long.run.initial.parameters

List of parameter values that can be passed directly to the long-run MCMC as initial parameter values. The list should include values for a0, a1, a2, population.coordinates, admix.proportions, and nugget, and each element of the list should be named for the corresponding parameter (e.g., list("a0" = 1.07, "a1" = 0.5, ...)).
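A long.run.initial.parameters list can be assembled by hand. The sketch below assumes K = 5 samples and assumes the shapes implied by the Value section: a 2K x 2 population.coordinates matrix whose first K rows are sample locations and whose last K rows are admixture-source locations, plus length-K admix.proportions and nugget vectors. These shapes and all numeric values are illustrative assumptions, so compare them against the last.params component of a previous run before relying on them.

```r
# Illustrative initial values for the long run. Shapes are ASSUMED from the
# Value section; numbers are arbitrary, apart from a0 and a1, which are taken
# from the example in the argument description.
K <- 5
set.seed(1)
sample.coords <- cbind(runif(K, -10, 10), runif(K, -10, 10))

initial.params <- list(
  "a0" = 1.07,
  "a1" = 0.5,
  "a2" = 1,                                    # arbitrary placeholder
  "population.coordinates" = rbind(sample.coords,   # rows 1..K: samples
                                   sample.coords),  # rows K+1..2K: sources start on their samples
  "admix.proportions" = rep(0.01, K),          # assumed: one proportion per sample
  "nugget" = rep(0.01, K)                      # assumed: one nugget per sample
)
```

The list would then be passed as long.run.initial.parameters = initial.params.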

k

Number of samples.

loci

Number of loci.

ngen

Number of MCMC generations for the long MCMC.

printfreq

Frequency with which updates are printed. The updates consist of the current MCMC iteration followed by the posterior probability.

samplefreq

Frequency with which samples are logged from the MCMC (i.e., the thinning interval).

mixing.diagn.freq

Frequency of the adaptive Metropolis-within-Gibbs updates to the tuning parameters of the proposal distributions. The default is every 50 MCMC iterations.
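The kind of adaptive tuning that mixing.diagn.freq schedules (and whose history appears in the lstps output) can be illustrated with a generic batch-adaptation rule in the style of Roberts and Rosenthal's adaptive Metropolis-within-Gibbs. This is a sketch under assumptions: the 0.44 target acceptance rate, the shrinking adjustment min(0.01, 1/sqrt(batch)), and the name update.log.step are hypothetical choices, not SpaceMix's documented rule.

```r
# Generic adaptive scaling step (ASSUMED rule, not taken from SpaceMix):
# raise the log proposal scale if recent acceptance exceeds the target,
# lower it otherwise; the adjustment shrinks as batches accumulate.
update.log.step <- function(log.step, accept.rate, batch, target = 0.44) {
  delta <- min(0.01, 1 / sqrt(batch))
  if (accept.rate > target) log.step + delta else log.step - delta
}

# Toy loop: count acceptances and tune once every mixing.diagn.freq iterations
set.seed(1)
mixing.diagn.freq <- 50
log.step <- 0
accepted <- 0
for (iter in 1:500) {
  accepted <- accepted + (runif(1) < 0.6)   # stand-in for a real accept/reject
  if (iter %% mixing.diagn.freq == 0) {
    batch <- iter / mixing.diagn.freq
    log.step <- update.log.step(log.step, accepted / mixing.diagn.freq, batch)
    accepted <- 0
  }
}
```

Working on the log scale keeps the proposal scale positive no matter how many downward adjustments are made.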

savefreq

Frequency with which MCMC_output object is saved.

directory

Directory into which you want output to be saved. If no directory is specified, a random directory name will be generated and that directory will be created.

prefix

Prefix to be attached to all output files.

Details

The algorithm first performs a user-specified number of short initial runs, each started from a random location in parameter space, to find a generally good region. It then performs one long run, initialized at the final location in parameter space of the best short run; the results of this long run are the ones of interest. Alternatively, the user can run only a single long analysis, for which the initial parameters may be specified. If one or more short runs are performed, fast.model.option must be the same as, or nested within, long.model.option. For example, a user may not specify "source" for the short runs and "target" for the long run, because neither model is nested within the other.
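The nesting rule can be made concrete. Going by the parameter sets listed in the Description, "no_movement" is nested within every other model, "source" and "target" are each nested within "source_and_target", and "source" and "target" are not nested within each other. model.is.nested() is a hypothetical helper encoding that reading; it is not part of SpaceMix.

```r
# Hypothetical helper (not part of SpaceMix): TRUE if fast.model is the same
# as, or nested within, long.model, based on the models' parameter sets.
model.is.nested <- function(fast.model, long.model) {
  nested.in <- list(
    no_movement       = c("no_movement", "source", "target", "source_and_target"),
    source            = c("source", "source_and_target"),
    target            = c("target", "source_and_target"),
    source_and_target = "source_and_target"
  )
  if (!fast.model %in% names(nested.in)) stop("unknown model: ", fast.model)
  long.model %in% nested.in[[fast.model]]
}

model.is.nested("no_movement", "source_and_target")  # TRUE
model.is.nested("source", "target")                   # FALSE: not allowed
```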

Value

This function saves an output R object (".Robj") which contains the results of the analysis. The components of this R object are:

  • a0 - The posterior distribution on parameter α_0.

  • a1 - The posterior distribution on parameter α_1.

  • a2 - The posterior distribution on parameter α_2.

  • accept_rates - The list of acceptance rates of different parameters over the course of the MCMC. The total number of elements in each element of the list is equal to the number of sampled MCMC iterations (i.e., the total number of generations divided by the sample frequency).

  • admix.proportions - The posterior distribution on admixture proportions. This is a matrix in which the ith column is the vector of estimated admixture proportions from the ith sampled generation of the MCMC.

  • diagns - The list of acceptance rates for each parameter over the last 50 MCMC iterations.

  • distances - The list of pairwise distances between all samples and their sources of admixture over the course of the MCMC. Each element of the list is a pairwise distance matrix of dimension 2*K by 2*K, where K is the number of samples. The total number of elements in the list is equal to the number of sampled MCMC iterations (i.e., the total number of generations divided by the sample frequency).

  • last.params - The list of values passed between each iteration of the MCMC, sampled at the last iteration of the MCMC (i.e., the location in parameter space from the very end of the analysis, along with other quantities passed between parameter update functions).

  • LnL_freqs - The vector of likelihood values sampled over the course of the MCMC.

  • lstps - A list giving the log of the scale of the tuning parameters, updated via an adaptive MCMC procedure, for each model parameter. The total number of elements in each element of the list is equal to the number of sampled MCMC iterations (i.e., the total number of generations divided by the sample frequency).

  • ngen - The user-specified number of generations of the MCMC.

  • nugget - The posterior distribution on nugget parameters. This is a matrix in which the ith column is the vector of estimated nuggets from the ith sampled generation of the MCMC.

  • population.coordinates - The posterior distribution on sample coordinates in geogenetic space. Each element of the list is a matrix with 2 columns (Eastings and Northings, which correspond to Long and Lat in the geogenetic space) and 2*K rows, where K is the number of samples in the dataset. The first K rows give the geogenetic coordinates of the samples themselves, and rows K+1 through 2*K give the geogenetic coordinates of the source of admixture for each sample.

  • Prob - The vector of posterior probability values sampled over the course of the MCMC.

  • samplefreq - The number of iterations between each time the MCMC is sampled. A higher frequency (lower samplefreq) results in more sampled iterations per analysis, but with higher autocorrelation between sampled parameter estimates.

  • source.spatial.prior.scale - The variance of the prior distribution on admixture source geogenetic locations.

  • target.spatial.prior.scale - The variance of the prior distribution on sample geogenetic locations.

  • transformed.covariance.list - The posterior distribution of the mean-centered and projected parametric covariance matrix. Each sampled matrix is of dimension K-1 by K-1, where K is the number of samples.
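The component shapes above can be illustrated with a small stand-in object. The sketch assumes the saved output has already been read back into R as a list named spacemix.output (how to read it depends on how the .Robj file was written, so that step is not shown). The values below are random and only the dimensions follow the descriptions: the number of sampled iterations is ngen / samplefreq, nugget has one column per sampled iteration, and each population.coordinates matrix stacks K sample rows above K admixture-source rows.

```r
# Simulated stand-in for the output object (random values, assumed layout)
K <- 3
ngen <- 100
samplefreq <- 5
n.samples <- ngen / samplefreq   # number of sampled iterations

set.seed(42)
spacemix.output <- list(
  nugget = matrix(rexp(K * n.samples), nrow = K),   # column i = i-th sampled iteration
  population.coordinates = replicate(n.samples,
                                     matrix(rnorm(2 * K * 2), ncol = 2),
                                     simplify = FALSE)
)

# Posterior mean nugget for each sample, averaged over sampled iterations
posterior.mean.nugget <- rowMeans(spacemix.output$nugget)

# Split the last sampled coordinate matrix into sample and source locations
coords <- spacemix.output$population.coordinates[[n.samples]]
sample.locations <- coords[1:K, , drop = FALSE]
source.locations <- coords[(K + 1):(2 * K), , drop = FALSE]
```

With the settings of the example below (ngen = 5000, samplefreq = 5), the same arithmetic gives 1000 sampled iterations.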

Examples

# load the SpaceMix package and the example dataset
library(SpaceMix)
data(spacemix.example.dataset)

# run example analysis
run.spacemix.analysis(n.fast.reps = 2,
                      fast.MCMC.ngen = 100,
                      fast.model.option = "source_and_target",
                      long.model.option = "source_and_target",
                      data.type = "counts",
                      sample.frequencies = NULL,
                      mean.sample.sizes = NULL,
                      counts = spacemix.example.dataset$allele.counts,
                      sample.sizes = spacemix.example.dataset$sample.sizes,
                      sample.covariance = NULL,
                      target.spatial.prior.scale = NULL,
                      source.spatial.prior.scale = NULL,
                      spatial.prior.X.coordinates = spacemix.example.dataset$population.coordinates[,1],
                      spatial.prior.Y.coordinates = spacemix.example.dataset$population.coordinates[,2],
                      round.earth = FALSE,
                      long.run.initial.parameters = NULL,
                      k = nrow(spacemix.example.dataset$allele.counts),
                      loci = ncol(spacemix.example.dataset$allele.counts),
                      ngen = 5000,
                      printfreq = 50,
                      samplefreq = 5,
                      mixing.diagn.freq = 50,
                      savefreq = 5000,
                      directory = NULL,
                      prefix = "example_run")

gbradburd/SpaceMix documentation built on Oct. 19, 2022, 12:43 p.m.