run_structure: Run run_structure from SNP data in a VCF file and plot...

View source: R/run_structure.R

run_structure    R Documentation

Description

Run run_structure from SNP data in a VCF file and plot results.

This function is a wrapper that enables running STRUCTURE on SNP data in either a VCF file or a vcfR object.

Notes: Requires installation of the STRUCTURE software.

Usage

run_structure(
  x,
  format = "VCF",
  coords = NULL,
  mainparams.path = NULL,
  extraparams.path = NULL,
  burnin = 1000,
  kmax = 10,
  numreps = 10000,
  runs = 5,
  ploidy = NULL,
  missing = NULL,
  onerowperind = NULL,
  save.in = NULL,
  structure.path = NULL,
  samplenames = NULL,
  cleanup = TRUE,
  include.out = c(".pdf", "popfiles"),
  debug = FALSE,
  ...,
  setupOnly = FALSE,
  overwrite = FALSE
)

Arguments

x

'vcfR' object (see package::vcfR) or a character string with path to a SNPs dataset formatted according to the 'format' argument. Currently VCF or 'structure' (a type of STRUCTURE format) can be used.

format

Character string indicating the format of the data. Currently only "VCF" or "structure" allowed. Other types may be added. Ignored if x is a vcfR object.

coords

Either a character string with path to file containing coordinates (longitude in first column, latitude in second column), or matrix object with longitude and latitude columns.
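
As a minimal sketch of the matrix form described above, the following builds a two-column matrix with longitude first and latitude second. The coordinate values and column names are made up for illustration; whether run_structure requires particular column names or row names is not stated here, so treat those as assumptions.

```r
# Hypothetical sampling localities (values are illustrative only).
lon <- c(-95.25, -94.80, -96.10)
lat <- c(38.97, 39.10, 38.50)

# Longitude must be the first column and latitude the second.
# The column names are an assumption, not a documented requirement.
coords <- cbind(longitude = lon, latitude = lat)

# Basic sanity checks before passing 'coords' to run_structure().
stopifnot(is.matrix(coords), ncol(coords) == 2)
stopifnot(all(abs(coords[, 1]) <= 180), all(abs(coords[, 2]) <= 90))
```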

mainparams.path

Character string with path to the mainparams file. Default is NULL, in which case the mainparams file is generated from values supplied to arguments of this function.

extraparams.path

Character string with path to the extraparams file. Default is NULL, in which case the extraparams file is generated from values supplied to arguments of this function.

burnin

Integer giving the number of initial MCMC samples to discard. Default is 1000 (whether this default is reasonable still needs to be checked). If this argument is NULL, then BURNIN must be defined in the mainparams file and 'mainparams.path' must not be NULL.

kmax

Numerical vector with set of values to use for K. Default 10.

numreps

Chain length. Default 10000.

runs

Number of times to repeat the MCMC analysis. Default 5.

ploidy

Integer ≥ 1 indicating ploidy, or NULL (the default), in which case ploidy is determined automatically from the input data (only works if 'format' = "VCF" or "vcfR").

missing

Integer used to code missing alleles, or NULL (the default), in which case missing data is identified automatically from the input file (only works if 'format' = "VCF" or "vcfR").

onerowperind

Logical indicating if the input data file codes individuals on a single or multiple rows. Default is NULL (only works if 'format' = "VCF" or "vcfR"), in which case a temporary structure file is created and onerowperind is coerced to TRUE.

save.in

Character string with path to directory where output files should be saved. The directory will be created and should not already exist. Default is NULL, in which case output is saved to a new folder (name randomly generated) in the current directory.
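
Because the output directory is created by the function and should not already exist, it can help to build a fresh path and confirm it is unused before the run. The sketch below uses only base R; the folder name pattern is a hypothetical choice, not something run_structure requires.

```r
# Build a path for a new output directory; the name pattern is hypothetical.
stamp   <- format(Sys.time(), "%Y%m%d_%H%M%S")
out.dir <- file.path(tempdir(), paste0("structure_run_", stamp))

# run_structure() is documented to create this directory itself,
# so only confirm that it does not exist yet; do not create it here.
if (dir.exists(out.dir)) {
  stop("Output directory already exists: ", out.dir)
}
```

The resulting 'out.dir' string would then be supplied as the 'save.in' argument.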

structure.path

Character string with path to folder containing the structure executable called 'structure.py'

samplenames

NULL. Not yet implemented.

cleanup

Logical indicating whether the original output files should be deleted and replaced with a single, simple table holding all of the information usually spread across multiple files and tables. Default TRUE.

include.out

Character vector indicating which type of files should be included as output in addition to the usual structure output. Default is c(".pdf","popfiles"). ".pdf" generates EvannoPlots and admixture barplots, and "popfiles" generates an easySFS-format popfile with individual assignments to populations for each K.

debug

Logical indicating whether or not to print messages indicating the internal step of the function. Default FALSE. Typically only used for development.

...

Additional arguments passed to STRUCTURE. Not yet implemented; in the future these may include 'LABEL', 'POPDATA', 'POPFLAG', 'LOCDATA', 'PHENOTYPE', 'EXTRACOLS', 'MARKERNULLMES', 'RECESSIVEALLELES', 'MAPDISTANCES', 'PHASED', 'PHASEINFO', 'MARKOVPHASE', and 'NOTAMBIGUOUS'.

setupOnly

Logical indicating whether or not the structure environment should be set up but not run. Default FALSE.

overwrite

Whether or not to overwrite previous results. Default FALSE.

save.as

Where to save the output PDF. Default is NULL.

tolerance

Tolerance for convergence, i.e., the change in marginal likelihood required to continue.

prior

Type of prior to use. Default "simple".

full

Whether or not to generate output files holding variation of Q, P, and marginal likelihood, in addition to the files holding means. Default FALSE.

seed

Value to use as a seed for reproducing results. Default NULL.

Value

List of plots
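
Examples

A hedged sketch of a typical call, using only arguments listed under Usage. The file paths are hypothetical placeholders, 'kmax = 3' is an illustrative value rather than a recommendation, and the call assumes that run_structure is exported by the misc.wrappers package and that STRUCTURE is installed. The run is skipped unless the package and the input file are both available.

```r
# Hypothetical input and install locations; replace with real paths.
vcf.path      <- "path/to/snps.vcf"
structure.dir <- "path/to/structure/folder"

# Argument names come from the Usage section; burnin, numreps, and runs
# are the documented defaults, and kmax is a small illustrative value.
args <- list(
  x              = vcf.path,
  format         = "VCF",
  kmax           = 3,
  burnin         = 1000,
  numreps        = 10000,
  runs           = 5,
  structure.path = structure.dir,
  save.in        = file.path(tempdir(), "structure_example")
)

# Only attempt the run when the package and the input file are available.
if (requireNamespace("misc.wrappers", quietly = TRUE) && file.exists(vcf.path)) {
  plots <- do.call(misc.wrappers::run_structure, args)
  # The documented return value is a list of plots.
  str(plots, max.level = 1)
}
```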


JeffWeinell/misc.wrappers documentation built on Sept. 20, 2023, 12:42 p.m.