View source: R/run_structure.R
run_structure | R Documentation |
Run STRUCTURE on SNP data in a VCF file and plot the results.
This function is a wrapper that enables running STRUCTURE on SNP data in either a VCF file or a vcfR object.
Note: Requires installation of the STRUCTURE software.
run_structure(
  x,
  format = "VCF",
  coords = NULL,
  mainparams.path = NULL,
  extraparams.path = NULL,
  burnin = 1000,
  kmax = 10,
  numreps = 10000,
  runs = 5,
  ploidy = NULL,
  missing = NULL,
  onerowperind = NULL,
  save.in = NULL,
  structure.path = NULL,
  samplenames = NULL,
  cleanup = TRUE,
  include.out = c(".pdf", "popfiles"),
  debug = FALSE,
  ...,
  setupOnly = FALSE,
  overwrite = FALSE
)
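As a rough illustration of the signature above, the sketch below collects a few arguments in a list and calls run_structure() only when the function is loaded and the input file exists. The file and folder paths are hypothetical placeholders, and the argument values simply repeat the documented defaults; this is not taken from the package's own examples.

```r
# Hypothetical paths: replace with your own VCF file and STRUCTURE folder.
args <- list(
  x              = "path/to/snps.vcf",
  format         = "VCF",
  burnin         = 1000,
  kmax           = 10,
  numreps        = 10000,
  runs           = 5,
  structure.path = "path/to/structure/folder",
  # tempfile() returns a path that does not exist yet, matching the
  # documented requirement that 'save.in' must not already exist.
  save.in        = tempfile("structure_out_")
)

# Call the wrapper only if it is available and the input file is present.
can_run <- exists("run_structure", mode = "function") && file.exists(args$x)
if (can_run) {
  plots <- do.call(run_structure, args)
}
```

Building the argument list first makes it easy to inspect or log the settings before a potentially long run is started.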
x |
'vcfR' object (see the vcfR package) or a character string with the path to a SNP dataset formatted according to the 'format' argument. Currently VCF or 'structure' (a type of STRUCTURE format) can be used. |
format |
Character string indicating the format of the data. Currently only "VCF" and "structure" are allowed. Other types may be added. Ignored if x is a vcfR object. |
coords |
Either a character string with path to file containing coordinates (longitude in first column, latitude in second column), or matrix object with longitude and latitude columns. |
mainparams.path |
Character string with path to the mainparams file. Default is NULL, in which case the mainparams file is generated from values supplied to arguments of this function. |
extraparams.path |
Character string with path to the extraparams file. Default is NULL, in which case the extraparams file is generated from values supplied to arguments of this function. |
burnin |
Integer giving the number of initial MCMC samples to discard. Default is 1000 (whether this default is reasonable has not yet been checked). If this argument is NULL, then BURNIN must be defined in the mainparams file and 'mainparams.path' must not be NULL. |
kmax |
Numerical vector with the set of values to use for K. Default 10. |
numreps |
Chain length. Default 10000. |
runs |
Number of times to repeat the MCMC analysis. Default 5. |
ploidy |
Integer ≥ 1 indicating ploidy, or NULL (the default), in which case ploidy is determined automatically from the input data (only works if 'format' = "VCF" or "vcfR"). |
missing |
Integer used to code missing alleles, or NULL (the default), in which case missing data is identified automatically from the input file (only works if 'format' = "VCF" or "vcfR"). |
onerowperind |
Logical indicating if the input data file codes individuals on a single or multiple rows. Default is NULL (only works if 'format' = "VCF" or "vcfR"), in which case a temporary structure file is created and onerowperind is coerced to TRUE. |
save.in |
Character string with path to directory where output files should be saved. The directory will be created and should not already exist. Default is NULL, in which case output is saved to a new folder (name randomly generated) in the current directory. |
structure.path |
Character string with the path to the folder containing the structure executable called 'structure.py'. |
samplenames |
NULL. Not yet implemented. |
cleanup |
Whether or not the original output files should be deleted and replaced with a single, simple table holding all of the information usually spread across multiple files and tables. Default TRUE. |
include.out |
Character vector indicating which type of files should be included as output in addition to the usual structure output. Default is c(".pdf","popfiles"). ".pdf" generates EvannoPlots and admixture barplots, and "popfiles" generates an easySFS-format popfile with individual assignments to populations for each K. |
debug |
Logical indicating whether or not to print messages indicating the internal step of the function. Default FALSE. Typically only used for development. |
... |
Additional arguments passed to STRUCTURE. Not yet implemented; in the future these may include 'LABEL', 'POPDATA', 'POPFLAG', 'LOCDATA', 'PHENOTYPE', 'EXTRACOLS', 'MARKERNULLMES', 'RECESSIVEALLELES', 'MAPDISTANCES', 'PHASED', 'PHASEINFO', 'MARKOVPHASE', and 'NOTAMBIGUOUS'. |
setupOnly |
Logical indicating whether the structure environment should be set up but not run. Default FALSE. |
overwrite |
Whether or not to overwrite previous results. Default FALSE. |
save.as |
Where to save the output PDF. Default is NULL. |
tolerance |
Tolerance for convergence, i.e., the change in marginal likelihood required to continue. |
prior |
Type of prior to use. Default "simple" |
full |
Whether or not to generate output files holding variation of Q, P, and marginal likelihood, in addition to the files holding means. Default FALSE. |
seed |
Value to use as a seed for reproducing results. Default NULL. |
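Several of the argument descriptions above state constraints that can be checked before a run is attempted: 'coords' holds longitude in the first column and latitude in the second, 'save.in' should not already exist, and a NULL 'burnin' requires a mainparams file that defines BURNIN. The sketch below gathers these checks in a hypothetical helper, check_structure_args(), which is not part of the package. It assumes the mainparams file uses STRUCTURE's usual "#define NAME value" lines, and it does not account for how 'overwrite' may interact with an existing 'save.in' directory.

```r
# Hypothetical helper (not part of the package): returns a character vector
# describing any problems found, or an empty vector if none were found.
check_structure_args <- function(coords = NULL, save.in = NULL,
                                 burnin = 1000, mainparams.path = NULL) {
  problems <- character(0)

  # coords may be a file path (single string) or a matrix-like object.
  is_path <- is.character(coords) && length(coords) == 1
  if (!is.null(coords) && !is_path) {
    coords <- as.matrix(coords)
    if (ncol(coords) < 2 || !is.numeric(coords)) {
      problems <- c(problems, "coords needs numeric longitude and latitude columns")
    } else if (any(abs(coords[, 1]) > 180, na.rm = TRUE) ||
               any(abs(coords[, 2]) > 90, na.rm = TRUE)) {
      # Longitude is expected in column 1 and latitude in column 2.
      problems <- c(problems, "coords columns may be swapped or out of range")
    }
  }

  # save.in is created by the function, so it should not exist yet.
  if (!is.null(save.in) && dir.exists(save.in)) {
    problems <- c(problems, "save.in already exists")
  }

  # burnin = NULL requires a mainparams file that defines BURNIN.
  if (is.null(burnin)) {
    if (is.null(mainparams.path) || !file.exists(mainparams.path)) {
      problems <- c(problems, "burnin is NULL but no mainparams file was found")
    } else {
      lines <- readLines(mainparams.path, warn = FALSE)
      pattern <- "^[[:space:]]*#define[[:space:]]+BURNIN[[:space:]]+[0-9]+"
      if (!any(grepl(pattern, lines))) {
        problems <- c(problems, "mainparams file does not define BURNIN")
      }
    }
  }

  problems
}

# Example: a two-sample coordinate matrix (made-up values) and a fresh
# output path.
coords <- cbind(longitude = c(-97.5, -96.8), latitude = c(35.2, 36.1))
check_structure_args(coords = coords, save.in = tempfile("structure_out_"))
```

An empty result means none of the checked constraints were violated; it does not guarantee that STRUCTURE itself will accept the inputs.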
List of plots