run_fastStructure: Run fastStructure from SNP data in a VCF file and...

View source: R/run_fastStructure.R

run_fastStructure    R Documentation

Description

Run fastStructure on SNP data from a VCF file and plot the results.

This function is a wrapper that enables running fastStructure with SNP data in either a VCF file or a vcfR object, and with coordinates supplied either as a file or as a matrix or data frame object.

Notes for running fastStructure: Requires Python 2. The fastStructure 'cv' (cross-validation) feature is not yet implemented here.

Usage

run_fastStructure(
  x,
  format = "VCF",
  coords = NULL,
  samplenames = NULL,
  kmax = 10,
  save.in = NULL,
  reps = 30,
  tolerance = 1e-05,
  prior = "simple",
  full = FALSE,
  seed = NULL,
  python.path = NULL,
  fastStructure.path = NULL,
  cleanup = TRUE,
  include.out = c(".pdf", ".Qlog", ".margLlog", ".extraLog", ".Plog"),
  debug = FALSE,
  overwrite = FALSE
)

Arguments

x

'vcfR' object (see package::vcfR) or a character string with path to a SNPs dataset formatted according to the 'format' argument. Currently VCF or 'fastStructure' (a type of STRUCTURE format) can be used.

format

Character string indicating the format of the data. Currently only "VCF" or "fastStructure" are allowed; other types may be added. Ignored if x is a vcfR object.

coords

Either a character string with the path to a file containing coordinates (longitude in the first column, latitude in the second column), or a matrix or data frame object with longitude and latitude columns.
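As a minimal sketch of the in-memory form, a coordinates matrix could be built as shown here. The sample IDs and coordinate values are made up for illustration, and whether the function makes any use of row names is an assumption that this page does not confirm.

```r
# Hypothetical sample IDs and coordinates, for illustration only.
coords <- cbind(
  longitude = c(-95.25, -94.58, -97.33),  # first column: longitude
  latitude  = c(38.96, 39.10, 37.69)      # second column: latitude
)
rownames(coords) <- c("sampleA", "sampleB", "sampleC")

# Basic sanity checks before passing 'coords' to run_fastStructure().
stopifnot(
  is.matrix(coords),
  ncol(coords) == 2,
  all(abs(coords[, "longitude"]) <= 180),
  all(abs(coords[, "latitude"]) <= 90)
)
```

Putting longitude first matters because the column order, not the column names, is what the argument description specifies.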

samplenames

NULL or a character string vector with names of samples in the input data, and coords file if supplied. If NULL (the default), sample names are extracted from the SNPs datafile.
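To illustrate where sample names live in a VCF file, the sketch below reads them from the `#CHROM` header line using base R only. It writes a tiny made-up VCF to a temporary file; the sample IDs are hypothetical, and the wrapper's own extraction method is not shown on this page and may differ.

```r
# A tiny, made-up VCF written to a temporary file for illustration.
vcf_lines <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB\tsampleC",
  "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1"
)
vcf_file <- tempfile(fileext = ".vcf")
writeLines(vcf_lines, vcf_file)

# The column header line starts with "#CHROM"; the first nine fields are
# fixed VCF columns (CHROM ... FORMAT) and the remaining fields are samples.
header <- grep("^#CHROM", readLines(vcf_file), value = TRUE)
fields <- strsplit(header, "\t", fixed = TRUE)[[1]]
samplenames <- fields[-(1:9)]
print(samplenames)
```

A vector obtained this way can be compared against the row order of a coordinates object before running the analysis.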

kmax

Numerical vector with the set of values to use for K. Default 10.

save.in

Character string with path to directory where output files should be saved.

reps

Number of repetitions. Default 30.

tolerance

Tolerance for convergence, i.e., the change in marginal likelihood required to continue. Default 1e-05.
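The idea of a convergence tolerance can be sketched with a toy sequence of marginal likelihood values. This is only an illustration of the general stopping rule; the helper `converged_at()` and the numbers are hypothetical, and fastStructure's internal criterion may be defined differently (for example, on a relative rather than absolute change).

```r
# Hypothetical helper: index of the first iteration whose absolute change
# in marginal likelihood falls below 'tolerance' (NA if that never happens).
converged_at <- function(loglik, tolerance = 1e-05) {
  delta <- abs(diff(loglik))        # change between successive iterations
  hit <- which(delta < tolerance)   # iterations where the change is small
  if (length(hit) == 0) NA_integer_ else hit[1] + 1L
}

# Made-up marginal likelihood trace.
loglik <- c(-1.20, -1.05, -1.01, -1.000004, -1.000001)
converged_at(loglik, tolerance = 1e-05)
```

A smaller tolerance keeps the optimization running longer, since a smaller change is required before the run stops.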

prior

Type of prior to use. Default "simple".

full

Whether or not to generate output files holding variation of Q, P, and marginal likelihood, in addition to the files holding means. Default FALSE.

seed

Value to use as a seed for reproducing results. Default NULL.

python.path

Character string with the path to a Python 2 installation that has the fastStructure dependencies: Numpy, Scipy, Cython, and the GNU Scientific Library.

fastStructure.path

Character string with the path to the folder containing the fastStructure Python executable, 'structure.py'.
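For orientation, the sketch below assembles the kind of command line that fastStructure's `structure.py` accepts for one value of K. The helper `build_fs_args()` and all paths are hypothetical, and how the wrapper actually builds and issues its calls is not documented on this page; the flags follow the usage text in the fastStructure README, and `--format=str` is assumed here because the wrapper's "fastStructure" input is described as a type of STRUCTURE format.

```r
# Hypothetical helper: build the argument vector for one fastStructure run.
build_fs_args <- function(structure_py, K, input, output,
                          tol = 1e-05, prior = "simple",
                          full = FALSE, seed = NULL) {
  args <- c(
    structure_py,
    "-K", as.character(K),
    paste0("--input=", input),     # input file prefix
    paste0("--output=", output),   # output file prefix
    paste0("--tol=", format(tol, scientific = TRUE)),
    paste0("--prior=", prior),
    "--format=str"                 # assumption: STRUCTURE-style input
  )
  if (full) args <- c(args, "--full")
  if (!is.null(seed)) args <- c(args, paste0("--seed=", seed))
  args
}

# Placeholder paths; nothing is executed here, the command is only printed.
fs_args <- build_fs_args(
  structure_py = file.path("path/to/fastStructure", "structure.py"),
  K = 3, input = "path/to/genotypes", output = "path/to/out", seed = 42
)
cat("python2", fs_args, "\n")
# To run it, something like system2("path/to/python2", fs_args) could be used.
```

Building the arguments as a character vector rather than one pasted string avoids quoting problems when paths contain spaces.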

cleanup

Whether or not the original fastStructure output files (the *.log, *.meanQ, and *.meanP files for each replicate of each K) should be deleted after the data from those files are compiled and saved in three tables. Default TRUE.

include.out

Character vector indicating which types of files should be included as output. The Usage signature gives the default as c(".pdf", ".Qlog", ".margLlog", ".extraLog", ".Plog"). Note that the ".Plog" file can be very large.
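One way to picture this argument is as an extension filter over the files a run produces. The sketch below uses made-up file names and is not the wrapper's own code; it only shows how leaving ".Plog" out of the vector would exclude that file.

```r
# Hypothetical output file names from a run.
candidate_files <- c("fs_run.pdf", "fs_run.Qlog", "fs_run.margLlog",
                     "fs_run.extraLog", "fs_run.Plog")

# Keep everything except the potentially very large ".Plog" table.
include.out <- c(".pdf", ".Qlog", ".margLlog", ".extraLog")

# tools::file_ext() returns the extension without the leading dot.
ext <- paste0(".", tools::file_ext(candidate_files))
kept <- candidate_files[ext %in% include.out]
print(kept)
```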

debug

Logical indicating whether or not to print messages indicating the internal step of the function. Default FALSE.

overwrite

Logical indicating whether or not to allow new output files to overwrite existing ones. Default FALSE.

save.as

Where to save the output PDF. Default is NULL. Note that save.as does not appear in the Usage signature above.

Value

List of plots
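
Examples

A hedged sketch of a typical call follows. Every path is a placeholder to be replaced, the chosen values for kmax, reps, and seed are arbitrary, and the call is guarded so that nothing runs unless the misc.wrappers package, the input file, and the external tools are actually present.

```r
# Hypothetical paths: replace every one of these with real locations.
fs_call <- list(
  x                  = "path/to/snps.vcf",
  format             = "VCF",
  kmax               = 5,
  reps               = 10,
  save.in            = file.path(tempdir(), "fastStructure_out"),
  python.path        = "path/to/python2",
  fastStructure.path = "path/to/fastStructure",
  seed               = 1
)

# Only attempt the run when the package and all inputs are available.
can_run <- requireNamespace("misc.wrappers", quietly = TRUE) &&
  file.exists(fs_call$x) &&
  file.exists(fs_call$python.path) &&
  dir.exists(fs_call$fastStructure.path)

if (can_run) {
  plots <- do.call(misc.wrappers::run_fastStructure, fs_call)
  str(plots, max.level = 1)  # the documented return value is a list of plots
} else {
  message("Skipping run: package, input file, or external tools not found.")
}
```

Arguments left out of the list keep the defaults shown under Usage.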


JeffWeinell/misc.wrappers documentation built on Sept. 20, 2023, 12:42 p.m.