vcf2structure: Convert VCF file/object to STRUCTURE file (for SNP data)...

View source: R/run_structure.R

vcf2structure    R Documentation

Description

Convert VCF file/object to STRUCTURE file (for SNP data)

Converts a VCF file or vcfR object containing SNPs to STRUCTURE format and writes the result to a file. Note: 'recessive alleles' and phase information are not yet handled (see the STRUCTURE manual). Both are optional for STRUCTURE but may be of interest to some users.

Usage

vcf2structure(
  vcf,
  IndvNames = TRUE,
  OneRowPerIndv = TRUE,
  MarkerNames = TRUE,
  MissingData = c(-9),
  out = NULL,
  InterMarkerDists = FALSE,
  PopData = NULL,
  PopFlag = NULL,
  LocData = NULL,
  Phenotype = NULL,
  OtherData = NULL
)

Arguments

vcf

Character string with path to input VCF, or an object of class vcfR.

IndvNames

Logical indicating whether the first column of the STRUCTURE file should contain the names of the individuals, or a character string with the names to use. Default is TRUE, in which case the names are taken from the VCF file.

OneRowPerIndv

Logical specifying whether each individual's genotypes should be written to a single row (TRUE) or to multiple rows (FALSE). Default is TRUE; STRUCTURE works equivalently with either format.

MarkerNames

Either a logical (TRUE or FALSE), or a character string with marker names. If TRUE (the default), the marker names are constructed from the CHROM and POS columns of the VCF file: 'CHROM_POS' (e.g. "102_4").

MissingData

Number used to code missing data. Default is -9.

out

Path where the output STRUCTURE file should be written (default NULL). The mainparams file is also written, using the same name but with the extension '.params'.

InterMarkerDists

Either a logical indicating whether inter-marker distances should be computed from the CHROM and POS columns of the VCF and included in the output, or a vector of integers with the inter-marker distances to use. Default is FALSE. A warning is generated if the input VCF contains multiple sites per locus and 'InterMarkerDists' is FALSE. See the STRUCTURE manual for details on supplying inter-marker distances.

PopData

Either NULL or an integer vector indicating user-defined population assignments of individuals. The default is NULL (population data not included in output file). If non-NULL, PopData is written in the second column of the output file.

PopFlag

Either NULL or a logical vector indicating, for each individual, whether STRUCTURE should use that individual's PopData information. Default is NULL, in which case the PopFlag column is not written to the output file. If PopFlag is supplied, PopData must also be non-NULL.

LocData

NULL (the default) or a vector of integers specifying user-defined sampling locality for each individual.

Phenotype

NULL (the default) or a vector of integers specifying the value of a phenotype of interest for each individual.

OtherData

NULL (the default), or a matrix or data frame with one row per individual and with columns holding any other information of interest associated with the individuals.
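
The following base-R sketch illustrates, on made-up data, three of the conventions described in the arguments above: marker names in 'CHROM_POS' form, inter-marker distances derived from CHROM and POS, and the one-row versus two-row layout with -9 as the missing-data code. It is not the package's internal code. The toy values, the 0/1 allele codes, and the use of -1 for the first marker on each chromosome (a convention taken from the STRUCTURE manual) are assumptions made for illustration.

```r
# Toy stand-ins for the CHROM and POS columns of a VCF (hypothetical values).
chrom <- c("102", "102", "102", "205", "205")
pos   <- c(4L, 57L, 310L, 12L, 98L)

# Marker names in the documented 'CHROM_POS' form, e.g. "102_4".
marker_names <- paste(chrom, pos, sep = "_")

# Distance to the previous marker on the same chromosome.
# Assumption: -1 flags the first marker of each chromosome (unlinked to the
# previous marker), as in the STRUCTURE manual.
new_chrom <- c(TRUE, chrom[-1] != chrom[-length(chrom)])
dists <- c(-1L, diff(pos))
dists[new_chrom] <- -1L

# Two diploid individuals, three markers; a1/a2 hold the two allele copies.
# The 0/1 allele codes are illustrative only; NA marks a missing call.
a1 <- matrix(c(0L, 1L, NA, 1L, 1L, 0L), nrow = 2, byrow = TRUE,
             dimnames = list(c("ind1", "ind2"), NULL))
a2 <- matrix(c(1L, 1L, NA, 0L, 1L, 0L), nrow = 2, byrow = TRUE,
             dimnames = list(c("ind1", "ind2"), NULL))
a1[is.na(a1)] <- -9L  # missing-data code, matching the MissingData default
a2[is.na(a2)] <- -9L

# Two rows per individual: consecutive rows hold the two allele copies.
two_rows <- rbind(a1, a2)[order(rep(seq_len(nrow(a1)), 2)), ]

# One row per individual: the two alleles of each marker sit in adjacent columns.
one_row <- t(sapply(seq_len(nrow(a1)),
                    function(i) as.vector(rbind(a1[i, ], a2[i, ]))))
rownames(one_row) <- rownames(a1)

print(marker_names)
print(dists)
print(two_rows)
print(one_row)
```

Both layouts carry the same genotypes; only the arrangement of the two allele copies differs, which is why STRUCTURE can read either one.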

Value

A list containing [1] the value of 'out', which is the path to the output file, and [2] values that should be used for some of the parameters in the mainparams file.

Examples

library(misc.wrappers)
## Example 1:
# Path to VCF with SNPs
vcf.path    <- file.path(system.file("extdata", package = "misc.wrappers"),"simK4.vcf.gz")
# Run STRUCTURE 30 times for each value of K from 1 to 10
run_structure(x=vcf.path,kmax=10,reps=30,save.as="fs_simK4.pdf",include.out=c(".pdf"))

## Example 2:
# Same SNP dataset as Example 1, but here we also provide Lon/Lat coordinates
# of individuals to geographically interpolate admixture coefficients.
vcf.path    <- file.path(system.file("extdata", package = "misc.wrappers"), "simK4.vcf.gz")
coords.path <- file.path(system.file("extdata", package = "misc.wrappers"), "simK4_coords.txt")
run_structure(x=vcf.path, coords=coords.path, kmax=10, reps=30, save.as="fs_simK4_withCoords.pdf", include.out=c(".pdf"))
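
A minimal sketch of calling vcf2structure directly, using only the arguments listed under Usage and the bundled simK4.vcf.gz file from the examples above. It assumes that vcf2structure is exported by misc.wrappers; the output file name is hypothetical, and the call is skipped when the package is not installed.

```r
# Sketch only: the body runs only if misc.wrappers is installed.
has_pkg <- requireNamespace("misc.wrappers", quietly = TRUE)

if (has_pkg) {
  library(misc.wrappers)
  # Bundled example VCF, as in the examples above
  vcf.path <- file.path(system.file("extdata", package = "misc.wrappers"), "simK4.vcf.gz")
  # Hypothetical output path; per the 'out' argument, a file with the
  # same name but the extension '.params' is written alongside it
  out.path <- file.path(tempdir(), "simK4.str")
  # Assumption: vcf2structure is exported; all other arguments keep their defaults
  res <- vcf2structure(vcf = vcf.path, out = out.path)
  # Per the Value section: a list with the output path and mainparams values
  str(res)
}
```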

JeffWeinell/misc.wrappers documentation built on Sept. 20, 2023, 12:42 p.m.