View source: R/run_structure.R
vcf2structure | R Documentation |
Convert VCF file/object to STRUCTURE file (for SNP data)
Converts a VCF file or vcfR object containing SNPs to STRUCTURE format and writes the result to a file. Note: Does not yet handle 'recessive alleles' or phase information (see STRUCTURE manual). This information is optional for STRUCTURE but may be of interest to some users.
vcf2structure(
vcf,
IndvNames = TRUE,
OneRowPerIndv = TRUE,
MarkerNames = TRUE,
MissingData = c(-9),
out = NULL,
InterMarkerDists = FALSE,
PopData = NULL,
PopFlag = NULL,
LocData = NULL,
Phenotype = NULL,
OtherData = NULL
)
vcf | Character string with the path to the input VCF, or an object of class vcfR. |
IndvNames | Logical indicating whether the first column of the STRUCTURE file should contain the names of the individuals, or a character string with names to use. Default is TRUE, in which case the names are those used in the VCF file. |
OneRowPerIndv | Logical specifying whether each individual's genotypes should be written on a single row or on multiple rows. Default is TRUE; STRUCTURE works equivalently with either format. |
MarkerNames | Either a logical (TRUE or FALSE), or a character string with marker names. If TRUE (the default), the marker names are constructed from the CHROM and POS columns of the VCF file as 'CHROM_POS' (e.g. "102_4"). |
MissingData | Number used to code missing data. Default is -9. |
out | Path where the output STRUCTURE file should be written (default NULL). The mainparams file is also written, using the same name but with the extension '.params'. |
InterMarkerDists | Either a logical indicating whether inter-marker distances should be calculated from the CHROM and POS columns of the VCF and included in the output, or a vector of integers with the inter-marker distances to use. Default is FALSE. A warning is generated if multiple sites per locus are present in the input VCF and 'InterMarkerDists' is FALSE. See the STRUCTURE manual for details on supplying inter-marker distances. |
PopData | Either NULL or an integer vector with user-defined population assignments of individuals. The default is NULL (population data not included in the output file). If non-NULL, PopData is written in the second column of the output file. |
PopFlag | Either NULL or a logical vector indicating, for each individual, whether STRUCTURE should use that individual's PopData information. Default is NULL, in which case the PopFlag column is not written to the output file. If PopFlag is supplied, PopData must also be non-NULL. |
LocData | NULL (the default) or a vector of integers specifying a user-defined sampling locality for each individual. |
Phenotype | NULL (the default) or a vector of integers specifying the value of a phenotype of interest for each individual. |
OtherData | NULL (the default), or a matrix or data frame with one row per individual and columns holding any other information of interest associated with the individuals. |
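To make the effect of 'MarkerNames', 'MissingData' and 'OneRowPerIndv' concrete, the following base R sketch builds both row layouts from a toy genotype matrix. It is an illustration only, not the code used inside vcf2structure: the genotypes, the individual names and the helper allele() are invented for this example, and the encoding of alleles as 0/1 is an assumption.

```r
# Toy input (hypothetical): 3 diploid individuals x 2 SNPs, coded like VCF GT fields
chrom <- c("102", "102")
pos   <- c(4, 57)
gt <- matrix(c("0/1", "1/1",
               "0/0", "./.",
               "1/1", "0/1"),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("ind1", "ind2", "ind3"), NULL))
missing.code <- -9L  # same value as the documented MissingData default

# Marker names in the documented 'CHROM_POS' form, e.g. "102_4"
marker.names <- paste(chrom, pos, sep = "_")

# Hypothetical helper: extract allele i (1 or 2) of every genotype as an
# integer matrix, replacing missing calls (".") with the missing-data code
allele <- function(i) {
  a <- vapply(strsplit(as.vector(gt), "[/|]"), function(x) x[i], character(1))
  a[a == "."] <- NA
  v <- as.integer(a)
  v[is.na(v)] <- missing.code
  matrix(v, nrow = nrow(gt), dimnames = list(rownames(gt), marker.names))
}
a1 <- allele(1)
a2 <- allele(2)

# One row per individual: the two alleles of each marker sit in adjacent columns
one.row <- do.call(cbind, lapply(seq_along(marker.names),
                                 function(j) cbind(a1[, j], a2[, j])))
rownames(one.row) <- rownames(gt)

# Two rows per individual: one column per marker, one row per allele copy
two.rows <- rbind(a1, a2)[order(rep(seq_len(nrow(gt)), 2)), ]

print(marker.names)
print(one.row)
print(two.rows)
```

Both layouts hold the same genotypes, which is why STRUCTURE can work with either one; the single-row form has twice as many data columns and half as many rows.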
A list with [1] Value of 'out', which contains the path to the output file, and [2] values that should be used for some of the parameters in the mainparams file.
library(misc.wrappers)
## Example 1:
# Path to VCF with SNPs
vcf.path <- file.path(system.file("extdata", package = "misc.wrappers"),"simK4.vcf.gz")
# Run structure 30 times each for K=1-10
run_structure(x=vcf.path,kmax=10,reps=30,save.as="fs_simK4.pdf",include.out=c(".pdf"))
## Example 2: Same SNP dataset as example 1, but here we also provide Lon/Lat coordinates of individuals to geographically interpolate admixture coefficients.
vcf.path <- file.path(system.file("extdata", package = "misc.wrappers"), "simK4.vcf.gz")
coords.path <- file.path(system.file("extdata", package = "misc.wrappers"), "simK4_coords.txt")
run_structure(x=vcf.path, coords=coords.path, kmax=10, reps=30, save.as="fs_simK4_withCoords.pdf", include.out=c(".pdf"))
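The examples above call run_structure(). A minimal sketch of calling vcf2structure() directly is given below, using only the arguments documented on this page and the same bundled VCF. The output location (a file in tempdir()) and the file name "simK4.str" are arbitrary choices for this sketch, and the requireNamespace() guard is there only so the snippet does nothing when misc.wrappers is not installed.

```r
# Arbitrary output path for this sketch; any writable path should do
out.path <- file.path(tempdir(), "simK4.str")

if (requireNamespace("misc.wrappers", quietly = TRUE)) {
  library(misc.wrappers)
  # Same example VCF as in the run_structure() examples
  vcf.path <- file.path(system.file("extdata", package = "misc.wrappers"), "simK4.vcf.gz")
  # Write the STRUCTURE file with one row per individual and -9 for missing data
  res <- vcf2structure(vcf = vcf.path,
                       out = out.path,
                       OneRowPerIndv = TRUE,
                       MissingData = -9)
  # Per the Value section, the first element is the path given in 'out'
  print(res[[1]])
}
```

According to the description of 'out', a mainparams file with the extension '.params' should be written alongside the STRUCTURE file.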