structure2bedassle: Convert a dataset from STRUCTURE to BEDASSLE format

View source: R/format.data.R

structure2bedassle R Documentation

Description

structure2bedassle converts a dataset in STRUCTURE format to BEDASSLE format.

Usage

structure2bedassle(
  infile,
  onerowperind,
  start.loci,
  start.samples = 1,
  missing.datum,
  prefix,
  save.freqs = TRUE
)

Arguments

infile

The name and path of the file in STRUCTURE format to be converted to the format used in a BEDASSLE analysis.

onerowperind

Indicates whether the file format has one row per individual (TRUE) or two rows per individual (FALSE).

start.loci

The index of the first column in the dataset that contains genotype data.

start.samples

The index of the first row in the dataset that contains genotype data (e.g., after any headers). Default value is 1.

missing.datum

The character or value used to denote missing data in the STRUCTURE dataset (often 0 or -9).

prefix

A character vector giving the prefix (including desired directory path) to be attached to output files.

save.freqs

A logical value indicating whether or not to save the allele frequency data matrix generated by this function as an R object.

Details

This function takes a population genetics dataset in STRUCTURE format and converts it to an allele frequency data table, then calculates pairwise pi between all samples. The matrix of pairwise pi can be used as the genDist argument in run.bedassle. The STRUCTURE file can have either one row per individual and two columns per locus, or two rows per individual and one column per locus. It can only contain bi-allelic SNPs. Missing data are acceptable, but must be indicated with a single value throughout the dataset.
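As a rough illustration of the second step, the sketch below computes a pairwise pi matrix from a toy allele frequency table. The helper pairwise_pi is hypothetical, and the formula it uses (the mean over loci of the probability that two alleles, one drawn from each sample, differ) is a common definition that is assumed here; it is not necessarily the calculation structure2bedassle performs internally.

```r
# Toy allele-frequency matrix (made-up values): rows are samples, columns are
# bi-allelic loci, entries are the frequency of one arbitrarily chosen allele.
# NA marks a missing genotype.
freqs <- matrix(
  c(0.0, 0.5, 1.0,
    1.0, 0.5, NA,
    0.5, 0.5, 1.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ind1", "ind2", "ind3"), c("loc1", "loc2", "loc3"))
)

# Hypothetical helper (an assumption, not the package's code): for each pair
# of samples, average over loci the probability that two alleles, one drawn
# from each sample, differ. Loci missing in either sample are skipped.
pairwise_pi <- function(freqs) {
  n <- nrow(freqs)
  pwp <- matrix(NA_real_, n, n,
                dimnames = list(rownames(freqs), rownames(freqs)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d <- freqs[i, ] * (1 - freqs[j, ]) + freqs[j, ] * (1 - freqs[i, ])
      pwp[i, j] <- mean(d, na.rm = TRUE)
    }
  }
  pwp
}

pwp <- pairwise_pi(freqs)
```

Under this definition the matrix is symmetric, and the diagonal is not necessarily zero, because a heterozygous sample carries two different alleles at a locus.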

This function can only be applied to diploid organisms. The STRUCTURE data file must be a plain text file. If there are extraneous lines of text or column headers before the data start, those lines should either be deleted by hand or skipped by setting the start.samples argument to the row on which the data begin.

The STRUCTURE dataset can be in either the ONEROWPERIND=1 file format, with one row per individual and two columns per locus, or the ONEROWPERIND=0 format, with two rows per individual and one column per locus. The first column of the STRUCTURE dataset should contain individual names. Any number of columns containing non-genotype information may precede the first column of genotype data, but there can be no extraneous columns at the end of the dataset, after the genotype data.
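To make the layout concrete, the sketch below writes a small made-up file in the ONEROWPERIND=1 format and reads it back with base R. The file contents, the header line, and the population column are invented for illustration. Only the names start.samples and start.loci come from this page, and they are used here as plain variables to show what the corresponding arguments index.

```r
# Write a toy STRUCTURE file in ONEROWPERIND=1 layout (all values made up):
# column 1 = individual name, column 2 = population label (non-genotype),
# columns 3-8 = three loci with two columns (alleles) per locus; -9 = missing.
infile <- tempfile(fileext = ".str")
writeLines(c(
  "ind pop loc1 loc1 loc2 loc2 loc3 loc3",
  "ind1 1 1 1 1 2 2 2",
  "ind2 1 1 2 -9 -9 2 2",
  "ind3 2 2 2 1 1 1 2"
), infile)

start.samples <- 2  # row 1 is a header line, so the data start on row 2
start.loci <- 3     # columns 1-2 hold non-genotype information

# Skip the rows before start.samples, then keep the genotype columns only.
dat <- read.table(infile, header = FALSE, skip = start.samples - 1,
                  stringsAsFactors = FALSE)
genos <- as.matrix(dat[, start.loci:ncol(dat)])

n_ind <- nrow(genos)       # one row per individual
n_loci <- ncol(genos) / 2  # two columns per locus
```

In the ONEROWPERIND=0 layout the same data would instead occupy two rows per individual and one column per locus.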

The genotype data must be bi-allelic single nucleotide polymorphisms (SNPs). Applying this function to datasets with more than two alleles per locus may result in cryptic failure.
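Because such failures may be hard to spot, it can help to count alleles per locus before converting. The sketch below is one way to do this for a genotype matrix in the two-columns-per-locus layout. The toy data and the variable names other than missing.datum are assumptions made for illustration.

```r
# Toy genotype matrix (made-up values), one row per individual and two
# columns per locus; -9 marks missing data.
genos <- matrix(
  c(1, 1,  1,  2, 2, 2,
    1, 2, -9, -9, 2, 3,
    2, 2,  1,  1, 1, 2),
  nrow = 3, byrow = TRUE
)
missing.datum <- -9

# Count the distinct non-missing alleles at each locus
# (locus k occupies columns 2k - 1 and 2k).
n_loci <- ncol(genos) / 2
n_alleles <- vapply(seq_len(n_loci), function(k) {
  alleles <- as.vector(genos[, c(2 * k - 1, 2 * k)])
  length(unique(alleles[alleles != missing.datum]))
}, integer(1))

# Loci with more than two alleles should be removed before conversion.
bad_loci <- which(n_alleles > 2)
```

Here the third locus carries three distinct alleles, so it is flagged in bad_loci.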

Value

This function returns a matrix of pairwise pi that can be used as the genDist argument in a BEDASSLE analysis (run.bedassle). It also saves this matrix as a text file ("yourprefix_pwp.txt") so that it can be used in future analyses. If save.freqs is TRUE, the allele frequency data matrix generated from the STRUCTURE data file is also saved as an R data (.RData) object.
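A call might look like the following sketch. The input file, its contents, and the output prefix are made up for illustration, and the call is skipped when the bedassle package is not installed. Argument names follow the Usage section above.

```r
# Toy input (made-up values): one row per individual, individual names in
# column 1, genotype data from column 2 onward, -9 for missing data.
infile <- tempfile(fileext = ".str")
writeLines(c(
  "ind1 1 1 1 2 2 2",
  "ind2 1 2 -9 -9 2 2",
  "ind3 2 2 1 1 1 2",
  "ind4 1 2 2 2 1 1"
), infile)

# Output prefix, including the directory in which output files are written.
prefix <- tempfile("toy")

# Only attempt the conversion if the package is available.
if (requireNamespace("bedassle", quietly = TRUE)) {
  pwp <- bedassle::structure2bedassle(
    infile = infile,
    onerowperind = TRUE,
    start.loci = 2,
    start.samples = 1,
    missing.datum = -9,
    prefix = prefix,
    save.freqs = FALSE
  )
}
```

The returned matrix, pwp in this sketch, is what would then be passed as the genDist argument to run.bedassle.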


gbradburd/bedassle documentation built on May 20, 2022, 1 p.m.