Format conversion for codominant marker data

Description

Codominant marker data (here: data on several diploid loci, with two alleles per locus) can be represented in various ways. This function converts the formats "genepop" and "structure" into "structurama" and "prabclus". "genepop" is a version of the format used by the package GENEPOP (Rousset, 2010), "structure" is a version of the format used by STRUCTURE (Pritchard et al., 2000), "structurama" is a version of the format used by STRUCTURAMA (Huelsenbeck and Andolfatto, 2007), and "prabclus" is the format required by the function alleleinit in the present package.

Usage

  alleleconvert(file=NULL,strmatrix=NULL, format.in="genepop",
                          format.out="prabclus",
                          alength=3,orig.nachar="000",new.nachar="-",
                          rows.are.individuals=TRUE, firstcolname=FALSE,
                          aletters=intToUtf8(c(65:90,97:122),multiple=TRUE),
                          outfile=NULL)

Arguments

file

string. Filename of input file, see details. One of file and strmatrix needs to be specified.

strmatrix

matrix or data frame of strings, see details. One of file and strmatrix needs to be specified.

format.in

string. One of "genepop" or "structure", see details.

format.out

string. One of "structurama" or "prabclus", see details.

alength

integer. If format.in="genepop", length of code for a single allele.

orig.nachar

string. Code for missing values in input data.

new.nachar

string. Code for missing values in output data.

rows.are.individuals

logical. If TRUE, rows are interpreted as individuals and columns (variables if strmatrix is a data frame) as loci.

firstcolname

logical. If TRUE, it is assumed that the first column contains row names.

aletters

character vector. Characters used to code alleles if format.out=="prabclus" (the default provides 52 letters and is fine unless there is a locus that can have more than 52 different alleles in the dataset).

outfile

string. If specified, the output matrix (omitting quotes) is written to a file of this name (including row names if firstcolname==TRUE).

Details

The formats are described as they appear within R, i.e., for the input, as the format of strmatrix. If file is specified, the file is read with read.table(file,colClasses="character") and should yield the format explained below; note that colClasses="character" implies that quotes are not needed in the input file. The formats are as follows:

genepop

Alleles are coded by strings of length alength and there is no space between the two alleles in a locus, so a value of "258260" means that in the corresponding locus the two alleles have codes 258 and 260.

structure

Alleles are coded by strings of arbitrary length. Two rows correspond to each individual, the first row containing the first alleles in all loci and the second row containing the second ones.

structurama

Alleles are coded by strings of arbitrary length. The two alleles in each locus are written with brackets around them and a comma in between, so "258260" in "genepop" corresponds to "(258,260)" in "structurama".

prabclus

Alleles are coded by a single character and there is no space between the two alleles in a locus (e.g., "AC").
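
The four representations can be illustrated in base R with a small hand-made data set. This is only a sketch of the formats and not the code used by alleleconvert: the input values are hypothetical, missing values are left out, and the rule used here to assign letters for "prabclus" (sorted distinct codes within each locus) is an assumption that may differ from what the function actually does.

```r
# Hypothetical input: two individuals (rows), two loci (columns), "genepop" style.
# read.table is given text= here instead of a file name.
genepop <- as.matrix(read.table(
  text = "258260 001002
258258 002002",
  colClasses = "character"   # keeps leading zeros; no quotes needed
))
alength <- 3

# Split every genotype string into its first and second allele code.
first  <- matrix(substr(genepop, 1, alength), nrow = nrow(genepop))
second <- matrix(substr(genepop, alength + 1, 2 * alength), nrow = nrow(genepop))

# "structurama": both alleles in brackets, separated by a comma.
structurama <- matrix(paste0("(", first, ",", second, ")"), nrow = nrow(genepop))

# "structure": two rows per individual, first alleles then second alleles.
structure_fmt <- matrix(NA_character_, nrow = 2 * nrow(genepop), ncol = ncol(genepop))
structure_fmt[seq(1, nrow(structure_fmt), by = 2), ] <- first
structure_fmt[seq(2, nrow(structure_fmt), by = 2), ] <- second

# "prabclus": one character per allele.
# Assumption: letters go to the sorted distinct codes of each locus.
aletters <- intToUtf8(c(65:90, 97:122), multiple = TRUE)
prabclus_fmt <- sapply(seq_len(ncol(genepop)), function(j) {
  alevels <- sort(unique(c(first[, j], second[, j])))
  paste0(aletters[match(first[, j], alevels)],
         aletters[match(second[, j], alevels)])
})

print(structurama)
print(structure_fmt)
print(prabclus_fmt)
```

Reading the text with colClasses="character" is what keeps a genotype such as "001002" from being turned into the number 1002.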

Value

A matrix of strings in the format specified by format.out, with an attribute "alevels". If format.out=="prabclus", "alevels" is a vector of all allele codes used; otherwise it is the vector of allele codes of the last locus.
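
Attributes of a matrix are read with attr() in R. The short sketch below uses a hand-built stand-in with hypothetical values rather than real output of alleleconvert, only to show how the "alevels" attribute would be accessed.

```r
# Hand-built stand-in for a "prabclus"-format result (not produced by alleleconvert).
result <- matrix(c("AB", "AA", "AB", "BB"), nrow = 2)
attr(result, "alevels") <- c("001", "002")  # hypothetical allele codes

# The attribute is read back with attr(); the matrix dimensions are unaffected.
print(attr(result, "alevels"))
print(dim(result))
```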

Author(s)

Christian Hennig chrish@stats.ucl.ac.uk http://www.homepages.ucl.ac.uk/~ucakche

References

Huelsenbeck, J. P., and P. Andolfatto (2007) Inference of population structure under a Dirichlet process model. Genetics 175, 1787-1802.

Pritchard, J. K., M. Stephens, and P. Donnelly (2000) Inference of population structure using multi-locus genotype data. Genetics 155, 945-959.

Rousset, F. (2010) Genepop 4.0 for Windows and Linux. http://kimura.univ-montp2.fr/~rousset/Genepop.pdf

See Also

alleleinit

Examples

  library(prabclus)

  # This uses example data file Heterotrigona_indoFO.dat
  data(tetragonula)
  str(alleleconvert(strmatrix=tetragonula))

  # "structure" input: two rows per individual; the first column holds row names
  strucmatrix <-
    cbind(c("I1","I1","I2","I2","I3","I3"),
    c("122","144","122","122","144","144"),c("0","0","21","33","35","44"))

  # Convert to "prabclus"; "0" codes missing values in the input
  alleleconvert(strmatrix=strucmatrix,format.in="structure",
    format.out="prabclus",orig.nachar="0",firstcolname=TRUE)

  # Convert to "structurama", writing missing values as "-9"
  alleleconvert(strmatrix=strucmatrix,format.in="structure",
    format.out="structurama",orig.nachar="0",new.nachar="-9",firstcolname=TRUE)
