readGenalex: Read GenAlEx-format genotypes file


Description

Reads a genotype data file in GenAlEx format into an annotated data frame of class genalex. Internal consistency checks allowed by the GenAlEx format are performed as the data are read. GenAlEx and its documentation are available at http://biology-assets.anu.edu.au/GenAlEx.

Usage

readGenalex(file, sep = "\t", ploidy = 2, na.strings = c("0", "-1", ".",
  "NA", ""), ...)

Arguments

file

Delimited text file in GenAlEx format, typically exported as tab- or comma-delimited text from Excel

sep

Column separator used when the file was created (defaults to tab, "\t")

ploidy

The ploidy of genotypes encoded in file (defaults to 2)

na.strings

Strings encoding missing data. Default is to include the GenAlEx missing values ("0" and "-1") as well as ".", "NA" and "" (empty).

...

Additional arguments passed to scan when reading data

Details

readGenalex expects a genotype data file in GenAlEx format. The format specifies three header lines describing the structure and content of the file, followed by lines containing the genotype data, optionally with extra columns holding additional information about each sample for other analyses. The GenAlEx layout for a collection of diploid samples is the following, with columns separated by sep:

N loci          Total N samples    N populations     N pop 1            N pop 2            ...
Dataset title   (empty)            (empty)           Name pop 1         Name pop 2         ...
Sample title    Pop title          Name locus 1      (empty)            Name locus 2       ...
ID sample 1     Pop of sample 1    Loc 1 allele 1    Loc 1 allele 2     Loc 2 allele 1     ...
ID sample 2     Pop of sample 2    Loc 1 allele 1    Loc 1 allele 2     Loc 2 allele 1     ...
...             ...                ...               ...                ...                ...

readGenalex first reads the three header lines, then reads the remainder of the file, checking it for consistency with the data description given in the header lines. It attempts to cleanly ignore extra delimiters that Excel may add when exporting a delimited file.
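The header-then-body check described above can be sketched in base R. This is a minimal illustration under stated assumptions, not readGenalex's own implementation: the helper names read_genalex_header() and check_genalex_body(), the small demonstration file, and the particular checks (total sample count, number of genotype columns, per-population counts) are hypothetical choices that mirror the format layout above. The sketch does not try to strip extra Excel delimiters.

```r
# Sketch only: a hypothetical diploid GenAlEx file with 2 loci, 3 samples
# and 2 populations, written to a temporary file for demonstration.
lines <- c("2\t3\t2\t2\t1",
           "Demo data\t\t\tPopA\tPopB",
           "Sample\tPop\tLoc1\t\tLoc2\t",
           "s1\tPopA\t101\t103\t200\t204",
           "s2\tPopA\t101\t101\t0\t0",
           "s3\tPopB\t105\t103\t202\t204")
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Hypothetical helper: parse the three header lines into the quantities
# they describe (counts on line 1, titles/pop names on 2, column names on 3).
read_genalex_header <- function(path, sep = "\t") {
  hdr <- strsplit(readLines(path, n = 3), sep, fixed = TRUE)
  counts <- as.integer(hdr[[1]])
  n.pops <- counts[3]
  list(n.loci        = counts[1],
       n.samples     = counts[2],
       n.pops        = n.pops,
       pop.sizes     = counts[3 + seq_len(n.pops)],
       dataset.title = hdr[[2]][1],
       pop.labels    = hdr[[2]][3 + seq_len(n.pops)],
       sample.title  = hdr[[3]][1],
       pop.title     = hdr[[3]][2])
}

# Hypothetical helper: read the data lines as character and check them
# against the header description; stops with an error on any mismatch.
check_genalex_body <- function(path, hdr, ploidy = 2, sep = "\t") {
  body <- read.table(path, sep = sep, skip = 3, colClasses = "character")
  if (sum(hdr$pop.sizes) != hdr$n.samples)
    stop("population sizes do not sum to the total sample count")
  if (nrow(body) != hdr$n.samples)
    stop("number of data lines does not match the header")
  if (ncol(body) < 2 + hdr$n.loci * ploidy)
    stop("too few genotype columns for n.loci and ploidy")
  observed <- as.vector(table(factor(body[[2]], levels = hdr$pop.labels)))
  if (!identical(observed, hdr$pop.sizes))
    stop("per-population sample counts do not match the header")
  invisible(body)
}

hdr  <- read_genalex_header(path)
body <- check_genalex_body(path, hdr)
```

Sample s2 has its second locus coded 0, which the default na.strings of readGenalex would treat as missing; this sketch leaves the genotype values as character.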

After reading, the first two columns of the data frame, containing the sample and population names, are stored as character, while the genotype columns are stored as numeric, because GenAlEx requires genotype information to be numeric. It is therefore an error for a genotype column to contain a non-numeric value that does not match na.strings.
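A minimal sketch of this conversion rule, assuming a hypothetical helper genotype_to_numeric() that handles a single genotype column read as character. Values matching na.strings become NA, and any other value that cannot be parsed as a number stops with an error. readGenalex itself may implement the check differently.

```r
# Sketch only: convert one character genotype column to numeric.
# Values listed in na.strings (GenAlEx "0" and "-1", plus ".", "NA", "")
# are treated as missing before conversion.
genotype_to_numeric <- function(x, na.strings = c("0", "-1", ".", "NA", "")) {
  x[x %in% na.strings] <- NA
  num <- suppressWarnings(as.numeric(x))
  # A value that was present but did not parse as a number is an error
  bad <- !is.na(x) & is.na(num)
  if (any(bad))
    stop("non-numeric genotype value(s): ",
         paste(unique(x[bad]), collapse = ", "))
  num
}

alleles <- genotype_to_numeric(c("101", "0", "-1", ".", "103"))
```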

Extra columns to the right of the genotype columns are allowed. If these columns are named, they are read along with the genotype columns and stored as a data frame in the extra.columns attribute, and writeGenalex writes their values in the columns immediately to the right of the genotype values. These data are given their natural types, as if read with read.table(..., stringsAsFactors = FALSE), so character values are not converted to factors. The row names of this data frame are set to the corresponding sample names.
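The natural-type conversion of extra columns can be approximated in base R by applying type.convert(..., as.is = TRUE) column by column and then setting the row names to the sample names. This is an illustration only; the column names habitat and dbh and the sample names are hypothetical.

```r
# Hypothetical extra columns as read from file, all still character
extra <- data.frame(habitat = c("dry", "wet", "dry"),
                    dbh     = c("12.5", "30.1", "NA"),
                    stringsAsFactors = FALSE)
sample.names <- c("s1", "s2", "s3")

# Give each column its natural type; as.is = TRUE keeps text as character
converted <- as.data.frame(lapply(extra, type.convert, as.is = TRUE),
                           stringsAsFactors = FALSE)
# Row names equal to the corresponding sample names
rownames(converted) <- sample.names
str(converted)
```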

More information on GenAlEx is available at http://biology-assets.anu.edu.au/GenAlEx. In particular, genotype information must be encoded numerically.

Value

An annotated data frame of class genalex containing sample data, with column names determined by line 3 of the input file. Special attributes of the data frame include:

data.file.name

The value of file

ploidy

Ploidy of input data

n.loci

Number of loci

n.samples

Total number of samples

n.pops

Number of populations

pop.labels

Names of populations

pop.sizes

Sizes of populations

dataset.title

Dataset title

sample.title

Sample title

pop.title

Population title

locus.names

Names of loci

locus.columns

Numeric column position of allele 1 of each locus in the data frame, with names matching the corresponding loci

extra.columns

data.frame containing any extra columns given in file to the right of the genotype columns. Row order is the same as for the genotype data. Data are given their natural types using type.convert(..., as.is = TRUE), so that characters are not converted to factors. Row names are assigned equal to the corresponding sample names. If no extra columns were found, this attribute does not exist.

genetic.data.format

"genalex", not present in package versions >= 1.0
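Because locus.columns records the position of allele 1 of each locus, the remaining alleles of that locus occupy the next ploidy - 1 columns. The sketch below builds a small mock data frame that imitates the documented attributes (it is not produced by readGenalex, and the .2 suffix on second-allele column names is an assumption) and uses a hypothetical helper, locus_alleles(), to extract the allele columns of one locus.

```r
# Mock object imitating the documented attributes (not from readGenalex)
gt <- data.frame(sample = c("s1", "s2"), pop = c("P1", "P1"),
                 LocA = c(101, 103), LocA.2 = c(105, 103),
                 LocB = c(200, NA),  LocB.2 = c(204, NA),
                 stringsAsFactors = FALSE)
attr(gt, "ploidy") <- 2
attr(gt, "locus.columns") <- c(LocA = 3, LocB = 5)

# Hypothetical helper: all allele columns for one named locus
locus_alleles <- function(x, locus) {
  start <- attr(x, "locus.columns")[[locus]]
  x[, start + seq_len(attr(x, "ploidy")) - 1, drop = FALSE]
}

a <- locus_alleles(gt, "LocB")
```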

Author(s)

Douglas G. Scofield

References

Peakall, R. and Smouse, P.E. (2012) GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research - an update. Bioinformatics 28, 2537-2539.

Peakall, R. and Smouse, P.E. (2006) GENALEX 6: genetic analysis in Excel. Population genetic software for teaching and research. Molecular Ecology Notes 6, 288-295.

See Also

read.table, type.convert

Examples

gt.file <- system.file("extdata/Qagr_pericarp_genotypes.txt",
                       package = "readGenalex")
gt <- readGenalex(gt.file)
head(gt)
names(attributes(gt))

douglasgscofield/readGenalex documentation built on May 15, 2019, 10:43 a.m.