readFreqs: Read in a file of allele frequencies

readFreqs    R Documentation

Read in a file of allele frequencies

Description

Reads a file of allele frequencies in one of two formats: the rectangular FSI Genetics format or the 'Curran' text format.

Usage

readFreqs(strPath, FSIGenFormat = TRUE, delim = ",")

Arguments

strPath

The path of the file from which to read the frequencies.

FSIGenFormat

Tells the function whether the file is in FSI Genetics format (TRUE, the default) or in 'Curran' format (FALSE). Both formats are described in Details.

delim

This argument is used when FSIGenFormat is TRUE, and is the regular expression used to delimit the columns of the table. It is set to a single comma by default, and consecutive delimiters are treated as separating empty fields. There probably should be an additional argument which specifies the missing or empty cell symbol, but I won't programme this unless somebody asks for it.

Details

This function reads frequencies in the rectangular allele frequency table format used by FSI Genetics and other journals. This file format assumes a comma-separated value (CSV) file, although the column delimiter can be specified. The first column should be labelled 'Allele' and contain the STR allele designations that are used in the data set. The remaining columns have the locus name as a header, and frequencies that are either blank, zero, or non-zero. Blanks or zeros specify that the allele is not observed (and not used) at the locus. The final row of the file should start with 'N' or 'n' in the first column and give the number of individuals typed (or the number of alleles recorded) in assessing the frequency of the alleles.
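As an illustration of the layout described above, the following sketch writes a small FSI Genetics style file and reads it back. The locus names, frequencies, and sample sizes are made up for the example, and the call to readFreqs only runs if the relSim package is installed.

```r
# Made-up allele frequencies for two loci (illustrative values only).
fsi_lines <- c(
  "Allele,D3S1358,vWA",  # header: 'Allele' followed by the locus names
  "14,0.10,0.05",
  "15,0.30,0",           # zero: allele 15 is not observed at vWA
  "16,0.60,0.95",
  "N,200,200"            # final row: number of individuals typed per locus
)

# Write the table to a temporary CSV file.
fsi_path <- tempfile(fileext = ".csv")
writeLines(fsi_lines, fsi_path)

# Read it back, assuming relSim is available.
if (requireNamespace("relSim", quietly = TRUE)) {
  db <- relSim::readFreqs(fsi_path, FSIGenFormat = TRUE, delim = ",")
  str(db)
}
```

For a file that uses a different column separator, the same call can be made with a different delim, for example delim = "\t" for a tab-separated file.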

The second format is a very particular 'Curran' text format. The first line contains the number of loci in the multiplex. The next line contains the name of the first locus and the number of alleles, nA, at that locus, separated by a comma. The next nA lines each contain the allele number (from 1 to nA), the STR designation of the allele, and the frequency, separated by commas. This pattern is repeated for each locus.
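The same made-up data can be laid out in the 'Curran' format as a sketch of the line-by-line pattern described above. The values are illustrative only, and the call to readFreqs again assumes that relSim is installed.

```r
# Made-up frequencies for two loci in 'Curran' format (illustrative values only).
curran_lines <- c(
  "2",           # number of loci in the multiplex
  "D3S1358,3",   # locus name and number of alleles, nA
  "1,14,0.10",   # allele number, STR designation, frequency
  "2,15,0.30",
  "3,16,0.60",
  "vWA,2",       # the pattern repeats for the next locus
  "1,14,0.05",
  "2,16,0.95"
)

# Write the lines to a temporary text file.
curran_path <- tempfile(fileext = ".txt")
writeLines(curran_lines, curran_path)

# Read it back, assuming relSim is available.
if (requireNamespace("relSim", quietly = TRUE)) {
  db_curran <- relSim::readFreqs(curran_path, FSIGenFormat = FALSE)
  str(db_curran)
}
```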

Value

A list containing two vectors and a list: loci, counts, and freqs. The vector loci contains the locus names in the frequency file. The vector counts contains the number of individuals (or sometimes alleles) typed at each locus. This will be NULL if the 'Curran' format is used. The list freqs is a list of vectors, with each vector containing the frequencies of the alleles at one locus. The names of the elements of each vector are the STR allele designations.
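To show how such a result might be used, the sketch below builds a list by hand that mimics the documented structure and looks up one allele frequency. The object is a stand-in with made-up values, not actual output from readFreqs, and it assumes that the elements of freqs are in the same order as loci.

```r
# Hand-built stand-in for the documented return value (made-up values).
db <- list(
  loci   = c("D3S1358", "vWA"),
  counts = c(200, 200),
  freqs  = list(
    c("14" = 0.10, "15" = 0.30, "16" = 0.60),
    c("14" = 0.05, "16" = 0.95)
  )
)

# Find the position of a locus, assuming freqs is ordered like loci.
i <- match("vWA", db$loci)
vwa <- db$freqs[[i]]

# The names of the vector are the STR allele designations.
names(vwa)

# Look up the frequency of allele 16 by its designation.
vwa["16"]

# The frequencies at a locus would normally sum to one.
sum(vwa)
```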

Author(s)

James M. Curran


relSim documentation built on Aug. 29, 2023, 9:07 a.m.