readData: readData
In jmcurran/rbayesfst: Bayesian Estimation of Fst

readData

Description

A function to read the data needed for the algorithm.

Usage

readData(fileName)

Arguments

fileName

A valid file path or URL with the file extension inp or json.

Details

A function that can read the data needed for the algorithm. The original program requires data to be in the following format.

Input format

The input file should be a space or tab-delimited ASCII (i.e. text) file. First, there are three header lines, containing:

interaction term (gamma) switch: 0 for no interaction term, 1 to include the interaction term
the number of subpopulations (npop)
the number of loci (nloc)

Then for locus i (i=1,...,nloc) there are npop+1 input lines: the first contains the number of alleles at that locus (numall[i]), and each of the remaining npop lines contains the allele counts for one subpopulation.

For example, for 3 subpops and 2 loci, with 2 and 4 alleles, where we do not want to include the interaction gamma in the analysis, the input should look like:

0
3
2

2
45 5
23 27
10 40

4
10 10 10 20
5 5 25 15
17 19 0 14

Blank lines are used to delimit information for the different loci. White space (space or tab) is used to delimit the numbers on each line. The locus and population names are extracted from comments, which start with a % and follow the counts on a line, e.g. using the same example as above

0
3
2

2
45 5 % Locus1 Pop1
23 27 % Locus1 Pop2
10 40 % Locus1 Pop3

4
10 10 10 20 % Locus2 Pop1
5 5 25 15 % Locus2 Pop2
17 19 0 14 % Locus2 Pop3
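The layout described above can be illustrated with a short base R sketch. This is not the parser used by readData; `parseInpSketch` is a hypothetical helper, and it assumes that each comment has the form "locus name, then population name", as in the example. When comments are missing it falls back to default labels.

```r
# Hypothetical helper (not part of rbayesfst): parse the lines of an .inp file.
parseInpSketch <- function(lines) {
  # Separate the optional "% ..." comment from the numeric part of each line.
  hasComment <- grepl("%", lines, fixed = TRUE)
  comments <- ifelse(hasComment, trimws(sub("^[^%]*%", "", lines)), NA_character_)
  body <- trimws(sub("%.*$", "", lines))

  # Blank lines only delimit loci, so drop them.
  keep <- nzchar(body)
  comments <- comments[keep]
  nums <- lapply(strsplit(body[keep], "[ \t]+"), as.integer)

  # Three header lines: gamma switch, npop, nloc.
  gammaSwitch <- nums[[1]] == 1L
  nPops <- nums[[2]]
  nLoci <- nums[[3]]

  dbCounts <- vector("list", nLoci)
  loci <- paste("Locus", seq_len(nLoci))
  pops <- paste("Pop", seq_len(nPops))

  pos <- 4
  for (l in seq_len(nLoci)) {
    numAll <- nums[[pos]]              # number of alleles at this locus
    idx <- (pos + 1):(pos + nPops)     # one line of counts per population
    rows <- nums[idx]
    stopifnot(all(lengths(rows) == numAll))
    dbCounts[[l]] <- do.call(rbind, rows)

    # Assumed comment layout: "<locus name> <population name>".
    if (!anyNA(comments[idx])) {
      tokens <- strsplit(comments[idx], "[ \t]+")
      loci[l] <- tokens[[1]][1]
      pops <- vapply(tokens, function(tk) tk[2], character(1))
    }
    pos <- pos + nPops + 1
  }

  list(dbCounts = dbCounts, nLoci = nLoci, Loci = loci,
       nPops = nPops, Pops = pops, gammaSwitch = gammaSwitch)
}

inp <- c("0", "3", "2", "",
         "2", "45 5 % Locus1 Pop1", "23 27 % Locus1 Pop2", "10 40 % Locus1 Pop3", "",
         "4", "10 10 10 20 % Locus2 Pop1", "5 5 25 15 % Locus2 Pop2",
         "17 19 0 14 % Locus2 Pop3")
res <- parseInpSketch(inp)
```

Here `res$dbCounts` holds one matrix per locus with populations as rows, which matches the structure described for the returned object.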

The function can also read JSON formatted data. As an absolute minimum, the JSON must contain the counts of each allele at each locus for each population, stored in a list called dbCounts. Each element of the list should be a matrix, and all matrices must have the same number of rows, one for each population. The number of columns in each matrix is equal to the number of alleles at that locus, and the entries of each row give the count of allele i in population j at locus l.
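As an illustration of the dbCounts structure on the R side, the two-locus example from the input format description could be written as the list below. The exact JSON serialisation that readData expects is not specified here, so this sketch only shows the in-memory shape; `checkDbCounts` is a hypothetical helper, not a function from the package.

```r
# One matrix per locus: rows are populations, columns are alleles.
dbCounts <- list(
  matrix(c(45,  5,
           23, 27,
           10, 40), nrow = 3, byrow = TRUE),
  matrix(c(10, 10, 10, 20,
            5,  5, 25, 15,
           17, 19,  0, 14), nrow = 3, byrow = TRUE)
)

# Hypothetical check: every element is a matrix with the same number of rows.
checkDbCounts <- function(x) {
  is.list(x) &&
    all(vapply(x, is.matrix, logical(1))) &&
    length(unique(vapply(x, nrow, integer(1)))) == 1
}

ok <- checkDbCounts(dbCounts)
```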

Value

A dataset of class bayesFst. This is essentially a list with the following members:

`dbCounts`	A list of matrices providing allele counts at each locus for each population. The items in the list are loci, the rows are populations, and the columns alleles.
`nLoci`	the number of loci in the dataset.
`Loci`	the locus names if available. If the locus names are not available they will be labelled Locus 1, Locus 2, ...
`nPops`	the number of populations in the dataset.
`Pops`	the population names if available. If the population names are not available they will be labelled Pop 1, Pop 2, ...
`numAlleles`	the number of possible alleles at each locus
`locusPopSums`	an nloc x npop matrix containing the number of alleles observed at the lth locus for the jth population.
`gammaSwitch`	either `TRUE` or `FALSE` depending on whether locus and population effects interact or not.
`name`	the fully qualified file name of the data set. Note that if an .inp file has been saved into JSON format, then the name of the JSON file is the name, not the original data.
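Several of these members follow directly from dbCounts. The base R sketch below shows one way the relationships could be computed for the small two-locus example. It is an illustration under that assumption, not necessarily how readData derives them internally.

```r
# Allele counts for 2 loci and 3 populations (rows = populations).
dbCounts <- list(
  matrix(c(45, 5, 23, 27, 10, 40), nrow = 3, byrow = TRUE),
  matrix(c(10, 10, 10, 20, 5, 5, 25, 15, 17, 19, 0, 14), nrow = 3, byrow = TRUE)
)

nLoci <- length(dbCounts)               # number of loci
nPops <- nrow(dbCounts[[1]])            # number of populations
numAlleles <- sapply(dbCounts, ncol)    # alleles per locus

# sapply() returns an npop x nloc matrix of row sums, so transpose it
# to get the nloc x npop layout described for locusPopSums.
locusPopSums <- t(sapply(dbCounts, rowSums))
```

In this example every population has 50 observed alleles at each locus, so all entries of `locusPopSums` are equal.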

Examples

## Example using the data provided by Balding from the web. 
## Not run: 
bd = readData('http://www.reading.ac.uk/Statistics/genetics/software/bayesfst/data_BB04.inp')
## End(Not run)

## Example using the Balding provided example, but from this package
bd = readData(system.file("extdata", "data_BB04.inp", package = "rbayesfst"))

## Example using the Balding provided example but saved in JSON format 
## from this package.
bd = readData(system.file("extdata", "data_BB04.json", package = "rbayesfst"))
summary(bd)

jmcurran/rbayesfst documentation built on June 4, 2022, 9:57 a.m.