readData | R Documentation |
A function to read the data needed for the algorithm.
readData(fileName)
fileName | A valid file path or URL, including the file extension. |
A function that reads the data needed for the algorithm. The original program requires the data to be in the following format.
Input format
The input file should be a space or tab-delimited ASCII (i.e. text) file. First, there are three header lines, containing:
the interaction term (gamma) switch: 0 for no interaction term, 1 to include the interaction term
the number of subpopulations (npop)
the number of loci (nloc)
Then for locus i (i=1,..,nloc) there are npop+1 input lines, the first containing the number of alleles at that locus (numall[i]) and the remaining lines each contain the allele counts.
For example, for 3 subpops and 2 loci, with 2 and 4 alleles, where we do not want to include the interaction gamma in the analysis, the input should look like:
0
3
2

2
45 5
23 27
10 40

4
10 10 10 20
5 5 25 15
17 19 0 14
Blank lines are used to delimit information for the different loci. White space (space or tab) is used to delimit the numbers on each line.
The locus and population names are extracted from comments on each line, which start with a %, e.g. using the same example as above:
0
3
2

2
45 5 % Locus1 Pop1
23 27 % Locus1 Pop2
10 40 % Locus1 Pop3

4
10 10 10 20 % Locus2 Pop1
5 5 25 15 % Locus2 Pop2
17 19 0 14 % Locus2 Pop3
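The layout described above can be sketched as a small base R parser. This is only an illustration of the format, not the code readData uses: the helper name parseInp and the shape of its return value are assumptions made for this example, and it ignores the locus and population names held in the comments.

```r
# Hypothetical helper (not part of rbayesfst): parse lines in the
# text format described above into a list of count matrices.
parseInp <- function(lines) {
  # Drop any trailing "% ..." comment, then discard blank lines.
  values <- trimws(sub("%.*$", "", lines))
  values <- values[nzchar(values)]

  # Each remaining line becomes a vector of integers.
  nums <- lapply(strsplit(values, "[ \t]+"), as.integer)

  # The three header lines: gamma switch, npop, nloc.
  gammaSwitch <- nums[[1]] == 1L
  nPops <- nums[[2]]
  nLoci <- nums[[3]]

  dbCounts <- vector("list", nLoci)
  pos <- 4L
  for (l in seq_len(nLoci)) {
    numAll <- nums[[pos]]                    # number of alleles at locus l
    rows <- nums[(pos + 1L):(pos + nPops)]   # one line of counts per population
    stopifnot(all(lengths(rows) == numAll))
    dbCounts[[l]] <- do.call(rbind, rows)    # rows = populations, columns = alleles
    pos <- pos + nPops + 1L
  }

  list(gammaSwitch = gammaSwitch, nPops = nPops, nLoci = nLoci,
       dbCounts = dbCounts)
}

# The commented example from above, with blank lines between loci.
inp <- c("0", "3", "2", "",
         "2", "45 5 % Locus1 Pop1", "23 27 % Locus1 Pop2", "10 40 % Locus1 Pop3", "",
         "4", "10 10 10 20 % Locus2 Pop1", "5 5 25 15 % Locus2 Pop2",
         "17 19 0 14 % Locus2 Pop3")
d <- parseInp(inp)
d$dbCounts[[2]]
```

Because blank lines are discarded before parsing, the sketch accepts input with or without them; the real reader may be stricter.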
The function can also read JSON-formatted data. As an absolute minimum, the file must contain the counts of each allele at each locus for each population, stored in a list called dbCounts. Each element of the list should be a matrix with the same number of rows, one for each population. The number of columns in each matrix is equal to the number of alleles for the particular locus, and the entries of each row give the count of allele i in population j at locus l.
A dataset of class bayesFst. This is essentially a list with the following members:
dbCounts | A list of matrices providing allele counts at each locus for each population. The items in the list are loci, the rows are populations, and the columns alleles. |
nLoci | the number of loci in the dataset. |
Loci | the locus names if available. If the locus names are not available they will be labelled Locus 1, Locus 2, ... |
nPops | the number of populations in the dataset. |
Pops | the population names if available. If the population names are not available they will be labelled Pop 1, Pop 2, ... |
numAlleles | the number of possible alleles at each locus |
locusPopSums | a nloc x npop matrix containing the number of alleles observed at the lth locus for the jth population. |
gammaSwitch | either TRUE or FALSE depending on whether locus and population effects interact or not. |
name | the fully qualified file name of the data set. Note that if an .inp file has been saved into JSON format, then the name of the JSON file is the name, not the original data. |
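To make the relationship between the members concrete, the sketch below derives the summary members from dbCounts using the example counts from the input format section. It assumes nothing about how readData computes them internally; the variable names simply mirror the member names listed above.

```r
# Allele counts from the example input: one matrix per locus,
# rows are populations and columns are alleles.
dbCounts <- list(
  matrix(c(45, 5, 23, 27, 10, 40), nrow = 3, byrow = TRUE),
  matrix(c(10, 10, 10, 20, 5, 5, 25, 15, 17, 19, 0, 14), nrow = 3, byrow = TRUE)
)

nLoci <- length(dbCounts)                         # number of list items
nPops <- nrow(dbCounts[[1]])                      # rows are populations
numAlleles <- vapply(dbCounts, ncol, integer(1))  # columns are alleles

# One row per locus, one column per population (nloc x npop).
locusPopSums <- do.call(rbind, lapply(dbCounts, rowSums))
locusPopSums
```

In this example every population has 50 observed alleles at each locus, so locusPopSums is a 2 x 3 matrix filled with 50.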
summary.bayesFstData
## Example using the data provided by Balding from the web.
## Not run:
bd = readData('http://www.reading.ac.uk/Statistics/genetics/software/bayesfst/data_BB04.inp')
## End(Not run)

## Example using the Balding provided example, but from this package
bd = readData(system.file("extdata", "data_BB04.inp", package = "rbayesfst"))

## Example using the Balding provided example but saved in JSON format
## from this package.
bd = readData(system.file("extdata", "data_BB04.json", package = "rbayesfst"))
summary(bd)