read.data: Read Data
In DetSel: A computer program to detect markers responding to selection


Read the data file in DetSel format.

read.data(infile,dominance,maf,a,b)

`infile`	An input file in DetSel format.
`dominance`	A logical variable, which is FALSE if co-dominant data are considered (e.g., microsatellite markers, SNPs, etc.), or TRUE if bi-allelic dominant data are considered (e.g., AFLPs).
`maf`	The maximum allele frequency (the frequency of the most frequent allele over the full sample) to be considered in both the input file and the simulated data.
`a,b`	The parameters of the beta prior distribution used in Zhivotovsky's (1999) Bayesian method to compute the underlying allele frequencies. The default values are a = b = 0.25, as suggested by Mark A. Beaumont in the DFdist manual; alternatively, the user may choose to compute estimates of a and b from the data using Zhivotovsky's equation (13). Note that neither a nor b is needed if dominance = FALSE.
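To make the `maf` argument concrete, the following minimal sketch computes the frequency of the most frequent allele over the full sample for a single locus and checks it against a threshold. The helper name `max_allele_freq` and the count matrix are hypothetical, chosen for illustration; this is not DetSel's internal code, only one plausible reading of the definition given above.

```r
# Hypothetical populations x alleles count matrix for one locus
# (rows: populations, columns: alleles).
counts <- rbind(c(1, 0, 4, 10, 5),
                c(0, 1, 13, 0, 6))

# Hypothetical helper (not part of DetSel): frequency of the most
# frequent allele in the pooled sample.
max_allele_freq <- function(counts) {
  pooled <- colSums(counts)   # allele counts summed over populations
  max(pooled) / sum(pooled)
}

maf <- 0.99
freq <- max_allele_freq(counts)

# A locus is retained when no allele exceeds the threshold.
keep_locus <- freq <= maf
```

Under this reading, a locus is dropped only when a single allele accounts for more than `maf` of all sampled gene copies, that is, when the locus is nearly monomorphic.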

The input file should be a space- or tab-delimited ASCII text file. The first line is a 0 / 1 indicator: ‘0’ indicates that the data matrix for each locus is a populations x alleles matrix; ‘1’ indicates that the data matrix for each locus is an alleles x populations matrix. The second line contains the number of populations. The third line contains the number of loci. Then, the data for each locus consist of the number of alleles at that locus, followed by the data matrix at that locus, with each row corresponding to the same allele (if the indicator variable is 1) or to the same population (if the indicator variable is 0). For dominant data, the data consist of the number of genotypes, not the number of alleles. It is important to note that the frequency of the homozygote individuals for the recessive allele appears first in either the rows or columns of the data matrix. In the following example, the data set contains 2 populations and 2 loci, with 5 alleles at the first locus and 8 alleles at the second locus.

0
2
2

5
1 0 4 10 5
0 1 13 0 6

8
3 1 1 0 0 0 1 14
6 0 2 1 2 5 2 2

Spaces and blank lines can be included as desired.
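As an illustration of this layout, the sketch below writes the same co-dominant example from R matrices, using the populations x alleles orientation (indicator 0). The function `write_detsel_input` is a hypothetical helper, not part of DetSel, and the file is written to a temporary directory rather than the working directory.

```r
# Hypothetical helper (not part of DetSel): write a list of
# populations x alleles count matrices in the layout described above.
write_detsel_input <- function(path, loci) {
  stopifnot(length(loci) > 0, all(vapply(loci, is.matrix, logical(1))))
  n_pop <- nrow(loci[[1]])
  stopifnot(all(vapply(loci, nrow, integer(1)) == n_pop))
  con <- file(path, open = "w")
  on.exit(close(con))
  # Header: indicator 0, number of populations, number of loci.
  writeLines(c("0", as.character(n_pop), as.character(length(loci)), ""), con)
  for (m in loci) {
    writeLines(as.character(ncol(m)), con)               # number of alleles
    writeLines(apply(m, 1, paste, collapse = " "), con)  # one row per population
    writeLines("", con)                                  # optional blank line
  }
  invisible(path)
}

# The two loci of the example: 2 populations, 5 and 8 alleles.
loci <- list(
  rbind(c(1, 0, 4, 10, 5), c(0, 1, 13, 0, 6)),
  rbind(c(3, 1, 1, 0, 0, 0, 1, 14), c(6, 0, 2, 1, 2, 5, 2, 2))
)
example_path <- file.path(tempdir(), "example_input.dat")
write_detsel_input(example_path, loci)

# Read the numbers back to check the layout.
values <- scan(example_path, quiet = TRUE)
```

Because blank lines and extra spaces are allowed, the separators written here are only for readability.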


The read.data command creates a file named ‘infile.dat’, a file named ‘sample_sizes.dat’, and a set of files named ‘plot_i_j.dat’, where i and j are population numbers, so that each file ‘plot_i_j.dat’ corresponds to the pairwise analysis of populations i and j.

In the file ‘infile.dat’, each line corresponds to the pairwise analysis of populations i and j and contains (in that order): the name of the output simulation file, the numbers i and j, the multi-locus estimates of F_1 and F_2, and Weir and Cockerham's (1984) estimate of F_{ST}. The file ‘sample_sizes.dat’ contains sample size information, for internal use only.

In the files ‘plot_i_j.dat’, each line corresponds to one locus observed in the data set and contains (in that order): the locus-specific estimates of F_1 and F_2, Weir and Cockerham's (1984) estimate of F_{ST}, Nei's heterozygosity (H_e), the number of alleles at that locus in the pooled sample, and the rank of the locus in the data set.
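The per-locus files can be inspected with base R. The sketch below assumes that a ‘plot_i_j.dat’ file is whitespace-delimited with no header line, which is not stated above and should be checked against an actual output file. The column names are illustrative labels chosen here, and the stand-in file holds made-up values so that the code runs without DetSel output.

```r
# Illustrative column labels (not names used by DetSel), in the
# order of the fields listed above.
plot_cols <- c("F1", "F2", "FST", "He", "n_alleles", "locus_rank")

# Stand-in file with made-up values, for illustration only.
plot_path <- file.path(tempdir(), "plot_1_2.dat")
writeLines(c("0.10 0.05 0.08 0.45 3 1",
             "0.02 0.07 0.04 0.30 2 2"), plot_path)

# Assumption: whitespace-delimited columns, no header line.
locus_stats <- read.table(plot_path, header = FALSE, col.names = plot_cols)

# Example use: loci sorted by decreasing F_ST.
loci_by_fst <- locus_stats[order(locus_stats$FST, decreasing = TRUE), ]
```

To read a real file, replace `plot_path` with the path of a ‘plot_i_j.dat’ file in the current directory.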

The output files are saved in the current directory.

Weir, B. S., and Cockerham, C. C. (1984) Estimating F-statistics for the analysis of population structure, Evolution 38: 1358–1370.

Zhivotovsky, L. A. (1999) Estimating population structure in diploids with multilocus dominant DNA markers, Molecular Ecology 8: 907–913.

## Load the package.
library(DetSel)

## This is to generate an example file in the working directory.
make.example.files()

## This will read an input file named 'data.dat' that contains co-dominant markers,
## and a maximum allele frequency of 0.99 will be applied (i.e., by removing
## marker loci in the observed and simulated datasets that have an allele with
## frequency larger than 0.99).
read.data(infile = 'data.dat', dominance = FALSE, maf = 0.99)

Loading required package: ash
[1] TRUE
Read 967 items
The data file data.dat contains 100 loci, with 2-5 alleles per locus, and 2 populations
The average values of population-specific measures of differentiation are:
----------------------------------------------
Pair		F_1			F_2
1-2		0.0847			0.054
----------------------------------------------