afs_to_f2_blocks: Compute all pairwise f2 statistics

Description Usage Arguments Details Examples

View source: R/fstats.R

Description

This function takes a allele frequency data and computes blocked f2 statistics for all population pairs, which are written to outdir. f2 for each SNP is computed as (p1 - p2)^2 - p1 (1 - p1)/(n1 - 1) - p2 (1 - p2)/(n2 - 1), where p1 and p2 are allele frequencies in populations 1 and 2, and n1 and n2 is the number of non-missing haplotypes in populations 1 and 2. See details

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
afs_to_f2_blocks(
  afdat,
  maxmem = 8000,
  blgsize = 0.05,
  poly_only = TRUE,
  pops1 = NULL,
  pops2 = NULL,
  outpop = NULL,
  outdir = NULL,
  overwrite = FALSE,
  verbose = TRUE
)

Arguments

afdat

A list with three items with the same SNP in each row (generated by packedancestrymap_to_afs or plink_to_afs)

  • afs A matrix of allele frequencies for all populations (columns) and SNPs (rows)

  • counts A matrix of allele counts for all populations (columns) and SNPs (rows)

  • snpdat A data frame with SNP metadata

maxmem

split up allele frequency data into blocks, if memory requirements exceed maxmem MB.

blgsize

SNP block size in Morgan. Default is 0.05 (50 cM). If blgsize is 100 or greater, if will be interpreted as base pair distance rather than centimorgan distance.

poly_only

Exclude sites with identical allele frequencies in all populations

outpop

If specified, f2-statistics will be weighted by heterozygosity in this population

outdir

Directory into which to write f2 data (if NULL, data is returned instead)

overwrite

Should existing files be overwritten? Only relevant if outdir is not NULL

verbose

Print progress updates

pop1

pops1 and pops2 can be specified if only a subset of pairs should be computed.

pop2

pops1 and pops2 can be specified if only a subset of pairs should be computed.

Details

For each population pair, each of the i = 1, …, n resutling values (n is around 700 in practice) is the mean f2 estimate across all SNPs except the ones in block i.

- p1 (1 - p1)/(2 n1 - 1) - p2 (1 - p2)/(2 n2 - 1) is a correction term which makes the estimates unbiased at low sample sizes.

Examples

1
2
3
4
5
## Not run: 
afdat = plink_to_afs('/my/geno/prefix')
afs_to_f2_blocks(afdat, outdir = '/my/f2/data/')

## End(Not run)

uqrmaie1/admixtools documentation built on Sept. 16, 2020, 5:55 a.m.