qp3pop: Estimate f3 statistics

View source: R/qpdstat.R

qp3popR Documentation

Estimate f3 statistics

Description

Computes f3 statistics of the form f3(A; B, C). This is generally equivalent to (f2(A, B) + f2(A, C) - f2(B, C)) / 2 and to f4(A, B; A, C). However, the exact estimates depend on the type of input data, selection of SNPs, and on the values of some of the arguments, which are described below.

Usage

qp3pop(
  data,
  pop1 = NULL,
  pop2 = NULL,
  pop3 = NULL,
  boot = FALSE,
  sure = FALSE,
  unique_only = TRUE,
  blgsize = NULL,
  block_lengths = NULL,
  verbose = FALSE,
  ...
)

Arguments

data

Input data in one of three forms:

  1. A 3d array of blocked f2 statistics, output of f2_from_precomp or f2_from_geno (fastest option)

  2. A directory which contains pre-computed f2-statistics

  3. The prefix of genotype files (slowest option)

pop1

One of the following four:

  1. NULL: all possible population combinations will be returned

  2. A vector of population labels. All combinations with the other pop arguments will be returned

  3. A matrix with population combinations to be tested, with one population per column and one combination per row. Other pop arguments will be ignored.

  4. the location of a file (poplistname or popfilename) which specifies the populations or population combinations to be tested. Other pop arguments will be ignored.

pop2

A vector of population labels

pop3

A vector of population labels

boot

If FALSE (the default), block-jackknife resampling will be used to compute standard errors. Otherwise, block-bootstrap resampling will be used to compute standard errors. If boot is an integer, that number will specify the number of bootstrap resamplings. If boot = TRUE, the number of bootstrap resamplings will be equal to the number of SNP blocks.

sure

The number of population combinations can get very large. This is a safety option that stops you from accidently computing all combinations if that number is large.

unique_only

If TRUE (the default), redundant combinations will be excluded

verbose

Print progress updates

...

Additional arguments passed to f3blockdat_from_geno if data is a genotype prefix, or to get_f2 otherwise

apply_corr

With apply_corr = FALSE, no bias correction is performed. With apply_corr = TRUE (the default), a bias correction term based on the heterozygosity in the first population is subtracted from the f3 estimate. The option is ineffective if the first argument is an array of pre-computed f2-statistics. In that case, the bias correction can be toggled by the apply_corr parameter in extract_f2 or f2_from_geno. The heterozygosity calculation used in apply_corr = TRUE requires at least two haploid samples or one diploid sample in the first population. Otherwise, apply_corr = TRUE will result in missing values.

outgroupmode

With outgroupmode = FALSE, estimates of f3 will be normalized by estimates of the heterozygosity of the target population. This is the default option if the first argument is the prefix of genotype data. If the first argument is an array of pre-computed f2-statistics, then no normalization can be performed, which corresponds to outgroupmode = TRUE. As with apply_corr = TRUE, the heterozygosity calculation used in outgroupmode = FALSE requires at least two pseudodiploid samples or one diploid sample in the first population. Otherwise, outgroupmode = FALSE will result in missing values.

poly_only

Only keep SNPs with mean allele frequency not equal to 0 or 1. Defaults to FALSE. Only effective if the first argument is the prefix of genotype data. If the first argument is an array of pre-computed f2-statistics, use the poly_only argument in extract_f2 or f2_from_geno

Details

The default values of the parameters apply_corr, outgroupmode, poly_only depend on the first argument (data), and they can affect the estimated f3-statistics. When the data is the prefix of genotype data, the default parameters are generally the same as in the original qp3pop program. When data is an array of pre-computed f2-statistics, the parameters may be set while computing f2-statistics. If the first population is a single pseudodiploid sample, it is not possible to get unbiased estimates of f3. To get biased estimates for a single pseudodiploid sample from genotype data, set outgroupmode = TRUE and apply_corr = FALSE. Under this combination of parameters, ⁠f3(A; B, C)⁠ should also be identical to ⁠f4(A, B; A, C)⁠, since f4-statistics generally do not require any bias correction. See examples for more information.

Value

qp3pop returns a data frame with f3 statistics, with columns for populations 1 to 3, f3-estimate (est), standard error of the estimate (se), z-score (z, est/se), p-value (2*(1-pnorm(z))), and the number of SNPs used (n; only if first argument is genotype prefix)

Alias

f3

References

Patterson, N. et al. (2012) Ancient admixture in human history Genetics

Peter, B. (2016) Admixture, Population Structure, and F-Statistics Genetics

Examples

## Not run: 
pop1 = 'Denisova.DG'
pop2 = c('Altai_Neanderthal.DG', 'Vindija.DG')
pop3 = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG')
qp3pop(example_f2_blocks, pop1, pop2, pop3)

pops = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG')
qp3pop(example_f2_blocks, pops)
qp3pop(example_f2_blocks, pops, unique_only = FALSE)
qp3pop(f2_dir, pop1, pop2, pop3)

# Below are three scenarios, and in each one `qp3pop()` and `qp3pop_wrapper()`
# should give the same or very similar estimates. Note that to compute `f3(A; B, C)`,
# `qp3pop_wrapper()`, and the original qp3pop program, expect populations to be in the order `B`, `C`, `A`.

prefix = '/path/to/geno/prefix'
qp3popbin = '/path/to/AdmixTools/bin/qp3Pop'
pops = dimnames(example_f2_blocks)[[1]]

# 1. target diploid, outgroupmode NO (this is the default when passing a geno file prefix)
qp3pop_wrapper(prefix, pops[2], pops[3], pops[1], bin = qp3popbin, outgroupmode = FALSE)
qp3pop(prefix, pops[1], pops[2], pops[3])
  # est, se, z match well; n is higher
qp3pop(prefix, pops[1], pops[2], pops[3], poly_only = TRUE)
  # est, se, z match less well; n is identical

# 2. target diploid, outgroupmode YES (this is the only option with precomputed f2-stats)
qp3pop_wrapper(prefix, pops[2], pops[3], pops[1], bin = qp3popbin, outgroupmode = TRUE)
qp3pop(prefix, pops[1], pops[2], pops[3], outgroupmode = TRUE)
  # est, se, z match (except for factor 1000)
f2b = f2_from_geno(prefix, pops = pops[1:3], poly_only = FALSE)
qp3pop(f2b, pops[1], pops[2], pops[3])

# 3. target pseudodiploid (no heterozygotes means heterozygosity rate correction is not possible)
qp3pop_wrapper(prefix, pops[1], pops[3], pops[2], bin = qp3popbin, outgroupmode = TRUE)
qp3pop(prefix, pops[2], pops[1], pops[3], outgroupmode = TRUE, apply_corr = FALSE)
  # est, se, z match (except for factor 1000)

## End(Not run)

uqrmaie1/admixtools documentation built on Nov. 3, 2024, 12:56 a.m.