f2_from_geno: Compute blocked f2 statistics
In uqrmaie1/admixtools: Inferring demographic history from genetic data

f2_from_geno

R Documentation

Compute blocked f2 statistics

Description

This function prepares data for various other ADMIXTOOLS 2 functions. It reads data from genotype files, computes allele frequencies and blocked f2-statistics for selected populations, and returns them as a 3d array.

Usage

f2_from_geno(
  pref,
  inds = NULL,
  pops = NULL,
  blgsize = 0.05,
  maxmem = 8000,
  maxmiss = 0,
  minmaf = 0,
  maxmaf = 0.5,
  pops2 = NULL,
  outpop = NULL,
  outpop_scale = TRUE,
  transitions = TRUE,
  transversions = TRUE,
  auto_only = TRUE,
  keepsnps = NULL,
  afprod = FALSE,
  fst = FALSE,
  poly_only = c("f2"),
  format = NULL,
  adjust_pseudohaploid = TRUE,
  remove_na = TRUE,
  apply_corr = TRUE,
  qpfstats = FALSE,
  verbose = TRUE,
  ...
)
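
A minimal sketch of a typical call, assuming the admixtools package is installed and that genotype files exist under the prefix given. The prefix `path/to/my_data` and the population names are placeholders, not part of the package; only arguments from the signature above are used.

```r
# Hypothetical prefix and population labels; replace with your own data.
pref <- "path/to/my_data"          # expects my_data.geno/.snp/.ind or my_data.bed/.bim/.fam
pops <- c("PopA", "PopB", "PopC")  # placeholder population names

# Check which of the two supported file sets is present for this prefix.
has_eigenstrat <- all(file.exists(paste0(pref, c(".geno", ".snp", ".ind"))))
has_plink      <- all(file.exists(paste0(pref, c(".bed", ".bim", ".fam"))))

# Only attempt the call when the package and the input files are available.
if (requireNamespace("admixtools", quietly = TRUE) && (has_eigenstrat || has_plink)) {
  f2_blocks <- admixtools::f2_from_geno(pref, pops = pops, blgsize = 0.05,
                                        maxmiss = 0, verbose = FALSE)
  print(dim(f2_blocks))
}
```

The guard around the call means the snippet does nothing when the placeholder files are absent, so it can be pasted into a session before the paths are filled in.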

Arguments

`pref`	Prefix of PLINK/EIGENSTRAT/PACKEDANCESTRYMAP files. EIGENSTRAT/PACKEDANCESTRYMAP files have to end in `.geno`, `.snp`, `.ind`; PLINK files have to end in `.bed`, `.bim`, `.fam`.
`inds`	Individuals for which data should be extracted
`pops`	Populations for which data should be extracted. If both `pops` and `inds` are provided, they should have the same length and will be matched by position. If only `pops` is provided, all individuals from the `.ind` or `.fam` file in those populations will be extracted. If only `inds` is provided, each individual will be assigned to its own population of the same name. If neither `pops` nor `inds` is provided, all individuals and populations in the `.ind` or `.fam` file will be extracted.
`blgsize`	SNP block size in Morgan. Default is 0.05 (5 cM). If `blgsize` is 100 or greater, it will be interpreted as a base-pair distance rather than a genetic distance.
`maxmem`	Maximum amount of memory to be used. If the required amount of memory exceeds `maxmem`, allele frequency data will be split into blocks, and the computation will be performed separately on each block pair. This no longer puts a precise cap on the amount of memory used, although it did at some point. Set this parameter to a lower value if you run out of memory while running this function. Set it to a higher value if this function is too slow and you have plenty of memory.
`maxmiss`	Discard SNPs which are missing in a fraction of populations higher than `maxmiss`
`minmaf`	Discard SNPs with minor allele frequency less than `minmaf`
`maxmaf`	Discard SNPs with minor allele frequency greater than `maxmaf`
`pops2`	If specified, only pairs between `pops` and `pops2` will be computed
`outpop`	Keep only SNPs which are heterozygous in this population
`outpop_scale`	Scale f2-statistics by the inverse `outpop` heterozygosity (`1/(p(1-p))`). Providing `outpop` and setting `outpop_scale` to `TRUE` will give the same results as the original qpGraph when the `outpop` parameter has been set, but it has the disadvantage of treating one population differently from the others. This may limit the use of these f2-statistics for other models.
`transitions`	Set this to `FALSE` to exclude transition SNPs
`transversions`	Set this to `FALSE` to exclude transversion SNPs
`auto_only`	Keep only SNPs on chromosomes 1 to 22
`keepsnps`	SNP IDs of SNPs to keep. Overrides other SNP filtering options
`afprod`	Return negative average allele frequency products instead of f2-statistics. Setting `afprod = TRUE` will result in more precise f4-statistics when the original data had large amounts of missingness, and should be used in that case for `qpdstat` and `qpadm`. It can also be used for outgroup f3-statistics with a fixed outgroup (for example for `qpgraph`); values will be shifted by a constant amount compared to regular f3-statistics. This shift affects the fit of a graph only by small amounts, possibly less than bias in regular f3-statistics introduced by large amounts of missing data.
`fst`	Write files with pairwise FST for every population pair. Setting this to `FALSE` can make `extract_f2` faster and will require less memory.
`poly_only`	Specify whether SNPs with identical allele frequencies in every population should be discarded (`poly_only = TRUE`), or whether they should be used (`poly_only = FALSE`). By default (`poly_only = c("f2")`), these SNPs will be used to compute FST and allele frequency products, but not to compute f2 (this is the default option in the original ADMIXTOOLS).
`format`	Supply this if the prefix can refer to genotype data in different formats and you want to choose which one to read. Should be `plink` to read `.bed`, `.bim`, `.fam` files, or `eigenstrat` or `packedancestrymap` to read `.geno`, `.snp`, `.ind` files.
`adjust_pseudohaploid`	Genotypes of pseudohaploid samples are usually coded as `0` or `2`, even though only one allele is observed. `adjust_pseudohaploid` ensures that the observed allele count increases only by `1` for each pseudohaploid sample. If `TRUE` (default), samples that don't have any genotypes coded as `1` among the first 1000 SNPs are automatically identified as pseudohaploid. This leads to slightly more accurate estimates of f-statistics. Setting this parameter to `FALSE` treats all samples as diploid and is equivalent to the ADMIXTOOLS `inbreed: NO` option. Setting `adjust_pseudohaploid` to an integer `n` will check the first `n` SNPs instead of the first 1000 SNPs.
`apply_corr`	Apply small-sample-size correction when computing f2-statistics (default `TRUE`)
`qpfstats`	Compute smoothed f2-statistics (default `FALSE`). In the presence of large amounts of missing data, this option can be used to retain information from all SNPs while introducing less bias than setting `maxmiss` to values greater than 0. When setting `qpfstats = TRUE`, most other options to `extract_f2` will be ignored. See `qpfstats` for more information. Arguments to `qpfstats` can be passed via `...`
`verbose`	Print progress updates
`...`	Pass arguments to `qpfstats`
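
The pseudohaploid heuristic described for `adjust_pseudohaploid` can be illustrated in base R. This is a sketch on a toy genotype matrix, not the package's internal code; the helper name `detect_pseudohaploid` and the matrix layout (SNPs in rows, samples in columns) are assumptions made for the example.

```r
# Toy genotype matrix: rows are SNPs, columns are samples; values are 0/1/2 or NA.
geno <- cbind(
  dip1 = c(0, 1, 2, 1, NA, 2),
  ph1  = c(0, 2, 2, 0, NA, 2),
  ph2  = c(2, 2, 0, NA, 0, 0)
)

# Hypothetical helper: flag samples with no genotype coded 1 among the first n SNPs.
detect_pseudohaploid <- function(geno, n = 1000) {
  head_rows <- geno[seq_len(min(n, nrow(geno))), , drop = FALSE]
  colSums(head_rows == 1, na.rm = TRUE) == 0
}

is_ph <- detect_pseudohaploid(geno)

# A pseudohaploid sample contributes one observed allele per SNP,
# so its 0/2 genotype is halved and it adds 1 (not 2) to the allele total.
ploidy  <- ifelse(is_ph, 1, 2)
alt_cnt <- sweep(geno, 2, ploidy / 2, `*`)      # 0/2 becomes 0/1 for pseudohaploids
tot_cnt <- sweep(!is.na(geno), 2, ploidy, `*`)  # alleles observed per sample and SNP

# Per-SNP allele frequency with missing genotypes excluded.
freq <- rowSums(alt_cnt, na.rm = TRUE) / rowSums(tot_cnt)
```

Treating every sample as diploid instead (the behaviour described for `adjust_pseudohaploid = FALSE`) would correspond to setting `ploidy` to 2 for all columns in this sketch.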

Value

A 3d array of f2-statistics (or scaled allele frequency products if `afprod = TRUE`).
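
A short sketch of how such an array might be inspected. It uses a toy array in place of real output and assumes a population x population x block layout with population names as dimnames; check `dim()` and `dimnames()` on the actual return value before relying on this.

```r
# Toy stand-in for the returned array (assumed layout: pop x pop x block).
pops <- c("PopA", "PopB", "PopC")   # placeholder population names
n_blocks <- 4
set.seed(1)
f2_blocks <- array(0, dim = c(3, 3, n_blocks),
                   dimnames = list(pops, pops, paste0("block", seq_len(n_blocks))))
for (b in seq_len(n_blocks)) {
  m <- matrix(0, 3, 3)
  m[upper.tri(m)] <- runif(3, 0.01, 0.05)
  f2_blocks[, , b] <- m + t(m)      # symmetric toy values
}

# Per-block f2 values for one population pair.
f2_ab <- f2_blocks["PopA", "PopB", ]

# Unweighted mean across blocks (a simplification that ignores any per-block weighting).
f2_mean <- apply(f2_blocks, c(1, 2), mean)
```

The per-block slices are what make block-based standard errors possible downstream, which is why the result is kept as a 3d array rather than a single matrix.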
