extract_f2_large: Compute and store blocked f2 statistics
In uqrmaie1/admixtools: Inferring demographic history from genetic data

extract_f2_large

R Documentation

Compute and store blocked f2 statistics

Description

extract_f2_large does the same as extract_f2, but it requires less memory and is slower. outdir has to be set in extract_f2_large.

Usage

extract_f2_large(
  pref,
  outdir,
  inds = NULL,
  pops = NULL,
  blgsize = 0.05,
  cols_per_chunk = 10,
  maxmiss = 0,
  minmaf = 0,
  maxmaf = 0.5,
  minac2 = FALSE,
  outpop = NULL,
  outpop_scale = TRUE,
  transitions = TRUE,
  transversions = TRUE,
  keepsnps = NULL,
  snpblocks = NULL,
  overwrite = FALSE,
  format = NULL,
  adjust_pseudohaploid = TRUE,
  afprod = TRUE,
  fst = TRUE,
  poly_only = c("f2"),
  apply_corr = TRUE,
  verbose = TRUE
)

Arguments

`pref`	Prefix of PLINK/EIGENSTRAT/PACKEDANCESTRYMAP files. EIGENSTRAT/PACKEDANCESTRYMAP files have to end in `.geno`, `.snp`, `.ind`; PLINK files have to end in `.bed`, `.bim`, `.fam`.
`outdir`	Directory where data will be stored.
`inds`	Individuals for which data should be extracted
`pops`	Populations for which data should be extracted. If both `pops` and `inds` are provided, they should have the same length and will be matched by position. If only `pops` is provided, all individuals from the `.ind` or `.fam` file in those populations will be extracted. If only `inds` is provided, each individual will be assigned to its own population of the same name. If neither `pops` nor `inds` is provided, all individuals and populations in the `.ind` or `.fam` file will be extracted.
`blgsize`	SNP block size in Morgan. Default is 0.05 (5 cM). If `blgsize` is 100 or greater, it will be interpreted as base pair distance rather than genetic distance.
`cols_per_chunk`	Number of populations per chunk. Lowering this number will lower the memory requirements.
`maxmiss`	Discard SNPs which are missing in a fraction of populations higher than `maxmiss`
`minmaf`	Discard SNPs with minor allele frequency less than `minmaf`
`maxmaf`	Discard SNPs with minor allele frequency greater than `maxmaf`
`minac2`	Discard SNPs with allele count lower than 2 in any population (default `FALSE`). This option should be set to `TRUE` when computing f3-statistics where one population consists mostly of pseudohaploid samples. Otherwise heterozygosity estimates and thus f3-estimates can be biased. `minac2 == 2` will discard SNPs with allele count lower than 2 in any non-singleton population (this option is experimental and is based on the hypothesis that using SNPs with allele count lower than 2 only leads to biases in non-singleton populations). While the `minac2` option discards SNPs with allele count lower than 2 in any population, the `qp3pop` function will only discard SNPs with allele count lower than 2 in the first (target) population (when the first argument is the prefix of a genotype file).
`outpop`	Keep only SNPs which are heterozygous in this population
`outpop_scale`	Scale f2-statistics by the inverse `outpop` heterozygosity (`1/(p(1-p))`). Providing `outpop` and setting `outpop_scale` to `TRUE` will give the same results as the original qpGraph when the `outpop` parameter has been set, but it has the disadvantage of treating one population differently from the others. This may limit the use of these f2-statistics for other models.
`transitions`	Set this to `FALSE` to exclude transition SNPs
`transversions`	Set this to `FALSE` to exclude transversion SNPs
`keepsnps`	SNP IDs of SNPs to keep. Overrides other SNP filtering options
`overwrite`	Overwrite existing files in `outdir`
`format`	Supply this if the prefix can refer to genotype data in different formats and you want to choose which one to read. Should be `plink` to read `.bed`, `.bim`, `.fam` files, or `eigenstrat`, or `packedancestrymap` to read `.geno`, `.snp`, `.ind` files.
`adjust_pseudohaploid`	Genotypes of pseudohaploid samples are usually coded as `0` or `2`, even though only one allele is observed. `adjust_pseudohaploid` ensures that the observed allele count increases only by `1` for each pseudohaploid sample. If `TRUE` (default), samples that don't have any genotypes coded as `1` among the first 1000 SNPs are automatically identified as pseudohaploid. This leads to slightly more accurate estimates of f-statistics. Setting this parameter to `FALSE` treats all samples as diploid and is equivalent to the ADMIXTOOLS `inbreed: NO` option. Setting `adjust_pseudohaploid` to an integer `n` will check the first `n` SNPs instead of the first 1000 SNPs.
`afprod`	Write files with allele frequency products for every population pair. Setting this to FALSE can make `extract_f2` faster and will require less memory.
`fst`	Write files with pairwise FST for every population pair. Setting this to FALSE can make `extract_f2` faster and will require less memory.
`poly_only`	Specify whether SNPs with identical allele frequencies in every population should be discarded (`poly_only = TRUE`), or whether they should be used (`poly_only = FALSE`). By default (`poly_only = c("f2")`), these SNPs will be used to compute FST and allele frequency products, but not to compute f2 (this is the default option in the original ADMIXTOOLS).
`apply_corr`	Apply small-sample-size correction when computing f2-statistics (default `TRUE`)
`verbose`	Print progress updates
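
The pseudohaploid detection rule described under `adjust_pseudohaploid` can be illustrated with a minimal sketch. `detect_pseudohaploid` is a hypothetical helper, not a function from admixtools, and it assumes a numeric genotype matrix with SNPs in rows and samples in columns; the package's actual implementation may differ.

```r
# Hypothetical helper (not part of admixtools): flag samples that have no
# genotype coded as 1 among the first n SNPs.
# Assumes geno is a numeric matrix, SNPs in rows, samples in columns,
# with genotypes coded 0/1/2 and NA for missing.
detect_pseudohaploid <- function(geno, n = 1000) {
  n <- min(n, nrow(geno))
  first <- geno[seq_len(n), , drop = FALSE]
  # A sample is treated as pseudohaploid if it never shows a heterozygous call
  colSums(first == 1, na.rm = TRUE) == 0
}

# Toy data: one diploid sample and one pseudohaploid sample
geno <- cbind(
  dipl   = c(0, 1, 2, 1, NA),
  pseudo = c(0, 2, 2, 0, NA)
)
is_ph <- detect_pseudohaploid(geno)
print(is_ph)
```

Passing a different `n` mirrors setting `adjust_pseudohaploid` to an integer, which changes how many SNPs are checked.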

Details

extract_f2_large requires less memory because it writes allele frequency data to disk, and doesn't store the allele frequency matrix for all populations and SNPs in memory. If you still run out of memory, reduce cols_per_chunk. This function is a wrapper around extract_afs and afs_to_f2, and is slower than extract_f2. It may be faster to call extract_afs and afs_to_f2 directly, parallelizing over the different calls to afs_to_f2.
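
The two-step route mentioned above might look roughly like the following sketch. `chunk_pairs` and `run_in_two_steps` are hypothetical helpers, not part of admixtools. The sketch assumes that `afs_to_f2` is called once per pair of allele frequency chunks via its `chunk1` and `chunk2` arguments, that each unordered pair is needed only once, and that the caller knows the number of chunks written by `extract_afs` (passed here as `numchunks`). Please check the documentation of `extract_afs` and `afs_to_f2` before relying on it.

```r
# Hypothetical helper: enumerate all chunk pairs with chunk1 <= chunk2,
# one row per intended afs_to_f2 call.
chunk_pairs <- function(numchunks) {
  pairs <- expand.grid(chunk1 = seq_len(numchunks),
                       chunk2 = seq_len(numchunks))
  pairs <- pairs[pairs$chunk1 <= pairs$chunk2, ]
  rownames(pairs) <- NULL
  pairs
}

# Sketch only: defined but not called here, because it needs the admixtools
# package and genotype files. numchunks is assumed to equal the number of
# allele frequency chunks that extract_afs wrote to afdir.
run_in_two_steps <- function(pref, afdir, f2dir, numchunks, cores = 2) {
  admixtools::extract_afs(pref, afdir)
  pairs <- chunk_pairs(numchunks)
  # mclapply forks processes; on Windows use cores = 1
  parallel::mclapply(seq_len(nrow(pairs)), function(k) {
    admixtools::afs_to_f2(afdir, f2dir,
                          chunk1 = pairs$chunk1[k],
                          chunk2 = pairs$chunk2[k])
  }, mc.cores = cores)
  invisible(pairs)
}

pairs <- chunk_pairs(3)
print(nrow(pairs))  # 6 pairs for 3 chunks
```

Because each pair is independent in this sketch, the calls could equally be distributed across cluster jobs instead of local cores.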

Value

SNP metadata (invisibly)

Examples

## Not run: 
pref = 'my/genofiles/prefix'
f2dir = 'my/f2dir/'
extract_f2_large(pref, f2dir, pops = c('popA', 'popB', 'popC'))

## End(Not run)

uqrmaie1/admixtools documentation built on July 16, 2025, 4:01 p.m.
