extract_counts: Extract and store data needed to compute blocked f2
In uqrmaie1/admixtools: Inferring demographic history from genetic data

Description

Prepares data for various ADMIXTOOLS 2 functions. This function reads genotype files and extracts the data required to compute blocked f-statistics for any set of samples. The output consists of .rds files with total and alternative allele counts for each individual, and products of total and alternative allele counts for each pair of individuals. The function calls packedancestrymap_to_afs or plink_to_afs, and afs_to_f2_blocks.
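
To make the stored quantities more concrete, the following base R sketch computes per-individual alternative and total allele counts for a toy genotype matrix, along with their pairwise products summed over SNPs. The matrix `geno`, the variable names, and the choice to sum over all SNPs at once are illustrative assumptions. The sketch does not reproduce the layout of the .rds files or the per-block bookkeeping of the function itself.

```r
# Toy diploid genotypes: rows = SNPs, columns = individuals.
# Entries count alternative alleles (0, 1, 2); NA marks a missing genotype.
geno <- matrix(
  c(0, 1, 2,
    1, NA, 2,
    2, 2, 0,
    0, 1, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("snp", 1:4), c("ind1", "ind2", "ind3"))
)

# Alternative allele counts, with missing genotypes contributing 0.
alt <- ifelse(is.na(geno), 0, geno)
# Total allele counts: 2 for an observed diploid genotype, 0 if missing.
tot <- ifelse(is.na(geno), 0, 2)

# Products for every pair of individuals, summed over SNPs.
# crossprod(x) is t(x) %*% x, so entry [i, j] is sum(x[, i] * x[, j]).
alt_prod <- crossprod(alt)
tot_prod <- crossprod(tot)
```

Because missing genotypes contribute zero to both matrices, a SNP only adds to a pair's products when both individuals are observed at that SNP.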

Usage

extract_counts(
  pref,
  outdir,
  inds = NULL,
  blgsize = 0.05,
  maxmiss = 0,
  minmaf = 0,
  maxmaf = 0.5,
  transitions = TRUE,
  transversions = TRUE,
  auto_only = TRUE,
  keepsnps = NULL,
  maxmem = 8000,
  overwrite = FALSE,
  format = NULL,
  cols_per_chunk = NULL,
  verbose = TRUE
)
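
A call might look like the sketch below. The two paths are hypothetical placeholders, and the non-default values (`maxmiss = 0.1`) are chosen only for illustration. The call is guarded so that it runs only when the package is installed (assumed here to be named `admixtools` and to export `extract_counts`) and the PLINK files exist.

```r
# Hypothetical paths: replace with your own genotype prefix and output directory.
pref <- "/path/to/geno_prefix"
outdir <- "/path/to/counts_dir"

# Argument names come from the usage signature; unlisted ones keep their defaults.
args <- list(
  pref = pref,
  outdir = outdir,
  maxmiss = 0.1,      # tolerate up to 10% missing individuals per SNP
  auto_only = TRUE,   # chromosomes 1 to 22 only
  overwrite = FALSE
)

# Run only when the package is installed and the PLINK files are present.
have_pkg <- requireNamespace("admixtools", quietly = TRUE)
have_data <- all(file.exists(paste0(pref, c(".bed", ".bim", ".fam"))))
if (have_pkg && have_data) {
  do.call(admixtools::extract_counts, args)
}
```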

Arguments

`pref`	Prefix of PLINK/EIGENSTRAT/PACKEDANCESTRYMAP files. EIGENSTRAT/PACKEDANCESTRYMAP files have to end in `.geno`, `.snp`, `.ind`; PLINK files have to end in `.bed`, `.bim`, `.fam`.
`outdir`	Directory where data will be stored.
`inds`	Individuals for which data should be read. Defaults to all individuals
`blgsize`	SNP block size in Morgan. Default is 0.05 (5 cM). If `blgsize` is 100 or greater, it will be interpreted as base pair distance rather than genetic distance.
`maxmiss`	Discard SNPs which are missing in a fraction of individuals greater than `maxmiss`
`minmaf`	Discard SNPs with minor allele frequency less than `minmaf`
`maxmaf`	Discard SNPs with minor allele frequency greater than `maxmaf`
`transitions`	Set this to `FALSE` to exclude transition SNPs
`transversions`	Set this to `FALSE` to exclude transversion SNPs
`auto_only`	Keep only SNPs on chromosomes 1 to 22
`keepsnps`	SNP IDs of SNPs to keep. Overrides other SNP filtering options
`maxmem`	Maximum amount of memory to be used. If the required amount of memory exceeds `maxmem`, allele frequency data will be split into blocks, and the computation will be performed separately on each block pair. This doesn't put a precise cap on the amount of memory used (it used to at some point). Set this parameter to lower values if you run out of memory while running this function. Set it to higher values if this function is too slow and you have lots of memory.
`overwrite`	Overwrite existing files in `outdir`
`format`	Supply this if the prefix can refer to genotype data in different formats and you want to choose which one to read. Should be `plink` to read `.bed`, `.bim`, `.fam` files, or `eigenstrat` or `packedancestrymap` to read `.geno`, `.snp`, `.ind` files.
`cols_per_chunk`	Number of genotype chunks to store on disk. Setting this to a positive integer makes the function slower, but requires less memory. The default value for `cols_per_chunk` in `extract_afs` is 10. Lower numbers will lower the memory requirement but increase the time it takes.
`verbose`	Print progress updates
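
The SNP filtering arguments above can be read as a set of keep/discard rules. The base R sketch below applies those rules to a toy SNP table. The table `snps`, its column names, and the helpers `is_transition`, `filter_snps`, and `blgsize_unit` are all hypothetical and written for illustration only; they are not the package's internal code. In particular, treating `keepsnps` as bypassing every other filter is one reading of "overrides other SNP filtering options".

```r
# Toy SNP table; column names and values are made up for illustration.
snps <- data.frame(
  id   = paste0("rs", 1:5),
  chr  = c(1, 1, 2, 23, 2),
  a1   = c("A", "C", "A", "G", "C"),
  a2   = c("G", "T", "C", "A", "G"),
  af   = c(0.30, 0.02, 0.55, 0.40, 0.90),  # alternative allele frequency
  miss = c(0.00, 0.05, 0.00, 0.00, 0.20),  # fraction of individuals missing
  stringsAsFactors = FALSE
)

# Transitions are A<->G and C<->T; every other substitution is a transversion.
is_transition <- function(a1, a2) {
  paste0(a1, a2) %in% c("AG", "GA", "CT", "TC")
}

filter_snps <- function(snps, maxmiss = 0, minmaf = 0, maxmaf = 0.5,
                        transitions = TRUE, transversions = TRUE,
                        auto_only = TRUE, keepsnps = NULL) {
  # Assumption: an explicit SNP list bypasses all other filters.
  if (!is.null(keepsnps)) return(snps[snps$id %in% keepsnps, ])
  maf <- pmin(snps$af, 1 - snps$af)  # minor allele frequency
  ts <- is_transition(snps$a1, snps$a2)
  keep <- snps$miss <= maxmiss & maf >= minmaf & maf <= maxmaf
  if (!transitions) keep <- keep & !ts
  if (!transversions) keep <- keep & ts
  if (auto_only) keep <- keep & snps$chr %in% 1:22
  snps[keep, ]
}

# blgsize below 100 is read as Morgan, 100 or more as base pairs.
blgsize_unit <- function(blgsize) if (blgsize >= 100) "bp" else "Morgan"

# Keeps only rs3: rs1 and rs4 are transitions, rs2 has MAF 0.02,
# and rs5 is missing in 20% of individuals.
kept <- filter_snps(snps, maxmiss = 0.1, minmaf = 0.05, transitions = FALSE)
```

Note that with the default `maxmiss = 0`, any SNP with a missing genotype in the selected individuals is discarded, so raising `maxmiss` is usually the first setting to revisit when few SNPs survive.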