extract_f2: Compute and store blocked f2 statistics


Description

This function prepares data for various other ADMIXTOOLS 2 functions. It reads data from genotype files, computes allele frequencies and blocked f2-statistics for selected populations, and writes the results to outdir.

Usage

extract_f2(
  pref,
  outdir,
  inds = NULL,
  pops = NULL,
  blgsize = 0.05,
  maxmem = 8000,
  maxmiss = 0,
  minmaf = 0,
  maxmaf = 0.5,
  pops2 = NULL,
  outpop = NULL,
  outpop_scale = TRUE,
  transitions = TRUE,
  transversions = TRUE,
  auto_only = TRUE,
  keepsnps = NULL,
  overwrite = FALSE,
  format = NULL,
  adjust_pseudohaploid = TRUE,
  cols_per_chunk = NULL,
  verbose = TRUE
)

Arguments

pref

Prefix of PLINK/EIGENSTRAT/PACKEDANCESTRYMAP files. EIGENSTRAT/PACKEDANCESTRYMAP files have to end in .geno, .snp, .ind; PLINK files have to end in .bed, .bim, .fam.

outdir

Directory where data will be stored.

inds

Individuals for which data should be extracted

pops

Populations for which data should be extracted. If both pops and inds are provided, they should have the same length and will be matched by position. If only pops is provided, all individuals from the .ind or .fam file in those populations will be extracted. If only inds is provided, each individual will be assigned to its own population of the same name. If neither pops nor inds is provided, all individuals and populations in the .ind or .fam file will be extracted.
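The four cases for inds and pops can be sketched in base R as follows. This is only an illustration of the rules described above, not the package's code: assign_pops and ind_table are hypothetical names, and ind_table stands in for the contents of the .ind or .fam file.

```r
# Hypothetical helper illustrating the documented rules; not admixtools code.
# `ind_table` stands in for the .ind/.fam file: one row per individual.
assign_pops <- function(ind_table, inds = NULL, pops = NULL) {
  if (!is.null(inds) && !is.null(pops)) {
    # Both given: must have equal length, matched by position
    stopifnot(length(inds) == length(pops))
    return(data.frame(ind = inds, pop = pops, stringsAsFactors = FALSE))
  }
  if (!is.null(pops)) {
    # Only pops: all individuals that belong to those populations
    return(ind_table[ind_table$pop %in% pops, c("ind", "pop")])
  }
  if (!is.null(inds)) {
    # Only inds: each individual becomes its own population
    return(data.frame(ind = inds, pop = inds, stringsAsFactors = FALSE))
  }
  # Neither given: everything in the file
  ind_table[, c("ind", "pop")]
}

# Toy sample table (made-up data)
ind_table <- data.frame(
  ind = c("I1", "I2", "I3", "I4"),
  pop = c("popA", "popA", "popB", "popC"),
  stringsAsFactors = FALSE
)

only_pops <- assign_pops(ind_table, pops = c("popA", "popB"))
only_inds <- assign_pops(ind_table, inds = c("I1", "I4"))
both      <- assign_pops(ind_table, inds = c("I1", "I3"), pops = c("X", "Y"))
```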

blgsize

SNP block size in Morgan. Default is 0.05 (5 cM). If blgsize is 100 or greater, it will be interpreted as base pair distance rather than genetic distance.
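A minimal sketch of how a blgsize value would be read under the rule above. describe_blgsize is a hypothetical helper written for this illustration and is not part of the package; the only outside fact used is that 1 Morgan equals 100 cM.

```r
# Hypothetical helper; mirrors the documented rule only, not admixtools code.
describe_blgsize <- function(blgsize) {
  if (blgsize >= 100) {
    # Large values are read as a physical distance in base pairs
    list(unit = "bp", value = blgsize)
  } else {
    # Small values are read as Morgan; report them in cM (1 Morgan = 100 cM)
    list(unit = "cM", value = blgsize * 100)
  }
}

default_block <- describe_blgsize(0.05)  # the default: 0.05 Morgan
bp_block      <- describe_blgsize(5e6)   # read as 5 million base pairs
```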

maxmem

Maximum amount of memory to be used. If the required amount of memory exceeds maxmem, allele frequency data will be split into blocks, and the computation will be performed separately on each block pair.

maxmiss

Discard SNPs which are missing in a fraction of populations higher than maxmiss

minmaf

Discard SNPs with minor allele frequency less than minmaf

maxmaf

Discard SNPs with minor allele frequency greater than maxmaf
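The combined effect of maxmiss, minmaf and maxmaf can be sketched on a toy allele frequency matrix. The data are made up, and computing the minor allele frequency from the mean across populations is an assumption of this sketch; the package may compute it differently.

```r
# Toy allele frequencies (made-up data): rows are SNPs, columns are populations
afs <- matrix(
  c(0.10, 0.20, NA,
    0.02, 0.01, 0.03,
    0.45, 0.40, 0.50,
    NA,   NA,   0.30),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("snp", 1:4), c("popA", "popB", "popC"))
)

maxmiss <- 0.5
minmaf  <- 0.05
maxmaf  <- 0.5

# Fraction of populations in which each SNP is missing
miss_frac <- rowMeans(is.na(afs))

# Assumption: MAF is taken from the mean frequency across populations
mean_af <- rowMeans(afs, na.rm = TRUE)
maf     <- pmin(mean_af, 1 - mean_af)

# Keep SNPs that pass all three filters
keep      <- miss_frac <= maxmiss & maf >= minmaf & maf <= maxmaf
kept_snps <- rownames(afs)[keep]
```

Here snp2 fails the minmaf filter and snp4 is missing in more than half of the populations, so only snp1 and snp3 remain.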

pops2

If specified, only pairs between pops and pops2 will be computed

outpop

Keep only SNPs which are heterozygous in this population

outpop_scale

Scale f2-statistics by the inverse outpop heterozygosity (1/(p*(1-p))). Providing outpop and setting outpop_scale to TRUE will give the same results as the original qpGraph when the outpop parameter has been set, but it has the disadvantage of treating one population differently from the others. This may limit the use of these f2-statistics for other models.
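The scaling factor can be illustrated with a deliberately simplified per-SNP calculation. This sketch uses the naive squared frequency difference as f2 and ignores the bias correction and blocking that the package applies; all frequencies are made up.

```r
# Made-up allele frequencies at four SNPs
p_out <- c(0.5, 0.2, 0.0, 1.0)  # outgroup population
p_a   <- c(0.6, 0.1, 0.3, 0.9)  # population A
p_b   <- c(0.4, 0.3, 0.1, 0.7)  # population B

# outpop: keep only SNPs that are polymorphic in the outgroup
polymorphic <- p_out > 0 & p_out < 1

# Simplified per-SNP f2 (no bias correction; an assumption of this sketch)
raw_f2 <- (p_a - p_b)^2

# outpop_scale: multiply by the inverse outgroup heterozygosity 1/(p*(1-p))
het_out   <- p_out[polymorphic] * (1 - p_out[polymorphic])
scaled_f2 <- raw_f2[polymorphic] / het_out
```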

transitions

Set this to FALSE to exclude transition SNPs

transversions

Set this to FALSE to exclude transversion SNPs

auto_only

Keep only SNPs on chromosomes 1 to 22

keepsnps

SNP IDs of SNPs to keep. Overrides other SNP filtering options

overwrite

Overwrite existing files in outdir

format

Supply this if the prefix can refer to genotype data in different formats and you want to choose which one to read. Should be plink to read .bed, .bim, .fam files, or eigenstrat or packedancestrymap to read .geno, .snp, .ind files.

adjust_pseudohaploid

Genotypes of pseudohaploid samples are usually coded as 0 or 2, even though only one allele is observed. adjust_pseudohaploid ensures that the observed allele count increases only by 1 for each pseudohaploid sample. If TRUE (default), samples that don't have any genotypes coded as 1 among the first 1000 SNPs are automatically identified as pseudohaploid. This leads to slightly more accurate estimates of f-statistics. Setting this parameter to FALSE treats all samples as diploid and is equivalent to the ADMIXTOOLS inbreed: NO option.
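The detection heuristic and its effect on allele counts can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the genotype matrix is made up, and the allele frequency calculation is a simplified version of what the description implies.

```r
# Toy genotype matrix (made-up data): rows are SNPs, columns are samples
geno <- cbind(
  diploid1 = c(0, 1, 2, 1, 0, 2),
  pseudo1  = c(0, 2, 2, 0, NA, 2),
  pseudo2  = c(2, 2, 0, 0, 0, 2)
)

# A sample with no genotype coded as 1 among the first 1000 SNPs
# (or all SNPs, if there are fewer) is flagged as pseudohaploid
n_check <- min(1000, nrow(geno))
is_pseudohaploid <- apply(
  geno[seq_len(n_check), , drop = FALSE], 2,
  function(g) !any(g == 1, na.rm = TRUE)
)

# Alleles observed per non-missing genotype: 1 if pseudohaploid, 2 if diploid
ploidy <- ifelse(is_pseudohaploid, 1, 2)

# Pseudohaploid genotypes 0/2 contribute 0/1 alleles instead of 0/2
alt_count <- rowSums(sweep(geno, 2, ploidy / 2, `*`), na.rm = TRUE)
n_alleles <- rowSums(sweep(!is.na(geno), 2, ploidy, `*`))
af <- alt_count / n_alleles
```

With the adjustment, each pseudohaploid sample adds one allele to the denominator rather than two, so a diploid sample is not outweighed by the same number of pseudohaploid ones.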

cols_per_chunk

Number of allele frequency chunks to store on disk. Setting this to a positive integer makes the function slower, but requires less memory. The default value for cols_per_chunk in extract_afs is 10. Lower numbers will lower the memory requirement but increase the time it takes.

verbose

Print progress updates

Value

SNP metadata (invisibly)

See Also

f2_from_precomp for reading the stored f2-statistics back into R; f2_from_geno to skip writing f2-statistics to disk and return them directly

Examples

## Not run: 
pref = 'my/genofiles/prefix'
f2dir = 'my/f2dir/'
extract_f2(pref, f2dir, pops = c('popA', 'popB', 'popC'))

## End(Not run)
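A possible follow-up step, sketched under the assumption that the call above has already written results to the placeholder directory: the stored statistics can be read back with f2_from_precomp (listed under See Also). The guard makes the snippet a no-op when the package or the directory is not available.

```r
# Same placeholder directory as in the example above
f2dir <- "my/f2dir/"

# Only runs if admixtools is installed and the directory exists
if (requireNamespace("admixtools", quietly = TRUE) && dir.exists(f2dir)) {
  # Read the stored f2-statistics for the selected populations back into R
  f2_blocks <- admixtools::f2_from_precomp(f2dir, pops = c("popA", "popB", "popC"))
  # Inspect the dimensions of the returned object
  print(dim(f2_blocks))
}
```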

uqrmaie1/admixtools documentation built on Sept. 16, 2020, 5:55 a.m.