find.tree.popset: Find sets of populations that may be used as a scaffold tree

View source: R/find.tree.popset.R

find.tree.popset    R Documentation

Find sets of populations that may be used as a scaffold tree

Description

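Find sets of populations that may be used as a scaffold tree, i.e., sets of populations that show no significant F3-based signal of admixture and whose F4-statistics are all consistent with a tree-like history.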

Usage

find.tree.popset(
  fstats,
  f3.zcore.threshold = -1.65,
  f4.zscore.absolute.threshold = 1.96,
  excluded.pops = NULL,
  nthreads = 1,
  verbose = TRUE
)

Arguments

fstats

An object of class fstats containing estimates of F-statistics (see the function compute.fstats)

f3.zcore.threshold

The significance threshold for the Z-score of the formal test of admixture based on F3-statistics (default=-1.65)

f4.zscore.absolute.threshold

The significance threshold for the |Z-score| of the formal test of treeness based on F4-statistics (default=1.96)

excluded.pops

Vector of population names to be excluded from the exploration

nthreads

Number of available threads for parallelizing some parts of the computation (default=1, i.e., no parallelization)

verbose

If TRUE, extra information is printed on the terminal
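The default thresholds appear to match standard-normal critical values: -1.65 is close to the one-sided 5% lower-tail quantile (suited to the F3 admixture test, where only negative Z-scores indicate admixture) and 1.96 is close to the two-sided 5% quantile (suited to the |Z-score| of the F4 treeness test). This reading is an assumption, not something stated in this page; the base-R snippet below only shows the correspondence and how a stricter threshold, such as the -3 used in the Examples, might be related to a smaller significance level.

```r
# Assumed interpretation: default thresholds as standard-normal critical values
qnorm(0.05)    # about -1.645: one-sided 5% threshold (F3 admixture test)
qnorm(0.975)   # about 1.960: two-sided 5% threshold for |Z| (F4 treeness test)

# A stricter F3 threshold, e.g. a one-sided 0.1% level
qnorm(0.001)   # about -3.09
```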

Details

The procedure first discards all the populations P that show a significant signal of admixture, i.e., with a Z-score for an F3-statistic of the form F3(P;Q,R) below f3.zcore.threshold. It then identifies all the sets of populations that pass the F4-based treeness test among themselves. More precisely, for a given set E containing n populations, n(n-1)(n-2)(n-3)/8 distinct F4 quadruplets can be formed (three for each of the n(n-1)(n-2)(n-3)/24 combinations of four populations), and the procedure ensures that, for each combination of four populations, the quadruplet consistent with a tree topology has a |Z-score| < f4.zscore.absolute.threshold (see the Z_f4.range element in Value). The function aims to maximize the size of these sets.
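The two-step procedure above can be illustrated with a brute-force sketch in base R. This is a hypothetical illustration, not poolfstat's implementation: the inputs f3z (for each population P, the lowest Z-score among its F3(P;Q,R) statistics) and f4z (a function returning the |Z-score| of F4(A,B;C,D)), as well as the helper names find_scaffold_sets and f4z_toy, are assumptions made for the example. Following the Value section, the sketch assumes that a combination of four populations is tree-like when exactly one of its three quadruplets passes, and it enumerates candidate sets exhaustively, which is only feasible for a small number of populations.

```r
# Hypothetical brute-force sketch of the scaffold search (NOT poolfstat's code).
#   f3z: named numeric vector, lowest F3(P;Q,R) Z-score for each population P
#   f4z: function(p1, p2, p3, p4) returning |Z| of F4(p1,p2;p3,p4)
find_scaffold_sets <- function(f3z, f4z, f3.thr = -1.65, f4.thr = 1.96) {
  # Step 1: discard populations with a significant F3 admixture signal
  pops <- names(f3z)[f3z >= f3.thr]
  if (length(pops) < 4) return(list())

  # Step 2: a set is tree-like if, for every combination of four
  # populations, exactly one of the three quadruplets has |Z| < f4.thr
  is_treelike <- function(set) {
    quartets <- combn(set, 4)
    all(apply(quartets, 2, function(q) {
      z <- c(f4z(q[1], q[2], q[3], q[4]),  # A,B;C,D
             f4z(q[1], q[3], q[2], q[4]),  # A,C;B,D
             f4z(q[1], q[4], q[2], q[3]))  # A,D;B,C
      sum(z < f4.thr) == 1
    }))
  }

  # Step 3: try the largest candidate sets first and return every
  # tree-like set of the largest size found
  for (k in seq(length(pops), 4)) {
    passing <- Filter(is_treelike, combn(pops, k, simplify = FALSE))
    if (length(passing) > 0) return(passing)
  }
  list()
}

# Toy inputs: A, B, C and D follow the tree ((A,B),(C,D)); E is admixed;
# F breaks treeness in every combination of four populations it belongs to
f3z <- c(A = 0.2, B = -0.5, C = 1.1, D = 0.0, E = -3.0, F = 0.4)
f4z_toy <- function(p1, p2, p3, p4) {
  if ("F" %in% c(p1, p2, p3, p4)) return(5)
  clades <- list(c("A", "B"), c("C", "D"))
  in_clade <- function(pair) any(vapply(clades, identical, logical(1), sort(pair)))
  if (in_clade(c(p1, p2)) && in_clade(c(p3, p4))) 0.5 else 5
}

res <- find_scaffold_sets(f3z, f4z_toy)
print(res)
```

With these toy inputs, E is removed at the F3 step, every candidate set containing F fails the F4 step, and the largest remaining tree-like set is A, B, C, D.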

Value

A list with the following elements:

  1. "n.sets": The number of sets of (scaffold) unadmixed populations identified

  2. "set.size": The number of populations included in each set

  3. "pop.sets": A character matrix with n.sets rows and set.size columns giving, for each identified set, the names of the included populations.

  4. "Z_f4.range": A matrix with n.sets rows and 2 columns reporting, for each set, the range of variation (min and max values) of the absolute F4 Z-scores for the quadruplets passing the treeness test. More precisely, for a given set consisting of n=set.size populations, a total of n(n-1)(n-2)(n-3)/8 quadruplets can be formed. However, any set of four populations A, B, C and D is represented by three quadruplets: A,B;C,D (or one of its seven other equivalent combinations formed by permuting populations within or between pairs); A,C;B,D (or one of its seven other equivalent combinations); and A,D;B,C (or one of its seven other equivalent combinations). Among these three, only a single quadruplet is expected to pass the treeness test (i.e., if the correct unrooted tree topology is (A,C;B,D), then F4(A,C;B,D) is expected to pass, whereas the absolute values of the Z-scores associated with F4(A,B;C,D) and F4(A,D;B,C), or their equivalents, will be high).

  5. "passing.quadruplets": A matrix with n.sets rows and n(n-1)(n-2)(n-3)/24 columns (with n=set.size) reporting, for each set, the quadruplets that pass the treeness test, i.e., one per combination of four populations (see the Z_f4.range details).
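To make the counts used in the Z_f4.range and passing.quadruplets descriptions concrete, the base-R sketch below lists, for a set of n populations, every combination of four populations together with the three quadruplets that can be formed from it. The helper quadruplets_of and the population names are placeholders introduced for illustration only; the sketch does not compute any F4 statistic.

```r
# Illustrative enumeration (hypothetical helper, placeholder population names)
quadruplets_of <- function(pops) {
  quartets <- combn(pops, 4)  # one column per combination of four populations
  rows <- lapply(seq_len(ncol(quartets)), function(j) {
    q <- quartets[, j]
    c(sprintf("%s,%s;%s,%s", q[1], q[2], q[3], q[4]),  # A,B;C,D
      sprintf("%s,%s;%s,%s", q[1], q[3], q[2], q[4]),  # A,C;B,D
      sprintf("%s,%s;%s,%s", q[1], q[4], q[2], q[3]))  # A,D;B,C
  })
  do.call(rbind, rows)  # one row per combination, three quadruplets each
}

pops <- c("A", "B", "C", "D", "E")
n <- length(pops)
quads <- quadruplets_of(pops)
nrow(quads)    # n(n-1)(n-2)(n-3)/24 = 5 combinations of four populations
length(quads)  # n(n-1)(n-2)(n-3)/8  = 15 distinct quadruplets
quads[1, ]     # the three quadruplets formed from A, B, C and D
```

For a tree-like set, only one entry per row is expected to pass the treeness test, which is why n(n-1)(n-2)(n-3)/24 passing quadruplets are reported per set.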

See Also

compute.fstats

Examples

library(poolfstat)
make.example.files(writing.dir=tempdir())
pooldata=popsync2pooldata(sync.file=paste0(tempdir(),"/ex.sync.gz"),poolsizes=rep(50,15))
#NOTE: toy example (in practice nsnp.per.bjack.block should be higher)
res.fstats=compute.fstats(pooldata,nsnp.per.bjack.block=50)
popsets=find.tree.popset(res.fstats,f3.zcore.threshold=-3)
#names of the populations included in each identified set
popsets$pop.sets

poolfstat documentation built on Sept. 8, 2023, 5:49 p.m.