holoStats | R Documentation |
Calculates a variety of summary statistics from simulated SNP data.
holoStats(out, popDF, cores = 1)
out
SNP dataset produced by coalescent simulation (using
popDF
a three-column data frame produced by
cores
the number of CPUs to use in parallelized portions of the function.
This function calculates a wide variety of summary statistics from simulated or observed SNP datasets. The statistics fall into three broad groups: i) levels of genetic diversity within populations, ii) levels of differentiation between populations, and iii) spatial patterns in the dataset.
Statistics measuring levels of genetic diversity within populations:
Number of segregating sites (polymorphic SNPs; S) per population
Number of private segregating sites (pS) per population and total across populations
"Frequency down-weighted marker values", calculated as the sum (across loci) of location-specific minor allele frequencies divided by global minor allele frequency (see Schonswetter & Tribsch 2005)
Mean and standard deviation of minor allele counts per locus
Mean and standard deviation of Euclidean distances in spatial PCA space, considering within-population comparisons (see Alvarado-Serrano & Hickerson 2016)
Levels of linkage disequilibrium among loci within populations (using correlation in allele frequencies within individuals, r^2; Hill 1981)
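The first three diversity statistics above can be illustrated with a small base-R sketch. This is a literal reading of the descriptions in this list, not the package's implementation: the `counts` matrix, the population labels, and the equal sample size `n_copies` are all hypothetical, and holoStats() may handle unequal sample sizes or monomorphic loci differently.

```r
# Hypothetical minor-allele counts: rows = populations, columns = loci
counts <- rbind(A = c(2, 0, 1, 2),
                B = c(1, 0, 0, 2),
                C = c(0, 3, 0, 2))
n_copies <- 4  # gene copies sampled per population (assumed equal here)

# A locus segregates in a population when both alleles are present there
seg <- counts > 0 & counts < n_copies
S <- rowSums(seg)

# Private segregating sites: loci that segregate in exactly one population
pS <- rowSums(seg[, colSums(seg) == 1, drop = FALSE])
tot_priv <- sum(pS)

# Frequency down-weighted marker values: local minor allele frequency
# divided by global minor allele frequency, summed across loci
local_freq <- counts / n_copies
global_freq <- colSums(counts) / (n_copies * nrow(counts))
DW <- rowSums(sweep(local_freq, 2, global_freq, "/"))
```

In this toy dataset population A carries the rarest allele (locus 3), so it receives the largest DW value.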
Statistics measuring levels of genetic differentiation between populations:
Pairwise Fst (Wright 1949) from local and combined expected heterozygosity (i.e., Hs and Ht)
Pairwise Nei's genetic distance (Nei 1973)
Mean and standard deviation of pairwise differences in minor allele counts per locus
Mean and standard deviation of Euclidean distances in spatial PCA space, considering between-population comparisons (see Alvarado-Serrano & Hickerson 2016)
Pairwise Bray-Curtis dissimilarity calculated from Euclidean distance in allele frequencies between populations
Conditional genetic distance (see Dyer et al. 2010) between populations on a population graph calculated from SNP data
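As a rough illustration of the pairwise Fst entry in this list, the sketch below computes Fst = 1 - Hs/Ht for two populations from their allele frequencies. The function name `pairwise_fst` is hypothetical, and averaging heterozygosity across loci before taking the ratio is an assumption; the package may weight loci or sample sizes differently.

```r
# Hypothetical helper: pairwise Fst from per-locus allele frequencies
# p1 and p2 in two populations (biallelic SNPs, equal weighting assumed)
pairwise_fst <- function(p1, p2) {
  he <- function(p) 2 * p * (1 - p)       # expected heterozygosity
  hs <- mean((he(p1) + he(p2)) / 2)       # mean within-population value
  ht <- mean(he((p1 + p2) / 2))           # value for the pooled pair
  1 - hs / ht
}

# Two loci: no differentiation at the first, strong at the second
fst_example <- pairwise_fst(c(0.5, 0.2), c(0.5, 0.8))
```

Identical frequency vectors give an Fst of zero, since Hs and Ht then coincide.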
Statistics summarizing spatial patterns in the dataset:
Summaries of regressions between geographic distance and genetic distance (Fst, Nei's distance, etc.) from Isolation By Distance analyses
Summaries of polynomial regressions between latitude / longitude and measures of diversity (expected heterozygosity, principal component scores on the first 3 axes, LD)
Summaries of the Geographic Spectrum of Shared Alleles (Harpending's raggedness index, GSSA mean, and GSSA variance per population; see Alvarado-Serrano & Hickerson 2018)
Summaries of spatial autocorrelation in the site frequency spectrum (see Smouse & Peakall 1999 and Alvarado-Serrano & Hickerson 2016)
Measures of spatial autocorrelation in genetic data, including Moran's I (see Moran 1950) and the beta, sill, nugget, and range of the variogram
Summaries of Monmonier's algorithm - mean and standard deviation of path length, x and y positions of path vertices (see Alvarado-Serrano & Hickerson 2016)
Pairwise directionality index (see Peter & Slatkin 2013)
Per population betweenness and closeness centralities (Freeman 1979) from a population graph calculated from SNP genotype data
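The isolation-by-distance summaries in this list reduce to the slope and intercept of a regression of genetic distance on geographic distance. A minimal sketch with base R's lm() follows; the distances are made-up values for three population pairs, and whether the package transforms either distance before fitting is not stated here, so none is applied.

```r
# Hypothetical pairwise distances for three population pairs
geo_dist <- c(A.B = 10, A.C = 30, B.C = 20)      # geographic distance
fst      <- c(A.B = 0.06, A.C = 0.16, B.C = 0.11) # genetic distance

# Linear isolation-by-distance fit: genetic distance ~ geographic distance
fit <- lm(fst ~ geo_dist)
ibd <- setNames(coef(fit), c("intercept", "slope"))
```

The same pattern applies when Nei's distance or Euclidean distance replaces Fst as the response.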
Returns a one-row data frame with a variable number of columns, each holding a summary statistic calculated from the SNP data. Summaries in the output are named as follows:
tot_SNPs
The total number of polymorphic sites in the dataset. This should equal the nloci argument to runFSC_step_agg3() and is not used in subsequent analyses.
S.<POP_ID>
The number of polymorphic SNP loci in each population.
pS.<POP_ID>
The number of private polymorphic SNP loci (SNPs that are only variable in the focal population) per population.
DW_<POP_ID>
Frequency-weighted marker values (see Schonswetter & Tribsch 2005) for each population.
tot_priv
The total number of private SNP loci across populations.
Fst_<POP_ID1>.<POP_ID2>
Pairwise Fst (=1-Hs/Ht) between populations.
Nei_<POP_ID1>.<POP_ID2>
Pairwise Nei's genetic distance between populations.
ibdfst.*
Summaries of linear relationships between genetic distance and geographic distance (slope and intercept of IBD relationship)
bsfst.*
Summaries of broken-stick relationships between genetic distance and geographic distance (breakpoint, difference in log-likelihood compared to IBD model)
ibdedist.*
Similar to ibdfst.*, but using Euclidean distance rather than Fst to measure genetic differentiation.
bsedist.*
Similar to bsfst.*, but using Euclidean distance rather than Fst to measure genetic differentiation.
ibdnei.*
Similar to ibdfst.*, but using Nei's genetic distance rather than Fst to measure genetic differentiation.
bsnei.*
Similar to bsfst.*, but using Nei's genetic distance rather than Fst to measure genetic differentiation.
helat.*
Summaries of a polynomial model (intercept, first, and second coefficients) relating heterozygosity to latitude
helong.*
Summaries of a polynomial model (intercept, first, and second coefficients) relating heterozygosity to longitude
pc*lat.*
Summaries of a polynomial model (intercept, first, and second coefficients) relating position on principal component (axis 1, 2, or 3) to latitude
pc*long.*
Summaries of a polynomial model (intercept, first, and second coefficients) relating position on principal component (axis 1, 2, or 3) to longitude
W.Mean:<POP_ID>
Mean minor allele count (from the 1-D SFS) across loci per population.
W.SD:<POP_ID>
Standard deviation of minor allele count (from the 1-D SFS) across loci per population.
W.Mean.Diff:<POP_ID1>_<POP_ID2>
Mean pairwise difference in allele counts between two populations (from the 2-D SFS).
W.SD.Diff:<POP_ID1>_<POP_ID2>
Standard deviation of pairwise differences in allele counts between two populations (from the 2-D SFS).
r.*
Summaries of spatial autocorrelation in the site frequency spectrum - mean and standard deviation, maximum correlation coefficient, lag associated with max correlation.
HRi_<POP_ID>
Harpending's raggedness index calculated from the Geographic Spectrum of Shared Alleles for each population.
gssa_mean_<POP_ID>
Mean distance of allele sharing for each population, calculated from the Geographic Spectrum of Shared Alleles.
gssa_var_<POP_ID>
Variance in the distribution of allele sharing distances for each population, calculated from the Geographic Spectrum of Shared Alleles.
Spca.Dmean_<POP_ID>
The mean inter-individual distance in PCA space among individuals within a population, calculated from a spatial PCA analysis.
Spca.Dsd_<POP_ID>
The standard deviation in inter-individual distance in PCA space among individuals within a population, calculated from a spatial PCA analysis.
Spca.Dmean_<POP_ID1>_<POP_ID2>
The mean inter-individual distance in PCA space between individuals sampled from two populations, calculated from a spatial PCA analysis.
Spca.Dsd__<POP_ID1>_<POP_ID2>
The standard deviation in inter-individual distance in PCA space between individuals sampled from two populations, calculated from a spatial PCA analysis.
BrayCurt_<POP_ID1>.<POP_ID2>
Bray-Curtis dissimilarity calculated from Euclidean distance in allele frequencies between populations.
Moran.Beta
Estimate of Moran's I measuring spatial autocorrelation in genetic data.
Mon.*
Summaries of the results of Monmonier's algorithm applied to the genetic dataset: mean and standard deviation of path length, and x and y positions of path vertices.
Var.*
Summaries of the variogram measuring spatial autocorrelation in the genetic data: beta, sill, nugget, and range of the variogram.
LD_<POP_ID>
Estimate of linkage disequilibrium within a population, using the correlation in allele frequencies within individuals (r^2).
ldlat.*
Summaries of a polynomial model (intercept, first, and second coefficients) relating estimated LD to latitude.
ldlong.*
Summaries of a polynomial model (intercept, first, and second coefficients) relating estimated LD to longitude.
Psi_<POP_ID1>.<POP_ID2>
Peter & Slatkin's (2013) directionality index between a pair of populations.
cGD-<POP_ID1>.<POP_ID2>
Conditional genetic distance between populations, from a population graph calculated from SNP genotype data.
bwness-<POP_ID>
Betweenness centrality for a population, from a population graph calculated from SNP genotype data.
cness-<POP_ID>
Closeness centrality for a population, from a population graph calculated from SNP genotype data.
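Because the output is a single wide row, groups of statistics are most easily pulled out by matching the name prefixes listed above. The sketch below uses a mock data frame with invented values and population labels (A, B); it only assumes that the returned column names follow the patterns documented here.

```r
# Mock one-row output with invented values; check.names = FALSE keeps
# names such as "W.Mean:A" exactly as written
stats_mock <- data.frame(tot_SNPs = 100,
                         S.A = 40, S.B = 35,
                         Fst_A.B = 0.12,
                         Nei_A.B = 0.03,
                         "W.Mean:A" = 1.5,
                         check.names = FALSE)

# Select statistics by prefix
fst_cols <- grep("^Fst_", names(stats_mock), value = TRUE)
s_cols   <- grep("^S\\.", names(stats_mock), value = TRUE)

# Names containing ":" or "-" are not syntactic R names, so use [[ ]]
w_mean_A <- stats_mock[["W.Mean:A"]]
```

Anchoring each pattern with `^` avoids accidental matches, for example `S.` matching inside `W.SD:<POP_ID>`.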
makePopdf, runFSC_step_agg3, privateAlleles, prcomp, spca, LD.Measures, betweenness, closeness, popgraph, vegdist
https://onlinelibrary.wiley.com/doi/full/10.1111/evo.12202
https://www.nature.com/articles/6885180
https://www.biorxiv.org/content/10.1101/457556v1
https://doi.org/10.1017/S0016672300020553
https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1469-1809.1949.tb02451.x
https://www.pnas.org/doi/10.1073/pnas.70.12.3321
https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12489
https://onlinelibrary.wiley.com/doi/abs/10.2307/25065429
https://doi.org/10.1016/0378-8733(78)90021-7
https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1365-294X.2010.04748.x
library(holoSimCell)

# Draw simulation parameters from the prior file shipped with the package
parms <- drawParms(control = system.file("extdata/ashpaper", "Ash_priors.csv", package = "holoSimCell"))

# Load the landscape file for the first entry in pollenPulls
load(file = paste0(system.file(package = "holoSimCell"), "/extdata/landscapes/", pollenPulls[[1]]$file))
refpops <- pollenPulls[[1]]$refs
avgCellsz <- mean(c(res(landscape$sumrast)))

ph <- getpophist2.cells(h = landscape$details$ncells,
                        xdim = landscape$details$x.dim,
                        ydim = landscape$details$y.dim,
                        landscape = landscape,
                        refs = refpops,
                        refsz = parms$ref_Ne,
                        lambda = parms$lambda,
                        mix = parms$mix,
                        shortscale = parms$shortscale * avgCellsz,
                        shortshape = parms$shortshape,
                        longmean = parms$longmean * avgCellsz,
                        ysz = res(landscape$sumrast)[2],
                        xsz = res(landscape$sumrast)[1],
                        K = parms$Ne)

gmap <- make.gmap(ph$pophist,
                  xnum = 2, # number of cells to aggregate in the x-direction
                  ynum = 2) # number of cells to aggregate in the y-direction
ph2 <- pophist.aggregate(ph, gmap = gmap)

loc_parms <- data.frame(marker = "snp",
                        nloci = parms$nloci,
                        seq_length = parms$seq_length,
                        mu = parms$mu)
preLGMparms <- data.frame(preLGM_t = parms$preLGM_t / parms$G,
                          preLGM_Ne = parms$preLGM_Ne,
                          ref_Ne = parms$ref_Ne)

# Coalescent simulation that produces the SNP dataset passed to holoStats()
out <- runFSC_step_agg3(ph = ph2,
                        l = landscape,
                        sample_n = 14,
                        preLGMparms = preLGMparms,
                        label = "test",
                        delete_files = TRUE,
                        num_cores = 1,
                        exec = "fsc26",
                        loc_parms = loc_parms,
                        found_Ne = parms$found_Ne,
                        gmap = gmap,
                        MAF = 0.01,
                        maxloc = 50000)

popDF <- makePopdf(landscape, "cell")
stats <- holoStats(out, popDF, cores = 1)