fst_wc | R Documentation |
This function implements the full FST and FIT estimators from Weir and Cockerham (1984). Handles very large data, passed as BEDMatrix or as a regular R matrix. Handles missing values correctly.
fst_wc( X, labs, FIT = FALSE, m = NA, ind_keep = NULL, loci_on_cols = FALSE, mem_factor = 0.7, mem_lim = NA, m_chunk_max = 1000 )
X |
The genotype matrix (BEDMatrix, regular R matrix, or function, same as |
labs |
A vector of subpopulation assignments for every individual. At least two subpoplations must be present. |
FIT |
If |
m |
The number of loci, required if |
ind_keep |
An optional vector of individuals to keep (as booleans or indexes, used to subset an R matrix). |
loci_on_cols |
Determines the orientation of the genotype matrix (by default, |
mem_factor |
Proportion of available memory to use loading and processing genotypes.
Ignored if |
mem_lim |
Memory limit in GB, used to break up genotype data into chunks for very large datasets.
Note memory usage is somewhat underestimated and is not controlled strictly.
Default in Linux and Windows is |
m_chunk_max |
Sets the maximum number of loci to process at the time. Actual number of loci loaded may be lower if memory is limiting. |
A list with the following named elements, in this order:
fst
: The genome-wide Fst estimate (scalar).
fst_loci
: A vector of per-locus Fst estimates.
data
: a 2-by-m matrix of per-locus numerator and denominator Fst estimates.
Useful to obtain a bootstrap distribution for the genome-wide Fst.
maf
: a vector of marginal allele frequency estimates, one for each locus.
Note that it has not been converted to minor allele frequencies.
If FIT = TRUE
, two additional elements, fit
and fit_loci
are appended to the list, and data
becomes a 3-by-m matrix, where the third column is the per-locus numerators of the Fit estimates (the denominators are the same for Fit and Fst).
The popkin package.
# dimensions of simulated data n_ind <- 100 m_loci <- 1000 k_subpops <- 10 n_data <- n_ind * m_loci # missingness rate miss <- 0.1 # simulate ancestral allele frequencies # uniform (0,1) # it'll be ok if some of these are zero p_anc <- runif(m_loci) # simulate some binomial data X <- rbinom(n_data, 2, p_anc) # sprinkle random missingness X[ sample(X, n_data * miss) ] <- NA # turn into a matrix X <- matrix(X, nrow = m_loci, ncol = n_ind) # create subpopulation labels # k_subpops groups of equal size labs <- ceiling( (1 : n_ind) / k_subpops ) # estimate FST using the Weir-Cockerham formula fst_wc_obj <- fst_wc(X, labs) # the genome-wide FST estimate fst_wc_obj$fst # vector of per-locus FST estimates fst_wc_obj$fst_loci # estimate FST and FIT using the Weir-Cockerham formula # (normally FIT estimates are not calculated, # this adds them to return object) fst_wc_obj <- fst_wc(X, labs, FIT = TRUE) # the genome-wide FIT estimate fst_wc_obj$fit # vector of per-locus FIT estimates fst_wc_obj$fit_loci
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.