Q_bootstrap: Generate and analyze bootstrap replicates of one or more Q...

View source: R/fstruct_functions.R

Q_bootstrap    R Documentation

Generate and analyze bootstrap replicates of one or more Q matrices

Description

Generates bootstrap replicate Q matrices, computes Fst/FstMax for each bootstrap replicate, produces several plots of the bootstrap distributions of Fst/FstMax for each provided Q matrix, and runs two statistical tests comparing these bootstrap distributions. The tests comparing bootstrap distributions of Fst/FstMax facilitate statistical comparison of the variability in each of multiple Q matrices.

Usage

Q_bootstrap(matrices, n_replicates, K, seed, group)

Arguments

matrices

A dataframe, matrix, or array representing a Q matrix or a (possibly named) list of arbitrarily many Q matrices. For each Q matrix, matrix rows represent individuals and the last K columns contain individual membership coefficients (when restricted to the last K columns, the rows must sum to approximately 1). If the matrices are not named (e.g., matrices = list(matrix1, matrix2) instead of matrices = list(A = matrix1, B = matrix2)), the matrices will be numbered in the order they are provided in the list. If matrices is a single matrix, dataframe, or array and group is specified, the matrix will be split into multiple Q matrices, one for each distinct value of the column group, which will each be analyzed separately.

n_replicates

The number of bootstrap replicate matrices to generate for each provided Q matrix.

K

Optional; the number of ancestral clusters in each provided Q matrix, or a vector of such K values if the value of K differs between matrices. If a single K is provided, each individual in every matrix must have K membership coefficients. If a vector of multiple K values is provided, matrices must be a list and the i^{th} entry of K must correspond to the i^{th} Q matrix in matrices. The default value of K is the number of columns in the matrix, the number of columns in the first matrix if a list is provided, or the number of columns minus 1 if group is specified but K is not.

seed

Optional; a number to set as the random seed. Use if reproducibility of random results is desired.

group

Optional; a string specifying the name of the column that describes which group each row (individual) belongs to. Use if matrices is a single matrix containing multiple groups of individuals you wish to compare. If the matrix was simulated using Q_simulate with rep > 1 and/or a vector for alpha, use group = "Pop".
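The layout described for matrices, K, and group can be illustrated with a minimal base R sketch. It is not the package's implementation: the data frame Q_df, its column names q1 and q2, and the helper resample_rows are all hypothetical, and the sketch assumes that a bootstrap replicate is formed by resampling individuals (rows) with replacement, which the text above does not state explicitly.

```r
# Hypothetical Q matrix: a grouping column followed by K = 2 membership coefficients
Q_df <- data.frame(
  Pop = c("A", "A", "A", "B", "B", "B"),
  q1  = c(0.9, 0.8, 0.6, 0.3, 0.2, 0.5),
  q2  = c(0.1, 0.2, 0.4, 0.7, 0.8, 0.5)
)
K <- 2

# The last K columns hold the membership coefficients; each row should sum to ~1
coef_cols <- (ncol(Q_df) - K + 1):ncol(Q_df)
row_sums <- rowSums(Q_df[, coef_cols])

# Split into one Q matrix per distinct value of the grouping column
by_group <- split(Q_df, Q_df$Pop)

# Hypothetical helper (not part of FSTruct): resample rows with replacement
resample_rows <- function(Q, n_replicates, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  lapply(seq_len(n_replicates), function(i) {
    idx <- sample(nrow(Q), size = nrow(Q), replace = TRUE)
    Q[idx, , drop = FALSE]
  })
}

# Three replicates of group A, each with the same number of individuals as group A
reps_A <- resample_rows(by_group$A, n_replicates = 3, seed = 1)
```

Each replicate keeps the original number of individuals, so statistics computed on it are comparable to those of the original matrix.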

Value

A named list containing the following entries:

  • bootstrap_replicates: A named list of lists. Each element is named for a Q matrix provided in matrices and contains a list of n_replicates bootstrap replicates of the provided matrix. E.g., if n_replicates = 100 and the first Q matrix in matrices is named A, then the first element of bootstrap_replicates, bootstrap_replicates$bootstrap_matrices_A, is itself a list of 100 matrices, each representing a bootstrap replicate of matrix A.

  • statistics: A dataframe containing the output of Q_stat: Fst, FstMax, and ratio (Fst/FstMax), computed for each bootstrap replicate matrix in bootstrap_replicates. The ratio Fst/FstMax quantifies the variability of each Q matrix. The first column, titled Matrix, is a factor indicating which provided Q matrix the row corresponds to (the matrix name if matrices is a named list, or a number otherwise). The row names are of the form stats_matrix.replicate, where matrix is the name of one of the provided Q matrices (or the entry number if the list elements were not named) and replicate is the index of the bootstrap replicate (replicate takes values from 1 to n_replicates).

  • plot_boxplot: A ggplot2 box plot depicting the bootstrap distribution of Fst/FstMax for each matrix in matrices.

  • plot_violin: A ggplot2 violin plot depicting the bootstrap distribution of Fst/FstMax for each matrix in matrices.

  • plot_ecdf: A ggplot2 empirical cumulative distribution function plot depicting the bootstrap distribution of Fst/FstMax for each matrix in matrices.

  • test_kruskal_wallis: Results of a Kruskal-Wallis test performed on the bootstrap distributions of Fst/FstMax. This test is a non-parametric statistical test of whether all provided bootstrap distributions are identically distributed.

  • test_pairwise_wilcox: Results of pairwise Wilcoxon rank-sum tests performed on the bootstrap distributions of Fst/FstMax. This is a non-parametric statistical test of whether the two bootstrap distributions in each pair of provided Q matrices are identically distributed. The result is a matrix of p-values whose entries correspond to each pair of Q matrices.
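As a rough illustration of the two tests named above, the sketch below runs base R's kruskal.test and pairwise.wilcox.test on a made-up data frame shaped like the statistics entry (a Matrix factor and a ratio column). The values are simulated placeholders, not FSTruct output, and whether Q_bootstrap calls these functions with the same settings (for example, base R's default Holm adjustment of the pairwise p-values) is an assumption.

```r
set.seed(1)

# Made-up Fst/FstMax ratios standing in for the `statistics` data frame
stats_mock <- data.frame(
  Matrix = factor(rep(c("A", "B", "C"), each = 50)),
  ratio  = c(rbeta(50, 2, 8), rbeta(50, 2, 8), rbeta(50, 6, 4))
)

# Kruskal-Wallis: are all three distributions of the ratio identical?
kw <- kruskal.test(ratio ~ Matrix, data = stats_mock)

# Pairwise Wilcoxon rank-sum tests (p-values Holm-adjusted by default in base R)
pw <- pairwise.wilcox.test(stats_mock$ratio, stats_mock$Matrix)

kw$p.value   # one p-value for the overall comparison
pw$p.value   # lower-triangular matrix of pairwise p-values
```

The pairwise result has one row and column fewer than the number of matrices, since each pair appears only once.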

Examples

# Use Q_simulate to generate 4 random Q matrices
A <- Q_simulate(
  alpha = .1,
  lambda = c(.5, .5),
  popsize = 20,
  rep = 1,
  seed = 1
)

B <- Q_simulate(
  alpha = .1,
  lambda = c(.5, .5),
  popsize = 20,
  rep = 1,
  seed = 2
)

C <- Q_simulate(
  alpha = 1,
  lambda = c(.5, .5),
  popsize = 20,
  rep = 1,
  seed = 3
)

D <- Q_simulate(
  alpha = 1,
  lambda = c(.5, .5),
  popsize = 20,
  rep = 1,
  seed = 4
)

# Draw 100 bootstrap replicates from
# each of the 4 Q matrices
bootstrap_1 <- Q_bootstrap(
  matrices = list(
    A = A,
    B = B,
    C = C,
    D = D
  ),
  n_replicates = 100,
  K = 2
)

# Access the elements of this list using $.
# For example:
# To look at all 400 bootstrap Q matrix
# replicates:
bootstrap_1$bootstrap_replicates

# To look at Fst, FstMax, and
# the ratio (Fst/FstMax) for each replicate
bootstrap_1$statistics

# To look at a plot of the distribution of
# Fst/FstMax for each Q matrix:
bootstrap_1$plot_violin

# To determine if each of the 4 distributions of
# Fst/FstMax is significantly different from
# each of the other distributions:
bootstrap_1$test_pairwise_wilcox

# Alternatively, you can simulate all of your comparison populations at once
# and use the group parameter:

# Here, Q_simulate generates 4 populations with the same parameters used to
# simulate the 4 Q matrices above. However, these will all be stacked in one
# matrix, rather than assigning each to a separate matrix.

Q_4 <- Q_simulate(alpha = c(0.1, 1),
                  lambda = c(0.5, 0.5),
                  popsize = 20,
                  rep = 2,
                  seed = 1)

# Look at the first few rows of Q_4
head(Q_4)

# Generate 100 bootstrap replicates for each of the 4 populations in Q_4
bootstrap_2 <- Q_bootstrap(matrices = Q_4,
                           n_replicates = 100,
                           K = 2,
                           seed = 1,
                           group = "Pop")

# To look at a plot of the distribution of
# Fst/FstMax for each Q matrix:
bootstrap_2$plot_violin

# To determine if each of the 4 distributions of
# Fst/FstMax is significantly different from
# each of the other distributions:
bootstrap_2$test_pairwise_wilcox


MaikeMorrison/FSTruct documentation built on Aug. 26, 2023, 7:01 a.m.