tess3: Estimate ancestry coefficients and run genome scans for...
In cayek/TESS3_encho_sen: Inference of Spatial Population Genetic Structure

tess3

R Documentation

Estimate ancestry coefficients and run genome scans for selection

Description

tess3 is the main function of the tess3r package. It runs a graph-based nonnegative matrix factorization algorithm that includes geographic data in the estimation of spatial population structure. The function requires individual genotypes, geographic coordinates, and it can be run for multiple values of the number of ancestral populations. In addition, the function uses the estimates of ancestry coefficients to compute an Fst statistic for each locus, and to return test significance values for a null hypothesis of selective neutrality. See the references for more details.

Usage

tess3(X, XProba = NULL, coord, K, ploidy, lambda = 1, rep = 1, W = NULL,
  method = "projected.ls", max.iteration = 200, tolerance = 1e-05,
  openMP.core.num = 1, Q.init = NULL, mask = 0, algo.copy = TRUE,
  keep = "best", verbose = FALSE)

Arguments

`X`	a matrix of individual genotypes. This matrix must have `n` rows and `L` columns where `n` is the number of individuals and `L` is the number of loci. The entries of this matrix are integers between 0 and ploidy, that correspond to the number of derived/reference alleles observed at each locus. If `NULL`, the matrix of genotype likelihood, `XProba`, is used.
`XProba`	A matrix which contains individual genotype likelihoods (probabilities) for each locus. This matrix must contain `n` rows and `(ploidy + 1)L` columns where `n` is the number of individuals, and `L` is the number of loci. The entries of this matrix are numeric values ranging between 0 and 1, and corresponding to genotype probabilities for each locus. If `NULL`, deterministic values are computed from the genotypic matrix `X`. See the references for more details.
`coord`	a matrix of size `n \times 2` where `n` is the number of individuals encoding the longitude and latitude of each individual (numeric system).
`K`	an integer or a vector of integers corresponding to the number of ancestral populations.
`ploidy`	an integer corresponding to ploidy of the studied organism. Haploids have ploidy = 1, diploids have ploidy = 2, etc.
`lambda`	a numeric value for the spatial regularization parameter. The default value lambda = 1 attributes equal weights to the loss function and to the penalty function.
`rep`	integer. The number of time the algorithm will be repeated for each value of `K`.
`W`	a matrix which corresponds to the graph weightings. If NULL, W is computed as `W[i,j] = exp(-(coord[i] - coord[j])^2 / sigma^2)`, where `coord[i]` represents the geographic coordinates for individual i, and where `sigma` is equal to 5 percent of the average geographic distance between individuals.
`method`	a character string `"projected.ls"` or `"qp"`. If `"projected.ls"`, an alternating projected least squares algorithm is used. If `"qp"`, an alternating quadratic programing algorithm is used. See references for more details
`max.iteration`	the maximum number of iterations of the optimization algorithm.
`tolerance`	a numeric value corresponding to the stopping criteria of the optimization algorithm.
`openMP.core.num`	integer representing the number of cores used by the optimization algorithm. It requires that the openMP library is installed in your OS (default for macOS is no).
`Q.init`	a matrix for initial values of ancestry coefficients for the algorithm. The default value is a random matrix.
`mask`	If not `NULL`, this numeric value is the proportion of masked data when computing the cross-validation criterion.
`algo.copy`	boolean. If TRUE data is copied to speed up the algorithm.
`keep`	If `"best"`, only the result with the lowest `rmse` score will be kept for each value of `K`. If `"all"`, all results will be kept and returned for each value of `K`. The second option uses more space in memory.
`verbose`	If `TRUE` more information is printed.

Value

An object of class tess3 which corresponds to a list of length length(K). Each element of this list has the following attributes

K: the number of ancestral populations
tess3.run: if keep = "best", the tess3Main result with the lowest value of the rmse (loss) function. If keep = "all", a list of tess3Main results for each repetition
rmse: root mean squared error between the genotypic matrix XProba and the fitted matrix for each program repetition
crossentropy: cross-entropy between the genotypic matrix XProba and the fitted matrix for each program repetition
crossvalid.rmse: root square mean error between the masked values of genotypic matrix XProba[masked] and their fitted values for each repetition. If mask is FALSE, then NULL.
crossvalid.crossentropy: cross-entropy between the masked values of genotypic matrix XProba[masked] and their fitted values for each repetition. If mask is FALSE, then NULL.

Methods available for this class:

plot.tess3
summary.tess3
is.tess3
Gettess3res
qmatrix
pvalue

References

Caye, K., Deist, T. M., Martins, H., Michel, O., & François, O. (2016). TESS3: fast inference of spatial population structure and genome scans for selection. Molecular Ecology Resources, 16(2), 540-548. http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12471/full Caye, Kevin et al. (2016) Fast Inference of Individual Admixture Coefficients Using Geographic Data. bioRxiv doi:10.1101/080291. http://biorxiv.org/content/early/2016/10/12/080291 Caye, K., Jay, F., Michel, O., François, O. (2018). Fast inference of individual admixture coefficients using geographic data. The Annals of Applied Statistics, 12(1), 586-608.

Examples

library(tess3r)

# Arabidopsis thaliana data set
data(data.at)
genotype <- data.at$X
coordinates <- data.at$coord

# Running the tess3 function
tess3.obj <- tess3(X = genotype, coord = coordinates, K = 1:4,
                   method = "projected.ls",
                   ploidy = 1)

# Plot error
plot(tess3.obj, pch = 19, col = "blue",
     xlab = "Number of ancestral populations",
     ylab = "Cross-validation score")

# Retrieve the Q-matrix for K = 3 clusters
q.matrix <- qmatrix(tess3.obj, K = 3)

## STRUCTURE-like barplot for the Q-matrix
barplot(q.matrix, border = NA, space = 0,
       xlab = "Individuals", ylab = "Ancestry proportions",
       main = "Ancestry matrix") -> bp
axis(1, at = 1:nrow(q.matrix), labels = bp$order, las = 3, cex.axis = .4)

## Spatial interpolation of ancestry coefficient
my.colors <- c("tomato", "orange", "lightblue")
my.palette <- CreatePalette(my.colors, 9)
plot(q.matrix, coordinates, method = "map.max", interpol = FieldsKrigModel(10),
     main = "Ancestry coefficients",
     xlab = "Longitude", ylab = "Latitude",
     resolution = c(500,500), cex = .4,
     col.palette = my.palette)

## Genome scan p-values for K = 3
p.values <- pvalue(tess3.obj, K = 3)
hist(p.values, col = "lightblue")

## Manhatan plot
plot(p.values, main = "Manhattan plot",
    xlab = "Locus id",
    ylab = "-log10(P-values)",
    cex = .3, col = "grey")

cayek/TESS3_encho_sen documentation built on July 4, 2023, 7:51 p.m.