tess3Main: Estimate ancestry coefficients and run genome scans for...

View source: R/tess3.R

tess3Main    R Documentation

Estimate ancestry coefficients and run genome scans for selection

Description

tess3Main estimates spatial population structure using a graph-based non-negative matrix factorization. The estimated population structure is then used to compute an Fst statistic for each locus. See the references for more details.

Usage

tess3Main(X, XProba = NULL, coord, K, ploidy, lambda = 1, W = NULL,
  method = "projected.ls", max.iteration = 200, tolerance = 1e-05,
  openMP.core.num = 1, Q.init = NULL, mask = 0, copy = TRUE,
  algo.copy = TRUE, verbose = FALSE)

Arguments

X

a numeric matrix which corresponds to the genotype matrix. This matrix must be of size n × L, where n is the number of individuals and L is the number of loci. Values of this matrix are integers corresponding to the number of variant alleles observed at a locus. If NULL, XProba is used.

XProba

a numeric matrix which corresponds to genotype likelihoods (probabilities). This matrix must be of size n × (ploidy + 1)L, where n is the number of individuals and L is the number of loci. Entries of this matrix are numeric values between 0 and 1 corresponding to genotype probabilities. If NULL, this matrix is computed from the genotype matrix X. See the references for details.

coord

a numeric matrix of size n × 2, where n is the number of individuals. It contains the geographic coordinates (Longitude, Latitude) of all sampled individuals.

K

an integer corresponding to the number of ancestral populations.

ploidy

an integer which corresponds to the ploidy, i.e. the number of copies of each chromosome.

lambda

a nonnegative numeric value which corresponds to the spatial regularization parameter.

W

a numeric matrix which corresponds to the graph weight matrix. If NULL, W is computed as W[i,j] = exp(-(coord[i] - coord[j])^2 / sigma^2), where coord[i] is the set of geographic coordinates for individual i and sigma equals 5 percent of the average geographic distance between individuals.

method

"projected.ls" or "qp". If "projected.ls", an alternating projected least squares algorithm is used. If "qp", an alternating quadratic programming algorithm is used. See references for details.

max.iteration

the maximum number of iterations in the optimization algorithm.

tolerance

a numeric value which corresponds to the tolerance parameter in the stopping criterion of the optimization algorithm.

openMP.core.num

number of cores used by the algorithm. This requires OpenMP to be installed.

Q.init

a numeric matrix which corresponds to the initial value of Q for the algorithm.

mask

if not NULL, this numeric value is the proportion of genotype matrix entries that are masked when computing the cross-validation criterion.

copy

if TRUE, data will be copied once.

algo.copy

if TRUE, data will be copied in order to speed up the algorithm.

verbose

if TRUE, run information is printed.
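The default graph weight matrix described for the W argument can be sketched in base R as follows. This is only an illustration on made-up coordinates, not the package's own code: it assumes that (coord[i] - coord[j])^2 denotes the squared Euclidean distance between two individuals, and that the average distance is taken over pairs of distinct individuals.

```r
# Toy coordinates (Longitude, Latitude) for four hypothetical individuals.
coord <- matrix(c(0, 0,
                  1, 0,
                  0, 1,
                  5, 5), ncol = 2, byrow = TRUE)

# Pairwise Euclidean distances between individuals (n x n matrix).
d <- as.matrix(dist(coord))

# sigma: 5 percent of the average distance between individuals
# (assumption: averaged over pairs i != j).
sigma <- 0.05 * mean(d[upper.tri(d)])

# Gaussian kernel weights: W[i, j] = exp(-d[i, j]^2 / sigma^2).
W <- exp(-d^2 / sigma^2)
```

A matrix built this way could be passed through the W argument; with such a small sigma, weights decay quickly, so only close neighbours receive non-negligible weight.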

Value

An object of class tess3Main which is a list with the following attributes:

L

the number of loci

n

the number of individuals

ploidy

the number of copies of chromosomes.

K

the number of ancestral populations.

G

the ancestral genotype frequency matrix.

Q

the ancestry coefficient matrix.

Fst

Fst statistic computed at each locus.

Fscore

Fscores computed from the Fst statistics.

pvalue

pvalues computed from the Fscores.

log.pvalue

The log(pvalue).

rmse

root mean square error between XProba and tcrossprod(Q, G).

crossentropy

cross-entropy error between XProba and tcrossprod(Q, G).

crossvalid.rmse

if mask is not NULL, root mean square error between XProba[masked] and tcrossprod(Q, G)[masked].

crossvalid.crossentropy

if mask is not NULL, the cross-entropy error between XProba[masked] and tcrossprod(Q, G)[masked].
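As a rough illustration of the rmse and crossvalid.rmse values listed above, the sketch below computes a root mean square error between a toy XProba and tcrossprod(Q, G). All matrices are random placeholders, and the way masked entries are drawn (uniformly at random) is an assumption; this page does not specify how tess3Main selects them.

```r
set.seed(1)
n <- 5; L <- 4; K <- 2; ploidy <- 1
D <- (ploidy + 1) * L                  # number of columns of XProba

# Hypothetical inputs: Q is n x K with rows summing to 1, G is D x K.
Q <- matrix(runif(n * K), n, K)
Q <- Q / rowSums(Q)
G <- matrix(runif(D * K), D, K)
XProba <- matrix(runif(n * D), n, D)

# Reconstruction Q %*% t(G), same size as XProba (n x D).
XHat <- tcrossprod(Q, G)

# Root mean square error over all entries.
rmse <- sqrt(mean((XProba - XHat)^2))

# Same error restricted to a masked subset of entries
# (assumption: a proportion `mask` of entries drawn uniformly at random).
mask <- 0.1
masked <- sample(length(XProba), size = round(mask * length(XProba)))
crossvalid.rmse <- sqrt(mean((XProba[masked] - XHat[masked])^2))
```

The dimensions follow from the text above: for tcrossprod(Q, G) to be comparable with XProba, G needs one row per column of XProba and one column per ancestral population.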

References

https://hal.archives-ouvertes.fr/hal-01222555/

http://biorxiv.org/content/early/2016/10/12/080291

See Also

tess3

Examples

library(tess3r)

# Arabidopsis thaliana data set
data(data.at)
genotype <- data.at$X
coordinates <- data.at$coord

# Run of tess3 main algorithm
tess3.obj <- tess3Main(X = genotype,
                      coord = coordinates,
                      K = 3,
                      method = "projected.ls",
                      ploidy = 1)

# Run of tess3 main algorithm with cross validation errors computation.
tess3.obj <- tess3Main(X = genotype,
                      coord = coordinates,
                      K = 3,
                      method = "projected.ls",
                      ploidy = 1,
                      mask = 0.05)



cayek/TESS3_encho_sen documentation built on July 4, 2023, 7:51 p.m.