AdmixGlobal: Expectation-Maximization algorithm for global (genome-wide)...

View source: R/AdmixGlobal.R

AdmixGlobal R Documentation

Expectation-Maximization algorithm for global (genome-wide) admixture inference

Description

This function performs global (genome-wide) admixture inference from discrete or continuous allele dosages.

Usage

AdmixGlobal(
  Geno,
  K,
  P = NULL,
  ParamToUpdate = "both",
  MaxIter = 200L,
  MinIter = 10L,
  MinParamBound = 1e-06,
  LogLikThresh = 0.001,
  PropInit = NULL,
  FreqInit = NULL,
  NbThreads = 0L,
  Seed = 123L,
  Verbose = TRUE
)

Arguments

Geno

List of size M (number of markers) of genotyping matrices, each of dimension N (number of individuals) x L (number of alleles), whose elements consist of discrete or continuous allele dosages

K

Number of ancestral groups (integer greater than or equal to 2)

P

Ploidy level (positive integer), to be used when read depth ratios are specified in Geno, either as a single value to specify the same ploidy for all individuals, or as a vector of size N

ParamToUpdate

Specify "both", "Prop" or "Freq" to update both admixture proportions and ancestral allele frequencies, only admixture proportions, or only ancestral allele frequencies, respectively

MaxIter

Maximum number of iterations (positive integer greater than or equal to MinIter)

MinIter

Minimum number of iterations (positive integer greater than or equal to 2 and smaller than or equal to MaxIter)

MinParamBound

Minimum value for admixture proportions and ancestral allele frequencies (positive numeric value)

LogLikThresh

Algorithm convergence criterion (positive numeric value) consisting of a log-likelihood difference value between two iterations

PropInit

Matrix of dimension N x K with initial admixture proportions

FreqInit

List of size M (number of markers) of matrices, each of dimension K x L, with initial allele frequencies in ancestral groups

NbThreads

Number of threads to be used (non-negative integer); the default value of 0 automatically uses all available threads

Seed

Seed for reproducible inference (integer)

Verbose

Logical value indicating whether detailed information should be printed

Details

The function AdmixGlobal() performs global (genome-wide) admixture inference from genotyping data (Geno) formatted as a list of matrices (one for each marker), by specifying the number of ancestral groups (K).
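The layout expected for Geno can be illustrated with a small base R sketch. The dimensions, the allele names A1 and A2, and the rule that dosages sum to the ploidy within each individual are assumptions made for this toy example, not values taken from the package.

```r
# Toy dimensions (hypothetical values chosen for illustration)
M <- 3L   # number of markers
N <- 4L   # number of individuals
P <- 4L   # ploidy used to draw the toy dosages

set.seed(1)
Geno <- lapply(seq_len(M), function(m) {
  # Dosage of the first allele, between 0 and P
  d1 <- sample(0:P, N, replace = TRUE)
  # Remaining copies go to the second allele, so each row sums to P
  cbind(A1 = d1, A2 = P - d1)
})

length(Geno)         # M: one matrix per marker
dim(Geno[[1]])       # N x L: individuals in rows, alleles in columns
rowSums(Geno[[1]])   # total dosage per individual at the first marker
```

A list built this way has one element per marker, which is the shape described for the Geno argument.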

The inference is performed using an expectation-maximization (EM) algorithm whose minimum (MinIter) and maximum (MaxIter) number of iterations can be fixed by the user, as well as the log-likelihood convergence threshold (LogLikThresh). A minimum value for admixture proportions and ancestral allele frequencies is set using MinParamBound, which should be above zero to prevent computational issues. By default, all available threads (CPU cores) are used, but the number can be chosen using NbThreads.
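How MinIter, MaxIter and LogLikThresh might interact can be sketched as a stopping rule in base R. The helper HasConverged() is hypothetical, and the rule it encodes (absolute log-likelihood difference between two consecutive iterations) is an assumption; the package may implement its criterion differently.

```r
# Hypothetical stopping rule, NOT the package's internal code.
# LogLik holds the log-likelihood values of the iterations run so far.
HasConverged <- function(LogLik, MinIter = 10L, MaxIter = 200L,
                         LogLikThresh = 0.001) {
  it <- length(LogLik)
  # Always stop once the iteration budget is exhausted
  if (it >= MaxIter) return(TRUE)
  # Never stop before MinIter iterations (two values are needed for a difference)
  if (it < max(MinIter, 2L)) return(FALSE)
  # Assumed criterion: small absolute change between consecutive iterations
  abs(LogLik[it] - LogLik[it - 1L]) < LogLikThresh
}

# Synthetic log-likelihood curve that increases and then flattens
Trace <- -1000 - 500 * exp(-seq_len(200) / 5)

# First iteration at which the assumed rule would stop
Stops <- vapply(seq_along(Trace),
                function(i) HasConverged(Trace[seq_len(i)]),
                logical(1))
StopAt <- which(Stops)[1]
StopAt
```

The same kind of check can be applied after the fact to the LogLik vector returned by AdmixGlobal() to see how quickly the log-likelihood flattened.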

If allele dosages are used, the ploidy level (P) does not have to be specified as it is automatically calculated from the dosages. However, it must be specified if the genotypic data correspond to ratios constrained between 0 and 1 (e.g. obtained from allele read depths).
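The difference between the two input types can be seen on a toy marker. The sketch below assumes that the ploidy is recovered as the total dosage per individual, which is one plausible reading of "automatically calculated"; the object names are illustrative only.

```r
# Dosages at one marker: rows are individuals, columns are alleles
Dosage <- rbind(c(4, 2, 0),
                c(1, 3, 2))

# Assumption: the ploidy is the total dosage carried by each individual
PloidyFromDosage <- rowSums(Dosage)
PloidyFromDosage

# Ratios constrained between 0 and 1 no longer carry that information
Ratio <- Dosage / rowSums(Dosage)
rowSums(Ratio)   # equal to 1 for every individual, whatever the ploidy

# The ploidy then has to be supplied separately, here as a vector of size N
P <- c(6L, 6L)
ExpectedDosage <- Ratio * P   # continuous dosages implied by the ratios
ExpectedDosage
```

Because each row of Ratio sums to 1 regardless of ploidy, a diploid and a hexaploid individual can produce the same ratios, which is why P is required in that case.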

By default, the EM algorithm is initialized with random parameter values but admixture proportions and ancestral allele frequencies can be initialized using PropInit and FreqInit, respectively.
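A minimal sketch of user-supplied starting values is given below, with dimensions read from the genotype list. The toy Geno, the helper RandomRows(), and the choice to make each row sum to 1 are assumptions for illustration; the call to AdmixGlobal() is left as a comment because it requires the package and real data.

```r
# Toy genotype list standing in for real data (hypothetical): 5 tetraploid
# markers, 6 individuals, 3 alleles per marker
set.seed(123)
Geno <- replicate(5,
                  t(rmultinom(6, size = 4, prob = c(0.5, 0.3, 0.2))),
                  simplify = FALSE)
K <- 2L

# Hypothetical helper: strictly positive random rows that each sum to 1
RandomRows <- function(nr, nc) {
  x <- matrix(runif(nr * nc, min = 0.1, max = 1), nrow = nr, ncol = nc)
  x / rowSums(x)
}

N <- nrow(Geno[[1]])
PropInit <- RandomRows(N, K)                                   # N x K
FreqInit <- lapply(Geno, function(g) RandomRows(K, ncol(g)))   # M matrices, K x L

# These objects could then be passed to the documented arguments:
# Res <- AdmixGlobal(Geno = Geno, K = K, PropInit = PropInit, FreqInit = FreqInit)
```

Reading N and L from Geno rather than hard-coding them keeps the starting values aligned with the data when markers differ in their number of alleles.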

By default, both types of parameters are updated by the EM algorithm, but one of them can be held fixed while the other is updated using ParamToUpdate. This is particularly useful when allele frequencies have been estimated on a reference panel (e.g. stored in a list FreqRef) and are used to estimate the admixture proportions of a new panel of individuals. This scenario can be implemented by initializing allele frequencies (FreqInit=FreqRef) and specifying that only admixture proportions are updated (ParamToUpdate="Prop").
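The reference-panel scenario might look like the sketch below. It uses only the arguments documented on this page; the package name AdmixPoly, the helper SubsetIndividuals(), and the split of a simulated population into 15 reference and 5 new individuals are assumptions made for illustration. The package calls are skipped when the package is not installed.

```r
# Hypothetical helper: keep a subset of individuals (rows) at every marker
SubsetIndividuals <- function(Geno, Idx) {
  lapply(Geno, function(g) g[Idx, , drop = FALSE])
}

# The calls below run only when the package is available
if (requireNamespace("AdmixPoly", quietly = TRUE)) {
  library(AdmixPoly)

  DataSim <- SimulatePop(K=3L, N=20L, P=6L, M=50L, C=5L, L=10L, Seed=123, NbThreads=1)
  GenoRef <- SubsetIndividuals(DataSim$Geno, 1:15)
  GenoNew <- SubsetIndividuals(DataSim$Geno, 16:20)

  # Step 1: estimate proportions and frequencies on the reference panel
  ResRef <- AdmixGlobal(Geno=GenoRef, K=3, Verbose=FALSE, NbThreads=1)
  FreqRef <- ResRef$Freq

  # Step 2: start from FreqRef and update only the admixture proportions
  ResNew <- AdmixGlobal(Geno=GenoNew, K=3, FreqInit=FreqRef,
                        ParamToUpdate="Prop", Verbose=FALSE, NbThreads=1)
  head(ResNew$Prop)
}
```

The new panel must be genotyped at the same markers and alleles as the reference panel so that each matrix in FreqRef matches the corresponding matrix in GenoNew.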

Value

A list of three items: a matrix of admixture proportions (Prop), a list of matrices of allele frequencies in ancestral groups (Freq), and a vector of log-likelihood values over iterations (LogLik)

See Also

  • SimulatePop() to simulate a polyploid admixed population.

  • AdmixLocal() to perform local admixture inference using the results from the AdmixGlobal() function.

  • GlobalPlot() to generate an admixture barplot using the results from the AdmixGlobal() function.

Examples

## Simulate a polyploid admixed population
DataSim <- SimulatePop(K=3L, N=10L, P=6L, M=50L, C=5L, L=10L, Seed=123, NbThreads=1)

## Perform global admixture inference
ResGlobalAdmix <- AdmixGlobal(Geno=DataSim$Geno, K=3, Verbose=FALSE, NbThreads=1)

## Estimated admixture proportions
head(ResGlobalAdmix$Prop)

## Estimated allele frequencies at first marker
ResGlobalAdmix$Freq[[1]]

## Log-likelihood over iterations
head(ResGlobalAdmix$LogLik)

AdmixPoly documentation built on June 18, 2026, 1:06 a.m.