Coding-variant Allelic Series Test

Updated: 2025-03-26

knitr::opts_chunk$set(echo = TRUE)

Data

To run an allelic series test, there are 4 key inputs:

The example data used below were generated using the DGP function provided with the package. The data set includes 100 subjects, 300 variants, and a continuous phenotype. The true effect sizes follow an allelic series, with magnitudes proportional to c(1, 2, 3) for BMVs, DMVs, and PTVs respectively.

set.seed(101)
n <- 100
data <- AllelicSeries::DGP(
  n = n,
  snps = 300,
  beta = c(1, 2, 3) / sqrt(n),
)

# Annotations.
anno <- data$anno
head(anno)

# Covariates.
covar <- data$covar
head(covar)

# Genotypes.
geno <- data$geno
head(geno[,1:5])

# Phenotype.
pheno <- data$pheno
head(pheno)

The example data generated by the preceding are available under vignettes/vignette_data.

Running the alleic series test

The COding-variant Allelic Series Test (COAST) is run using the COAST function. By default, the output of COAST includes a data.frame of counts showing the number of alleles, variants, and carriers in each class that contributed to the test, and a data.frame of p-values, with the omni test denoting the final, overall p-value. Inspection of the component p-values may be useful for determining which model(s) detected evidence of an association. In the present case, the baseline count model provided the greatest power.

results <- AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar
)
show(results)

The effect sizes data.frame is accessed via:

results@Betas

the counts data.frame via:

results@Counts

and the p-values data.frame via:

results@Pvals

Ultra-rare variants

Effect size estimation for ultra-rare variants is challenging due to the scarcity of carriers on which to base the estimate. Following works such as SAIGE-GENE+, we recommend collapsing variants with a minor allele count (MAC) $\leq 10$ into an aggregate variant, separately within each annotation category. The CollapseGeno utility is provided for this purpose. The min_mac specifies the threshold for retention as an individual variant (e.g. min_mac = 11 will collapse variants with a MAC $\leq 10$). The output is a list containing the annotation vector anno and genotype matrix geno after collapsing. In addition, a mapping vars is provided showing which variants were collapsed to create each aggregate variant. Note that:

set.seed(102)

# Generate data.
n <- 1e3
data <- AllelicSeries::DGP(
  n = n,
  snps = 10,
  prop_anno = c(1, 1, 1) / 3
)

# Collapse ultra-rare variants.
collapsed <- AllelicSeries::CollapseGeno(
  anno = data$anno, 
  geno = data$geno,
  min_mac = 11
)

# Variants collapsed to form each aggregate variant.
head(collapsed$vars)
# Run COAST on the collapsed data.
results <- AllelicSeries::COAST(
  anno = collapsed$anno,
  covar = data$covar,
  geno = collapsed$geno,
  pheno = data$pheno,
  min_mac = 10
)

Different numbers of annotation categories

COAST was originally intended to operate on the benign missense variants, damaging missense variants, and protein truncating variants within a gene, but it has been generalized to allow for an arbitrary number of discrete annotation categories. The following example simulates and analyzes data with 4 annotation categories. The main difference when analyzing a different number of annotation categories is that the weight vector should be specified, and should have length equal to the number of possible annotation categories. COAST will run, albeit with a warning, if there are possible annotation categories to which no variants are assigned (e.g. a gene contains no PTVs).

set.seed(102)

# Generate data.
n <- 1e2
data <- AllelicSeries::DGP(
  n = n,
  snps = 400,
  beta = c(1, 2, 3, 4) / sqrt(n),
  prop_anno = c(0.4, 0.3, 0.2, 0.1),
  weights = c(1, 1, 1, 1)
)

# Run COAST-SS.
results <- AllelicSeries::COAST(
  anno = data$anno,
  covar = data$covar,
  geno = data$geno,
  pheno = data$pheno,
  weights = c(1, 2, 3, 4)
)
show(results)

Test options

AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  apply_int = TRUE
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  include_orig_skato_all = TRUE,
  include_orig_skato_ptv = TRUE,
  ptv_anno = 3
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = 1 * (pheno > 0),
  covar = covar,
  is_pheno_binary = TRUE
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  min_mac = 2
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  return_omni_only = TRUE
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  score_test = TRUE
)
AllelicSeries::COAST(
  anno = anno,
  geno = geno,
  pheno = pheno,
  covar = covar,
  weights = c(1, 2, 3)
)


Try the AllelicSeries package in your browser

Any scripts or data that you put into this service are public.

AllelicSeries documentation built on April 3, 2025, 7:46 p.m.