summix: summix
In hendriau/Summix: Summix2: A suite of methods to estimate, adjust, and leverage substructure in genetic summary data

Description

Estimates mixture proportions of reference groups from large (more than 10,000 SNPs) genetic allele frequency (AF) data.
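To make the idea of a mixture proportion concrete, the following base R sketch simulates an observed group as a known mixture of three reference groups and then recovers the proportions. It is an illustration only, not the package's implementation: it assumes a least-squares objective, and it replaces the SLSQP solver with `optim()` plus a softmax reparameterization that keeps the proportions non-negative and summing to 1. All names (`ref_A`, `true_pi`, `softmax`, and so on) are hypothetical.

```r
set.seed(1)

# Simulated allele frequencies for 2000 variants in 3 hypothetical reference groups.
n_snps <- 2000
ref <- cbind(ref_A = runif(n_snps), ref_B = runif(n_snps), ref_C = runif(n_snps))

# The observed group is built as an exact mixture of the references.
true_pi <- c(0.6, 0.3, 0.1)
obs <- as.vector(ref %*% true_pi)

# Softmax maps unconstrained parameters to proportions (>= 0, sum to 1).
softmax <- function(theta) {
  e <- exp(theta - max(theta))
  e / sum(e)
}

# ASSUMPTION: squared-error loss between observed and mixed reference AFs.
loss <- function(theta) sum((obs - ref %*% softmax(theta))^2)

# theta = 0 corresponds to a starting guess of 1/K for every group.
fit <- optim(par = rep(0, ncol(ref)), fn = loss, method = "BFGS")

est_pi <- softmax(fit$par)
names(est_pi) <- colnames(ref)
round(est_pi, 3)
```

With noise-free simulated data the estimates should land close to `true_pi`; real allele frequency data will not fit this exactly.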

Usage

summix(
  data,
  reference,
  observed,
  pi.start = NA,
  goodness.of.fit = TRUE,
  override_removeSmallRef = FALSE,
  network = FALSE,
  N_reference = NA,
  reference_colors = NA
)

Arguments

`data`	A data frame of the observed and reference allele frequencies for N genetic variants. See the data formatting document at https://github.com/hendriau/Summix for more information.
`reference`	A character vector of the column names for the reference groups.
`observed`	A character value that is the column name for the observed group.
`pi.start`	Length K numeric vector of the starting guess for the reference group proportions. If not specified, this defaults to 1/K where K is the number of reference groups.
`goodness.of.fit`	Default value is TRUE. If set to FALSE, the function skips the default goodness-of-fit measure and returns the raw objective loss from slsqp instead.
`override_removeSmallRef`	Default value is FALSE. If set to TRUE, the function skips the automatic removal of reference groups with global proportions below 1%; this is not recommended.
`network`	Default value is FALSE. If set to TRUE, the function returns a network diagram with nodes as estimated substructure proportions and edges as the degree of similarity between each pair of nodes.
`N_reference`	A numeric vector of the sample sizes for each of the K reference groups; must be specified if network = TRUE.
`reference_colors`	A character vector of length K that specifies the color of each reference group node in the network plot. If not specified, this defaults to K random colors.
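Before calling the function, it can help to confirm that the inputs match the argument descriptions above. The sketch below uses a hypothetical helper, `check_af_inputs()`, which is not part of Summix, together with a toy data frame whose column names (`ref_A`, `ref_B`, `obs`) are made up for illustration. It also builds the starting guess that the documentation describes as the default, 1/K for each of the K reference groups.

```r
# Hypothetical helper (not part of Summix): basic sanity checks on the inputs.
check_af_inputs <- function(data, reference, observed) {
  cols <- c(reference, observed)
  missing_cols <- setdiff(cols, names(data))
  if (length(missing_cols) > 0) {
    stop("Columns not found in data: ", paste(missing_cols, collapse = ", "))
  }
  vals <- unlist(data[cols], use.names = FALSE)
  if (!is.numeric(vals)) {
    stop("Allele frequency columns must be numeric.")
  }
  if (any(vals < 0 | vals > 1, na.rm = TRUE)) {
    stop("Allele frequencies must lie in [0, 1].")
  }
  invisible(TRUE)
}

# Toy data: 3 variants, 2 made-up reference groups, 1 observed group.
toy <- data.frame(
  ref_A = c(0.10, 0.50, 0.90),
  ref_B = c(0.30, 0.20, 0.60),
  obs   = c(0.20, 0.35, 0.75)
)
reference <- c("ref_A", "ref_B")
check_af_inputs(toy, reference, observed = "obs")

# Starting guess equal to the documented default: 1/K per reference group.
K <- length(reference)
pi_start <- rep(1 / K, K)
```

A real analysis needs far more than three variants; the toy data only shows the expected layout of one row per variant and one column per group.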

Value

A data frame with the following columns:

goodness.of.fit: scaled objective loss from slsqp() reflecting the fit of the reference data. Values between 0.5 and 1.5 indicate moderate fit and should be used with caution. Values greater than 1.5 indicate poor fit; in that case, users should not perform further analyses using Summix.

iterations: number of iterations of the SLSQP algorithm

time: run time of the SLSQP algorithm, in seconds

filtered: number of genetic variants not used in the reference group mixture proportion estimation due to missing values.

K columns containing the mixture proportions, one for each reference group passed to the function
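The goodness.of.fit thresholds described above can be turned into a small labeling helper. This is a sketch with a hypothetical function name, `classify_fit()`, that is not part of Summix. Two points are assumptions rather than documented behavior: the boundary values 0.5 and 1.5 are both treated as moderate, and values below 0.5 are simply labeled "below 0.5" because the documentation does not name that range.

```r
# Hypothetical helper (not part of Summix): label goodness.of.fit values
# using the thresholds from the Value section.
classify_fit <- function(gof) {
  ifelse(gof > 1.5, "poor",
         ifelse(gof >= 0.5, "moderate", "below 0.5"))
}

# Made-up values, chosen to cover each range and both boundaries.
fit_labels <- classify_fit(c(0.2, 0.5, 1.2, 1.5, 2.3))
fit_labels
```

A label of "moderate" means the estimates should be used with caution, and "poor" means further analysis with Summix is not advised.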

Author(s)

Adelle Price, adelle.price@cuanschutz.edu

Hayley Wolff, hayley.wolff@cuanschutz.edu

Audrey Hendricks, audrey.hendricks@cuanschutz.edu

References

https://github.com/hendriau/Summix

Examples

# load the data
data("ancestryData")

# Estimate the proportions of 5 reference ancestries for the gnomAD
# African/African American group, using a starting guess of 0.2 for each proportion.
summix(data = ancestryData,
    reference = c("reference_AF_afr",
        "reference_AF_eas",
        "reference_AF_eur",
        "reference_AF_iam",
        "reference_AF_sas"),
    observed = "gnomad_AF_afr",
    pi.start = c(0.2, 0.2, 0.2, 0.2, 0.2),
    goodness.of.fit = TRUE)

hendriau/Summix documentation built on Nov. 13, 2024, 6:53 a.m.