summix | R Documentation |
Estimates mixture proportions of reference groups from large (N SNPs > 10,000) genetic allele frequency (AF) data.
summix(
data,
reference,
observed,
pi.start = NA,
goodness.of.fit = TRUE,
override_removeSmallRef = FALSE,
network = FALSE,
N_reference = NA,
reference_colors = NA
)
data |
A dataframe of the observed and reference allele frequencies for N genetic variants. See data formatting document at https://github.com/hendriau/Summix for more information. |
reference |
A character vector of the column names for the reference groups. |
observed |
A character value that is the column name for the observed group. |
pi.start |
Length K numeric vector of the starting guess for the reference group proportions. If not specified, this defaults to 1/K where K is the number of reference groups. |
goodness.of.fit |
Default value is TRUE. If set to FALSE, the default goodness-of-fit measure is overridden and the raw objective loss from slsqp is returned instead. |
override_removeSmallRef |
Default value is FALSE. If set to TRUE, the automatic removal of reference groups with global proportions below 1% is overridden; this is not recommended. |
network |
Default value is FALSE. If set to TRUE, the function returns a network diagram in which nodes are the estimated substructure proportions and edges are the degree of similarity between each pair of nodes. |
N_reference |
A numeric vector of the sample sizes for each of the K reference groups; must be specified if network = TRUE. |
reference_colors |
A character vector of length K that specifies the color of each reference group node in the network plot. If not specified, this defaults to K random colors. |
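The arguments above can be prepared and checked before calling summix. The following is a minimal sketch in base R, not part of the package: the data frame `toy`, its column names, and its values are made up purely for illustration. It builds the equal 1/K starting guess described for pi.start and counts rows with missing values, on the assumption that these correspond to the variants reported as filtered in the output.

```r
# Toy allele-frequency table; names and values are hypothetical.
toy <- data.frame(
  ref_A = c(0.10, 0.50, 0.90, NA),
  ref_B = c(0.30, 0.40, 0.20, 0.60),
  obs   = c(0.20, 0.45, 0.55, 0.50)
)
reference <- c("ref_A", "ref_B")
observed <- "obs"

# Every column named in `reference` and `observed` must exist in the data.
stopifnot(all(c(reference, observed) %in% names(toy)))

# Equal starting guess: 1/K for each of the K reference groups.
K <- length(reference)
pi.start <- rep(1 / K, K)

# Rows with a missing value in any of the columns that would be used
# (assumed here to match what summix reports as filtered).
n_missing <- sum(!complete.cases(toy[, c(reference, observed)]))
```

Passing a `pi.start` built this way should be equivalent to leaving the argument unspecified, since the documented default is 1/K.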
A data frame with the following columns:
goodness.of.fit: scaled objective loss from slsqp() reflecting the fit of the reference data. Values between 0.5 and 1.5 indicate moderate fit, and the results should be used with caution. Values greater than 1.5 indicate poor fit, and users should not perform further analyses using Summix.
iterations: number of iterations of the SLSQP algorithm
time: run time of the SLSQP algorithm, in seconds
filtered: number of genetic variants not used in the reference group mixture proportion estimation due to missing values.
K columns containing the estimated mixture proportions, one for each reference group passed to the function
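The goodness.of.fit thresholds above can be turned into a simple check. The helper below is a hypothetical sketch, not a function provided by Summix. It assumes that values below 0.5 indicate good fit (the text only names the moderate and poor ranges) and that the boundary values 0.5 and 1.5 count as moderate.

```r
# Hypothetical helper: label goodness.of.fit values using the
# documented cutoffs (0.5 and 1.5). Boundary handling is an assumption.
classify_fit <- function(gof) {
  ifelse(gof > 1.5, "poor",
         ifelse(gof >= 0.5, "moderate", "good"))
}

# Example with made-up goodness-of-fit values.
classify_fit(c(0.2, 1.0, 2.3))
```

A label of "poor" corresponds to the case where further analyses with Summix are discouraged.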
Adelle Price, adelle.price@cuanschutz.edu
Hayley Wolff, hayley.wolff@cuanschutz.edu
Audrey Hendricks, audrey.hendricks@cuanschutz.edu
https://github.com/hendriau/Summix
https://github.com/hendriau/Summix for further documentation, and https://github.com/hendriau/Summix2_manuscript for a larger sample data set and a description of the simulations in the Summix2 manuscript.
The slsqp function in the nloptr package for further details on Sequential Quadratic Programming: https://www.rdocumentation.org/packages/nloptr/versions/1.2.2.2/topics/slsqp
# load the data
data("ancestryData")
# Estimate 5 reference ancestry proportion values for the gnomAD African/African American group
# using a starting guess of .2 for each ancestry proportion.
summix(data = ancestryData,
       reference = c("reference_AF_afr",
                     "reference_AF_eas",
                     "reference_AF_eur",
                     "reference_AF_iam",
                     "reference_AF_sas"),
       observed = "gnomad_AF_afr",
       pi.start = c(.2, .2, .2, .2, .2),
       goodness.of.fit = TRUE)