summix_local | R Documentation |
Estimates local substructure mixture proportions in genetic summary data. Optionally performs a selection scan that identifies potential regions of selection along the given chromosome.
summix_local(
data,
reference,
observed,
goodness.of.fit = TRUE,
type = "variants",
algorithm = "fastcatch",
minVariants = 0,
maxVariants = 0,
maxWindowSize = 0,
minWindowSize = 0,
windowOverlap = 200,
maxStepSize = 1000,
diffThreshold = 0.02,
NSimRef = NULL,
override_fit = FALSE,
override_removeSmallAnc = FALSE,
selection_scan = FALSE,
position_col = "POS",
nSimSE = 1000
)
data |
a data frame of the observed group and reference group allele frequencies for N genetic variants on a single chromosome. Must contain a column specifying the genetic variant positions. |
reference |
a character vector of the column names for K reference groups. |
observed |
a character value that is the column name for the observed group. |
goodness.of.fit |
an option to choose between the scaled objective and the raw loss from SLSQP. Default is TRUE, which returns the scaled objective; set to FALSE to return the raw loss instead. |
type |
user choice of how to define window size; options "variants" and "bp" are available where "variants" defines window size as the number of variants in a given window and "bp" defines window size as the number of base pairs in a given window. Default is "variants". |
algorithm |
user choice of algorithm to define local substructure blocks; options "fastcatch" and "windows" are available. "windows" uses a fixed window in a sliding windows algorithm. "fastcatch" allows dynamic window sizes. The "fastcatch" algorithm is recommended, though it is computationally slower. Default is "fastcatch". |
minVariants |
Used if algorithm = "fastcatch" and type = "variants". A numeric value that specifies the minimum number of genetic variants allowed to define a given window. |
maxVariants |
Used if type = "variants". A numeric value that specifies the maximum number of genetic variants allowed to define a given window. |
maxWindowSize |
Used if type = "bp". A numeric value that defines the maximum allowed window size by the number of base pairs in a given window. |
minWindowSize |
Used if algorithm = "fastcatch" and type = "bp". A numeric value that specifies the minimum number of base pairs allowed to define a given window. |
windowOverlap |
Used if algorithm = "windows". A numeric value that defines the number of variants or the number of base pairs that overlap between the given sliding windows. Default is 200. |
maxStepSize |
a numeric value that defines the maximum gap in base pairs between two consecutive genetic variants within a given window. Default is 1000. |
diffThreshold |
Used if algorithm = "fastcatch". A numeric value that defines the percent difference threshold to mark the end of a local substructure block. Default is 0.02. |
NSimRef |
Used if selection_scan = TRUE. A numeric vector of the sample sizes for each of the K reference groups, in the same order as the reference parameter. This is used in a simulation framework that calculates the standard error within each local substructure block. |
override_fit |
default is FALSE. If set as TRUE, the user will override the auto-stop of summix_local() that occurs if the global goodness of fit value is greater than 1.5 (indicating a poor fit of the reference data to the observed data). |
override_removeSmallAnc |
default is FALSE. If set as TRUE, the user will override the automatic removal of reference ancestries with <2% global proportions – this is not recommended. |
selection_scan |
user option to perform a selection scan on the given chromosome. Default is FALSE. If set as TRUE, a test statistic will be calculated for each local substructure block. Note: the user can expect extended computation time if this option is set as TRUE. |
position_col |
a character value that is the column name for the genetic variants positions. Default is "POS". |
nSimSE |
user choice of number of internal simulations to run to calculate standard error of estimates. Default is 1000. |
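As an illustration of the inputs described above, the following base R sketch builds a small toy data frame and checks it against the documented argument requirements before any call to summix_local(). The column names (ref_AF_a, ref_AF_b, obs_AF), the allele frequencies, the sample sizes, and the helper check_local_inputs() are hypothetical and not part of the package; only the "POS" default for position_col and the maxStepSize default of 1000 are taken from the descriptions above.

```r
# Toy input: one row per variant on a single chromosome (values are made up).
toy <- data.frame(
  POS      = c(1000, 1400, 2100, 3900, 4200),
  ref_AF_a = c(0.10, 0.25, 0.40, 0.05, 0.60),
  ref_AF_b = c(0.30, 0.20, 0.35, 0.15, 0.50),
  obs_AF   = c(0.20, 0.22, 0.38, 0.10, 0.55)
)

# Hypothetical helper: verifies the documented input requirements.
check_local_inputs <- function(data, reference, observed,
                               position_col = "POS",
                               NSimRef = NULL, maxStepSize = 1000) {
  needed <- c(position_col, reference, observed)
  missing_cols <- setdiff(needed, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # NSimRef, when supplied, needs one sample size per reference group,
  # in the same order as `reference`.
  if (!is.null(NSimRef) && length(NSimRef) != length(reference)) {
    stop("NSimRef must have one entry per reference group")
  }
  # Gaps in base pairs between consecutive variants, sorted by position.
  gaps <- diff(sort(data[[position_col]]))
  list(n_variants   = nrow(data),
       n_large_gaps = sum(gaps > maxStepSize))
}

checked <- check_local_inputs(toy,
                              reference = c("ref_AF_a", "ref_AF_b"),
                              observed  = "obs_AF",
                              NSimRef   = c(500, 500))
```

In this toy data, the 1800 bp gap between positions 2100 and 3900 exceeds maxStepSize, so under the definition above those two variants could not be consecutive members of the same window.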
data frame with a row for each local substructure block and the following columns:
goodness.of.fit: scaled objective reflecting the fit of the reference data. Values between 0.5 and 1.5 indicate a moderate fit and should be used with caution. Values greater than 1.5 indicate a poor fit, and users should not perform further analyses using summix.
iterations: number of iterations for SLSQP algorithm
time: time in seconds of SLSQP algorithm
filtered: number of SNPs not used in estimation due to missing values
K columns of mixture proportions of reference groups input into the function
nSNPs: number of SNPs in the given local substructure block
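The goodness.of.fit cutoffs listed above can be applied to the per-block output with a few lines of base R. This is a minimal sketch: the block values are made up, classify_fit() is a hypothetical helper rather than a package function, and treating values below 0.5 as a good fit (and the boundary values 0.5 and 1.5 as moderate) is an assumption not stated explicitly in the documentation.

```r
# Made-up per-block table with the documented goodness.of.fit and nSNPs columns.
blocks <- data.frame(
  goodness.of.fit = c(0.21, 0.87, 1.62),
  nSNPs           = c(180, 240, 155)
)

# Hypothetical helper applying the documented cutoffs.
# Assumption: < 0.5 is "good"; 0.5 to 1.5 inclusive is "moderate"; > 1.5 is "poor".
classify_fit <- function(gof) {
  ifelse(gof > 1.5, "poor",
         ifelse(gof >= 0.5, "moderate", "good"))
}

blocks$fit <- classify_fit(blocks$goodness.of.fit)

# Blocks that the documentation advises against analyzing further.
poor_blocks <- blocks[blocks$fit == "poor", ]
```

Keeping the label as a separate column, rather than dropping rows immediately, leaves the moderate-fit blocks visible so they can be handled with the caution the documentation recommends.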
Hayley Wolff (Stoneman), hayley.wolff@cuanschutz.edu
Audrey Hendricks, audrey.hendricks@cuanschutz.edu
https://github.com/hendriau/Summix2
https://github.com/hendriau/Summix2 for further documentation.
data(ancestryData)
results <- summix_local(data = ancestryData,
                        reference = c("reference_AF_afr",
                                      "reference_AF_eas",
                                      "reference_AF_eur",
                                      "reference_AF_iam",
                                      "reference_AF_sas"),
                        NSimRef = c(704, 787, 741, 47, 545),
                        observed = "gnomad_AF_afr",
                        goodness.of.fit = TRUE,
                        type = "variants",
                        algorithm = "fastcatch",
                        minVariants = 150,
                        maxVariants = 250,
                        maxStepSize = 1000,
                        diffThreshold = 0.02,
                        override_fit = FALSE,
                        override_removeSmallAnc = TRUE,
                        selection_scan = FALSE,
                        position_col = "POS")
print(results$results)