View source: R/standardization.R
standardize | R Documentation |
Standardizes the signal of genotype data by aligning 'theta' values (allelic ratios or normalized intensities) to the expected genotype clusters. It outputs a standardized BAF (B-allele frequency) and a Z-score for each sample and marker.
standardize(
data = NULL,
genos = NULL,
geno.pos = NULL,
threshold.missing.geno = 0.9,
threshold.geno.prob = 0.8,
ploidy.standardization = NULL,
threshold.n.clusters = NULL,
n.cores = 1,
out_filename = NULL,
type = "intensities",
multidog_obj = NULL,
parallel.type = "PSOCK",
verbose = TRUE,
rm_outlier = TRUE,
cluster_median = TRUE
)
data |
A 'data.frame' containing the full dataset with the following columns:
|
genos |
A 'data.frame' containing genotype dosage information for the reference panel. This should include samples of known ploidy and ideally euploid individuals. Required columns:
|
geno.pos |
A 'data.frame' with marker position metadata. Required columns:
|
threshold.missing.geno |
Numeric (0–1). Maximum fraction of missing genotype data allowed per marker. Markers with a higher fraction will be removed. |
threshold.geno.prob |
Numeric (0–1). Minimum genotype call probability threshold. Genotypes with lower probability will be treated as missing. |
ploidy.standardization |
Integer. The ploidy level of the reference panel used for standardization. |
threshold.n.clusters |
Integer. Minimum number of expected dosage clusters per marker. For diploid data, this is typically 3 (corresponding to genotypes 0, 1, and 2). |
n.cores |
Integer. Number of cores to use in parallel computations (e.g., for cluster center estimation and BAF generation). |
out_filename |
Optional. Path to save the final standardized dataset to disk as a CSV file (suitable for Qploidy). |
type |
Character. Type of data used for clustering:
|
multidog_obj |
Optional. An object of class 'multidog' from the 'updog' package, containing model fits and estimated biases. If provided, this will override the 'type' parameter and use the expected cluster positions from 'updog'. |
parallel.type |
Character. Parallel backend to use ('"FORK"' or '"PSOCK"'). '"FORK"' is faster but only works on Unix-like systems. |
verbose |
Logical. If 'TRUE', prints progress and filtering information to the console. |
rm_outlier |
Logical. If 'TRUE', uses Bonferroni-Holm corrected residuals to remove outliers before estimating cluster centers. |
cluster_median |
Logical. If 'TRUE', uses the median of theta values to estimate cluster centers. If 'FALSE', uses the mean. |
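The two filtering thresholds described above ('threshold.geno.prob' and 'threshold.missing.geno') can be illustrated with a minimal base-R sketch. The matrices, marker names, and the 0.5 missing-data threshold below are hypothetical toy values, and the code only mirrors the documented behavior; it is not the package's internal implementation.

```r
# Toy genotype calls: rows are markers, columns are samples (hypothetical data).
dosage <- matrix(c(0, 1, 2, 1,
                   2, 2, 1, 0,
                   1, 0, 0, 2),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("m", 1:3), paste0("s", 1:4)))

# Call probability for each genotype, same layout as 'dosage'.
prob <- matrix(c(0.95, 0.90, 0.99, 0.85,
                 0.50, 0.60, 0.70, 0.95,
                 0.99, 0.75, 0.92, 0.88),
               nrow = 3, byrow = TRUE,
               dimnames = dimnames(dosage))

threshold.geno.prob <- 0.8
threshold.missing.geno <- 0.5  # toy value chosen so that one marker is dropped

# Genotype-level filter: calls below the probability threshold become missing.
dosage[prob < threshold.geno.prob] <- NA

# Marker-level filter: drop markers whose missing fraction exceeds the threshold.
missing_fraction <- rowMeans(is.na(dosage))
kept <- dosage[missing_fraction <= threshold.missing.geno, , drop = FALSE]

print(missing_fraction)
print(rownames(kept))
```

In this toy example, marker 'm2' has three of four calls masked (missing fraction 0.75) and is removed, while 'm1' and 'm3' are kept.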
Reference genotypes are used to estimate cluster centers either from dosage data (e.g., via 'fitpoly' or 'updog') or using an 'updog' 'multidog' object directly. This function supports both array-based (intensity) and sequencing-based (count) data.
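As a rough illustration of estimating cluster centers from reference dosages, the sketch below computes one theta center per dosage class for a single marker, switching between median and mean in the way 'cluster_median' describes, and then checks the number of observed clusters against 'threshold.n.clusters'. The data frame and its column names are hypothetical, and this is an assumption-laden sketch rather than the function's actual code.

```r
# Toy reference panel for one marker (hypothetical values):
# known diploid dosage and observed theta for nine samples.
ref <- data.frame(
  dosage = c(0, 0, 0, 1, 1, 1, 2, 2, 2),
  theta  = c(0.02, 0.04, 0.30, 0.48, 0.50, 0.55, 0.95, 0.97, 0.99)
)

cluster_median <- TRUE
center_fun <- if (cluster_median) median else mean

# One center per dosage class; the median is less affected by the
# stray theta of 0.30 in the dosage-0 group than the mean would be.
centers <- tapply(ref$theta, ref$dosage, center_fun)
print(centers)  # medians: 0.04, 0.50, 0.97

# A diploid marker is expected to show 3 clusters (dosages 0, 1, and 2).
threshold.n.clusters <- 3
enough_clusters <- length(centers) >= threshold.n.clusters
print(enough_clusters)
```

With 'cluster_median <- FALSE' the dosage-0 center shifts from 0.04 to 0.12, which shows why the median can be the more robust choice when a cluster contains outlying samples.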
It applies marker- and genotype-level quality filters, uses parallel computing to estimate BAF, and generates a final annotated output suitable for CNV or dosage-variation analyses.
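The 'n.cores' and 'parallel.type' arguments correspond to the cluster types offered by R's 'parallel' package. The sketch below shows how per-marker work might be distributed over a PSOCK cluster; the marker list and the use of 'median' as the per-marker task are hypothetical stand-ins, not the computation the function actually performs.

```r
library(parallel)

n.cores <- 2
parallel.type <- "PSOCK"  # "FORK" is only available on Unix-like systems

# Hypothetical per-marker input: theta values for three markers.
theta_by_marker <- list(
  m1 = c(0.1, 0.2, 0.3),
  m2 = c(0.4, 0.5, 0.9),
  m3 = c(0.0, 1.0, 0.5)
)

# Start the workers, run one task per marker, and always stop the cluster,
# even if a task fails.
cl <- makeCluster(n.cores, type = parallel.type)
medians <- tryCatch(
  parLapply(cl, theta_by_marker, median),
  finally = stopCluster(cl)
)

print(unlist(medians))
```

PSOCK workers start as fresh R sessions, so any objects or packages a task needs must be sent to them explicitly (for example with 'clusterExport'), whereas FORK workers inherit the parent session's memory.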
An object of class '"qploidy_standardization"' (list) with the following components:
Named vector of standardization parameters.
Named vector summarizing how many markers were removed at each filtering step.
A data.frame containing merged BAF, Z-score, and genotype information by marker and sample.