View source: R/create_knockoffs.R
create_knockoffs | R Documentation |
Generates knockoff variables for genotype data using the multiple knockoff method with leveraging scores and clustering, optimized specifically for genetic variant data.
create_knockoffs(
X,
pos,
chr_info = NULL,
sample_ids = NULL,
M = 5,
save_gds = TRUE,
output_dir = NULL,
start = NULL,
end = NULL,
corr_max = 0.75,
maxN_neighbor = Inf,
maxBP_neighbor = 1e+05,
n_AL = floor(10 * nrow(X)^(1/3) * log(nrow(X))),
thres_ultrarare = 25,
R2_thres = 1,
prob_eps = 1e-12,
irlba_maxit = 1500
)
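The default for n_AL in the signature above depends only on the number of rows of X. As a minimal sketch, the base R snippet below evaluates that same expression for a few sample sizes. The helper name default_n_AL is hypothetical and is not part of the package.

```r
# Hypothetical helper: the n_AL default from the usage signature,
# written as a function of the sample size n = nrow(X).
default_n_AL <- function(n) floor(10 * n^(1/3) * log(n))

n <- c(100, 1000, 5000, 20000)

# Tabulate the default and flag cases where it is larger than n itself.
data.frame(
  n         = n,
  n_AL      = default_n_AL(n),
  exceeds_n = default_n_AL(n) > n
)
```

For small n (for example n = 100) the expression evaluates to a value larger than n. This page does not say how the function treats that case, so it may be worth setting n_AL explicitly for small cohorts.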
X |
A sparse matrix (n x p) of genotype data where n is the number of samples and p is the number of SNPs. Typically coded as 0, 1, 2 for genotype dosages. |
pos |
A numeric vector of SNP positions (in base pairs), one per column of X, used for linkage-disequilibrium-aware knockoff generation. |
chr_info |
Optional chromosome information, given as either (1) a data frame read from a BIM file that contains a column named "chr" or "CHR" holding chromosome numbers, or (2) a vector of chromosome numbers. In either case the chromosome number is extracted automatically. |
sample_ids |
A character vector of sample IDs (default: NULL, in which case IDs are generated automatically). |
M |
Number of knockoff copies to generate (default: 5). More copies can improve statistical power but increase computational cost. |
save_gds |
Whether to save knockoffs to GDS format (default: TRUE) |
output_dir |
Directory to save GDS files (default: NULL, uses tempdir()) |
start |
Start position for file naming (default: min(pos)) |
end |
End position for file naming (default: max(pos)) |
corr_max |
Maximum correlation threshold for clustering variants (default: 0.75). Higher values create fewer, larger clusters. |
maxN_neighbor |
Maximum number of neighboring variants to consider for each variant (default: Inf). |
maxBP_neighbor |
Maximum base pair distance to consider variants as neighbors (default: 100,000 bp). |
n_AL |
Number of samples to use for adaptive lasso fitting (default: automatically determined based on sample size). |
thres_ultrarare |
Minimum minor allele count threshold for variant inclusion (default: 25). |
R2_thres |
R-squared threshold for model fitting (default: 1). |
prob_eps |
Minimum probability value to prevent numerical issues (default: 1e-12). |
irlba_maxit |
Maximum iterations for truncated SVD (default: 1500). |
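The inputs described above can be prepared along the following lines. This is a sketch on simulated toy data, assuming the Matrix package for the sparse matrix. The allele frequencies, dimensions, and the way minor allele count is computed here are illustrative assumptions, not the package's internal definitions.

```r
library(Matrix)  # sparse matrix classes

set.seed(1)
n <- 200  # number of samples (toy size)
p <- 50   # number of SNPs (toy size)

# Simulate 0/1/2 dosages with one allele frequency per SNP (illustrative only).
freq <- runif(p, min = 0.01, max = 0.3)
G <- sapply(freq, function(f) rbinom(n, size = 2, prob = f))
storage.mode(G) <- "double"

X   <- Matrix(G, sparse = TRUE)              # n x p sparse genotype matrix
pos <- as.numeric(sort(sample.int(1e6, p)))  # one base-pair position per SNP

# chr_info, form (2): a plain vector of chromosome numbers.
chr_vec <- rep(1L, p)
# chr_info, form (1): a BIM-like data frame with a "chr" column.
chr_df <- data.frame(chr = chr_vec, pos = pos)

# Assumed definition of minor allele count, for comparison with the
# thres_ultrarare default of 25.
alt_count <- colSums(X)
mac <- pmin(alt_count, 2 * n - alt_count)
table(mac >= 25)
```

Checking the minor allele counts before the call gives a rough idea of how many variants sit below the thres_ultrarare default.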
If save_gds is TRUE, returns the path to the saved GDS file. Otherwise, returns a list of M matrices, each of the same dimensions as X, containing knockoff variables.
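Because the return type depends on save_gds, calling code may want to branch on it. The sketch below uses a hypothetical helper, describe_knockoffs, and stand-in objects in place of real output, so it runs without the package installed.

```r
# Hypothetical helper: summarise either documented return form.
describe_knockoffs <- function(res) {
  if (is.character(res) && length(res) == 1) {
    # save_gds = TRUE: a path to the saved GDS file
    list(kind = "gds_path", path = res, exists = file.exists(res))
  } else if (is.list(res)) {
    # save_gds = FALSE: a list of M matrices with the dimensions of X
    list(kind = "matrices", M = length(res), dims = lapply(res, dim))
  } else {
    stop("Unexpected return type")
  }
}

# Stand-ins for the two return forms (these are not real knockoffs).
fake_list <- replicate(5, matrix(0, nrow = 4, ncol = 3), simplify = FALSE)
fake_path <- tempfile(fileext = ".gds")  # path only; no file is created

describe_knockoffs(fake_list)$M
describe_knockoffs(fake_path)$kind
```

With the package loaded, the same helper could be applied to the result of a call such as create_knockoffs(X, pos, save_gds = FALSE).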