View source: R/gt_pca_autoSVD.R
| gt_pca_autoSVD | R Documentation |
gen_tibble objectsThis function performs Principal Component Analysis on a gen_tibble, using
a fast truncated SVD with initial pruning and then iterative removal of
long-range LD regions. This function is a wrapper for
bigsnpr::snp_autoSVD()
gt_pca_autoSVD(
x,
k = 10,
fun_scaling = bigsnpr::snp_scaleBinom(),
thr_r2 = 0.2,
use_positions = TRUE,
size = 100/thr_r2,
roll_size = 50,
int_min_size = 20,
alpha_tukey = 0.05,
min_mac = 10,
max_iter = 5,
n_cores = 1,
verbose = TRUE,
total_var = TRUE
)
x |
a |
k |
Number of singular vectors/values to compute. Default is |
fun_scaling |
Usually this can be left unset, as it defaults to
|
thr_r2 |
Threshold over the squared correlation between two SNPs.
Default is |
use_positions |
a boolean on whether the position is used to define
|
size |
For one SNP, window size around this SNP to compute correlations. Default is 100 / thr_r2 for clumping (0.2 -> 500; 0.1 -> 1000; 0.5 -> 200). If not providing infos.pos (NULL, the default), this is a window in number of SNPs, otherwise it is a window in kb (genetic distance). I recommend that you provide the positions if available. |
roll_size |
Radius of rolling windows to smooth log-p-values. Default is
|
int_min_size |
Minimum number of consecutive outlier SNPs in order to be
reported as long-range LD region. Default is |
alpha_tukey |
Default is |
min_mac |
Minimum minor allele count (MAC) for variants to be included.
Default is |
max_iter |
Maximum number of iterations of outlier detection. Default is
|
n_cores |
Number of cores used. Default doesn't use parallelism. You may
use |
verbose |
Output some information on the iterations? Default is |
total_var |
a boolean indicating whether to compute the total variance
of the matrix. Default is |
Using gt_pca_autoSVD requires a reasonably large dataset, as the function
iteratively removes regions of long range LD. If you encounter: 'Error in
rollmean(): Parameter 'size' is too large.', roll_size exceeds the number
of variants on at least one of your chromosomes. Try reducing 'roll_size' to
avoid this error.
Note: rather than accessing these elements directly, it is better to use
tidy and augment. See gt_pca_tidiers.
a gt_pca object, which is a subclass of bigSVD; this is an S3
list with elements: A named list (an S3 class "big_SVD") of
d, the eigenvalues (singular values, i.e. as variances),
u, the scores for each sample on each component
(the left singular vectors)
v, the loadings (the right singular vectors)
center, the centering vector,
scale, the scaling vector,
method, a string defining the method (in this case 'autoSVD'),
call, the call that generated the object.
loci, the loci used after long range LD removal.
bigsnpr::snp_autoSVD() which this function wraps.
# Create a gen_tibble of lobster genotypes
bed_file <-
system.file("extdata", "lobster", "lobster.bed", package = "tidypopgen")
lobsters <- gen_tibble(bed_file,
backingfile = tempfile("lobsters"),
quiet = TRUE
)
# Remove monomorphic loci and impute
lobsters <- lobsters %>% select_loci_if(loci_maf(genotypes) > 0)
lobsters <- gt_impute_simple(lobsters, method = "mode")
show_loci(lobsters)$chromosome <- "1"
# Create PCA object, including total variance
gt_pca_autoSVD(lobsters,
k = 10,
roll_size = 20,
total_var = TRUE
)
# Change number of components and exclude total variance
gt_pca_autoSVD(lobsters,
k = 5,
roll_size = 20,
total_var = FALSE
)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.