View source: R/gt_pca_autoSVD.R
gt_pca_autoSVD | R Documentation |
gen_tibble objects

This function performs Principal Component Analysis on a gen_tibble, using
a fast truncated SVD with initial pruning and then iterative removal of
long-range LD regions. It is a wrapper for bigsnpr::snp_autoSVD().
gt_pca_autoSVD(
  x,
  k = 10,
  fun_scaling = bigsnpr::snp_scaleBinom(),
  thr_r2 = 0.2,
  use_positions = TRUE,
  size = 100/thr_r2,
  roll_size = 50,
  int_min_size = 20,
  alpha_tukey = 0.05,
  min_mac = 10,
  max_iter = 5,
  n_cores = 1,
  verbose = TRUE,
  total_var = TRUE
)
x | a gen_tibble object |
k | Number of singular vectors/values to compute. Default is 10. |
fun_scaling | Usually this can be left unset, as it defaults to bigsnpr::snp_scaleBinom(). |
thr_r2 | Threshold over the squared correlation between two SNPs. Default is 0.2. |
use_positions | a boolean on whether the position is used to define size. Default is TRUE. |
size | For one SNP, window size around this SNP to compute correlations. Default is 100 / thr_r2 for clumping (0.2 -> 500; 0.1 -> 1000; 0.5 -> 200). If positions are not used (infos.pos is NULL in bigsnpr::snp_autoSVD()), this is a window in number of SNPs; otherwise it is a window in kb. Providing the positions is recommended if they are available. |
roll_size | Radius of rolling windows to smooth log-p-values. Default is 50. |
int_min_size | Minimum number of consecutive outlier SNPs in order to be reported as long-range LD region. Default is 20. |
alpha_tukey | Default is 0.05. |
min_mac | Minimum minor allele count (MAC) for variants to be included. Default is 10. |
max_iter | Maximum number of iterations of outlier detection. Default is 5. |
n_cores | Number of cores used. Default doesn't use parallelism. You may use bigstatsr::nb_cores(). |
verbose | Output some information on the iterations? Default is TRUE. |
total_var | a boolean indicating whether to compute the total variance of the matrix. Default is TRUE. |
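
The default for size is derived from thr_r2, so changing the threshold also changes the clumping window. The short base R sketch below only reproduces the arithmetic quoted in the size entry; the variable names are illustrative and are not part of the package.

```r
# Default window size implied by a few thr_r2 values (size = 100 / thr_r2)
thr_r2 <- c(0.2, 0.1, 0.5)
default_size <- 100 / thr_r2

# Label each window size with the threshold it corresponds to
names(default_size) <- thr_r2
default_size
```

A stricter threshold (smaller thr_r2) therefore yields a wider window by default.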
Using gt_pca_autoSVD requires a reasonably large dataset, as the function
iteratively removes regions of long-range LD. If you encounter the error
'Error in rollmean(): Parameter 'size' is too large.', roll_size exceeds the
number of variants on at least one of your chromosomes. Try reducing
roll_size to avoid this error.
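
One way to anticipate this error is to compare roll_size with the number of variants on each chromosome before running the PCA. The sketch below is a rough pre-check in base R on a made-up loci table: the data frame stands in for the output of show_loci() on a gen_tibble, and the chromosome names and counts are invented for illustration. Because the function prunes variants first, the counts that matter may be smaller than these raw counts, so treat a pass here as necessary rather than sufficient.

```r
# Toy loci table standing in for show_loci(x); names and counts are made up
loci <- data.frame(
  chromosome = rep(c("chr1", "chr2", "chr3"), times = c(120, 60, 15))
)

roll_size <- 50

# Count variants per chromosome
loci_per_chr <- table(loci$chromosome)

# Chromosomes with fewer variants than roll_size are likely to trigger the error
too_small <- names(loci_per_chr)[as.vector(loci_per_chr) < roll_size]
too_small

# A conservative upper bound for roll_size on this toy dataset
max_roll_size <- min(as.vector(loci_per_chr)) - 1
```

If too_small is not empty, either lower roll_size or drop the flagged chromosomes before calling gt_pca_autoSVD.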
Note: rather than accessing these elements directly, it is better to use
tidy and augment. See gt_pca_tidiers.
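
As a sketch of that workflow, the code below fits the PCA on the lobster data used in the examples for this function and then extracts results through the tidiers. The matrix = "eigenvalues" argument and the data argument of augment are assumptions about the tidier interface, so check gt_pca_tidiers for the exact signatures; the generics:: prefix is used only so that the calls do not depend on which packages are attached.

```r
library(tidypopgen)

# Rebuild the lobster gen_tibble used in the examples for this function
bed_file <-
  system.file("extdata", "lobster", "lobster.bed", package = "tidypopgen")
lobsters <- gen_tibble(bed_file,
  backingfile = tempfile("lobsters"),
  quiet = TRUE
)
lobsters <- lobsters %>% select_loci_if(loci_maf(genotypes) > 0)
lobsters <- gt_impute_simple(lobsters, method = "mode")
show_loci(lobsters)$chromosome <- "1"
show_loci(lobsters)$chr_int <- 1

lobsters_pca <- gt_pca_autoSVD(lobsters,
  k = 5,
  roll_size = 20,
  verbose = FALSE
)

# Assumed tidier interface: eigenvalues returned as a tibble
eig <- generics::tidy(lobsters_pca, matrix = "eigenvalues")

# Assumed augment interface: per-sample scores added to the gen_tibble
lobsters_scores <- generics::augment(lobsters_pca, data = lobsters)
```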
A gt_pca object, which is a subclass of big_SVD; this is a named S3 list
with elements:
d, the singular values (their squares are proportional to the variance
explained by each component),
u, the scores for each sample on each component (the left singular vectors),
v, the loadings (the right singular vectors),
center, the centering vector,
scale, the scaling vector,
method, a string defining the method (in this case 'autoSVD'),
call, the call that generated the object,
loci, the loci used after long-range LD removal.
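
To make the roles of d, u, v, center and scale concrete, the base R sketch below builds the same kind of list from a small random matrix with svd(). It is a toy illustration only: the matrix is random rather than genotypic, sd() is used for scaling where the package uses fun_scaling, and the list name toy is hypothetical.

```r
# Toy numeric matrix standing in for a genotype matrix (20 samples x 6 variants);
# the data are random and purely illustrative
set.seed(42)
X <- matrix(rnorm(20 * 6), nrow = 20, ncol = 6)

# Column-wise centering and scaling vectors
center <- colMeans(X)
scale <- apply(X, 2, sd)
X_scaled <- sweep(sweep(X, 2, center, "-"), 2, scale, "/")

# Truncated decomposition keeping k = 3 components
k <- 3
sv <- svd(X_scaled, nu = k, nv = k)
toy <- list(
  d = sv$d[seq_len(k)],
  u = sv$u,
  v = sv$v,
  center = center,
  scale = scale
)

# Projecting the scaled data onto the loadings recovers u scaled by d
projected <- X_scaled %*% toy$v
stopifnot(isTRUE(all.equal(projected, toy$u %*% diag(toy$d))))
```

The final check shows why u and d are stored together: multiplying the left singular vectors by the singular values gives the coordinates of each sample on each component.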
# Load the package that provides gen_tibble() and gt_pca_autoSVD()
library(tidypopgen)

# Create a gen_tibble of lobster genotypes
bed_file <-
  system.file("extdata", "lobster", "lobster.bed", package = "tidypopgen")
lobsters <- gen_tibble(bed_file,
  backingfile = tempfile("lobsters"),
  quiet = TRUE
)

# Remove monomorphic loci and impute
lobsters <- lobsters %>% select_loci_if(loci_maf(genotypes) > 0)
lobsters <- gt_impute_simple(lobsters, method = "mode")

# Assign every locus to chromosome "1" (integer code 1)
show_loci(lobsters)$chromosome <- "1"
show_loci(lobsters)$chr_int <- 1

# Create PCA object, including total variance
gt_pca_autoSVD(lobsters,
  k = 10,
  roll_size = 20,
  total_var = TRUE
)

# Change number of components and exclude total variance
gt_pca_autoSVD(lobsters,
  k = 5,
  roll_size = 20,
  total_var = FALSE
)