snp.pruning | R Documentation |
For a given molecular dataset M
(coded as 0, 1 and 2),
it produces a reduced molecular matrix by eliminating "redundant"
markers using pruning techniques. This function finds and drops some of the
SNPs in high linkage disequilibrium (LD).
snp.pruning(
M = NULL,
map = NULL,
marker = NULL,
chrom = NULL,
pos = NULL,
method = c("correlation"),
criteria = c("callrate", "maf"),
pruning.thr = 0.95,
by.chrom = FALSE,
window.n = 50,
overlap.n = 5,
iterations = 10,
seed = NULL,
message = TRUE
)
M |
A matrix with marker data of full form ( |
map |
(Optional) A data frame with the map information with |
marker |
A character indicating the name of the column in data frame |
chrom |
A character indicating the name of the column in data frame |
pos |
A character indicating the name of the column in data frame |
method |
A character indicating the method (or algorithm) to be used as reference for
identifying redundant markers.
The only method currently available is based on correlations (default = |
criteria |
A character indicating the criteria to choose which marker to drop
from a detected redundant pair.
Options are: |
pruning.thr |
A threshold value to identify redundant markers with Pearson's correlation larger than the
value provided (default = |
by.chrom |
If TRUE the pruning is performed independently by chromosome (default = |
window.n |
A numeric value with number of markers to consider in each
window to perform pruning (default = |
overlap.n |
A numeric value with number of markers to overlap between consecutive windows
(default = |
iterations |
An integer indicating the number of sequential times the pruning procedure
should be executed on remaining markers.
If no markers are dropped in a given iteration/run, the algorithm will stop (default = |
seed |
An integer to be used as seed for reproducibility. In case the criteria has the
same values for a given pair of markers, one will be dropped at random (default = |
message |
If |
Pruning is recommended because redundant markers can affect the quality of matrices used for downstream analyses. The algorithm uses Pearson's correlation between markers as a proxy for LD. When a pairwise correlation exceeds the selected threshold, one marker of the pair is eliminated according to the selected criteria (call rate, minor allele frequency). In case of a tie, one marker is dropped at random.
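The elimination rule described above can be illustrated with a small base R sketch. It is not the package's source code: the toy data and object names are hypothetical. Comparing the absolute correlation with the threshold, dropping the marker with the lower call rate, and using the lower MAF as a tie-breaker are all assumptions made for illustration.

```r
# Illustrative sketch only; this is not the snp.pruning() source code.
# All object names below are hypothetical.

# Toy marker matrix: 20 individuals x 3 markers coded 0/1/2.
m1 <- rep(c(0, 1, 2, 1), times = 5)
m2 <- m1                       # exact copy of m1, so the pair is redundant
m2[1:3] <- NA                  # missing calls give m2 a lower call rate
m3 <- rep(c(0, 2, 1, 2, 0), times = 4)
M <- cbind(m1 = m1, m2 = m2, m3 = m3)

# Per-marker summaries used to decide which marker of a pair to drop.
callrate <- colMeans(!is.na(M))
p <- colMeans(M, na.rm = TRUE) / 2   # frequency of the allele counted in M
maf <- pmin(p, 1 - p)

# Pairwise Pearson correlations as a proxy for LD.
R <- cor(M, use = "pairwise.complete.obs")

thr <- 0.95
keep <- rep(TRUE, ncol(M))
names(keep) <- colnames(M)

for (i in seq_len(ncol(M) - 1)) {
  for (j in (i + 1):ncol(M)) {
    if (!keep[i] || !keep[j]) next
    # Assumption: the absolute correlation is compared with the threshold.
    if (is.na(R[i, j]) || abs(R[i, j]) <= thr) next
    # Assumption: drop the lower call rate first, then the lower MAF.
    if (callrate[i] != callrate[j]) {
      loser <- if (callrate[i] < callrate[j]) i else j
    } else if (maf[i] != maf[j]) {
      loser <- if (maf[i] < maf[j]) i else j
    } else {
      loser <- sample(c(i, j), 1)    # tie: drop one marker at random
    }
    keep[loser] <- FALSE
  }
}

Mpruned <- M[, keep, drop = FALSE]
colnames(Mpruned)   # "m1" "m3"
```

In this toy case m1 and m2 are perfectly correlated, and m2 is dropped because of its lower call rate. The real function additionally works in windows and over several iterations, which this sketch leaves out.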
Filtering markers (qc.filtering) before pruning is important. Poor quality markers (e.g., monomorphic markers) may prevent correlations from being calculated and may affect eliminations.
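As a small illustration of why monomorphic markers are a problem, the base R snippet below shows that a marker with zero variance yields an undefined correlation. The toy matrix and the `is.mono` check are hypothetical and are not part of the package.

```r
# Toy example: one polymorphic and one monomorphic marker (hypothetical data).
M <- cbind(
  poly = c(0, 1, 2, 1, 0, 2),
  mono = c(2, 2, 2, 2, 2, 2)
)

# cor() warns that the standard deviation is zero and returns NA for the pair.
r <- suppressWarnings(cor(M))
r["poly", "mono"]

# A simple check for monomorphic columns that could be run before pruning.
is.mono <- apply(M, 2, function(x) length(unique(x[!is.na(x)])) < 2)
is.mono
```

A pair whose correlation is NA cannot be compared with the threshold, so such markers are better removed beforehand.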
Mpruned
: a matrix containing the pruned marker matrix M.
map
: a data frame containing the pruned map.
# Read and filter genotypic data.
M.clean <- qc.filtering(
M = geno.pine655,
maf = 0.05,
marker.callrate = 0.20, ind.callrate = 0.20,
Fis = 1, heterozygosity = 0.98,
na.string = "-9",
plots = FALSE)$M.clean
# Prune correlations > 0.9.
Mpr <- snp.pruning(
M = M.clean, pruning.thr = 0.90,
by.chrom = FALSE, window.n = 40, overlap.n = 10)
head(Mpr$map)
Mpr$Mpruned[1:5, 1:5]