View source: R/select_markers_for_pairscan.R
select_markers_for_pairscan | R Documentation |
This function selects markers for the pairwise scan. Because Cape is computationally intensive, pairscans should not be run on large numbers of markers. As a rule of thumb, scanning 1500 markers in a population of 500 individuals takes about 24 hours without the kinship correction. The kinship correction increases the run time, so users may wish to reduce the number of markers scanned even further to accommodate the extra computational burden.
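To see why the marker count matters so much, it can help to count the pairs involved. The sketch below assumes each unordered pair of markers is tested once, which the text above does not state explicitly; `n_pairs` is a hypothetical helper, not part of Cape.

```r
# Number of distinct unordered marker pairs in an exhaustive pairwise scan.
# Assumption: every pair is tested exactly once.
n_pairs <- function(n_markers) {
  choose(n_markers, 2)
}

marker_counts <- c(100, 500, 1500, 3000)
pair_table <- data.frame(markers = marker_counts,
                         pairs = n_pairs(marker_counts))
print(pair_table)
```

Because the pair count grows roughly with the square of the marker count, doubling the number of markers roughly quadruples the number of pairs.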
select_markers_for_pairscan(
data_obj,
singlescan_obj,
geno_obj,
specific_markers = NULL,
num_alleles = 50,
peak_density = 0.5,
window_size = NULL,
tolerance = 5,
plot_peaks = FALSE,
verbose = FALSE,
pdf_filename = "Peak.Plots.pdf"
)
data_obj |
a |
singlescan_obj |
a singlescan object from |
geno_obj |
a genotype object |
specific_markers |
A vector of marker names specifying which markers should be selected. If NULL, the function uses main effect size to select markers. |
num_alleles |
The target number of markers to select if using main effect size |
peak_density |
The fraction of markers to select under each peak exceeding the current threshold. Should be set higher for populations with low LD and lower for populations with high LD. Defaults to 0.5, corresponding to 50% of markers selected under each peak. |
window_size |
The number of markers to use in a smoothing window when calculating main effect peaks. If NULL, the window size is selected automatically based on the number of markers with consecutive rises and falls of main effect size. |
tolerance |
The allowable deviation from the target marker number, in number of markers. For example, if you ask the function to select 100 markers and set the tolerance to 5, the algorithm will stop when it has selected between 95 and 105 markers. |
plot_peaks |
Whether to plot the singlescan peaks identified by |
verbose |
Whether progress should be printed to the screen |
pdf_filename |
If plot_peaks is TRUE, this argument specifies the filename to which the peaks are plotted. |
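The role of window_size can be illustrated with a plain moving average over simulated effect sizes. This is only a sketch: the documentation does not say which smoothing method Cape uses, so the centred moving average, the helper name `smooth_effects`, and the simulated data are all assumptions.

```r
# Simulated main effect sizes along 200 ordered markers (hypothetical data).
set.seed(1)
effect <- abs(sin(seq(0, 4 * pi, length.out = 200))) + rnorm(200, sd = 0.1)

# Centred moving average over `window_size` markers.
# Assumption: a stand-in for whatever smoothing Cape applies internally.
smooth_effects <- function(x, window_size) {
  weights <- rep(1 / window_size, window_size)
  as.numeric(stats::filter(x, weights, sides = 2))
}

smoothed <- smooth_effects(effect, window_size = 11)
```

A wider window suppresses more marker-to-marker noise but can merge neighbouring peaks. The first and last few positions are NA because the window does not fit there.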
This function can select markers either from a pre-defined list
supplied in the argument specific_markers, or based on their
main effect size.
To select markers based on main effect size, this function
first identifies effect score peaks using an automated
peak detection algorithm. It finds the peaks rising
above a starting threshold and samples markers within each
peak based on the user-defined sampling density peak_density.
Setting peak_density to 0.5 will result in 50% of the markers
in a given peak being sampled uniformly at random. Sampling
reduces the redundancy among linked markers tested in the pairscan.
If LD is relatively low in the population, this density can be
increased to 1 to include all markers under a peak. If LD is high,
the density can be decreased to reduce redundancy further.
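The within-peak sampling step can be sketched as follows. The helper `sample_peak_markers`, the rounding rule, and the choice to keep at least one marker per peak are assumptions made for illustration, not Cape's actual implementation.

```r
# Sample a fraction `peak_density` of the markers under one peak,
# uniformly at random and without replacement.
# Assumptions: round to the nearest whole marker, keep at least one.
sample_peak_markers <- function(peak_markers, peak_density = 0.5) {
  stopifnot(peak_density > 0, peak_density <= 1)
  n_keep <- max(1, round(length(peak_markers) * peak_density))
  # Sample positions rather than values, then restore marker order.
  keep_idx <- sort(sample.int(length(peak_markers), n_keep))
  peak_markers[keep_idx]
}

set.seed(42)
peak <- paste0("marker_", 101:120)  # 20 hypothetical markers under one peak
half <- sample_peak_markers(peak, peak_density = 0.5)
all_markers <- sample_peak_markers(peak, peak_density = 1)
```

With a density of 1 every marker under the peak is kept, which matches the low-LD case described above.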
The algorithm compares the number of markers sampled to the target
defined by the user in the argument num_alleles. If fewer
than the target have been selected, the threshold is lowered, and
the process is repeated until the target number of alleles has
been selected (plus or minus the number set in tolerance).
If the number of target alleles exceeds the number of markers genotyped, all alleles will be selected automatically.
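The threshold search described above can be sketched in simplified form. Everything here is an assumption made for illustration: the helper `select_by_threshold`, the fixed grid of thresholds, and the treatment of each contiguous run of markers above the threshold as one peak. Unlike the behavior described above, this sketch stops as soon as the count reaches the lower bound (target minus tolerance) and does not correct an overshoot.

```r
# Simplified threshold-lowering selection over a vector of effect sizes.
# Returns the indices of the selected markers.
select_by_threshold <- function(effect, num_alleles, peak_density = 0.5,
                                tolerance = 5, n_steps = 200) {
  n_markers <- length(effect)
  # If the target is at least the number of markers, keep everything.
  if (num_alleles >= n_markers) return(seq_len(n_markers))

  # Hypothetical threshold grid, from the highest effect down to the lowest.
  thresholds <- seq(max(effect), min(effect), length.out = n_steps)
  selected <- integer(0)
  for (threshold in thresholds) {
    above <- effect >= threshold
    # Label contiguous runs of markers above the threshold as peaks.
    runs <- rle(above)
    run_id <- rep(seq_along(runs$lengths), runs$lengths)
    peaks <- split(which(above), run_id[above])
    # Sample a fraction of the markers under each peak.
    selected <- unlist(lapply(peaks, function(idx) {
      n_keep <- max(1, round(length(idx) * peak_density))
      idx[sort(sample.int(length(idx), n_keep))]
    }), use.names = FALSE)
    # Stop once enough markers have been selected.
    if (length(selected) >= num_alleles - tolerance) break
  }
  sort(selected)
}

# Two simulated peaks along 1000 ordered markers (hypothetical data).
set.seed(7)
position <- seq(0, 1, length.out = 1000)
effect <- dnorm(position, 0.3, 0.03) + 0.6 * dnorm(position, 0.7, 0.05)
picked <- select_by_threshold(effect, num_alleles = 50, tolerance = 5)
length(picked)
```

Lowering the threshold widens existing peaks and eventually exposes smaller ones, so the selected set grows until it reaches the requested size.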
Returns the Cape object with a new matrix called geno_for_pairscan
containing the genotypes of the selected markers for each individual.
bin_curve, singlescan