Function for efficient whole-genome haplotype block partitioning.
It is analogous to mig and allows several input files (chromosomes) to be specified at once and processed in parallel.
Haplotype blocks are defined based on the D' coefficient of linkage disequilibrium (Gabriel et al., 2002).
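As a reminder of what the D' coefficient measures, the sketch below computes a point estimate of |D'| for two biallelic SNPs from phased haplotypes. It is an illustration only, not the package's implementation: the function name d_prime and the 0/1 allele encoding are assumptions made for this example, and the confidence interval estimation that the function actually relies on (see ci_method) is not shown.

```r
# Illustrative only: point estimate of |D'| for two biallelic SNPs.
# a, b: 0/1 allele vectors with one entry per phased haplotype
# (hypothetical encoding chosen for this sketch).
d_prime <- function(a, b) {
  stopifnot(length(a) == length(b), all(a %in% 0:1), all(b %in% 0:1))
  p_a  <- mean(a == 1)            # frequency of allele 1 at the first SNP
  p_b  <- mean(b == 1)            # frequency of allele 1 at the second SNP
  p_ab <- mean(a == 1 & b == 1)   # frequency of the 1-1 haplotype
  d <- p_ab - p_a * p_b           # raw disequilibrium coefficient D
  # Largest |D| attainable given the allele frequencies
  d_max <- if (d >= 0) {
    min(p_a * (1 - p_b), (1 - p_a) * p_b)
  } else {
    min(p_a * p_b, (1 - p_a) * (1 - p_b))
  }
  if (d_max == 0) return(NA_real_)  # undefined when a SNP is monomorphic
  abs(d) / d_max
}
```

For example, d_prime(c(0, 0, 1, 1), c(0, 0, 1, 1)) describes two perfectly correlated SNPs, while a monomorphic SNP yields NA because D' is undefined in that case.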
mig_multi_chr(phase_files, output_files, processes = 1, phase_file_format = "VCF",
              map_files = NULL, maf = 0.0, ci_method = "WP",
              l_density = 100, ld_ci = c(0.7, 0.98), ehr_ci = 0.9,
              ld_fraction = 0.95, pruning_method = "MIG++", windows = NULL)
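A call for several chromosomes might be prepared as in the following sketch. The file names are hypothetical, the package name LDExplorer is an assumption (it is not stated on this page), and the call itself is guarded so that the snippet only runs the partitioning when the package is installed and the input files exist.

```r
# Hypothetical file layout: one phased VCF per chromosome.
chromosomes  <- 20:22
phase_files  <- sprintf("chr%d.phased.vcf", chromosomes)
output_files <- sprintf("chr%d.blocks.txt", chromosomes)

# One output file is expected for every input file.
stopifnot(length(output_files) == length(phase_files))

# Use at most one process per input file (here capped at 2).
n_processes <- min(length(phase_files), 2L)

# Assumption: the function is exported by a package named LDExplorer.
if (requireNamespace("LDExplorer", quietly = TRUE) &&
    all(file.exists(phase_files))) {
  LDExplorer::mig_multi_chr(
    phase_files, output_files,
    processes = n_processes,
    phase_file_format = "VCF",
    maf = 0.01
  )
}
```

Capping the number of processes at the number of input files is a choice made for this sketch, not a requirement stated on this page.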
phase_files
The list of names of the input files with phased genotypes. All input files must be in the same format: VCF, HAPMAP2 or IMPUTE2.
output_files
The list of names of the output files in which to store the haplotype blocks. One output file for every input file.
processes
An integer >= 1, which indicates the number of parallel processes. All processes are created on the localhost and communicate through sockets.
phase_file_format
Format of the phase_files: VCF (default), HAPMAP2 or IMPUTE2. If VCF, then only SNPs with "PASS" or "." in the FILTER field are considered.
map_files
The list of names of the map files with base-pair positions of each SNP. One map file for every input file. The order of map files must correspond to the order of input files. Mandatory when phase_file_format = "HAPMAP2".
maf
Minor Allele Frequency (MAF) threshold: SNPs with MAF <= maf will not be considered. The threshold may vary from 0 (default) to 0.5.
ci_method
Confidence interval (CI) estimation method. Supported methods are WP (default) = Wall and Pritchard (2003) method; AV = approximate variance estimator by Zapata et al. (1997).
l_density
Number of points at which to evaluate the likelihood (applies only to the WP method). Default is 100. Higher values increase the runtime; lower values reduce the precision.
ld_ci
Numeric vector with 2 values: thresholds for the lower bound (CL) and upper bound (CU) of the 90% CI of D'. Following Gabriel et al. (2002), default is c(0.7, 0.98).
ehr_ci
Threshold value for the evidence of historical recombination. Following Gabriel et al. (2002), default is 0.9.
ld_fraction
Fraction of strong LD SNP pairs over all informative pairs that is needed to classify a sequence of SNPs as a haplotype block. Following Gabriel et al. (2002), default is 0.95.
pruning_method
Name of a search space pruning method. Supported methods are MIG, MIG+ and MIG++ (default).
windows
Numeric vector in which every value corresponds to one input file (chromosome), in the same order, and specifies the number of SNPs within the window in the MIG++ search space pruning method. If NULL (default), all values are calculated on the fly based on the corresponding chromosome lengths and ld_fraction.
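The roles of ld_ci, ehr_ci and ld_fraction can be illustrated with a small sketch of the Gabriel et al. (2002) criteria. The helper names classify_pairs and is_block_candidate are hypothetical, and the exact strictness of the inequalities is an assumption; the package's own rules are documented under mig.

```r
# Illustrative only: label SNP pairs from the bounds of the D' CI.
# cl, cu: lower and upper CI bounds, one entry per SNP pair.
classify_pairs <- function(cl, cu, ld_ci = c(0.7, 0.98), ehr_ci = 0.9) {
  stopifnot(length(cl) == length(cu), all(cl <= cu))
  labels <- rep("uninformative", length(cl))
  # Evidence of historical recombination: upper bound below ehr_ci
  labels[cu < ehr_ci] <- "ehr"
  # Strong LD: both bounds reach their thresholds (assumed non-strict)
  labels[cl >= ld_ci[1] & cu >= ld_ci[2]] <- "strong_ld"
  labels
}

# A set of pairs qualifies when the share of strong LD pairs among the
# informative pairs (strong LD or EHR) reaches ld_fraction.
is_block_candidate <- function(labels, ld_fraction = 0.95) {
  informative <- sum(labels != "uninformative")
  if (informative == 0) return(FALSE)
  sum(labels == "strong_ld") / informative >= ld_fraction
}
```

With the default thresholds, a pair with bounds (0.8, 0.99) is labelled as strong LD, a pair with bounds (0.1, 0.5) as showing evidence of historical recombination, and a pair with bounds (0.5, 0.95) as uninformative.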
Daniel Taliun, Johann Gamper, Cristian Pattaro
Zapata, C., Alvarez, G., Carollo, C. (1997) Approximate variance of the standardized measure of gametic disequilibrium D'. American Journal of Human Genetics, 61(3), 771–774.
Gabriel, S. B. et al. (2002) The Structure of Haplotype Blocks in the Human Genome. Science, 296(5576), 2225–2229.
Wall, J. D. and Pritchard, J. K. (2003) Assessing the performance of the haplotype block model of linkage disequilibrium. American Journal of Human Genetics, 73(3), 502–515.
Barrett, J. C. et al. (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 21(2), 263–265.
See mig for the haplotype block definition and for descriptions of the D' distribution modeling and the pruning methods.