Home

/

R-Forge

/

LDExplorer

/

mig_rsq: Memory-efficient Implementation of Gabriel et al. 2002...

mig_rsq: Memory-efficient Implementation of Gabriel et al. 2002...
In LDExplorer: Efficient Whole-Genome LD Based Haplotpe Block Recognition

Description Usage Arguments Haplotype Blocks Search Space Pruning Output File Author(s) References Examples

Function for the efficient whole-genome haplotype block partitioning. Haplotype blocks are defined based on r^2 coefficient of linkage disequilibrium.

1
2
3

	mig_rsq(phase_file, output_file, phase_file_format = "VCF", 
	map_file = NULL, region = NULL, maf = 0.0, 
	weak_rsq = 0.5, strong_rsq = 0.8, fraction = 0.95, pruning_method = "MIG++", window = NULL)

`phase_file`	Name of the input file with phased genotypes in VCF, HAPMAP2 or IMPUTE2 format.
`output_file`	Name of the output file where to store the haplotype blocks.
`phase_file_format`	Format of the phase_file: VCF (default), HAPMAP2 or IMPUTE2. If VCF, then only SNPs with "PASS" or "." in the FILTER field are considered.
`map_file`	Name of the map file with base-pair positions of each SNP. Mandatory when file_format = HAPMAP2.
`region`	Numeric vector with start and end positions (in base-pairs) of the chromosomal region to be partitioned. If NULL (default), then the whole chromosome is processed.
`maf`	Minor Allele Frequency (MAF) threshold: SNPs with MAF <= maf will not be considered. The threshold may vary from 0 (default) to 0.5.
`weak_rsq`	r^2 threshold value for the SNP pairs in weak LD. A SNP pair is in weak LD if r^2 < weak_rsq. By default, weak_rsq = 0.5.
`strong_rsq`	r^2 threshold value for the SNP pairs in strong LD. A SNP pair is in strong LD if r^2 >= strong_rsq. By default, strong_rsq = 0.8.
`fraction`	Fraction of strong LD SNP pairs over all informative SNP pairs that is needed to classify a sequence of SNP as a haplotype block. Informative SNP pairs are those that are in strong LD or in weak LD. Default is 0.95.
`pruning_method`	Name of a search space pruning method. Supported methods are MIG, MIG+ and MIG++ (default).
`window`	Number of SNPs within the window in MIG++ search space pruning method. If NULL (default), it is calculated on the fly based on the region length and ld_fraction.

The haplotype blocks are defined based on r^2 coefficient of linkage disequilibrium (LD) between a pair of SNPs following the logic suggested by Gabriel et al., 2002.

SNP Pairs Classification

According to the r^2 statistic, LD between any SNP pair is classified as follows:
(1) strong LD if r^2 >= 0.8;
(2) weak LD if r^2 < 0.5;

All SNP pairs satisfying conditions (1) and (2) are called informative.

Haplotype Block Definition

A chromosomal region is a haplotype block if:
(a) the two outer-most SNPs are in strong LD;
(b) at least 95% of informative SNP pairs within the region are in strong LD.

MIG

The MIG is a memory efficient implementation of Gabriel et al. 2002 haplotype block definition using the r^2 coefficient. It has extremely low memory requirements (linear memory complexity) and is able to represent LD structure of the entire chromosome (millions of SNPs) in the main memory. The MIG doesn't prune the search space i.e. it computes LD for all possible SNP pairs.

MIG+

The MIG+ identifies and omits computations in regions that can never become haplotype blocks. The chromosome is processed from left to right. Assuming all unprocessed SNP pairs to the right being in strong LD, an early termination point is determined using already known LD structure to the left. It is theoretically guaranteed, that haplotype blocks produced by MIG and MIG+ are identical. The MIG+ has low memory requirements and improved runtime compared to MIG.

MIG++

The MIG++ is the improved version of MIG+ search space pruning method. The chromosome is processed in several iterations. At every iteration it is scanned from the very start to the very end and the computations are restricted by a user-defined window. The window is expressed as a maximal number of SNPs that will be considered when computing LD with the currently considered SNP. In every new iteration, the partially calculated LD structure across the whole chromosome allows to predict termination points better. It is theoretically guaranteed, that haplotype blocks produced by MIG, MIG+ and MIG++ are identical. The MIG++ has low memory requirements and significantly better runtime compared to MIG and MIG+.

The output file consists of the following columns:

BLOCK_NAME	Generated unique block name
FIRST_SNP	Name of the first SNP in block
LAST_SNP	Name of the last SNP in block
FIRST_SNP_ID	Index of the first SNP in block with respect to the filtered SNPs
LAST_SNP_ID	Index of the last SNP in block with respect to the filtered SNPs
START_BP	The base-pair position of the first SNP in block
END_BP	The base-pair position of the last SNP in block
N_SNPS	Number of SNPs in block
N_HAPS	Number of haplotypes in block
N_UNIQUE_HAPS	Number of unique haplotypes in block
N_COMMON_HAPS	Number of common (which appear more than once) haplotypes in block
N_HAPS_DIVERSITY	The haplotype diversity in block (Patil et al., 2001). 1 - low diversity, 0 - high diversity.

Daniel Taliun, Johann Gamper, Cristian Pattaro

Patil, N. et al. (2001) Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21. Science, 294(5547), 1719–1723.

Gabriel, S. B. et al. (2002) The Structure of Haplotype Blocks in the Human Genome. Science, 296(5576), 2225–2229.

	
    # load LDExplorer library
    library(LDExplorer)
	
    # run mig_rsq() function on HapMap phase II CEU data with default arguments.
    mig_rsq(
     phase_file = "HapMapII_r22_CEU_chr20_31767872_33700401.phase.gz", 
     output_file = "HapMapII_r22_CEU_chr20_31767872_33700401.rsq_blocks.txt",
     phase_file_format = "HAPMAP2",
     map_file = "HapMapII_r22_CEU_chr20_31767872_33700401.legend.txt.gz"
    )
	
    # show contents of the output file
    file.show(
     "HapMapII_r22_CEU_chr20_31767872_33700401.rsq_blocks.txt",
     title="HapMapII_r22_CEU_chr20_31767872_33700401.rsq_blocks.txt"
    )
	
    # run mig_rsq() function on 1000 Genomes Project CEU data with default arguments.
    mig_rsq(
     phase_file = "1000G_phase1_v3_20101123_CEU_chr2_89153688_89307566.vcf.gz", 
     output_file = "1000G_phase1_v3_20101123_CEU_chr2_89153688_89307566.rsq_blocks.txt"
    )
	
    # show contents of the output file
    file.show(
     "1000G_phase1_v3_20101123_CEU_chr2_89153688_89307566.rsq_blocks.txt",
     title="1000G_phase1_v3_20101123_CEU_chr2_89153688_89307566.rsq_blocks.txt"
    )

LDExplorer documentation built on May 2, 2019, 5:54 p.m.

LDExplorer index

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

LDExplorer
Efficient Whole-Genome LD Based Haplotpe Block Recognition

mig_rsq: Memory-efficient Implementation of Gabriel et al. 2002...
In LDExplorer: Efficient Whole-Genome LD Based Haplotpe Block Recognition

Description

Usage

Arguments

Haplotype Blocks

SNP Pairs Classification

Haplotype Block Definition

Search Space Pruning

MIG

MIG+

MIG++

Output File

Author(s)

References

Examples

Related to mig_rsq in LDExplorer...

R Package Documentation

Browse R Packages

We want your feedback!

LDExplorer Efficient Whole-Genome LD Based Haplotpe Block Recognition

mig_rsq: Memory-efficient Implementation of Gabriel et al. 2002... In LDExplorer: Efficient Whole-Genome LD Based Haplotpe Block Recognition

Description

Usage

Arguments

Haplotype Blocks

SNP Pairs Classification

Haplotype Block Definition

Search Space Pruning

MIG

MIG+

MIG++

Output File

Author(s)

References

Examples

Related to mig_rsq in LDExplorer...

R Package Documentation

Browse R Packages

We want your feedback!

LDExplorer
Efficient Whole-Genome LD Based Haplotpe Block Recognition

mig_rsq: Memory-efficient Implementation of Gabriel et al. 2002...
In LDExplorer: Efficient Whole-Genome LD Based Haplotpe Block Recognition