knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
RESOURCE_ROOT = "<YOUR_LOCAL_RESOURCE_FS>"

TO DO

Genetic maps

Definition

Rutgers genetic maps

The Rutgers Combined Linkage-Physical Map is a high-resolution genetic map that contains interpolated genetic positions for variants from dbSNP ...

The Latest release of Rutgers Map v.3 (June 2012) includes the interpolated positions (Kosambi) of dbSNP Build 137 reference SNPs and UniSTS markers from Build 37.3 (GRCh37 patch 5).

This map is available as ascii files from this link. Get and install a version in "rugers_map_v3" directory.

RUTGERS_GENMAP = file.path(RESOURCE_ROOT, "rutgers_map_v3")
dir(RUTGERS_GENMAP)

You can download it using the gwhap package as well. Please refer to the 'How to use gwhap package' vignette for more details.

Definition of a genomic region of study

The update of the genetic map (Rutgers or any other) add new chromosomic position for which the recombination rate is estimated (interpolated) from the existing map.

The positions where the values are interpolated are given by a bim file or a bgen file. This is just to specify the list of location where potential values of recombination have to be interpolated.

Remarks: in the package a file named data/small_region.bim is given for example. Usually the plink/bim file of the genetic data is used for the haplotype study.

GENOTYPE_DATA = file.path(RESOURCE_ROOT, "small_region.bim")
dir(GENOTYPE_DATA)

Necessary fields when a bim file is used

A text file with no header line, and one line per variant with the following three fields:

The following table illutrate an example of a .bim file. The headers were added for more clarity.

| Chromosome | Variant ID | Base-pair coordinate | | :--------: | :----------: | :-----------------: | | 14 | rs58082782 | 59026166 | | 14 | rs7155836 | 59028006 | | 14 | rs113397864 | 59034994 | | 14 | rs76924950 | 59037440 | | 14 | rs386777953 | 59039893 |

Necessary fields when a bgen file is used

As described by gavinband, a bgen file contains a list of 5 structures:

Here is an example of a bgen file

List of 5
 $ variants:'data.frame':   2 obs. of  6 variables:
  ..$ chromosome       : Factor w/ 1 level "01": 1 1
  ..$ position         : int [1:2] 1001 2000
  ..$ rsid             : Factor w/ 2 levels "RSID_101","RSID_2": 1 2
  ..$ number_of_alleles: int [1:2] 2 2
  ..$ allele0          : Factor w/ 1 level "A": 1 1
  ..$ allele1          : Factor w/ 1 level "G": 1 1
 $ samples : chr [1:500] "sample_001" "sample_002" "sample_003" "sample_004" ...
 $ ploidy  : int [1:2, 1:500] 2 2 2 2 2 2 2 2 2 2 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:2] "RSID_101" "RSID_2"
  .. ..$ : chr [1:500] "sample_001" "sample_002" "sample_003" "sample_004" ...
 $ phased  : logi [1:2] FALSE FALSE
 $ data    : num [1:2, 1:500, 1:3] 0 NA 0.00784 0.02745 0.99608 ...
  ..- attr(*, "dimnames")=List of 3
  .. ..$ : chr [1:2] "RSID_101" "RSID_2"
  .. ..$ : chr [1:500] "sample_001" "sample_002" "sample_003" "sample_004" ...
  .. ..$ : chr [1:3] "g=0" "g=1" "g=2"

Remarks about the phased data

working in progress

The phased haplotypes in BGEN format.

For the UKB cohorts, the phased haplotypes are available for each chromosome separatly. Three files are needed :

Please refer to this wiki for more details.

HAPLOTYPES = file.path(RESOURCE_ROOT, "HAPLOTYPES/ukb_hap_chr<chr>_v2.bgen")
dir(HAPLOTYPES)


yasmina-mekki/gwhap documentation built on Dec. 31, 2021, 9:19 p.m.