Home

/

GitHub

/

In yasmina-mekki/gwhap: Genome wide haplotype analysis

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

RESOURCE_ROOT = "<YOUR_LOCAL_RESOURCE_FS>"

TO DO

Phased data section
En quoi consiste les phased data
Comment phaser les datas

Genetic maps

Definition

Rutgers genetic maps

The Rutgers Combined Linkage-Physical Map is a high-resolution genetic map that contains interpolated genetic positions for variants from dbSNP ...

The Latest release of Rutgers Map v.3 (June 2012) includes the interpolated positions (Kosambi) of dbSNP Build 137 reference SNPs and UniSTS markers from Build 37.3 (GRCh37 patch 5).

This map is available as ascii files from this link. Get and install a version in "rugers_map_v3" directory.

RUTGERS_GENMAP = file.path(RESOURCE_ROOT, "rutgers_map_v3")
dir(RUTGERS_GENMAP)

You can download it using the gwhap package as well. Please refer to the 'How to use gwhap package' vignette for more details.

Definition of a genomic region of study

The update of the genetic map (Rutgers or any other) add new chromosomic position for which the recombination rate is estimated (interpolated) from the existing map.

The positions where the values are interpolated are given by a bim file or a bgen file. This is just to specify the list of location where potential values of recombination have to be interpolated.

Remarks: in the package a file named data/small_region.bim is given for example. Usually the plink/bim file of the genetic data is used for the haplotype study.

GENOTYPE_DATA = file.path(RESOURCE_ROOT, "small_region.bim")
dir(GENOTYPE_DATA)

Necessary fields when a bim file is used

A text file with no header line, and one line per variant with the following three fields:

Chromosome code or name
Variant identifier
Base-pair coordinate

The following table illutrate an example of a .bim file. The headers were added for more clarity.

| Chromosome | Variant ID | Base-pair coordinate | | :--------: | :----------: | :-----------------: | | 14 | rs58082782 | 59026166 | | 14 | rs7155836 | 59028006 | | 14 | rs113397864 | 59034994 | | 14 | rs76924950 | 59037440 | | 14 | rs386777953 | 59039893 |

Necessary fields when a bgen file is used

As described by gavinband, a bgen file contains a list of 5 structures:

A list of variants with the chromosome, the position, the rsid, the number of alleles, the allele 0 code and the allele 1 code
A list of sample IDs
The ploidy for each sample at each variant
An indicator of whether data is phased or unphased at each variants
The genotype probabilities. A 3d array with the first dimension being the variants, the second the samples, and the third the genotype

Here is an example of a bgen file

List of 5
 $ variants:'data.frame':   2 obs. of  6 variables:
  ..$ chromosome       : Factor w/ 1 level "01": 1 1
  ..$ position         : int [1:2] 1001 2000
  ..$ rsid             : Factor w/ 2 levels "RSID_101","RSID_2": 1 2
  ..$ number_of_alleles: int [1:2] 2 2
  ..$ allele0          : Factor w/ 1 level "A": 1 1
  ..$ allele1          : Factor w/ 1 level "G": 1 1
 $ samples : chr [1:500] "sample_001" "sample_002" "sample_003" "sample_004" ...
 $ ploidy  : int [1:2, 1:500] 2 2 2 2 2 2 2 2 2 2 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:2] "RSID_101" "RSID_2"
  .. ..$ : chr [1:500] "sample_001" "sample_002" "sample_003" "sample_004" ...
 $ phased  : logi [1:2] FALSE FALSE
 $ data    : num [1:2, 1:500, 1:3] 0 NA 0.00784 0.02745 0.99608 ...
  ..- attr(*, "dimnames")=List of 3
  .. ..$ : chr [1:2] "RSID_101" "RSID_2"
  .. ..$ : chr [1:500] "sample_001" "sample_002" "sample_003" "sample_004" ...
  .. ..$ : chr [1:3] "g=0" "g=1" "g=2"

Remarks about the phased data

working in progress

The phased haplotypes in BGEN format.

For the UKB cohorts, the phased haplotypes are available for each chromosome separatly. Three files are needed :

.bgen : Native binary file format for Oxford statistical genetics tools. For more detail, refer to the section above.
.bgi : bgen corresponding index files
.sample : The sample file lists the order of the samples in the .bgen files

Please refer to this wiki for more details.

HAPLOTYPES = file.path(RESOURCE_ROOT, "HAPLOTYPES/ukb_hap_chr<chr>_v2.bgen")
dir(HAPLOTYPES)

yasmina-mekki/gwhap documentation built on Dec. 31, 2021, 9:19 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

yasmina-mekki/gwhap
Genome wide haplotype analysis

In yasmina-mekki/gwhap: Genome wide haplotype analysis

TO DO

Genetic maps

Definition

Rutgers genetic maps

Definition of a genomic region of study

Necessary fields when a bim file is used

Necessary fields when a bgen file is used

Remarks about the phased data

R Package Documentation

Browse R Packages

We want your feedback!

yasmina-mekki/gwhap Genome wide haplotype analysis

In yasmina-mekki/gwhap: Genome wide haplotype analysis

TO DO

Genetic maps

Definition

Rutgers genetic maps

Definition of a genomic region of study

Necessary fields when a bim file is used

Necessary fields when a bgen file is used

Remarks about the phased data

R Package Documentation

Browse R Packages

We want your feedback!

yasmina-mekki/gwhap
Genome wide haplotype analysis