In yasmina-mekki/gwhap: Genome wide haplotype analysis

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(gwhap)

RESOURCE_ROOT = "<YOUR_LOCAL_RESOURCE_FS>"

Installation of `gwhap`

There are two ways to obtain the gwhap source package:

Using the CRAN

install.packages('gwhap')

Using github

devtools::install_github('yasmina-mekki/gwhap')

The gwhap package implies the use of a version of R (3.6.2 and higher) as well as a certain number libraries.

In order to facilitate the use of it, here is a recipe of a singularity image containing all the necessary. Please refer to the github repo, for more explanation about how to generate it.

devtools::install_github("yasmina-mekki/gwhap_singularity_image")

Create the augmented genetic map

Currently, only threee kind of map are took into account:

Rutgers genetic map v3 (Kosambi)
1000_genome
1000_genome_interpolated

Here is an example using the Rutgers genetic map.

Download the Rutgers genetic map v3 (Kosambi)

download_rutgers_map(RESOURCE_ROOT)

interpolate a new map for SNPs using either:

bim file

snp_physical_positions = "path_to_your_snps_file/snp_file.bim"

bgi/bgen file

snp_physical_positions = "path_to_your_bgen_file/ukb_hap_"

Create the augmented genetic map as follow:

create_augmented_genetic_map(snp_physical_positions=snp_physical_positions, genetic_map_dir='path_to_your_rutgers_map_dir', map_name='rutgers', save_genetic_map=TRUE, output='path_to_your_output_dir')

Blocs creation

Create the blocs

Before creating the blocs, you need to load the map created above

augmented_map_df = get_augmented_genetic_map(augmented_genetic_map_dir='path_to_your_output_dir', chromosomes=1:23)

Create blocs with a single delta

create_blocs(genetics_map=augmented_map_df, delta=1e-3, save_blocs=TRUE, output='path_to_your_output_dir')

Create blocs with multiple values of delta

deltas = c(1e-3, 2.5e-3, 5e-3, 7.5e-3)
dfs_blocks = data.frame()
for(delta in deltas){
  df_blocks_tmp = create_blocs(genetics_map=augmented_map_df, delta=delta, save_blocs=FALSE)
  dfs_blocks <- rbind(dfs_blocks, df_blocks_tmp)
}

Blocs visualization

Assuming that df_blocs is your bloc data frame already loaded

Blocs distribution plot

Blocs distribution for all chromosomes and optionnaly for different values of delta

haplotype_bloc_distribution_per_delta = haplotype_bloc_distribution(df_blocs)
save_plot(haplotype_bloc_distribution_per_delta, paste0('output_path/', 'haplotype_bloc_distribution_per_delta.png'))

Blocs distribution per chromosome and optionnaly for different values of delta

haplotype_bloc_distribution_per_chr = haplotype_bloc_distribution(df_blocs, per_chromosome=TRUE)
save_plot(haplotype_bloc_distribution_per_chr, paste0('output_path/', 'haplotype_bloc_distribution_per_chr.png'))

Karyotype plot

karyotype_plot_obj = karyotype_plot(df_blocs)
save_plot(karyotype_plot_obj, paste0('output_path/', 'karyotype_plot.png'))

Determine the haplotypes

Before determining the haplotypes, you need to load the blocs created above

blocs_df = get_blocs(augmented_genetic_map_dir='path_to_your_output_dir')

Here is an example for one bloc

Set the parameters

chr = 'chr1'
start = 1000
end = 1200
bgnfile = sprintf('bgen_file_path_%s.bgen', chr)
samples_index = c('index_1', 'index_2', 'index_3', 'index_4', 'index_5')
mysamples = c('IID_1', 'IID_2', 'IID_3', 'IID_4', 'IID_5')

Run the determine haplotype function

haplotypes = legacy_determine_haplotypes_per_bloc(chr, start, end, bgnfile, samples_index, mysamples)

Please refer to the genome wide example vignette for more code about how to determine the haplotypes using all the blocs.

Note:

In the bgen file, the IID are annonymized. It is constitued of, among others, a list of sample that contain the anonymized IID indexed. The .sample file is used as a reference table. It contains the IID of the participant indexed. Here, the index can be used as a foreign key to toggle from the anonymized IID to the real ones.

Warning: please notice that .sample file has two header lines. Be sure to remove the second header and starting the index account from the second line.

Test the haplotypes

Set the parameters

X = load('haplotypes_path_chr1.RData')
Y = read_delim(file_path_phenotype.tsv, delim='\t')

Run the test haplotypes function

test_results = lm_test_haplotypes(X, Y, kind='all')

Warning:

One should know that the test haplotype function assume that the Y are residualized. Indeed, it does not take into account the covariates.

yasmina-mekki/gwhap documentation built on Dec. 31, 2021, 9:19 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

yasmina-mekki/gwhap
Genome wide haplotype analysis

In yasmina-mekki/gwhap: Genome wide haplotype analysis

Installation of `gwhap`

Create the augmented genetic map

Blocs creation

Create the blocs

Blocs visualization

Determine the haplotypes

Test the haplotypes

R Package Documentation

Browse R Packages

We want your feedback!

yasmina-mekki/gwhap Genome wide haplotype analysis

In yasmina-mekki/gwhap: Genome wide haplotype analysis

Installation of gwhap

Create the augmented genetic map

Blocs creation

Create the blocs

Blocs visualization

Determine the haplotypes

Test the haplotypes

R Package Documentation

Browse R Packages

We want your feedback!

yasmina-mekki/gwhap
Genome wide haplotype analysis

Installation of `gwhap`