knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(gwhap)

RESOURCE_ROOT = "<YOUR_LOCAL_RESOURCE_FS>"

Installation of gwhap

There are two ways to obtain the gwhap source package:

install.packages('gwhap')
devtools::install_github('yasmina-mekki/gwhap')

The gwhap package implies the use of a version of R (3.6.2 and higher) as well as a certain number libraries.

In order to facilitate the use of it, here is a recipe of a singularity image containing all the necessary. Please refer to the github repo, for more explanation about how to generate it.

devtools::install_github("yasmina-mekki/gwhap_singularity_image")

Create the augmented genetic map

Currently, only threee kind of map are took into account:

Here is an example using the Rutgers genetic map.

Download the Rutgers genetic map v3 (Kosambi)

download_rutgers_map(RESOURCE_ROOT)

interpolate a new map for SNPs using either:

snp_physical_positions = "path_to_your_snps_file/snp_file.bim"
snp_physical_positions = "path_to_your_bgen_file/ukb_hap_"

Create the augmented genetic map as follow:

create_augmented_genetic_map(snp_physical_positions=snp_physical_positions, genetic_map_dir='path_to_your_rutgers_map_dir', map_name='rutgers', save_genetic_map=TRUE, output='path_to_your_output_dir')

Blocs creation

Create the blocs

Before creating the blocs, you need to load the map created above

augmented_map_df = get_augmented_genetic_map(augmented_genetic_map_dir='path_to_your_output_dir', chromosomes=1:23)

Create blocs with a single delta

create_blocs(genetics_map=augmented_map_df, delta=1e-3, save_blocs=TRUE, output='path_to_your_output_dir')

Create blocs with multiple values of delta

deltas = c(1e-3, 2.5e-3, 5e-3, 7.5e-3)
dfs_blocks = data.frame()
for(delta in deltas){
  df_blocks_tmp = create_blocs(genetics_map=augmented_map_df, delta=delta, save_blocs=FALSE)
  dfs_blocks <- rbind(dfs_blocks, df_blocks_tmp)
}

Blocs visualization

Assuming that df_blocs is your bloc data frame already loaded

Blocs distribution plot

haplotype_bloc_distribution_per_delta = haplotype_bloc_distribution(df_blocs)
save_plot(haplotype_bloc_distribution_per_delta, paste0('output_path/', 'haplotype_bloc_distribution_per_delta.png'))
haplotype_bloc_distribution_per_chr = haplotype_bloc_distribution(df_blocs, per_chromosome=TRUE)
save_plot(haplotype_bloc_distribution_per_chr, paste0('output_path/', 'haplotype_bloc_distribution_per_chr.png'))

Karyotype plot

karyotype_plot_obj = karyotype_plot(df_blocs)
save_plot(karyotype_plot_obj, paste0('output_path/', 'karyotype_plot.png'))

Determine the haplotypes

Before determining the haplotypes, you need to load the blocs created above

blocs_df = get_blocs(augmented_genetic_map_dir='path_to_your_output_dir')

Here is an example for one bloc

chr = 'chr1'
start = 1000
end = 1200
bgnfile = sprintf('bgen_file_path_%s.bgen', chr)
samples_index = c('index_1', 'index_2', 'index_3', 'index_4', 'index_5')
mysamples = c('IID_1', 'IID_2', 'IID_3', 'IID_4', 'IID_5')
haplotypes = legacy_determine_haplotypes_per_bloc(chr, start, end, bgnfile, samples_index, mysamples)

Please refer to the genome wide example vignette for more code about how to determine the haplotypes using all the blocs.

Note:

In the bgen file, the IID are annonymized. It is constitued of, among others, a list of sample that contain the anonymized IID indexed. The .sample file is used as a reference table. It contains the IID of the participant indexed. Here, the index can be used as a foreign key to toggle from the anonymized IID to the real ones.

Warning: please notice that .sample file has two header lines. Be sure to remove the second header and starting the index account from the second line.

Test the haplotypes

X = load('haplotypes_path_chr1.RData')
Y = read_delim(file_path_phenotype.tsv, delim='\t')
test_results = lm_test_haplotypes(X, Y, kind='all')

Warning:

One should know that the test haplotype function assume that the Y are residualized. Indeed, it does not take into account the covariates.



yasmina-mekki/gwhap documentation built on Dec. 31, 2021, 9:19 p.m.