knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(gwhap) RESOURCE_ROOT = "<YOUR_LOCAL_RESOURCE_FS>"
gwhap
There are two ways to obtain the gwhap source package:
install.packages('gwhap')
devtools::install_github('yasmina-mekki/gwhap')
The gwhap package implies the use of a version of R (3.6.2 and higher) as well as a certain number libraries.
In order to facilitate the use of it, here is a recipe of a singularity image containing all the necessary. Please refer to the github repo, for more explanation about how to generate it.
devtools::install_github("yasmina-mekki/gwhap_singularity_image")
Currently, only threee kind of map are took into account:
Here is an example using the Rutgers genetic map.
Download the Rutgers genetic map v3 (Kosambi)
download_rutgers_map(RESOURCE_ROOT)
interpolate a new map for SNPs using either:
snp_physical_positions = "path_to_your_snps_file/snp_file.bim"
snp_physical_positions = "path_to_your_bgen_file/ukb_hap_"
Create the augmented genetic map as follow:
create_augmented_genetic_map(snp_physical_positions=snp_physical_positions, genetic_map_dir='path_to_your_rutgers_map_dir', map_name='rutgers', save_genetic_map=TRUE, output='path_to_your_output_dir')
Before creating the blocs, you need to load the map created above
augmented_map_df = get_augmented_genetic_map(augmented_genetic_map_dir='path_to_your_output_dir', chromosomes=1:23)
Create blocs with a single delta
create_blocs(genetics_map=augmented_map_df, delta=1e-3, save_blocs=TRUE, output='path_to_your_output_dir')
Create blocs with multiple values of delta
deltas = c(1e-3, 2.5e-3, 5e-3, 7.5e-3) dfs_blocks = data.frame() for(delta in deltas){ df_blocks_tmp = create_blocs(genetics_map=augmented_map_df, delta=delta, save_blocs=FALSE) dfs_blocks <- rbind(dfs_blocks, df_blocks_tmp) }
Assuming that df_blocs is your bloc data frame already loaded
Blocs distribution plot
haplotype_bloc_distribution_per_delta = haplotype_bloc_distribution(df_blocs) save_plot(haplotype_bloc_distribution_per_delta, paste0('output_path/', 'haplotype_bloc_distribution_per_delta.png'))
haplotype_bloc_distribution_per_chr = haplotype_bloc_distribution(df_blocs, per_chromosome=TRUE) save_plot(haplotype_bloc_distribution_per_chr, paste0('output_path/', 'haplotype_bloc_distribution_per_chr.png'))
Karyotype plot
karyotype_plot_obj = karyotype_plot(df_blocs) save_plot(karyotype_plot_obj, paste0('output_path/', 'karyotype_plot.png'))
Before determining the haplotypes, you need to load the blocs created above
blocs_df = get_blocs(augmented_genetic_map_dir='path_to_your_output_dir')
Here is an example for one bloc
chr = 'chr1' start = 1000 end = 1200 bgnfile = sprintf('bgen_file_path_%s.bgen', chr) samples_index = c('index_1', 'index_2', 'index_3', 'index_4', 'index_5') mysamples = c('IID_1', 'IID_2', 'IID_3', 'IID_4', 'IID_5')
haplotypes = legacy_determine_haplotypes_per_bloc(chr, start, end, bgnfile, samples_index, mysamples)
Please refer to the genome wide example vignette for more code about how to determine the haplotypes using all the blocs.
Note:
In the bgen file, the IID are annonymized. It is constitued of, among others, a list of sample that contain the anonymized IID indexed. The .sample file is used as a reference table. It contains the IID of the participant indexed. Here, the index can be used as a foreign key to toggle from the anonymized IID to the real ones.
Warning: please notice that .sample file has two header lines. Be sure to remove the second header and starting the index account from the second line.
X = load('haplotypes_path_chr1.RData') Y = read_delim(file_path_phenotype.tsv, delim='\t')
test_results = lm_test_haplotypes(X, Y, kind='all')
Warning:
One should know that the test haplotype function assume that the Y are residualized. Indeed, it does not take into account the covariates.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.