gwplotting: A package to simplify plotting genome-wide...


Description

This package contains a set of small functions that perform three main tasks:
- Loading files in common data formats
- Reordering constituent scaffolds according to chromosome assignments or lengths
- Plotting common statistics

Required Packages

This package uses a variety of functions from Hadley Wickham's tidyverse collection of packages. Update R and RStudio to their latest versions, then install the latest version of tidyverse.

Common Format

I have tried to make each loading function return a standard format tibble, containing four columns with the following headers:
scaf - the scaffold name
ps - the position on the scaffold (this may be a single site, or the start or midpoint of a window)
stat - the statistic value at that position
chr - the chromosome (this is initialized with a unique number for each scaffold for most of the load functions, but is replaced when doing the actual reordering)
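
As a rough illustration of the Common Format, the sketch below builds such a tibble by hand. The scaffold names and values are made up, and the code assumes the tibble package (part of tidyverse) is installed. The loading functions themselves may use different column types.

```r
library(tibble)

# Hypothetical data: scaffold names, positions, and statistics are invented
common <- tibble(
  scaf = c("scaffold_1", "scaffold_1", "scaffold_2"),
  ps   = c(1500, 42000, 310),
  stat = c(0.82, 0.03, 0.47),
  chr  = c(1, 1, 2)  # placeholder: one number per scaffold until reordering
)

print(common)
```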

Loading Functions

As of Feb 2019, I have written the following loading functions. Except where noted, they output the Common Format described above. See their help messages for more detail. The function names are fairly self-explanatory.

load_gemma_gwas - GEMMA GWAS output
load_vcftools_stats - Common statistics output by VCFtools
load_plink_gwas - PLINK GWAS output
load_abbababa - Simon Martin's ABBABABAwindows.py output
load_pgw - Simon Martin's popgenWindows.py output

Reordering Functions

There are two ways I usually reorder scaffolds: by length or by assignments to another species' chromosomes. See these function help messages for more details.

reorder_scaffolds - Reorder by assignment to chromosomes
reorder_by_scaf_len - Reorder by scaffold lengths (longest to shortest)
get_cumulative_positions - Give each site a cumulative position
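
The idea behind a cumulative position can be sketched as follows. This is not the package's implementation of get_cumulative_positions, only an illustration using dplyr. The data, the lens table, and the offset and cum_ps column names are all assumptions made for the example.

```r
library(dplyr)
library(tibble)

# Hypothetical data in the Common Format
dat <- tibble(
  scaf = c("scafA", "scafA", "scafB", "scafB", "scafC"),
  ps   = c(100, 900, 50, 400, 10),
  stat = c(0.1, 0.5, 0.3, 0.2, 0.9),
  chr  = c(1, 1, 2, 2, 3)
)

# Hypothetical scaffold lengths
lens <- tibble(
  scaf = c("scafA", "scafB", "scafC"),
  len  = c(1000, 500, 200)
)

# Order scaffolds longest to shortest; each scaffold's offset is the
# summed length of all scaffolds placed before it
offsets <- lens %>%
  arrange(desc(len)) %>%
  mutate(offset = cumsum(len) - len)

# Shift every site by its scaffold's offset
dat_cum <- dat %>%
  left_join(offsets, by = "scaf") %>%
  mutate(cum_ps = ps + offset) %>%
  arrange(cum_ps)

print(dat_cum)
```

With these lengths, scafA starts at 0, scafB at 1000, and scafC at 1500, so all sites can share a single x axis.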

Plotting Functions

There is one main function, but I will add more as I go. The main function calls get_cumulative_positions before plotting.

plot_genomewide_data - Plot all scaffolds / chromosomes
plot_region_data - Plot a specific scaffold / chromosome or part
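
For readers unfamiliar with this kind of plot, here is a minimal ggplot2 sketch of a genome-wide scatter with alternating chromosome colors. It does not reproduce plot_genomewide_data. The data frame, the cum_ps column, and the colors are assumptions chosen for illustration.

```r
library(ggplot2)

# Hypothetical data that already carries a cumulative position (cum_ps)
dat <- data.frame(
  chr    = c(1, 1, 2, 2, 3),
  cum_ps = c(100, 900, 1050, 1400, 1510),
  stat   = c(0.1, 0.5, 0.3, 0.2, 0.9)
)

# Alternate two colors between neighboring chromosomes
p <- ggplot(dat, aes(x = cum_ps, y = stat, color = factor(chr %% 2))) +
  geom_point() +
  scale_color_manual(values = c("grey30", "steelblue"), guide = "none") +
  labs(x = "Cumulative position", y = "Statistic") +
  theme_classic()

print(p)
```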

Linkage Disequilibrium Functions

There are various things that you can do with LD measurements. These functions load pairwise LD values and summarize them at different scales.

load_plink_ld - Pairwise LD output from PLINK
calculate_ld_decay - For general patterns of LD decay
calculate_windowed_ld - For patterns of LD on small scales
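
One common way to summarize LD decay is to bin SNP pairs by physical distance and average r2 within each bin. The sketch below shows that idea with dplyr. It is not necessarily how calculate_ld_decay works, and the values and the bin_size are invented. The BP_A, BP_B, and R2 column names follow PLINK's pairwise LD output.

```r
library(dplyr)

# Hypothetical pairwise LD table using PLINK-style column names
ld <- data.frame(
  BP_A = c(100, 100, 100, 500, 500),
  BP_B = c(300, 1200, 2600, 900, 2900),
  R2   = c(0.9, 0.5, 0.1, 0.7, 0.3)
)

bin_size <- 1000  # assumed bin width in bp

# Distance between the two SNPs, assigned to the lower edge of its bin,
# then mean r2 and pair count per bin
decay <- ld %>%
  mutate(
    dist = abs(BP_B - BP_A),
    bin  = floor(dist / bin_size) * bin_size
  ) %>%
  group_by(bin) %>%
  summarise(mean_r2 = mean(R2), n_pairs = n())

print(decay)
```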

Example

x <- load_gemma_gwas( 'file.assoc.txt.gz', pval = 'p_wald' )
y <- reorder_by_scaf_len( x, 'scaffolds.chromSizes' )
z <- plot_genomewide_data( y, type = 'gwas', 'scaffolds.chromSizes', plotting_column = 'stat' )
tiff( 'myplot.tiff', width = 4, height = 2, units = 'in', res = 600 )
z
dev.off()


nwvankuren/genomics-plotting documentation built on April 14, 2021, 1:18 a.m.