hidecan

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(tibble)
library(dplyr)
library(purrr)
library(stringr)
library(hidecan)

Input data

The hidecan package takes as input tibbles (data-frames) of genome-wide association study (GWAS) results, differential expression (DE) results, or candidate genes. Each input data-frame must contain a set of mandatory columns, which depend on the type of data.

A list of example input datasets can be obtained via the get_example_data() function:

x <- get_example_data()

str(x, max.level = 1)

GWAS results

GWAS results should be provided as a tibble or data-frame, with one row per genetic marker. The data-frame should contain at least the following columns:

Any other column present in the data-frame will be ignored. An example of valid input is shown below:

head(x[["GWAS"]])
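Throughout this vignette, the score is -log10(p-value), as the thresholds used further down show. If a GWAS tool only reports raw p-values, a score column can be derived as in this minimal sketch. It uses a hypothetical toy tibble. Apart from chromosome and score, which appear elsewhere in this vignette, the column names (position, pvalue) are assumptions, so check the package documentation for the exact mandatory columns.

```r
library(tibble)
library(dplyr)

## Hypothetical toy GWAS output: positions and p-values are made up,
## and the `position` and `pvalue` column names are assumptions
toy_gwas <- tibble(
  chromosome = c("ST4.03ch01", "ST4.03ch01", "ST4.03ch02"),
  position   = c(1500000, 2300000, 750000),
  pvalue     = c(1e-6, 0.2, 1e-4)
)

## score = -log10(p-value): the smaller the p-value, the larger the score
toy_gwas <- toy_gwas |>
  mutate(score = -log10(pvalue))

toy_gwas
```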

Differential expression results

DE results should be provided as a tibble or data-frame, with one row per gene. The data-frame should contain at least the following columns:

Any other column present in the data-frame will be ignored. An example of valid input is shown below:

head(x[["DE"]])

(Note that in the example dataset, some genes have missing values in the padj column. These are genes that were filtered out by the independent filtering step of the DESeq2 package.)
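To see how many genes were removed by independent filtering, you can count the missing values in padj. On the example data this would be sum(is.na(x[["DE"]]$padj)). The sketch below does the same on a hypothetical toy tibble, where the gene column and all values are made up. This vignette does not say whether hidecan_plot() drops these rows itself, so filtering them out beforehand is shown only as an option.

```r
library(tibble)
library(dplyr)

## Hypothetical toy DE results (gene IDs and values are made up)
toy_de <- tibble(
  gene = c("g1", "g2", "g3", "g4"),
  padj = c(0.01, NA, 0.3, NA)
)

## Number of genes with a missing adjusted p-value
n_filtered <- sum(is.na(toy_de$padj))

## Genes that were actually tested
tested <- toy_de |> filter(!is.na(padj))
```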

Candidate genes

A list of candidate genes (e.g. genes previously found to be associated with a trait of interest through a literature search) can be provided as a tibble or data-frame, with one row per gene. This data-frame can also contain variants of interest (see below). The data-frame should contain at least the following columns:

Any other column present in the data-frame will be ignored. An example of valid input is shown below:

head(x[["CAN"]])

Creating a HIDECAN plot

The hidecan_plot() function creates a HIDECAN plot. It takes as input the data-frames presented above, as well as the score and log2(fold-change) thresholds used to select the significant markers and genes.

In this example, we only show markers with a score above 4, which corresponds to a p-value below $1\times10^{-4}$, and genes with a score above approximately 1.3, which corresponds to a p-value below 0.05. We do not place any threshold on the log2(fold-change) of the genes:

hidecan_plot(
  gwas_list = x[["GWAS"]],          ## data-frame of GWAS results          
  de_list = x[["DE"]],              ## data-frame of DE results              
  can_list = x[["CAN"]],            ## data-frame of candidate genes            
  score_thr_gwas = -log10(0.0001),  ## sign. threshold for GWAS
  score_thr_de = -log10(0.05),      ## sign. threshold for DE
  log2fc_thr = 0                    ## log2FC threshold for DE
)
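The thresholds above are -log10 transforms of p-value thresholds. You can check the arithmetic directly in base R. It also shows that score_thr_gwas = 4, used in the next example, is the same threshold as -log10(0.0001).

```r
## Score thresholds are -log10 transforms of p-value thresholds
thr_gwas <- -log10(1e-4)
thr_de   <- -log10(0.05)

thr_gwas          # 4
round(thr_de, 3)  # 1.301

## Back-transforming a score threshold into a p-value threshold
p_gwas <- 10^(-thr_gwas)
```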

Note that it is possible to provide only a subset of the possible input data, e.g. only GWAS results and a list of candidate genes:

hidecan_plot(
  gwas_list = x[["GWAS"]],          
  can_list = x[["CAN"]],            
  score_thr_gwas = 4
)

Removing empty chromosomes

By default, the HIDECAN plot shows all chromosomes present in the input data. However, some chromosomes may appear empty, because they contain no significant gene or marker and no candidate gene. Such "empty" chromosomes can be excluded from the HIDECAN plot through the remove_empty_chrom argument.

We demonstrate this by increasing the score threshold applied to the GWAS results, so that fewer markers are significant. With this threshold, chromosomes 0, 6, 9 and 10 do not contain any significant marker or candidate gene:

## Chromosomes 0, 6, 9 and 10 are empty
hidecan_plot(
  gwas_list = x[["GWAS"]],          
  can_list = x[["CAN"]],            
  score_thr_gwas = 5
)

By setting the remove_empty_chrom argument to TRUE, these chromosomes will be removed from the plot:

hidecan_plot(
  gwas_list = x[["GWAS"]],          
  can_list = x[["CAN"]],            
  score_thr_gwas = 5,
  remove_empty_chrom = TRUE
)
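You can predict which chromosomes will be empty by checking whether any marker on each chromosome passes the threshold. The sketch below does this for GWAS markers only, using a hypothetical toy tibble. It is not the package's internal logic. hidecan also takes DE genes and candidate genes into account, and the strict comparison (> rather than >=) is an assumption here.

```r
library(tibble)
library(dplyr)

## Hypothetical toy GWAS scores on three chromosomes (made up)
toy_gwas <- tibble(
  chromosome = c("chr1", "chr1", "chr2", "chr3"),
  score      = c(6.2, 3.1, 4.8, 5.5)
)
thr <- 5

## Chromosomes with at least one marker above the threshold
non_empty <- toy_gwas |>
  filter(score > thr) |>
  distinct(chromosome) |>
  pull(chromosome)

## Chromosomes that would appear empty (markers only)
empty <- setdiff(unique(toy_gwas$chromosome), non_empty)
```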

Selecting chromosomes and genomic positions

It is possible to specify which chromosomes should be represented in the HIDECAN plot, via the chroms argument. For example, with the following command we restrict the plot to chromosomes 7 and 8:

hidecan_plot(
  gwas_list = x[["GWAS"]],                    
  de_list = x[["DE"]],                          
  can_list = x[["CAN"]],                  
  score_thr_gwas = -log10(0.0001),  
  score_thr_de = -log10(0.05),      
  log2fc_thr = 0,
  chroms = c("ST4.03ch07", "ST4.03ch08")
)

We can also "zoom in" on some or all chromosomes through the chrom_limits argument. To zoom in on all chromosomes at once, we pass to chrom_limits a numeric vector of length 2 giving the lower and upper limits (in bp) to use. For example, here we focus on the 10-20 Mb region of each chromosome:

hidecan_plot(
  gwas_list = x[["GWAS"]],                    
  de_list = x[["DE"]],                          
  can_list = x[["CAN"]],                  
  score_thr_gwas = -log10(0.0001),  
  score_thr_de = -log10(0.05),      
  log2fc_thr = 0,
  chrom_limits = c(10e6, 20e6)
)
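In R, 10e6 means 10 × 10^6 = 1e7 bp, i.e. 10 Mb. Conceptually, zooming in keeps only the features whose positions fall inside the window, as in the sketch below on a hypothetical toy tibble. This vignette does not describe how hidecan_plot() handles the boundaries or genes that straddle a limit, so the inclusive comparison is an assumption.

```r
library(tibble)
library(dplyr)

## Hypothetical toy marker positions (in bp)
toy_gwas <- tibble(
  chromosome = "chr1",
  position   = c(5e6, 12.5e6, 19e6, 25e6)
)

limits <- c(10e6, 20e6)  # 10e6 is 1e7 bp, i.e. 10 Mb

## Markers falling inside the zoomed-in window (bounds included)
in_window <- toy_gwas |>
  filter(position >= limits[1], position <= limits[2])
```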

Alternatively, we can apply different limits to some of the chromosomes by passing a named list to the argument. The names of the list should match the chromosome names, and each element should be a numeric vector of length 2 giving the lower and upper limits (in bp) for the corresponding chromosome. For example, we focus on the 10-20 Mb region of chromosome 1 and the 30-40 Mb region of chromosome 5, and leave all other chromosomes as they are:

hidecan_plot(
  gwas_list = x[["GWAS"]],                    
  de_list = x[["DE"]],                          
  can_list = x[["CAN"]],                  
  score_thr_gwas = -log10(0.0001),  
  score_thr_de = -log10(0.05),      
  log2fc_thr = 0,
  chrom_limits = list("ST4.03ch01" = c(10e6, 20e6),
                      "ST4.03ch05" = c(30e6, 40e6))
)
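If the regions to zoom in on are stored in a table, the named list can be built programmatically rather than typed by hand. The sketch below uses a hypothetical zoom data-frame (its name and its start/end columns are illustrative). It produces the same list as the one passed to chrom_limits above.

```r
## Hypothetical table of regions to zoom in on, one row per chromosome
zoom <- data.frame(
  chromosome = c("ST4.03ch01", "ST4.03ch05"),
  start      = c(10e6, 30e6),
  end        = c(20e6, 40e6),
  stringsAsFactors = FALSE
)

## One c(lower, upper) vector per chromosome...
chrom_limits <- Map(function(s, e) c(s, e), zoom$start, zoom$end)
## ...named after the corresponding chromosome
names(chrom_limits) <- zoom$chromosome
```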

The two options chroms and chrom_limits can be used together:

hidecan_plot(
  gwas_list = x[["GWAS"]],                    
  de_list = x[["DE"]],                          
  can_list = x[["CAN"]],                  
  score_thr_gwas = -log10(0.0001),  
  score_thr_de = -log10(0.05),      
  log2fc_thr = 0,
  chroms = c("ST4.03ch07", "ST4.03ch08"),
  chrom_limits = list("ST4.03ch07" = c(50e6, 55e6),
                      "ST4.03ch08" = c(45e6, 50e6))
)

Colour genes by log2(fold-change)

By default, in a HIDECAN plot, the points representing both significant markers and DE genes are coloured according to their GWAS/DE score. However, it is possible to colour the DE genes by their log2(fold-change) value instead, by setting the colour_genes_by_score argument to FALSE:

hidecan_plot(
  gwas_list = x[["GWAS"]],          
  de_list = x[["DE"]],              
  can_list = x[["CAN"]],            
  score_thr_gwas = -log10(0.0001),
  score_thr_de = -log10(0.05),
  log2fc_thr = 0,
  colour_genes_by_score = FALSE
)

Genes with a negative log2(fold-change) will be represented with a shade of blue, and genes with a positive log2(fold-change) will be represented with a shade of red.

More than one GWAS, DE or candidate gene list

The hidecan_plot() function can also take as input lists of data-frames for GWAS results, DE results or candidate genes. This makes it possible to visualise several GWAS or DE analyses in a single plot, for example when investigating several traits or comparing more than two treatment groups.

For this example, we'll focus on chromosomes 7 and 8 (only for clarity of the plot):

library(dplyr)
library(purrr)
library(stringr)

## Retaining only markers and genes on chromosomes 7 and 8
x_small <- x |> 
  map(~ filter(.x, str_detect(chromosome, "(07|08)")))
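The pattern "(07|08)" works here because, among the chromosome names of this dataset (ST4.03ch00 to ST4.03ch12), only ch07 and ch08 contain "07" or "08". The sketch below checks this. It also shows an anchored pattern, which is less fragile if "07" or "08" appears elsewhere in a chromosome name. Reconstructing the names with sprintf() is only for illustration.

```r
library(stringr)

## Chromosome names of the example dataset (ST4.03ch00 to ST4.03ch12)
chrom_names <- sprintf("ST4.03ch%02d", 0:12)

## Names matched by the pattern used above
loose <- chrom_names[str_detect(chrom_names, "(07|08)")]

## A stricter alternative, anchored to the end of the name
strict <- chrom_names[str_detect(chrom_names, "ch0[78]$")]
```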

We'll create a second data-frame of GWAS results by shuffling the marker scores in the example dataset:

## Creating a second GWAS result tibble by shuffling
## the marker scores from the original data
## (the seed value is arbitrary; it only makes the shuffling reproducible)
set.seed(123)
gwas_1 <- x_small[["GWAS"]]
gwas_2 <- gwas_1 |> 
  mutate(score = sample(score))
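Called with a single vector, sample() returns a random permutation of it. The shuffled track therefore keeps exactly the same score distribution as the original but loses the link between scores and marker positions. This is why it works as a stand-in for a second analysis. Here is a quick check on made-up scores:

```r
## Made-up scores standing in for a GWAS score column
scores <- c(6.1, 2.3, 4.8, 0.9)

set.seed(42)  # arbitrary seed, for reproducibility only
shuffled <- sample(scores)

## Same values as the original, only the order may differ
same_values <- identical(sort(shuffled), sort(scores))
```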

We can pass both GWAS results data-frames to the hidecan_plot() function as a list:

hidecan_plot(
  gwas_list = list(gwas_1, gwas_2),
  score_thr_gwas = -log10(0.0001),
  score_thr_de = -log10(0.05),
  log2fc_thr = 0
)

By default, the two GWAS tracks are given unique y-axis labels, as can be seen above. These labels can be customised by naming the elements of the input list:

hidecan_plot(
  gwas_list = list("Trait 1" = gwas_1, 
                   "Trait 2" = gwas_2),
  score_thr_gwas = -log10(0.0001),
  score_thr_de = -log10(0.05),
  log2fc_thr = 0
)
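When the trait names are stored in a character vector, the named list can also be built with base R's setNames(). The sketch below uses hypothetical toy tibbles. The resulting list could then be passed as gwas_list in the same way.

```r
library(tibble)

## Two hypothetical toy GWAS tibbles (values are made up)
res_a <- tibble(chromosome = "chr1", position = 1e6, score = 5.2)
res_b <- tibble(chromosome = "chr1", position = 2e6, score = 4.1)

## Hypothetical trait names, used as track labels
trait_names <- c("Trait 1", "Trait 2")

## Equivalent to list("Trait 1" = res_a, "Trait 2" = res_b)
gwas_named <- setNames(list(res_a, res_b), trait_names)
```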

Defining chromosome lengths

By default, the hidecan_plot() function calculates the length of each chromosome from the input data, as the maximum position of the genes and markers on that chromosome. However, it is also possible to pass a tibble of chromosome lengths (in bp) through the chrom_length argument.

library(tibble)

## Chromosome lengths as recorded in Ensembl Plants
potato_chrom_length <- c(
  ST4.03ch00 = 45813526,
  ST4.03ch01 = 88663952,
  ST4.03ch02 = 48614681,
  ST4.03ch03 = 62190286,
  ST4.03ch04 = 72208621,
  ST4.03ch05 = 52070158,
  ST4.03ch06 = 59532096,
  ST4.03ch07 = 56760843,
  ST4.03ch08 = 56938457,
  ST4.03ch09 = 61540751,
  ST4.03ch10 = 59756223,
  ST4.03ch11 = 45475667,
  ST4.03ch12 = 61165649
) |> 
  ## turn a named vector into a tibble
  enframe(name = "chromosome",
          value = "length")

head(potato_chrom_length)

hidecan_plot(
  gwas_list = x[["GWAS"]],          
  de_list = x[["DE"]],              
  can_list = x[["CAN"]],
  score_thr_gwas = -log10(0.0001),
  score_thr_de = -log10(0.05),
  log2fc_thr = 0,
  chrom_length = potato_chrom_length
)

Note that in this example, the resulting plot is almost indistinguishable from the one obtained with the chromosome lengths computed from the input data.
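For comparison, the default behaviour described above (each chromosome's length is the maximum position of its genes and markers) can be approximated with dplyr. This is a minimal sketch on hypothetical toy tibbles, not the package's code. Using the gene end coordinate, and the position/start/end column names, are assumptions.

```r
library(tibble)
library(dplyr)

## Hypothetical toy markers (position) and genes (start/end), in bp
toy_markers <- tibble(chromosome = c("chr1", "chr1", "chr2"),
                      position   = c(1.2e6, 3.4e6, 0.8e6))
toy_genes   <- tibble(chromosome = c("chr1", "chr2"),
                      start      = c(2.0e6, 1.5e6),
                      end        = c(2.1e6, 1.6e6))

## Stack all positions, then keep the maximum per chromosome
toy_chrom_length <- bind_rows(
  toy_markers |> select(chromosome, pos = position),
  toy_genes   |> select(chromosome, pos = end)
) |>
  group_by(chromosome) |>
  summarise(length = max(pos))
```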

Controlling the plot properties

The hidecan_plot() function offers several arguments to control different aspects of the HIDECAN plot. For example, the number of rows or columns of the plot can be set through the n_rows and n_cols arguments. Only one of these arguments is taken into account: if both are provided, n_rows takes precedence.

## Specifying the number of rows
hidecan_plot(
  gwas_list = x[["GWAS"]],          
  de_list = x[["DE"]],              
  can_list = x[["CAN"]],            
  score_thr_gwas = -log10(0.0001),
  score_thr_de = -log10(0.05),
  log2fc_thr = 0,
  n_rows = 3
)

## Specifying the number of columns
hidecan_plot(
  gwas_list = x[["GWAS"]],
  de_list = x[["DE"]],
  can_list = x[["CAN"]],
  score_thr_gwas = -log10(0.0001),
  score_thr_de = -log10(0.05),
  log2fc_thr = 0,
  n_cols = 3
)
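As a rough guide, fixing one dimension determines the other through a ceiling division. This assumes the chromosomes fill a regular grid, as in a ggplot2 facet_wrap() layout, and whether hidecan_plot() arranges them exactly this way is an assumption. The sketch uses the 13 potato chromosomes listed in potato_chrom_length above.

```r
## Number of chromosomes (ST4.03ch00 to ST4.03ch12)
n_chrom <- 13
n_rows  <- 3

## Columns needed to fit all chromosomes in a grid with n_rows rows
n_cols_needed <- ceiling(n_chrom / n_rows)
n_cols_needed  # 5
```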

In addition, it is possible to:

Viewport error

If you are working in RStudio, you may encounter a viewport error when running a command such as:

hidecan_plot(gwas_list = x[["GWAS"]],
             de_list = x[["DE"]],
             can_list = x[["CAN"]],
             score_thr_gwas = -log10(0.0001),
             score_thr_de = -log10(0.05),
             log2fc_thr = 0,
             label_size = 2)

This error occurs when the plotting window is too small to draw the plot. Try enlarging the Plots pane in RStudio. Alternatively, you can save the plot into an R object, then use ggplot2::ggsave() to write it to a file:

p <- hidecan_plot(
  gwas_list = x[["GWAS"]],
  de_list = x[["DE"]],
  can_list = x[["CAN"]],
  score_thr_gwas = -log10(0.0001),
  score_thr_de = -log10(0.05),
  log2fc_thr = 0,
  label_size = 2
)

ggplot2::ggsave("hidecan_plot.pdf", p, width = 10, height = 10)


hidecan documentation built on Feb. 16, 2023, 6:22 p.m.