knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(tibble) library(dplyr) library(purrr)
library(hidecan) library(tibble) library(dplyr) library(purrr)
The hidecan_plot()
function is really a wrapper that performs a series of steps to go from GWAS/DE results and candidate genes data-frames to a HIDECAN (gg)plot. In this vignette, we present the different steps performed by the wrapper function. This can be useful for users who want more control over the process. We'll work with the example dataset provided with the package
x <- get_example_data() str(x)
Under the hood, the hidecan
package relies on S3 classes, which are really just tibbles with specific columns. The constructors for these S3 classes perform a series of checks and computations to make sure that all of the required columns (such as chromosome, score, position) are present in the data.
GWAS_data
objects through the GWAS_data()
constructor:gwas_data <- GWAS_data(x[["GWAS"]]) class(gwas_data) head(gwas_data)
DE_data
objects through the DE_data()
constructor:de_data <- DE_data(x[["DE"]]) class(de_data) head(de_data)
CAN_data
objects through the CAN_data()
constructor:## CAN_data constructor can_data <- CAN_data(x[["CAN"]]) class(can_data) head(can_data)
These constructors will throw an error if a required column is missing from the input data (e.g. no chromosome column):
gwas_wrong_input <- x[["GWAS"]] |> select(-chromosome) GWAS_data(gwas_wrong_input)
They will also compute marker or gene scores from adjusted p-values if necessary (see the Input data section of the hidecan vignette). For example, for DE results, if we provide a padj
column (with the adjusted p-values of the genes) rather than a score
column, the constructor will compute the score
column based on the padj
column. You can also notice that a position
column is computed based on the start and end of the genes:
## Input tibble head(x[["DE"]]) ## Output of the DE_data constructor head(de_data)
Once the input datasets have been formatted appropriately, they are used to compute the length of the chromosomes present in the data. This is done through the combine_chrom_length()
function, which is applied to a list of GWAS_data
, DE_data
or CAN_data
objects:
chrom_length <- combine_chrom_length(list(gwas_data, de_data, can_data)) chrom_length
The function works by calling for each element in the list the compute_chrom_length()
function. The function, according to whether the input is a tibble of markers (GWAS_data
) or genes (DE_data
or CAN_data
), looks for the maximum value in either the position
column (for markers) or the end
column (for genes).
head(compute_chrom_length(gwas_data), 3) head(compute_chrom_length(de_data), 3)
Next, the GWAS and DE results tibbles are filtered according to a threshold, in order to retain the significant markers or genes. This is done through the apply_threshold()
function. This function has two (rather self-explanatory) arguments: score_thr
and log2fc_thr
.
When applied to a GWAS_data
object, the function filters markers with a score above the value set with score_thr
argument (the log2fc_thr
argument is ignored), and returns an object of class GWAS_data_thr
:
dim(gwas_data) gwas_data_thr <- apply_threshold(gwas_data, score_thr = 4) class(gwas_data_thr) dim(gwas_data_thr) head(gwas_data_thr)
For a DE_data
object, the apply_threshold function filters genes based on both their score and log2(fold-change), and returns an object of class DE_data_thr
:
dim(de_data) de_data_thr <- apply_threshold(de_data, score_thr = 2, log2fc_thr = 0.5) class(de_data_thr) dim(de_data_thr) head(de_data_thr)
Finally, if applied to a CAN_data object, the apply_threshold()
function simply returns the input tibble as an object of class CAN_data_thr
:
dim(can_data) can_data_thr <- apply_threshold(can_data, score_thr = 2, log2fc_thr = 0.5) class(can_data_thr) dim(can_data_thr) head(can_data_thr)
As for the GWAS_data
, DE_data
or CAN_data
objects, the GWAS_data_thr
, DE_data_thr
and CAN_data_thr
objects are really just tibbles.
Finally, the filtered datasets are combined into a list and passed on to the create_hidecan_plot()
function, along with the tibble of chromosome length. This is the function that generates the HIDECAN ggplot
:
create_hidecan_plot( list(gwas_data_thr, de_data_thr, can_data_thr), chrom_length )
This function shares most of its arguments with the hidecan_plot()
wrapper for controlling different aspects of the plot.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.