manhattan_data_preprocess: Preprocess GWAS Result

View source: R/manhattan_data_preprocess.R

manhattan_data_preprocessR Documentation

Preprocess GWAS Result

Description

Preprocesses a result from Genome Wide Association Study before making a manhattan plot. It accepts a data.frame, which at bare minimum should contain a chromosome, position, and p-value. Additional options, such as chromosome color, label column names, and colors for specific variants, are provided here.

Usage

manhattan_data_preprocess(x, ...)

## Default S3 method:
manhattan_data_preprocess(x, ...)

## S3 method for class 'data.frame'
manhattan_data_preprocess(
  x,
  chromosome = NULL,
  signif = c(5e-08, 1e-05),
  pval.colname = "pval",
  chr.colname = "chr",
  pos.colname = "pos",
  highlight.colname = NULL,
  chr.order = NULL,
  signif.col = NULL,
  chr.col = NULL,
  highlight.col = NULL,
  preserve.position = FALSE,
  thin = NULL,
  thin.n = 1000,
  thin.bins = 200,
  pval.log.transform = TRUE,
  chr.gap.scaling = 1,
  ...
)

## S4 method for signature 'GRanges'
manhattan_data_preprocess(
  x,
  chromosome = NULL,
  signif = c(5e-08, 1e-05),
  pval.colname = "pval",
  highlight.colname = NULL,
  chr.order = NULL,
  signif.col = NULL,
  chr.col = NULL,
  highlight.col = NULL,
  preserve.position = FALSE,
  thin = NULL,
  thin.n = 100,
  thin.bins = 200,
  pval.log.transform = TRUE,
  chr.gap.scaling = 1,
  ...
)

Arguments

x

a data frame or any other extension of data frame (e.g. a tibble). At bare minimum, it should contain chromosome, position, and p-value.

...

Additional arguments for manhattan_data_preprocess.

chromosome

a character. This is supplied if a manhattan plot of a single chromosome is desired. If NULL, then all the chromosomes in the data will be plotted.

signif

a numeric vector. Significant p-value thresholds to be drawn for manhattan plot. At least one value should be provided. Default value is c(5e-08, 1e-5)

pval.colname

a character. Column name of x containing p.value.

chr.colname

a character. Column name of x containing chromosome.

pos.colname

a character. Column name of x containing position.

highlight.colname

a character. If you desire to color certain points (e.g. significant variants) rather than color by chromosome, you can specify the category in this column, and provide the color mapping in highlight.col. Ignored if NULL.

chr.order

a character vector. Order of chromosomes presented in manhattan plot.

signif.col

a character vector of equal length as signif. It contains colors for the lines drawn at signif. If NULL, the smallest value is colored black while others are grey.

chr.col

a character vector of equal length as chr.order. It contains colors for the chromosomes. Name of the vector should match chr.order. If NULL, default colors are applied using RColorBrewer.

highlight.col

a character vector. It contains color mapping for the values from highlight.colname.

preserve.position

a logical. If TRUE, the width of each chromosome reflect the number of variants and the position of each variant is correctly scaled? If FALSE, the width of each chromosome is equal and the variants are equally spaced.

thin

a logical. If TRUE, thinPoints will be applied. Defaults to TRUE if chromosome is NULL. Defaults to FALSE if chromosome is supplied.

thin.n

an integer. Number of max points per horizontal partitions of the plot. Defaults to 1000.

thin.bins

an integer. Number of bins to partition the data. Defaults to 200.

pval.log.transform

a logical. If TRUE, the p-value will be transformed to -log10(p-value).

chr.gap.scaling

scaling factor for gap between chromosome if you desire to change it. This can also be set in manhattan_plot

Details

manhattan_data_preprocess gathers information needed to plot a manhattan plot and organizes the information as MPdata S3 object.

New positions for each points are calculated, and stored in the data.frame as "new_pos". By default, all chromosomes will have the same width, with each point being equally spaced. This behavior is changed when preserve.position = TRUE. The width of each chromosome will scale to the number of points and the points will reflect the original positions.

chr.col and highlight.col, maps the data values to colors. If they are an unnamed vector, then the function will try its best to match the values of chr.colname or highlight.colname to the colors. If they are a named vector, then they are expected to map all values to a color. If highlight.colname is supplied, then chr.col is ignored.

While feeding a data.frame directly into manhattan_plot does preprocessing & plotting in one step. If you plan on making multiple plots with different graphic options, you have the choice to preprocess separately and then generate plots.

Value

a MPdata object. This object contains all the necessary components for constructing a manhattan plot.

Examples


gwasdat <- data.frame(
  "chromosome" = rep(1:5, each = 30),
  "position" = c(replicate(5, sample(1:300, 30))),
  "pvalue" = rbeta(150, 1, 1)^5
)

  manhattan_data_preprocess(
  gwasdat, pval.colname = "pvalue", chr.colname = "chromosome", pos.colname = "position",
  chr.order = as.character(1:5)
)


leejs-abv/ggmanh documentation built on Sept. 19, 2024, 10:13 p.m.