binned_manhattan_preprocess: Preprocess GWAS Result for Binned Manhattan Plot
In leejs-abv/ggmanh: Visualization Tool for GWAS Result

View source: R/binned_manhattan_preprocess.R

Description

Preprocess the result of a genome-wide association study (GWAS) before creating a binned Manhattan plot. Works similarly to manhattan_data_preprocess and returns an MPdataBinned object. The object can be created from a data.frame or an MPdata object. See Details for how to use summarise.expression.list.

Usage

binned_manhattan_preprocess(x, ...)

## Default S3 method:
binned_manhattan_preprocess(x, ...)

## S3 method for class 'MPdata'
binned_manhattan_preprocess(
  x,
  bins.x = 10,
  bins.y = 100,
  chr.gap.scaling = 0.4,
  summarise.expression.list = NULL,
  show.message = TRUE,
  ...
)

## S3 method for class 'data.frame'
binned_manhattan_preprocess(
  x,
  bins.x = 10,
  bins.y = 100,
  chr.gap.scaling = 0.4,
  signif = c(5e-08, 1e-05),
  pval.colname = "pval",
  chr.colname = "chr",
  pos.colname = "pos",
  chr.order = NULL,
  signif.col = NULL,
  preserve.position = TRUE,
  pval.log.transform = TRUE,
  summarise.expression.list = NULL,
  ...
)

## S4 method for signature 'GRanges'
binned_manhattan_preprocess(
  x,
  bins.x = 10,
  bins.y = 100,
  chr.gap.scaling = 0.4,
  signif = c(5e-08, 1e-05),
  pval.colname = "pval",
  chr.order = NULL,
  signif.col = NULL,
  preserve.position = TRUE,
  pval.log.transform = TRUE,
  summarise.expression.list = NULL,
  ...
)

Arguments

`x`	a `data.frame` or any extension of a data frame. It can also be a `MPdata` or `GRanges` object.
`...`	Ignored
`bins.x`	an integer. Number of blocks that horizontally span the longest chromosome.
`bins.y`	an integer. Number of blocks that vertically span the plot.
`chr.gap.scaling`	a number. Scaling factor for the gap between chromosomes.
`summarise.expression.list`	a list of formulas used to summarise the data in each bin. See Details for more information.
`show.message`	a logical. Show a warning if a `MPdata` object is used directly. Set to `FALSE` to suppress the warning.
`signif`	a numeric vector. P-value significance thresholds to draw on the Manhattan plot. At least one value should be provided. Default is c(5e-08, 1e-05).
`pval.colname`	a character. Column name of `x` containing p-values.
`chr.colname`	a character. Column name of `x` containing chromosome.
`pos.colname`	a character. Column name of `x` containing position.
`chr.order`	a character vector. Order in which chromosomes are presented in the Manhattan plot.
`signif.col`	a character vector of the same length as `signif`. It contains colors for the lines drawn at `signif`. If `NULL`, the smallest value is colored black while the others are grey.
`preserve.position`	a logical. If `TRUE`, the width of each chromosome reflects the number of variants and the position of each variant is correctly scaled. If `FALSE`, the width of each chromosome is equal and the variants are equally spaced.
`pval.log.transform`	a logical. If `TRUE`, the p-value will be transformed to -log10(p-value).

Details

If x is a data frame or a similar object, an MPdata object is created first and then used to build the MPdataBinned S3 object.

x can also be an MPdata object. Check whether thin has been applied, because thinning affects what is aggregated, such as the number of variants in each bin.

The position of each point relative to the plot is first calculated via manhattan_data_preprocess. The data are then binned into blocks. bins.x is the number of blocks allocated to the widest chromosome; the number of blocks for every other chromosome is proportional to its width relative to the widest one. bins.y is the number of blocks allocated to the y-axis. This number may be adjusted slightly so that a block edge falls exactly on the significance threshold.
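The binning idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the code ggmanh runs: the chromosome widths, the point data, and the rounding rule used to allocate blocks are all hypothetical, and the adjustment of the y-axis blocks to the significance threshold is left out.

```r
# Illustrative sketch only; not the ggmanh implementation.
set.seed(1)

# Hypothetical chromosome widths on the plot scale.
chr_width <- c(chr1 = 250, chr2 = 125, chr3 = 50)
bins_x <- 10  # blocks given to the widest chromosome
bins_y <- 20  # blocks along the y-axis

# Blocks per chromosome, proportional to the widest one.
# Rounding up is an assumption made for this sketch.
blocks_per_chr <- ceiling(bins_x * chr_width / max(chr_width))

# Hypothetical points on the widest chromosome:
# x is the plot position, y is -log10(p-value).
pts <- data.frame(
  x = runif(1000, min = 0, max = 250),
  y = -log10(runif(1000))
)

# Equal-width break points along each axis.
x_breaks <- seq(0, 250, length.out = blocks_per_chr[["chr1"]] + 1)
y_breaks <- seq(0, max(pts$y), length.out = bins_y + 1)

# Assign every point to a block; include.lowest keeps points on the lower edge.
pts$x_bin <- cut(pts$x, breaks = x_breaks, include.lowest = TRUE, labels = FALSE)
pts$y_bin <- cut(pts$y, breaks = y_breaks, include.lowest = TRUE, labels = FALSE)

# Count the points that fall into each occupied block.
binned <- aggregate(n ~ x_bin + y_bin, data = transform(pts, n = 1L), FUN = sum)
head(binned)
```

Each row of binned corresponds to one occupied block, which is the unit a binned Manhattan plot draws instead of individual points.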

Since points are aggregated into bins, users can freely specify expressions to summarise the data in each bin through the summarise.expression.list argument. This argument takes a list of two-sided formulas, where the left side is the name of the new column and the right side is the expression used to calculate that column. The expression is then passed to summarise. For example, to calculate the mean, minimum, and maximum of a column named beta in each bin, the summarise.expression.list argument would be:

# inside binned_manhattan_preprocess function
summarise.expression.list = list(
  mean_beta ~ mean(beta),
  min_beta ~ min(beta),
  max_beta ~ max(beta)
)
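To make the two-sided formula convention concrete, here is a minimal base R sketch of how such a list could be evaluated per bin. It is not the package's implementation, which passes the expressions to summarise; the helper summarise_bins and the toy data are hypothetical and only show how the left side becomes a column name and the right side is evaluated against the rows of each bin.

```r
# Illustrative sketch only; summarise_bins is a hypothetical helper,
# not a ggmanh function.
dat <- data.frame(
  bin  = c(1, 1, 2, 2, 2),
  beta = c(0.5, -1.0, 2.0, 0.0, 1.0)
)

expr_list <- list(
  mean_beta ~ mean(beta),
  max_beta ~ max(beta)
)

summarise_bins <- function(data, group_col, formulas) {
  groups <- split(data, data[[group_col]])
  rows <- lapply(names(groups), function(g) {
    # f[[3]] is the right-hand side; evaluate it with the bin's columns in scope.
    vals <- lapply(formulas, function(f) eval(f[[3]], envir = groups[[g]]))
    # f[[2]] is the left-hand side; use it as the new column name.
    names(vals) <- vapply(formulas, function(f) as.character(f[[2]]), character(1))
    as.data.frame(c(list(bin = g), vals), stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

res <- summarise_bins(dat, "bin", expr_list)
print(res)
```

The result has one row per bin and one column per formula, named after the formula's left side.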

Value

An MPdataBinned object. This object contains the components necessary for creating a binned Manhattan plot.

Examples

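# Simulated GWAS result: 5 chromosomes with 1500 variants each.
# Positions are sampled once per variant so that no column is recycled.
gwasdat <- data.frame(
  "chromosome" = rep(1:5, each = 1500),
  "position" = c(replicate(5, sample(1:15000, 1500))),
  "pvalue" = rbeta(7500, 1, 1)^5,
  "beta" = rnorm(7500)
)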

tmp <- binned_manhattan_preprocess(
  gwasdat, pval.colname = "pvalue", chr.colname = "chromosome",
  pos.colname = "position", chr.order = as.character(1:5),
  bins.x = 10, bins.y = 50,
  summarise.expression.list = list(
    mean_beta ~ mean(beta, na.rm = TRUE),
    max_abs_beta ~ max(abs(beta), na.rm = TRUE)
  )
)

print(tmp)

leejs-abv/ggmanh documentation built on Sept. 19, 2024, 10:13 p.m.