plot_manhattan: Generate a manhattan plot from snpRdata or a data.frame.

View source: R/plotting_functions.R

plot_manhattan R Documentation

Generate a manhattan plot from snpRdata or a data.frame.

Description

Creates a ggplot-based manhattan plot, where chromosomes/scaffolds/etc are concatenated along the x-axis. Can optionally highlight requested SNPs or those that pass an arbitrary significance threshold and facet plots by defined sample-specific variables such as population.

Usage

plot_manhattan(
  x,
  plot_var,
  window = FALSE,
  facets = NULL,
  chr = "chr",
  bp = "position",
  snp = NULL,
  chr.subfacet = NULL,
  sample.subfacet = NULL,
  significant = NULL,
  suggestive = NULL,
  highlight = "significant",
  highlight_style = "label",
  sig_below = FALSE,
  log.p = FALSE,
  abs = FALSE,
  viridis.option = "plasma",
  viridis.hue = c(0.2, 0.5),
  t.sizes = c(16, 12, 10),
  colors = c("black", "slategray3"),
  rug_data = NULL,
  rug_style = "point",
  rug_label = NULL,
  rug_alpha = 0.3,
  rug_thickness = ggplot2::unit(ifelse(rug_style == "point", 0.03, 6), "npc"),
  lambda_gc_correction = FALSE,
  chr_order = NULL,
  abbreviate_labels = FALSE,
  simplify_output = FALSE
)

Arguments

x

snpRdata or data.frame object containing the data to be plotted.

plot_var

character. A character string naming the statistic to be plotted. For snpRdata, these names correspond to any previously calculated statistics.

window

logical, default FALSE. If TRUE, sliding window averages will instead be plotted. These averages must have first been calculated with calc_smoothed_averages. Ignored if x is a data.frame.

facets

character or NULL, default NULL. Facets by which to break plots, as described in Facets_in_snpR. For non-window stats, any snp metadata facets will be ignored. Ignored if x is a data.frame.

chr

character, default "chr". Column in either snp metadata or x (for snpRdata or data.frame objects, respectively) which defines the "chromosome" by which SNP positions will be concatenated along the x-axis. If window = TRUE and a snpRdata object, this will be ignored in favor of the SNP specific facet provided to the facets argument.

bp

character, default "position". Column in either snp metadata or x (for snpRdata or data.frame objects, respectively) which defines the position in bp of each SNP.

snp

character, default NULL. Column in either snp metadata or x (for snpRdata or data.frame objects, respectively) containing snpIDs to use for highlighting. Ignored if no highlighting is requested.

chr.subfacet

character, default NULL. Specific chromosomes to plot. See examples.

sample.subfacet

character, default NULL. Specific sample-specific levels of the provided facet to plot. If x is a data.frame, this can refer to levels of a column titled "subfacet". See examples.

significant

numeric, default NULL. Value at which a line will be drawn designating significant SNPs. If highlight = "significant", SNPs above this level will also be labeled.

suggestive

numeric, default NULL. Value at which a line will be drawn designating suggestive SNPs. If highlight = "suggestive", SNPs above this level will also be labeled.

highlight

character, numeric, or FALSE, default "significant". Controls SNP highlighting. If either "significant" or "suggestive", SNPs above those respective values will be highlighted. If a numeric vector, SNPs corresponding to vector entries will be highlighted. See details.

highlight_style

character, default "label". Highlighting options:

  • label: labels with chr and position.

  • color: Color (word or hex) to color points by.

sig_below

logical, default FALSE. If TRUE, treats values lower than the significance threshold as significant.

log.p

logical, default FALSE. If TRUE, plot variables and thresholds will be transformed to -log.

abs

logical, default FALSE. If TRUE, converts the plot variable to its absolute value.

viridis.option

character, default "plasma". Viridis color scale option to use for significance lines and SNP labels. See scale_gradient for details.

viridis.hue

numeric, default c(0.2, 0.5). Two values between 0 and 1 listing the hues at which to start and stop on the viridis palette defined by the viridis.option argument. Lower numbers are darker.

t.sizes

numeric, default c(16, 12, 10). Text sizes, given as c(strip.title, axis, axis.ticks).

colors

character, default c("black", "slategray3"). Colors to alternate across chromosomes.

rug_data

data.frame or tbl, default NULL. Data to plot as a rug below the manhattan plot. Must contain a column named to match the chr argument and either a column named to match the bp argument or columns named start and end. May optionally contain columns named to match the rug_label argument. Useful for labeling the locations of candidate genes, for example.

rug_style

character, default "point". Options for the style of the rug, ignored if rug_data is not provided. Options:

  • point: standard rug plot with vertical dashes below the plot at the indicated locations. If start and end points are supplied in rug_data, the midpoint will be plotted.

  • ribbon: Ribbons for each point drawn below the plot, from the start to end columns. If the plotted range of x is very large (as in whole-genome or reduced representation sequencing), these may not be visible. A warning will be provided if this may be the case. Sub-setting the input and rug data to a range of interest may help in this case.

rug_label

character, default NULL. Names of additional labeling columns in rug_data, ignored if rug_data is not provided. These will not be directly plotted (since the result is often very messy), but are available as aesthetics in the resulting plot, which can then be examined if something like the ggplotly function from plotly is used. This may change in the future if a clean plotting technique is suggested.

rug_alpha

numeric between 0 and 1, default 0.3. Alpha (transparency) applied to a ribbon-style rug. Ignored if rug_data is not provided or the rug_style is not ribbon.

rug_thickness

numeric or grid-style unit, default ggplot2::unit(ifelse(rug_style == "point", 0.03, 6), "npc"). The height of the rug lines (if rug_style = "point") or ribbon (if rug_style = "ribbon"). Ignored if rug_data is not provided. Supplying a grid-style unit rather than a bare number is recommended to avoid over-plotting.

lambda_gc_correction

logical, default FALSE. If TRUE, corrects for inflated significance due to population and/or family structure using the \lambda_{GC} approach described in Price et al. 2010.

chr_order

character, default NULL. If provided, an ordered vector of chromosome/scaffold/etc names by which to sort output.

abbreviate_labels

numeric or FALSE, default FALSE. If a numeric value, x-axis chromosome names will be abbreviated using abbreviate, with each abbreviated label having the minimum length specified. Helpful when chromosome/scaffold/etc names are very long.

simplify_output

logical, default FALSE. If TRUE, only the ggplot object will be returned. This is optimal, since the data is already contained in that object, but it is not the default in order to stay backwards compatible with old code.
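The correction requested by lambda_gc_correction can be illustrated with a minimal base R sketch of the standard genomic-control calculation. The p-values below are made up for illustration, and snpR's internal implementation may differ in its details (for example, in how it handles an inflation factor below 1).

```r
# Hypothetical vector of association p-values (illustrative, not from snpR).
p <- c(0.001, 0.02, 0.2, 0.5, 0.8)

# Convert p-values to 1-df chi-squared statistics.
chisq <- qchisq(p, df = 1, lower.tail = FALSE)

# Inflation factor: observed median statistic over the median expected
# under the null hypothesis.
lambda_gc <- median(chisq) / qchisq(0.5, df = 1)

# Deflate the statistics by lambda and convert back to p-values.
p_corrected <- pchisq(chisq / lambda_gc, df = 1, lower.tail = FALSE)
```

When the inflation factor is greater than 1, every corrected p-value is larger (less significant) than its uncorrected counterpart.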

Details

Unlike most snpR functions, this function works with either a snpRdata object or a data.frame. For snpRdata objects, snp-specific or sliding window statistics can be plotted. In both cases, the facets argument can be used to define facets to plot, as described in Facets_in_snpR. For typical stats, the name of the snp metadata column containing chromosome/scaffold information must be supplied to the "chr" argument. For windowed stats, chr is instead inferred from the snp-specific facet used to create the smoothed windows. In both cases, the requested facets must exactly match those used to calculate statistics! If x is a data.frame, the "chr" argument must also be given, and the "facets" argument will be ignored.

A column defining the position of the SNP within the chromosome must be provided, and is "position" by default.
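As a rough illustration of how chromosome and position columns can be combined into a single concatenated x-axis, the base R sketch below offsets each chromosome by the summed lengths of those before it. The toy data and the names snps, chr_len, offsets, cum_bp, and centers are hypothetical, chromosome length is approximated by the largest observed position, and this is not necessarily how plot_manhattan does it internally.

```r
# Toy SNP table; column names mirror the defaults chr = "chr", bp = "position".
snps <- data.frame(
  chr = c("groupIV", "groupIV", "groupIX", "groupIX", "groupIX"),
  position = c(100, 900, 50, 400, 700),
  stat = c(0.1, 0.4, 0.2, 0.9, 0.3)
)

# Length of each chromosome, approximated by its largest observed position.
chr_len <- vapply(split(snps$position, snps$chr), max, numeric(1))

# Offset of each chromosome: summed lengths of all chromosomes before it.
offsets <- c(0, cumsum(chr_len)[-length(chr_len)])
names(offsets) <- names(chr_len)

# Concatenated x-axis coordinate for every SNP.
snps$cum_bp <- snps$position + unname(offsets[as.character(snps$chr)])

# Midpoint of each chromosome, a natural place for x-axis labels.
centers <- offsets + chr_len / 2
```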

Specific chr and sample levels can also be requested using the chr.subfacet and sample.subfacet arguments, respectively. See examples. For data.frames, sample.subfacet levels must refer to a column in x titled "subfacet".

Specific snps can be highlighted and annotated. If a significance level is requested, SNPs above this level will be highlighted by default. SNPs above the suggestive line can also be highlighted by providing "suggestive" to the highlight argument. Alternatively, individual SNPs can be highlighted by providing a numeric vector. For snpR data, this will correspond to the SNP's row in the snpRdata object. For data.frames, it will correspond to a ".snp.id" column if it exists, and the row number if not. The label for highlighted SNPs will be either chr_bp by default or given in the column named by the "snp" argument.
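The interaction between significant, log.p, and sig_below described above can be sketched in base R as follows. The toy data are made up, the use of base 10 for the log transform is an assumption, and the names value, threshold, hit, and labels are illustrative only.

```r
# Toy data.frame with the default column names plus a p-value column.
d <- data.frame(
  chr = c("groupIV", "groupIV", "groupIX"),
  position = c(100, 900, 50),
  p = c(0.5, 0.00001, 0.0005)
)
significant <- 0.0001
log.p <- TRUE
sig_below <- FALSE

# With log.p = TRUE, both the values and the threshold are transformed.
# Base 10 is an assumption made for this sketch.
value <- if (log.p) -log10(d$p) else d$p
threshold <- if (log.p) -log10(significant) else significant

# SNPs beyond the threshold, in the direction set by sig_below.
hit <- if (sig_below) value < threshold else value > threshold

# Default label style described above: chr_bp.
labels <- paste(d$chr, d$position, sep = "_")[hit]
```

Here only the second SNP passes the transformed threshold, so it is the only one that would be labeled.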

Value

A list containing

  • plot: A ggplot manhattan plot.

  • data: Raw plot data.

If simplify_output is TRUE, only the ggplot object is returned.

Author(s)

William Hemstrom

References

Price, A., Zaitlen, N., Reich, D. et al. New approaches to population stratification in genome-wide association studies. Nat Rev Genet 11, 459–463 (2010). https://doi.org/10.1038/nrg2813

Examples

# add a dummy phenotype and run an association test.
x <- stickSNPs[pop = c("ASP", "SMR"), chr = c("groupIX", "groupIV")]
sample.meta(x)$phenotype <- sample(c("A", "B"), nsamps(x), TRUE)
x <- calc_association(x, response = "phenotype", method = "armitage")
plot_manhattan(x, "p_armitage_phenotype", chr = "chr",
               log.p = TRUE)$plot


# other types of stats:
# make some data
x <- calc_basic_snp_stats(x, "pop.chr", sigma = 200, step = 50)

# plot pi, breaking apart by population, keeping only the groupIX
# and the ASP population, with
# significant and suggestive lines plotted and SNPs
# with pi below the significance level labeled.
plot_manhattan(x, "pi", facets = "pop",
               chr = "chr", chr.subfacet = "groupIX",
               sample.subfacet = "ASP",
               significant = 0.05, suggestive = 0.15, sig_below = TRUE)$plot

# plot FST for the ASP/SMR comparison across all chromosomes,
# labeling the first 10 SNPs in x (by row) with their ID
# Note that since this is the only comparison, we don't actually need to
# specify it.
plot_manhattan(x, "fst", facets = "pop.chr",
               sample.subfacet = "ASP~SMR", highlight = 1:10,
               chr = "chr", snp = ".snp.id")$plot

# plot sliding-window FST between ASP and SMR
plot_manhattan(x, "fst", window = TRUE, facets = c("pop.chr"),
               chr = "chr", sample.subfacet = "ASP~SMR",
               significant = .29, suggestive = .2)$plot

# plot using a data.frame,
# using log-transformed p-values
## grab data
y <- get.snpR.stats(x, "pop", stats = "hwe")$single
## plot
plot_manhattan(y, "pHWE", facets = "subfacet", chr = "chr",
               significant = 0.0001, suggestive = 0.001,
               log.p = TRUE, highlight = FALSE)$plot



# plot with a rug
rug_data <- data.frame(chr = c("groupIX", "groupIV"), start = c(0, 1000000),
                       end = c(5000000, 6000000), gene = c("A", "B"))

# point style, midpoints plotted
plot_manhattan(x, "p_armitage_phenotype", chr = "chr",
               log.p = TRUE, rug_data = rug_data)

# ribbon style
plot_manhattan(x, "p_armitage_phenotype", chr = "chr",
               log.p = TRUE, rug_data = rug_data, rug_style = "ribbon")
               
# with plotly to mouse over information
## Not run: 
plotly::ggplotly(plot_manhattan(x, "p_armitage_phenotype", chr = "chr",
                                log.p = TRUE, rug_data = rug_data, 
                                rug_style = "ribbon", 
                                rug_label = "gene")$plot)


## End(Not run)

hemstrow/snpR documentation built on March 20, 2024, 7:03 a.m.