plot-methods: Plotting functions (UBod/podkat: Position-Dependent Kernel Association Test)

Description

Functions for visualizing association test results by means of a Manhattan plot and for visualizing genotypes.

Usage

```r
## S4 method for signature 'AssocTestResultRanges,missing'
plot(x, y, cutoff=0.05,
     which=c("p.value", "p.value.adj", "p.value.resampled",
             "p.value.resampled.adj"),
     showEmpty=FALSE, as.dots=FALSE, pch=19, col=c("darkgrey", "grey"),
     scol="red", lcol="red", xlab=NULL, ylab=NULL, ylim=NULL, lwd=1,
     cex=1, cexXaxs=1, cexYaxs=1, srt=0, adj=c(0.5, 1), ...)

## S4 method for signature 'AssocTestResultRanges,character'
plot(x, y, cutoff=0.05,
     which=c("p.value", "p.value.adj", "p.value.resampled",
             "p.value.resampled.adj"),
     showEmpty=FALSE, as.dots=FALSE, pch=19, col=c("darkgrey", "grey"),
     scol="red", lcol="red", xlab=NULL, ylab=NULL, ylim=NULL, lwd=1,
     cex=1, cexXaxs=1, cexYaxs=1, srt=0, adj=c(0.5, 1), ...)

## S4 method for signature 'AssocTestResultRanges,GRanges'
plot(x, y, cutoff=0.05,
     which=c("p.value", "p.value.adj", "p.value.resampled",
             "p.value.resampled.adj"),
     showEmpty=FALSE, as.dots=FALSE, pch=19, col="darkgrey",
     scol="red", lcol="red", xlab=NULL, ylab=NULL, ylim=NULL, lwd=1,
     cex=1, cexXaxs=1, cexYaxs=1, ...)

## S4 method for signature 'GenotypeMatrix,missing'
plot(x, y, col="black", labRow=NULL, labCol=NULL,
     cexXaxs=(0.2 + 1 / log10(ncol(x))),
     cexYaxs=(0.2 + 1 / log10(nrow(x))), srt=90, adj=c(1, 0.5))

## S4 method for signature 'GenotypeMatrix,factor'
plot(x, y, col=rainbow(length(levels(y))), labRow=NULL, labCol=NULL,
     cexXaxs=(0.2 + 1 / log10(ncol(x))),
     cexYaxs=(0.2 + 1 / log10(nrow(x))), srt=90, adj=c(1, 0.5))

## S4 method for signature 'GenotypeMatrix,numeric'
plot(x, y, col="black", ccol="red", lwd=2, labRow=NULL, labCol=NULL,
     cexXaxs=(0.2 + 1 / log10(ncol(x))),
     cexYaxs=(0.2 + 1 / log10(nrow(x))), srt=90, adj=c(1, 0.5))

## S4 method for signature 'GRanges,character'
plot(x, y, alongGenome=FALSE,
     type=c("r", "s", "S", "l", "p", "b", "c", "h", "n"),
     xlab=NULL, ylab=NULL, col="red", lwd=2,
     cexXaxs=(0.2 + 1 / log10(length(x))), cexYaxs=1,
     frame.plot=TRUE, srt=90, adj=c(1, 0.5), ...)
```

Arguments

- `x`: an object of class `AssocTestResultRanges`, `GenotypeMatrix`, or `GRanges`
- `y`: a character vector, `GRanges` object, factor, or numeric vector, depending on the class of `x` (see Usage)
- `cutoff`: significance threshold
- `which`: a character string specifying which p-values to plot; if “p.value” (default), raw p-values are plotted. Other options are “p.value.adj” (adjusted p-values), “p.value.resampled” (resampled p-values), and “p.value.resampled.adj” (adjusted resampled p-values). If the requested column is not present in the input object `x`, the function stops with an error message.
- `showEmpty`: if `FALSE` (default), p-values of regions that did not contain any variants are omitted from the plot.
- `as.dots`: if `TRUE`, p-values are plotted as dots/characters in the center of the genomic region. If `FALSE` (default), p-values are plotted as lines stretching from the start to the end of the corresponding genomic region.
- `pch`: plotting character used to plot a single p-value; ignored if `as.dots=FALSE`; see `points` for details.
- `col`: plotting color(s); see Details below
- `scol`: color for plotting significant p-values (i.e. those passing the significance threshold)
- `lcol`: color for plotting the significance threshold line
- `xlab`: x axis label; if `NULL` (default) or `NA`, `plot` makes an automatic choice
- `ylab`: y axis label; if `NULL` (default) or `NA`, `plot` makes an automatic choice
- `ylim`: y axis limits; if `NULL` (default) or `NA`, `plot` makes an automatic choice; if user-specified, `ylim` must be a two-element numeric vector whose first element is 0 and whose second element is positive.
- `lwd`: line width; in Manhattan plots, this is the width of the significance threshold line. When plotting genotype matrices along with continuous traits, this is the width of the line that represents the trait.
- `cex`: scaling factor for plotting p-values; see `points` for details.
- `labRow,labCol`: row and column labels; set to `NA` to switch labels off; if `NULL`, rows are labeled by sample names (`rownames(x)`) and columns by variant names (`names(variantInfo(x))`).
- `cexXaxs,cexYaxs`: scaling factors for axis labels
- `ccol`: color of the line that plots the continuous trait along with a genotype matrix
- `srt`: rotation angle of text labels on the horizontal axis (see `text` for details); ignored if standard numerical ticks and labels are used.
- `adj`: adjustment of text labels on the horizontal axis (see `text` for details); ignored if standard numerical ticks and labels are used.
- `alongGenome`: whether to plot along the genome (`TRUE`) or per region (`FALSE`, default); see Details below.
- `type`: type of plot; see `plot.default` for details. Additionally, type “r” (the default) is available, which plots horizontal lines along the regions of `x`.
- `frame.plot`: whether or not to frame the plotting area (see `plot`; default: `TRUE`)
- `...`: all other arguments are passed to `plot`.

Details

If `plot` is called for an `AssocTestResultRanges` object without the second argument `y`, a so-called Manhattan plot is produced. The x axis corresponds to the genome on which the `AssocTestResultRanges` object `x` is based, and the y axis shows absolute values of log-transformed p-values. The `which` argument determines which p-values are plotted, i.e. raw p-values, adjusted p-values, resampled p-values, or adjusted resampled p-values. The `cutoff` argument sets a significance threshold: p-values below this threshold (which appear above the threshold line, since the y axis shows absolute log values) are plotted in color `scol`, and the threshold line itself is drawn in color `lcol`.
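
To make the y-axis transformation and the role of `cutoff` concrete, here is a minimal base R sketch. The helper `manhattanValues()` is hypothetical and not part of podkat; it assumes base-10 logarithms and treats p-values strictly below `cutoff` as significant, which may differ from podkat's internal choices.

```r
## Hypothetical helper (not part of podkat): computes the y values and
## point colors of a Manhattan plot from a vector of p-values.
## Assumptions: base-10 logarithm; p < cutoff counts as significant.
manhattanValues <- function(p, cutoff=0.05, col="darkgrey", scol="red") {
    y <- abs(log10(p))                      # absolute log-transformed p-values
    cols <- ifelse(p < cutoff, scol, col)   # significant p-values in 'scol'
    list(y=y, col=cols,
         threshold=abs(log10(cutoff)))      # height of the threshold line
}

p <- c(0.5, 0.01, 1e-6, 0.2)
mv <- manhattanValues(p, cutoff=1e-5)
mv$col    # "darkgrey" "darkgrey" "red" "darkgrey"
```

With base graphics, `plot(mv$y, col=mv$col, pch=19)` followed by `abline(h=mv$threshold)` would draw a crude version of such a plot.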

The optional `y` argument can be used for two purposes: (1) if it is a character vector containing chromosome (sequence) names, only the specified subset of one or more chromosomes is plotted; (2) if `y` is a `GRanges` object of length 1, only the genomic region corresponding to `y` is plotted (if `y` is longer, `plot` stops with an error).
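
The two selection modes can be sketched in base R as follows. The helper `selectRegions()` is hypothetical, regions are held in a plain data frame rather than a `GRanges` object, and treating "the region corresponding to `y`" as all regions that overlap `y` is an assumption, not confirmed podkat behavior.

```r
## Hypothetical helper (not part of podkat): selects the regions a
## Manhattan plot would show for a given 'y'. Regions are stored in a
## plain data frame instead of a GRanges object.
## Assumption: a region is shown if it overlaps the single region 'y'.
selectRegions <- function(regions, y) {
    if (is.character(y))   # (1) subset of chromosomes
        return(regions[regions$seqnames %in% y, , drop=FALSE])
    ## (2) a single region, given as a one-row data frame
    if (nrow(y) != 1)
        stop("'y' must contain exactly one region")
    keep <- regions$seqnames == y$seqnames &
        regions$end >= y$start & regions$start <= y$end
    regions[keep, , drop=FALSE]
}

regions <- data.frame(seqnames=c("chr1", "chr1", "chr2"),
                      start=c(1, 5001, 1), end=c(10000, 15000, 10000),
                      stringsAsFactors=FALSE)
selectRegions(regions, "chr2")
selectRegions(regions, data.frame(seqnames="chr1", start=12000, end=13000,
                                  stringsAsFactors=FALSE))
```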

The `col` argument specifies the color(s) for plotting non-significant p-values (i.e. those above the significance threshold); if the number of colors is smaller than the number of chromosomes, the vector is recycled. If `col` is a single color, all non-significant p-values are plotted in the same color. If `col` has two elements (like the default value), the non-significant p-values of consecutive chromosomes are plotted in alternating colors. Density plots of p-values can be produced by using semi-transparent colors (see, e.g., `rgb` or `hsv` for how to use the `alpha` channel).
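
The recycling rule can be illustrated with a short base R sketch. `chromColors()` is a hypothetical helper, and ordering chromosomes by their first appearance is an assumption; podkat may instead follow the sequence levels of the underlying genome.

```r
## Hypothetical helper (not part of podkat): assigns colors to regions
## by recycling 'col' over the chromosomes, in order of first appearance.
chromColors <- function(chroms, col=c("darkgrey", "grey")) {
    lev <- unique(chroms)                     # chromosomes in plot order
    colPerChrom <- rep_len(col, length(lev))  # recycle colors
    colPerChrom[match(chroms, lev)]           # one color per region
}

chroms <- c("chr1", "chr1", "chr2", "chr3", "chr3")
chromColors(chroms)   # "darkgrey" "darkgrey" "grey" "darkgrey" "darkgrey"

## a single semi-transparent grey (alpha = 0.2) for density-like plots
chromColors(chroms, col=rgb(0.3, 0.3, 0.3, alpha=0.2))
```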

If `plot` is called for a `GenotypeMatrix` object `x` without a `y` argument, the matrix is visualized in a heatmap-like fashion: homozygous major genotypes (two major alleles) are displayed in white, homozygous minor genotypes (two minor alleles) in the color passed as the `col` argument, and heterozygous genotypes (one minor and one major allele) in the same color with 50% transparency. The arguments `cexXaxs` and `cexYaxs` can be used to change the scaling of the axis labels.
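
The color scheme above can be reproduced with base R's `adjustcolor()`. The helper `genotypeColors()` is hypothetical, and it assumes genotypes are coded as the number of minor alleles (0, 1, 2); it only computes the color of each cell and does not draw anything.

```r
## Hypothetical helper (not part of podkat): maps a genotype matrix,
## assumed to be coded as the number of minor alleles (0, 1, 2), to the
## colors described above.
genotypeColors <- function(G, col="black") {
    pal <- c("white",                         # 0: two major alleles
             adjustcolor(col, alpha.f=0.5),   # 1: heterozygous, 50% transparent
             col)                             # 2: two minor alleles
    matrix(pal[as.vector(G) + 1], nrow=nrow(G), dimnames=dimnames(G))
}

G <- matrix(c(0, 1, 2, 1, 0, 0), nrow=2,
            dimnames=list(c("s1", "s2"), c("v1", "v2", "v3")))
genotypeColors(G)
```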

If `plot` is called for a `GenotypeMatrix` object `x` and a factor `y`, then the factor `y` is interpreted as a binary trait. In this case, the rows of the genotype matrix `x` are reordered such that rows/samples with the same label are plotted next to each other. Each such group can be plotted in a different color. For this purpose, a vector of colors can be passed as `col` argument.
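
The row grouping can be sketched with `order()` on the factor, which sorts by level and keeps the original order within each group because `order()` resolves ties stably. The helper `groupRows()` is hypothetical; its default colors mirror the `col` default shown in Usage.

```r
## Hypothetical helper (not part of podkat): reorders samples so that
## rows with the same factor level are adjacent and assigns one color
## per group.
groupRows <- function(G, y, col=rainbow(length(levels(y)))) {
    ord <- order(y)                         # sort samples by factor level
    list(G=G[ord, , drop=FALSE],
         rowCol=col[as.integer(y)[ord]])    # one color per sample
}

G <- matrix(c(0, 1, 2, 0, 1, 0, 0, 2), nrow=4,
            dimnames=list(paste0("s", 1:4), c("v1", "v2")))
y <- factor(c("case", "control", "case", "control"))
grp <- groupRows(G, y, col=c("red", "blue"))
rownames(grp$G)   # "s1" "s3" "s2" "s4"
```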

If `plot` is called for a `GenotypeMatrix` object `x` and a numeric vector `y`, then the vector `y` is interpreted as a continuous trait. In this case, the rows of the genotype matrix `x` are reordered according to the trait vector `y`, and the genotype matrix is plotted as described above. The trait `y` is superimposed on the plot in color `ccol` and with line width `lwd`. If the null model has been trained with covariates, it also makes sense to plot the genotypes against the residuals of the null model, since these are exactly the values that the genotypes were tested against.
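
The reordering by a continuous trait, or by covariate-adjusted residuals, can be sketched in base R. Here `lm()` merely stands in for podkat's null model, the data are simulated, and ascending order is an assumption.

```r
## Hypothetical sketch (not podkat code): orders samples by the
## residuals of a linear model with covariates; lm() stands in for the
## null model here, and ascending order is an assumption.
set.seed(1)
n <- 20
pheno <- data.frame(x1=rnorm(n), x2=rnorm(n))
pheno$y <- 0.5 * pheno$x1 + rnorm(n)
G <- matrix(rbinom(n * 5, size=2, prob=0.2), nrow=n,
            dimnames=list(paste0("s", 1:n), paste0("v", 1:5)))

r <- residuals(lm(y ~ x1 + x2, data=pheno))  # trait adjusted for covariates
ord <- order(r)
G.sorted <- G[ord, , drop=FALSE]  # rows in the order in which they would be plotted
r.sorted <- r[ord]                # curve that would be overlaid in color 'ccol'
```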

If `plot` is called for a `GRanges` object `x` and a character string `y`, then `plot` checks whether `x` has a metadata column named `y`. If it exists, this column is plotted against the regions in `x`. If `alongGenome` is `FALSE` (the default), the regions in `x` are arranged along the horizontal axis with equal widths and in the same order as in `x`. If the regions in `x` are named, the names are used as axis labels, and the argument `cexXaxs` can be used to scale their font size. If `alongGenome` is `TRUE`, the metadata column is plotted against genomic positions, i.e. the knots of the curves are placed at the positions given in the `GRanges` object `x`. For types “s”, “S”, “l”, “p”, “b”, “c”, and “h”, knots are placed in the middle of the genomic regions contained in `x` if they are longer than one base. For type “r”, regions are plotted as lines stretching exactly between the start and end coordinates of each region in `x`.
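
The placement of knots can be sketched as follows. `knotPositions()` is a hypothetical helper that takes plain start/end vectors instead of a `GRanges` object; using the midpoint formula also for single-base regions (where it simply returns the position) is a simplifying assumption.

```r
## Hypothetical helper (not part of podkat): x coordinates of the knots
## for a metadata column, with regions given as start/end vectors
## instead of a GRanges object.
knotPositions <- function(start, end, alongGenome=FALSE) {
    if (!alongGenome)
        return(seq_along(start))  # equal-width slots, in input order
    (start + end) / 2             # region centers; equals start for single bases
}

start <- c(100, 200, 300)
end <- c(110, 200, 340)
knotPositions(start, end)                    # 1 2 3
knotPositions(start, end, alongGenome=TRUE)  # 105 200 320
```

For type “r”, base R's `segments(start, value, end, value)` would draw the corresponding horizontal lines between the start and end coordinates.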

Value

Returns an invisible numeric vector of length 2 containing the y axis limits.

Author(s)

Ulrich Bodenhofer bodenhofer@bioinf.jku.at

References

http://www.bioinf.jku.at/software/podkat

See Also

`AssocTestResultRanges`, `GRanges`

Examples

```r
## load genome description
data(hgA)

## partition genome into overlapping windows
windows <- partitionRegions(hgA)

## load genotype data from VCF file
vcfFile <- system.file("examples/example1.vcf.gz", package="podkat")
Z <- readGenotypeMatrix(vcfFile)

## plot some fraction of the genotype matrix
plot(Z[, 1:25])

## read phenotype data from CSV file (binary trait + covariates)
phenoFile <- system.file("examples/example1log.csv", package="podkat")
pheno <- read.table(phenoFile, header=TRUE, sep=",")

## train null model with all covariates in data frame 'pheno'
nm.log <- nullModel(y ~ ., pheno)

## perform association test
res <- assocTest(Z, nm.log, windows)
res.adj <- p.adjust(res)

## plot results
plot(res)
plot(res, cutoff=1.e-5, as.dots=TRUE)
plot(res.adj, which="p.value.adj")
plot(res.adj, reduce(windows[3:5]), which="p.value.adj")

## filter regions
res.adj.f <- filterResult(res.adj, filterBy="p.value.adj")

## plot genotypes grouped by target
sel <- which(overlapsAny(variantInfo(Z), reduce(res.adj.f)))
plot(Z[, sel], factor(pheno$y))
plot(Z[, sel], residuals(nm.log), srt=45)

## compute contributions
contrib <- weights(res.adj.f, Z, nm.log)
contrib[[1]]

## plot contributions
plot(contrib[[1]], "weight.raw")
plot(contrib[[1]], "weight.contribution", type="b", alongGenome=TRUE)
```