plot_region: Plot p-values in regional genomic context

Description Usage Arguments Details Value Author(s)

Description

plot_region reads p-value data e.g. from association analysis and prepares a regional plot of a given chromosomal region of interest.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
plot_region(
  region,
  region_ext = 50000,
  title = NULL,
  data = list(),
  lines_pvalue_threshold = NULL,
  variant2highlight = "centered",
  EFFECT2highlight = c(green = "splice", red = "miss", red = "frame", red =
    "start|stop"),
  recombination.rate = NULL,
  biomaRt,
  hgnc.symbols.only = TRUE,
  LNCipedia = NULL,
  gene.color.coding = c(lightgreen = "pseudogene", brown = "snRNA", forestgreen =
    "ncRNA|antisense", orange = "miRNA", darkblue = "protein_coding"),
  numberOfRowsForGenePlotting = "auto",
  plot.protein.domains = NULL,
  cex.plot = 1.1,
  cex.legend = 1,
  gene.scale.factor = 2
)

Arguments

region

Character with region of interest of form chr1:20000-30000 or chr1:20000.

region_ext

numeric with region size extension in bp to plot. Half of the extension is added to both sides of the given region.

title

character with title to be used in plot.

data

named list of dataframes containing p-vales to be plotted. Required columns of each data frame are "CHR", "POS" and "P". An additionally column "EFFECT" with functional characterisation of the locus may be given optionally.

lines_pvalue_threshold

named character with p-values to be plotted as threshold lines. Line color is given by vector names (e.g. lines_pvalue_threshold = c(blue=0.05, red=0.01)).

variant2highlight

character vector with variants to be highlighted as filled symbols. For this, an additionally column ID is required within data. If the vector contains color names or effect names, all according variants are also highlighted. Numbers >=1 are interpreted as BP position to highlight while numbers <1 are interpreted as p-value threshold with all SNPs highlighted with p < threshold. If variant2highlight = "centered", the centered SNP is highlighted if available (applicable if region is of form "chr1:20000"). Vertical lines are added for the highlighted SNPs. Omitted if NULL.

EFFECT2highlight

named character vector with regular expressions (name = color, value = regexp) in order of priority low to high. Regular expressions is case insensitive. If an EFFECT column with functional annotation is given within a dataset, variants with functional annotation corresponding to these expressions are highlighted by colors given as vector names. If no EFFECT column is given, exonic SNPs can be highlighted according to overlapping gene exons. For this, an EFFECT column is created if not yet existing and exonic as well as the respective gene biotype is appended to the entries of the EFFECT column for exonic SNPs.

recombination.rate

character with path to file or to folder containing recombination rates to be plotted. Alternatively, a dataframe object can be supplied. Omitted if NULL.

biomaRt

biomaRt object to be used for gene annotation. If NULL, biomaRt annotation is skipped.

hgnc.symbols.only

logical. If TRUE, only Ensemble genes plotted with annotated HGNC Symbol. If FALSE, non-annotated genes in the plot are labeled with Ensembl gene id if available.

LNCipedia

character with path to LNCipedia bed file to plot lncRNA genes. Omitted if NULL.

gene.color.coding

named character vector with regular expressions for gene biotype color coding (name = color, value = regexp) in order of priority low to high, i.e. if multiple biotypes available per gene, the last biotype in the vector is used. Regular expressions are case insensitive. gray = "other" is appended to the vector for all remaining biotypes not found by the reg exp.

numberOfRowsForGenePlotting

numeric number of rows used for plotting genes. If "auto", function determines appropriate number of rows itself.

plot.protein.domains

named character vector with file path to protein domain annotation data for a selected gene (Omitted if NULL). Vector names are used as gene name of the selected gene (e.g. GeneXY = "filepath_to_protein_data_of_GeneXY"). Domains are plotted as symbols below the respective gene. The protein length is scaled to length of the plotted gene. Domain positions and width are scaled accordingly. Arrows indicate the respective genomic start and stop positions for each domain. The respective txt-file may be generated by the function makeDomainsFromExons and contains the following columns:

  • BP_start: genomic start position for corresponding protein domain AA position.

  • BP_end: genomic end position for corresponding protein domain AA position.

  • feature_length: domain/feature length in AA. Used for domain scaling in the plot.

  • protein_length: total protein length in AA. Used for domain scaling in the plot.

  • domain_name_plot: domain/feature name to be plotted.

  • symbol_plot (optional): Shape to be used for plotting (either "ellipse", "rectangle" or "circle"). Default is "ellipse".

  • domain_height_extension (optional): height factor for symbol height. These factors are scaled respective to each other. Default is 1.

  • domain_color (optional): color for domain symbol and name to be plotted. Default is "black".

  • label_pos (optional): label position in the domain plot may be adjusted in case of overlapping labels (Values of 1, 2, 3 and 4 indicate positions below, left, above and right of the domain center coordinates). Default is 3.

  • assignArrows2Gene (optional): Indicate if arrows shall be plotted from genomic coordinates to protein domain (default is TRUE). May be set to FALSE for very short features within other domains, e.g. "active site".

cex.plot

numeric character extension plot axes.

cex.legend

numeric character extension plot legends.

gene.scale.factor

numeric extension factor used for gene and exon plotting.

Details

Up to 5 dataframes can be committed in data and are plotted in one diagram. If functional information for variants is available, respective variants which fulfill the regular expression in EFFECT2highlight are highlighted by color. Additionally, variants given in variant2highlight are highlighted by filled symbols, text annotation and vertical lines (e.g. for the leading SNP of interest). If given, recombination rates for that region are added to the plot using a separate y-axis. Gene information for the specified region is downloaded from biomaRt and/or LNCipedia and is plotted beneath the diagram. Genes can be selected to include corresponding protein domain data for plotting. Modified graphical parameters are resetted at the end of the function. Nevertheless, this function can not be used par(mfrow()) for multiple plots.

Value

no value returned. Figure is plotted in the current graphics device.

Author(s)

Frank Ruehle


frankRuehle/systemsbio documentation built on Sept. 14, 2020, 1:18 a.m.