plot_outlier_ideogram: Plot an Ideogram for BGC Outlier Genes

View source: R/plot_outlier_ideogram.R

plot_outlier_ideogram    R Documentation

Plot an Ideogram for BGC Outlier Genes

Description

This function requires a closely related reference genome and alignments produced externally with minimap2 (tested with v2.17). Use minimap2 to map the assembly scaffolds to the reference genome. The query should be a single assembly FASTA file that combines all the unplaced assembly scaffolds with the transcriptome. Run minimap2 with the --cs and -N 100 options and the default PAF output format. Adjust the asm preset (for example asm5, asm10, or asm20) to match the sequence divergence between your assembly and the reference genome. See the minimap2 documentation for more information: https://github.com/lh3/minimap2.

Next, run PAFScaff (https://github.com/slimsuite/pafscaff) on the PAF file output by minimap2. PAFScaff parses the minimap2 output and improves the mapping. Pass the path to the *.scaffolds.tdt output file from PAFScaff as the pafInfo argument. Reference chromosomes should be named as integers in PAFScaff (1, 2, etc.).

This function also depends on previous output from get_bgc_outliers() and join_bgc_gff(). See ?get_bgc_outliers and ?join_bgc_gff. Run BGC and get_bgc_outliers() separately for the transcriptome-aligned data and for all other scaffold-aligned loci, then run join_bgc_gff() on the transcriptome-aligned data only. Both outlier sets are required as input to make the ideograms. For more information on plotting with RIdeogram, see https://cran.r-project.org/web/packages/RIdeogram/index.html.
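As a rough illustration of the minimap2 step, the sketch below assembles a command line in R without running it. The helper name build_minimap2_cmd, the file names, and the asm20 default are hypothetical choices for this example. Only the --cs and -N 100 options come from the text above. The PAFScaff call is not shown; consult its documentation for the exact syntax.

```r
# Hypothetical helper (not part of ClineHelpR): build, but do not run,
# a minimap2 command that maps an assembly to a reference genome.
build_minimap2_cmd <- function(reference, assembly, paf_out, preset = "asm20") {
  args <- c(
    "-x", preset,        # assembly preset; choose according to sequence divergence
    "--cs",              # write the cs tag to the PAF output
    "-N", "100",         # retain up to 100 secondary alignments
    shQuote(reference),  # target: reference genome FASTA
    shQuote(assembly)    # query: scaffolds plus transcriptome in one FASTA
  )
  # minimap2 writes PAF to standard output by default, so redirect it to a file
  paste("minimap2", paste(args, collapse = " "), ">", shQuote(paf_out))
}

# File names below are placeholders
cmd <- build_minimap2_cmd("reference.fasta", "assembly.fasta", "population1.paf")
cat(cmd, "\n")
```

The printed string can be pasted into a shell once the file names and preset are adjusted to your data.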

Usage

plot_outlier_ideogram(
  prefix,
  outliers.genes,
  outliers.full.scaffolds,
  pafInfo,
  plotDIR = "./plots",
  both.outlier.tests = FALSE,
  both.outlier.tests.genes = FALSE,
  overlap.zero = TRUE,
  overlap.zero.genes = TRUE,
  qn.interval = TRUE,
  qn.interval.genes = TRUE,
  missing.chrs = NULL,
  miss.chr.length = NULL,
  gene.size = 5e+05,
  other.size = 1e+05,
  convert_svg = "pdf",
  colorset1 = c("#4575b4", "#ffffbf", "#d73027"),
  colorset2 = c("#4575b4", "#ffffbf", "#d73027"),
  chrnum.prefix = NULL,
  genes.only = FALSE,
  linked.only = FALSE
)

Arguments

prefix

Prefix for output files

outliers.full.scaffolds

List containing outlier data from get_bgc_outliers(). See ?get_bgc_outliers. This must be loci aligned to full scaffolds

pafInfo

Path to *.scaffolds.tdt file output from PAFScaff

plotDIR

Directory to save output plots

both.outlier.tests

Boolean; If TRUE, scaffold outliers must meet both the overlap.zero and qn.interval criteria

both.outlier.tests.genes

Boolean; If TRUE, gene outliers must meet both the overlap.zero.genes and qn.interval.genes criteria

overlap.zero

Boolean; If TRUE, scaffold outliers are SNPs whose credible interval does not contain zero

overlap.zero.genes

Boolean; If TRUE, gene outliers are SNPs whose credible interval does not contain zero

qn.interval

Boolean; If TRUE, scaffold outliers fall outside the quantile interval qn/2 and 1-qn/2

qn.interval.genes

Boolean; If TRUE, gene outliers fall outside the quantile interval qn/2 and 1-qn/2

missing.chrs

If specified, must be a character vector of missing chromosome names. Chromosome numbers should be prefixed with "chr", e.g., c("chr3", "chr6"). Use this option if some chromosomes do not get plotted

gene.size

Adjust the size for each outlier transcriptome gene on the ideogram. If the loci appear too small or large on the ideogram, adjust just this parameter

other.size

Adjust the size for each outlier scaffold (non-gene) locus on the ideogram

convert_svg

Device (output format) to which the SVG output plot is converted. Default is pdf, but you can use png or other commonly used devices

colorset1

Vector of colors for RIdeogram alpha heatmap. Default is the same as the RIdeogram defaults

colorset2

Vector of colors for RIdeogram beta heatmap. Default is the same as the RIdeogram defaults

chrnum.prefix

Prefix for chromosome numbers on the ideogram plot

genes.only

Boolean; If TRUE, only include known genes on ideogram

linked.only

Boolean; If TRUE, only include non-genes on ideogram

outliers.genes

List containing gene outlier data from get_bgc_outliers(). See ?get_bgc_outliers. This must be outliers from a transcriptome alignment

miss.chr.length

Vector of integer lengths (in bp) of the missing chromosomes. Required if missing.chrs is used, and must be the same length as missing.chrs
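The constraints on missing.chrs and miss.chr.length can be checked before calling the function. The sketch below is a hypothetical helper, not part of ClineHelpR. It validates the two vectors and returns one row per missing chromosome, using the Chr/Start/End column layout of an RIdeogram karyotype table. Whether plot_outlier_ideogram() builds its rows this way internally is an assumption.

```r
# Hypothetical helper (not part of ClineHelpR): validate the two arguments
# and return one karyotype-style row per missing chromosome.
check_missing_chrs <- function(missing.chrs = NULL, miss.chr.length = NULL) {
  # Nothing to add when neither argument is supplied
  if (is.null(missing.chrs) && is.null(miss.chr.length)) {
    return(NULL)
  }
  if (is.null(missing.chrs) || is.null(miss.chr.length)) {
    stop("missing.chrs and miss.chr.length must be supplied together")
  }
  if (length(missing.chrs) != length(miss.chr.length)) {
    stop("missing.chrs and miss.chr.length must have the same length")
  }
  if (!is.character(missing.chrs) || !all(grepl("^chr[0-9]+$", missing.chrs))) {
    stop("chromosome names must be prefixed with 'chr', e.g. 'chr3'")
  }
  if (!is.numeric(miss.chr.length) || any(miss.chr.length <= 0) ||
      any(miss.chr.length != round(miss.chr.length))) {
    stop("miss.chr.length must contain positive whole numbers (bp)")
  }
  data.frame(
    Chr = sub("^chr", "", missing.chrs),  # integer-style name, as used in PAFScaff
    Start = 0,
    End = miss.chr.length,
    stringsAsFactors = FALSE
  )
}

extra_rows <- check_missing_chrs(
  missing.chrs = c("chr11", "chr21", "chr25"),
  miss.chr.length = c(4997863, 1374423, 1060959)
)
print(extra_rows)
```

Running a check like this first turns a mismatch between the two vectors into a clear error message instead of a plot with missing or mis-sized chromosomes.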

Details

Function to plot outlier BGC loci as heatmaps on chromosome ideograms.
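The loci shown on the ideogram are selected by the overlap.zero, qn.interval, and both.outlier.tests arguments (and their .genes counterparts). The sketch below illustrates one way these criteria could combine on toy data. It is not the package's implementation. The column names, the qn value, the use of empirical quantiles of the point estimates as the quantile interval, and the "either test" behaviour when both.outlier.tests is FALSE are all assumptions made for illustration.

```r
# Toy data with hypothetical column names; this is not the structure
# returned by get_bgc_outliers().
loci <- data.frame(
  snp   = paste0("snp", 1:6),
  est   = c(-1.2, -0.1, 0.05, 0.3, 0.9, 1.5),   # point estimate of a cline parameter
  lower = c(-1.8, -0.6, -0.4, -0.2, 0.2, 0.8),  # lower credible bound
  upper = c(-0.5, 0.4, 0.5, 0.8, 1.6, 2.3)      # upper credible bound
)

# Illustrative helper (an assumption, not ClineHelpR code)
flag_outliers <- function(df, overlap.zero = TRUE, qn.interval = TRUE,
                          both.outlier.tests = FALSE, qn = 0.2) {
  # Test 1: the credible interval does not contain zero
  excl_zero <- df$lower > 0 | df$upper < 0
  # Test 2: the estimate falls outside the qn/2 and 1 - qn/2 quantiles
  # (empirical quantiles are used here purely for illustration)
  bounds <- unname(stats::quantile(df$est, c(qn / 2, 1 - qn / 2)))
  outside_qn <- df$est < bounds[1] | df$est > bounds[2]
  if (both.outlier.tests) {
    return(excl_zero & outside_qn)  # must pass both tests
  }
  keep <- rep(FALSE, nrow(df))
  if (overlap.zero) keep <- keep | excl_zero
  if (qn.interval) keep <- keep | outside_qn
  keep
}

either <- flag_outliers(loci)
both <- flag_outliers(loci, both.outlier.tests = TRUE)
print(loci$snp[either])
print(loci$snp[both])
```

Requiring both tests gives a smaller, more conservative set of loci than accepting either one.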

Value

Data.frame containing reference info for gene outliers.

Examples

plot_outlier_ideogram(prefix = "population1",
                      outliers.genes = outliers.genes.annotated,
                      outliers.full.scaffolds = outliers.full.scaffolds,
                      pafInfo = "./population1.scaffolds.tdt",
                      plotDIR = "./plots",
                      both.outlier.tests = TRUE,
                      missing.chrs = c("chr11", "chr21", "chr25"),
                      miss.chr.length = c(4997863, 1374423, 1060959),
                      gene.size = 1e6,
                      other.size = 5e5,
                      convert_svg = "png",
                      colorset1 = c("#4575b4", "#ffffbf", "#d73027"),
                      colorset2 = c("#fc8d59", "#ffffbf", "#91bfdb"))

btmartin721/ClineHelpR documentation built on Oct. 15, 2024, 5:05 a.m.