gene_map_plot: Plot a gene map

View source: R/gene_map_plot.R

Description

Uses base position information to draw a linear gene map of genetic features. Features are separated into "gene features" (plotted as large blocks on the main chromosome) and "extra features" (plotted as small bars offset from the main chromosome).

Usage

gene_map_plot(
  mapDT,
  genome_len = NULL,
  gene_colour = NULL,
  gene_type = c("gene", "rRNA"),
  extra_type = c("tRNA", "D-loop"),
  plot_xmin = 0,
  plot_xmax = NULL,
  plot_ymax = 5,
  extra_ypos = 3,
  gene_txt_size = 4,
  extra_txt_size = 4,
  font = "Arial",
  gene_border = NA,
  gene_size = 2
)

Arguments

mapDT

Data.table: genetic feature information. Requires the columns:

  1. $NAME = Character, the name of the genetic feature.

  2. $TYPE = Character, the type of genetic feature.

  3. $STRAND = Integer, the strand, either 1 or -1.

  4. $START = Integer, the starting base position.

  5. $END = Integer, the ending base position.

genome_len

Integer: the genome length. Default = NULL. If unspecified, will be assigned the final base pair of the last genetic feature in mapDT.
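
As a minimal sketch of the inputs described above, the following builds a toy mapDT with the five required columns and derives a fallback genome length. The feature names and coordinates are invented for illustration, and taking max(mapDT$END) is an assumption about how the default is obtained, not a statement of the internal code.

```r
library(data.table)

# Hypothetical toy features; names and coordinates are invented for illustration.
# NAME values carry nested single quotes so they are read as plain text labels.
mapDT <- data.table(
  NAME   = c("'geneA'", "'geneB'", "'tRNA-Phe'"),
  TYPE   = c("gene", "gene", "tRNA"),
  STRAND = c(1L, -1L, 1L),
  START  = c(1L, 1201L, 2501L),
  END    = c(1200L, 2400L, 2570L)
)

# Check that the required columns are present before plotting.
required_cols <- c("NAME", "TYPE", "STRAND", "START", "END")
stopifnot(all(required_cols %in% names(mapDT)))

# Assumed equivalent of the genome_len fallback: the largest END value.
genome_len_guess <- max(mapDT$END)
genome_len_guess
```

Passing genome_len explicitly, as the Examples do, avoids relying on this fallback when the last feature does not reach the end of the genome.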

gene_colour

Character: a named vector of colours for plotting gene features. Each element is a colour, and its name, accessible through names(gene_colour), is the gene it applies to. Default = NULL. See Details for parameterisation.

gene_type

Character: a vector of values present in mapDT$TYPE that will be plotted as large coloured bars. For plotting purposes, "gene features". Default = c('gene', 'rRNA'). See Details for parameterisation.

extra_type

Character: a vector of values present in mapDT$TYPE that will be plotted as small grey bars. For plotting purposes, "extra features". Default = c('tRNA', 'D-loop'). See Details for parameterisation.

plot_xmin

Numeric: a single value, the minimum x-axis limit. Default is 0.

plot_xmax

Numeric: a single value, the maximum x-axis limit. Default is the genome length, as per genome_len.

plot_ymax

Numeric: a single value, the maximum y-axis limit. Default is 5. See Details for parameterisation.

extra_ypos

Numeric: a single value, the starting y-axis position for extra features. Default is 3. See Details for parameterisation.

gene_txt_size

Integer: a single value, the size for gene feature labels. Default is 4.

extra_txt_size

Integer: a single value, the size for extra feature labels. Default is 4.

font

Character: a single value, the font family to use. Default is 'Arial'.

gene_border

Character: a single value, the colour for borders around gene features. Default is NA, no border.

gene_size

Numeric: a single value, the thickness of borders around gene features, if a colour is specified in gene_border. Default is 2.

Details

There are two major kinds of feature plotted, "gene features" and "extra features". These names are just a convention: gene features are plotted as large coloured bars in the centre of the plot on the main "chromosome", whereas extra features are plotted as small grey bars above/below the gene features, offset from the main chromosome. Anything could be plotted as a gene or extra feature, and these are specified through gene_type and extra_type.

The name of the genetic feature being plotted is the value of mapDT$NAME. This value is effectively evaluated as a mathematical expression, which allows italics and mixed formatting in gene names. Internally, the labels are drawn by geom_text(..., parse=TRUE) and geom_text_repel(..., parse=TRUE).
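
The effect of parsing on label text can be previewed with base R's parse(), which is a reasonable stand-in for what parse=TRUE does to each label. This is only a sketch of the parsing step; it does not call gene_map_plot(), and the label strings are illustrative.

```r
# parse() turns label text into an R expression, much as parse = TRUE does.
quoted   <- parse(text = "'D-loop'")[[1]]      # nested quotes: a character string
unquoted <- parse(text = "D-loop")[[1]]        # no quotes: a call, D minus loop
italic   <- parse(text = "italic(COX1)")[[1]]  # a plotmath call for italic text

is.character(quoted)         # the label is kept as literal text
is.call(unquoted)            # the label has become an arithmetic call
as.character(unquoted[[1]])  # the function being called is the minus operator

# A label that starts with a digit is not valid R syntax unless it is quoted.
bad <- tryCatch(parse(text = "12S"), error = function(e) "parse error")
bad
```

This is why the Examples wrap names such as 12S and D-loop in single quotes before plotting.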

The value of mapDT$STRAND dictates the position of the coloured bars. A value of 1 places "genes" above the genomic strand, whereas a value of -1 places "genes" below the genomic strand.

The colour of the gene features is specified through gene_colour as a named vector. If there are two genes, 'COX1' and 'COX2', specification of their colours can be done like so: c(COX1='pink', COX2='blue'). If colours are not specified, one colour is automatically assigned to each unique "gene".
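
A short sketch of building such a named vector follows. The second step assumes that mapDT$NAME has been wrapped in single quotes (as in the Examples), in which case the names of the colour vector need the same quotes to match.

```r
# Named colour vector: names are gene names, values are colours.
gene_colour <- c(COX1 = "pink", COX2 = "blue")
names(gene_colour)
gene_colour[["COX2"]]

# If mapDT$NAME values carry nested single quotes, the vector names
# must carry them too so that each colour matches its gene.
quoted_colour <- setNames(gene_colour, paste0("'", names(gene_colour), "'"))
names(quoted_colour)
```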

The value of extra_ypos specifies the distance of the extra features from the gene features. Set it larger if the plot looks squashed. Additionally, plot_ymax sets the maximal plotting area, so set this value larger if features or labels do not fit.

Value

Returns a gg object.

Examples

library(genomalicious)

# Create a link to raw external datasets in genomalicious
genomaliciousExtData <- paste0(find.package('genomalicious'), '/extdata')

# Read in a GENBANK file of the Bathygobius cocosensis mitogenome
gbk.read <- mitoGbk2DT(paste(genomaliciousExtData, 'data_Bcocosensis.gbk', sep='/'))
head(gbk.read)

# Subset out the "CDS" types and plot genes, rRNA, tRNA, and D-loop.
# Rename rRNAs for nicer plotting. Because $NAME is evaluated by the
# expression() function, it is useful to put single quotations around characters
# to have them read as characters internally by gene_map_plot().
gbk.read[TYPE!='CDS'] %>%
  .[NAME=='12S ribosomal RNA', NAME:='12S'] %>%
  .[NAME=='16S ribosomal RNA', NAME:='16S'] %>%
  .[, NAME:=paste0("'", NAME, "'")] %>%
  gene_map_plot(mapDT=., genome_len=16692, extra_txt_size=3)

# Plot just the COX genes and the D-loop as "gene features" with
# custom colours and a border. Again, note the use of single quotes nested in
# double quotes, which will match up to the edited gene $NAME column below.
gene.col.vec <- c(
  "'COX1'"='royalblue',
  "'COX2'"='firebrick3',
  "'COX3'"='mediumpurple2',
  "'CYTB'"='plum3',
  "'D-loop'"='grey40')

# Subset focal genes, add quotes to ensure characters are parsed as characters.
gbk.read[NAME %in% c('COX1','COX2','COX3','CYTB','D-loop')] %>%
  .[, NAME:=paste0("'", NAME, "'")] %>%
  gene_map_plot(
    mapDT=., genome_len=16692,
    gene_type=c('gene', 'D-loop'), gene_colour=gene.col.vec,
    extra_type=NULL, gene_border='black')

# It is possible to parse names without the nested quotes, but note how
# the '-' character in 'D-loop' is then parsed as a minus symbol.
gbk.read[NAME %in% c('COX1','COX2','COX3','CYTB','D-loop')] %>%
  gene_map_plot(
    mapDT=., genome_len=16692, gene_type=c('gene', 'D-loop'),
    extra_type=NULL, gene_border='black')


j-a-thia/genomalicious documentation built on Oct. 19, 2024, 7:51 p.m.