plot_logprob: Produce GenePlots of the results from 'calc_logprob'.

View source: R/geneplot_plots.R

Description

Produce GenePlots of the results obtained from running calc_logprob, or replot the results obtained from running geneplot.

Usage

plot_logprob(
  logprob_results,
  plot_type = switch(as.character(length(refpopnames)), `2` = "twopop", "manypop"),
  plot_bars = F,
  colvec = NA,
  shapevec = NA,
  mark_impute = F,
  txt = "points",
  use_legend = T,
  legend_pos = "bottomleft",
  xyrange = NULL,
  orderpop = NULL,
  axispop = NULL,
  axis_labels = NULL,
  short_axis_labels = F,
  grayscale_quantiles = F,
  dim1 = 1,
  dim2 = 2,
  layout_already_set = F,
  cexpts = 1.4
)

Arguments

logprob_results

A data frame containing the results of the GenePlot calculations, as obtained by running geneplot or calc_logprob. See those functions for details.

plot_type

(default determined by the number of reference pops) Specifies "twopop" or "manypop" plots. Defaults to "twopop" for 2 reference pops (i.e. 2 pops listed in refpopnames in the call to geneplot or calc_logprob) and "manypop" for >2 reference pops.
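The default for plot_type is the switch() expression shown under Usage. The sketch below wraps that same expression in a small helper, default_plot_type() (a hypothetical name, not part of geneplot), to show how it resolves for different numbers of reference populations.

```r
# default_plot_type() is a hypothetical helper; its body is the default
# expression for plot_type copied from the Usage section.
default_plot_type <- function(refpopnames) {
  switch(as.character(length(refpopnames)),
         `2` = "twopop",  # exactly two reference pops
         "manypop")       # any other count falls through to the unnamed default
}

default_plot_type(c("Pop1", "Pop2"))          # "twopop"
default_plot_type(c("Pop1", "Pop2", "Pop3"))  # "manypop"
```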

plot_bars

(default FALSE) Specifies the type of plot to use for >2 reference populations. FALSE (default) plots a PCA of the outputs from calc_logprob, i.e. runs PCA on the log-genotype-probabilities for all the reference pops and plots two of the PCs (by default, PC1 and PC2). TRUE plots multiple bar charts, one per reference pop, with all individuals as bars, coloured according to their original pop. For the bar plots, individuals that are in one of the reference pops are ordered according to their Log-Genotype-Probability with respect to their own pop, and that ordering is then used in all the other bar plots as well, so that every bar plot shows the individuals in the same order.
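To make the shared bar-plot ordering concrete, here is a base-R sketch on made-up Log-Genotype-Probabilities (LGPs). The matrix lgp, the vector pop and the use of barplot() are assumptions for illustration only; the exact sorting and layout inside plot_logprob are not given on this page, and this sketch simply sorts all individuals by their LGP for their own population, then reuses that one ordering in every panel.

```r
set.seed(2)
refpops <- c("Pop1", "Pop2", "Pop3")
pop <- rep(refpops, each = 4)   # hypothetical: each individual's own population
# Hypothetical LGPs: one row per individual, one column per reference pop
lgp <- matrix(rnorm(length(pop) * length(refpops), mean = -20, sd = 3),
              ncol = length(refpops), dimnames = list(NULL, refpops))

# LGP of each individual with respect to its own population
own_lgp <- lgp[cbind(seq_along(pop), match(pop, refpops))]

# One ordering, computed once and reused for every bar chart
ord <- order(own_lgp)
op <- par(mfrow = c(1, length(refpops)))
for (p in refpops) {
  # bars coloured by each individual's original population
  barplot(lgp[ord, p], main = p, col = match(pop[ord], refpops))
}
par(op)
```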

colvec

(default=rep(RColorBrewer::brewer.pal(12,"Paired")[c(1:10,12)], npop)[1:npop]) Vector of colours for plotting. The colours correspond to populations specified in the order of c(refpopnames, includepopnames). Thus the first element of colvec corresponds to the first element of refpopnames; the last element of colvec corresponds to the last element of includepopnames. Colours can be specified using rgb objects, hexadecimal codes, or any of the R colour names (see http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf for a PDF of R colours).
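Because colvec is matched by position to c(refpopnames, includepopnames), it can help to build it alongside those names. In this sketch the population names and colours are hypothetical. The last lines reproduce the recycling pattern of the documented default, rep(...)[1:npop], with a short base-R palette standing in for RColorBrewer so the example has no package dependency.

```r
refpopnames <- c("Pop1", "Pop2")   # hypothetical reference pops
includepopnames <- c("Pop3")       # hypothetical additional pop
allpops <- c(refpopnames, includepopnames)

# One colour per population, in the order of c(refpopnames, includepopnames)
colvec <- c("steelblue", "darkorange", "#4DAF4A")
col_by_pop <- setNames(colvec, allpops)  # check which colour goes with which pop
col_by_pop

# The documented default recycles a fixed palette to length npop with
# rep(...)[1:npop]; a short base-R palette stands in for RColorBrewer here.
base_pal <- c("red", "blue", "green")
npop <- 5
default_like <- rep(base_pal, npop)[1:npop]
default_like  # "red" "blue" "green" "red" "blue"
```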

shapevec

Vector of shapes for the plotting points. These are named shapes from the following list: "Circle", "Square", "Diamond", "TriangleUp", "TriangleDown", "OpenSquare", "OpenCircle", "OpenTriangleUp", "Plus", "Cross", "OpenDiamond", "OpenTriangleDown", "Asterisk" which correspond to the following pch values for R plots: 21, 22, 23, 24, 25, 0, 1, 2, 3, 4, 5, 6, 8. Do not use the numbers, use the words, which will be automatically converted within plot_logprob into the appropriate codes. The elements of shapevec correspond to the populations specified in the order of c(refpopnames, includepopnames). Thus the first element of shapevec corresponds to the first element of refpopnames; the last element of shapevec corresponds to the last element of includepopnames. Defaults to the list above, looping through as many times as required for all the populations.
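The name-to-pch correspondence above can be written as a named lookup vector. The sketch below (shape_to_pch and shapevec_default are illustrative names, not geneplot objects) converts a user-supplied shapevec to pch codes and shows the documented default behaviour of looping through the list when there are more populations than shapes.

```r
shape_names <- c("Circle", "Square", "Diamond", "TriangleUp", "TriangleDown",
                 "OpenSquare", "OpenCircle", "OpenTriangleUp", "Plus", "Cross",
                 "OpenDiamond", "OpenTriangleDown", "Asterisk")
pch_codes <- c(21, 22, 23, 24, 25, 0, 1, 2, 3, 4, 5, 6, 8)
shape_to_pch <- setNames(pch_codes, shape_names)

# A user-supplied shapevec, one name per population
shapevec <- c("Circle", "OpenTriangleUp", "Asterisk")
unname(shape_to_pch[shapevec])  # 21 2 8

# Looping through the list for 15 populations (13 shapes available)
npop <- 15
shapevec_default <- rep(shape_names, length.out = npop)
```

In base R graphics, pch 21 to 25 are filled symbols whose border uses col and whose fill uses bg, which is why they pair naturally with a colour vector such as colvec.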

mark_impute

(default FALSE) Boolean, indicates whether to mark individuals with missing data using asterisks.
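As a rough illustration of what mark_impute describes, the sketch below flags individuals with at least one missing genotype and overplots them with pch 8 (the "Asterisk" shape in the shapevec list). The genotype table geno and the LGP coordinates are made up, and this is not a description of how plot_logprob marks them internally.

```r
# Hypothetical genotypes: one row per individual, NA marks a missing locus
geno <- data.frame(L1 = c("A/B", NA, "A/A", "B/B"),
                   L2 = c("C/C", "C/D", NA, "D/D"))
has_missing <- rowSums(is.na(geno)) > 0   # TRUE if any locus is missing

# Hypothetical LGP coordinates for the same four individuals
x <- c(-20, -24, -18, -22)
y <- c(-25, -19, -23, -21)
plot(x, y, pch = 21, bg = "gray80")
points(x[has_missing], y[has_missing], pch = 8, cex = 1.5)  # asterisks
```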

txt

(default "points") Defines whether to plot individuals as points on the GenePlot ("points") or to display the name of their population ("pop"), subpopulation ("subpop"), or ID ("id") as text. To use "subpop", the dat input to calc_logprob must include a 'subpop' column; 'subpop' is then automatically included in the logprob_results object.

use_legend

(default TRUE) Whether to plot the legend (TRUE) or omit it (FALSE).

legend_pos

(default "bottomleft") Where to plot the legend. Uses the same position keywords as the legend function, e.g. "topright".

xyrange

(default NULL) A vector giving the plotting range, applied to both axes. The default is slightly wider than the range of the calculated Log-Genotype-Probabilities for all individuals in the plot.
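The sketch below shows the idea of one shared range, slightly wider than the data, used for both axes. The LGP values are hypothetical and the 5% padding is an assumption; the amount of widening plot_logprob actually applies is not stated here. A user-supplied value would simply be something like xyrange = c(-35, -10).

```r
# Hypothetical LGPs for the two axes
lgp_x <- c(-25.1, -18.3, -30.2)
lgp_y <- c(-22.0, -27.5, -19.4)

r <- range(c(lgp_x, lgp_y))   # one range covering both axes
pad <- 0.05 * diff(r)          # assumption: 5% padding, not the package's value
xyrange <- c(r[1] - pad, r[2] + pad)
plot(lgp_x, lgp_y, xlim = xyrange, ylim = xyrange)
```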

orderpop

Specify the plotting order for the populations. E.g. if orderpop=c("Pop4", "Pop2"), then points for individuals from Pop4 will be plotted first, then individuals from Pop2 will be plotted over the top of them, etc. Default is NULL, in which case populations are plotted in order of size, so the population with the largest number of points is plotted at the bottom, and the population with the smallest number of individuals/points is plotted over the top, so as not to be obscured.
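The default ordering (largest population first, so it ends up at the bottom) can be computed from population counts. The population labels below are hypothetical.

```r
# Hypothetical population labels for the plotted individuals
pop <- c(rep("Pop1", 50), rep("Pop2", 5), rep("Pop3", 20))

# Largest population first (bottom layer), smallest last (top layer)
plot_order <- names(sort(table(pop), decreasing = TRUE))
plot_order  # "Pop1" "Pop3" "Pop2"
```

Drawing the populations in this order with successive points() calls leaves the smallest group on top, which is the same effect as passing orderpop = c("Pop1", "Pop3", "Pop2") explicitly.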

axispop

Used when length(refpopnames) == 2, i.e. when plot_type="twopop". It takes the form axispop=c(x="Pop1", y="Pop2"), meaning that the Pop1 reference population is plotted on the x axis and the Pop2 reference population on the y axis. Default is NULL, which plots the first population in refpopnames on the x axis and the second on the y axis.
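A named axispop vector can be used directly to pick which reference population goes on each axis. Here lgp is a hypothetical data frame with one LGP column per reference population; whether logprob_results stores its columns this way is not stated on this page.

```r
lgp <- data.frame(Pop1 = c(-20.5, -25.0, -31.2),
                  Pop2 = c(-30.1, -18.4, -22.7))   # hypothetical LGPs
axispop <- c(x = "Pop2", y = "Pop1")               # Pop2 on x, Pop1 on y

x <- lgp[[axispop[["x"]]]]
y <- lgp[[axispop[["y"]]]]
plot(x, y, xlab = axispop[["x"]], ylab = axispop[["y"]])
```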

axis_labels

(default NULL) Used for plots with 2 reference pops. A two-element character vector that can be used to specify more readable axis labels. Defaults to the 'pop' labels in logprob_results.

short_axis_labels

(default FALSE) Used for plots with 2 reference pops. FALSE (default) gives full-length axis labels of the form "Log10 genotype probability for population Pop1"; TRUE gives short-form axis labels of the form "LGP10 for population Pop1".
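The two label formats above can be reproduced with paste(). lgp_axis_label() is a hypothetical helper written only to show the formats; it is not a geneplot function.

```r
# Hypothetical helper reproducing the two documented label formats
lgp_axis_label <- function(pop, short_axis_labels = FALSE) {
  if (short_axis_labels) {
    paste("LGP10 for population", pop)
  } else {
    paste("Log10 genotype probability for population", pop)
  }
}

lgp_axis_label("Pop1")                            # long form
lgp_axis_label("Pop1", short_axis_labels = TRUE)  # "LGP10 for population Pop1"
```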

grayscale_quantiles

(default FALSE) Used for plots with 2 reference pops. FALSE (default) plots the quantile lines using the colvec colours; TRUE plots them in gray. Because the default colours can be quite pale, the gray quantile lines can be easier to see.

dim1

(default 1) Used for plots with more than 2 reference pops, when plot_bars=FALSE. Specifies which principal component is plotted on the x-axis.

dim2

(default 2) Used for plots with more than 2 reference pops, when plot_bars=FALSE. Specifies which principal component is plotted on the y-axis.
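The PCA view selected by dim1 and dim2 can be sketched with prcomp() from the stats package. The LGP matrix is simulated, and whether plot_logprob centres or scales the LGPs before the PCA is not stated here; this sketch only centres them (the prcomp default).

```r
set.seed(1)
refpops <- c("Pop1", "Pop2", "Pop3", "Pop4")
# Hypothetical LGPs: 40 individuals x 4 reference pops
lgp <- matrix(rnorm(40 * length(refpops), mean = -20, sd = 3),
              ncol = length(refpops), dimnames = list(NULL, refpops))

pca <- prcomp(lgp)          # centred, unscaled PCA
dim1 <- 1                   # PC on the x-axis
dim2 <- 3                   # PC on the y-axis (PC3 here instead of the default PC2)
scores <- pca$x[, c(dim1, dim2)]
plot(scores, xlab = paste0("PC", dim1), ylab = paste0("PC", dim2))
```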

layout_already_set

(default FALSE) Boolean, used for plots with more than 2 reference pops when plot_bars=TRUE. Indicates whether the layout command for arranging plots has already been called by a higher-level function (or whether par(mfrow) has been set earlier). TRUE prevents the layout command within plot_logprob_barplot from clashing with the higher-level command.
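The pattern this flag supports can be shown with a stand-in function. draw_lgp_bars() below is hypothetical and only mimics the documented behaviour: it calls layout() itself unless the caller says the panel grid is already set, in which case the caller's par(mfrow) is left untouched. The LGP matrix is simulated.

```r
# Hypothetical stand-in for a bar-plot routine that honours the flag
draw_lgp_bars <- function(lgp, layout_already_set = FALSE) {
  if (!layout_already_set) {
    layout(matrix(seq_len(ncol(lgp)), nrow = 1))   # one panel per ref pop
  }
  for (j in seq_len(ncol(lgp))) {
    barplot(lgp[, j], main = colnames(lgp)[j])
  }
  invisible(NULL)
}

set.seed(3)
lgp <- matrix(rnorm(10 * 3, mean = -20, sd = 3), ncol = 3,
              dimnames = list(NULL, c("Pop1", "Pop2", "Pop3")))

# The caller sets a 2 x 3 grid itself, so the routine must not call layout()
op <- par(mfrow = c(2, 3))
draw_lgp_bars(lgp, layout_already_set = TRUE)
grid_after <- par("mfrow")   # still the caller's 2 x 3 grid
par(op)
```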

cexpts

(default 1.4) Specify the size of the points in the plot.

Value

Displays the plot.

Author(s)

The Log-Genotype-Probability calculations are based on the method of Rannala and Mountain (1997) as implemented in GeneClass2, updated to allow for individuals with missing data and to enable accurate calculations of quantiles of the Log-Genotype-Probability distributions of the reference populations. See McMillan and Fewster (2017) for details.

References

McMillan, L., and Fewster, R. (2017). Visualizations for genetic assignment analyses using the saddlepoint approximation method. Biometrics.

Rannala, B., and Mountain, J. L. (1997). Detecting immigration by using multilocus genotypes. Proceedings of the National Academy of Sciences 94, 9197–9201.

Piry, S., Alapetite, A., Cornuet, J.-M., Paetkau, D., Baudouin, L., and Estoup, A. (2004). GENECLASS2: A software for genetic assignment and first-generation migrant detection. Journal of Heredity 95, 536–539.


lfmcmillan/geneplot documentation built on Nov. 27, 2024, 1:35 a.m.