pairsGBSR: Draw a scatter plot of a pair of specified statistics

View source: R/PlotFunctions.R

pairsGBSR R Documentation

Draw a scatter plot of a pair of specified statistics

Description

Draw a scatter plot of a pair of specified statistics

Usage

pairsGBSR(
  x,
  stats1 = "dp",
  stats2 = "missing",
  target = "marker",
  size = 0.5,
  alpha = 0.8,
  color = c(Marker = "darkblue", Sample = "darkblue"),
  fill = c(Marker = "skyblue", Sample = "skyblue"),
  smooth = FALSE
)

Arguments

x

A GbsrGenotypeData object.

stats1

A string to specify statistics to be drawn.

stats2

A string to specify statistics to be drawn.

target

Either or both of "marker" and "sample", e.g. target = "marker" to draw a scatter plot only for markers (SNPs).

size

A numeric value to specify the dot size of a scatter plot.

alpha

A numeric value between 0 and 1 to specify the transparency of dots in a scatter plot.

color

A named vector with the elements "Marker" and "Sample" to specify the border color of dots in the scatter plots.

fill

A named vector with the elements "Marker" and "Sample" to specify the fill color of dots in the scatter plots. stats = "geno" only requires "Ref", "Het" and "Alt", while the others use the value named "Marker".

smooth

A logical value to indicate whether to draw a smooth line through the data points. See also ggplot2::stat_smooth().

Details

You can draw a scatter plot of the per-marker and/or per-sample summary statistics specified by stats1 and stats2. The stats1 and stats2 arguments can take the following values:

  • "missing": Proportion of missing genotype calls.

  • "het": Proportion of heterozygote calls.

  • "raf": Reference allele frequency.

  • "dp": Total read counts.

  • "ad_ref": Reference allele read counts.

  • "ad_alt": Alternative allele read counts.

  • "rrf": Reference allele read frequency.

  • "mean_ref": Mean of reference allele read counts.

  • "sd_ref": Standard deviation of reference allele read counts.

  • "median_ref": Quantile of reference allele read counts.

  • "mean_alt": Mean of alternative allele read counts.

  • "sd_alt": Standard deviation of alternative allele read counts.

  • "median_alt": Quantile of alternative allele read counts.

  • "mq": Mapping quality.

  • "fs": Phred-scaled p-value (strand bias).

  • "qd": Variant Quality by Depth.

  • "sor": Symmetric Odds Ratio (strand bias).

  • "mqranksum": Alt vs. Ref read mapping qualities.

  • "readposranksum": Alt vs. Ref read position bias.

  • "baseqranksum": Alt vs. Ref base qualities.

To draw scatter plots for "missing", "het", or "raf", you need to run countGenotype() first to obtain the statistics. Similarly, "dp", "ad_ref", "ad_alt", and "rrf" require values obtained via countRead(). "mq", "fs", "qd", "sor", "mqranksum", "readposranksum", and "baseqranksum" only work with target = "marker", and only if your data contains those values supplied by SNP calling tools like GATK.
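
The prerequisites described above can be summarized as a small lookup. The following is only a sketch in base R: required_step() is a hypothetical helper, not part of GBScleanR, and it encodes only what the paragraph above states. Statistics whose prerequisite is not stated there return NA.

```r
# Hypothetical helper (NOT a GBScleanR function): report which preparatory
# step the Details text names for a given statistic.
required_step <- function(stat) {
  genotype_stats <- c("missing", "het", "raf")
  read_stats     <- c("dp", "ad_ref", "ad_alt", "rrf")
  caller_stats   <- c("mq", "fs", "qd", "sor",
                      "mqranksum", "readposranksum", "baseqranksum")

  if (stat %in% genotype_stats) return("countGenotype()")
  if (stat %in% read_stats)     return("countRead()")
  if (stat %in% caller_stats)   return("SNP caller annotation (marker only)")

  # The Details text does not state a prerequisite for the remaining
  # statistics (e.g. "mean_ref"), so none is assumed here.
  NA_character_
}

# Look up the steps needed for the default stats1/stats2 pair.
steps <- vapply(c("dp", "missing"), required_step, character(1))
```

With the default stats1 = "dp" and stats2 = "missing", this lookup suggests that both countRead() and countGenotype() should be run before plotting.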

Value

A ggplot object.

Examples

# Load data in the GDS file and instantiate a [GbsrGenotypeData] object.
gds_fn <- system.file("extdata", "sample.gds", package = "GBScleanR")
gds <- loadGDS(gds_fn)

# Summarize genotype count information to be used in `pairsGBSR()`
gds <- countGenotype(gds)

# Draw scatter plots of missing rate vs heterozygosity.
pairsGBSR(gds, stats1 = "missing", stats2 = "het")

# Close the connection to the GDS file
closeGDS(gds)
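
A further sketch, assuming GBScleanR and its bundled sample.gds are installed: it plots the default pair of statistics, read depth ("dp") against missing rate, which needs both countRead() and countGenotype() to be run first. The size, alpha, and smooth values are arbitrary illustrations. The code is guarded so that it does nothing when the package is unavailable.

```r
# Sketch only: runs when GBScleanR is installed; otherwise `p` stays NULL.
p <- NULL
if (requireNamespace("GBScleanR", quietly = TRUE)) {
  library(GBScleanR)

  # Load the sample GDS file shipped with the package.
  gds_fn <- system.file("extdata", "sample.gds", package = "GBScleanR")
  gds <- loadGDS(gds_fn)

  # "missing" needs countGenotype(); "dp" needs countRead().
  gds <- countGenotype(gds)
  gds <- countRead(gds)

  # Per-marker scatter plot of read depth vs missing rate with a smooth line.
  # size and alpha are arbitrary values chosen for illustration.
  p <- pairsGBSR(gds, stats1 = "dp", stats2 = "missing",
                 target = "marker", size = 1, alpha = 0.5, smooth = TRUE)

  # Close the connection to the GDS file.
  closeGDS(gds)
}
```

Because the return value is a ggplot object, it can be stored as above and printed or customized later with further ggplot2 layers.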



tomoyukif/GBScleanR documentation built on April 27, 2024, 9:06 a.m.