knitr::opts_chunk$set(echo = TRUE, fig.width = 7, fig.height = 5)
Load example DO data from web.
For convenience, we attach package 'ggplot2' for the autoplot function. Functions from 'qtl2' are explicitly referenced with the prefix qtl2::.
library(qtl2ggplot)
library(ggplot2)
Download 'qtl2' cross2 object.
DOex <- qtl2::read_cross2( file.path( "https://raw.githubusercontent.com/rqtl", "qtl2data/master/DOex", "DOex.zip"))
With multiple alleles, it is useful to examine an additive allele model. Download pre-calculated allele probabilities (~5 MB) as follows:
tmpfile <- tempfile()
file <- paste0("https://raw.githubusercontent.com/rqtl/",
               "qtl2data/master/DOex/DOex_alleleprobs.rds")
download.file(file, tmpfile)
apr <- readRDS(tmpfile)
unlink(tmpfile)
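The same download, read, and clean-up steps recur for several files in this document. As a minimal sketch, they could be wrapped in a small helper. The name read_rds_url is hypothetical and is not part of 'qtl2' or 'qtl2ggplot'; only base R functions are used.

```r
# Hypothetical helper (not part of 'qtl2' or 'qtl2ggplot'): download an RDS
# file to a temporary location, read it, and remove the temporary copy.
read_rds_url <- function(url, quiet = TRUE) {
  tmpfile <- tempfile(fileext = ".rds")
  # Remove the temporary file even if the download or read fails.
  on.exit(unlink(tmpfile), add = TRUE)
  # mode = "wb" requests a binary transfer so the file is not altered.
  download.file(url, tmpfile, mode = "wb", quiet = quiet)
  readRDS(tmpfile)
}

# Usage with the allele probabilities URL from this document
# (requires network access, so it is left commented out):
# apr <- read_rds_url(paste0("https://raw.githubusercontent.com/rqtl/",
#                            "qtl2data/master/DOex/DOex_alleleprobs.rds"))
```

Using on.exit() means the temporary file is cleaned up on errors too, which the step-by-step version does not guarantee.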
Alternatively, calculate these directly.
pr <- qtl2::calc_genoprob(DOex, error_prob=0.002)
apr <- qtl2::genoprob_to_alleleprob(pr)
Genome allele scan.
scan_apr <- qtl2::scan1(apr, DOex$pheno)
Summary of peaks.
qtl2::find_peaks(scan_apr, DOex$pmap)
New summary method:
summary(scan_apr, DOex$pmap)
The basic plot of genome scan,
plot(scan_apr, DOex$pmap)
and the grammar of graphics (ggplot2) version.
autoplot(scan_apr, DOex$pmap)
Subset to chr 2.
DOex <- DOex[,"2"]
apr <- subset(apr, chr = "2")
scan_apr <- qtl2::scan1(apr, DOex$pheno)
qtl2::find_peaks(scan_apr, DOex$pmap)
plot(scan_apr, DOex$pmap)
autoplot(scan_apr, DOex$pmap)
coefs <- qtl2::scan1coef(apr, DOex$pheno)
New summary method:
summary(coefs, scan_apr, DOex$pmap)
plot(coefs, DOex$pmap, 1:8, col = qtl2::CCcolors)
autoplot(coefs, DOex$pmap)
Plot allele effects over LOD scan.
plot(coefs, DOex$pmap, 1:8, col = qtl2::CCcolors, scan1_output = scan_apr)
autoplot(coefs, DOex$pmap, scan1_output = scan_apr, legend.position = "none")
Examine just some of the founder effects, without centering.
plot(coefs, DOex$pmap, c(5,8), col = qtl2::CCcolors[c(5,8)])
autoplot(coefs, DOex$pmap, c(5,8))
autoplot(coefs, DOex$pmap, c(5,8), facet = "geno")
plot(coefs, DOex$pmap, 4:5, col = qtl2::CCcolors[4:5], scan1_output = scan_apr)
autoplot(coefs, DOex$pmap, 4:5, scan1_output = scan_apr, legend.position = "none")
For SNP association mapping, be sure to use the genotype allele pair probabilities pr rather than the additive model allele probabilities apr.
Download pre-calculated genotype probabilities (~19 MB) and subset to Chr 2.
tmpfile <- tempfile()
file <- paste0("https://raw.githubusercontent.com/rqtl/",
               "qtl2data/master/DOex/DOex_genoprobs.rds")
download.file(file, tmpfile)
pr <- readRDS(tmpfile)
unlink(tmpfile)
pr <- subset(pr, chr = "2")
Or, alternatively, calculate directly using the subsetted DOex.
pr <- qtl2::calc_genoprob(DOex, error_prob=0.002)
Download SNP information from web.
filename <- file.path("https://raw.githubusercontent.com/rqtl",
                      "qtl2data/master/DOex",
                      "c2_snpinfo.rds")
tmpfile <- tempfile()
download.file(filename, tmpfile, quiet=TRUE)
snpinfo <- readRDS(tmpfile)
unlink(tmpfile)
Or, alternatively, use the query function approach.
snpdb_file <- system.file("extdata", "cc_variants_small.sqlite", package="qtl2")
query_variant <- qtl2::create_variant_query_func(snpdb_file)
snpinfo <- query_variant("2", 96.5, 98.5)
The SNP routines in 'qtl2ggplot' can distinguish SNP variants. Here we artificially add type to snpinfo with about 20% DEL to show how variants get plotted.
variants <- c("snp","indel","SV","INS","DEL","INV")
snpinfo$type <- factor(
  sample(c(sample(variants[-1], 5000, replace = TRUE),
           rep("snp", nrow(snpinfo) - 5000))),
  variants)
Perform SNP association mapping.
It is possible to use qtl2::scan1snps instead, which bundles these three routines, but we want to have the SNP probabilities for later use.
snpinfo <- qtl2::index_snps(DOex$pmap, snpinfo)
snppr <- qtl2::genoprob_to_snpprob(pr, snpinfo)
scan_snppr <- qtl2::scan1(snppr, DOex$pheno)
Plot results.
plot(scan_snppr, snpinfo, drop_hilit = 1.5)
autoplot(scan_snppr, snpinfo, drop_hilit = 1.5)
Plot just the subset of distinct SNPs.
plot(scan_snppr, snpinfo, show_all_snps=FALSE, drop_hilit = 1.5)
autoplot(scan_snppr, snpinfo, show_all_snps=FALSE, drop_hilit = 1.5)
Highlight the top SNPs (with LOD within 1.5 of max). Show as open circles of size 1.
plot(scan_snppr, snpinfo, drop_hilit=1.5, cex=1, pch=1)
autoplot(scan_snppr, snpinfo, drop_hilit=1.5, cex=2)
SNP association mapping is more useful with plots that emphasize the strain distribution pattern (SDP), which separate out SNPs based on their SDP and plot the top patterns. For instance, sdp = 52 corresponds to pattern ABDGH:CEF. That is, the SNP genotype "AA" resulting from qtl2::genoprob_to_snpprob applied to pr corresponds to any of the 36 allele pairs with the two alleles drawn from the reference (ref) set of ABDGH (15 pairs: AA, AB, AD, AG, AH, BB, BD, BG, BH, DD, DG, DH, GG, GH, HH), "BB" has two alleles from the alternate (alt) set CEF (6 pairs: CC, CE, CF, EE, EF, FF), and "AB" has one from each for the heterogeneous (het) set (15 pairs: AC, AE, ..., HF).
There are 255 possible sdps, but only a few (4 in our example) need be examined carefully. One can think of these as a subset of markers for a genome scan, where interest is only in those SNPs following a particular sdp; as with genome scans, we can fill in for missing data.
That is, only a few SNPs may show a particular pattern, but key differences might be seen nearby if we impute SNPs of the same pattern.
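The correspondence between sdp = 52 and ABDGH:CEF can be checked with a few lines of base R. This is a sketch only: the helper sdp_to_label is hypothetical, and it assumes that founder i adds 2^(i-1) to the SDP when it carries the alternate allele, an encoding that is consistent with the example above (C = 4, E = 16, F = 32) but is not spelled out in this document.

```r
# Hypothetical helper: decode an SDP integer into a "ref:alt" founder label.
# Assumption: founder i contributes 2^(i-1) when it carries the alternate allele.
sdp_to_label <- function(sdp, founders = LETTERS[1:8]) {
  bit_values <- 2^(seq_along(founders) - 1)
  is_alt <- (sdp %/% bit_values) %% 2 == 1
  paste0(paste(founders[!is_alt], collapse = ""), ":",
         paste(founders[is_alt], collapse = ""))
}

label <- sdp_to_label(52)   # the pattern discussed in the text

# Number of unordered allele pairs in each SNP genotype class for this pattern.
n_ref <- 5   # A, B, D, G, H
n_alt <- 3   # C, E, F
pair_counts <- c(ref = choose(n_ref, 2) + n_ref,   # both alleles from the reference set
                 het = n_ref * n_alt,              # one allele from each set
                 alt = choose(n_alt, 2) + n_alt)   # both alleles from the alternate set
```

The three counts add up to the 36 allele pairs of the full genotype model.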
Here we highlight SDPs in SNPs with LOD within 3 of the max and connect them with lines.
autoplot(scan_snppr, snpinfo, patterns="all", drop_hilit=3, cex=2)
Highlight only top SDP patterns in SNPs.
autoplot(scan_snppr, snpinfo, patterns="hilit", drop_hilit=3, cex=2)
Looking at all SNPs is more useful than just focusing on mapped SNPs.
autoplot(scan_snppr, snpinfo, patterns="hilit", drop_hilit=3, cex=2, show_all_snps = FALSE)
Download Gene info for DOex from web via RDS.
filename <- file.path("https://raw.githubusercontent.com/rqtl",
                      "qtl2data/master/DOex",
                      "c2_genes.rds")
tmpfile <- tempfile()
download.file(filename, tmpfile, quiet=TRUE)
gene_tbl <- readRDS(tmpfile)
unlink(tmpfile)
Or, alternatively, use the query function approach.
dbfile <- system.file("extdata", "mouse_genes_small.sqlite", package="qtl2")
query_genes <- qtl2::create_gene_query_func(dbfile, filter="(source=='MGI')")
gene_tbl <- query_genes("2", 96.5, 98.5)
Plot genes. These can be aligned with the SNP association map or SDP scans.
qtl2::plot_genes(gene_tbl, xlim = c(96,99))
ggplot_genes(gene_tbl)
Plot routines (except scan patterns for now) can accommodate multiple phenotypes. At present, it is best to stick to fewer than 10. In the preamble of this document, a second phenotype, asin, was artificially created for illustration purposes.
Create artificial second phenotype as arcsine square root of first one.
DOex$pheno <- cbind(DOex$pheno, asin = asin(sqrt(DOex$pheno[,1] / 100)))
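To see what the arcsine square root transform does, here is a small sketch on made-up values. The division by 100 suggests the first phenotype is on a percent scale; that reading is an assumption, and the numbers below are purely illustrative.

```r
# Hypothetical percent-scale values, converted to proportions before the transform.
pct <- c(0, 25, 50, 100)
asin_pct <- asin(sqrt(pct / 100))

# The result is in radians and ranges from 0 to pi/2.
transformed_range <- range(asin_pct)
```

Values outside 0 to 100 would give NaN, since the square root or the arcsine is then undefined.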
Redo genome allele scans on both phenotypes.
scan_apr <- qtl2::scan1(apr, DOex$pheno)
qtl2::find_peaks(scan_apr, DOex$pmap)
Similar summary using new summary method:
summary(scan_apr, DOex$pmap)
plot(scan_apr, DOex$pmap, 1)
plot(scan_apr, DOex$pmap, 2, add = TRUE, col = "red")
autoplot(scan_apr, DOex$pmap, 1:2)
autoplot(scan_apr, DOex$pmap, 1:2, facet="pheno", scales = "free_x", shape = "free_x")
Redo SNP scans on both phenotypes.
scan_snppr <- qtl2::scan1(snppr, DOex$pheno)
Using new summary method. The summary includes a range (min and max) for pos, as there could be multiple SNPs across a range of positions.
summary(scan_snppr, DOex$pmap, snpinfo)
Plot results.
plot(scan_snppr, snpinfo, lodcolumn=1, cex=1, pch=1, drop_hilit = 1.5)
plot(scan_snppr, snpinfo, lodcolumn=2, cex=1, pch=1, drop_hilit = 1.5)
autoplot(scan_snppr, snpinfo, 1:2, facet="pheno", drop_hilit = 1.5)
plot(scan_snppr, snpinfo, lodcolumn=1, cex=1, pch=1, show_all_snps = FALSE, drop_hilit = 1.5)
plot(scan_snppr, snpinfo, lodcolumn=2, cex=1, pch=1, show_all_snps = FALSE, drop_hilit = 1.5)
autoplot(scan_snppr, snpinfo, 1:2, show_all_snps = FALSE, facet="pheno", cex=2, drop_hilit = 1.5)
Note that in the autoplot (using 'qtl2ggplot'), the hilit points for the second trait are fewer than with the plot (using package 'qtl2'). This is because the maxlod for the faceted autoplot is across both traits, and the other points for the second trait are too low.
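The effect of a shared versus a per-trait maximum can be sketched with a toy LOD matrix. The numbers are made up for illustration, and the comparison rule (LOD at least max minus drop_hilit) is an assumption about how highlighting is decided, not a copy of either package's code.

```r
# Toy LOD scores for two traits at five SNPs (made-up numbers for illustration).
lod <- cbind(trait1 = c(2.0, 6.5, 7.0, 3.0, 6.0),
             trait2 = c(1.0, 4.8, 5.0, 4.0, 2.5))
drop_hilit <- 1.5

# Shared threshold: a single maximum taken across both traits.
shared_cut <- max(lod) - drop_hilit
hilit_shared <- colSums(lod >= shared_cut)

# Per-trait threshold: each column is compared with its own maximum.
per_trait_cut <- apply(lod, 2, max) - drop_hilit
hilit_per_trait <- colSums(sweep(lod, 2, per_trait_cut, ">="))
```

With the shared threshold, the second trait has no highlighted points because its scores never come within 1.5 of the first trait's maximum; with per-trait thresholds it has as many as the first.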
autoplot(scan_snppr, snpinfo, 2, show_all_snps = FALSE, facet="pheno", cex=2, drop_hilit = 1.5)
Distinguish high values by color but leave others gray.
autoplot(scan_snppr, snpinfo, 1:2, show_all_snps = FALSE, facet_var = "pheno",
         drop_hilit = 2, col=8, col_hilit=1:2, cex=2) +
  geom_hline(yintercept = max(scan_snppr) - 2, col = "darkgrey", linetype = "dashed")
autoplot(scan_snppr, snpinfo, 2, patterns = "all", cex=2, drop_hilit=2)
autoplot(scan_snppr, snpinfo, 1:2, patterns = "all", cex=2, facet = "pheno", drop_hilit=3)
autoplot(scan_snppr, snpinfo, 1:2, patterns = "hilit", cex=2, drop_hilit=3, facet = "pheno", scales = "free")
autoplot(scan_snppr, snpinfo, 1:2, patterns = "hilit", show_all_snps = TRUE, cex=2, drop_hilit=3, facet = "pattern")
(peaks <- qtl2::find_peaks(scan_apr, DOex$pmap, drop = 1.5))
qtl2::plot_peaks(peaks, DOex$pmap)
ggplot_peaks(peaks, DOex$pmap)
out <- listof_scan1coef(apr, DOex$pheno, center = TRUE)
New summary method:
summary(out, scan_apr, DOex$pmap)
ggplot2::autoplot(out, DOex$pmap, scales = "free")
summary(out, scan_apr, DOex$pmap)
This last section shows some very noisy images of coefficients for the 36 allele pairs. Generally, these will not be useful unless the cross is quite large. See also package 'qtl2pattern'.
QTL effects for 36 allele pair model. Note that they are quite unstable, and the 36 allele pair max LOD is far from the peak for the additive (haplotype) model. Only showing effects with at least one E allele. Plots are truncated at +/-100 for viewability. Note also that 'qtl2ggplot' routines have some centering built in.
Find coefficients for 36 allele pair genome scan.
coefs36 <- qtl2::scan1coef(pr, DOex$pheno)
All 36 allele pair QTL effects.
plot(coefs36, DOex$pmap, 1:36, col = 1:36, ylim=c(-100,100))
autoplot(coefs36, DOex$pmap, ylim=c(-100,100), colors = NULL, legend.position = "none")
The autoplot is centered by default (so mean across all alleles is mean of trait) to make coefficient plots easier to view. This can be turned off with the hidden center option.
autoplot(coefs36, DOex$pmap, ylim=c(-100,100), center = FALSE, colors = NULL, legend.position = "none")
Only 8 allele pair QTL effects that contain E.
tmp <- qtl2ggplot:::modify_object(
  coefs36,
  coefs36[, stringr::str_detect(dimnames(coefs36)[[2]], "E")])
autoplot(tmp, DOex$pmap, ylim=c(-100,100))