Part 1: Overview

To search for candidate genes you need four objects.

  1. gff - the gene model position dataset
  2. markerBp - the basepair position of markers
  3. cross - the QTL cross object used to identify QTL
  4. interval - the numeric confidence interval of the QTL (chr, lower ci, upper ci)
With these data, one can cull lists of genes to those within a QTL interval

To infer the potential of a candidate gene you need at least one of 7 datasets

  1. vcf - the polymorphisms between parents, in "vcf" format. It is optimal to have this annotated by snpEff or similar.
  2. parentGeneExp - results of differential expression analysis between parents
  3. cisEQtl - a list of genes with cis-eQTL
  4. methyl - dataset containing the degree of methylation for each gene
  5. geneDescr - Gene descriptions
  6. GO - GO annotations
  7. geneExp - gene expression of the mapping population
With these data, one can infer whether a gene is likely to contain the causal QTN(s)

Part 2: Getting set up

knitr::opts_chunk$set(echo = TRUE, fig.width = 7, fig.height = 7, warning = FALSE)
library(knitr)
To start, you need the qtlTools package. Get it from github.
library(devtools)
install_github("jtlovell/qtlTools")
library(qtlTools)
Load the multitrait data from R/qtl
data(multitrait)
Create some fake physical positions of the markers allowing for low recombination in the middle of the chromosomes (as would be expected in the pericentromeric region)
map<-pullMap(multitrait)
map$bp<-0
for(i in unique(map$chr)){
  n<-sum(map$chr==i)
  p<-sin((1:n/n)*pi)
  map$bp[map$chr==i]<-cumsum(p*1000000)
}
Create a fake gff file
gff<-data.frame(chr = rep(paste0("scaffold_",1:5),each = 200),
   feature = rep("gene",1000),
   start = rep(seq(from = 0, to = max(map$bp), length = 200), 5),
   end = rep(seq(from = 0, to = max(map$bp), length = 200))+1000,
   strand = rep("+",1000),
   attribute = paste0("gene",1:1000,";","gene",1:1000,".1"), stringsAsFactors=F)

Part 3: Infer the physical position of the genes, using the position of the markers

geneCM<-findGenecM(cross = multitrait, marker.info = map, gff = gff,
   gffCols = c("chr","feature","start","end","strand","attribute"))
Plots showing the bp/cM patterns
par(mfrow=c(3,2))
for(i in unique(map$chr)){
  with(geneCM[geneCM$chr==i,], plot(pos,bp, col="grey", 
                                main = "cM and bp positions of genes and markers",
                                ylab = "physical position (bp)",
                                xlab = "mapping position (cM)"))
  with(map[map$chr==i,], points(pos,bp, col=i, pch = 19, cex=.8))
}

Part 4: Find genes in the interval

Make qtl intervals
s1<-scanone(multitrait, method="hk", pheno.col=1)
perm<-scanone(multitrait, n.perm=100, method="hk",pheno.col=1, verbose=FALSE)
cis<-calcCis(cross = multitrait, s1.output=s1, perm.output=perm, drop=5)

par(mfrow = c(1,1))
plot(s1)
segmentsOnPeaks(multitrait, s1.output=s1, calcCisOutput = cis, int.y = 13.1)
Pull out genes in the intervals
candGenes<-findGenesInterval(findGenecM.output = geneCM, calcCis.output = cis)
print(candGenes)

Part 5: next steps

There are a number of approaches to define how likely any gene is to be the candidate.
  1. Genes with non-synonymous SNPs
  2. Genes with cis-eQTL (Lowry et al. 2013, Plant Cell)
  3. Genes with annotations similar to the trait of interest
  4. Covariance of expression and trait of interest in mapping population (Lovell et al. 2015, Plant Cell)
  5. Causal Inference testing


jtlovell/qtlTools documentation built on May 20, 2019, 3:14 a.m.