vcf2eqtl: Function to detect eQTLs using vcf and expression matrix from...

Description Usage Arguments Details Value Author(s) References Examples

View source: R/vcf2eqtl.R

Description

A simple pipeline that takes a freebayes vcf file of biallelic SNPs and an expression matrix in DESeq format and uses two different tests to detect eQTLs. First, we test for allelic imbalance in heterozygous individuals by fitting a beta- binomial distribution for imbalance at each locus and using a likelihood ratio test for deviation from 50-50. Next, we use limma/voom to log transform and calculate precision weights for expression values and use lm to test for associations between genotype and expression. P values are calculated using Stouffer's method. Fst and Fst outliers (OutFLANK), Differential expression between populations (DESeq2), and the proportion of population difference in expression explained by eQTLs can be calculated.

Usage

1
vcf2eqtl(vcf, expr, pops = NULL, minHet = 3, mc.cores = 1, alpha = 0.05, calculateFst = T, testDE = F, all3 = T, hweFilter = T, hweAlpha = 0.05, covariates = NULL, propExplained = T, withinPop = T, transcripts = NULL, keepSamples = NULL)

Arguments

vcf

A freebayes-generated vcf of biallelic SNPs from RNA-Seq reads mapped to a transcriptome reference

expr

An expression matrix, same format as DESeq. Rows are contigs, Columns are samples (same order as in vcf), rownames are contig names

pops

A factor vector of which population each sample belongs to

minHet

Minimum number of heterozygotes to test for allelic imbalance

mc.cores

Number of cores to use for multithreading

alpha

FDR for calling eQTLs and differential expression

calculateFst

Boolean; do you want to calculate Fst?

testDE

Boolean; test for differential expression between pops? Uses DeSeq2 LRT

all3

Boolean; require all three genotypes to test for eQTL at a site?

hweFilter

Boolean; filter out sites out of Hardy-Weinberg equilibrium?

hweAlpha

HWE P value in at least one population to filter out site

covariates

Matrix of covariates to include in association test

propExplained

Boolean; calculate proportion of population difference in expression explained by eQTL?

withinPop

Boolean; do association tests within populations to account for population structure?

transcripts

Boolean; list of transcripts that each site falls into; useful if you mapped to a genome. However, if you mapped to a genome, better tools may exist (e.g. WASP) to test for eQTLs.

keepSamples

Vector of sample names to retain for analysis, assumes that your pops vector corresponds to post-filtered samples. Does not change sample order for retained samples.

Details

This function is useful for finding eQTLs among biallelic SNPs in RNA-Seq data mapped to a transcriptome reference, and is most useful for less-resouced non-model organisms. If you have a good genome and phased data, more powerful methods like WASP will likely outperform this function.

Value

res

data frame of SNPs with eQTL status and other info (as specified) like Fst etc

snpContigExpr

voom-transformed expression for contig associated with each test SNP

genos

012 matrix of genotypes

AImat

matrix of allele observations for ref and alt within heterozygotes

AIdat

vector of which columns in AImat are ref and alt

globalFst

globalFst estimate, if calculated

pops

population vector

DEres

results from DESeq2, if calculated

Author(s)

Noah Rose

References

Genomic responses to environmental variation and change in the ocean. PhD Dissertation, Noah Rose, Stanford University.

Examples

1
2
3
4
5
expr <- read.delim('expr.txt',row.names=1)
vcf <- 'biallelic_snps.vcf'
metadata<-read.delim('meta.txt')
pops=metadata$pops
vcf2eqtl(vcf,expr,pops)

noahrose/vcf2eqtl documentation built on May 23, 2019, 9:30 p.m.