GeneFeST: Bayesian calculation of gene-specific FST from genomic SNP...

Description Usage Arguments Details Value References Examples

View source: R/slim.R

Description

The method is based on the work from Beaumont and Balding (2004) where they introduce a FST-based hierarchical Bayesian model to detect loci that are subject to selection. In this Bayesian approach they use a logistic regression model to distinguish between locus-specific effects like selection and population-specific effects which are shared by all loci (e.g effects caused by migration rates) (Riebler, 2008). Foll and Gaggiotti (2008) extended this work using a reversible jump MCMC (Green, 1995) which enables testing the hypothesis that a locus is subject to selection; a very similar approach was developed in parallel by Riebler & Stefan (2008). The method is implemented in a software named BayeScan (http://cmpg.unibe.ch/software/BayeScan/). The new method introduced here is a modification of BayeScan (see details).

Usage

1
2
GeneFeST(input,GROUP=FALSE,nb.pilot=20,pilot.runtime=500,main.runtime=5000,
type=1,only.pilots=FALSE,h.average.P=0.2,h.step.width=1,mcmc.diag=FALSE,h=TRUE)

Arguments

input

textfile or an R-object returned from getBayes() provided by the R-package PopGenome

GROUP

SNP groups

nb.pilot

number of pilot runs

pilot.runtime

length of pilot runs

main.runtime

length of main runs

type

1: one alpha one group, 2: most extrem alpha to label group

only.pilots

only pilot runs are performed

h.average.P

the expected probability that a gen is under selection

h.step.width

step width of the heuristic, see details !

mcmc.diag

the input for the R-package coda is returned after the pilot runs, Note: 'coda' is required !

h

if h=FALSE the h.average.P should be set like the prior.odds variable suggested by Foll & Gaggiotti

Details

Type of measurenments:

type 1:
Our type 1 method considers all SNPs separately but is restricted to generate exactly one alpha for each gene (or group of SNPs). This approach assumes that all SNPs observed in one gene share the same genetic effect. The type 1 method is default !

type 2:
To loosen up the condition that all SNPs in a gene are forced by the same selective pressure we provide an alternative approach which considers all SNPs separately, but sets no restrictions on the alphas of each SNP in a group. Basically this is exactly what BayeScan does in case of SNP data (Foll and Gaggiotti, 2008). However, to interpret a gene we label the gene with the most extreme alpha value in the corresponding group of SNPs. To calculate the posterior probabilities we again interpret the whole group of SNPs (genes).

Heuristic:
We recognized that the type 1 method has good power to distinguish between balancing and positive selection but produces rapid increasing posterior P-values. Instead of correcting this with normalization, e.g with the use of empirical functions, we introduce here a heuristic which controls the number of included alphas during the jump model. A "h.average.P" variable takes the expected probability that a locus is under selection into account. We keep that value variable in order to adjust the average number of alphas included in the model during the reversible jump until an user-defined ratio (expected fraction of loci under selection=alpha included) is reached. The underlying algorithm logarithmically decreases the probability that a gene is subject to selection while it stays 100 iterations at each state to ensure good differentiation at the full P-value scale [0,1]. The step width can be tuned by the user. The ratio (n.genes.included/n.genes) is printed during the jump model. If h=FALSE the parameter h.average.P is treated like the prior.odds variable in BayeScan and should be set accordingly.

Value

returned value is an object of class "BAYESRETURN"

————————————————————–
Following Slots will be filled
————————————————————–

pilot.alpha

alpha effects after pilot runs

pilot.beta

beta effects after pilot runs

pilot.var_alpha

variance of pilot alphas

pilot.fst

fst values after pilt runs

pilot.P.norm

alphas are fitted to a normal distribution after the pilot runs

pilot.Q.norm

Q values out of the normal distribution

post.alpha

posterior alpha

post.fst

posterior fst

post.beta

posterior beta

post.P

posterior P-values

post.Q

posterior Q-values

region.names

names of region

References

[1] Foll M and OE Gaggiotti (2008). A genome scan method to identify selected loci appropriate for both dominant and codominant markers: A Bayesian perspective. Genetics 180: 977-993

[2] Beaumont M, Balding D. 2004. Identifying adaptive genetic divergence among populations from genome scans.Molecular Ecology. 13:969-980.

[3] Riebler A, Held L, Stephan W. 2008. Bayesian variable selection for detecting adaptive genomic differences among populations. Genetics 178: 1817-1829

[4] Green PJ. 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82: 711-732.

Examples

1
2
3
4
5
6
7
8
9
# Example files can be found in the subdirectory "data".
# results <- GeneFeST(input="snps.txt", GROUP="groups.txt",prior.odds=0.1)
# Using the R-package PopGenome to generate the input
# install.packages("PopGenome")
# library(PopGenome)
# GENOME.class <- readData("Alignments")# Alignments is a folder 
# GENOME.class <- set.populations(GENOME.class, list(c(....),c(....)))
# input        <- getBayes(GENOME.class, snps=TRUE)
# results      <- GeneFeST(input)

GeneFeST documentation built on May 29, 2017, 7:49 p.m.