GeneFeST: Bayesian calculation of gene-specific FST from genomic SNP...
In GeneFeST: Bayesian calculation of gene-specific FST from genomic SNP data

Description Usage Arguments Details Value References Examples

View source: R/slim.R

The method is based on the work from Beaumont and Balding (2004) where they introduce a FST-based hierarchical Bayesian model to detect loci that are subject to selection. In this Bayesian approach they use a logistic regression model to distinguish between locus-specific effects like selection and population-specific effects which are shared by all loci (e.g effects caused by migration rates) (Riebler, 2008). Foll and Gaggiotti (2008) extended this work using a reversible jump MCMC (Green, 1995) which enables testing the hypothesis that a locus is subject to selection; a very similar approach was developed in parallel by Riebler & Stefan (2008). The method is implemented in a software named BayeScan (http://cmpg.unibe.ch/software/BayeScan/). The new method introduced here is a modification of BayeScan (see details).

1 2	GeneFeST(input,GROUP=FALSE,nb.pilot=20,pilot.runtime=500,main.runtime=5000, type=1,only.pilots=FALSE,h.average.P=0.2,h.step.width=1,mcmc.diag=FALSE,h=TRUE)

`input`	textfile or an R-object returned from getBayes() provided by the R-package PopGenome
`GROUP`	SNP groups
`nb.pilot`	number of pilot runs
`pilot.runtime`	length of pilot runs
`main.runtime`	length of main runs
`type`	1: one alpha one group, 2: most extrem alpha to label group
`only.pilots`	only pilot runs are performed
`h.average.P`	the expected probability that a gen is under selection
`h.step.width`	step width of the heuristic, see details !
`mcmc.diag`	the input for the R-package coda is returned after the pilot runs, Note: 'coda' is required !
`h`	if `h=FALSE` the `h.average.P` should be set like the `prior.odds` variable suggested by Foll & Gaggiotti

Type of measurenments:

type 1:
Our type 1 method considers all SNPs separately but is restricted to generate exactly one alpha for each gene (or group of SNPs). This approach assumes that all SNPs observed in one gene share the same genetic effect. The type 1 method is default !

type 2:
To loosen up the condition that all SNPs in a gene are forced by the same selective pressure we provide an alternative approach which considers all SNPs separately, but sets no restrictions on the alphas of each SNP in a group. Basically this is exactly what BayeScan does in case of SNP data (Foll and Gaggiotti, 2008). However, to interpret a gene we label the gene with the most extreme alpha value in the corresponding group of SNPs. To calculate the posterior probabilities we again interpret the whole group of SNPs (genes).

Heuristic:
We recognized that the type 1 method has good power to distinguish between balancing and positive selection but produces rapid increasing posterior P-values. Instead of correcting this with normalization, e.g with the use of empirical functions, we introduce here a heuristic which controls the number of included alphas during the jump model. A "h.average.P" variable takes the expected probability that a locus is under selection into account. We keep that value variable in order to adjust the average number of alphas included in the model during the reversible jump until an user-defined ratio (expected fraction of loci under selection=alpha included) is reached. The underlying algorithm logarithmically decreases the probability that a gene is subject to selection while it stays 100 iterations at each state to ensure good differentiation at the full P-value scale [0,1]. The step width can be tuned by the user. The ratio (n.genes.included/n.genes) is printed during the jump model. If h=FALSE the parameter h.average.P is treated like the prior.odds variable in BayeScan and should be set accordingly.

returned value is an object of class "BAYESRETURN"

————————————————————–
Following Slots will be filled
————————————————————–

`pilot.alpha`	alpha effects after pilot runs
`pilot.beta`	beta effects after pilot runs
`pilot.var_alpha`	variance of pilot alphas
`pilot.fst`	fst values after pilt runs
`pilot.P.norm`	alphas are fitted to a normal distribution after the pilot runs
`pilot.Q.norm`	Q values out of the normal distribution
`post.alpha`	posterior alpha
`post.fst`	posterior fst
`post.beta`	posterior beta
`post.P`	posterior P-values
`post.Q`	posterior Q-values
`region.names`	names of region

[1] Foll M and OE Gaggiotti (2008). A genome scan method to identify selected loci appropriate for both dominant and codominant markers: A Bayesian perspective. Genetics 180: 977-993

[2] Beaumont M, Balding D. 2004. Identifying adaptive genetic divergence among populations from genome scans.Molecular Ecology. 13:969-980.

[3] Riebler A, Held L, Stephan W. 2008. Bayesian variable selection for detecting adaptive genomic differences among populations. Genetics 178: 1817-1829

[4] Green PJ. 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82: 711-732.

# Example files can be found in the subdirectory "data".
# results <- GeneFeST(input="snps.txt", GROUP="groups.txt",prior.odds=0.1)
# Using the R-package PopGenome to generate the input
# install.packages("PopGenome")
# library(PopGenome)
# GENOME.class <- readData("Alignments")# Alignments is a folder 
# GENOME.class <- set.populations(GENOME.class, list(c(....),c(....)))
# input        <- getBayes(GENOME.class, snps=TRUE)
# results      <- GeneFeST(input)