ggcline | R Documentation |
Estimate genomic cline parameters, credible intervals and p values.
ggcline(
data.prep.object,
esth.object,
esth.colname = "h_posterior_mode",
test.subject = "locus",
poolv = FALSE,
poolcentre = FALSE,
include.Source = FALSE,
return.likmeans = FALSE,
read.data.precols,
fix.subject.v = FALSE,
fix.value.v,
fix.subject.centre = FALSE,
fix.value.centre,
plot.test.subject = NULL,
plot.col = NULL,
plot.ylim = c(-5, 5),
plot.pch.v.centre = c(1, 3),
prior.logitcentre = c(0, sqrt(50)),
prior.logv = c(0, sqrt(10)),
nitt,
burnin,
start.v = NULL,
start.centre = NULL,
init.var.v = NULL,
init.var.centre = NULL,
init.cov.vcentre = NULL,
print.k = 50
)
data.prep.object |
Name of the |
esth.object |
Name of the |
esth.colname |
Name of the column in the |
test.subject |
Character string. Name of the field identifying subjects for genomic cline estimation. Default “locus”. |
poolv |
Logical. Whether to pool the rate parameter |
poolcentre |
Logical. Whether to pool the |
include.Source |
Logical. Whether to include the parental reference individuals in genomic cline
estimation. Default |
return.likmeans |
Logical. Whether to return a table of likelihood calculations. Required for applying the widely applicable
information criterion to model comparison using |
read.data.precols |
A |
fix.subject.v |
Logical default, otherwise a character vector. List of test subjects (normally locus names)
for which you wish to fix rather than estimate the rate parameter |
fix.value.v |
Logical default, otherwise a numeric vector of length 1 or the same length as |
fix.subject.centre |
Logical default, otherwise a character vector. List of test subjects (normally locus names)
for which you wish to fix rather than estimate the |
fix.value.centre |
Logical default, otherwise a numeric vector of length 1 or the same length as |
plot.test.subject |
Character vector. List of up to 3 test subjects for which you wish to plot the posterior distribution values in real time.
Rate parameter |
plot.col |
Character vector. List of up to 3 colours to plot the |
plot.ylim |
Numeric vector. Upper and lower plot limits of the y axis. Default |
plot.pch.v.centre |
Numeric vector. Size of data points for the two parameters. Default |
prior.logitcentre |
Numeric vector. Mean and sd of the normally distributed prior for logit( |
prior.logv |
Numeric vector. Mean and sd of the normally distributed prior for ln( |
nitt |
The total number of MCMC iterations including burnin. 5,000 is recommended, 10,000 if one parameter is pooled. |
burnin |
Numeric. The number of burnin MCMC iterations. 2,000 is recommended, 5,000 if one parameter is pooled. |
start.v |
Numeric vector. Optional starting |
start.centre |
Numeric vector. Optional starting |
init.var.v |
Numeric. Optional starting variance of the log( |
init.var.centre |
Optional starting variance of the logit( |
init.cov.vcentre |
Optional starting covariance of the log( |
print.k |
The iteration is printed to screen on multiples of this number. Default is |
ggcline
assumes the posterior of ln(v)
is normally distributed, and the results for this parameter are
the inverse logs of the estimated mean and upper and lower 95 percent credible intervals of ln(v
). v
is always positive and higher
values for v
indicate steeper clines, with v=1
being the null value. centre
represents the hybrid index
at which allele frequencies are half way between those of the parents. It ranges between 0 and 1 and the null value is 0.5. It
is estimated as logit(centre
), and the posterior mean and credible intervals are inverse logit-transformed back to the
original scale.
While Fitzpatrick (2013) describes parameters, v
(the relative cline slope) and u
(the relative cline position),
the parameter u
is difficult to interpret as its range scales to v
. Noting that for the hybrid index itself,
u=0
and v=1
, the cline centre
(hybrid index value for which allele frequencies are half way between those of
the parents: m
in Fitzpatrick's notation) for individual loci has the relationship logit(centre)=u/v
. centre
is easier to interpret, and estimating it rather than u
improves MCMC efficiency; hence I estimate centre
rather than u
.
If both parameters are fixed to the null or other values, only nitt=2
and burnin=0
are required.
For poolv
and poolcentre
, the options are no pooling (FALSE
) or pooling across all
samples in the data set (TRUE
). So if for example you wish to obtain a single pooled estimate of v
across two transects
for each locus individually (i.e. not the same parameter estimate across all loci),
the loci must be run individually by subsetting the data.prep.object
to only include one locus in each run, using split_data_prep
.
list with at least two components, gc
, a data.table
and data.frame
with the estimated
genomic cline parameters, and test.subject
, a character scalar. The optional output likmeans
is needed downstream
when model comparison is carried out with compare.models
.
gc
contains the test.subject
and following fields:
locus |
the locus name, or another chosen test.subject. |
S0.prop_1 |
allele frequency of the S1 allele in the S0 source population. |
S1.prop_1 |
allele frequency of the S1 allele in the S1 source population. |
exp_mean_log_v |
the best posterior estimate of v (inverse-logged mean of the posterior normal distribution for ln(v)). |
v_lower_95 |
v lower 95 percent credible interval (inverse-logged 2.5 percent quantile of the posterior normal distribution for ln(v)). |
v_upper_95 |
v upper 95 percent credible interval (inverse-logged 97.5 percent quantile of the posterior normal distribution for ln(v)). |
v_pvalue |
Bayesian p value for v (2*Quantile of the null value (log(1)) given the posterior normal distribution for ln(v)). |
invlogit_mean_logit_centre |
best posterior estimate of centre (inverse logit-transformed mean of the posterior normal distribution for logit(centre)). |
centre_lower_95 |
centre lower 95 percent credible interval (inverse logit-transformed 2.5 percent quantile of posterior logit(centre)). |
centre_upper_95 |
centre upper 95 percent credible interval (inverse logit-transformed 97.5 percent quantile of posterior logit(centre)). |
centre_pvalue |
Bayesian p value for centre (2*Quantile of the null value (logit(0.5)) given the posterior normal distribution for logit(centre)). |
mean_log_v |
posterior mean v on the normally distributed latent scale (ln(v)). |
var_log_v |
posterior variance of v on the normally distributed latent scale (ln(v)). |
mean_logit_centre |
posterior mean centre on the normally distributed latent scale (logit(centre)). |
var_logit_centre |
posterior variance of centre on the normally distributed latent scale (logit(centre)). |
cov_log_v_logit_centre |
posterior covariance between the multivariate normally distributed ln(v) and (logit(centre)). |
npar |
number of parameters for each test subject. |
sum_npar |
sum number of parameters across all test subjects. |
Richard Ian Bailey richardianbailey@gmail.com
Fitzpatrick, B. M. (2013). Alternative forms for genomic clines. Ecology and evolution, 3(7), 1951-1966.
## Not run:
###########################################################
#Read in and prepare data, and run hybrid index estimation#
###########################################################
dat=read.data("RB_Italy_ggcline_precol_headers_haploidND2.csv",nprecol=2,MISSINGVAL=NA,NUMINDS=569);
prepdata=data.prep(data=dat$data,loci=dat$loci,alleles=dat$alleles,S0=c("Kralove","Oslo"),S1="LesinaSPANISH",
precols=dat$precols,AF.CIoverlap = FALSE);
hindlabel=esth(data.prep.object=prepdata$data.prep,read.data.precols=dat$precols,include.Source=TRUE,
nitt=3000,burnin=1000);
#################################################################################################################
#The full set of arguments for ggcline including all defaults (comments indicate which entries are not defaults)#
#################################################################################################################
gc=ggcline(
data.prep.object=prepdata$data.prep, #Needs an entry#
esth.object=hindlabel, #Needs an entry#
esth.colname="h_posterior_mode", #Default. Can be replaced with "beta_mean" from the esth object#
test.subject = "locus",
poolv = FALSE,
poolcentre = FALSE,
include.Source = TRUE, #Default is FALSE#
return.likmeans = TRUE, #Default is FALSE#
read.data.precols=dat$precols, #Needs an entry#
fix.subject.v = FALSE,
fix.value.v,
fix.subject.centre = FALSE,
fix.value.centre,
plot.test.subject = c("A2ML1_SNP1","GTF2H2"), #Plots are just to check the MCMC is working#
plot.col = c("orange","cyan"), #Plots are just to check the MCMC is working#
plot.ylim = c(-3, 5), #Plots are just to check the MCMC is working#
plot.pch.v.centre = c(1, 3), #Plots are just to check the MCMC is working#
prior.logitcentre = c(0, sqrt(50)), #Default#
prior.logv = c(0, sqrt(10)), #Default#
nitt=5000, #Needs an entry, 5000 should be sufficient for standard per-locus estimates#
burnin=2000, #Needs an entry, 2000 should be sufficient for standard per-locus estimates#
start.v = NULL,
start.centre = NULL,
init.var.v = NULL,
init.var.centre = NULL,
init.cov.vcentre = NULL,
print.k = 50
)
#############################################################################################
#Estimate a single v parameter across a whole chromosome, but a unique centre for each locus#
#############################################################################################
#First, add a column to prepdata$data.prep indicating the chromosome for each marker.
chrom=fread("markerchr.csv");#This contains only the loci retained in prepdata$data.prep after filtering#
setkey(prepdata$data.prep,locus);setkey(chrom,locus);
prepdata$data.prep=prepdata$data.prep[chrom];
#Make a data.prep object for the Z chromosome markers only.
prepZ=prepdata$data.prep[chrom=="Z"]
#First run ggcline for chrZ-only without pooling, for later model comparison.
gc_unpooled_z=ggcline(data.prep.object=prepZ,esth.object=hindlabel,include.Source = TRUE,
return.likmeans = TRUE, #***Important to set this to TRUE waic for model comparison***#
read.data.precols=dat$precols, nitt=5000,burnin=2000)
#Run again but pooling v across all loci. The MCMC is less efficient with pooling, so using longer nitt and burnin.
gc_poolv_Z=ggcline(data.prep.object=prepZ,esth.object=hindlabel,
poolv = TRUE, #***Set to TRUE***#
include.Source = TRUE,
return.likmeans = TRUE, #***Important to set this to TRUE for model comparison***#
read.data.precols=dat$precols,
nitt=10000, #Use a longer post-burnin (5000 in this case)#
burnin=5000 #Use a longer burnin#
)
########################################################################
#Set up model comparisons by fixing one or both parameters (no pooling)#
########################################################################
#Run ggcline with both parameters fixed to their null values. This can then be compared with the first run above (gc).
gc_v_centre_fixed=ggcline(data.prep.object=prepdata$data.prep,esth.object=hindlabel,include.Source = TRUE,
return.likmeans = TRUE, #***Important to set this to TRUE for model comparison***#
read.data.precols=dat$precols,
fix.subject.v = gc$gc$locus, #***Character vector of locus names, here taken from the first ggcline run above***#
fix.value.v = 1, #***The null parameter value on the data scale***#
fix.subject.centre = gc$gc$locus, #***Vector of locus names***#
fix.value.centre = 0.5, #***The null parameter value on the data scale***#
nitt=2, #MCMC estimation not needed when both parameters are fixed for all test subjects, only nitt=2 is necessary to avoid errors#
burnin=0 #MCMC estimation not needed when both parameters are fixed for all test subjects#
)
#The comparison would be cleaner if we only fixed one parameter. For example, v.
gc_v_fixed=ggcline(data.prep.object=prepdata$data.prep,esth.object=hindlabel,include.Source = TRUE,
return.likmeans = TRUE, #***Important to set this to TRUE for model comparison***#
read.data.precols=dat$precols,
fix.subject.v = gc$gc$locus, #***Vector of locus names***#
fix.value.v = 1, #***The null parameter value on the data scale (could also be fixed to some other value)***#
#fix.subject.centre = gc$gc$locus, #Not fixing centre this time#
#fix.value.centre = 0.5, #Not fixing centre this time#
nitt=5000,
burnin=2000
)
#This can now be compared with either gc or gc_v_centre_fixed.
##################################################################
#Prepare to compare parameters estimated at differing resolutions#
##################################################################
#Run ggcline using chromosome as test.subject rather than locus. We added the column "chrom" to prepdata$data.prep earlier.
gc_by_chr=ggcline(data.prep.object=prepdata$data.prep,esth.object=hindlabel,
test.subject = "chrom", #***Changed from default***#
include.Source = TRUE,return.likmeans = TRUE,read.data.precols=dat$precols,nitt=5000,burnin=2000)
#The column "chrom" is absent from the 'gc' results object created above, and must first be added to both gc$gc and gc$likmeans.
chrom=fread("markerchr.csv");#This contains only the loci retained in prepdata$data.prep#
setkey(gc$gc,locus);setkey(chrom,locus);setkey(gc$likmeans,locus);
gc$gc=gc$gc[chrom];
gc$likmeans=gc$likmeans[chrom];
#Now model comparison can be carried out at the chromosome level, between gc and gc_by_chr.
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.