esth | R Documentation |
Bayesian hybrid index estimation.
esth(
data.prep.object,
read.data.precols,
test.subject = "INDLABEL",
test.subject.compare = NULL,
include.Source = TRUE,
return.likmeans = FALSE,
fix.subject = FALSE,
fix.value = FALSE,
plot.ind = NULL,
plot.col = NULL,
nitt,
burnin,
init.var = 0.002,
init.var2 = NULL,
prior = c(0.5, 0.5),
print.k = 50
)
data.prep.object |
Name of the |
read.data.precols |
A |
test.subject |
Character string. Name of the field identifying subjects for hybrid index estimation. Default “INDLABEL”. |
test.subject.compare |
Character string. Name of another field in the data analysis table, such as ‘POPID’, with which
you wish to compare the fit of the current test subject using either AIC or waic. Used to compare model fits at differing resolutions,
such as individual-level versus population-level, with the subsequent comparison being made at the lower resolution. Only needs to be
declared when running the higher resolution model, and produces a warning if the declared field is at a higher resolution than the test subject.
A second run of |
include.Source |
Logical. Whether to estimate hybrid indices for test subjects declared as parental reference. Default is |
return.likmeans |
Logical. Whether to return a table of likelihood calculations. Required for |
fix.subject |
Logical default, or a character vector. Which, if any, test subjects to fix at a specific hybrid index value. If set |
fix.value |
Logical default, otherwise a numeric vector of length 1 or the same length and order as the character vector declared using
|
plot.ind |
Character vector. List of up to 6 test subjects for which you wish to plot the posterior distribution values in real time.
Default |
plot.col |
Character vector. List of up to 6 colours to plot the |
nitt |
Numeric. The total number of MCMC iterations including burnin. At least 2000, preferably 3000, is recommended. |
burnin |
Numeric. The number of burnin MCMC iterations. Typically, 1000 is sufficient. |
init.var |
Numeric. Starting value for the variance of the parameter proposal distribution, applied for the first 100 iterations. Default is 0.002, which works well with tested data sets. |
init.var2 |
Numeric. Variance of the proposal distribution to be applied after the first 100 iterations, by which time the mcmc should be
sampling the posterior distribution. The default entry is |
prior |
Numeric vector. Values of |
print.k |
Numeric. The iteration is printed to screen at multiples of this number. Default is |
esth
estimates hybrid index using the likelihood formulae of Buerkle (2005), with the addition of a
prior. The default is Jeffrey's prior (beta distribution with shape1=shape2=0.5
), which testing suggests
is an improvement over a uniform prior (shape1=shape2=1
).
esth
excludes any pre-marker columns from the output with finer resolution than the declared test.subject
,
and hence has the same number of rows as the number of unique test.subject
values.
Set return.likmeans=TRUE
if you intend to carry out model comparison based on the widely applicable information criterion (waic).
list containing hi
, test.subject
, init.var2
, and likmeans
(optional). init.var2
is numeric and
is included in case the user wants to try a different, possibly smaller, value, for test subjects for which mcmc failed. likmeans
contains the information needed to calculate waic. hi
is a data.table
and data.frame
containing all pre-marker
columns that do not have finer resolution than the declared test.subject
, as well as the following fields:
Source |
whether the sample is from source 0 ('S0'), source 1 ('S1') or is a test individual ('TEST'). |
h_posterior_mode |
the hybrid index estimate (mode of the beta-distributed posterior). |
h_cred_int_lower |
lower 95 percent credible interval (2.5 percent quantile of the posterior beta distribution). |
h_cred_int_upper |
upper 95 percent credible interval (97.5 percent quantile of the posterior beta distribution). |
beta_mean |
mean of the posterior beta distribution (included for convenience). |
beta_var |
variance of the posterior beta distribution (included for convenience). |
beta_shape1 |
beta shape1 parameter estimate for the posterior beta distribution (included for convenience). |
beta_shape2 |
beta shape2 parameter estimate for the posterior beta distribution (included for convenience). |
npar_compare |
see compare.models. Absent unless test.subject.compare is set. |
Richard Ian Bailey richardianbailey@gmail.com
Buerkle, C. A. (2005). Maximum likelihood estimation of a hybrid index based on molecular markers. Molecular Ecology Notes, 5(3), 684-687.
## Not run:
#Use the sparrow data with loci filtered by parental allele frequency credible interval overlap.
dat=read.data("RB_Italy_ggcline_precol_headers_haploidND2.csv",nprecol=2,MISSINGVAL=NA,NUMINDS=569)
prepdata=data.prep(data=dat$data,loci=dat$loci,alleles=dat$alleles,S0=c("Kralove","Oslo"),S1="LesinaSPANISH",precols=dat$precols,AF.CIoverlap = FALSE)
###############################################
#In the presence of parental reference samples#
###############################################
hindlabel=esth(
data.prep.object=prepdata$data.prep,
read.data.precols=dat$precols,
include.Source=TRUE, #Leave at default TRUE if you want hybrid indices for the parental reference individuals#
plot.ind = c("PD07-254","PD07-160","PD07-159","PI07-110","PI08-498","PH08-285"), #posterior sampling for up to 6 test subjects can be plotted in real time#
plot.col = c("blue","green","cyan","purple","magenta","red"),
nitt=3000, #Testing suggests nitt=3000,burnin=1000 are plenty for accurate posterior estimation#
burnin=1000
)
#There will be a warning if any NAs are produced for the best posterior estimate (h_posterior_mode).
###################################################
#esth in the absence of parental reference samples#
###################################################
#This requires a marker.info file with correct column names and containing parental allele frequencies.
#First create this file.
dat=read.data("RB_Italy_ggcline_precol_headers_haploidND2.csv",nprecol=2,MISSINGVAL=NA,NUMINDS=569);
prepdata=data.prep(data=dat$data,loci=dat$loci,alleles=dat$alleles,S0=c("Kralove","Oslo"),S1="LesinaSPANISH",
precols=dat$precols,AF.CIoverlap = FALSE,return.locus.table = TRUE);
sparrowMarkers=prepdata$locus.data[,.(locus,S0_allele,S1_allele,S0.prop_0,S1.prop_0)];
setnames(sparrowMarkers,c("S0_allele","S1_allele","S0.prop_0","S1.prop_0"),c("refAllele","alternateAllele","S0.prop_r","S1.prop_r"));
#Save it to the working directory.
fwrite(sparrowMarkers,"sparrowMarkers.csv")#By default data.table's "fwrite" saves as a comma-separated file, whatever the file extension#
#Now run the workflow without specifying parental reference samples.
dat=read.data(file="RB_Italy_ggcline_precol_headers_haploidND2.csv",nprecol=2,NUMINDS=569,MISSINGVAL=NA,marker.info.file="sparrowMarkers.csv",sourceAbsent=TRUE);
#Run data.prep filtering for allele frequency difference.
prepdata= data.prep(
data=dat$data,
loci=dat$loci,
sourceAbsent = TRUE, #***Required when no parental reference samples are declared***#
marker.info=dat$marker.info,
alleles=dat$alleles,
precols=dat$precols,
min.diff = 0.2
)
#esth can now be run as above.
hindlabelx=esth(data.prep.object=prepdata$data.prep,read.data.precols=dat$precols,nitt=3000,burnin=1000)
#################################################################
#Estimating a hybrid index per locus across all TEST individuals#
#################################################################
#Also known as locus-specific ancestry.
#Use a dataset including parental reference samples, and then run per-locus hybrid index estimation on the 'TEST'
#individuals only (i.e. excluding S0 and S1). No plots this time.
dat=read.data("RB_Italy_ggcline_precol_headers_haploidND2.csv",nprecol=2,MISSINGVAL=NA,NUMINDS=569);
prepdata=data.prep(data=dat$data,loci=dat$loci,alleles=dat$alleles,S0=c("Kralove","Oslo"),S1="LesinaSPANISH",precols=dat$precols,AF.CIoverlap = FALSE);
hindlabel=esth(
data.prep.object=prepdata$data.prep,
read.data.precols=dat$precols,
test.subject="locus", #Switch from estimating h-index per individual (INDLABEL, the default) to per locus#
include.Source=FALSE, #Exclude the parental reference individuals (which are included by default) to get the h-index per locus for TEST individuals only#
nitt=3000,
burnin=1000
)
##############################################################################
#Running esth on one of the files produced by the 'split_data_prep' function##
##############################################################################
dat=read.data("RB_Italy_ggcline_precol_headers_haploidND2.csv",nprecol=2,MISSINGVAL=NA,NUMINDS=569);
prepdata=data.prep(data=dat$data,loci=dat$loci,alleles=dat$alleles,S0=c("Kralove","Oslo"),S1="LesinaSPANISH",precols=dat$precols,AF.CIoverlap = FALSE);
#Produce one data analysis file per individual and save to WD.
split_data_prep(data.prep.object=prepdata$data.prep,splitBy="INDLABEL",keepN=1);
#Read in the first sub-file and estimate hybrid index.
prep=fread("prepdata_INDLABEL_1.csv");
hindlabel=esth(data.prep.object=prep,read.data.precols=dat$precols,nitt=3000,burnin=1000)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.