calc_AIC: Function to calculate AIC from the best posterior parameter...
In ribailey/gghybrid: Evolutionary Analysis of Hybrids and Hybrid Zones

calc_AIC

R Documentation

Function to calculate AIC from the best posterior parameter estimate(s) for an esth or ggcline object.

Description

Function to calculate AIC from the best posterior parameter estimate(s) for an esth or ggcline object.

Usage

calc_AIC(
  data.prep.object,
  esth.object = NULL,
  esth.colname = "h_posterior_mode",
  ggcline.object = NULL,
  ggcline.pooled = FALSE,
  test.subject
)

Arguments

`data.prep.object`	Name of the `data.prep` object produced by the `data.prep` function.
`esth.object`	Name of the `esth` object.
`esth.colname`	Name of the column in the `esth.object` with the best posterior hybrid index estimates. Default ‘h_posterior_mode’.
`ggcline.object`	Name of the `ggcline` object. Only needed when calculating AIC for a genomic cline model.
`ggcline.pooled`	Logical. If set to TRUE, calculates a single AIC for the whole dataset. Useful when a comparison is being made between a model with one of the genomic cline parameters pooled across the whole dataset and another model. Default FALSE.
`test.subject`	The test subject for which you wish to make a comparison between models.

Details

calc_AIC calculates the AIC for one or more test subjects for a single model, using the likelihoods for the best posterior parameter estimates.

For hybrid index model comparison, calc_AIC is set up to compare models of differing resolution, for example, for each of a set of populations, comparing AIC for a single hybrid index estimated at the population level to separate hybrid index estimates per individual within the population.

For genomic cline model comparisons there are further options. Comparisons can be made between models run at the same resolution but with differing combinations of parameters fixed versus estimated, or parameters fixed at different values. When one or both cline parameters are pooled across the dataset in one model, it can be compared at the level of the whole dataset with models with no or different pooling.

Value

A data.table and data.frame with the name of the test subject (when there is more than one) plus following fields:

`ml`	The model maximum likelihood for each test subject.
`npar`	The number of parameters for each test subject.
`AIC`	AIC for each test subject.

Author(s)

Richard Ian Bailey richardianbailey@gmail.com

Examples


## Not run: 
############################################################################################################
#Hybrid index example: compare the fit between running esth at the individual level versus population level#
############################################################################################################

#This will result in a separate model comparison for each POPID (the level with lower resolution).

#Use the sparrow dataset, as in other examples.
dat=read.data("RB_Italy_ggcline_precol_headers_haploidND2.csv",nprecol=2,MISSINGVAL=NA,NUMINDS=569);
prepdata=data.prep(data=dat$data,loci=dat$loci,alleles=dat$alleles,S0=c("Kralove","Oslo"),S1="LesinaSPANISH",precols=dat$precols,AF.CIoverlap = FALSE);

#First run esth with test.subject="INDLABEL" (default). 
#'return.likmeans=TRUE' is not needed to calculate AIC; it is only necessary to calculate waic using 'compare.models'.
#For the higher resolution test subject only, we also need to declare what lower resolution test subject we're going to compare with.
hindlabel=esth(
 data.prep.object=prepdata$data.prep,
 read.data.precols=dat$precols,
 test.subject = "INDLABEL",                                  #default#
 test.subject.compare = "POPID",                             #***the test.subject you want to compare with the current test.subject***#
 include.Source=TRUE,
 nitt=3000,
 burnin=1000
)

#Then run esth again, with test.subject="POPID". No need to declare 'test.subject.compare' when running the lower resolution model.
hindlabel2=esth(
 data.prep.object=prepdata$data.prep,
 read.data.precols=dat$precols,
 test.subject = "POPID",                                     #***Changed to "POPID"***#
 include.Source=TRUE,
 nitt=3000,
 burnin=1000
)

#Now compare models using AIC.
#AIC is calculated per declared test subject, so for direct comparison use the lower-resolution test subject to calculate AIC for both models.
aic_INDLABEL=calc_AIC(
 data.prep.object=prepdata$data.prep,
 esth.object=hindlabel,
 test.subject="POPID"              #Make this the same for both models for direct comparison of fit#
);

aic_POPID=calc_AIC(
 data.prep.object=prepdata$data.prep,
 esth.object=hindlabel2,
 test.subject="POPID"              #Make this the same for both models for direct comparison of fit#
)

#Now join the tables and calculate the difference in AIC.
setkey(aic_INDLABEL,POPID);setkey(aic_POPID,POPID);
aic_both=aic_INDLABEL[aic_POPID];#The columns named 'i...' refer to the second table, aic_POPID#
aic_both[,aicdiff:=AIC - i.AIC]#Positive values favour the less complex model (a single pooled hybrid index for each populations)#

#Show the results in order of aicdiff for each POPID. Those at the top of the ordered table more strongly favour the individual-level model.
aic_both[order(aicdiff)]

#################################
#################################
#Genomic cline model comparisons#
#################################
#################################

##############################################################
#Parameter v pooled versus unpooled across a whole chromosome#
##############################################################

#Calculate a single AIC for the sparrow Z chromosome for two ggcline models run in the examples under ggcline, 
#one with v pooled and one unpooled.
aic_unpooled_z=calc_AIC(
 data.prep.object=prepZ,
 esth.object=hindlabel,
 ggcline.object=gc_unpooled_z,
 test.subject="locus",
 ggcline.pooled=TRUE
)

aic_poolv_Z=calc_AIC(
 data.prep.object=prepZ,
 esth.object=hindlabel,
 ggcline.object=gc_poolv_Z,
 test.subject="locus",
 ggcline.pooled=TRUE
)

#Calculate the difference in AIC. Negative favours the more complex model, with v unpooled.
aic_unpooled_z$AIC - aic_poolv_Z$AIC

#############################################################################################################
#At the individual locus level, compare a model with v and centre estimated versus a model with v fixed at 1#
#############################################################################################################

#Compare the ggcline objects gc (no fixing) and gc_v_fixed created in the ggcline examples.
aic_v_fixed=calc_AIC(
 data.prep.object=prepdata$data.prep,
 esth.object=hindlabel,
 ggcline.object=gc_v_fixed,
 test.subject="locus",
 ggcline.pooled=FALSE
)

aic_no_fixed=calc_AIC(
 data.prep.object=prepdata$data.prep,
 esth.object=hindlabel,
 ggcline.object=gc,
 test.subject="locus",
 ggcline.pooled=FALSE
)

#Now join the tables and calculate the difference in AIC for each locus.
setkey(aic_no_fixed,locus);setkey(aic_v_fixed,locus);
aic_both=aic_no_fixed[aic_v_fixed];#The columns named 'i...' refer to the second table, aic_v_fixed#
aic_both[,aicdiff:=AIC - i.AIC]#Positive values favour the less complex model (v fixed at 1)#

#Show the results in order of aicdiff for each POPID. Those at the top of the ordered table more strongly favour the individual-level model.
aic_both[order(aicdiff)]

#########################################################
#Comparing differing resolutions at the lower resolution#
#########################################################

#Compare the ggcline objects gc (1 cline per locus) and gc_by_chr (1 cline per chromosome) created in the ggcline examples.

#If the column "chrom" is absent from the 'gc' results object and the prepdata$data.prep object, it must first be added.
chrom=fread("markerchr.csv")#This contains only the loci retained in prepdata$data.prep#

setkey(prepdata$data.prep,locus);setkey(chrom,locus);
prepdata$data.prep=prepdata$data.prep[chrom]

setkey(gc$gc,locus);setkey(chrom,locus);setkey(gc$likmeans,locus);
gc$gc=gc$gc[chrom];
gc$likmeans=gc$likmeans[chrom]

#Now run AIC for each gc object.
aic_gc=calc_AIC(
 data.prep.object=prepdata$data.prep,
 esth.object=hindlabel,
 ggcline.object=gc,
 test.subject="chrom",           #Use the lower resolution test subject#
 ggcline.pooled=FALSE
)

aic_gc_by_chr=calc_AIC(
 data.prep.object=prepdata$data.prep,
 esth.object=hindlabel,
 ggcline.object=gc_by_chr,
 test.subject="chrom",           #Use the lower resolution test subject#
 ggcline.pooled=FALSE
)

#Now join the tables and calculate the difference in AIC for each chromosome.
setkey(aic_gc,chrom);setkey(aic_gc_by_chr,chrom);
aic_both=aic_gc[aic_gc_by_chr];#The columns named 'i...' refer to the second table, aic_gc_by_chr#
aic_both[,aicdiff:=AIC - i.AIC]#Positive values favour the less complex model (single cline for the whole chromosome)#

#Show the results in order of aicdiff for each POPID. Those at the top of the ordered table more strongly favour the individual locus level model.
aic_both[order(aicdiff)]

## End(Not run)

ribailey/gghybrid documentation built on Feb. 2, 2024, 12:53 a.m.