An Introduction to CAMer package

CAMer package (Continuous Admixture Modeler) does Continuous Admixture Modeling (CAM) and related summary based on the result of iMAAPs. It introduces three new S3 classes, CAM.single, CAM and CAM.conclusion, and some corresponding methods. It also contains some utility functions and two simulated data sets (CGF_50 and GA_I) for illustration.


Computation

Single LD Decay Curve

The function singleCAM() does CAM for a single LD decay curve. For example, let's use the CGF_50 data set (the admixture proportion for population 1 ($m_1$) is 0.3) to do CAM with the most ancient generation concerned being 70 (T=70L) and core models being HI, CGF1, CGF2 and GA (isolation=FALSE):

library(CAMer)
data(CGF_50)
d<-CGF_50$Distance
Z<-CGF_50$Combined_LD
fit<-singleCAM(d=d,Z=Z,m1=0.3,T=70L,isolation=FALSE)
fit

where parameter d corresponds to genetic distance and parameter Z corresponds to an LD decay curve.

One can also specify the file path of the .log file containing the information of m1 in argument m1=.

Here the class of fit is CAM.single, and it has its own method for print(). fit$summary is a more comprehensive data frame containing the data frame printed.

Parallel computation is also supported provided that parallel package or snow package is installed. For newer versions of R (>=2.14.0), parallel is in R-core. If only snow is available, it is recommended to library it before using the parallel computing functionality.

See the help page of singleCAM() for more examples.

Multiple LD Decay Curves (.rawld File)

The function CAM() does CAM for a .rawld file with multiple LD decay curve. Parallel computation is also supported. For example, let's use the GA data set ((the admixture proportion for population 1 ($m_1$) is 0.3) with the most ancient generation concerned being 150 (T=150L) and core models being HI, CGF1-I, CGF2-I and GA-I (isolation=TRUE by default), without using parallel computation for the four models for each LD decay curve (single.parallel=FALSE):

data(GA_I)
fit<-CAM(rawld=GA_I,m1=0.3,T=150L,LD.parallel=TRUE,single.parallel=FALSE)
#Usually, one only needs to pass the paths to the .rawld file and to the .log file to CAM():
#fit<-CAM(rawld="path/to/GA_I.rawld",m1="path/to/GA_I.log",T=150L,LD.parallel=TRUE,single.parallel=FALSE)
fit

One can also specify the file path of the .rawld file in argument rawld= and the file path of the .log file containing the information of m1 in argument m1=.

Here the class of fit is CAM, and it has its own method for print() and plot(). fit$summary is a more comprehensive data frame containing the data frame printed. A CAM object has an element named CAM.list consisting of the CAM.single objects for each LD decay curve.

Parallel computation is also supported as in the example, provided that parallel package or snow package is installed. For newer versions of R (>=2.14.0), parallel is in R-core. If only snow is available, it is recommended to library it before using the parallel computing functionality.

See help page of CAM() for more examples and details.


Summary Plots

A new method of plot() for CAM class is introduced in this package (plot.CAM()). This function generates three plots in a device. The plot on the top left is the estimated time intervals/points for the four models. The color depth of segments/points corresponds to how many intervals/points covers this part in Jackknives. The deeper the color, the more estimates from Jackknives cover this part. The plot on the top right is the boxplot of msE for the four models. The third plot shows the fitting of four models to Combined_LD in the .rawld file. The numbers after model names in the legend are quasi-F values of the four models for Combined_LD. For example, let's plot the previous result:

plot(fit)

One can also run plot(fit,"GA_I.pdf") to plot to a .pdf file, which is recommended.

To change the colors of models, one can pass a $3 \times 4$ matrix of colors:

plot(fit,model.cols=matrix(c("pink","red","pink",
                             "lightseagreen","green","green",
                             "skyblue","blue","blue",
                             "yellow","orange","orange"),ncol=4))

See help page of plot.CAM() for more details.


Draw Conclusions on Best Model(s)

The function conclude.model() can draw conclusions on which models are the significantly best ones and find their estimated time intervals/points. It takes a "CAM" class object or its summary table as input. For example, let's find out the best model(s) from the previous CAM analysis:

conclusion<-conclude.model(fit)
conclusion<-conclude.model(fit$summary)
conclusion

The function returns an object of CAM.conclusion class, which has a special method for print().

Note that this function only selects the significantly best model(s), i.e. the one(s) that are significantly the closest to what is observed. It does NOT check if the best model(s) are credible or not. The user should check the quasi-F value ans msE in the summary table or plot of a "CAM" class object for this purpose.

See the help page of conclude.model() for further information.


Miscellany

Construct a Simple CAM object

Sometimes maybe only the summary table of an object of CAM class is saved. The function construct.CAM() can construct a simple CAM object given the original .rawld file, the summary table of the original CAM object and the admixture proportion of population 1 $m_1$, which can be passed to plot.CAM() function and conclude.model() function. For example, let's "save" the summary table of the previous result (fit$summary), then use this function to construct a CAM class object and do some further analysis from it:

summarydata<-fit$summary
rm(fit)
fit<-construct.CAM(rawld=GA_I,m1=0.3,dataset=summarydata)
fit
plot(fit)
conclude.model(fit)

Reconstruct Fitted LD Decay Curves

One may want to get the fitted LD decay curves. The function reconstruct.fitted() takes a CAM.single class object and returns a list containing the best-fit curves for the four models. It can take the CAM.single class objects in the constructed a CAM class object from construct.CAM() as input. For example, let's use the CAM class object just constructed and reconstruct the fitted curves:

fitted<-reconstruct.fitted(fit$CAM.list[[1]])
str(fitted)

HI Modle for Single LD Decay Curve

The function singleHI() does time inference, of HI model only, for a single LD decay curve. The algorithm is the same as the HI model part of singleCAM(). For example, let's use the Combined LD in the CGF_50 data set and use only HI as the core model:

fit<-singleHI(d,Z,m1=0.3,T=70L)
fit

This function also returns an object of CAM.single class, and can be passed to reconstruct.fitted():

fitted<-reconstruct.fitted(fit)
str(fitted)

It is recommended to use this function when only HI model is concerned. See the help page of singleHI() for further details.

HI Model for Multiple LD Decay Curves (.rawld File)

The function HI() does time inference, of HI model only, for a .rawld file. The algorithm is the same as the HI model part of CAM(). For example, let's again use the GA_I data set with the most ancient generation concerned being 150 (T=150L), but this time only HI is the core model:

fit<-HI(GA_I,m1=.3,T=150L)
fit

The output is also an object of CAM class. However, it should NOT be passed to plot(), and its summary table should NOT be passed to construct.CAM().

It is recommended to use this function when only HI model is concerned. See the help page of HI() for further details.



QIU-Hongxiang-David/CAMer documentation built on Nov. 13, 2021, 5:15 p.m.