CAM: Continuous Admixture Modeling (CAM)
In QIU-Hongxiang-David/CAMer: Continuous Admixture Modeler


View source: R/CAMer.R

Estimates admixture time intervals/points under the HI, CGF1(-I), CGF2(-I) and GA(-I) models for all LD decay curves in a .rawld file.

CAM(rawld, m1, T = 500L, isolation = TRUE, fast.search = TRUE,
  max.duration = 150L, LD.parallel = TRUE, LD.clusternum,
  single.parallel = isolation && !fast.search, single.clusternum = 4L)

`rawld`	a string representing the path of the .rawld file or a data frame read from the .rawld file by `read.table`. The .rawld file should be the output of `MALDmef`.
`m1`	the admixture proportion of population 1 or the path of the .log file containing this information. If `m2` is the admixing proportion of population 2, then `m1+m2=1`. The .log file should be the output of `MALDmef`.
`T`	the most ancient generation to be searched. Defaults to 500.
`isolation`	`TRUE` if the models used for fitting are HI, CGF1-I, CGF2-I and GA-I; `FALSE` if the models used for fitting are HI, CGF1, CGF2 and GA. Defaults to `TRUE`.
`fast.search`	only used when `isolation=TRUE`. `TRUE` to use the fast searching algorithm, which sometimes gives slightly wider time intervals than the slow searching algorithm. Defaults to `TRUE`.
`max.duration`	Defaults to 150. See "Details".
`LD.parallel`	a logical value indicating whether the LD decay curves should be computed in parallel. Defaults to `TRUE`.
`LD.clusternum`	the number of clusters in parallel computation. See "Details".
`single.parallel`	a logical expression. See "Details".
`single.clusternum`	the number of clusters in parallel computation. Defaults to 4 for the four models. Used if `single.parallel=TRUE`.

max.duration is only used when isolation=TRUE and fast.search=FALSE. It is the maximal duration of admixture n that is considered possible. Smaller values make the slow searching algorithm faster. If max.duration>T, it is set to T.
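The capping rule for max.duration can be sketched in a few lines of R. This is an illustration only, not the package's code; `clamp_duration` is a hypothetical helper name introduced here.

```r
# Hypothetical helper (not part of CAMer): mirrors the documented rule
# that max.duration is capped at T.
clamp_duration <- function(max.duration = 150L, T = 500L) {
  min(as.integer(max.duration), as.integer(T))
}

clamp_duration(150L, 500L)  # max.duration is already within T
clamp_duration(150L, 20L)   # max.duration exceeds T, so T is returned
```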

LD.clusternum is used if LD.parallel=TRUE. If not specified, it is set to be the number of LD decay curves in the .rawld file.

single.parallel indicates whether parallel computation should be used when computing a single LD decay curve. Defaults to TRUE if isolation=TRUE and fast.search=FALSE, and to FALSE otherwise.
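The default of single.parallel is written in the usage as an expression of the other arguments, which works because R evaluates default arguments lazily inside the function. A toy function (hypothetical, named `show_default` here, and doing no fitting) makes the resolved values visible:

```r
# Toy function with the same default expression as in the usage of CAM;
# it only returns the resolved value of single.parallel.
show_default <- function(isolation = TRUE, fast.search = TRUE,
                         single.parallel = isolation && !fast.search) {
  single.parallel
}

show_default()                                        # FALSE
show_default(fast.search = FALSE)                     # TRUE
show_default(isolation = FALSE, fast.search = FALSE)  # FALSE
show_default(single.parallel = TRUE)                  # TRUE (explicit override)
```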

The .rawld file should include exactly one column named "Distance" (in Morgan), exactly one column named "Combined_LD", several columns named "Jack?" representing jackknives, where ? is a number, and exactly one column named "Fitted" representing the LD decay curve fitted by the previous method. This function fits "Combined_LD" and all jackknives using all models. See singleCAM for further details of the fitting algorithm for each admixture-induced LD (ALD) decay curve.
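Before calling CAM on a data frame, the column layout described above can be checked by hand. The sketch below is a hypothetical validator (`check_rawld` is not part of CAMer); it assumes that jackknife columns match "Jack" followed by digits and that the number of curves is one (Combined_LD) plus the number of jackknives.

```r
# Hypothetical validator for a .rawld-style data frame; not part of CAMer.
check_rawld <- function(df) {
  nms <- names(df)
  # Columns that must appear exactly once.
  for (col in c("Distance", "Combined_LD", "Fitted")) {
    if (sum(nms == col) != 1L) {
      stop("expected exactly one column named '", col, "'")
    }
  }
  # Jackknife columns: "Jack" followed by a number (assumed pattern).
  jack <- grep("^Jack[0-9]+$", nms, value = TRUE)
  # Curves to fit: Combined_LD plus every jackknife (assumed counting rule).
  list(jackknives = jack, n.curves = 1L + length(jack))
}

# Toy data with made-up values, only to exercise the check.
toy <- data.frame(Distance    = c(0.001, 0.002),
                  Combined_LD = c(0.10, 0.08),
                  Jack1       = c(0.11, 0.08),
                  Jack2       = c(0.09, 0.07),
                  Fitted      = c(0.10, 0.08))
check_rawld(toy)$n.curves
```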

If the last entry of Distance in the .rawld file is greater than 10, a warning about the unit is given.
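A similar check can be run by hand before fitting. The function below is a hypothetical sketch, not the package's code: it uses the threshold of 10 stated above, and the warning text is invented for illustration.

```r
# Hypothetical pre-check (not part of CAMer): warns if the last Distance
# value exceeds 10, which suggests the column may not be in Morgan.
check_distance_unit <- function(distance) {
  last <- distance[length(distance)]
  if (last > 10) {
    warning("last Distance is ", last, "; is the unit Morgan?")
    return(FALSE)
  }
  TRUE
}

check_distance_unit(c(0.001, 0.05, 0.3))  # TRUE, no warning
```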

If the estimated time intervals/points cover T, a warning that T is too small is given. The user should re-run the function with a larger T so that optimal time intervals/points can be reached.

Requires the parallel or snow package to be installed if LD.parallel=TRUE or single.parallel=TRUE. For newer versions of R (>=2.14.0), parallel is in R-core. If only snow is available, it is recommended to library it before using the parallel computing functionality. When only snow is available, it will be require-d and hence the search path will be changed; if parallel is available, it will be used but the search path will not be changed. If the version of R is too old, older versions of snow can be downloaded from https://cran.r-project.org/src/contrib/Archive/snow/ and installed. If neither package is available but LD.parallel=TRUE or single.parallel=TRUE, the function computes sequentially and issues messages.
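To see in advance which backend is installed, one can query the packages without attaching them. The helper below is a hypothetical sketch (`pick_backend` is not part of CAMer) and differs from the behaviour described above in one respect: it uses `requireNamespace`, which never changes the search path, even for snow.

```r
# Hypothetical helper (not part of CAMer): reports which parallel backend
# is installed, preferring parallel over snow.
pick_backend <- function() {
  if (requireNamespace("parallel", quietly = TRUE)) {
    "parallel"
  } else if (requireNamespace("snow", quietly = TRUE)) {
    "snow"
  } else {
    message("Neither parallel nor snow is available; computation would be sequential.")
    "sequential"
  }
}

backend <- pick_backend()
backend
```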

Be aware that when the computational cost is small (e.g. isolation=FALSE, or T=20L, isolation=TRUE, fast.search=FALSE, max.duration=10L), using parallel computation for single LD decay curves can result in longer computation time.

An object of S3 class "CAM": a list CAM.list consisting of some basic information about the function call, N objects of class "CAM.single" (where N is the number of LD decay curves in the .rawld file), the LD decay curve fitted by the previous method (up to some truncation according to max.index), and a summary table containing the parameter estimates for each model and each curve together with the diagnostic statistics msE and quasi-F. For details of the "CAM.single" class, see singleCAM.

Special plot and print methods are provided for this class.

If the input of m1 is the .log file path, there should not be any "=" in the names of populations. If there are, the function may not be able to execute normally, and the user should check the .log file and input m1 as a number manually.

When LD.parallel=TRUE or single.parallel=TRUE, terminating the function during execution is not recommended. If the parallel package is available, setDefaultCluster from parallel is said to remove the registered cluster, but real experiments do not support this; fortunately, the unused clusters are removed automatically later, although with warnings. If only the snow package is available, http://homepage.stat.uiowa.edu/~luke/R/cluster/cluster.html advises: "don't interrupt a snow computation". The most reliable way to close the unused clusters is probably to quit the R session.

Pay attention to memory allocation, especially when both LD.parallel=TRUE and single.parallel=TRUE.

It is possible that this function opens several nodes but none of them is computing, so the execution never finishes, especially when both LD.parallel=TRUE and single.parallel=TRUE. The cause has not been identified yet. The current workaround is to terminate the function by hand and re-run it with fewer cores (e.g. set single.parallel=FALSE).

construct.CAM, reconstruct.fitted, conclude.model

data(GA_I)

#fit models with isolation=FALSE.
fit<-CAM(GA_I,m1=0.3,T=150L,isolation=FALSE)
fit
## Not run: 
plot(fit) #may not be able to display

## End(Not run)
#Bad fitting indicates isolation=TRUE should be tried.
fit<-CAM(GA_I,m1=0.3,T=150L,isolation=TRUE)
fit
fit$summary
## Not run: 
plot(fit) #may not be able to display
plot(fit,"plot.pdf") #plot to a .pdf file

## End(Not run)

data(CGF_50)
fit<-CAM(CGF_50,0.3,20L,isolation=FALSE,LD.parallel=FALSE) #with warnings
fit

## Not run: 
#passing a file path to the argument `rawld=`
fit<-CAM("CGF_50.rawld",0.3)

## End(Not run)