While coala primarily focuses on simulating data, it can also calculate summary statistics from real data. This is particularly useful when comparing the statistics of real and simulated data.
Rather than offering functions to import data directly, coala can convert the internal formats of other R packages into its own format. Currently, the PopGenome package is supported, but we plan to support ape and pegas in the future.
PopGenome provides functions for reading various data formats, including vcf and fasta. Please refer to its documentation for detailed instructions. As an example, we will read sequence data from a short fasta file that is included in coala:
suppressPackageStartupMessages(library(PopGenome))
# readData() reads every file in the given folder
fasta <- system.file("example_fasta_files", package = "coala")
data_pg <- readData(fasta, progress_bar_switch = FALSE)
# Mark the two outgroup sequences
data_pg <- set.outgroup(data_pg, c("Individual_Out-1", "Individual_Out-2"))
# Assign five individuals to each of the two populations
individuals <- list(paste0("Individual_1-", 1:5), paste0("Individual_2-", 1:5))
data_pg <- set.populations(data_pg, individuals)
Here, the sequences originate from two populations and an outgroup. The outgroup is required for most summary statistics.
We can now convert data_pg using the as.segsites function:
library(coala)
segsites <- as.segsites(data_pg)
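For data that does not come from PopGenome, a segsites object can in principle also be assembled by hand. The following is a minimal sketch, assuming that coala's exported helpers create_segsites, is_segsites, get_snps and get_positions behave as described in their help pages. The SNP matrix and the positions are made-up illustration values, not real data.

```r
library(coala)

# Made-up example: 3 haplotypes (rows) x 3 SNPs (columns); 1 = derived allele.
snps <- matrix(c(1, 0, 0,
                 1, 1, 0,
                 0, 0, 1), nrow = 3, ncol = 3, byrow = TRUE)

# Relative SNP positions on the locus, between 0 and 1 (also made up).
positions <- c(0.1, 0.3, 0.5)

# Build the segsites object for a single locus.
segsites_manual <- create_segsites(snps, positions)

is_segsites(segsites_manual)    # check that the object has the expected class
get_positions(segsites_manual)  # the positions passed in above
get_snps(segsites_manual)       # the SNP matrix
```

Note that calc_sumstats_from_data works on a list with one segsites object per locus, so a single object like this one would be wrapped in list() before being passed on.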
Next, we calculate summary statistics using calc_sumstats_from_data:
model <- coal_model(c(5, 5, 2), 1, 25) +
  feat_mutation(5) +
  feat_outgroup(3) +
  sumstat_sfs(population = 1)
stats <- calc_sumstats_from_data(model, segsites)
stats$sfs
Alternatively, it is also possible to pass the data_pg object directly to calc_sumstats_from_data:
stats <- calc_sumstats_from_data(model, data_pg)
stats$sfs
Please refer to the help pages for as.segsites and calc_sumstats_from_data for additional information.
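To illustrate the comparison of real and simulated data mentioned at the start, the following sketch calculates the same statistic once from data and once from a simulation of the same model. It is deliberately simplified and rests on several assumptions: a single population of three haplotypes without an outgroup, a hand-made segsites object standing in for imported data, and an arbitrary seed. None of these values come from the example fasta files.

```r
library(coala)

# Stand-in for imported data (made-up values): 3 haplotypes x 3 SNPs.
# With real data this object would come from as.segsites().
observed <- create_segsites(
  matrix(c(1, 0, 0,
           1, 1, 0,
           0, 0, 1), nrow = 3, ncol = 3, byrow = TRUE),
  c(0.1, 0.3, 0.5)
)

# One population, three haplotypes, one locus; the SFS is the only statistic.
model <- coal_model(sample_size = 3, loci_number = 1) +
  feat_mutation(5) +
  sumstat_sfs()

# Same model, two sources: the data and one simulated data set.
stats_obs <- calc_sumstats_from_data(model, list(observed))
stats_sim <- simulate(model, seed = 1234)

# Put both site frequency spectra side by side.
rbind(observed = stats_obs$sfs, simulated = stats_sim$sfs)
```

Because both calls use the same model object, the statistics are calculated with identical settings, which keeps the two rows directly comparable.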