# Set default chunk options for this R Markdown document
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 5,
  fig.height = 5
)
# Limit tibble printing so that at most 5 extra columns are listed
options(tibble.max_extra_cols = 5)

library(ggplot2)
library(coiaf)

Data structure

The algorithms in this package require an input data set containing the population-level minor allele frequency (PLMAF), the within-sample minor allele frequency (WSMAF), and the within-sample coverage across a set of loci. Although the algorithms operate on minor allele frequencies, users may instead supply the population-level and within-sample frequencies of the reference allele; the package has built-in capabilities to convert these values to minor allele frequencies.
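To make the relationship between reference and minor allele frequencies concrete, here is a minimal sketch in base R. It assumes biallelic loci, so that whenever the population-level frequency of the reference allele exceeds 0.5 the other allele is the minor one and both frequencies are replaced by their complements. The helper to_minor() is hypothetical and is not part of coiaf; the package's own conversion may differ in its details.

```r
# Hypothetical helper for illustration only; not a coiaf function.
# Assumes biallelic loci and frequencies of the reference allele as input.
to_minor <- function(wsaf, plaf) {
  # The reference allele is the major allele where its population
  # frequency exceeds 0.5, so those loci must be flipped
  flip <- !is.na(plaf) & plaf > 0.5
  data.frame(
    wsmaf = ifelse(flip, 1 - wsaf, wsaf),
    plmaf = ifelse(flip, 1 - plaf, plaf)
  )
}

# Three made-up loci: the second and third are flipped
to_minor(wsaf = c(0.1, 0.9, 1.0), plaf = c(0.2, 0.7, 0.95))
```

After the conversion, every value in plmaf is at most 0.5, which is what the term "minor allele" implies.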

The example real data set included with this package is a matrix of within-sample allele frequencies (WSAFs), with one row per sample and one column per locus. Its first 5 rows and 3 columns are shown below:

print(example_real_data[1:5, 1:3])

Given this information, we may determine the population-level allele frequency (PLAF) at each locus by averaging the WSAF across all samples, as follows:

plaf <- colMeans(example_real_data, na.rm = TRUE)
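As a small illustration of what this averaging does, the sketch below applies colMeans() to a made-up matrix with two samples and three loci. The matrix toy_wsaf and its values are placeholders invented for this example and are not taken from example_real_data. With na.rm = TRUE, a missing value is skipped, so the average at that locus uses only the samples that were observed.

```r
# Made-up WSAF matrix: rows are samples, columns are loci
toy_wsaf <- matrix(
  c(0.0, 0.2, NA,
    1.0, 0.4, 0.6),
  nrow = 2,
  byrow = TRUE,
  dimnames = list(
    c("sample_1", "sample_2"),
    c("locus_1", "locus_2", "locus_3")
  )
)

# Average each column (locus) over the samples, ignoring missing values
toy_plaf <- colMeans(toy_wsaf, na.rm = TRUE)
toy_plaf
```

Here locus_3 is averaged over a single sample because the other value is missing; loci with few observed samples therefore yield less reliable PLAF estimates.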

With the WSAF and PLAF, we can generate an input data frame. However, because our algorithms work on a per-sample basis, we must generate a list of input data frames, one per sample:

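input_data <- purrr::map(seq_len(nrow(example_real_data)), function(i) {
  # Row i holds the WSAF of sample i at every locus
  sample_data <- tibble::tibble(wsmaf = example_real_data[i, ], plmaf = plaf)
  # Drop loci where either frequency is missing
  tidyr::drop_na(sample_data)
})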

Estimate the COI

With the input data generated, users can estimate the COI with the compute_coi() or optimize_coi() function, depending on whether a discrete or continuous value of the COI is desired, respectively. Below we illustrate estimating the continuous COI with optimize_coi():

# Estimate the COI of a single sample
optimize_coi(input_data[[1]], data_type = "real")

# Estimate the COI of multiple samples
purrr::map_dbl(input_data, ~ optimize_coi(.x, data_type = "real"))

The estimation functions return the estimated COI. In some cases, they also return additional information.

Data visualization

We recommend the ggplot2 package for plotting results. The R Graph Gallery, a website of example graphs and demos, may provide some inspiration.
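As a starting point, the following sketch draws a histogram of per-sample COI estimates with ggplot2. The data frame coi_df and its values are placeholders made up for this example and are not output from coiaf; in practice the coi column would hold the numeric vector of estimates obtained for your samples.

```r
library(ggplot2)

# Placeholder values for illustration only; these are NOT coiaf results
coi_df <- data.frame(
  sample = paste0("sample_", 1:6),
  coi = c(1.0, 1.2, 2.1, 1.0, 3.4, 2.0)
)

# Histogram of the estimated COI across samples
coi_plot <- ggplot(coi_df, aes(x = coi)) +
  geom_histogram(binwidth = 0.5, boundary = 0.5, colour = "white") +
  labs(x = "Estimated COI", y = "Number of samples") +
  theme_minimal()

coi_plot
```

A histogram suits continuous estimates; for discrete estimates, geom_bar() with the COI treated as a factor may be a more natural choice.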



bailey-lab/coiaf documentation built on April 26, 2023, 6:32 p.m.