list_diploid_params: Collect essential data values before mixture proportion...

View source: R/data_conversion.R

list_diploid_params    R Documentation

Collect essential data values before mixture proportion estimation

Description

Takes all relevant information created in previous steps of the data conversion pipeline and combines it into a single list, which serves as input for further calculations.

Usage

list_diploid_params(
  AC_list,
  I_list,
  PO,
  coll_N,
  RU_vec,
  RU_starts,
  alle_freq_prior = list(const_scaled = 1)
)

Arguments

AC_list

a list of allele count matrices; output from a_freq_list

I_list

a list of genotype vectors; output from allelic_list

PO

a vector of collection (population of origin) indices for every individual in the sample, in order identical to I_list

coll_N

a vector of the total number of individuals in each collection, in order of appearance in the dataset

RU_vec

a vector of collection indices, sorted by reporting unit

RU_starts

a vector of indices, designating the first collection for each reporting unit in RU_vec

alle_freq_prior

a one-element named list specifying the prior to be used when generating Dirichlet parameters for genotype likelihood calculations. The name of the list item determines the type of prior used, with options "const", "const_scaled" (the name used by the default shown in Usage), and "empirical". If "const", the listed number is taken as a constant added to the count for each allele, locus, and collection. If "const_scaled", the listed number is divided by the number of alleles at a locus, then added to the allele counts. If "empirical", the listed number is multiplied by the relative frequency of each allele across all populations, then added to the allele counts.
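To make the three prior types concrete, the following sketch computes the pseudocounts each one would add to a small allele count matrix. It is an illustration under stated assumptions, not the package's implementation: prior_pseudocounts is a hypothetical helper, the counts are invented, the matrix is assumed to hold alleles in rows and collections in columns, and the scaled option is spelled const_scaled as in the default shown under Usage.

```r
# Hypothetical allele counts at one locus: 3 alleles (rows) x 2 collections (columns).
ac <- matrix(c(4, 0, 6,
               1, 3, 6),
             nrow = 3,
             dimnames = list(c("a1", "a2", "a3"), c("collA", "collB")))

# Hypothetical helper: returns one pseudocount per allele for a given prior.
prior_pseudocounts <- function(ac, alle_freq_prior) {
  type  <- names(alle_freq_prior)
  value <- alle_freq_prior[[1]]
  switch(type,
    const        = rep(value, nrow(ac)),             # same constant for every allele
    const_scaled = rep(value / nrow(ac), nrow(ac)),  # constant divided by allele count
    empirical    = value * rowSums(ac) / sum(ac),    # weighted by pooled allele frequency
    stop("unknown prior type: ", type)
  )
}

# Adding a length-nrow vector to the matrix applies pseudocount i to row (allele) i
# in every collection, giving Dirichlet parameters for this locus.
dp_const     <- ac + prior_pseudocounts(ac, list(const = 1))
dp_scaled    <- ac + prior_pseudocounts(ac, list(const_scaled = 1))
dp_empirical <- ac + prior_pseudocounts(ac, list(empirical = 1))
```

With const_scaled = 1, the pseudocounts at a locus sum to 1 regardless of how many alleles it has, whereas const = 1 adds more total prior weight at loci with more alleles.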

Details

Genotypes represented in I_list are converted into a single long vector, ordered by locus, individual, and gene copy, with NA values represented as 0s. Similarly, AC_list is unlisted to AC, ordered by locus, collection, and allele. DP is a list of Dirichlet priors for likelihood calculations, created by adding the values calculated from alle_freq_prior to each allele count. sum_AC and sum_DP are the summed allele values for each locus of their parent vectors, ordered by locus and collection.
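The flattening described above can be sketched in base R as follows. This is a toy reconstruction, not the function's source: the data are invented, each locus in I_list is assumed to be a list holding the two gene copies as vectors a and b, each matrix in AC_list is assumed to have alleles in rows and collections in columns, and the prior shown is the const_scaled default from Usage.

```r
# Hypothetical inputs: 2 loci, 3 individuals (the first two in collection 1,
# the third in collection 2). The third individual is missing at loc1.
I_list <- list(
  loc1 = list(a = c(1L, 2L, NA), b = c(2L, 2L, NA)),
  loc2 = list(a = c(1L, 1L, 3L), b = c(3L, 2L, 3L))
)
AC_list <- list(
  loc1 = matrix(c(1, 3, 0, 0), nrow = 2),        # 2 alleles x 2 collections
  loc2 = matrix(c(2, 1, 1, 0, 0, 2), nrow = 3)   # 3 alleles x 2 collections
)

# Interleave the two gene copies so the order is locus, individual, gene copy.
I <- unlist(lapply(I_list, function(g) as.vector(rbind(g$a, g$b))),
            use.names = FALSE)
I[is.na(I)] <- 0L   # missing genotypes are stored as 0

# Column-major unlisting gives the order locus, collection, allele.
AC <- unlist(AC_list, use.names = FALSE)

# Per-locus, per-collection totals, ordered by locus and collection.
sum_AC <- unlist(lapply(AC_list, colSums), use.names = FALSE)

# Dirichlet parameters under a const_scaled = 1 prior, then the same flattening.
DP_list <- lapply(AC_list, function(x) x + 1 / nrow(x))
DP      <- unlist(DP_list, use.names = FALSE)
sum_DP  <- unlist(lapply(DP_list, colSums), use.names = FALSE)
```

In this toy case every entry of sum_DP exceeds the matching entry of sum_AC by exactly 1, because the scaled prior contributes a total of 1 per locus and collection.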

Value

list_diploid_params returns a list of the information necessary for the calculation of genotype likelihoods in MCMC:

L, N, and C represent the number of loci, individual genotypes, and collections, respectively. A is a vector of the number of alleles at each locus, and CA is the cumulative sum of A. coll, coll_N, RU_vec, and RU_starts are copied directly from input.

I, AC, sum_AC, DP, and sum_DP are vectorized versions of data previously represented as lists and matrices; indexing macros use L, N, C, A, and CA to access these vectors in later Rcpp-based calculations.
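As a rough illustration of how L, N, C, A, and CA allow lookups in a flattened vector, the sketch below derives them from toy inputs and indexes into AC. The Rcpp indexing macros themselves are not shown on this page, so ac_index is a hypothetical, 1-based R analogue that only assumes the locus, collection, allele ordering stated in Details; the data are invented.

```r
# Hypothetical inputs: 2 loci, 3 individuals, 2 collections.
AC_list <- list(
  loc1 = matrix(c(1, 3, 0, 0), nrow = 2),        # 2 alleles x 2 collections
  loc2 = matrix(c(2, 1, 1, 0, 0, 2), nrow = 3)   # 3 alleles x 2 collections
)
PO     <- c(1L, 1L, 2L)
coll_N <- as.vector(table(PO))

L  <- length(AC_list)                             # number of loci
N  <- length(PO)                                  # number of individual genotypes
C  <- length(coll_N)                              # number of collections
A  <- unname(vapply(AC_list, nrow, integer(1)))   # alleles per locus
CA <- cumsum(A)                                   # cumulative allele counts

AC <- unlist(AC_list, use.names = FALSE)          # ordered by locus, collection, allele

# Hypothetical 1-based lookup: skip all earlier loci (C * alleles before locus l),
# then all earlier collections within the locus, then step to the allele.
ac_index <- function(l, coll, a) {
  C * (CA[l] - A[l]) + (coll - 1) * A[l] + a
}

AC[ac_index(2, 2, 3)]   # count of allele 3 at locus 2 in collection 2
```

Storing CA alongside A avoids recomputing the offset of each locus on every lookup, which matters when the lookup sits inside an MCMC loop.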

Examples

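library(rubias)
library(dplyr)  # supplies the %>% pipe used below

# Run the allelic_list example, which defines the ale_* objects used below
example(allelic_list)

# Collection index for each individual, and the number of individuals per collection
PO <- as.integer(factor(ale_long$clean_short$collection))
coll_N <- as.vector(table(PO))

# One row per (reporting unit, collection) pair present in the data
Colls_by_RU <- dplyr::count(ale_long$clean_short, repunit, collection) %>%
  dplyr::filter(n > 0) %>%
  dplyr::select(-n)

# PC counts the collections in each reporting unit
PC <- rep(0, length(unique(Colls_by_RU$repunit)))
for (i in seq_len(nrow(Colls_by_RU))) {
  PC[Colls_by_RU$repunit[i]] <- PC[Colls_by_RU$repunit[i]] + 1
}

# Offsets into RU_vec where each reporting unit begins, starting at 0
RU_starts <- c(0, cumsum(PC))
RU_vec <- as.integer(Colls_by_RU$collection)
param_list <- list_diploid_params(ale_ac, ale_alle_list, PO, coll_N, RU_vec, RU_starts)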
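The relationship between RU_vec and RU_starts built in the example can be checked on a small case without the package. The sketch below uses an invented table of reporting units and collections, and collections_in_ru is a hypothetical helper; it assumes, as in the example, that RU_starts holds offsets beginning at 0 with the total number of collections as its last element.

```r
# Hypothetical table: 2 reporting units and 3 collections, sorted by reporting unit.
colls_by_ru <- data.frame(repunit    = c(1L, 1L, 2L),
                          collection = c(2L, 3L, 1L))

# Number of collections in each reporting unit (same result as the loop over PC).
PC <- tabulate(colls_by_ru$repunit, nbins = max(colls_by_ru$repunit))

RU_starts <- c(0, cumsum(PC))       # offsets into RU_vec, starting at 0
RU_vec    <- colls_by_ru$collection # collection indices, sorted by reporting unit

# Hypothetical helper: the collections belonging to reporting unit r.
collections_in_ru <- function(r) {
  RU_vec[RU_starts[r] + seq_len(RU_starts[r + 1] - RU_starts[r])]
}

collections_in_ru(1)
collections_in_ru(2)
```

Using seq_len on the difference of consecutive offsets keeps the helper safe for a reporting unit with no collections, where a start:end range would run backwards.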


benmoran11/rubias documentation built on Feb. 1, 2024, 10:52 p.m.