tcf2param_list: Generate MCMC parameter list from two-column genetic data &...

View source: R/data_conversion.R

tcf2param_list    R Documentation

Generate MCMC parameter list from two-column genetic data & print summary

Description

This function wraps all the steps needed to convert two-column genetic data into the parameter list required for genotype log-likelihood calculation.

Usage

tcf2param_list(
  D,
  gen_start_col,
  samp_type = "both",
  alle_freq_prior = list(const_scaled = 1),
  summ = T,
  ploidies
)

Arguments

D

A data frame containing two-column genetic data, preceded by metadata. The header of the first genetic data column in each pair gives the locus name; the header of the second is ignored. Locus names must not contain spaces. Required metadata includes a column of unique individual identifiers named "indiv", a column named "collection" designating the sample groups, a column "repunit" designating the reporting unit of origin of each fish, and a "sample_type" column denoting each individual as a "reference" or "mixture" sample. No NAs should be present in the metadata.

gen_start_col

The index (number) of the column in which genetic data starts. Every column from this one onward must contain only genetic data.

samp_type

The sample groups to be included in the individual genotype list, whose likelihoods will be used in MCMC. Options are "reference", "mixture", and "both".

alle_freq_prior

A one-element named list specifying the prior to be used when generating Dirichlet parameters for genotype likelihood calculations. Valid methods include "const", "const_scaled" (the name used by the default shown under Usage), and "empirical". See ?list_diploid_params for method details.

summ

Logical indicating whether summary descriptions of the formatted data should be printed.

ploidies

A named vector of ploidies (1 or 2) for each locus. The names must be the locus names.
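
As an illustration of the layout these arguments describe, the base R sketch below builds a small made-up data frame and derives gen_start_col and a named ploidies vector from it. The object name toy, the locus names LocA and LocB, and all values are hypothetical and not part of the package. The sketch assumes every locus is diploid and does not call tcf2param_list itself.

```r
# Hypothetical toy data: four metadata columns followed by two loci,
# each locus stored as a pair of allele columns.
toy <- data.frame(
  sample_type = c("reference", "reference", "reference"),
  repunit     = c("North", "North", "South"),
  collection  = c("RiverA", "RiverA", "RiverB"),
  indiv       = c("fish1", "fish2", "fish3"),
  LocA        = c(101L, 103L, 101L),
  LocA.1      = c(103L, 103L, NA),    # missing genotype data is allowed
  LocB        = c(7L, 7L, 9L),
  LocB.1      = c(9L, 7L, 9L),
  stringsAsFactors = FALSE
)

# Genetic data starts at the first locus column.
gen_start_col <- which(names(toy) == "LocA")

# The locus name is the header of the first column in each pair.
loci <- names(toy)[seq(gen_start_col, ncol(toy), by = 2)]

# Named ploidies vector; every locus is assumed diploid in this sketch.
ploidies <- setNames(rep(2, length(loci)), loci)
```

With these objects in hand, the call would presumably take the form tcf2param_list(toy, gen_start_col, ploidies = ploidies), mirroring the example under Examples.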

Details

In order for all steps in conversion to be carried out successfully, the dataset must have "repunit", "collection", "indiv", and "sample_type" columns preceding two-column genetic data. If summ == TRUE, the function prints summary statistics describing the structure of the dataset, as well as the presence of missing data, enabling verification of proper data conversion.
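
The requirements above can be checked before conversion with a short pre-flight function. The following is a minimal sketch in base R. The helper check_tcf_layout and the object demo are hypothetical names introduced here, not part of the package, and the checks reflect only the requirements stated on this page.

```r
# Hypothetical helper, not part of the package: verifies the layout
# described above before the data are handed to tcf2param_list().
check_tcf_layout <- function(D, gen_start_col) {
  required <- c("repunit", "collection", "indiv", "sample_type")

  # Metadata are all columns before the first genetic data column.
  meta <- D[, seq_len(gen_start_col - 1), drop = FALSE]
  missing_cols <- setdiff(required, names(meta))
  if (length(missing_cols) > 0) {
    stop("missing metadata column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (any(is.na(meta[, required, drop = FALSE]))) {
    stop("NA found in required metadata")
  }
  if (anyDuplicated(meta$indiv) > 0) {
    stop("'indiv' values are not unique")
  }

  # Genetic columns must come in pairs, one pair per locus.
  n_gen <- ncol(D) - gen_start_col + 1
  if (n_gen < 2 || n_gen %% 2 != 0) {
    stop("genetic columns do not come in pairs")
  }
  loci <- names(D)[seq(gen_start_col, ncol(D), by = 2)]
  if (any(grepl(" ", loci, fixed = TRUE))) {
    stop("locus names contain spaces")
  }
  invisible(TRUE)
}

# Made-up data with one locus stored in two columns.
demo <- data.frame(
  sample_type = c("reference", "reference"),
  repunit     = c("North", "South"),
  collection  = c("RiverA", "RiverB"),
  indiv       = c("fish1", "fish2"),
  LocA        = c(101L, 103L),
  LocA.1      = c(103L, 103L),
  stringsAsFactors = FALSE
)

ok <- check_tcf_layout(demo, 5)

# Dropping the sample_type column makes the check fail.
bad <- try(check_tcf_layout(demo[, -1], 4), silent = TRUE)
```

Stopping early with a specific message is a design choice of this sketch; the package may report layout problems differently.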

Value

tcf2param_list returns the output of list_diploid_params, called after the original dataset has been converted to a usable format and all relevant values have been extracted. See ?list_diploid_params for details.

Examples

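# Since support for haploid markers was added, a ploidies vector must be
# passed in. These markers are all diploid.

# Genetic data starts in column 17, so drop the 16 metadata columns and
# keep the header of the first column in each pair as the locus name.
locnames <- names(alewife)[-(1:16)][c(TRUE, FALSE)]

# One ploidy value (2 = diploid) per locus, named by locus.
ploidies <- rep(2, length(locnames))
names(ploidies) <- locnames

ale_par_list <- tcf2param_list(alewife, 17, ploidies = ploidies)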


benmoran11/rubias documentation built on Feb. 1, 2024, 10:52 p.m.