kasp_consensus: Generate consensus calls for KASP marker data

kasp_consensus    R Documentation

Generate consensus calls for KASP marker data

Description

Generate consensus calls for KASP marker data

Usage

kasp_consensus(
  kasp,
  geno_col,
  data_cols,
  ambiguous = "strict",
  out_format = "KASP"
)

Arguments

kasp

A data frame containing KASP-formatted marker data. It should contain at least one column of genotype names, plus marker columns in which "X:X" and "Y:Y" denote the two homozygous states, "X:Y" or "Y:X" the heterozygous state, and "NO CALL" or "NOCALL" missing data.

geno_col

String identifying which column in kasp contains genotype names

data_cols

Character vector identifying which columns in kasp will be retained and treated as marker data columns

ambiguous

String, either "strict" or "lax". If "strict", uncertain marker calls (those followed by a question mark) are converted to missing data. If "lax", uncertain calls are converted to their putative call (e.g. "X:X?" becomes "X:X").
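
To illustrate the two settings, here is a minimal base R sketch. The helper resolve_ambiguous() is hypothetical and not part of bwardr; it only mimics the behaviour described above and is not the package's actual code.

```r
# Hypothetical helper (not a bwardr function): resolve uncertain KASP calls.
resolve_ambiguous <- function(calls, ambiguous = c("strict", "lax")) {
  ambiguous <- match.arg(ambiguous)
  # Uncertain calls are those ending in a question mark, e.g. "X:X?"
  uncertain <- grepl("\\?$", calls)
  if (ambiguous == "strict") {
    calls[uncertain] <- NA_character_   # strict: treat as missing data
  } else {
    calls <- sub("\\?$", "", calls)     # lax: keep the putative call
  }
  calls
}

calls <- c("X:X", "X:X?", "Y:Y", "X:Y?")
resolve_ambiguous(calls, "strict")  # "X:X" NA    "Y:Y" NA
resolve_ambiguous(calls, "lax")     # "X:X" "X:X" "Y:Y" "X:Y"
```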

out_format

String, one of "KASP", "VCF", or "numeric". If "KASP", output is in the same format as the input. If "VCF", output is encoded as "0/0" and "1/1" for the homozygous states, "0/1" for the heterozygous state, and "./." for missing data. If "numeric", output is encoded as -1 and 1 for the homozygous states, 0 for the heterozygous state, and NA for missing data.
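
The three encodings can be compared side by side with a small base R sketch. The helper encode_calls() is hypothetical and not part of bwardr. The documentation does not say which homozygote maps to which code, so the sketch assumes that "X:X" maps to "0/0" and -1, and "Y:Y" to "1/1" and 1.

```r
# Hypothetical helper (not a bwardr function): encode KASP calls.
# ASSUMPTION: "X:X" -> "0/0" / -1 and "Y:Y" -> "1/1" / 1; the docs do not
# state which homozygous state receives which code.
encode_calls <- function(calls, out_format = c("KASP", "VCF", "numeric")) {
  out_format <- match.arg(out_format)
  calls[calls %in% "Y:X"] <- "X:Y"             # both orders are heterozygous
  idx <- match(calls, c("X:X", "X:Y", "Y:Y"))  # NA for anything else
  switch(out_format,
    KASP    = ifelse(is.na(idx), "NO CALL", c("X:X", "X:Y", "Y:Y")[idx]),
    VCF     = ifelse(is.na(idx), "./.", c("0/0", "0/1", "1/1")[idx]),
    numeric = c(-1, 0, 1)[idx]
  )
}

calls <- c("X:X", "Y:X", "Y:Y", "NO CALL")
encode_calls(calls, "VCF")      # "0/0" "0/1" "1/1" "./."
encode_calls(calls, "numeric")  # -1  0  1 NA
```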

Details

kasp_consensus generates consensus calls for KASP marker data when some lines have been genotyped more than once. It uses a simple algorithm: marker calls are first converted to numeric format and then summed within each genotype. Note that it cannot distinguish randomly missing data from true null alleles. The input data must be reasonably well formatted. The function handles several variants of the standard calls (e.g. "x:x", "X:X", "NO CALL", "nocall") as well as uncertain calls (e.g. "X:X?"). Any other free-form entries are converted to missing data, with a warning that values were converted to NA by coercion.
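
The convert-then-sum idea can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the documentation does not say how a sum is turned back into a call, so the sketch assumes the sign of the sum is used, and it assumes the coding "X:X" = -1, "X:Y" = 0, "Y:Y" = 1. The toy data frame and the helper consensus_numeric() are hypothetical.

```r
# ASSUMPTION: the consensus is the sign of the summed numeric calls.
# The docs only state that calls are converted to numeric and summed.
consensus_numeric <- function(x) {
  if (all(is.na(x))) return(NA_real_)  # no usable calls for this genotype
  sign(sum(x, na.rm = TRUE))
}

# Toy data: line A and line B were each genotyped more than once.
kasp <- data.frame(
  line = c("A", "A", "A", "B", "B", "C"),
  m1   = c("X:X", "X:X", "NO CALL", "Y:Y", "X:Y", "NO CALL"),
  stringsAsFactors = FALSE
)

# Assumed numeric coding; unrecognized entries such as "NO CALL" become NA.
codes <- c("X:X" = -1, "X:Y" = 0, "Y:Y" = 1)
kasp$m1_num <- unname(codes[kasp$m1])

# Sum within each genotype and reduce to a single consensus value.
consensus <- tapply(kasp$m1_num, kasp$line, consensus_numeric)
consensus  # A = -1, B = 1, C = NA
```

Under this assumed sign rule, conflicting homozygous calls ("X:X" and "Y:Y") for the same line cancel to 0 and would be reported as heterozygous; whether kasp_consensus behaves this way is not stated in the documentation.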

Value

Data frame with consensus calling performed, encoded as specified by out_format


etnite/bwardr documentation built on Jan. 6, 2023, 7:12 a.m.