tcf2long: Convert Two-Column Genetic Data to Long Format


View source: R/data_conversion.R

Description

Takes a data frame consisting of metadata followed by paired columns of genetic data, with each column in a pair representing one gene copy at a locus. Returns a list of two data frames: one with the genetic data condensed into a single column, and the other with the two-column structure intact but with cleaned column names.

Usage

tcf2long(D, gen_start_col)

Arguments

D

A data frame containing two-column genetic data, optionally preceded by metadata. The header of the first genetic data column in each pair gives the locus name; the header of the second is ignored. Locus names must not contain spaces.

gen_start_col

The index (number) of the column in which the genetic data start. Every column from that one onward must contain only genetic data.
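
One way to avoid counting columns by hand is to look up the first genetic column by name and check that the remaining columns pair up. The sketch below uses a hypothetical toy data frame; the column names indiv, collection, Loc1 and Loc1.1 are illustrative assumptions, not part of rubias.

```r
# Hypothetical toy input: two metadata columns, then one locus (two columns).
toy <- data.frame(
  indiv = c("fish1", "fish2"),
  collection = c("A", "B"),
  Loc1 = c(101, 103),
  Loc1.1 = c(105, 103)
)

# Look up the first genetic column by name rather than counting by hand.
gen_start_col <- match("Loc1", names(toy))
stopifnot(!is.na(gen_start_col))

# Genetic columns come in pairs, so their count should be even.
n_gen_cols <- ncol(toy) - gen_start_col + 1
stopifnot(n_gen_cols %% 2 == 0)
```

The resulting gen_start_col can then be passed as the second argument to tcf2long.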

Value

tcf2long returns a list of two data frames. In the first, "long", the rightmost column holds the genetic data. Two new columns are added: "locus" holds the locus name taken from the header of the first column of each pair, and "gene copy" labels the two copies "a" and "b". Metadata is duplicated as necessary for each locus. The second, "clean_short", replicates the input dataset, but with the column names in each pair replaced by "(locus name) a" and "(locus name) b"; that is, the locus name followed by a space and then "a" or "b".
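
To make the two output shapes concrete, here is a minimal base R sketch of the same kind of reshaping on a hypothetical toy data frame. It is not the rubias implementation. The toy column names, the output column names gene_copy and allele, and the row order are assumptions made for illustration.

```r
# Hypothetical toy input: two metadata columns, then two loci (two columns each).
toy <- data.frame(
  indiv = c("fish1", "fish2"),
  collection = c("A", "B"),
  Loc1 = c(101, 103),
  X = c(105, 103),
  Loc2 = c(200, 204),
  X.1 = c(202, NA)
)
gen_start_col <- 3

# Locus names come from the header of the first column in each pair.
gen_cols <- seq(gen_start_col, ncol(toy))
loci <- names(toy)[gen_cols[c(TRUE, FALSE)]]

# "clean_short"-style frame: same data, headers become "<locus> a" / "<locus> b".
clean_short <- toy
names(clean_short)[gen_cols] <- paste(rep(loci, each = 2), c("a", "b"))

# "long"-style frame: one row per individual, locus and gene copy,
# with the metadata repeated and the allele in the rightmost column.
meta <- toy[seq_len(gen_start_col - 1)]
long <- do.call(rbind, lapply(gen_cols, function(j) {
  k <- j - gen_start_col  # zero-based offset into the genetic columns
  data.frame(
    meta,
    locus = loci[k %/% 2 + 1],
    gene_copy = c("a", "b")[k %% 2 + 1],
    allele = toy[[j]],
    stringsAsFactors = FALSE
  )
}))
```

Here two individuals at two loci with two gene copies each yield eight rows in long, while clean_short keeps the original two rows and six columns.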

Examples

## Convert the alewife dataset for further processing
library(rubias)  # provides tcf2long() and the alewife dataset
# The data frame passed to this function must already have its
# character collection and repunit columns converted to factors
reference <- alewife
reference$repunit <- factor(reference$repunit, levels = unique(reference$repunit))
reference$collection <- factor(reference$collection, levels = unique(reference$collection))
ale_long <- tcf2long(reference, 17)

rubias documentation built on Feb. 10, 2022, 1:06 a.m.