freqDatabase: Allele frequency database

freqDatabase    R Documentation

Allele frequency database

Description

Functions for reading, writing, setting and extracting allele frequency databases in "list", "merlin" or "allelic ladder" format.

Usage

getFreqDatabase(x, markers = NULL, format = c("list", "ladder"))

setFreqDatabase(x, database, format = c("list", "ladder"), ...)

readFreqDatabase(
  filename = NULL,
  df = NULL,
  format = c("list", "ladder", "merlin"),
  fixNames = FALSE,
  scale1 = FALSE,
  verbose = TRUE,
  ...
)

writeFreqDatabase(x, filename, markers = NULL, format = c("list", "ladder"))

Arguments

x

A ped object, or a list of such.

markers

A character vector (with marker names) or a numeric vector (with marker indices).

format

One of "list", "ladder" or "merlin" (the latter only in readFreqDatabase()).

database

Either a list or matrix/data frame with allele frequencies, or a file path (to be passed on to readFreqDatabase()).

...

Optional arguments passed on to read.table(), e.g. sep = "\t" if the file is tab separated.

filename

The path to a text file containing allele frequencies either in "list" or "allelic ladder" format.

df

A data frame of allele frequencies in either "list" or "allelic ladder" format. This can be supplied instead of filename.

fixNames

A logical, by default FALSE. If TRUE all marker names are converted to upper case, and all periods and space characters are replaced with "_" (underscore).

scale1

A logical, by default FALSE. If TRUE, each frequency vector is scaled so that it sums to 1.

verbose

A logical.
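The effect of fixNames and scale1, as described above, can be sketched in base R. This only mimics the documented behaviour and is not the package's internal code; the marker names and raw counts are made up for illustration.

```r
# Hypothetical marker names containing lower case, a space and a period
markerNames <- c("d3s1358", "Penta D", "vWA.1")

# fixNames = TRUE (as documented): upper case, periods and spaces become "_"
fixed <- toupper(gsub("[. ]", "_", markerNames))
fixed

# scale1 = TRUE (as documented): rescale a frequency vector to sum to 1
fr <- c(a1 = 2, a2 = 5, a3 = 3)
scaled <- fr / sum(fr)
scaled
```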

Details

A frequency database in "list" format is a list of numeric vectors; each vector named with the allele labels, and the list itself named with the marker names.
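As a minimal sketch, such a list can be built directly in base R. The marker names and allele labels below are illustrative only.

```r
# A database in "list" format: a named list of named numeric vectors
db <- list(
  M1 = c(a1 = 0.2, a2 = 0.5, a3 = 0.3),
  M2 = c(a1 = 0.9, a2 = 0.1)
)

names(db)        # marker names
names(db$M1)     # allele labels of marker M1 (character)
sapply(db, sum)  # each frequency vector should sum to 1
```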

Text files containing frequencies in "list" format should look as follows, where "M1" and "M2" are marker names, and "a1", "a2", ... are allele labels (which may be character or numeric, but are always converted to character):

M1
a1 0.2
a2 0.5
a3 0.3

M2
a1 0.9
a2 0.1
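
A file laid out like this can be read with readFreqDatabase(). The following is a small sketch, assuming the pedtools package is installed; the temporary file is created only for the demonstration.

```r
library(pedtools)

# Write the two-marker example to a temporary file (blank line between markers)
tmp <- tempfile(fileext = ".txt")
writeLines(c("M1", "a1 0.2", "a2 0.5", "a3 0.3", "", "M2", "a1 0.9", "a2 0.1"), tmp)

# Parse the file; the result is a list of named numeric vectors
db <- readFreqDatabase(tmp, format = "list", verbose = FALSE)
str(db)
```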

In "merlin" format, used by the software MERLIN (Abecasis et al., 2002), the same frequency data would be presented as follows:

M M1
A a1 0.2
A a2 0.5
A a3 0.3
M M2
A a1 0.9
A a2 0.1
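
The correspondence between the two layouts can be illustrated with a short base R sketch. The helper toMerlin() is hypothetical (it is not part of pedtools) and simply prints a "list" format database as lines in the layout shown above.

```r
db <- list(
  M1 = c(a1 = 0.2, a2 = 0.5, a3 = 0.3),
  M2 = c(a1 = 0.9, a2 = 0.1)
)

# Hypothetical helper: one "M <marker>" line, then one "A <allele> <freq>" line per allele
toMerlin <- function(db) {
  unlist(lapply(names(db), function(m) {
    fr <- db[[m]]
    c(paste("M", m), paste("A", names(fr), fr))
  }))
}

writeLines(toMerlin(db))
```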

A database in "allelic ladder" format is rectangular, i.e., a numeric matrix (or data frame), with allele labels as row names and markers as column names. NA entries correspond to unobserved alleles.
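
For comparison, here is the same example data as an allelic ladder in base R, with a hypothetical helper ladderToList() (not part of pedtools) that recovers the "list" format by dropping the NA entries.

```r
# Rows are alleles, columns are markers; NA marks an allele not observed for a marker
ladder <- matrix(
  c(0.2, 0.9,
    0.5, 0.1,
    0.3, NA),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("a1", "a2", "a3"), c("M1", "M2"))
)

# Hypothetical helper: one named frequency vector per column, NA entries removed
ladderToList <- function(ladder) {
  lapply(as.data.frame(ladder), function(fr) {
    names(fr) <- rownames(ladder)
    fr[!is.na(fr)]
  })
}

ladderToList(ladder)
```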

Value

  • getFreqDatabase: either a list (if format = "list") or a data frame (if format = "ladder").

  • readFreqDatabase: a list of named numeric vectors.

  • setFreqDatabase: a modified version of x.

See Also

setLocusAttributes(), setMarkers(), setAlleles().

Examples

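# Two markers with known allele frequencies, attached to a singleton
loc1 = list(name = "m1", afreq = c(a = .1, b = .9))
loc2 = list(name = "m2", afreq = c("1" = .2, "10.2" = .3, "3" = .5))
x = setMarkers(singleton(1), locus = list(loc1, loc2))

# Extract the database in "list" format
db = getFreqDatabase(x)
db

# Setting the same database again leaves x unchanged
y = setFreqDatabase(x, database = db)
stopifnot(identical(x, y))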

# The database can also be read directly from file
tmp = tempfile()
write("m1\na 0.1\nb 0.9\n\nm2\n1 0.2\n3 0.5\n10.2 0.3", tmp)

z = setFreqDatabase(x, database = tmp)
stopifnot(all.equal(x, z))
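
The examples above do not exercise the "ladder" format or writeFreqDatabase(). The following sketch, assuming the pedtools package is installed and using only the signatures listed under Usage, extracts the database as an allelic ladder and writes it to a temporary file in "list" format.

```r
library(pedtools)

loc1 <- list(name = "m1", afreq = c(a = 0.1, b = 0.9))
loc2 <- list(name = "m2", afreq = c("1" = 0.2, "10.2" = 0.3, "3" = 0.5))
x <- setMarkers(singleton(1), locus = list(loc1, loc2))

# Rectangular view: alleles in rows, markers in columns
lad <- getFreqDatabase(x, format = "ladder")
lad

# Write the database to file in "list" format, then read it back
out <- tempfile(fileext = ".txt")
writeFreqDatabase(x, filename = out, format = "list")
db2 <- readFreqDatabase(out, format = "list", verbose = FALSE)
names(db2)
```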


magnusdv/pedtools documentation built on April 9, 2024, 7:35 a.m.