read.pedfile: Read a pedfile as a "SnpMatrix" object

View source: R/pedfile.R

Description

Reads diallelic data in linkage "pedfile" format, with one line of data per sample (subject) containing six mandatory fields followed by pairs of fields, one pair for each locus, giving the two alleles observed.

Usage

read.pedfile(file, n, snps, which, split = "\t| +", sep = ".", na.strings = "0", lex.order = FALSE)

Arguments

file

The input pedfile. It may be, but need not be, gzipped.

n

(Optional) The number of lines of data to be read. If not supplied, the pedfile is first read through once to count its lines and is then rewound.

snps

(Optional) Either a character vector giving the names of the loci, or a single character string giving the name of a locus information file from which these can be read. This file is assumed to be white-space delimited, with one line per locus and no header line. If this argument is not supplied, locus names are generated as a numerical sequence, prefixed by "locus" and a separator character (see sep).

which

(Optional) If locus names are to be read from a file, this argument should specify which column contains the names. If not supplied, the first column giving unique locus names is used.

split

A regular expression specifying how each line of the input pedfile is split into fields. The default value matches either a TAB character or one or more spaces.

sep

The separator character used in constructing row and column names of the output SnpMatrix object.

na.strings

One or more strings to be treated as missing. Any field taking one of these values is set to NA.

lex.order

If TRUE, alleles are allocated to the internal values 1 and 2 in lexicographic order. Otherwise (the default) they are allocated in the order in which they are encountered when reading the file.
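
As a rough illustration of how the split, na.strings and lex.order arguments act on one line of input, the following base-R sketch parses a made-up pedfile line. It does not call read.pedfile and is not taken from its source; the example line and all variable names are assumptions chosen for illustration.

```r
# Defaults copied from the Usage section above.
split <- "\t| +"      # a TAB, or one or more spaces
na.strings <- "0"

# A made-up pedfile line: six mandatory fields, then two loci (two alleles each).
line <- "FAM1 1 0 0 1 1\tA  G\t0 0"

# Split the line into fields with the regular expression.
fields <- strsplit(line, split)[[1]]

fam_fields <- fields[1:6]     # the six mandatory fields
alleles <- fields[-(1:6)]     # the remaining allele fields

# Any allele field matching na.strings becomes NA.
alleles[alleles %in% na.strings] <- NA

# One row per locus, one column per allele.
allele_pairs <- matrix(alleles, ncol = 2, byrow = TRUE)
print(allele_pairs)

# The two allele orderings that lex.order chooses between, for alleles
# encountered in the order G, A, G.
seen <- c("G", "A", "G")
encounter_order <- unique(seen)      # order of first appearance
lexical_order <- sort(unique(seen))  # lexicographic order
```

Here the second locus has both alleles coded "0", so both entries in its row are NA.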

Details

Row names for the output SnpMatrix object and for the accompanying subject description dataframe are taken as the pedigree identifiers, when these provide the required unique identifiers. When these are duplicated, an attempt is made to use the pedigree-member identifiers instead but, when these too are duplicated, row names are obtained by concatenating, with a separator character, the pedigree and pedigree-member identifiers.
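
The fallback rule for row names described above can be sketched in base R as follows. The helper choose_row_names is hypothetical and is not part of snpStats; it only mirrors the three-step rule as stated in this section.

```r
# Hypothetical helper (not part of snpStats) mirroring the rule described above.
choose_row_names <- function(pedigree, member, sep = ".") {
  # 1. Use pedigree identifiers if they are unique.
  if (!anyDuplicated(pedigree)) return(as.character(pedigree))
  # 2. Otherwise use pedigree-member identifiers if those are unique.
  if (!anyDuplicated(member)) return(as.character(member))
  # 3. Otherwise concatenate the two, separated by sep.
  paste(pedigree, member, sep = sep)
}

unique_pedigrees <- choose_row_names(c("F1", "F2"), c("1", "1"))
unique_members   <- choose_row_names(c("F1", "F1"), c("1", "2"))
both_duplicated  <- choose_row_names(c("F1", "F1", "F2"), c("1", "2", "1"))
```

In the third call neither field is unique on its own, so the names become "F1.1", "F1.2" and "F2.1".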

Value

A list, comprising

genotypes

The output genotype data as an object of class "SnpMatrix". If either the pedigree or the pedigree-member identifiers in the pedfile are unique, they are used as the row names of the output object. Otherwise the two fields are concatenated, separated by sep.

fam

A dataframe containing the first six fields in the pedfile. The row names correspond to those of the SnpMatrix.

map

A dataframe giving the alleles at each locus. If locus names were obtained from a dataframe read from an existing file, the allele information is simply appended to this frame. Otherwise a new dataframe is created. The row names correspond to the column names of the SnpMatrix.

Note

This function is written entirely in R and may not be particularly fast. However, it imposes no restrictions on the allele codes recognized.

Homozygous genotypes may be represented in the input file either (a) by coding both alleles to the same value, or (b) by setting the second allele to "missing" (as specified by the na.strings argument).

No special provision is made to read XSnpMatrix objects; such data should first be read as a SnpMatrix and then coerced to an XSnpMatrix using new or as.
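
To make the two homozygous codings in this note concrete, the base-R sketch below treats a missing second allele as a copy of the first. It is an illustration of the convention only, with made-up allele vectors, and is not the code read.pedfile uses internally.

```r
# Made-up alleles for three subjects at one locus (NA = coded as missing).
allele1 <- c("A", "G", "G")
allele2 <- c("A", NA, "A")

# Coding (b): a missing second allele is read as a copy of the first.
allele2_filled <- ifelse(is.na(allele2), allele1, allele2)

# Subjects 1 and 2 are homozygous, via codings (a) and (b) respectively.
homozygous <- allele1 == allele2_filled
```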

Author(s)

David Clayton dc208@cam.ac.uk

See Also

SnpMatrix-class, XSnpMatrix-class

Examples

##
## No example supplied yet
##
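
The package supplies no example, so the following is only a minimal sketch of a typical call, assuming snpStats is installed. The pedfile contents and the locus names rs1 and rs2 are made up for illustration; the call is skipped when the package is not available.

```r
# Write a tiny, made-up pedfile: six mandatory fields per line,
# followed by two allele fields for each of two loci.
ped_lines <- c(
  "F1 1 0 0 1 1 A G C C",
  "F2 1 0 0 2 2 G G C T",
  "F3 1 0 0 1 0 A A 0 0"
)
ped_path <- tempfile(fileext = ".ped")
writeLines(ped_lines, ped_path)

# Only attempt the read if snpStats is available.
if (requireNamespace("snpStats", quietly = TRUE)) {
  # Locus names are illustrative; "0" is the default missing-value code.
  ped <- snpStats::read.pedfile(ped_path, snps = c("rs1", "rs2"))
  print(ped$genotypes)  # the SnpMatrix
  print(ped$fam)        # the first six fields of each line
  print(ped$map)        # the alleles found at each locus
}
```

Passing the locus names as a character vector avoids the need for a separate locus information file.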

snpStats documentation built on Nov. 8, 2020, 10:59 p.m.