This function reads SNP genotype data from a file in which each line
refers to a single genotype call. It replaces the earlier function
read.snps.long.
file | Name(s) of file(s) to be read (can be gzipped)
samples | Either a vector of sample identifiers, or the number of samples to be read. If a single file is to be read and this argument is omitted, the file will be scanned initially and all samples will be included
snps | Either a vector of SNP identifiers, or the number of SNPs to be read. If a single file is to be read and this argument is omitted, the file will be scanned initially and all SNPs will be included
fields | A named vector giving the locations of the required fields. See Details below
split | A regular expression specifying how the input line will be split into fields. The default value specifies separation of fields by a TAB character, or by one or more blanks
gcodes | When the genotype is read as a single field, this argument specifies how it is handled. See Details below
no.call | The string which indicates "no call" for either a genotype or (when the genotype is read as two allele fields) an allele
threshold | A vector of length 2 giving the lower and upper acceptable limits for the confidence score
lex.order | If
verbose | If
Each line of the input file represents a single call and is split into
fields using the function strsplit. The required fields are
extracted according to the fields argument. This must
contain the locations of the sample and snp identifier
fields and either the location of a genotype field or the
locations of two allele fields.
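As a rough illustration of the splitting step, the sketch below applies strsplit to one made-up input line and picks fields out by position. The input line, the column positions, and the split pattern "\t| +" (an assumption chosen to match the described default of a TAB or one or more blanks) are all hypothetical, and this is not the function's own code.

```r
# Hypothetical input line: SNP id, sample id, genotype, confidence score
line <- "rs123\tNA12878\tAG\t0.98"

# Assumed split pattern: a TAB, or one or more blanks
parts <- strsplit(line, "\t| +")[[1]]

# Named vector of field locations, in the spirit of the `fields` argument
# (column positions are made up for this example)
fields <- c(snp = 1, sample = 2, genotype = 3)

# Extract the required fields by position and label them
record <- parts[fields]
names(record) <- names(fields)
print(record)
```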
If the samples and snps arguments contain vectors of
character strings, a SnpMatrix is created with these row and
column names and the genotype values are "cherry-picked" from the input
file. If either or both of these arguments are specified simply as
numbers, then these numbers determine the dimensions of the SnpMatrix
created. In this case samples and/or SNPs are included in the
SnpMatrix on a first-come-first-served basis. If either
or both of these arguments are omitted, a preliminary scan of the input file
is carried out to find the missing sample and/or SNP identifiers.
In this scan,
when a sample or SNP identifier differs from that in the previous
line, but is identical to one previously found, then all the relevant
identifiers are assumed to have been found. This implies that
the file must be sorted, in some consistent order,
by sample and by SNP (although either one of these may vary fastest).
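The stopping rule of the preliminary scan can be sketched as follows. This is a minimal illustration of the rule as described above, not the function's actual implementation; the helper name scan_ids and the identifier values are hypothetical.

```r
# Collect identifiers in order of first appearance, stopping as soon as an
# identifier differs from the previous line but has already been seen.
scan_ids <- function(ids) {
  found <- character(0)
  previous <- NULL
  for (id in ids) {
    if (is.null(previous) || id != previous) {
      if (id %in% found) break  # the cycle has restarted: all ids found
      found <- c(found, id)
    }
    previous <- id
  }
  found
}

# SNP identifiers varying fastest: the scan stops when "a" reappears
scan_ids(c("a", "b", "c", "a", "b", "c"))

# Sample identifiers varying slowest: the whole column is scanned
scan_ids(c("s1", "s1", "s1", "s2", "s2", "s2"))
```

If the file were not sorted consistently, an identifier could reappear before all the others had been seen, and the scan would stop too early. That is why the sort order matters.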
If the genotype is to be read as a single field, the genotype
element of the fields argument must be set to the appropriate
value, and the allele.A and allele.B elements should be
set to NA. The handling of the genotype field is controlled
by the gcodes argument. If this is missing or NA, then
the genotype is assumed to be represented by a two-character field,
the two characters representing the two alleles. If gcodes is
a single string, then it is assumed to contain
a regular expression which will split the genotype field into two allele
fields. Otherwise, gcodes must be an array of length three,
specifying the three genotype codes in the order "AA", "AB", "BB".
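The three ways of interpreting a single genotype field can be illustrated with a small base-R sketch. The helper split_genotype is hypothetical and only mimics the documented behaviour; it is not the code used by the function, and the genotype strings are made up.

```r
# Interpret one genotype field under the three documented gcodes cases.
split_genotype <- function(genotype, gcodes = NA) {
  if (length(gcodes) == 1 && is.na(gcodes)) {
    # Case 1: two-character field, one character per allele
    c(substr(genotype, 1, 1), substr(genotype, 2, 2))
  } else if (length(gcodes) == 1) {
    # Case 2: gcodes is a regular expression separating the two alleles
    strsplit(genotype, gcodes)[[1]]
  } else {
    # Case 3: three codes in the order "AA", "AB", "BB";
    # return the position of the matching code (NA if none matches)
    stopifnot(length(gcodes) == 3)
    match(genotype, gcodes)
  }
}

split_genotype("AG")                                  # two characters
split_genotype("A/G", gcodes = "/")                   # regular expression
split_genotype("AB", gcodes = c("AA", "AB", "BB"))    # three fixed codes
```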
If the two alleles of the genotype are to be read from two separate
fields, the genotype element should be set to NA and the
allele.A and allele.B elements set to the appropriate
values. The gcodes argument should be missing or set to NA.
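The two layouts differ only in which elements of the fields vector are NA. The sketch below contrasts them, assuming hypothetical column positions and a made-up input line; any further elements the function may expect in fields are omitted here.

```r
# Layout 1: genotype in a single field (column 3); allele elements are NA
fields_genotype <- c(snp = 1, sample = 2, genotype = 3,
                     allele.A = NA, allele.B = NA)

# Layout 2: two allele fields (columns 3 and 4); genotype element is NA
fields_alleles <- c(snp = 1, sample = 2, genotype = NA,
                    allele.A = 3, allele.B = 4)

# Hypothetical blank-separated line with the two alleles in separate fields
line <- "rs123 NA12878 A G"
parts <- strsplit(line, " +")[[1]]

# Pick out the two allele fields by their positions
alleles <- parts[fields_alleles[c("allele.A", "allele.B")]]
print(alleles)
```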
If the genotype is read as a single field matching one of three
specified codes, the function returns an object of class
SnpMatrix. Otherwise it returns a list whose first element is the
SnpMatrix object and whose second element is a data frame
containing the allele codes, with the SNP identifiers as row names. Note
that allele codes only appear in this data frame if they occur in a genotype
which was accepted. Thus, monomorphic SNPs have allele.B coded as
NA, and SNPs which never pass confidence score filters have both
alleles coded as NA.
Unlike read.snps.long,
this function is written entirely in R and may not be particularly
fast. However, it imposes no restrictions on the allele codes
recognized.
Homozygous genotypes are assumed to be represented in the input file
by coding both alleles to the same value. No special provision is made
to read XSnpMatrix objects; such data should first be read as a SnpMatrix
and then coerced to an XSnpMatrix using new or as.
David Clayton dc208@cam.ac.uk
SnpMatrix-class, XSnpMatrix-class
##
## No example supplied yet
##