read.snps.long.old: Read SNP input data in "long" format (old version)


View source: R/indata.R

Description

This function reads SNP genotype data and creates an object of class "snp.matrix" or "X.snp.matrix". The input data are assumed to be arranged with one line per SNP call (one genotype call for one sample at one SNP), with no header line. Gzipped input files can be read directly.

Usage

read.snps.long.old(file, chip.id, snp.id, codes, female,
                          conf = 1, threshold = 0.9, drop=FALSE,
                          sorted=FALSE, progress=interactive())

Arguments

file

Name of the file containing the input data. Input files that have been compressed with the gzip utility are recognized.

chip.id

Character vector containing (unique) identifiers for the chips, samples, or subjects whose calls are to be read. Other samples in the input data are ignored.

snp.id

Character vector containing (unique) identifiers of the SNPs whose data are to be read. As with chip.id, any other SNPs in the input data are ignored.

codes

For autosomal SNPs, a vector of length 3 giving the codes for the three genotypes, in the order homozygous (AA), heterozygous (AB), homozygous (BB). For X SNPs, two additional codes for the male genotypes (AY and BY) must be supplied. All other codes are treated as "no call". The default codes are "0", "1", "2" for the three genotypes, plus "0" and "2" for AY and BY when X SNPs are read.

female

If the data to be read refer to SNPs on the X chromosome, this argument must be supplied and should indicate whether each row of data refers to a female (TRUE) or to a male (FALSE). The output object will then be of class "X.snp.matrix".

conf

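Confidence-score mode: whether a confidence score is present and, if so, whether calls scoring above or below threshold are accepted. See Details.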

drop

If TRUE, any rows or columns without genotype calls are dropped from the output matrix. Otherwise, the full matrix, with rows and columns defined by the chip.id and snp.id arguments, is returned.

threshold

Acceptance threshold for the confidence score (see conf and Details).

sorted

Logical: is the input file already sorted into the required order (see Details)?

progress

If TRUE, progress is reported to the standard output stream.
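To make the roles of codes and female concrete, the base-R sketch below restates how a raw genotype code might be mapped to a genotype label. decode_call() is a hypothetical helper, not part of chopsticks. The way the five X codes are split (females decoded with the first three, males with the last two) is an assumption drawn from the description of codes, not a statement of how the package implements it.

```r
# Hypothetical helper (not part of chopsticks): map raw genotype codes to
# genotype labels following the rules described for `codes` and `female`.
# Any code that does not match is treated as "no call" and returned as NA.
decode_call <- function(raw, codes = c("0", "1", "2"), female = NULL) {
  if (is.null(female)) {
    # Autosomal SNP: codes are given in the order AA, AB, BB
    return(c("AA", "AB", "BB")[match(raw, codes[1:3])])
  }
  # X SNP (assumption): five codes in the order AA, AB, BB, AY, BY;
  # females are decoded with the first three, males with the last two
  stopifnot(length(codes) == 5, length(female) == length(raw))
  out <- character(length(raw))
  out[female] <- c("AA", "AB", "BB")[match(raw[female], codes[1:3])]
  out[!female] <- c("AY", "BY")[match(raw[!female], codes[4:5])]
  out
}

decode_call(c("0", "1", "2", "9"))
# "AA" "AB" "BB" NA   ("9" is not a listed code, so it is a no-call)

decode_call(c("0", "2", "1"), codes = c("0", "1", "2", "0", "2"),
            female = c(TRUE, FALSE, FALSE))
# "AA" "BY" NA   (under this sketch, "1" has no male meaning, so no-call)
```

Because the default X codes reuse "0" and "2", the same raw code means AA for a female but AY for a male, which is why female must be supplied for X SNPs.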

Details

Data are assumed to be input with one line per call, in free format:

<chip-id> <snp-id> <code for genotype call> [<confidence>] ...
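As an illustration of this layout, here is a minimal base-R sketch that splits free-format lines into their fields. parse_long_lines() is a hypothetical helper, not the chopsticks parser; it assumes that "free format" means fields separated by any run of whitespace.

```r
# Hypothetical parser sketch (not the chopsticks implementation): split
# free-format, whitespace-separated call lines into their fields. Fields
# after the optional fourth (confidence) field are ignored.
parse_long_lines <- function(lines, has_conf = TRUE) {
  fields <- strsplit(trimws(lines), "[[:space:]]+")
  field <- function(i) vapply(fields, `[`, character(1), i)
  score <- if (has_conf) as.numeric(field(4)) else NA_real_
  data.frame(chip = field(1), snp = field(2), code = field(3),
             score = score, stringsAsFactors = FALSE)
}

lines <- c("s1 rs1 0 0.98",
           "s2  rs1  1 0.75 extra-field",
           "s1 rs2 2 0.93")
calls <- parse_long_lines(lines)

# For a file on disk, readLines(gzfile(path)) could supply `lines`:
# gzfile() reads gzip-compressed files and also plain, uncompressed ones.
```

The genotype code is kept as a string, since codes are matched as character values rather than numbers.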

Currently, any fields following the first three (or four, when a confidence score is present) are ignored.

If the argument sorted is TRUE, the file is assumed to be sorted with snp-id as the primary key and chip-id as the secondary key, using the collation order of the current locale. The rows and columns of the returned matrix are then ordered in the same way. If sorted is FALSE, an algorithm that does not rely on this assumption is used, and the rows and columns of the returned matrix follow the order of the chip.id and snp.id arguments.

A call is read in only if both of its id fields match elements of the chip.id and snp.id arguments and, optionally, if its confidence score meets the threshold. Confidence checking is controlled by the conf argument: conf=0 indicates that no confidence score is present and no checking is done; conf>0 indicates that calls with scores above threshold are accepted; and conf<0 indicates that only calls with scores below threshold are accepted.
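The selection rules above can be sketched in base R as follows. accept_call() and filter_calls() are hypothetical helpers, not chopsticks functions. Whether a score exactly equal to threshold is accepted is not stated here, so the strict comparisons below are an assumption.

```r
# Hypothetical sketch of the documented acceptance rule (not the package code):
#   conf == 0 : no confidence score; every call is accepted
#   conf  > 0 : accept calls whose score is above `threshold` (strictly, assumed)
#   conf  < 0 : accept calls whose score is below `threshold` (strictly, assumed)
accept_call <- function(score, conf = 1, threshold = 0.9) {
  if (conf == 0) return(rep(TRUE, length(score)))
  ok <- if (conf > 0) score > threshold else score < threshold
  !is.na(ok) & ok  # a missing score never passes
}

# Keep only calls whose chip and SNP ids are both requested (exact,
# case-sensitive match via %in%) and whose score passes the check.
filter_calls <- function(calls, chip.id, snp.id, conf = 1, threshold = 0.9) {
  wanted <- calls$chip %in% chip.id & calls$snp %in% snp.id
  calls[wanted & accept_call(calls$score, conf, threshold), , drop = FALSE]
}

calls <- data.frame(chip  = c("s1", "s2", "S1", "s3"),
                    snp   = c("rs1", "rs1", "rs2", "rs1"),
                    code  = c("0", "1", "2", "1"),
                    score = c(0.98, 0.75, 0.95, 0.99),
                    stringsAsFactors = FALSE)

filter_calls(calls, chip.id = c("s1", "s2"), snp.id = c("rs1", "rs2"))
# keeps only row 1: s2 fails the 0.9 threshold, "S1" differs in case
# from "s1", and s3 was not requested
```

With conf = -1 the same call keeps only the s2 row, whose score of 0.75 lies below the threshold.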

The routine is case-sensitive: the <chip-id> and <snp-id> fields must match the case of the chip.id and snp.id values exactly.

Value

An object of class "snp.matrix" or, if female is supplied (X chromosome SNPs), of class "X.snp.matrix".

Note

If more than one instance of any combination of a chip.id element and a snp.id element passes the confidence threshold, the call to be used is decided by the following rules:

  1. Any call trumps "no call".

  2. In the event of conflicting calls, "no call" is returned.
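The two rules can be expressed as a small base-R function applied to all accepted calls for one (chip, SNP) cell. resolve_calls() is a hypothetical helper, not part of chopsticks, and it represents "no call" as NA.

```r
# Hypothetical sketch of the documented duplicate-resolution rules for one
# (chip, SNP) cell that received several accepted calls:
#   1. any call trumps "no call" (NA)
#   2. if the remaining calls disagree, the result is "no call"
resolve_calls <- function(calls) {
  called <- unique(calls[!is.na(calls)])
  if (length(called) == 1) called else NA_character_
}

calls <- data.frame(chip = c("s1", "s1", "s2", "s2"),
                    snp  = c("rs1", "rs1", "rs1", "rs1"),
                    geno = c(NA, "AB", "AA", "BB"),
                    stringsAsFactors = FALSE)
tapply(calls$geno, list(calls$chip, calls$snp), resolve_calls)
# s1/rs1 -> "AB" (a call trumps no-call); s2/rs1 -> NA (conflicting calls)
```

Repeated identical calls are not a conflict, so unique() collapses them before the count is checked.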

Using sorted=TRUE is generally discouraged: the alternative algorithm is safer and usually not appreciably slower. However, if the input file is to be read multiple times and there is a reasonably close correspondence between the cells of the returned matrix and the lines of the input file, the sorted option can be faster.
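Before relying on sorted=TRUE, one might check that the calls really are in the required order (snp-id primary, chip-id secondary). The base-R check below is a hypothetical helper, not part of chopsticks. It relies on order() being stable and, for character keys under its default method, collating according to the current locale; that this matches the collation used by the package is an assumption.

```r
# Hypothetical helper: TRUE if calls are already ordered with SNP id as the
# primary key and chip id as the secondary key. order() is stable, so input
# that is already in order yields the identity permutation.
is_sorted_long <- function(snp, chip) {
  identical(order(snp, chip), seq_along(snp))
}

is_sorted_long(c("rs1", "rs1", "rs2"), c("s1", "s2", "s1"))  # TRUE
is_sorted_long(c("rs1", "rs1"), c("s2", "s1"))               # FALSE
```

If the check fails, a data frame of parsed calls could be reordered with calls[order(calls$snp, calls$chip), ] and written back out before reading with sorted=TRUE.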

This function has been replaced by the more flexible function read.snps.long.

Author(s)

David Clayton david.clayton@cimr.cam.ac.uk and Hin-Tak Leung

References

http://www-gene.cimr.cam.ac.uk/clayton

See Also

snp.matrix-class, X.snp.matrix-class


chopsticks documentation built on Nov. 8, 2020, 7:51 p.m.