read.snps.long.old: Read SNP input data in "long" format (old version)


View source: R/indata.R


This function reads SNP genotype data and creates an object of class "snp.matrix" or "X.snp.matrix". Input data are assumed to be arranged as one line per SNP-call (without any headers). This function can read gzipped files.


read.snps.long.old(file, chip.id, snp.id, codes, female,
                          conf = 1, threshold = 0.9, drop=FALSE,
                          sorted=FALSE, progress=interactive())



file
Name of the file containing the input data. Input files that have been compressed with the gzip utility are recognized.

chip.id
Array of type "character" containing (unique) identifiers for the chips, samples, or subjects for which calls are to be read. Other samples in the input data will be ignored.

snp.id
Array of type "character" containing (unique) identifiers of the SNPs for which data will be read. Again, other SNPs in the input data will be ignored.

codes
For autosomal SNPs, an array of length 3 giving the codes for the three genotypes, in the order homozygous (AA), heterozygous (AB), homozygous (BB). For X SNPs, an additional two codes for the male genotypes (AY and BY) must be supplied. All other codes will be treated as "no call". The default codes are "0", "1", "2" [,"0", "2"].

female
If the data to be read refer to SNPs on the X chromosome, this argument must be supplied and should indicate whether each row of data refers to a female (TRUE) or to a male (FALSE). The output object will then be of class "X.snp.matrix".

conf
Confidence score. See details.

drop
If TRUE, any rows or columns without genotype calls will be dropped from the output matrix. Otherwise the full matrix, with rows and columns defined by the chip.id and snp.id arguments, will be returned.

threshold
Acceptance threshold for the confidence score.

sorted
Is the input file already sorted into the correct order (see details)?

progress
If TRUE, progress will be reported to the standard output stream.
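
As a rough illustration of how genotype codes are interpreted, the base-R sketch below maps raw call codes onto genotype labels and treats anything else as a no-call. The helper name recode_calls is hypothetical and is not part of the package, and the package's internal encoding of genotypes is not shown here; only the autosomal case (three codes) is covered.

```r
# Hypothetical helper (not part of the package): map raw call codes to
# genotype labels, treating any unrecognised code as "no call" (NA).
recode_calls <- function(raw, codes = c("0", "1", "2")) {
  stopifnot(length(codes) == 3)
  genotypes <- c("AA", "AB", "BB")   # same order as the codes vector
  genotypes[match(raw, codes)]       # match() yields NA for unknown codes
}

recode_calls(c("0", "2", "1", "N"))
recode_calls(c("A/A", "A/B", "?"), codes = c("A/A", "A/B", "B/B"))
```

The second call shows that the codes need not be numeric strings; any three distinct character codes can be used, provided they match the input file exactly.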


Data are assumed to be input with one line per call, in free format:

<chip-id> <snp-id> <code for genotype call> [<confidence>] ...

Currently, any fields following the first three (or four) are ignored.

If sorted is TRUE, the file is assumed to be sorted with snp-id as primary key and chip-id as secondary key, using the current locale; the rows and columns of the returned matrix are ordered in the same way. If sorted is FALSE, an algorithm that does not rely on this assumption is used, and the rows and columns of the returned matrix follow the order of the chip.id and snp.id vectors.

Calls in which both id fields match elements of the chip.id and snp.id arguments are read in, after (optionally) checking that the confidence score meets a given threshold. Confidence checking is controlled by the conf argument: conf=0 indicates that no confidence score is present and no checking is done; conf>0 indicates that calls with scores above threshold are accepted; conf<0 indicates that only calls with scores below threshold are accepted.
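
The confidence rule can be sketched in base R as follows. This is only an illustration of the rule as described, not the package's internal code: the helper name accept_call is hypothetical, and because the text does not say whether a score exactly equal to the threshold is accepted, the sketch assumes strict inequalities.

```r
# Hypothetical helper (not part of the package): decide which calls pass
# the confidence check, following the sign convention of the conf argument.
accept_call <- function(score, conf = 1, threshold = 0.9) {
  if (conf == 0) {
    rep(TRUE, length(score))   # no confidence field, so nothing to check
  } else if (conf > 0) {
    score > threshold          # higher scores mean more confident calls
  } else {
    score < threshold          # lower scores mean more confident calls
  }
}

scores <- c(0.95, 0.50, 0.99)
accept_call(scores)              # conf = 1: keep scores above 0.9
accept_call(scores, conf = -1)   # keep scores below 0.9
accept_call(scores, conf = 0)    # keep everything
```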

The routine is case-sensitive, so the <chip-id> and <snp-id> fields in the file must match the case of the corresponding entries in chip.id and snp.id exactly.
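
A minimal end-to-end sketch, assuming the chopsticks package is installed and exports read.snps.long.old: it writes a small file in the format described above and reads it back. The file contents and identifiers are invented for illustration, the second and third arguments are passed by position, and the result is not shown because it depends on the installed package version.

```r
# Write a tiny long-format file: <chip-id> <snp-id> <call> <confidence>
infile <- tempfile(fileext = ".txt")
writeLines(c(
  "chip1 rs1 0 0.99",
  "chip1 rs2 1 0.95",
  "chip2 rs1 2 0.97",
  "chip2 rs2 1 0.40"    # below the default threshold of 0.9
), infile)

chips <- c("chip1", "chip2")   # identifiers must match the file's case
snps  <- c("rs1", "rs2")

# Only attempt the read if the chopsticks package is available
if (requireNamespace("chopsticks", quietly = TRUE)) {
  calls <- chopsticks::read.snps.long.old(
    infile, chips, snps,
    codes = c("0", "1", "2"), conf = 1, threshold = 0.9,
    progress = FALSE
  )
  print(calls)
}
```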


An object of class "snp.matrix", or of class "X.snp.matrix" if the female argument is supplied.


If more than one instance of any combination of chip_id element and snp_id element passes the confidence threshold, the call to be used is decided by the following rules:

  1. Any call trumps "no-call"

  2. In the event of call conflict, "no-call" is returned
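
The two rules can be sketched in base R as follows. The helper name resolve_calls is hypothetical and is not part of the package; NA stands in for "no-call", and the genotype labels are placeholders.

```r
# Hypothetical helper (not part of the package): combine repeated calls for
# one chip/SNP pair. NA stands for "no-call".
resolve_calls <- function(calls) {
  called <- unique(calls[!is.na(calls)])   # distinct genotype calls seen
  # Exactly one distinct call: use it. None, or a conflict: no-call.
  if (length(called) == 1) called else NA_character_
}

resolve_calls(c(NA, "AB", NA))   # rule 1: a call trumps no-call
resolve_calls(c("AB", "AB"))     # repeated but consistent calls
resolve_calls(c("AA", "AB"))     # rule 2: conflicting calls give no-call
```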

Use of sorted=TRUE is usually discouraged since the alternative algorithm is safer and, usually, not appreciably slower. However, if the input file is to be read multiple times and there is a reasonably close correspondence between cells of the matrix to be returned and lines of the input file, the sorted option can be faster.
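
If sorted=TRUE is wanted, the input can be put into the required order beforehand. The base-R sketch below sorts a small invented data set by snp-id and then chip-id before writing it out. It assumes that R's default character ordering, which follows the collation of the current locale, agrees with the ordering the function expects; that is not guaranteed by anything stated here.

```r
# Invented long-format data: one row per call
long <- data.frame(
  chip = c("chipB", "chipA", "chipB", "chipA"),
  snp  = c("rs2", "rs2", "rs1", "rs1"),
  call = c("1", "0", "2", "1"),
  stringsAsFactors = FALSE
)

# snp-id is the primary key, chip-id the secondary key
sorted_long <- long[order(long$snp, long$chip), ]

# Write without headers, quotes, or row names: one call per line
outfile <- tempfile(fileext = ".txt")
write.table(sorted_long, outfile, quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(outfile)
```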

This function has been replaced by the more flexible function read.snps.long.


David Clayton and Hin-Tak Leung


See Also

snp.matrix-class, X.snp.matrix-class

chopsticks documentation built on April 29, 2020, 5:24 a.m.