read.beadstudio: Read genotype calls and hybridization from Illumina...


Description

Read genotype calls and hybridization from Illumina BeadStudio output.

Usage

read.beadstudio(prefix, snps, in.path = ".", keep.intensity = TRUE,
  colmap = NULL, verify = TRUE, checksum = TRUE, ...)

Arguments

prefix

filename prefix, without working directory: the * in *_FinalReport.zip

snps

data frame containing the marker map for this array, in PLINK's *.bim format (chromosome, marker name, cM position, bp position); rownames should be set to the marker names, and those names should match the marker names in the BeadStudio output.

in.path

directory in which to search for input files

keep.intensity

should hybridization intensities be kept in addition to genotype calls?

colmap

named character vector mapping column names in *FinalReport to required columns for argyle (see Details)

verify

logical; if TRUE, check that FinalReport file is of expected size

checksum

logical; if TRUE, generate an md5 checksum for the result

...

ignored

Details

This function initializes a genotypes object from Illumina BeadStudio output. (For an example of the format, see the files in this package's data/ directory.) The two relevant files are Sample_Map.zip and *FinalReport.zip, which contain the sample manifest and the genotype/intensity data, respectively. On platforms where unzip is available on the command line, files are unzipped on the fly. Otherwise, *FinalReport.zip (but not Sample_Map.zip) must be unzipped first. This restriction arises because data.table is used to read the genotypes file, which is usually very large.
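Whether on-the-fly unzipping is likely to work can be checked ahead of time. The following is a minimal base-R sketch; the helper name has_unzip is hypothetical, and looking for an unzip executable on the PATH is an assumption about what "available on the command line" means, not necessarily the test this function performs.

```r
# Hypothetical helper: TRUE if an `unzip` executable is found on the PATH.
has_unzip <- function() {
  unname(nzchar(Sys.which("unzip")))
}

if (has_unzip()) {
  message("unzip found: zipped FinalReport files can be read directly")
} else {
  message("unzip not found: extract the FinalReport archive before reading")
}
```

If the check fails, the archive can be extracted from within R using utils::unzip() before calling read.beadstudio().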

Use the colmap vector to assign column names in the *FinalReport file to the required columns for argyle. The required columns are iid (individual ID), marker (SNP/marker name), call1 (allele 1, in the same strand as in the marker map), call2 (allele 2, in the same strand as in the marker map), x (hybridization x-intensity) and y (hybridization y-intensity). The default column mapping is:

Note that colmap must be a named character vector, with the old column headers in names() and the new column names as the values: e.g., colmap = setNames(new, old). An error is thrown if the column mapping does not provide enough information to read the input properly. Pay particular attention to the encoding of the alleles in the snps object, which is platform-specific. For users of the Mouse Universal Genotyping Array series from Neogen Inc, alleles A1, A2 in snps will be on the forward strand, so the columns Allele * - Forward (not Allele * - Top or Allele * - AB) are the ones to use.
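As an illustration of the setNames(new, old) pattern, here is a sketch of a forward-strand mapping. The header strings in old are assumptions about what a FinalReport file might contain (only the Allele * - Forward pattern comes from the text above), so check them against the header of your own file. The sanity check at the end is an illustration, not the validation performed by read.beadstudio().

```r
# Headers as they might appear in a FinalReport file (assumed; check your own file).
old <- c("SNP Name", "Sample ID", "Allele1 - Forward", "Allele2 - Forward", "X", "Y")
# Column names required by argyle, as listed in Details.
new <- c("marker", "iid", "call1", "call2", "x", "y")

# Old headers go in names(), new column names in the vector itself.
colmap <- setNames(new, old)

# Illustrative sanity check: every required column is mapped exactly once.
required <- c("iid", "marker", "call1", "call2", "x", "y")
stopifnot(setequal(colmap, required), !anyDuplicated(colmap))

# The mapping would then be passed along the lines of:
#   read.beadstudio(prefix, snps, colmap = colmap)
```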

The behavior of this function with respect to missing data in the genotypes versus the contents of snps is asymmetric. Markers in snps which are absent in the input files will be present in the output, but with missing calls and intensities. Markers in the input files which are missing from snps will simply be dropped. If that occurs, check that the marker names in snps match exactly those in the input file.
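The asymmetry can be pictured with a toy example in base R. This is a sketch of the described behavior, not the function's actual implementation; the marker names, positions and calls are made up, and the column names of snps are assumptions.

```r
# Toy marker map in PLINK .bim column order; names and values are made up.
snps <- data.frame(
  chr    = c("chr1", "chr1", "chr2"),
  marker = c("mk1", "mk2", "mk3"),
  cM     = c(0.5, 1.2, 0.1),
  pos    = c(3000000L, 3500000L, 4000000L),
  stringsAsFactors = FALSE
)
rownames(snps) <- snps$marker

# Toy calls for one sample: mk2 is absent from the input, mkX is absent from snps.
calls <- c(mk1 = "A", mk3 = "G", mkX = "T")

# Indexing by the marker map: markers missing from the input become NA,
# markers missing from the map are dropped.
aligned <- setNames(calls[rownames(snps)], rownames(snps))

dropped  <- setdiff(names(calls), rownames(snps))  # in input, not in snps
unfilled <- setdiff(rownames(snps), names(calls))  # in snps, not in input
```

Comparing the two setdiff() results against your own marker map and input file is a quick way to spot marker-name mismatches before reading a full dataset.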

Provenance of the resulting object can be traced by checking attr(,"source"). For the paranoid, a timestamp and checksum are provided in attr(,"timestamp") and attr(,"md5").
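The provenance attributes can be collected with attr(). The sketch below uses a stand-in object so that it runs without any input files; the attribute values are placeholders, and the structure of a real genotypes object is not shown here.

```r
# Stand-in for a genotypes object; real objects come from read.beadstudio().
# All attribute values below are placeholders, not real output.
geno <- structure(
  matrix(c(0L, 1L, 2L, NA), nrow = 2,
         dimnames = list(c("mk1", "mk2"), c("s1", "s2"))),
  source    = "path/to/input",
  timestamp = Sys.time(),
  md5       = "placeholder"
)

# Collect the three provenance attributes into a named list for logging.
provenance <- lapply(
  c(source = "source", timestamp = "timestamp", md5 = "md5"),
  function(a) attr(geno, a, exact = TRUE)
)
str(provenance)
```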

Value

A genotypes object with genotype calls, marker map, sample metadata and (as requested) intensity data.

References

Inspiration from Dan Gatti's DOQTL package: <https://github.com/dmgatti/DOQTL/blob/master/R/extract.raw.data.R>


andrewparkermorgan/argyle documentation built on May 10, 2019, 11:08 a.m.