Read genotype calls and hybridization intensities from Illumina BeadStudio output.
prefix | filename prefix, without working directory: the
snps | dataframe containing marker map for this array, in PLINK's
in.path | directory in which to search for input files
keep.intensity | should hybridization intensities be kept in addition to genotype calls?
colmap | named character vector mapping column names in
verify | logical; if
checksum | logical; if
... | ignored
This function initializes a genotypes object from Illumina BeadStudio output. (For an
example of the format, see the files in this package's data/ directory.) The two relevant
files are Sample_Map.zip and *FinalReport.zip, which contain the sample manifest
and genotype/intensity data, respectively. On platforms with unzip available on the
command line, files will be unzipped on the fly. Otherwise FinalReport.zip (but not
Sample_Map.zip) must be unzipped first. This is due to the use of data.table to
handle the usually very large genotypes file.
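Whether on-the-fly unzipping will work can be checked ahead of time. The following is a minimal sketch using base R only; the helper name extract_report is hypothetical and is not part of argyle, and the check simply looks for an unzip executable on the PATH, which is assumed to be what "available on the command line" means here.

```r
# TRUE if an `unzip` executable is found on the PATH
has_unzip <- unname(nzchar(Sys.which("unzip")))

# Hypothetical helper (not part of argyle): extract a zip archive with R's
# built-in utils::unzip(), for platforms without a command-line unzip.
# Returns the paths of the extracted files.
extract_report <- function(zipfile, exdir = dirname(zipfile)) {
  stopifnot(file.exists(zipfile))
  utils::unzip(zipfile, exdir = exdir)
}
```

If has_unzip is FALSE, the *FinalReport.zip archive could be extracted with a helper like this before reading the data.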
Use the colmap vector to assign column names in the *FinalReport file to the required
columns for argyle. The required columns are iid (individual ID), marker (SNP/marker name),
call1 (allele 1, in the same strand as in the marker map), call2 (allele 2, in the
same strand as in the marker map), x (hybridization x-intensity) and y (hybridization
y-intensity). The default column mapping is:
SNP Name = marker
Sample ID = iid
Allele1 - Forward = call1
Allele2 - Forward = call2
X = x
Y = y
Note that colmap must be a named character vector, with old column headers in the names()
and new column names in the vector itself: e.g., write colmap = setNames( new, old ). An error
will be thrown if the column mapping does not provide enough information to read the input properly.
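As a minimal sketch of this convention, the default mapping listed above can be built with setNames(). The variable names old, new and required are illustrative only.

```r
# Column headers as they appear in the *FinalReport file (old names)
old <- c("SNP Name", "Sample ID", "Allele1 - Forward", "Allele2 - Forward", "X", "Y")
# Column names required by argyle (new names), in the same order
new <- c("marker", "iid", "call1", "call2", "x", "y")

# Old headers go in names(); new names are the values of the vector
colmap <- setNames(new, old)

# Sanity check: all six required columns are mapped
required <- c("iid", "marker", "call1", "call2", "x", "y")
stopifnot(all(required %in% colmap))
```

To read calls from a different pair of allele columns, only the corresponding entries of old would change; the values in new stay the same.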
Particular attention should be paid to the encoding of the alleles in the snps object, which
will be platform-specific. For users of the Mouse Universal Genotyping Array series from Neogen Inc,
alleles A1,A2 in snps will be on the forward strand, so columns Allele * - Forward
(not Allele * - Top or Allele * - AB) are the ones to use.
The behavior of this function with respect to missing data in the genotypes versus the contents
of snps is asymmetric. Markers in snps which are absent in the input files will
be present in the output, but with missing calls and intensities. Markers in the input files
which are missing from snps will simply be dropped. If that occurs, check that the marker
names in snps match exactly those in the input file.
Provenance of the resulting object can be traced by checking attr(,"source"). For the paranoid,
a timestamp and checksum are provided in attr(,"timestamp") and attr(,"md5").
A genotypes object with genotype calls, marker map, sample metadata and (as requested)
intensity data.
Inspiration from Dan Gatti's DOQTL package: <https://github.com/dmgatti/DOQTL/blob/master/R/extract.raw.data.R>