Read genotype calls and hybridization intensities from Illumina BeadStudio output.
prefix | filename prefix, without working directory: the
snps | dataframe containing marker map for this array, in PLINK's
in.path | directory in which to search for input files
keep.intensity | should hybridization intensities be kept in addition to genotype calls?
colmap | named character vector mapping column names in
verify | logical; if
checksum | logical; if
... | ignored
This function initializes a genotypes object from Illumina BeadStudio output. (For an example of the format, see the files in this package's data/ directory.) The two relevant files are Sample_Map.zip and *FinalReport.zip, which contain the sample manifest and genotype/intensity data, respectively. On platforms with unzip available on the command line, files will be unzipped on the fly. Otherwise FinalReport.zip (but not Sample_Map.zip) must be unzipped first. This is due to the use of data.table to handle the usually very large genotypes file.
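As a rough sketch of the manual fallback described above, the snippet below checks whether a command-line unzip is available and, if not, extracts the report with base R before reading. The directory and the file name MyProject_FinalReport.zip are hypothetical placeholders, not names used by the package.

```r
# Hypothetical location and file name; substitute your own.
in_path    <- "."
report_zip <- file.path(in_path, "MyProject_FinalReport.zip")

# Sys.which() returns an empty string when the program is not on the PATH.
have_unzip <- nzchar(unname(Sys.which("unzip")))

if (!have_unzip && file.exists(report_zip)) {
  # No command-line unzip: extract the final report ahead of time.
  # Sample_Map.zip can be left zipped, per the text above.
  utils::unzip(report_zip, exdir = in_path)
}
```

If unzip is found on the PATH, nothing needs to be done by hand.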
Use the colmap vector to assign column names in the *FinalReport file to the required columns for argyle. The required columns are iid (individual ID), marker (SNP/marker name), call1 (allele 1, in the same strand as in the marker map), call2 (allele 2, in the same strand as in the marker map), x (hybridization x-intensity) and y (hybridization y-intensity). The default column mapping is:
SNP Name = marker
Sample ID = iid
Allele1 - Forward = call1
Allele2 - Forward = call2
X = x
Y = y
Note that colmap must be a named character vector, with old column headers in the names() and new column names in the vector itself; e.g., write colmap = setNames(new, old). An error will be thrown if the column mapping does not provide enough information to read the input properly.
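For illustration, the default mapping listed above can be written out explicitly with setNames(). The completeness check at the end is only a sketch of the kind of validation a caller might do beforehand; it is not the package's own check.

```r
# Column headers as they appear in the *FinalReport file (the names) ...
old <- c("SNP Name", "Sample ID",
         "Allele1 - Forward", "Allele2 - Forward", "X", "Y")
# ... and the column names argyle requires (the values).
new <- c("marker", "iid", "call1", "call2", "x", "y")

colmap <- setNames(new, old)

# Illustrative pre-flight check: every required column must be a target.
required     <- c("iid", "marker", "call1", "call2", "x", "y")
missing_cols <- setdiff(required, unname(colmap))
if (length(missing_cols) > 0) {
  stop("colmap lacks: ", paste(missing_cols, collapse = ", "))
}
```

To read a different pair of allele columns, change only the corresponding entries of old; the values in new stay the same.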
Particular attention should be paid to the encoding of the alleles in the snps object, which will be platform-specific. For users of the Mouse Universal Genotyping Array series from Neogen Inc., alleles A1,A2 in snps will be on the forward strand, so columns Allele * - Forward (not Allele * - Top or Allele * - AB) are the ones to use.
The behavior of this function with respect to missing data in the genotypes versus the contents of snps is asymmetric. Markers in snps which are absent in the input files will be present in the output, but with missing calls and intensities. Markers in the input files which are missing from snps will simply be dropped. If that occurs, check that the marker names in snps match exactly those in the input file.
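A quick way to spot such mismatches before reading is to compare the two sets of marker names directly. The sketch below uses toy data; the column name marker in snps and the vector report_markers are assumptions made for illustration only.

```r
# Toy stand-ins: a marker map and the marker names seen in a final report.
snps <- data.frame(marker = c("m1", "m2", "m3"), stringsAsFactors = FALSE)
report_markers <- c("m2", "m3", "M4")

# In snps but not in the input: kept in the output, with missing calls.
only_in_snps <- setdiff(snps$marker, report_markers)

# In the input but not in snps: dropped.
only_in_report <- setdiff(report_markers, snps$marker)

# Names are compared exactly, so differences in case or whitespace count.
print(only_in_snps)
print(only_in_report)
```

A non-empty only_in_report is the signal to re-check marker naming in snps.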
Provenance of the resulting object can be traced by checking attr(,"source"). For the paranoid, a timestamp and checksum are provided in attr(,"timestamp") and attr(,"md5").
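The three attributes can be collected in one step with a small helper. This is a sketch only: show_provenance() is a hypothetical helper, not part of the package, and toy is a placeholder object standing in for a real genotypes object.

```r
# Hypothetical helper: gather the provenance attributes named above.
show_provenance <- function(x) {
  keys <- c("source", "timestamp", "md5")
  setNames(lapply(keys, function(k) attr(x, k, exact = TRUE)), keys)
}

# Placeholder object with made-up attribute values, for demonstration only.
toy <- structure(list(),
                 source    = "placeholder/path",
                 timestamp = Sys.time(),
                 md5       = "placeholder-checksum")

prov <- show_provenance(toy)
str(prov)
```

Attributes that are not set on the object come back as NULL.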
A genotypes object with genotype calls, marker map, sample metadata and (as requested) intensity data.
Inspiration from Dan Gatti's DOQTL package: <https://github.com/dmgatti/DOQTL/blob/master/R/extract.raw.data.R>