Description Usage Arguments Details Value Author(s) References See Also Examples
View source: R/crlmm-illumina.R
Preprocessing and genotyping of Illumina Infinium II arrays.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | genotype.Illumina(sampleSheet=NULL, arrayNames=NULL, ids=NULL, path=".",
arrayInfoColNames=list(barcode="SentrixBarcode_A", position="SentrixPosition_A"),
highDensity=FALSE, sep="_", fileExt=list(green="Grn.idat", red="Red.idat"), XY=NULL, anno, genome,
call.method="crlmm", trueCalls=NULL, cdfName, copynumber=TRUE, batch=NULL, saveDate=FALSE, stripNorm=TRUE,
useTarget=TRUE, quantile.method="between", nopackage.norm="quantile", mixtureSampleSize=10^5, fitMixture=TRUE,
eps=0.1, verbose = TRUE, seed = 1, sns, probs = rep(1/3, 3), DF = 6, SNRMin = 5,
recallMin = 10, recallRegMin = 1000, gender = NULL, returnParams = TRUE, badSNP = 0.7)
crlmmIllumina(sampleSheet=NULL, arrayNames=NULL, ids=NULL, path=".",
arrayInfoColNames=list(barcode="SentrixBarcode_A", position="SentrixPosition_A"),
highDensity=FALSE, sep="_", fileExt=list(green="Grn.idat", red="Red.idat"), XY=NULL, anno, genome,
call.method="crlmm", trueCalls=NULL, cdfName, copynumber=TRUE, batch=NULL, saveDate=FALSE, stripNorm=TRUE,
useTarget=TRUE, quantile.method="between", nopackage.norm="quantile", mixtureSampleSize=10^5, fitMixture=TRUE,
eps=0.1, verbose = TRUE, seed = 1, sns, probs = rep(1/3, 3), DF = 6, SNRMin = 5,
recallMin = 10, recallRegMin = 1000, gender = NULL, returnParams = TRUE, badSNP = 0.7)
|
sampleSheet |
|
arrayNames |
character vector containing names of arrays to be
read in. If |
ids |
vector containing ids of probes to be read in. If
|
path |
character string specifying the location of files to be read by the function |
arrayInfoColNames |
(used when |
highDensity |
logical (used when |
sep |
character string specifying separator used in .idat file names. |
fileExt |
list containing elements 'Green' and 'Red' which specify the .idat file extension for the Cy3 and Cy5 channels. |
XY |
|
anno |
data.frame containing SNP annotation information from
manifest and additional columns 'isSnp', 'position', 'chromosome'
and 'featureNames'. For use when |
genome |
character string specifying which genome is used in annotation |
call.method |
character string specifying the genotype calling algorithm to use ('crlmm' or 'krlmm'). |
trueCalls |
matrix specifying known Genotype calls(can contain some NAs) for a subset of samples and features (1 - AA, 2 - AB, 3 - BB). |
cdfName |
annotation package (see also |
copynumber |
'logical.' Whether to store copy number intensities with SNP output. |
batch |
character vector indicating the batch variable. Must be the same length as the number of samples. See details. |
saveDate |
'logical'. Should the dates from each .idat be saved with sample information? |
stripNorm |
'logical'. Should the data be strip-level normalized? |
useTarget |
'logical' (only used when |
quantile.method |
character string specifying the quantile normalization method to use ('within' or 'between' channels). |
nopackage.norm |
character string specifying normalization to be used when |
mixtureSampleSize |
Sample size to be use when fitting the mixture model. |
fitMixture |
'logical.' Whether to fit per-array mixture model. |
eps |
Stop criteria. |
verbose |
'logical.' Whether to print descriptive messages during processing. |
seed |
Seed to be used when sampling. Useful for reproducibility |
sns |
The sample identifiers. If missing, the default sample names are |
probs |
'numeric' vector with priors for AA, AB and BB. |
DF |
'integer' with number of degrees of freedom to use with t-distribution. |
SNRMin |
'numeric' scalar defining the minimum SNR used to filter out samples. |
recallMin |
Minimum number of samples for recalibration. |
recallRegMin |
Minimum number of SNP's for regression. |
gender |
integer vector ( male = 1, female = 2 ) or missing, with same length as filenames. If missing, the gender is predicted. |
returnParams |
'logical'. Return recalibrated parameters from crlmm. |
badSNP |
'numeric'. Threshold to flag as bad SNP (affects batchQC) |
genotype.Illumina
(or equivalently crlmmIllumina
)
is a wrapper of the crlmm
function for genotyping.
Differences include (1) that the copy number probes (if present)
are also quantile-normalized and (2) the class of object returned
by this function, CNSet
, is needed for subsequent copy number
estimation. Note that the batch variable (a character string) has
no effect on the normalization or genotyping steps. Rather, batch
is required in order to initialize a CNSet
container with the
appropriate dimensions.
The new 'krlmm' option is available for certain chip types. Optional
argument trueCalls
matrix contains known Genotype calls
(1 - AA, 2 - AB, 3 - BB) for a subset of samples and features. This
will used to compute KRLMM coefficients by calling vglm
function
from VGAM
package.
The 'krlmm' method makes use of functions provided in parallel
package to speed up the process. It by default initialises up to
8 clusters. This is configurable by setting up an option named
"krlmm.cores", e.g. options("krlmm.cores" = 16).
In general, a chip specific annotation package is required to use the
genotype.Illumina
function. If this is not available (newer chip
types or custom chips often don't have a chip-specific package available
on Bioconductor), consider using cdfName
='nopackage' and specifying
anno
and genome
, which runs 'krlmm' on the samples available.
Here anno
is a data.frame read in from the relevant chip-specific
manifest, which must have additional columns 'isSnp' which is a logical that
indicates whether a probe is polymorphic or not, 'position', 'chromosome' and
'featureNames' that give the location on the chromosome and SNP name.
A SnpSuperSet
instance.
Matt Ritchie, Cynthia Liu, Zhiyin Dai
Ritchie ME, Carvalho BS, Hetrick KN, Tavar\'e S, Irizarry RA. R/Bioconductor software for Illumina's Infinium whole-genome genotyping BeadChips. Bioinformatics. 2009 Oct 1;25(19):2621-3.
Liu R, Dai Z, Yeager M, Irizarry RA1, Ritchie ME. KRLMM: an adaptive genotype calling method for common and low frequency variants. BMC Bioinformatics. 2014 May 23;15:158.
Carvalho B, Bengtsson H, Speed TP, Irizarry RA. Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostatistics. 2007 Apr;8(2):485-99. Epub 2006 Dec 22. PMID: 17189563.
Carvalho BS, Louis TA, Irizarry RA. Quantifying uncertainty in genotype calls. Bioinformatics. 2010 Jan 15;26(2):242-9.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 | ## Not run:
# example for 'crlmm' option
library(ff)
library(crlmm)
## to enable paralellization, set to TRUE
if(FALSE){
library(snow)
library(doSNOW)
## with 10 workers
cl <- makeCluster(10, type="SOCK")
registerDoSNOW(cl)
}
## path to idat files
datadir <- "/thumper/ctsa/snpmicroarray/illumina/IDATS/370k"
## read in your samplesheet
samplesheet = read.csv(file.path(datadir, "HumanHap370Duo_Sample_Map.csv"), header=TRUE, as.is=TRUE)
samplesheet <- samplesheet[-c(28:46,61:75,78:79), ]
arrayNames <- file.path(datadir, unique(samplesheet[, "SentrixPosition"]))
arrayInfo <- list(barcode=NULL, position="SentrixPosition")
cnSet <- genotype.Illumina(sampleSheet=samplesheet,
arrayNames=arrayNames,
arrayInfoColNames=arrayInfo,
cdfName="human370v1c",
batch=rep("1", nrow(samplesheet)))
## End(Not run)
## Not run:
# example for 'krlmm' option
library(crlmm)
library(ff)
# line below is an optional step for krlmm to initialise 16 workers
# options("krlmm.cores" = 16)
# read in raw X and Y intensities output by GenomeStudio's GenCall genotyping module
XY = readGenCallOutput(c("HumanOmni2-5_4v1_FinalReport_83TUSCAN.csv","HumanOmni2-5_4v1_FinalReport_88CHB-JPT.csv"),
cdfName="humanomni25quadv1b",
verbose=TRUE)
krlmmResult = genotype.Illumina(XY=XY,
cdfName=ThiscdfName,
call.method="krlmm",
verbose=TRUE)
# example for 'krlmm' option with known genotype call for some SNPs and samples
library(VGAM)
hapmapCalls = load("hapmapCalls.rda")
# hapmapCalls should have rownames and colnames corresponding to XY featureNames and sampleNames
krlmmResult = genotype.Illumina(XY=XY,
cdfName=ThiscdfName,
call.method="krlmm",
trueCalls=hapmapCalls,
verbose=TRUE)
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.