genotype: Preprocessing and genotyping of Affymetrix arrays.


View source: R/cnrma-functions.R

Description

Preprocessing and genotyping of Affymetrix arrays.

Usage

genotype(filenames, cdfName, batch, mixtureSampleSize = 10^5, eps = 0.1,
         verbose = TRUE, seed = 1, sns, probs = rep(1/3, 3),
         DF = 6, SNRMin = 5, recallMin = 10, recallRegMin = 1000,
         gender = NULL, returnParams = TRUE, badSNP = 0.7,
         genome = c("hg19", "hg18"))

Arguments

filenames

complete path to CEL files

cdfName

annotation package (see also validCdfNames)

batch

vector of class character denoting the batch for each sample in filenames. The batch vector must be the same length as the number of samples. See details.

mixtureSampleSize

Sample size to be used when fitting the mixture model.

eps

Stopping criterion.

verbose

Logical. Whether to print descriptive messages during processing.

seed

Seed to be used when sampling. Useful for reproducibility.

sns

The sample identifiers. If missing, the default sample names are basename(filenames).

probs

'numeric' vector with priors for AA, AB and BB.

DF

'integer' with number of degrees of freedom to use with t-distribution.

SNRMin

'numeric' scalar defining the minimum SNR used to filter out samples.

recallMin

Minimum number of samples for recalibration.

recallRegMin

Minimum number of SNPs for regression.

gender

integer vector (male = 1, female = 2) or missing, with the same length as filenames. If missing, the gender is predicted.

returnParams

'logical'. Return recalibrated parameters from crlmm.

badSNP

'numeric'. Threshold to flag as bad SNP (affects batchQC)

genome

character string indicating the UCSC genome build for the SNP annotation
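As a minimal sketch of how the sample-level arguments above might be assembled before calling genotype, the following base R code builds the sample names, the integer gender coding, and the genotype priors. The CEL paths and the sample sheet (sample_sheet, with hypothetical columns file and sex) are illustrative assumptions and are not part of crlmm.

```r
## Hypothetical CEL file paths; in practice these might come from
## list.celfiles(path, full.names = TRUE).
filenames <- file.path("/data/cel",
                       c("sample1.CEL", "sample2.CEL", "sample3.CEL"))

## Default sample identifiers, as described for the 'sns' argument.
sns <- basename(filenames)

## Hypothetical sample sheet recording the known sex of each sample.
sample_sheet <- data.frame(file = sns,
                           sex = c("female", "female", "male"),
                           stringsAsFactors = FALSE)

## Recode to the integer convention male = 1, female = 2, keeping the
## same order as 'filenames'.
sex <- sample_sheet$sex[match(sns, sample_sheet$file)]
gender <- ifelse(sex == "male", 1L, 2L)

## Equal priors for AA, AB and BB (the documented default).
probs <- rep(1/3, 3)

## Basic consistency checks before passing these to genotype().
stopifnot(length(gender) == length(filenames),
          isTRUE(all.equal(sum(probs), 1)))
```

These objects could then be supplied as the sns, gender and probs arguments; leaving gender unspecified lets the function predict it instead.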

Details

For large datasets, install and load the ff package before calling the genotype function to take advantage of the large data support. Previous versions of the crlmm package used different functions for genotyping depending on whether the ff package was loaded, namely genotype and genotype2. The genotype function now handles both cases.
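A hedged sketch of the setup described above, run before genotyping: it attaches ff and configures the large data options only when the packages are installed. The directory name ff_files and the chunk sizes are assumptions chosen for illustration; ldPath, ocProbesets and ocSamples are the oligoClasses helpers referenced elsewhere on this page.

```r
## Check whether large data support is available in this R installation.
large_data_ready <- requireNamespace("ff", quietly = TRUE) &&
  requireNamespace("oligoClasses", quietly = TRUE)

if (large_data_ready) {
  library(ff)
  library(oligoClasses)

  ## Directory for the ff files written to disk (assumed location).
  ff_dir <- file.path(tempdir(), "ff_files")
  dir.create(ff_dir, showWarnings = FALSE)
  ldPath(ff_dir)

  ## Number of probesets and samples processed at a time; smaller
  ## values use less RAM (illustrative values, not recommendations).
  ocProbesets(50e3)
  ocSamples(100)
} else {
  message("ff or oligoClasses is not installed; genotype would hold data in memory.")
}
```

With ff attached in this way, the subsequent call to genotype is the same as without it.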

genotype is essentially a wrapper of the crlmm function for genotyping. Differences include (1) that the copy number probes (if present) are also quantile-normalized and (2) the class of object returned by this function, CNSet, is needed for subsequent copy number estimation. Note that the batch variable that must be passed to this function has no effect on the normalization or genotyping steps. Rather, batch is required in order to initialize a CNSet container with the appropriate dimensions and is used directly when estimating copy number.
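To illustrate the batch requirement, here is a minimal sketch that derives one character label per sample and checks that its length matches the number of samples. The file names and the convention that the plate is encoded before the first underscore are hypothetical; any grouping that reflects how the samples were processed could be used instead.

```r
## Hypothetical CEL file names whose prefix encodes the processing plate.
filenames <- c("plateA_s1.CEL", "plateA_s2.CEL", "plateB_s1.CEL",
               "plateB_s2.CEL", "plateB_s3.CEL")

## One character label per sample: the text before the first "_".
batch <- sub("_.*$", "", basename(filenames))

## The batch vector must be character and have one entry per sample.
stopifnot(is.character(batch), length(batch) == length(filenames))

## Number of samples in each batch, worth inspecting before copy
## number estimation.
batch_sizes <- table(batch)
print(batch_sizes)
```

The resulting vector would be passed as the batch argument, for example genotype(filenames, cdfName = "genomewidesnp6", batch = batch).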

Value

A CNSet instance (see Details).

Note

For large datasets, load the 'ff' package prior to genotyping – this will greatly reduce the RAM required for big jobs. See ldPath and ocSamples.

Author(s)

R. Scharpf

References

Carvalho B, Bengtsson H, Speed TP, Irizarry RA. Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostatistics. 2007 Apr;8(2):485-99. Epub 2006 Dec 22. PMID: 17189563.

Carvalho BS, Louis TA, Irizarry RA. Quantifying uncertainty in genotype calls. Bioinformatics. 2010 Jan 15;26(2):242-9.

See Also

snprma, crlmm, ocSamples, ldOpts, batch, crlmmCopynumber

Examples

if (require(ff) && require(genomewidesnp6Crlmm) && require(hapmapsnp6)){
  ldPath(tempdir())
  path <- system.file("celFiles", package="hapmapsnp6")
  ## the filenames with full path...
  ## very useful when genotyping samples not in the working directory
  cels <- list.celfiles(path, full.names=TRUE)
  ## Note: one would need at least 10 CEL files for copy number estimation
  ## To use less RAM, specify a smaller argument to ocProbesets
  ocProbesets(50e3)
  batch <- rep("A", length(cels))
  (cnSet <- genotype(cels, cdfName="genomewidesnp6", batch=batch))

  ## Segmentation faults that occur with the above step can often be
  ## traced to a corrupt CEL file. To check whether any of the files are
  ## corrupt, try reading the files in one at a time:

  ## Not run: 
  require(affyio)
  validCEL(cels)

  ## End(Not run)

  ## when gender is not specified (as in the above example), crlmm tries
  ## to predict the gender from SNPs on chromosome X
  cnSet$gender

  ## If gender is known, one should check that the assigned gender is
  ## correct. Alternatively, one can pass gender as an argument to the
  ## genotype function.
  gender <- c("female", "female", "male")
  ## recode to the integer coding expected by genotype (male = 1, female = 2)
  gender <- ifelse(gender == "female", 2L, 1L)
  dim(cnSet)
  table(isSnp(cnSet))
}

crlmm documentation built on Nov. 8, 2020, 4:55 p.m.