crlmm: Genotype oligonucleotide arrays with CRLMM


Description

This is a faster and more efficient implementation of the CRLMM algorithm, designed specifically for Affymetrix SNP 5 and 6 arrays (support for other platforms is planned).

Usage

crlmm(filenames, row.names=TRUE, col.names=TRUE,
      probs=c(1/3, 1/3, 1/3), DF=6, SNRMin=5,
      gender=NULL, save.it=FALSE, load.it=FALSE,
      intensityFile, mixtureSampleSize=10^5,
      eps=0.1, verbose=TRUE, cdfName, sns, recallMin=10,
      recallRegMin=1000, returnParams=FALSE, badSNP=0.7)
crlmm2(filenames, row.names=TRUE, col.names=TRUE,
      probs=c(1/3, 1/3, 1/3), DF=6, SNRMin=5,
      gender=NULL, save.it=FALSE, load.it=FALSE,
      intensityFile, mixtureSampleSize=10^5,
      eps=0.1, verbose=TRUE, cdfName, sns, recallMin=10,
      recallRegMin=1000, returnParams=FALSE, badSNP=0.7)

Arguments

filenames

'character' vector of CEL file names to be genotyped.

row.names

'logical'. Use SNP names as row names?

col.names

'logical'. Use sample names as column names?

probs

'numeric' vector with prior probabilities for the genotypes AA, AB and BB.

DF

'integer' giving the number of degrees of freedom of the t-distribution.

SNRMin

'numeric' scalar defining the minimum signal-to-noise ratio (SNR) used to filter out samples.

gender

'integer' vector, of the same length as 'filenames', defining sex (1 - male; 2 - female).

save.it

'logical'. Save preprocessed data?

load.it

'logical'. Load preprocessed data to speed up analysis?

intensityFile

'character' giving the name of the file to which preprocessed data are saved (or from which they are loaded).

mixtureSampleSize

Number of SNPs used to fit the mixture model.

eps

Convergence tolerance for the mixture model (minimum change between iterations).

verbose

'logical'. Print progress messages?

cdfName

'character' giving the name of the CDF to use ('GenomeWideSnp5' or 'GenomeWideSnp6').

sns

'character' vector with sample names to be used.

recallMin

Minimum number of samples for recalibration.

recallRegMin

Minimum number of SNPs for regression.

returnParams

'logical'. Return recalibrated parameters?

badSNP

'numeric'. Threshold used to flag a SNP as bad (affects batchQC).
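The 'gender' argument expects integers (1 - male; 2 - female), one per CEL file. A minimal sketch of building that vector from sample-sheet labels follows; the helper 'encodeGender' and the 'M'/'F' label convention are hypothetical and not part of the package.

```r
## Hypothetical helper (not part of crlmm): encode sex labels as the
## integer coding expected by 'gender' (1 - male; 2 - female).
encodeGender <- function(sex, filenames) {
  ## 'gender' must have the same length as 'filenames'
  stopifnot(length(sex) == length(filenames))
  ## match() returns 1 for "M", 2 for "F", NA for anything else
  code <- match(toupper(sex), c("M", "F"))
  if (anyNA(code)) stop("sex labels must be 'M' or 'F'")
  as.integer(code)
}

cels <- c("s1.CEL", "s2.CEL", "s3.CEL")   # illustrative file names
gender <- encodeGender(c("M", "F", "f"), cels)
gender
## then, for real data: crlmm(cels, gender = gender)
```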

Details

'crlmm2' can genotype very large datasets (via the ff package) and can also use clusters or multiple cores (via the snow package) to speed up genotyping.
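Processing a large dataset piecewise amounts to splitting the probeset indices into consecutive batches, as with the "genotype 50K SNPs at a time" setting in the HPC example. The sketch below only illustrates that index split in base R; 'batchIndices' is a hypothetical name and this is not crlmm2's internal code.

```r
## Hypothetical illustration (not crlmm2 internals): split n probeset
## indices into consecutive batches of at most 'batchSize' elements.
batchIndices <- function(n, batchSize) {
  idx <- seq_len(n)
  ## indices 1..batchSize go to batch 1, the next batchSize to batch 2, ...
  split(idx, ceiling(idx / batchSize))
}

batches <- batchIndices(120000, 50000)
lengths(batches)   # sizes of the three batches
```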

The call probabilities are stored as integers, to reduce file size, using the transformation 'round(-1000*log2(1-p))', where p is the probability. The function i2p can be used to convert the integers back to the probability scale.
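A small sketch of the encoding and its inverse, written out in base R. The helper names 'p2conf' and 'conf2p' are hypothetical; the forward transformation is the one stated above, and the inverse follows by solving it for p (up to rounding).

```r
## Hypothetical helpers; forward formula as documented for 'confs'.
## Encode call probabilities as integers: round(-1000 * log2(1 - p)).
p2conf <- function(p) round(-1000 * log2(1 - p))
## Invert the encoding (exact up to rounding): p = 1 - 2^(-i / 1000).
conf2p <- function(i) 1 - 2^(-i / 1000)

p <- c(0.5, 0.9, 0.99)
conf <- p2conf(p)
conf            # 1000 3322 6644
conf2p(conf)    # close to the original p
```

Note that p = 1 maps to Inf, so the encoding is only finite for probabilities below 1; rounding to an integer limits the recovered precision to roughly three decimal places on the log2 scale.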

Value

A SnpSet object containing the following elements:

calls

Genotype calls (1 - AA, 2 - AB, 3 - BB)

confs

Confidence scores, stored as 'round(-1000*log2(1-p))', where p is the call probability

SNPQC

SNP quality scores

batchQC

Batch quality score

params

Recalibrated parameters
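The integer calls and confidence scores are typically combined when inspecting results. The sketch below uses toy matrices standing in for 'calls' and 'confs' (rows are SNPs, columns are samples); the data, the 0.9 cutoff and the object names are hypothetical, and the accessor used to extract these matrices from the SnpSet is not shown.

```r
## Hypothetical toy data standing in for the 'calls' and 'confs' elements.
calls <- matrix(c(1L, 2L, 3L, 2L), nrow = 2,
                dimnames = list(c("SNP_A", "SNP_B"), c("s1", "s2")))
confs <- matrix(c(6644L, 500L, 4000L, 5000L), nrow = 2,
                dimnames = dimnames(calls))

## Map the integer coding (1 - AA, 2 - AB, 3 - BB) to genotype labels,
## keeping the matrix dimensions and names.
genotypes <- array(c("AA", "AB", "BB")[as.vector(calls)],
                   dim = dim(calls), dimnames = dimnames(calls))

## Convert confidence scores back to probabilities and set calls below
## an arbitrary illustrative cutoff (0.9) to NA.
prob <- 1 - 2^(-confs / 1000)
genotypes[prob < 0.9] <- NA
genotypes
```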

References

Carvalho B, Bengtsson H, Speed TP, Irizarry RA. Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostatistics. 2007 Apr;8(2):485-99. Epub 2006 Dec 22. PMID: 17189563.

Carvalho BS, Louis TA, Irizarry RA. Quantifying uncertainty in genotype calls. Bioinformatics. 2010 Jan 15;26(2):242-9.

See Also

i2p, snpCall, snpCallProbability

Examples

## this can be slow
library(oligoClasses)
library(crlmm)
if (require(genomewidesnp6Crlmm) && require(hapmapsnp6)) {
  path <- system.file("celFiles", package="hapmapsnp6")

  ## the filenames with full path...
  ## very useful when genotyping samples not in the working directory
  cels <- list.celfiles(path, full.names=TRUE)
  (crlmmOutput <- crlmm(cels))
  ## If gender is known, check that the assigned gender is correct, or
  ## pass the integer coding of gender (1 - male; 2 - female) to crlmm
  ## through the 'gender' argument.
}

## Not run: 
## HPC Example
library(ff)
library(snow)
library(doSNOW)
library(crlmm)
## genotype 50K SNPs at a time
ocProbesets(50000)
## set up a cluster using 8 cores on the local machine
cl <- makeCluster(8, type="SOCK")
registerDoSNOW(cl)
##setCluster(8, "SOCK")

path <- system.file("celFiles", package="hapmapsnp6")
cels <- list.celfiles(path, full.names=TRUE)
crlmmOutput <- crlmm2(cels)
## shut down the cluster workers
stopCluster(cl)

## End(Not run)

benilton/crlmmOld documentation built on May 12, 2019, 10:59 a.m.