read.SnpSetIllumina: Read Experimental Data, Featuredata and Phenodata into an...


View source: R/methods-SnpSetIllumina.R

Description

A SnpSetIllumina object is created from the textfiles created by the Illumina GenCall or BeadStudio software.

Usage

   read.SnpSetIllumina(samplesheet, manifestpath=NULL, reportpath=NULL,
     rawdatapath=NULL, reportfile=NULL, briefOPAinfo=TRUE, readTIF=FALSE, 
     nochecks=FALSE, sepreport="\t", essentialOnly=FALSE, ...)

Arguments

samplesheet

a data.frame, or the name of a file, containing the sample sheet

manifestpath

a character string for the path containing the manifests / OPA definition files, defaults to path of samplesheet

reportpath

a character string for the path containing the report files, defaults to path of samplesheet

rawdatapath

a character string for the path containing the intensity data files, defaults to path of samplesheet

reportfile

a character string for the name of the BeadStudio report file

briefOPAinfo

logical, if TRUE then only the SNP name, Illumina code, chromosome and basepair position are put into the featureData slot of the result, else all information from the OPA file is put into the featureData slot

readTIF

logical, if TRUE then the beadarray package and raw TIF files are used to read the data

nochecks

logical, if TRUE then only limited validity checks are performed on BeadStudio report files. See details

sepreport

character, field separator character for BeadStudio report files

essentialOnly

logical, if TRUE then only the essential columns from a reportfile are included into the result. See details

...

arguments are forwarded to readIllumina and can be used to perform bead-level normalization

Details

The text files from Illumina software are imported to a SnpSetIllumina object. Both result files from GenCall and BeadStudio can be used. In both cases the sample sheets from the experiments are used to select the proper data from the report or data files. The following columns from the sample sheet file are used for this purpose: ‘Sample_Name’, ‘Sentrix_Position’, and ‘Pool_ID’. The values in columns ‘Sample_Plate’, ‘Pool_ID’, and ‘Sentrix_ID’ should be the same for all samples in the file, as this is the case for processed experiments. The contents of the sample sheet are put into the phenoData slot.
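The column requirements above can be checked before importing. The following is a minimal sketch in base R, not part of beadarraySNP: the helper name check_samplesheet and all values in the example sample sheet are hypothetical.

```r
# Hypothetical sample sheet; all values are made up for illustration.
samplesheet <- data.frame(
  Sample_Name      = c("S1", "S2"),
  Sample_Plate     = c("P1", "P1"),
  Pool_ID          = c("OPA4", "OPA4"),
  Sentrix_ID       = c("1234567", "1234567"),
  Sentrix_Position = c("R001_C001", "R001_C002"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stops if a column used for data selection is missing,
# warns if a column that should be constant varies between samples.
check_samplesheet <- function(sheet) {
  required <- c("Sample_Name", "Sentrix_Position", "Pool_ID")
  missing_cols <- setdiff(required, names(sheet))
  if (length(missing_cols) > 0) {
    stop("Missing sample sheet columns: ", paste(missing_cols, collapse = ", "))
  }
  constant <- intersect(c("Sample_Plate", "Pool_ID", "Sentrix_ID"), names(sheet))
  is_varying <- vapply(sheet[constant],
                       function(x) length(unique(x)) > 1, logical(1))
  varying <- constant[is_varying]
  if (length(varying) > 0) {
    warning("Columns with more than one value: ",
            paste(varying, collapse = ", "))
  }
  invisible(length(varying) == 0)
}

ok <- check_samplesheet(samplesheet)
```

Here ok is TRUE because the three columns that should be constant each hold a single value.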

Zero values in the raw data signals are set to NA.
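As an illustration of this rule, the following base R sketch applies the same replacement to a small, made-up intensity matrix. It only shows the effect and is not the package's internal code.

```r
# Made-up raw intensities: SNPs in rows, samples in columns.
R <- matrix(c(1200, 0, 850, 430), nrow = 2,
            dimnames = list(c("rs1", "rs2"), c("S1", "S2")))

# Treat zero signals as missing values.
R[R == 0] <- NA
```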

Ideally the OPA manifest file containing SNP annotation should be available; these files are provided by Illumina. Columns ‘IllCode’, ‘CHR’, and ‘MapInfo’ are put into the featureData slot.

GenCall Data

In order to process experiments that were genotyped using the GenCall software, the arrays should be scanned with the setting <SaveTextFiles>true</SaveTextFiles> in the Illumina configuration file Settings.XML. Three types of files need to be present in the same folder: the sample sheet, the .csv files containing signal intensity data, and the report file that contains the genotype information. For each sample in the sample sheet there should be a .csv file with the following file mask: [sam_id]_R00[yy]_C00[xx].csv, where sam_id is the Illumina ID for the SAM, and xx and yy are the column and row number respectively. From the report files, the file with mask [Pool_ID]_LocusByDNA[_ExpName].csv is used. ‘Pool_ID’ is the OPA panel used, and ‘_ExpName’ is optional.
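The two file masks can be spelled out in base R as follows. This is only a sketch: the SAM ID, row, column and pool values are hypothetical, and the mask is applied literally for single-digit row and column numbers; how larger numbers are padded is not stated here.

```r
# Hypothetical values; substitute those from your own experiment.
sam_id   <- "1234567"  # Illumina ID for the SAM
row_nr   <- 1          # yy in the file mask
col_nr   <- 2          # xx in the file mask
pool_id  <- "OPA4"     # OPA panel used
exp_name <- ""         # optional suffix, for example "_Exp1"

# Intensity data file: [sam_id]_R00[yy]_C00[xx].csv
intensity_file <- sprintf("%s_R00%d_C00%d.csv", sam_id, row_nr, col_nr)

# Report file: [Pool_ID]_LocusByDNA[_ExpName].csv
report_file <- paste0(pool_id, "_LocusByDNA", exp_name, ".csv")
```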

BeadStudio Data

To process experiments that were processed with BeadStudio, only two files are needed: the sample sheet and the Final Report file. The sample sheet must contain the same columns as for GenCall. The report file should contain the following columns: ‘SNP Name’, ‘Sample ID’, ‘GC Score’, ‘Allele1 - AB’, ‘Allele2 - AB’, ‘GT Score’, ‘X Raw’, and ‘Y Raw’. ‘SNP Name’ and ‘Sample ID’ are used to form rows and columns in the experimental data, ‘GC Score’ is put in the callProbability matrix, ‘Allele1 - AB’ and ‘Allele2 - AB’ are combined into the call matrix, ‘GT Score’ is added to the featureData slot, ‘X Raw’ is put in the R matrix and ‘Y Raw’ in the G matrix. Other columns in the report file are added as matrices in the assayData slot, or as columns in the featureData slot if the values are identical for all samples in the report file. When nochecks is TRUE, only the ‘SNP Name’ and ‘Sample ID’ columns are required, and the resulting object is of class MultiSet.
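The mapping from a long-format report to SNP-by-sample matrices can be sketched in base R as below. The report contents and the helper to_matrix are hypothetical, and the code only illustrates the reshaping described above; it is not the implementation used by read.SnpSetIllumina.

```r
# Made-up excerpt of a Final Report, one row per SNP/sample combination.
report <- data.frame(
  "SNP Name"     = c("rs1", "rs2", "rs1", "rs2"),
  "Sample ID"    = c("S1", "S1", "S2", "S2"),
  "GC Score"     = c(0.91, 0.85, 0.78, 0.95),
  "Allele1 - AB" = c("A", "A", "B", "A"),
  "Allele2 - AB" = c("A", "B", "B", "B"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper: spread one value per row into a SNP x sample matrix.
to_matrix <- function(values, snps, samples) {
  snp_ids <- unique(snps)
  sample_ids <- unique(samples)
  m <- matrix(NA, nrow = length(snp_ids), ncol = length(sample_ids),
              dimnames = list(snp_ids, sample_ids))
  m[cbind(match(snps, snp_ids), match(samples, sample_ids))] <- values
  m
}

snps    <- report[["SNP Name"]]
samples <- report[["Sample ID"]]

# 'GC Score' becomes the callProbability matrix.
callProbability <- to_matrix(report[["GC Score"]], snps, samples)

# 'Allele1 - AB' and 'Allele2 - AB' are combined into the call matrix.
call <- to_matrix(paste0(report[["Allele1 - AB"]], report[["Allele2 - AB"]]),
                  snps, samples)
```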

Sample sheets

To help generate a sample sheet for BeadStudio data, a Sample_Map.txt file can be converted to a sample sheet with the Sample_Map2Samplesheet function. For BeadStudio report files it is also possible to set samplesheet=NULL. In this case the phenoData slot will be fabricated from the sample names in the report file.

Manifest/OPA/annotation files

For BeadStudio report files it is not necessary to have a Manifest file if the columns ‘Chr’ and ‘Position’ are available in the report file. Currently this is the only way to import data from Infinium arrays, because Illumina does not supply Manifest files for these arrays.
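Whether a report carries its own annotation can be checked from its column names. The sketch below uses a made-up, tab-separated report excerpt and assumes that the column header is the first line of the text; any preamble lines in a real report file would have to be skipped first.

```r
# Made-up report excerpt with the column header on the first line.
report_text <- paste(
  "SNP Name\tSample ID\tChr\tPosition",
  "rs1\tS1\t1\t12345",
  sep = "\n"
)

# Read the column names without altering them.
header <- names(read.delim(text = report_text, check.names = FALSE, nrows = 1))

# TRUE when no Manifest file is needed for chromosome and position.
has_annotation <- all(c("Chr", "Position") %in% header)
```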

Value

This function returns a SnpSetIllumina object, or a MultiSet object when nochecks is TRUE.

Author(s)

Jan Oosting

See Also

SnpSetIllumina-class, Sample_Map2Samplesheet, readIllumina

Examples

# read a SnpSetIllumina object using example textfiles in data directory
datadir <- system.file("testdata", package="beadarraySNP")
# the sample sheet and the OPA manifest are both located in datadir
SNPdata <- read.SnpSetIllumina(file.path(datadir, "4samples_opa4.csv"), manifestpath = datadir)


beadarraySNP documentation built on Nov. 8, 2020, 7:21 p.m.